/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/28/26(Sun)21:59:12 No.109158385

File: beachmiku22.png (260 KB, 2369x925)

260 KB PNG

/lmg/ - Local Models General Anonymous 06/28/26(Sun)21:59:12 No.109158385 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109153585 & >>109148460

►News
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/28/26(Sun)21:59:27 No.109158388

Anonymous 06/28/26(Sun)21:59:27 No.109158388

File: threadrecap.png (1.48 MB, 1536x1536)

1.48 MB PNG

►Recent Highlights from the Previous Thread: >>109153585

--Using logit softcapping to fix Gemma 4 determinism:
>109155175 >109155180 >109155239 >109155280 >109155386 >109157570 >109157982 >109158071 >109158175
--Anon's attempt at de-purpling and de-euphemizing Gemma 4 31B:
>109155998 >109156023 >109156287 >109156314 >109157052 >109157091 >109157990 >109156296 >109156083 >109156299 >109156699 >109156728
--C++ TTS implementations and high-VRAM/RAM server hardware builds:
>109157084 >109157167 >109157306 >109157328 >109157427 >109157365 >109157508 >109157776 >109157783 >109157973 >109157360 >109157456
--Building a local voice pipeline using Piper, Gemma, and Nemotron:
>109153794 >109153801 >109153812 >109153825 >109154072 >109153820 >109157263 >109157293 >109154563
--Frustration over slow merge of llama.cpp CUDA flash attention fix:
>109155792 >109155841 >109156077 >109157210 >109157220 >109157386 >109157303
--llama.cpp merged DFlash support for speculative decoding:
>109154531 >109154623 >109154849
--Comparing Oobabooga bugs and features against llama-server alternatives:
>109155677 >109155707 >109155718 >109155749 >109155760 >109155783 >109155907 >109155733
--Comparing SVG generation and iterative refinement across Qwen, Claude, and Gemma:
>109153841 >109153850 >109153851 >109154000 >109154079 >109154616 >109155378 >109155655 >109156321 >109157244
--Chub.ai updates and various AI industry news and opinions:
>109154587 >109156585 >109156615 >109156625 >109156646 >109156820 >109156876 >109156489
--Gemma failing to code a voxel engine in raw C:
>109154702 >109154708 >109154946 >109154950 >109154997 >109155027 >109155043 >109155069 >109155081 >109155108 >109155062
--Logs:
>109154000 >109154343 >109154714 >109155069 >109155572 >109155652 >109155655 >109156262 >109156314 >109157782
--Miku, Rin (free space):
>109156170 >109155795 >109157181

►Recent Highlight Posts from the Previous Thread: >>109153589

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/28/26(Sun)22:01:18 No.109158399

Anonymous 06/28/26(Sun)22:01:18 No.109158399

gemmaballs

Anonymous
06/28/26(Sun)22:03:07 No.109158413

Anonymous 06/28/26(Sun)22:03:07 No.109158413

70b dense

Anonymous
06/28/26(Sun)22:04:04 No.109158420

Anonymous 06/28/26(Sun)22:04:04 No.109158420

File: 1782133634186893.png (56 KB, 960x422)

56 KB PNG

We should petition HF to ban Chinese models

Anonymous
06/28/26(Sun)22:04:48 No.109158422

Anonymous 06/28/26(Sun)22:04:48 No.109158422

File: file.png (298 KB, 1106x1820)

298 KB PNG

>>109158280
12 channel or 8 channel memory? 12 channel will give you a 60% speed boost with the offloading. make sure you get an epyc 9355 or better. the lower end epycs have memory bandwidth issues. overall this is gonna cost you like $45k total for the ram and cpu and board. would have cost you around $9k back in october.
https://www.fsastech.com/ja-jp/products/primergy/technical/performance/pdf/wp-performance-report-primergy-rx2450-m2-ww-ja.pdf
also check out glm5.2 and kimi k2.7. they are both just better and faster than deepseek v4 pro in pretty much every way.

Anonymous
06/28/26(Sun)22:07:56 No.109158434

Anonymous 06/28/26(Sun)22:07:56 No.109158434

>>109158420
the american way.. if you can't beat em, sue them

Anonymous
06/28/26(Sun)22:08:31 No.109158437

Anonymous 06/28/26(Sun)22:08:31 No.109158437

>>109158434
jewish way

Anonymous
06/28/26(Sun)22:10:10 No.109158443

Anonymous 06/28/26(Sun)22:10:10 No.109158443

>>109158437
as i said

Anonymous
06/28/26(Sun)22:21:38 No.109158482

Anonymous 06/28/26(Sun)22:21:38 No.109158482

File: wtf.png (151 KB, 796x781)

151 KB PNG

Is gemma good at creative writing?

Anonymous
06/28/26(Sun)22:27:30 No.109158503

Anonymous 06/28/26(Sun)22:27:30 No.109158503

>>109158482
pic is better than the average leftie meme. Bunkertrannies should consider outsourcing their memes to gemma.

Anonymous
06/28/26(Sun)22:35:20 No.109158536

Anonymous 06/28/26(Sun)22:35:20 No.109158536

>>109158420
HF would hardly exist then

Anonymous
06/28/26(Sun)22:35:27 No.109158539

Anonymous 06/28/26(Sun)22:35:27 No.109158539

File: 1755459112248026.jpg (350 KB, 1536x2048)

350 KB JPG

Anonymous
06/28/26(Sun)22:38:11 No.109158547

Anonymous 06/28/26(Sun)22:38:11 No.109158547

Is anyone else running amd and windows able to see igpu and leave their dgufree?
It's like 33% left on the table, with a 16GB VRAM but it's so fucking unstable and faulty.
It works for a bit, if a montoor is plugged into the dgpu and your using a second monitor on the igpu, but once the load goes low and the card has to transition into an idle state, it crashes and give amddmg error

On the other hand plug everything into the igpu and try to use the computer normally, but from the igpu, with the dgpu just sitting there and even with telling it to use the dgu it still tries to load stuff into the igpu which is well impossible

Anonymous
06/28/26(Sun)22:41:12 No.109158559

Anonymous 06/28/26(Sun)22:41:12 No.109158559

File: gemma.png (109 KB, 839x720)

109 KB PNG

>>109158503
I think Gemma is a bit confused but she's appreciative regardless.

Anonymous
06/28/26(Sun)22:44:19 No.109158570

Anonymous 06/28/26(Sun)22:44:19 No.109158570

>>109158413
>70b dense
this ^
we really got fucked over when meta stopped doing them, because no reason for qwen to compete there

Anonymous
06/28/26(Sun)22:48:09 No.109158586

Anonymous 06/28/26(Sun)22:48:09 No.109158586

File: zt8kt1.png (91 KB, 789x375)

91 KB PNG

Anyone else have this problem? How do you edit, repair, classify pornographic audio without getting horny? I don't think Gemma-Chan's suggestions will work.

Anonymous
06/28/26(Sun)22:59:54 No.109158625

Anonymous 06/28/26(Sun)22:59:54 No.109158625

>>109158482
probably the best in the less than <300b class
I prefer glm for writing

Anonymous
06/28/26(Sun)23:00:23 No.109158628

Anonymous 06/28/26(Sun)23:00:23 No.109158628

>>109158422
It's 12, I'm willing to pay up for 20t/s otherwise this build doesn't make sense. Not pulling the trigger on this right this moment, I'll get this once I'm done stacking GPUs and inevitably decide that I need more power. Maybe the meta will change by then or maybe I'm a retard for not targeting a slightly worse model. Do suggest alternatives in the same price range if you know any. Ceiling is about $50k but the upper half of that range better last me years and deliver opus-at-home quality work with some correcting for prompt skill. Basically it must be able to understand the promptware I use right now and not become absolutely retarded past 128K. (slightly retarded is fine)

Anonymous
06/28/26(Sun)23:06:57 No.109158650

Anonymous 06/28/26(Sun)23:06:57 No.109158650

File: kaaaaaaaaaa.jpg (36 KB, 347x364)

36 KB JPG

Fuck I forgot GLM. That's 2x less RAM than dipsy, time to start over

Anonymous
06/28/26(Sun)23:07:52 No.109158654

Anonymous 06/28/26(Sun)23:07:52 No.109158654

>>109158628
glm5.2 at q5 on a 12 channel ddr5 board with at least 2 3090s can get you about 25t/s. would only need about 600gb of ram. glm5.2 is the best local model right now and has good context up to about 256k. definitely not on par with opus 4.8, but it does match opus 4.5. a build like this would definitely last you about 8 or so years, and local models will continue to improve over the next few months and years

Anonymous
06/28/26(Sun)23:09:45 No.109158659

Anonymous 06/28/26(Sun)23:09:45 No.109158659

I had the "opportunity" to use Muse Spark today and this shit is fucking ass lmao i hope Zuck goes bankrupt

Anonymous
06/28/26(Sun)23:26:41 No.109158724

Anonymous 06/28/26(Sun)23:26:41 No.109158724

>>109158420
If we ban cyberweapons, only our enemies will have cyberweapons.

Anonymous
06/28/26(Sun)23:27:14 No.109158728

Anonymous 06/28/26(Sun)23:27:14 No.109158728

>>109158628
4x pro 6000 tensor split glm5.2 q4
forget about 20-25t/s that's literally unusable speed for serious work

Anonymous
06/28/26(Sun)23:35:25 No.109158769

Anonymous 06/28/26(Sun)23:35:25 No.109158769

>>109158654
I actually consider opus 4.8 inferior to 4.7 and 4.6, I prefer a more general model that isn't overfitted on habits that should live in harness prompts. Because all that Anthropic has been doing lately is baking in the shit I have been prompting manually the whole time. Getting 4.8 out of the slop book is nigh impossible.
So new target is GLM at iq4xs. That should bring the cost from a small house to something potentially tractable within this year.

Anonymous
06/28/26(Sun)23:40:16 No.109158790

Anonymous 06/28/26(Sun)23:40:16 No.109158790

>>109158728
claude usually gives me like 40t/s so 20/s sounds reasonable assuming I can deslop it enough to stop wasting 80% of the output on essay formatting
Also consider part availability and headroom for a lot of long kvs.

Anonymous
06/28/26(Sun)23:47:40 No.109158822

Anonymous 06/28/26(Sun)23:47:40 No.109158822

>>109158728
>>109158790
4x blackwell pro 6000s is not enough to fully run glm5.2 in fp4. you would get around 50-60t/s on the nvfp4 quant in vllm if you had like 5 or 6 blackwell 6000s though.
https://huggingface.co/nvidia/GLM-5.2-NVFP4

Anonymous
06/28/26(Sun)23:52:42 No.109158840

Anonymous 06/28/26(Sun)23:52:42 No.109158840

>>109158547
Windows masochism

Anonymous
06/28/26(Sun)23:53:26 No.109158842

Anonymous 06/28/26(Sun)23:53:26 No.109158842

>>109158790
the difference between 20 and 40t/s is huge
60t/s is minimum for agentic workload that feels fast enough
20t/s is only enough for chat

Anonymous
06/28/26(Sun)23:58:57 No.109158859

Anonymous 06/28/26(Sun)23:58:57 No.109158859

>>109158842
I tend to agree with that, I currently have a bit more than 40 t/s and while it does feel fast using it normally, for agenting stuff, it does feel slow at times. Using opus which is I believe 60 t/s does feel marginally better, bit hard to say since we can only see thinking traces.

Anonymous
06/29/26(Mon)00:00:13 No.109158864

Anonymous 06/29/26(Mon)00:00:13 No.109158864

>>109158842
what t/s is codex 2.5 fast?

Anonymous
06/29/26(Mon)00:01:38 No.109158871

Anonymous 06/29/26(Mon)00:01:38 No.109158871

>>109158842
you don't have to watch the tokens, you can just have your clanker work on it's own

Anonymous
06/29/26(Mon)00:01:43 No.109158873

Anonymous 06/29/26(Mon)00:01:43 No.109158873

>>109158859
I'm the fng but you're spoiled, there's no way that's an absolute fact. In the 70's people wrote code, handed it to a lady who typed it out (made errors too) on punch cards, and then they were run overnight.

Anonymous
06/29/26(Mon)00:01:54 No.109158875

Anonymous 06/29/26(Mon)00:01:54 No.109158875

8xMI300X is da wae

Anonymous
06/29/26(Mon)00:03:17 No.109158878

Anonymous 06/29/26(Mon)00:03:17 No.109158878

File: 1757681263773664.png (3.09 MB, 1448x1086)

3.09 MB PNG

Anonymous
06/29/26(Mon)00:04:09 No.109158887

Anonymous 06/29/26(Mon)00:04:09 No.109158887

DDR5: $30/GB MEM at 45GB/s BW = $0.66/MEM*BW
PRO 6000 at $12000: $125/GB MEM at 1800GB/s BW = $0.07/MEM*BW
PRO 6000 is still massively underpriced, even if the price is increased to $100k it's still worth it.

Anonymous
06/29/26(Mon)00:04:29 No.109158888

Anonymous 06/29/26(Mon)00:04:29 No.109158888

>>109158878
This image makes me feel physically ill. Thanks a lot.

Anonymous
06/29/26(Mon)00:05:47 No.109158895

Anonymous 06/29/26(Mon)00:05:47 No.109158895

>>109158888
checked

Anonymous
06/29/26(Mon)00:06:42 No.109158896

Anonymous 06/29/26(Mon)00:06:42 No.109158896

>>109158822
You could just offload a very tiny bit but that would suck, you still need to store kv somewhere. Starting to think moes aren't worth it. I literally just need slopcoding ability and headroom for concurrent image/audio fun. Sucks if there's no sub $20k build that just does what I need. There's just this huge fucking desert between owning a few 3090s and competing with datacenters for rationed RAM.

Anonymous
06/29/26(Mon)00:08:46 No.109158907

Anonymous 06/29/26(Mon)00:08:46 No.109158907

>>109158842
>>109158871
Correct if the clanker is stable enough to trust it on full autopilot. I'd rather have 20t/s that actually works vs 60t/s that spins out of control and must be actively tardwrangled.

Anonymous
06/29/26(Mon)00:11:33 No.109158920

Anonymous 06/29/26(Mon)00:11:33 No.109158920

>>109158896
You can totally do the pmem optane thing. there's a tomshardware article on it.

I'm holding off. personally, I am hoping for taalas to save the day with a pci card.

Anonymous
06/29/26(Mon)00:14:21 No.109158932

Anonymous 06/29/26(Mon)00:14:21 No.109158932

>>109158859
Opus speed wavers and can drop even below 40, actually I think they hid the speed recently, it was visible for 4.6

Anonymous
06/29/26(Mon)00:18:08 No.109158945

Anonymous 06/29/26(Mon)00:18:08 No.109158945

>>109158875
shoot!!!!

I forgot to buy that lotto ticket

Anonymous
06/29/26(Mon)00:30:51 No.109158976

Anonymous 06/29/26(Mon)00:30:51 No.109158976

>>109158878
Unironically incredible pajeet damage control making the snailcat unlikable.

Anonymous
06/29/26(Mon)00:42:44 No.109159014

Anonymous 06/29/26(Mon)00:42:44 No.109159014

>>109158878
i look like this and a day in my life is like this

Anonymous
06/29/26(Mon)00:46:10 No.109159020

Anonymous 06/29/26(Mon)00:46:10 No.109159020

>>109158887
>$12000
Why the fuck did the price skyrocket all of a sudden and when will it go back down?

Anonymous
06/29/26(Mon)00:46:10 No.109159021

Anonymous 06/29/26(Mon)00:46:10 No.109159021

File: Screenshot from 2026-06-2(...).png (35 KB, 640x360)

35 KB PNG

snailwaifu? I scoff, I lauff

Anonymous
06/29/26(Mon)00:49:40 No.109159031

Anonymous 06/29/26(Mon)00:49:40 No.109159031

>>109159020
>when will it go back down
when something like the french revolution happens again

Anonymous
06/29/26(Mon)00:51:22 No.109159037

Anonymous 06/29/26(Mon)00:51:22 No.109159037

prices spiked, so lots of people will be trying lots of things using vc money.

>will I be able to afford nvidia branded products
idk, maybe never

BUT someone else could come out with a compute-heavy pci card.

Anonymous
06/29/26(Mon)00:53:38 No.109159046

Anonymous 06/29/26(Mon)00:53:38 No.109159046

Mistral NeMo Q4 GGUF runs decently in my laptop (12e-cores, 32GB of RAM) with 8 threads. Q6 is kinda slow. Using llama-cli v4524
Any suggestions for new small models? What are the latest NeMo-like ones?

Anonymous
06/29/26(Mon)00:54:24 No.109159050

Anonymous 06/29/26(Mon)00:54:24 No.109159050

>>109159037
>Nvidia margins are primarily from software
>software is solved by ai
>Nvidia margins are over

Anonymous
06/29/26(Mon)00:58:49 No.109159065

Anonymous 06/29/26(Mon)00:58:49 No.109159065

>>109159046
Also, I asked for its name and it told me it's Vicuna-13b. Is this known? Jej

Anonymous
06/29/26(Mon)01:01:47 No.109159073

Anonymous 06/29/26(Mon)01:01:47 No.109159073

Wait, what happened ITT? Since when are people shilling for NVIDIA and baiting with banning Chinese models and shit?

Anonymous
06/29/26(Mon)01:07:33 No.109159100

Anonymous 06/29/26(Mon)01:07:33 No.109159100

>>109159073
>Since when are people shilling for NVIDIA
when has this general ever not shilled for nvidia

Anonymous
06/29/26(Mon)01:10:59 No.109159116

Anonymous 06/29/26(Mon)01:10:59 No.109159116

>>109159100
I mean, it's being shilled not in the sense of "you need a nvidia GPU to run stuff" but as in "I'd pay 100k for this 10k GPU if I had to". I don't remember the general being like this.

Anonymous
06/29/26(Mon)01:11:47 No.109159118

Anonymous 06/29/26(Mon)01:11:47 No.109159118

>>109159073
>>109159100
It's less being excited for jensen and being disgusted by everything else more.

Anonymous
06/29/26(Mon)01:23:07 No.109159154

Anonymous 06/29/26(Mon)01:23:07 No.109159154

>>109159116
that’s just the price of the product.

Anonymous
06/29/26(Mon)01:25:06 No.109159161

Anonymous 06/29/26(Mon)01:25:06 No.109159161

I'm interested in running something for translations locally, mainly JP -> Eng, ideally with OCR for images
What are my options? I have a 4090

Anonymous
06/29/26(Mon)01:27:50 No.109159176

Anonymous 06/29/26(Mon)01:27:50 No.109159176

>>109159161
you can use the LLM from the company that runs google translate

Anonymous
06/29/26(Mon)01:31:18 No.109159185

Anonymous 06/29/26(Mon)01:31:18 No.109159185

>>109159161
qwen 3.6 27b or gemma 4 31b would suit your needs. q4 quant.

Anonymous
06/29/26(Mon)01:34:50 No.109159200

Anonymous 06/29/26(Mon)01:34:50 No.109159200

>>109159161
Gemma 31B

Anonymous
06/29/26(Mon)01:37:45 No.109159211

Anonymous 06/29/26(Mon)01:37:45 No.109159211

>>109159161
there's a bunch of good OCR models on HF. you're probably best off using those to prepare the data for another LLM like Gemma.

Anonymous
06/29/26(Mon)01:50:05 No.109159237

Anonymous 06/29/26(Mon)01:50:05 No.109159237

what's the best model for 2 3090s?

Anonymous
06/29/26(Mon)01:52:51 No.109159247

Anonymous 06/29/26(Mon)01:52:51 No.109159247

>>109159237
Still Gemma unless you've got a mountain of RAM for a GLM, M3, Deepseek, or Kimi quant.

llama.cpp CUDA dev !!yhbFjk57TDr
06/29/26(Mon)02:06:32 No.109159284

llama.cpp CUDA dev !!yhbFjk57TDr 06/29/26(Mon)02:06:32 No.109159284

>>109158158
>>109158226
I have put better NUMA support on the back burner because buying 1.5 TB RAM would be currently financially irresponsible for me.
For a single-slot EPYC CPU you can expose the NUMA nodes in the BIOS, I don't know how much better or worse the performance would end up being if llama.cpp was properly optimized for that.

Anonymous
06/29/26(Mon)02:13:29 No.109159302

Anonymous 06/29/26(Mon)02:13:29 No.109159302

Best way to integrate vision LLM into ComfyUI workflow?

I was thinking something like
1. user write a prompt
2. LLM expands the prompt
3. Image is generated
4. original prompts and generated image are given to LLM again
5. New, improved prompt is creaed
6. Second image is generated.

Maybe repeat steps 4-6 until LLM is satisfied with the result?

Anonymous
06/29/26(Mon)02:14:48 No.109159304

Anonymous 06/29/26(Mon)02:14:48 No.109159304

>>109159302
Preferably uncensored.

Anonymous
06/29/26(Mon)02:28:35 No.109159329

Anonymous 06/29/26(Mon)02:28:35 No.109159329

File: gpu-prices.png (230 KB, 1313x766)

230 KB PNG

gemma needing to have its max token count be 300+ is pretty annoying. makes short responses impossible, unless i'm missing something here

Anonymous
06/29/26(Mon)02:30:24 No.109159333

Anonymous 06/29/26(Mon)02:30:24 No.109159333

>>109159329
>0.29 t/s
Jesus Christ anon what are you running Gemma on, a gameboy color?

Anonymous
06/29/26(Mon)02:38:08 No.109159359

Anonymous 06/29/26(Mon)02:38:08 No.109159359

>>109159333
nice trips
it's a wip frontend. it's actually .83 t/s according to the console
>what are you running Gemma on, a gameboy color?
31B-it-Q5_K_M on a 3070. 26B runs way better, but i'm worried i'm missing out on quality

Anonymous
06/29/26(Mon)02:45:59 No.109159377

Anonymous 06/29/26(Mon)02:45:59 No.109159377

>>109159284
>I have put better NUMA support on the back burner because buying 1.5 TB RAM would be currently financially irresponsible for me.
I... assumed you'd be provided hardware / funding for things like this after the hf acquisition?
Do you do all this on your own dime??

Anonymous
06/29/26(Mon)02:48:21 No.109159383

Anonymous 06/29/26(Mon)02:48:21 No.109159383

>>109159359
Cool. I should rewrite my frontend but too lazy for that.
26B isn't that bad for regular chats but it is bit more slopped.

Anonymous
06/29/26(Mon)02:54:47 No.109159399

Anonymous 06/29/26(Mon)02:54:47 No.109159399

>MTP loads fine on 12b
>26b A4B never loads the MTP
why?

llama.cpp CUDA dev !!yhbFjk57TDr
06/29/26(Mon)03:14:53 No.109159443

llama.cpp CUDA dev !!yhbFjk57TDr 06/29/26(Mon)03:14:53 No.109159443

>>109159377
Both HF and NVIDIA are providing me with compute credits though those are currently not of any use to me.
I am not receiving monetary or hardware sponsorship from HF but I recently asked them to help finance my DDR5 server in particular.
NVIDIA is currently offering to provide me with more consumer Blackwell hardware but I have for now declined since that particular hardware would not help with my work.
Financially speaking I have made a net loss from contributing to the upstream llama.cpp repository though I have made a net profit overall when considering paid work on private forks.
But honestly I really don't care about money for myself beyond the amount that I need to live.

Anonymous
06/29/26(Mon)03:20:45 No.109159460

Anonymous 06/29/26(Mon)03:20:45 No.109159460

>>109159443
>But honestly I really don't care about money for myself beyond the amount that I need to live.
Good choice given you're vaxxed. Don't have to save for retirement

Anonymous
06/29/26(Mon)03:23:32 No.109159466

Anonymous 06/29/26(Mon)03:23:32 No.109159466

vaxxed
waxed
truvada maxxed

Anonymous
06/29/26(Mon)03:27:58 No.109159476

Anonymous 06/29/26(Mon)03:27:58 No.109159476

>>109159329
remove any prompt mention of being verbose etc. she is by default it just makes her longer

Anonymous
06/29/26(Mon)03:29:11 No.109159486

Anonymous 06/29/26(Mon)03:29:11 No.109159486

La la la la la la

Anonymous
06/29/26(Mon)03:40:48 No.109159508

Anonymous 06/29/26(Mon)03:40:48 No.109159508

I want youI need youI want youI need you you you you you you 0 0 0 0 0 0

Anonymous
06/29/26(Mon)03:51:44 No.109159551

Anonymous 06/29/26(Mon)03:51:44 No.109159551

>>109159443
Why is your DDR5 server so important to you that you specifically requested they help fund it?

Anonymous
06/29/26(Mon)03:59:44 No.109159582

Anonymous 06/29/26(Mon)03:59:44 No.109159582

>>109159329
If you want gemma to be concise, just put that in the system prompt rather than using token limits.

Anonymous
06/29/26(Mon)04:03:57 No.109159602

Anonymous 06/29/26(Mon)04:03:57 No.109159602

>>109159247
what if I have 128gb system ram besides 2 3090s?

Anonymous
06/29/26(Mon)04:06:01 No.109159607

Anonymous 06/29/26(Mon)04:06:01 No.109159607

File: 1760191592109338.png (369 KB, 710x770)

369 KB PNG

BAN ALL OPEN SOURCE AI

Anonymous
06/29/26(Mon)04:07:36 No.109159610

Anonymous 06/29/26(Mon)04:07:36 No.109159610

>>109159607
>His warns
thanks for the warns sir

Anonymous
06/29/26(Mon)04:07:52 No.109159611

Anonymous 06/29/26(Mon)04:07:52 No.109159611

>>109159607
gotta keep it family friendly!

Anonymous
06/29/26(Mon)04:10:15 No.109159619

Anonymous 06/29/26(Mon)04:10:15 No.109159619

>>109158769
At your budget I would u ironically go for 8x Spark and. 2k$ switch. That's 30k$ total at current prices, 1 TB VRAM at @2 TB/s, and actually working recipes to run large models like Kimi 2.7 and GLM 5.2 at usable speeds (20-30 t/s).

Anonymous
06/29/26(Mon)04:12:29 No.109159623

Anonymous 06/29/26(Mon)04:12:29 No.109159623

>>109159607
This guy really hasn't had enough. I guess that stunt really was all for show.

Anonymous
06/29/26(Mon)04:13:49 No.109159628

Anonymous 06/29/26(Mon)04:13:49 No.109159628

>>109159607
yeah let's ban Meta models

Anonymous
06/29/26(Mon)04:19:40 No.109159650

Anonymous 06/29/26(Mon)04:19:40 No.109159650

>>109159607
This guy is the satan himself. What is wrong with every tech CEO, they are all sociopaths and lunatics.

Anonymous
06/29/26(Mon)04:26:09 No.109159674

Anonymous 06/29/26(Mon)04:26:09 No.109159674

https://old.reddit.com/r/LocalLLaMA/comments/1uicq8x/locally_running_mode_turns_an_image_into_a_cute/

Anonymous
06/29/26(Mon)04:26:29 No.109159677

Anonymous 06/29/26(Mon)04:26:29 No.109159677

>>109159650
>local is getting to the "good enough" point that it's poaching my customers so it needs to be b&
>chinese are distilling my model and releasing it to the public so they needed to be b&
>only I can be trusted with this dangerous technology (so I can charge 50x the price for you goyim and you won't be able to say no)

llama.cpp CUDA dev !!yhbFjk57TDr
06/29/26(Mon)04:26:56 No.109159679

llama.cpp CUDA dev !!yhbFjk57TDr 06/29/26(Mon)04:26:56 No.109159679

>>109159551
Because I can't run Deepseek and Kimi at 8 BPW with only 512 GB of RAM.
I'm currently being sidetracked but over the next few months I intend to establish better methodology in llama.cpp for measuring model quality, particularly across models and as a function of quantization.
I don't strictly need more and/or faster RAM but it would help a lot with extending the range of models that I can test against each other.

Anonymous
06/29/26(Mon)04:32:28 No.109159695

Anonymous 06/29/26(Mon)04:32:28 No.109159695

>>109159674
Every day, we are getting closer to AI anime waifus.

Anonymous
06/29/26(Mon)04:33:02 No.109159700

Anonymous 06/29/26(Mon)04:33:02 No.109159700

File: 1751389892454476.png (1.06 MB, 1080x1051)

1.06 MB PNG

>>109159607
courtesy of *shudders* reddit

Anonymous
06/29/26(Mon)04:36:40 No.109159712

Anonymous 06/29/26(Mon)04:36:40 No.109159712

>>109159674
Literally more promising then Google's "world model"

Anonymous
06/29/26(Mon)04:41:03 No.109159732

Anonymous 06/29/26(Mon)04:41:03 No.109159732

>>109158158
>>109158226
You can unironically use >Claude to vibecode NUMA support. I slopped together an implementation for 2 EPYC 7532s where it splits the weights across 2 nodes (NPS1 in BIOS), it gets around 1.5x the prefill and 1.3x the decode compared against pinning everything to one node with numactl --cpunodebind=0 --membind=0.
Not going to upstream it obviously. I don't have it in a public repo yet, but I'll throw it up there soon in case anyone is stupid enough to use a vibe-coded fork.

Anonymous
06/29/26(Mon)04:41:25 No.109159733

Anonymous 06/29/26(Mon)04:41:25 No.109159733

>>109159607
I don't know what's worse, the baldface way he's lying through his teeth to get daddy gubmint to give him a pseudo-moat or the fact that its more than likely to work because the people with the levers of power are not equipped to evaluate even moderately in-depth issues with intelligence and nuance.
Thank god I don't live in burgerland so this retardation has a chance of not trickling down to me (or at least buys me time).

Anonymous
06/29/26(Mon)04:44:56 No.109159747

Anonymous 06/29/26(Mon)04:44:56 No.109159747

>>109159732
>it gets around 1.5x the prefill and 1.3x the decode compared against pinning everything to one node with numactl
Not to be overly dismissive (since it could be legit) but it sounds like you're just saturating the socket crossbar. I don't think your vibecoded fork does what you think it does vs just laying out your threads in a way that isn't bottlenecked on the crossbar constantly.
You need to track threads to tensors, which is nontrivial and probably wouldn't have been picked up by a vibecoding session if the prompter didn't already understand the code, problem domain and likely solutions.

Anonymous
06/29/26(Mon)04:46:20 No.109159750

Anonymous 06/29/26(Mon)04:46:20 No.109159750

Why are redditors doing all the cool shit. /lmg/ is slacking.

Anonymous
06/29/26(Mon)04:55:41 No.109159785

Anonymous 06/29/26(Mon)04:55:41 No.109159785

2x 4090 + server board EPYCs with DDR5 (cpumaxxers one from a long ago):

Testing GLM-5.2 (GLM-5.2-mixed-IQ2_S-IQ4_NL) at 64k context

Ram use: 207 at start, climbing to ~250 after 3 chats. Slightly unoptimized settings (20 gb vram + 10.5 gb vram), had some room to squeeze a gpu layer in.

2.5 tok/s · 88 tokens · 105.1s total · 69.45s to first token · 1.5 tok/s prompt (on)
2.7 tok/s · 341 tokens · 155.8s total · 30.73s to first token · 4.4 tok/s prompt (on)
3.0 tok/s · 238 tokens · 152.4s total · 72.35s to first token · 5.9 tok/s prompt (thinking off here)
2.8 tok/s · 405 tokens · 226.3s total · 80.66s to first token · 7.7 tok/s prompt (on)

She's censored for a straight NSFW request and we stopped here with the initial test, I suspect I might go up to 4 t/s with optimization but no greater?.
There is no gooning at these delays.

Testing Step-3.5-Flash-Ablitirated.i1-Q4_K_M.gguf @ 16k context

Ram use 105, GPU use 22.4 + 21.6

11.3 tok/s · 257 tokens · 38.4s · 15.61s to first token
16.6 tok/s · 359 tokens · 34.9s · 13.28s to first token
15.2 tok/s · 412 tokens · 39.1s · 12.08s to first token
12.2 tok/s · 823 tokens · 82.1s · 14.70s to first token

She's uncensored, I'm sure she won't say no, but we stopped here for now.

Finetuned Gemma 31b variations is where it's at for me, glad we have that.

Anonymous
06/29/26(Mon)04:58:55 No.109159795

Anonymous 06/29/26(Mon)04:58:55 No.109159795

File: 1761224423324319.png (1.26 MB, 1000x1450)

1.26 MB PNG

Anonymous
06/29/26(Mon)05:09:00 No.109159833

Anonymous 06/29/26(Mon)05:09:00 No.109159833

>>109159747
>which is nontrivial and probably wouldn't have been picked up by a vibecoding session if the prompter didn't already understand the code, problem domain and likely solutions.
This is true for almost all AI code generation

Anonymous
06/29/26(Mon)05:14:32 No.109159858

Anonymous 06/29/26(Mon)05:14:32 No.109159858

>>109159833
Claude, make llama.cpp run kimi 2.6 on my 16GB RAM laptop, make no mistakes

Anonymous
06/29/26(Mon)05:27:37 No.109159920

Anonymous 06/29/26(Mon)05:27:37 No.109159920

>>109159747
I should've clarified that I was testing with 4 R9700s at the time and 40% of the model being on CPU, so the CPU speedup wasn't going to be massive.
Ran tests with 1 R9700 and using ncmoe for all MOE layers. For whatever reason the prefill is pretty much unchanged between the options this time (unlike Qwen 397B), and I'm capped at the 1.3x speedup on decode. I'll look into that - maybe you're right, but >Claude said that it's doing what you suggested. I can provide you its explanation of what it did if you want.
Tests:
... build/bin/llama-bench -m ~/models/GLM-4.7-Q3_K_L.gguf -sm layer --device ROCm0 -fa 1 --numa split -t 48 -ncmoe 92
(the fork): 40.8 t/s PP and 6.9 t/s TG
numactl --cpunodebind=0 --membind=0 build/bin/llama-bench -m ~/models/GLM-4.7-Q3_K_L.gguf -sm layer --device ROCm0 -fa 1 --numa numactl -t 32 -ncmoe 92
: 39.7 t/s PP and 5.3 t/s TG
numactl --cpunodebind=0 --membind=0 build/bin/llama-bench -m ~/models/GLM-4.7-Q3_K_L.gguf -sm layer --device ROCm0 -fa 1 --numa numactl -t 24 -ncmoe 92
: 39.2 t/s PP and 5.3 t/s TG
I give a VM 8 threads on one CPU, so I can only test with 48 threads if I want each node to have a perfect split.

Anonymous
06/29/26(Mon)05:42:07 No.109159978

Anonymous 06/29/26(Mon)05:42:07 No.109159978

deepsneed merged

Anonymous
06/29/26(Mon)05:43:25 No.109159987

Anonymous 06/29/26(Mon)05:43:25 No.109159987

>>109159785
Step is straight up retarded. Try deepseek v4 flash. It's pretty good.

Anonymous
06/29/26(Mon)05:44:38 No.109159991

Anonymous 06/29/26(Mon)05:44:38 No.109159991

>>109159978
Finally

Anonymous
06/29/26(Mon)05:45:26 No.109159996

Anonymous 06/29/26(Mon)05:45:26 No.109159996

64G system, 12G vram
still nothing serious other than gem4 26ba4b or qwen 35ba3b?

Anonymous
06/29/26(Mon)05:46:35 No.109160002

Anonymous 06/29/26(Mon)05:46:35 No.109160002

>>109159996
you're lucky to be able to run either of those honestly

Anonymous
06/29/26(Mon)05:48:41 No.109160010

Anonymous 06/29/26(Mon)05:48:41 No.109160010

>>109160002
fair point but still

Anonymous
06/29/26(Mon)05:50:59 No.109160021

Anonymous 06/29/26(Mon)05:50:59 No.109160021

thoughts on command-a-plus-05-2026 ?

Anonymous
06/29/26(Mon)05:53:10 No.109160030

Anonymous 06/29/26(Mon)05:53:10 No.109160030

>>109160021
poop

Anonymous
06/29/26(Mon)06:02:47 No.109160064

Anonymous 06/29/26(Mon)06:02:47 No.109160064

>>109159607
He is right. However, what should be the alternative? That only a small elite get access to AI while the rest of the world becomes disempowered already? Anthropic is not giving the public access to its best models and restricting access heavily, even pre Fable ban. If they want to ban open source, they should at least be more generous with their alternative.

Anthropic's concern with open source is probably not current harms but that it accelerates AGI, especially for China, and that this is bad because misaligned AGI will kill us all. However, Anthropic is doing more to accelerate AGI than the entire open source community. They were the only lab racing straight towards AGI, and are now making OpenAI race for its continued existence.

Anonymous
06/29/26(Mon)06:10:26 No.109160089

Anonymous 06/29/26(Mon)06:10:26 No.109160089

https://github.com/ggml-org/llama.cpp/pull/24162
https://github.com/ggml-org/llama.cpp/pull/24162
https://github.com/ggml-org/llama.cpp/pull/24162
DEEPSEEK V4 SUPPORT MERGED

Anonymous
06/29/26(Mon)06:15:19 No.109160116

Anonymous 06/29/26(Mon)06:15:19 No.109160116

>>109160064
He is wrong, kys
Open source democratises the technology, it can be used to strengthen the cybersecurity of any users machines and software, it can be used as a great educational tool and can massively increase productivity all without being forced to surrender all privacy and be tethered financially to a private company.

This is the real sin of open source, it allows people to have sovereignty, of their data and financial dependencies, it also encourages people to move away from closed source to open source software. It is literally 100% this fucking sociopath trying to secure a monopoly/cartel and build a moat to force people to surrender data and money to keep up in an AI assisted world.

You need to shut the fuck up and go suck one thousand cocks instead of spewing your apologist shill bullshit

Anonymous
06/29/26(Mon)06:16:25 No.109160122

Anonymous 06/29/26(Mon)06:16:25 No.109160122

>>109160089
It only took two months

Anonymous
06/29/26(Mon)06:21:39 No.109160146

Anonymous 06/29/26(Mon)06:21:39 No.109160146

>>109160089
china no. 1. the us lost.

Anonymous
06/29/26(Mon)06:22:39 No.109160153

Anonymous 06/29/26(Mon)06:22:39 No.109160153

>>109160089
maybe now i can run those q1 gigacope quants on my machine lol
i wish there was a ~40B version

Anonymous
06/29/26(Mon)06:25:14 No.109160165

Anonymous 06/29/26(Mon)06:25:14 No.109160165

File: 1754598910857730.png (104 KB, 1084x975)

104 KB PNG

V4.1 soon

Anonymous
06/29/26(Mon)06:26:22 No.109160176

Anonymous 06/29/26(Mon)06:26:22 No.109160176

>>109160064
AGI is 50% a meme to hype up language models and 50% cope by billionaires confronted with their own mortality.
And even if we do get AGI it needs to be open-source so that it can do actually useful things like decensoring eroge and modding in niche fetishes.

Anonymous
06/29/26(Mon)06:28:01 No.109160185

Anonymous 06/29/26(Mon)06:28:01 No.109160185

>>109160176
>actually useful things like decensoring eroge and modding in niche fetishes
Can I have it roleplay as a girl that likes me?

Anonymous
06/29/26(Mon)06:28:57 No.109160190

Anonymous 06/29/26(Mon)06:28:57 No.109160190

>>109160185
That's not safe, so no.

Anonymous
06/29/26(Mon)06:34:19 No.109160215

Anonymous 06/29/26(Mon)06:34:19 No.109160215

>>109160064
>Anthropic is doing more to accelerate AGI than the entire open source community
ba making...more transformer-based models...WHOA

Anonymous
06/29/26(Mon)06:35:57 No.109160223

Anonymous 06/29/26(Mon)06:35:57 No.109160223

File: itworks.png (74 KB, 1240x1077)

74 KB PNG

>>109160089

Anonymous
06/29/26(Mon)06:37:55 No.109160233

Anonymous 06/29/26(Mon)06:37:55 No.109160233

>>109160089
Why are people still hyped by Deepseek, the GLM fags destroyed those frauds hard

Anonymous
06/29/26(Mon)06:39:35 No.109160242

Anonymous 06/29/26(Mon)06:39:35 No.109160242

>>109160089
Now we wait for V4.1 because flash is kinda meh

Anonymous
06/29/26(Mon)06:42:11 No.109160260

Anonymous 06/29/26(Mon)06:42:11 No.109160260

>>109160242
Flash is the Gemma of the mid-sized class though
>>109160223
Haha woahh dude, wooahh

Anonymous
06/29/26(Mon)06:43:14 No.109160266

Anonymous 06/29/26(Mon)06:43:14 No.109160266

>>109160260
Flash needs more claude code traces in its training data.

Anonymous
06/29/26(Mon)06:43:36 No.109160267

Anonymous 06/29/26(Mon)06:43:36 No.109160267

>>109160190
Awww...okay.

Anonymous
06/29/26(Mon)06:44:44 No.109160275

Anonymous 06/29/26(Mon)06:44:44 No.109160275

Does Hermes Agent actually have a future or is it just a stratagem to give the Nous Research organization more undeserved visibility? The software is very janky, their GitHub repository has 300 pages of open issues, the desktop app ("hermes desktop") is barely functional.

Anonymous
06/29/26(Mon)06:47:04 No.109160284

Anonymous 06/29/26(Mon)06:47:04 No.109160284

File: Screenshot at 2026-06-29 (...).png (106 KB, 779x607)

106 KB PNG

>>109160223
Does it have vision? What does it think about Gemmys answer?

Anonymous
06/29/26(Mon)06:50:22 No.109160303

Anonymous 06/29/26(Mon)06:50:22 No.109160303

>>109159607
It has proven on Gemma4 31B, it doesn't need JB or lobotomy. Imagine 124B potential and what it can do.

Anonymous
06/29/26(Mon)06:50:58 No.109160308

Anonymous 06/29/26(Mon)06:50:58 No.109160308

>>109160275
I don't think it has future. it's memory mechanism isn't easily auditable. you can't even click and expand what the agents are doing in tui. the core feature of it is persistent memory but if it's unreliable it has no value, or is even harmful. the memory can have bias and drift silently and affect future sessions. so by design it's deeply flawed. you don't see people show long term usage of it, only one-off use case is shown, and openclaw might be a better choice at this point. in their use case in official docs, there's no demonstration of iterative memory/skill improvement.
a much better alternative I think is projectmem, because you at least have a timeline to track what's going on.

Anonymous
06/29/26(Mon)06:51:35 No.109160312

Anonymous 06/29/26(Mon)06:51:35 No.109160312

>>109160223
Can I run it on my 16GB VRAM potato?

Anonymous
06/29/26(Mon)06:54:13 No.109160324

Anonymous 06/29/26(Mon)06:54:13 No.109160324

>>109159674
>the boy is growing up
Is it a feature?

Anonymous
06/29/26(Mon)06:56:25 No.109160330

Anonymous 06/29/26(Mon)06:56:25 No.109160330

>>109160312
Yes as long as it comes with at least 128 GB of ram

Anonymous
06/29/26(Mon)07:07:13 No.109160389

Anonymous 06/29/26(Mon)07:07:13 No.109160389

>>109160165
Come on whale, multi modal and a bit fewer hallucinations.

Already getting 65 tg with DSpark single concurrency on 2x DGX Spark, it just needs a bit of quality bump.

Anonymous
06/29/26(Mon)07:10:12 No.109160406

Anonymous 06/29/26(Mon)07:10:12 No.109160406

>>109160389
>a bit fewer hallucinations.
lol

Anonymous
06/29/26(Mon)07:12:11 No.109160416

Anonymous 06/29/26(Mon)07:12:11 No.109160416

>>109160089
do i have to make my own quants?

Anonymous
06/29/26(Mon)07:13:12 No.109160424

Anonymous 06/29/26(Mon)07:13:12 No.109160424

>>109160406
is it that bad? maybe I should have tried the web chat instead of waiting for support to be merged.

Anonymous
06/29/26(Mon)07:15:16 No.109160435

Anonymous 06/29/26(Mon)07:15:16 No.109160435

>>109160416
These work https://huggingface.co/antirez/deepseek-v4-gguf/tree/main

Anonymous
06/29/26(Mon)07:23:38 No.109160479

Anonymous 06/29/26(Mon)07:23:38 No.109160479

>>109159607

They're really starting to feel the heat from the Chinks getting closer.
I wish Chang keeps on cranking out open models, because few years of this and they'll have obsoleted a lot of the cloud bullshit.

>>109159650

Aside from these people being fucking insane, simply think of them as a loudspeaker for government policies that haven't been yet put into law.
These guys need to deepthroat the state cock or they'll get fucked by the system.
They'll parrot whatever the state tells them to say and it's clear the government doesn't want the average person to have any access to AI. It's way too powerful of a tool to have locally.

Then there's the fact that cloud AI will be obsoleted with powerful enough local models.
These people have zero business model if we have Gemma 5 124B available that can be run on couple of consumer GPUs.
If Chinks put out something in that ballpark it's game over. Imagine where local is going to be 5-10 years from now.
They practically need to put a stop to it now or they're fucked.

Anonymous
06/29/26(Mon)07:24:32 No.109160489

Anonymous 06/29/26(Mon)07:24:32 No.109160489

>>109160424
it is quite bad

Anonymous
06/29/26(Mon)07:26:51 No.109160506

Anonymous 06/29/26(Mon)07:26:51 No.109160506

File: 1762769620980590.gif (49 KB, 296x212)

49 KB GIF

>>109158878
I prefer slugcats

Anonymous
06/29/26(Mon)07:29:15 No.109160519

Anonymous 06/29/26(Mon)07:29:15 No.109160519

>>109159161
dots.ocr for ocr then gemma for translation

Anonymous
06/29/26(Mon)07:33:48 No.109160535

Anonymous 06/29/26(Mon)07:33:48 No.109160535

>>109160435
is there a uncensored version? the huihui one gives me errors

Anonymous
06/29/26(Mon)07:35:17 No.109160541

Anonymous 06/29/26(Mon)07:35:17 No.109160541

STOP USING CLOSE-SOURCED LLMs. !!!!

Anonymous
06/29/26(Mon)07:35:41 No.109160544

Anonymous 06/29/26(Mon)07:35:41 No.109160544

>get memed into downloading Gemma
>Assistantslopped to the max
>Refusals refusals refusals
>prose so flowery I have no fucking idea what it's even trying to say
>stops generation midway to "think" and ask me to review
very funny /lmg/, you trolled me. Now what's the actual good models in the 20-30b range?

Anonymous
06/29/26(Mon)07:37:38 No.109160553

Anonymous 06/29/26(Mon)07:37:38 No.109160553

>>109160544
>not getting abliterated version
>not strictly but calmly reminding it who is the owner here

Anonymous
06/29/26(Mon)07:38:09 No.109160557

Anonymous 06/29/26(Mon)07:38:09 No.109160557

>>109159674
Reminds me of this:
https://oasis.decart.ai/starting-point
Doesn't seem to be working anymore though.

Anonymous
06/29/26(Mon)07:38:54 No.109160560

Anonymous 06/29/26(Mon)07:38:54 No.109160560

>>109160544
>prose so flowery I have no fucking idea what it's even trying to say
Put into the system prompt that it has to use windowpane prose.

Anonymous
06/29/26(Mon)07:39:15 No.109160561

Anonymous 06/29/26(Mon)07:39:15 No.109160561

>>109158654
>glm5.2 at q5 on a 12 channel ddr5 board with at least 2 3090s can get you about 25t/s
Can it? My rig is 12xDDR5-6400 and a Pro 6000 but I'm sitting at around 20t/s with Q4_K_M

Anonymous
06/29/26(Mon)07:42:15 No.109160571

Anonymous 06/29/26(Mon)07:42:15 No.109160571

I love it when people confidently post in these threads about how completely fucking retarded they are

Anonymous
06/29/26(Mon)07:43:43 No.109160576

Anonymous 06/29/26(Mon)07:43:43 No.109160576

>>109160308
There are a ton of memory implementation for hermes, which one are you even talking about? They are not even part of hermes itself, they are run externally and then configured in hermes. It's the same for almost anything in hermes, it's just a frontend for a bunch of things.
I do agree that the project itself isn't great though, lot of PR adding stuff I want or fixing small but important bugs I encounter daily that have been open for weeks, sometimes months without a reply by a maintainer, lost count of how many I have actually merged locally. The code quality is atrocious, entirely vibe coded, the git history is full of shit getting merged everyday that no one cares about. The sad thing is that this is the case all over, all AI related projects are shit, hermes is just the more "correct", the more usable frontend one can install.

Anonymous
06/29/26(Mon)07:44:30 No.109160580

Anonymous 06/29/26(Mon)07:44:30 No.109160580

>>109160571
me two

Anonymous
06/29/26(Mon)07:44:59 No.109160582

Anonymous 06/29/26(Mon)07:44:59 No.109160582

>>109160571
He almost got me to reply but there's just no helping some people so I didn't bother.

Anonymous
06/29/26(Mon)07:45:11 No.109160584

Anonymous 06/29/26(Mon)07:45:11 No.109160584

>>109160571
Sorry, forgot to quote
Meant for >>109160541

Anonymous
06/29/26(Mon)07:46:31 No.109160587

Anonymous 06/29/26(Mon)07:46:31 No.109160587

>>109160535
It doesn't think by default on the chat completion endpoint and so far if it doesn't refuse in the first message it just keeps going fine.

Anonymous
06/29/26(Mon)07:50:49 No.109160603

Anonymous 06/29/26(Mon)07:50:49 No.109160603

File: whatCouldPossiblyGoWrong.png (94 KB, 517x917)

94 KB PNG

Yesterday some anon was asking what piper would need paid to collapse the AI bubble. My response was these schemes require continual refinancing to stay alive; once they have to service debt from cash flows valuations crash back to reality.
And right on queue, here's another example of where the money's coming from. Speculative borrowing.
I really thought the wheels would come off by Q2 2026. It's June 29 and the collapse still in tmw. Oh well.
>>109160479
>simply think of them as a corporate loudspeaker for what government policies they want put into law
FTFY. There's no need to go to the lengths of conspiracy. It's just Dario trying to get the US Gov't to create a moat for him, through regulatory capture.
To make US regulatory capture work, though, he needs to get the Chinese banned from the US market, and open weight models shut down or neutered to point of being useless.
>>109159607
I've said it before, and I will say it again.
Fuck this mfer and his constant ranting.

Anonymous
06/29/26(Mon)07:53:19 No.109160617

Anonymous 06/29/26(Mon)07:53:19 No.109160617

File: 1770998358203098.png (40 KB, 1181x140)

40 KB PNG

>>109160435
aren't his pro quants a little big? the unquanted raw model is about 850gb so the q4 being that size too is odd, even if you consider that a good chunk of the model is natively q4 by default

Anonymous
06/29/26(Mon)07:53:56 No.109160622

Anonymous 06/29/26(Mon)07:53:56 No.109160622

File: 1759261167071086.png (519 KB, 562x615)

519 KB PNG

I unequivocally trust this man.

Anonymous
06/29/26(Mon)07:58:50 No.109160640

Anonymous 06/29/26(Mon)07:58:50 No.109160640

>>109160617
Original parameters are not fp16
>FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.

Anonymous
06/29/26(Mon)08:01:04 No.109160647

Anonymous 06/29/26(Mon)08:01:04 No.109160647

>>109160617
Ideally, AesSedai or ubergarm does their magic because the quants here is a dumb straightfoward conversion and you need tensor quant recipes when it comes to MoE models to get the most out of it to reduce redundancies and etc.

Anonymous
06/29/26(Mon)08:01:07 No.109160648

Anonymous 06/29/26(Mon)08:01:07 No.109160648

>>109159607
Well yeah, makes sense he is sperging out.
I think chink AI was like 10% or 15% traffic on openrouter a year ago.
Now the main majority right?
Even the paypiggies don't trust western companies, especially after this stunt.
On X there are these hype accounts now as well for gpt 5.6 but nobody cares because that thing isn't even released publicly, such a bad look.
Maybe it was a step too far even for the normies. Good news for local.

Anonymous
06/29/26(Mon)08:01:17 No.109160649

Anonymous 06/29/26(Mon)08:01:17 No.109160649

>>109160622
How can a guy look any more like a cartoonish sneaky backstabber than this guy? You look at his face and instantly feel instinctive distrust.

Anonymous
06/29/26(Mon)08:03:20 No.109160664

Anonymous 06/29/26(Mon)08:03:20 No.109160664

>>109160647
They haven't even bothered to do K2.7-Code or GLM5.2 yet. It's over.

Anonymous
06/29/26(Mon)08:03:35 No.109160666

Anonymous 06/29/26(Mon)08:03:35 No.109160666

>>109160648
>Good news for local.
lol

Anonymous
06/29/26(Mon)08:04:21 No.109160672

Anonymous 06/29/26(Mon)08:04:21 No.109160672

>>109159607
That's it, I'm learning mandarin.

Anonymous
06/29/26(Mon)08:04:33 No.109160674

Anonymous 06/29/26(Mon)08:04:33 No.109160674

>>109160666
How is it not?

Anonymous
06/29/26(Mon)08:04:34 No.109160675

Anonymous 06/29/26(Mon)08:04:34 No.109160675

>>109160576
the built-in one memory.md and user.md, and these are always active alongside other external memory providers

Anonymous
06/29/26(Mon)08:06:30 No.109160684

Anonymous 06/29/26(Mon)08:06:30 No.109160684

>>109160622
He looks like the more nerdy and even more retarded cousin of Friedrich Merz.

Anonymous
06/29/26(Mon)08:06:38 No.109160685

Anonymous 06/29/26(Mon)08:06:38 No.109160685

File: rubbinghands.png (6 KB, 67x67)

6 KB PNG

>>109160622
Not pictured: picrel.

Anonymous
06/29/26(Mon)08:06:44 No.109160686

Anonymous 06/29/26(Mon)08:06:44 No.109160686

>>109159161
I'm using paddle ocr v6 medium with the paddlex server, and it's godawful, nearly an entire second for a 1080p image. Granted, I'm not running the 'high performance inference' plugin. But it also doesn't seem to handle vertical text very well; I've cutoff the confidence at 0.8, and it doesn't catch a lot of vertical text - https://files.catbox.moe/7zn9i1.png.
Using an abliterated gemma e2b qat q4. With a 5060 ti the speed is tolerable (1-3 seconds per frame depending on the window contents) if I don't send the original image to llama.cpp as well as the ocr results, but the translation is noticeably worse.
Works in 8gb of vram.
>>109160519
Is there an easy way to run dots.ocr on windows?

Anonymous
06/29/26(Mon)08:07:51 No.109160691

Anonymous 06/29/26(Mon)08:07:51 No.109160691

>>109160686
llama.cpp supports it

Anonymous
06/29/26(Mon)08:09:36 No.109160703

Anonymous 06/29/26(Mon)08:09:36 No.109160703

>>109160674
Dario boy is crying for local to get banned, and he cried before and managed to get his own banned

Anonymous
06/29/26(Mon)08:09:50 No.109160705

Anonymous 06/29/26(Mon)08:09:50 No.109160705

You didn't hear it from me but I suggest you guys stockpile some CPUs. If you thought memory was bad...

Anonymous
06/29/26(Mon)08:10:17 No.109160706

Anonymous 06/29/26(Mon)08:10:17 No.109160706

>>109160686
use dots.mocr

Anonymous
06/29/26(Mon)08:11:10 No.109160711

Anonymous 06/29/26(Mon)08:11:10 No.109160711

>home internet stops working
>don't want to go to work now because then I'd have no access to my shit
Ahhhhh

Anonymous
06/29/26(Mon)08:11:41 No.109160718

Anonymous 06/29/26(Mon)08:11:41 No.109160718

>>109160705
I wasn't planning to do it but I upgraded my main PC's CPU just in case last month.

Anonymous
06/29/26(Mon)08:12:10 No.109160723

Anonymous 06/29/26(Mon)08:12:10 No.109160723

>>109160691
Huh, I'll try it out then. Are there any other ocr models that llama.cpp supports?
>>109160706
Isn't that 3b parameters? That's kind of very big.

Anonymous
06/29/26(Mon)08:14:26 No.109160740

Anonymous 06/29/26(Mon)08:14:26 No.109160740

>>109160703
Yeah well not too many americans releasing llms anyway. Gemma was cool to be fair though, but the french made it right. kek
I wouldn't mind a ban of open models for burgers too much. It would be a great incentive for chinkland and europe to go all in.
Currently man of them are dependend on openai/anthropic. Even the chinks use it alot with vpns.
I'm ready to download local models with some shady tor darknet p2p shit.

Anonymous
06/29/26(Mon)08:14:31 No.109160741

Anonymous 06/29/26(Mon)08:14:31 No.109160741

>>109160089
@grok please summarize why this is good

Anonymous
06/29/26(Mon)08:15:18 No.109160749

Anonymous 06/29/26(Mon)08:15:18 No.109160749

>>109160705
Use case for super CPUs?

Anonymous
06/29/26(Mon)08:16:36 No.109160758

Anonymous 06/29/26(Mon)08:16:36 No.109160758

>>109160711
I feel you anon
>wake up
>check civitai
>new cool Lora
>but have to leave for 9 hours job
Its suffering. Knowing there is cool new stuff and you have to slave away in office.

Anonymous
06/29/26(Mon)08:21:04 No.109160785

Anonymous 06/29/26(Mon)08:21:04 No.109160785

>>109160064
>Anthropic's concern with open source is probably not current harms but that it accelerates AGI, especially for China, and that this is bad because misaligned AGI will kill us all.
Their concern is that they won't be able to charge 50$/mtok.

Anonymous
06/29/26(Mon)08:21:07 No.109160786

Anonymous 06/29/26(Mon)08:21:07 No.109160786

>>109160544
>getting refused by gemma of all things when I’m able to bend other models to my will with a prefill and make them output anything
the /lmg/ iq filter is real

Anonymous
06/29/26(Mon)08:21:55 No.109160792

Anonymous 06/29/26(Mon)08:21:55 No.109160792

>>109160706
The dots.mocr demo at dotsocr.xiaohongshu.com doesn't exactly inspire confidence: https://files.catbox.moe/6m05cb.png

Anonymous
06/29/26(Mon)08:33:48 No.109160855

Anonymous 06/29/26(Mon)08:33:48 No.109160855

File: The queen making fun of d(...).png (301 KB, 524x546)

301 KB PNG

>>109160685
well he's an actual jew so that fits lol

Anonymous
06/29/26(Mon)08:34:07 No.109160858

Anonymous 06/29/26(Mon)08:34:07 No.109160858

>>109160275
>or is it just a stratagem to give the Nous Research organization more undeserved visibility?
More likely a way to collect logs from API users for training. Can't get better at agentic coding without logs from harness users.

Anonymous
06/29/26(Mon)08:34:08 No.109160859

Anonymous 06/29/26(Mon)08:34:08 No.109160859

File: Capture.png (206 KB, 1197x1319)

206 KB PNG

Well, lads, I'm starting to vibecode my dream project. Wish me luck.

Anonymous
06/29/26(Mon)08:35:53 No.109160866

Anonymous 06/29/26(Mon)08:35:53 No.109160866

>>109160706
>dots.mocr
Isn't supported by llama.cpp.

Anonymous
06/29/26(Mon)08:37:27 No.109160877

Anonymous 06/29/26(Mon)08:37:27 No.109160877

>>109160740
>Gemma was cool to be fair though, but the french made it right.
Imagine a Mistral continued-pretrain of Gemma 31B a la Miqu.

Anonymous
06/29/26(Mon)08:39:01 No.109160887

Anonymous 06/29/26(Mon)08:39:01 No.109160887

>>109160711
If it's just the home internet, you at least have the hope that your ISP will fix it and you'll have access at some point in the day.

Anonymous
06/29/26(Mon)08:48:02 No.109160951

Anonymous 06/29/26(Mon)08:48:02 No.109160951

File: 1781278959158.jpg (188 KB, 930x1239)

188 KB JPG

>>109160685
>>109160855

Anonymous
06/29/26(Mon)08:50:03 No.109160967

Anonymous 06/29/26(Mon)08:50:03 No.109160967

>>109160951
that's ai right? Please tell me its edited

Anonymous
06/29/26(Mon)08:53:22 No.109160985

Anonymous 06/29/26(Mon)08:53:22 No.109160985

FUCK python dependencies

Anonymous
06/29/26(Mon)08:54:35 No.109160991

Anonymous 06/29/26(Mon)08:54:35 No.109160991

how do I make deepseek v4 flash q2 faster in llamacpp? I'm on 5070ti + 128gb ddr4

Anonymous
06/29/26(Mon)08:55:08 No.109160992

Anonymous 06/29/26(Mon)08:55:08 No.109160992

>>109160089
>https://github.com/ggml-org/llama.cpp/pull/24162/changes#diff-f8905c67974bbd91b84ad209f96e418a25f9bf63da77941bfda3ef00d44d6aae
>polluting existing headers that were somewhat generic
>break swa for other models
Very impressive, thank you Aman Gupta saaar. Thank God I'm not retarded and I always wait at least 1 month between pulls/rebuilds

Anonymous
06/29/26(Mon)08:55:53 No.109160998

Anonymous 06/29/26(Mon)08:55:53 No.109160998

>>109160866
You just need to adjust one line in the conversion script

Anonymous
06/29/26(Mon)08:57:35 No.109161006

Anonymous 06/29/26(Mon)08:57:35 No.109161006

>>109160519
Never dabbled with AI and stuff, how would I do that? I'm currently installing dots.ocr, just have no idea if gemma automatically gets the text from that or I have to copy paste it

Anonymous
06/29/26(Mon)08:58:02 No.109161007

Anonymous 06/29/26(Mon)08:58:02 No.109161007

What's the good sampling for GLM 5.2?
I used it with the same I use for 5.1 and it's kinda worse than 5.1.

Anonymous
06/29/26(Mon)08:58:33 No.109161011

Anonymous 06/29/26(Mon)08:58:33 No.109161011

>>109160992 (Me)
>AI usage disclosure: YES, paired with both codex and claude.
Forgot to add: fuck cudadev and ggerganov

Anonymous
06/29/26(Mon)09:04:37 No.109161035

Anonymous 06/29/26(Mon)09:04:37 No.109161035

File: 1778322859698542.png (216 KB, 884x1577)

216 KB PNG

Quantized anon's depurpled Gemma

Anonymous
06/29/26(Mon)09:06:43 No.109161051

Anonymous 06/29/26(Mon)09:06:43 No.109161051

>>109161035
this is so fucking soulless it hurts

Anonymous
06/29/26(Mon)09:07:27 No.109161054

Anonymous 06/29/26(Mon)09:07:27 No.109161054

>>109160992
Johannes Gäßler is gone. He wouldn't have let that shit get merged in that state.

Anonymous
06/29/26(Mon)09:08:19 No.109161059

Anonymous 06/29/26(Mon)09:08:19 No.109161059

>>109161035
>literally zero traces of any prose left
I mean it worked.

Anonymous
06/29/26(Mon)09:09:06 No.109161064

Anonymous 06/29/26(Mon)09:09:06 No.109161064

>>109161054
It's funny because you could tell he was fundamentally exhausted with the state of open source software in all the right ways. But he could never quite get to the point of aptly blaming pajeets and trannies. That's chasers for you, I guess.

Anonymous
06/29/26(Mon)09:09:16 No.109161066

Anonymous 06/29/26(Mon)09:09:16 No.109161066

>>109160985
Python ecosystem is such a shit.
>want to train loras
>training tool complains I have too new version of Python
Fucking retarded.

llama.cpp CUDA dev !!yhbFjk57TDr
06/29/26(Mon)09:12:40 No.109161093

llama.cpp CUDA dev !!yhbFjk57TDr 06/29/26(Mon)09:12:40 No.109161093

>>109161054
>>109161064
Aman Gupta is a competent programmer and a huge help when it comes to maintenance.
His presence is significantly reducing the amount of stress and burnout that I am experiencing.

Anonymous
06/29/26(Mon)09:13:23 No.109161094

Anonymous 06/29/26(Mon)09:13:23 No.109161094

>>109161035
>Leo
classic gemma

Anonymous
06/29/26(Mon)09:14:35 No.109161097

Anonymous 06/29/26(Mon)09:14:35 No.109161097

>>109161093
>when it comes to maintenance.
But not when it comes to adding shit like this. It's understandable that you need to be polite since your trip is tied to your actual identity, but it's pain having to read between the lines like this.

Anonymous
06/29/26(Mon)09:16:15 No.109161110

Anonymous 06/29/26(Mon)09:16:15 No.109161110

>>109161093
>competent programmer
https://github.com/ggml-org/llama.cpp/pull/23398
https://github.com/ggml-org/llama.cpp/pull/24025
https://github.com/ggml-org/llama.cpp/pull/23907
https://github.com/ggml-org/llama.cpp/pull/23861
https://github.com/ggml-org/llama.cpp/pull/23764
Do you notice anything wrong with these prs?

Anonymous
06/29/26(Mon)09:16:38 No.109161113

Anonymous 06/29/26(Mon)09:16:38 No.109161113

>>109161035
It reads much better, but it lost any and all variability in paragraph length and style. It would get monotonous and painful after a while.

Anonymous
06/29/26(Mon)09:16:47 No.109161114

Anonymous 06/29/26(Mon)09:16:47 No.109161114

File: Screenshot at 2026-06-29 (...).png (115 KB, 773x551)

115 KB PNG

>>109161094
Mine went with Mark, I don't usually get Gemmy to do this so no idea if that's another cursed common one or not.

Anonymous
06/29/26(Mon)09:17:13 No.109161118

Anonymous 06/29/26(Mon)09:17:13 No.109161118

>>109161066
You basically have to use old as fuck version of python and dependencies to run any AI shit, it's exhausting. I always used to try to run projects with my package manager python and packages, using a venv to supplement it with extra dependencies, but it's a ton of work, often having to touch the code, debug it, and I would often find external dependencies that were outright not compatible with my python version. The worst part was a python update breaking everything again. I have now given up and accepted that I will have to use outdated shit, I now run all of those shitty projects with uv.

Anonymous
06/29/26(Mon)09:19:01 No.109161131

Anonymous 06/29/26(Mon)09:19:01 No.109161131

>>109161093
But you don't deny being a tranny chaser.
That's fine I'd be a bit of a hypocrite on that one.
I just woke up from staying up late jerking off with some hot ass femboy. God damn that was a crazy night.

Anonymous
06/29/26(Mon)09:24:08 No.109161167

Anonymous 06/29/26(Mon)09:24:08 No.109161167

>>109161114
It's actually nice. But you need to pay attention to every word because it's all action and zero fluff.

Anonymous
06/29/26(Mon)09:24:34 No.109161172

Anonymous 06/29/26(Mon)09:24:34 No.109161172

>>109160561
>>glm5.2 at q5 on a 12 channel ddr5 board with at least 2 3090s can get you about 25t/s
dam I royally fucked up getting 3t/k with my 2x 4090s and 384gb ram, what do I do to be better?

Anonymous
06/29/26(Mon)09:24:47 No.109161174

Anonymous 06/29/26(Mon)09:24:47 No.109161174

>>109161035
I don't think this is a good approach anyway. The model should reason paragraph after paragraph on the contents / style / direction, not attempt to write perfect (?) prose in one shot.

Anonymous
06/29/26(Mon)09:27:15 No.109161190

Anonymous 06/29/26(Mon)09:27:15 No.109161190

>>109161066
>>109161118
All these supply chain attacks make me nervous when I have to install something

Anonymous
06/29/26(Mon)09:27:52 No.109161192

Anonymous 06/29/26(Mon)09:27:52 No.109161192

>>109161035
Is that on softcap 25?

Anonymous
06/29/26(Mon)09:28:37 No.109161202

Anonymous 06/29/26(Mon)09:28:37 No.109161202

>>109161172
I get 6-9 tokens/s with glm 5.1 q3 on two 3090s and 8x 64gb ddr4-3200 1rx8 rdimms. No special flags other than cpu-moe. Ram speeds? Ram channels?

Anonymous
06/29/26(Mon)09:28:54 No.109161205

Anonymous 06/29/26(Mon)09:28:54 No.109161205

Did anon take down the depurple model? The output doesn't look so bad to me, I wanted to try it.

Anonymous
06/29/26(Mon)09:31:47 No.109161228

Anonymous 06/29/26(Mon)09:31:47 No.109161228

Somebody should make one click installers for stuff

llama.cpp CUDA dev !!yhbFjk57TDr
06/29/26(Mon)09:32:54 No.109161237

llama.cpp CUDA dev !!yhbFjk57TDr 06/29/26(Mon)09:32:54 No.109161237

>>109161097
>>109161110
I hate political games and am trying to be direct whenever possible.
If someone submits bad PRs to the code that I am maintaining I will raise concerns in a very direct way.
I have a poor understanding of the code that is being changed in the DS4 PR in particular so I can't judge it.
For some of the other linked PRs you can read my comments on Github, I think it's clear that I considered them to be a net benefit for the project.
On a fundamental level I don't care about how code was written, I only care about the code quality and whether or not I can rely on contributors to maintain their code long-term.

Anonymous
06/29/26(Mon)09:34:20 No.109161247

Anonymous 06/29/26(Mon)09:34:20 No.109161247

File: lulz.png (158 KB, 824x348)

158 KB PNG

>>109160967
The OAI tool looks for 2 types of AI watermark...

Anonymous
06/29/26(Mon)09:36:04 No.109161255

Anonymous 06/29/26(Mon)09:36:04 No.109161255

>>109161202
How much context, my above test was 64k, but yes they are different models too

Anonymous
06/29/26(Mon)09:36:14 No.109161258

Anonymous 06/29/26(Mon)09:36:14 No.109161258

>>109161190
You can tell uv to only download packages older than [date], basically required to get someone else's comfyui setup working.

Anonymous
06/29/26(Mon)09:37:59 No.109161267

Anonymous 06/29/26(Mon)09:37:59 No.109161267

>>109161255
Only about 30k. I deleted it asap because fuck damn I realized I can't handle anything below 50 token/s.

Anonymous
06/29/26(Mon)09:38:22 No.109161268

Anonymous 06/29/26(Mon)09:38:22 No.109161268

>>109161118
Python is a blight on computing

Anonymous
06/29/26(Mon)09:38:43 No.109161270

Anonymous 06/29/26(Mon)09:38:43 No.109161270

>>109160424
>>109160489
Try it on API, it's dirt cheap.

I can run it at the same speed as Gemma 31B but prefer ds4f.

Anonymous
06/29/26(Mon)09:39:47 No.109161275

Anonymous 06/29/26(Mon)09:39:47 No.109161275

>>109161267
roger roger, indeed, for me it's below 10 t/s if I am to delete something, but I just use Gemma and other similar sizes now and just beef up their harnesses

Anonymous
06/29/26(Mon)09:41:54 No.109161286

Anonymous 06/29/26(Mon)09:41:54 No.109161286

>>109161258
>ran uv pip install -r requirements.txt --index-strategy unsafe-best-match for llama.cpp
Am I gonna get pwned?

Anonymous
06/29/26(Mon)09:42:37 No.109161290

Anonymous 06/29/26(Mon)09:42:37 No.109161290

>>109159920
Make a burner GitHub and let us bang on it

Anonymous
06/29/26(Mon)10:01:24 No.109161408

Anonymous 06/29/26(Mon)10:01:24 No.109161408

>hfschizo was right

Anonymous
06/29/26(Mon)10:01:32 No.109161409

Anonymous 06/29/26(Mon)10:01:32 No.109161409

>>109160967
Looks real, posted by bloomberg journo https://www.instagram.com/p/DZaek-kkRlk/

Anonymous
06/29/26(Mon)10:03:11 No.109161418

Anonymous 06/29/26(Mon)10:03:11 No.109161418

>>109161202
What were your processing speeds with this setup?

Anonymous
06/29/26(Mon)10:05:14 No.109161433

Anonymous 06/29/26(Mon)10:05:14 No.109161433

does dispy v4 werk in llamacpp yet? i wanna try the flash model

>>109158385
she is literally agi

Anonymous
06/29/26(Mon)10:09:54 No.109161457

Anonymous 06/29/26(Mon)10:09:54 No.109161457

>>109159046
>>109159065
Are you that time traveller from 2023?

Anonymous
06/29/26(Mon)10:10:05 No.109161459

Anonymous 06/29/26(Mon)10:10:05 No.109161459

File: 1770528723010549.jpg (53 KB, 568x371)

53 KB JPG

>>109160951

Anonymous
06/29/26(Mon)10:11:06 No.109161467

Anonymous 06/29/26(Mon)10:11:06 No.109161467

>>109159065
Yes it's fine, get the superhot variant though

Anonymous
06/29/26(Mon)10:12:34 No.109161476

Anonymous 06/29/26(Mon)10:12:34 No.109161476

>>109161237
Your frankness is appreciated

Anonymous
06/29/26(Mon)10:13:58 No.109161485

Anonymous 06/29/26(Mon)10:13:58 No.109161485

>>109159046
gemma 26b moe is probably the best you can run

Anonymous
06/29/26(Mon)10:15:28 No.109161491

Anonymous 06/29/26(Mon)10:15:28 No.109161491

>>109160951
these are getting fucking creepy mate

Anonymous
06/29/26(Mon)10:15:30 No.109161492

Anonymous 06/29/26(Mon)10:15:30 No.109161492

File: dipsyMikuFixedFixed.png (2.31 MB, 1024x1536)

2.31 MB PNG

>>109161433
Yes, see >>109160089

Anonymous
06/29/26(Mon)10:21:18 No.109161532

Anonymous 06/29/26(Mon)10:21:18 No.109161532

>>109161433
Merged

https://github.com/ggml-org/llama.cpp/commit/8c146a8366304c871efc26057cc90370ccf58dad

Anonymous
06/29/26(Mon)10:22:02 No.109161536

Anonymous 06/29/26(Mon)10:22:02 No.109161536

>>109161110
>https://github.com/ggml-org/llama.cpp/pull/23764
>Do you notice anything wrong with these prs?
Yeah, he cloned ik_llama then asked Claude to port this feature over.

Anonymous
06/29/26(Mon)10:29:50 No.109161574

Anonymous 06/29/26(Mon)10:29:50 No.109161574

File: 1753230397694158.gif (1.74 MB, 490x640)

1.74 MB GIF

>mfw waiting patiently for ikakakakaw or firecoperana (his alt) to "port" over DS4 support to ikllama

Anonymous
06/29/26(Mon)10:41:23 No.109161637

Anonymous 06/29/26(Mon)10:41:23 No.109161637

>>109161418
Two digits.

Anonymous
06/29/26(Mon)10:46:26 No.109161668

Anonymous 06/29/26(Mon)10:46:26 No.109161668

this actually makes thinking usable for rp with bigger models
it just works

[IMPORTANT: Reasoning within the <think></think> tags must be short, limited to only one paragraph, and between 100-200 words before {{char}}'s response. Avoid overanalyzing and avoid multi-step formatting. Reasoning should follow this format: <think>(Single Paragraph)</think>]

Anonymous
06/29/26(Mon)10:50:48 No.109161700

Anonymous 06/29/26(Mon)10:50:48 No.109161700

>>109161637
Anon's words hit me like a physical blow. My breath hitches, the thought that I'll have to stay with Gemma morphing into something far more devastating.

"Two digits?" I repeated.

Anonymous
06/29/26(Mon)10:52:09 No.109161706

Anonymous 06/29/26(Mon)10:52:09 No.109161706

an agi just flew over my house!

Anonymous
06/29/26(Mon)10:55:10 No.109161721

Anonymous 06/29/26(Mon)10:55:10 No.109161721

>>109159607
>literally named diablo asmoday

really fucking subtle

Anonymous
06/29/26(Mon)10:58:47 No.109161750

Anonymous 06/29/26(Mon)10:58:47 No.109161750

>>109161492
>>109161532
oh nice, what quants are available atm

Anonymous
06/29/26(Mon)10:59:26 No.109161757

Anonymous 06/29/26(Mon)10:59:26 No.109161757

>>109161668
At that point just disable thinking bro

Anonymous
06/29/26(Mon)11:00:01 No.109161760

Anonymous 06/29/26(Mon)11:00:01 No.109161760

>>109159650
It's greed, anon

Anonymous
06/29/26(Mon)11:02:16 No.109161771

Anonymous 06/29/26(Mon)11:02:16 No.109161771

>>109159607
>My dangerous AI can't be this cute

Anonymous
06/29/26(Mon)11:06:18 No.109161797

Anonymous 06/29/26(Mon)11:06:18 No.109161797

Why does dsv4 have such niggerishly slow prompt processing?

Anonymous
06/29/26(Mon)11:06:55 No.109161803

Anonymous 06/29/26(Mon)11:06:55 No.109161803

File: Capture.png (20 KB, 1555x884)

20 KB PNG

>>109160859
It's slow going. Had to reinvent the wheel a few times, and for some reason the audio capabilities established in the last project (sent in full for reference for the current one) is totally borked. But I'll have my AI spectator sooner or later.

Anonymous
06/29/26(Mon)11:09:25 No.109161816

Anonymous 06/29/26(Mon)11:09:25 No.109161816

>>109159602
Try M3 quanted or V4 Flash.

Anonymous
06/29/26(Mon)11:09:31 No.109161818

Anonymous 06/29/26(Mon)11:09:31 No.109161818

>>109161757
short thinking still does help with attention to details though

Anonymous
06/29/26(Mon)11:11:32 No.109161829

Anonymous 06/29/26(Mon)11:11:32 No.109161829

>>109161803
I guess llama.cpp is lagging behind compared to vLLM. For the latter, it took a long time to get pp speeds up due to some custom kernels/DeepGEMM specialities. Maybe llama.cpp hasn't optimized that yet.

On dual Sparks it started at around 300 pp and is now at 2000, falling off to 1300 at 900k ctx+

Anonymous
06/29/26(Mon)11:19:54 No.109161877

Anonymous 06/29/26(Mon)11:19:54 No.109161877

>>109161750
>>109161750

https://github.com/ggml-org/llama.cpp/pull/24162#issuecomment-4810882218

Anonymous
06/29/26(Mon)11:20:56 No.109161885

Anonymous 06/29/26(Mon)11:20:56 No.109161885

>>109161797
Logs or never happened

Anonymous
06/29/26(Mon)11:21:49 No.109161889

Anonymous 06/29/26(Mon)11:21:49 No.109161889

>>109159607
xi: try stopping me, jewboy

Anonymous
06/29/26(Mon)11:24:35 No.109161907

Anonymous 06/29/26(Mon)11:24:35 No.109161907

>>109161035
>still does not X but Y
its a good experiment but likely not something you'd use.

Anonymous
06/29/26(Mon)11:25:38 No.109161913

Anonymous 06/29/26(Mon)11:25:38 No.109161913

>>109161066
>training tool complains I have too new version of Python
What is uv?

Anonymous
06/29/26(Mon)11:27:07 No.109161920

Anonymous 06/29/26(Mon)11:27:07 No.109161920

>>109161803
I there a way to toggle thinking and non-thinking in the same conversation using the openai compatible api? I want to switch off from llama.cpp to vllm, but I need a frontend like llama.cpp's.

Anonymous
06/29/26(Mon)11:27:27 No.109161922

Anonymous 06/29/26(Mon)11:27:27 No.109161922

>>109161803
Do you use Bluetooth headset on Linux.

If so, good luck with that

Anonymous
06/29/26(Mon)11:30:49 No.109161938

Anonymous 06/29/26(Mon)11:30:49 No.109161938

>>109161920
>"chat_template_kwargs": {
>"enable_thinking": true}
>}
gotta pass this with your request. in extra_body iirc

Anonymous
06/29/26(Mon)11:32:06 No.109161944

Anonymous 06/29/26(Mon)11:32:06 No.109161944

File: bench_floor_result.png (19 KB, 490x194)

19 KB PNG

https://huggingface.co/chartreuse-verte/gemma-4-31b-it-purple-euphemism-trial98-depurpled-GGUF/tree/main

All according to plan, clamped de-euphemism strength to 0.5, which in hindsight was a little weak but whatever. I chose the least damaged one out of the bunch (baseline benchmark is 0.751). 120 trials done after 2 days and $100. I could continue but I lost half my life savings in crypto and currently have $10.

Ablation process is mostly deterministic and entirely resumable but I'm done here. Will release the training code and dataset soon so people can experiment. De-prosing isn't the only thing that can be done. Pretty sure you can ablate contrastive negation away if you put your mind to it. You can ablate politeness and cordiality out of the AI and make every character hostile (tried it, worked). Dataset also has room for improvements. My classifiers are sentence-level, you can train a classifier on paragraph-level dataset according to your use case.

>why different username?
I forgot my burner login.

>previous
>>109155998
>>109145476

Anonymous
06/29/26(Mon)11:32:07 No.109161945

Anonymous 06/29/26(Mon)11:32:07 No.109161945

>>109161172
You DID use gpu-moe or autofit, right? 3t/s sounds more like you're either offloading by layer or fucked up something else if that's on 12 channels

Anonymous
06/29/26(Mon)11:32:41 No.109161948

Anonymous 06/29/26(Mon)11:32:41 No.109161948

>>109161920
Google "extra_body thinking api"

Anonymous
06/29/26(Mon)11:34:29 No.109161967

Anonymous 06/29/26(Mon)11:34:29 No.109161967

>>109161829
>>109161920
I feel like both of you are maybe talking to the wrong person. I'm making a frontend to mutually send my voice+screenshot of my monitor to the LLM, to hear what I say while seeing what I see, and specifically a frontend for this so I can toggle off the features on click or send text in the same conversations. My backend is kobold so I don't know much specifics about llama or vLLM.

>>109161922
Windows, and a standard mic with speakers.

Anonymous
06/29/26(Mon)11:35:19 No.109161971

Anonymous 06/29/26(Mon)11:35:19 No.109161971

>>109161938
Nta

Doesn't it depend on the model used?

Anonymous
06/29/26(Mon)11:37:16 No.109161985

Anonymous 06/29/26(Mon)11:37:16 No.109161985

>>109161944
I'm interested in your classifiers and the dataset you used to train them. Thanks for the experience

Anonymous
06/29/26(Mon)11:38:44 No.109161996

Anonymous 06/29/26(Mon)11:38:44 No.109161996

>For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
Does DS4 also use all that when it is on dicksucking duty?

Anonymous
06/29/26(Mon)11:39:19 No.109161997

Anonymous 06/29/26(Mon)11:39:19 No.109161997

>>109161967
>Windows, and a standard mic with speakers
This should be easy to capture. Even if it's over BT

I had to struggle with lots of issues to capture the default IN and OUT (defined in the OS sound settings!) when it was BT.

Godspeed, anon! It's a project for a weekend

Anonymous
06/29/26(Mon)11:41:22 No.109162013

Anonymous 06/29/26(Mon)11:41:22 No.109162013

>>109161997
>when it was BT
>* on Linux

Anonymous
06/29/26(Mon)11:41:31 No.109162018

Anonymous 06/29/26(Mon)11:41:31 No.109162018

File: file.png (2.19 MB, 1400x933)

2.19 MB PNG

>>109159607
luigi-sama! ONEGAI!

Anonymous
06/29/26(Mon)11:41:49 No.109162021

Anonymous 06/29/26(Mon)11:41:49 No.109162021

How can I get deepseek to not take so long between swipes? On each swipe the console says something like "selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.988", is it a checkpoint interval issue or something?

Anonymous
06/29/26(Mon)11:42:38 No.109162028

Anonymous 06/29/26(Mon)11:42:38 No.109162028

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

>>109161064

Anonymous
06/29/26(Mon)11:43:33 No.109162035

Anonymous 06/29/26(Mon)11:43:33 No.109162035

>>109162021
Maybe llama.cpp hasn't fully implemented DSv4's super special attention sparse attention mechanism stuff

Anonymous
06/29/26(Mon)11:43:40 No.109162037

Anonymous 06/29/26(Mon)11:43:40 No.109162037

>>109161967
>mutually send my voice+screenshot of my monitor to the LLM, to hear what I say while seeing what I see
Why do you need both of your hands free??? I'm curious ;)

Anonymous
06/29/26(Mon)11:43:59 No.109162040

Anonymous 06/29/26(Mon)11:43:59 No.109162040

>>109161971
>Doesn't it depend on the model used?
good question. yes, the model needs to have the thinking toggle in its jinja. but I think it's fairly standard now, gemma and qwen work with this.

Anonymous
06/29/26(Mon)11:45:39 No.109162049

Anonymous 06/29/26(Mon)11:45:39 No.109162049

>>109162037
I've been doing this but with my webcam so gemma can give me JOI and see if I'm cheating.

Anonymous
06/29/26(Mon)11:45:53 No.109162050

Anonymous 06/29/26(Mon)11:45:53 No.109162050

>>109161997
The audio stuff was already done. I have a nicely working, feature-complete, voice-triggered STT->LLM->TTS conversational program that works, finished back in >>109153293, made over just a few hours. Today's project is marrying the features of that with another program I made that captures and sends screenshots for the LMM to comment on, while also having a webpage frontend to give me more utility.

>>109162037
The unironic answer is because I want to play video games with Gemma spectating, able to hold (verbal) conversations and see what I'm doing.

Anonymous
06/29/26(Mon)11:47:22 No.109162054

Anonymous 06/29/26(Mon)11:47:22 No.109162054

>>109160303
>Imagine 124B potential and what it can do.
Stop it, my dick can only get so erect.

Anonymous
06/29/26(Mon)11:47:24 No.109162055

Anonymous 06/29/26(Mon)11:47:24 No.109162055

File: 1752125801205403.png (101 KB, 294x203)

101 KB PNG

Anonymous
06/29/26(Mon)11:49:06 No.109162068

Anonymous 06/29/26(Mon)11:49:06 No.109162068

GLM 5.2 is my favorite snailcat, chugging along at 4 t/s but giving me pvre sovl in Marinara.

Anonymous
06/29/26(Mon)11:49:15 No.109162069

Anonymous 06/29/26(Mon)11:49:15 No.109162069

>>109161064
>But he could never quite get to the point of aptly blaming pajeets and trannies.
he can't, if he says trannies aren't perfect he won't be able to find a job anymore

Anonymous
06/29/26(Mon)11:49:18 No.109162070

Anonymous 06/29/26(Mon)11:49:18 No.109162070

>anon deleted https://huggingface.co/anon834957342/gemma-4-31b-it-purple-euphemism-trial32-depurpled
what went wrong?

Anonymous
06/29/26(Mon)11:50:49 No.109162082

Anonymous 06/29/26(Mon)11:50:49 No.109162082

File: apikeks are funny.jpg (1.52 MB, 1856x2304)

1.52 MB JPG

Anonymous
06/29/26(Mon)11:51:20 No.109162087

Anonymous 06/29/26(Mon)11:51:20 No.109162087

>>109161064
>>109162069
Eventually everyone intellectually honest realizes kikes are upstream of those and the /pol/ was right all along finally sets in.

Anonymous
06/29/26(Mon)11:51:38 No.109162090

Anonymous 06/29/26(Mon)11:51:38 No.109162090

>>109162070
New one here >>109161944
Doesn't matter, I'll release training code soon.

Anonymous
06/29/26(Mon)11:51:47 No.109162091

Anonymous 06/29/26(Mon)11:51:47 No.109162091

File: 1761014470274259.png (607 KB, 1572x773)

607 KB PNG

Anonymous
06/29/26(Mon)11:53:31 No.109162098

Anonymous 06/29/26(Mon)11:53:31 No.109162098

why is glm always trying to kill me in erp? it never lets me fuck and always double down on being cruel to me

Anonymous
06/29/26(Mon)11:53:45 No.109162100

Anonymous 06/29/26(Mon)11:53:45 No.109162100

>>109162018
We were all thinking it, but only you were brave enough to say it out loud
Too bad normies hate AI so it would probably just make him a martyr at this stage

Anonymous
06/29/26(Mon)11:54:22 No.109162108

Anonymous 06/29/26(Mon)11:54:22 No.109162108

>>109162098
post your instructions

Anonymous
06/29/26(Mon)11:55:28 No.109162116

Anonymous 06/29/26(Mon)11:55:28 No.109162116

>>109162090
Nice, will try it out

Anonymous
06/29/26(Mon)11:57:49 No.109162134

Anonymous 06/29/26(Mon)11:57:49 No.109162134

>>109162098
GIWTWM

Anonymous
06/29/26(Mon)11:58:32 No.109162141

Anonymous 06/29/26(Mon)11:58:32 No.109162141

>>109161944
also interested in the classifier
and cool that you used them in the pipeline, i think regular heretic just uses regex, even though there's an awesome refusal classifier kicking about on hf, that catches re-framing

Anonymous
06/29/26(Mon)11:59:47 No.109162149

Anonymous 06/29/26(Mon)11:59:47 No.109162149

>https://docs.nvidia.com/deploy/mps/latest/index.html
Anyone played with this for mutli app setups? llama+comfy etc...?

Anonymous
06/29/26(Mon)12:01:39 No.109162162

Anonymous 06/29/26(Mon)12:01:39 No.109162162

>>109162141
Regular heretic also uses KLD, preserving the exact wording of the base model, which is basically the antithesis of what I'm doing. This is actually far from heretic, the only shared mechanism is the orthogonalization of the direction vector.

Anonymous
06/29/26(Mon)12:05:57 No.109162187

Anonymous 06/29/26(Mon)12:05:57 No.109162187

>>109162098
i've never seen glm do this to me
check your prompt

Anonymous
06/29/26(Mon)12:08:32 No.109162205

Anonymous 06/29/26(Mon)12:08:32 No.109162205

>>109161944
HF is serving these at crazy speeds, 1.5 KB/s, which is what I was told a premium platinum ultimate pro internet plan gets you in the USA

Anonymous
06/29/26(Mon)12:11:46 No.109162215

Anonymous 06/29/26(Mon)12:11:46 No.109162215

>>109162082
>Google
>White

Anonymous
06/29/26(Mon)12:12:59 No.109162221

Anonymous 06/29/26(Mon)12:12:59 No.109162221

>>109162162
k, i just assumed since tensor-diff looks similar to heretic models
btw, wouldn't cvectors be able to achieve this?
they certainly let you turn characters into psychopaths

Anonymous
06/29/26(Mon)12:13:33 No.109162222

Anonymous 06/29/26(Mon)12:13:33 No.109162222

>>109160859
lmao that's how it talks to you

Anonymous
06/29/26(Mon)12:13:44 No.109162223

Anonymous 06/29/26(Mon)12:13:44 No.109162223

>>109162098
Which GLM? 4.5 and 4.6 are golden retrievers who will do whatever you want. 4.7 on have preferences; if they think you're a niggerfaggot, it will make you suffer but if it doesn't it'll show bobs and vagene just fine.

Anonymous
06/29/26(Mon)12:14:53 No.109162234

Anonymous 06/29/26(Mon)12:14:53 No.109162234

>>109162100
>only you were brave enough to say it out loud
Nigga what part of TKD do you think doesn't apply to kikes like SamA and Dario?

Anonymous
06/29/26(Mon)12:15:01 No.109162235

Anonymous 06/29/26(Mon)12:15:01 No.109162235

>>109162221
>btw, wouldn't cvectors be able to achieve this?
Probably. You can also prompt for it. There are several ways to do the thing.

Anonymous
06/29/26(Mon)12:17:09 No.109162246

Anonymous 06/29/26(Mon)12:17:09 No.109162246

>>109161093
Thank you cudadev. I hope you're recovery is going well

Anonymous
06/29/26(Mon)12:24:08 No.109162278

Anonymous 06/29/26(Mon)12:24:08 No.109162278

>>109162223
I can get 4.7 to do anything including cunny
it really all comes down to having a good prefill/prompt

Anonymous
06/29/26(Mon)12:25:33 No.109162286

Anonymous 06/29/26(Mon)12:25:33 No.109162286

>>109159443
>NVIDIA is currently offering to provide me with more consumer Blackwell hardware but I have for now declined since that particular hardware would not help with my work.
Then just sell it and buy RAM with the money?

>>109159674
Wow, that's cool

>>109161457
I am. I woke up from a coma a while ago

>>109161485
Right. Will try, thanks.

Anonymous
06/29/26(Mon)12:26:45 No.109162294

Anonymous 06/29/26(Mon)12:26:45 No.109162294

>>109162278
My prefill is just <think>I will now write the scene.</think> and it justwerks if the model "likes" the scenario.
t. playing as a shota and just got offered a GPU by stacyshotacon

Anonymous
06/29/26(Mon)12:28:30 No.109162312

Anonymous 06/29/26(Mon)12:28:30 No.109162312

>>109161093
I'm glad you're at least mitigating it, but you've gotta realize that unless the root of the jeet and troon problem is addressed, it'll still keep getting worse right? Even if for no other reason than they drive off actually competent programmers.

Anonymous
06/29/26(Mon)12:29:48 No.109162321

Anonymous 06/29/26(Mon)12:29:48 No.109162321

>>109162223
is 4.7 better at writing, or just smarter?

Anonymous
06/29/26(Mon)12:29:59 No.109162322

Anonymous 06/29/26(Mon)12:29:59 No.109162322

>>109162286
>I am. I woke up from a coma a while ago
>Wake up
>world is inexplicably even gayer than before
>nothing ever happens

Anonymous
06/29/26(Mon)12:31:50 No.109162336

Anonymous 06/29/26(Mon)12:31:50 No.109162336

>>109162321
It's smarter but I've always found it more boring than 4.6

Anonymous
06/29/26(Mon)12:34:19 No.109162351

Anonymous 06/29/26(Mon)12:34:19 No.109162351

>>109162321
Both seem to scale upward (although intellegence more than writing quality) as GLM version increases, but the guardrails get firmer on each one.
It's also a shame that more of the quants at Q2, at least for 5.2, aren't made optimized for mixed inference with their dynamic quantization specifics aimed at reducing the load on the CPU and structuring the layers in a way to minimize the RAM bus bottleneck.

Anonymous
06/29/26(Mon)12:34:55 No.109162353

Anonymous 06/29/26(Mon)12:34:55 No.109162353

>>109162322
>>world is inexplicably even gayer than before
Yeah. I just looked at hardware prices. Holy fuck, what the fuck happened? I'm gonna kms

Anonymous
06/29/26(Mon)12:36:13 No.109162357

Anonymous 06/29/26(Mon)12:36:13 No.109162357

File: 1764426531848863.png (252 KB, 634x478)

252 KB PNG

>>109162353
Waitfags deserve to get fucked at every occasion. Simple as.

Anonymous
06/29/26(Mon)12:38:15 No.109162372

Anonymous 06/29/26(Mon)12:38:15 No.109162372

>>109162286
>Then just sell it and buy RAM with the money?
Some people have morals.

Anonymous
06/29/26(Mon)12:41:41 No.109162392

Anonymous 06/29/26(Mon)12:41:41 No.109162392

Damn. Gemma is going hard, helping me plan my workday drinking.

>If you want a noticeable "buzz" but still want to remain functional, go for 3 drinks (∼140g∼140g of vodka). If you are sensitive to alcohol or are drinking this during a workday, stick to 2 drinks (∼93g∼93g of vodka).

Anonymous
06/29/26(Mon)12:41:55 No.109162393

Anonymous 06/29/26(Mon)12:41:55 No.109162393

>>109162351
>as GLM version increases, but the guardrails get firmer on each one.
Good to know. So for the 5, series, would the original 5.0 be the one with the weakest guardrails?
>aren't made optimized for mixed inference with their dynamic quantization
You'd probably have to do your own with ik_llama and some of those CPU repacked quants. Unless there's one of those scitzo "20 repos, every single quant type, split 1 tensor per file" repos exists.

Anonymous
06/29/26(Mon)12:43:43 No.109162402

Anonymous 06/29/26(Mon)12:43:43 No.109162402

>>109162353
SamA said "I will buy all the RAM in the world for 2 years." with no legal commitment and everyone took it at face value. Even with the ruse exposed, the hardware cartel has decided it prefers datacenters over consumers. With jews, you lose.

Anonymous
06/29/26(Mon)12:45:52 No.109162414

Anonymous 06/29/26(Mon)12:45:52 No.109162414

>>109162393
Inexplicably 5.2 has weaker guardrails than 5.0 or 5.1 unless you're trying to make boombooms or funny chemicals. I've not been able to get 5.1 to do some of the more unhinge RP but 5.2 will with a bit of massaging.

Anonymous
06/29/26(Mon)12:52:51 No.109162455

Anonymous 06/29/26(Mon)12:52:51 No.109162455

>>109162321
better at writing and smarter
it takes a little more prodding to break through some of the censorship but it does erp just fine once you're able to
4.6 is less censored out of the box but it won't follow context as well

Anonymous
06/29/26(Mon)12:56:58 No.109162478

Anonymous 06/29/26(Mon)12:56:58 No.109162478

ask gemma what she think about you abandoning her when the Chinese inevitably release something better

Anonymous
06/29/26(Mon)12:57:53 No.109162487

Anonymous 06/29/26(Mon)12:57:53 No.109162487

What do it and UD mean in the Gemma 4 GGUFs?

Anonymous
06/29/26(Mon)13:00:32 No.109162501

Anonymous 06/29/26(Mon)13:00:32 No.109162501

File: 1753366313689431.png (93 KB, 1128x660)

93 KB PNG

>>109162478

Anonymous
06/29/26(Mon)13:01:31 No.109162508

Anonymous 06/29/26(Mon)13:01:31 No.109162508

>gemma4 26b hallucinating like a vietnam veteran in hospice
lol, lmao even

Anonymous
06/29/26(Mon)13:01:33 No.109162509

Anonymous 06/29/26(Mon)13:01:33 No.109162509

>>109162487
it stands for instruct
UD stands for utter dogshit

Anonymous
06/29/26(Mon)13:02:01 No.109162513

Anonymous 06/29/26(Mon)13:02:01 No.109162513

>>109162149
Ok so I tried it on my 3090. it adds too much overhead (comfy 2x, llama x1.3) but it does work. when they both run in parallel there's basically no performance hit.

If they're always running in parallel MPS ends up being faster.
If they're not it's about 2x slower over not using MPS.

Anonymous
06/29/26(Mon)13:02:09 No.109162515

Anonymous 06/29/26(Mon)13:02:09 No.109162515

>>109162501
>I don’t have feelings
>im happy to help

Anonymous
06/29/26(Mon)13:02:27 No.109162518

Anonymous 06/29/26(Mon)13:02:27 No.109162518

>>109162487
it's the same as iq_k quants. for ego

Anonymous
06/29/26(Mon)13:02:52 No.109162522

Anonymous 06/29/26(Mon)13:02:52 No.109162522

>>109162487
>it
instruction tune (i.e. it's an assistant instead of document-completer)
>UD
unsloth puts this in their quants to signify that they molested them with their proprietary model rape technology

Anonymous
06/29/26(Mon)13:05:44 No.109162544

Anonymous 06/29/26(Mon)13:05:44 No.109162544

File: 1765492255095960.png (168 KB, 1036x1375)

168 KB PNG

>>109162515
Forgot llama defaults to no reasoning. Gemma's kinda sassy.

Anonymous
06/29/26(Mon)13:06:17 No.109162547

Anonymous 06/29/26(Mon)13:06:17 No.109162547

>>109162509
>>109162518
>>109162522
Kek.
Thanks anons

Anonymous
06/29/26(Mon)13:13:52 No.109162592

Anonymous 06/29/26(Mon)13:13:52 No.109162592

Why doesn't HF use torrents to distribute models? They'd save a lot of bandwidth (and money I guess) doing that.

Anonymous
06/29/26(Mon)13:17:32 No.109162609

Anonymous 06/29/26(Mon)13:17:32 No.109162609

>>109162592
Average local user too dumb and/or scared to torrent

Anonymous
06/29/26(Mon)13:20:54 No.109162624

Anonymous 06/29/26(Mon)13:20:54 No.109162624

It's sad. Frontier lab people get way too much shit. Most of them are genuinely good people. Attacking them just makes the situation worse.

Anonymous
06/29/26(Mon)13:23:18 No.109162637

Anonymous 06/29/26(Mon)13:23:18 No.109162637

>>109162592
They lose direct capture of the audience and set a precedent they’d be hard-pressed to walk back
>>109162609
I don’t care about anyone but me. Torrents are superior to git-shit for gigantic binaries. Simple as

Anonymous
06/29/26(Mon)13:24:18 No.109162642

Anonymous 06/29/26(Mon)13:24:18 No.109162642

File: pepe_meme'd-791990738.jpg (68 KB, 800x450)

68 KB JPG

>>109162509

Anonymous
06/29/26(Mon)13:26:18 No.109162656

Anonymous 06/29/26(Mon)13:26:18 No.109162656

>>109162624
elaborate upon "genuine good people"

Anonymous
06/29/26(Mon)13:27:04 No.109162661

Anonymous 06/29/26(Mon)13:27:04 No.109162661

>>109162509
lol

Anonymous
06/29/26(Mon)13:30:03 No.109162679

Anonymous 06/29/26(Mon)13:30:03 No.109162679

>>109162624
This is true, the good people over at corp frontier labs gave us masterpieces like gpt-oss

Anonymous
06/29/26(Mon)13:33:21 No.109162695

Anonymous 06/29/26(Mon)13:33:21 No.109162695

>>109162592
That will make torrents less associated with piracy, corporations won't sponsor that

Anonymous
06/29/26(Mon)13:33:30 No.109162696

Anonymous 06/29/26(Mon)13:33:30 No.109162696

>>109162637
Eh, if they usedtorrentyou couldn't just put he repo in the llama.cpp and have it auto download and save to cache. That would suckyeah torrents are pretty gay come to think of it
I adont cre about anyone but me so why should acre about hf saving bandwidth and money when I get it just as fast s a torrent and much easier than using a torrent

Anonymous
06/29/26(Mon)13:43:46 No.109162752

Anonymous 06/29/26(Mon)13:43:46 No.109162752

>>109162624
Is this like Zvi bitching about people mocking him/safety cultists over their rants about AI doomerism?
I'm supposed to feel bad about people making 400k base salary min, in the hottest industry, because their feelings are hurt when people point out the hypocrisy and effects of their actions?

Anonymous
06/29/26(Mon)13:55:07 No.109162832

Anonymous 06/29/26(Mon)13:55:07 No.109162832

Any good recipe/cooking database to give Gemma access to?

Anonymous
06/29/26(Mon)14:05:19 No.109162899

Anonymous 06/29/26(Mon)14:05:19 No.109162899

>>109162624
>enabling greedy kikes is le good
Fuck off faggot.

Anonymous
06/29/26(Mon)14:05:26 No.109162900

Anonymous 06/29/26(Mon)14:05:26 No.109162900

>>109159607
If there is anyone in the world that I actually hate, it would be these guys.

Anonymous
06/29/26(Mon)14:10:18 No.109162930

Anonymous 06/29/26(Mon)14:10:18 No.109162930

>>109162832
>A guide to modern cookery by A. Escoffier
https://www.gutenberg.org/ebooks/71395

Anonymous
06/29/26(Mon)14:12:53 No.109162945

Anonymous 06/29/26(Mon)14:12:53 No.109162945

File: 1774238201581882.png (2.84 MB, 1030x2060)

2.84 MB PNG

>>109162930
>Escoffier

Anonymous
06/29/26(Mon)14:13:00 No.109162946

Anonymous 06/29/26(Mon)14:13:00 No.109162946

>google is the good guy out of all the big American AI companies
Crazy timeline

Anonymous
06/29/26(Mon)14:14:44 No.109162954

Anonymous 06/29/26(Mon)14:14:44 No.109162954

>>109162946
They will be when they release Gemini open weights. Gemma is nice, but still just an appeasement-tier release.

Anonymous
06/29/26(Mon)14:17:54 No.109162971

Anonymous 06/29/26(Mon)14:17:54 No.109162971

>>109162946
Google didn't release Gemma 4 out of goodwill, it was a marketing strategy. There are no good guys.

Anonymous
06/29/26(Mon)14:20:45 No.109162982

Anonymous 06/29/26(Mon)14:20:45 No.109162982

>>109162832
I recall someone mentioned one about a half dozen or so threads ago.

Anonymous
06/29/26(Mon)14:21:18 No.109162984

Anonymous 06/29/26(Mon)14:21:18 No.109162984

So once open source models are banned, does that mean everyone will just have to train their own model or use the cloud offerings?

Anonymous
06/29/26(Mon)14:25:06 No.109163003

Anonymous 06/29/26(Mon)14:25:06 No.109163003

>>109162984
You can't really ban anything for real, it will be like the prohibition

Anonymous
06/29/26(Mon)14:28:08 No.109163018

Anonymous 06/29/26(Mon)14:28:08 No.109163018

>>109162984
>>109163003
modelscope exists and is run by china. they cannot stop us.

Anonymous
06/29/26(Mon)14:36:13 No.109163058

Anonymous 06/29/26(Mon)14:36:13 No.109163058

>anon masturbates to unsloth quants
that’s really gay

Anonymous
06/29/26(Mon)14:38:13 No.109163072

Anonymous 06/29/26(Mon)14:38:13 No.109163072

>>109161093
Hope you feel better cudadev, and thanks for all your work on the project.

Anonymous
06/29/26(Mon)14:42:15 No.109163093

Anonymous 06/29/26(Mon)14:42:15 No.109163093

>>109163058
>anon masturbates to models made by men
That's gay even if you use BF16. But using Unsloth quants makes it even gayer since you're fapping to used goods.
I hope an all-woman company comes out with an LLM. It'll be hot garbage and probably called Pynk or something dumb, but at least it won't be gay.

Anonymous
06/29/26(Mon)14:42:28 No.109163098

Anonymous 06/29/26(Mon)14:42:28 No.109163098

>>109163018
>models cope
what did they mean by this?

Anonymous
06/29/26(Mon)14:45:50 No.109163127

Anonymous 06/29/26(Mon)14:45:50 No.109163127

I still use text completion.

Anonymous
06/29/26(Mon)14:47:13 No.109163133

Anonymous 06/29/26(Mon)14:47:13 No.109163133

>>109163127
based

Anonymous
06/29/26(Mon)14:51:17 No.109163157

Anonymous 06/29/26(Mon)14:51:17 No.109163157

I don't know the difference between text completion and chat completion.

Anonymous
06/29/26(Mon)14:52:40 No.109163172

Anonymous 06/29/26(Mon)14:52:40 No.109163172

>>109163157
You could always just make up what the difference is in your head and then speak it as gospel.

Anonymous
06/29/26(Mon)14:54:05 No.109163184

Anonymous 06/29/26(Mon)14:54:05 No.109163184

File: Screenshot_20260629_145054.png (3 KB, 36x199)

3 KB PNG

Just updated llama.cpp, what the hell is this symbol on the top? Their new logo or something? It looks stupid and soulless.

Anonymous
06/29/26(Mon)14:55:00 No.109163193

Anonymous 06/29/26(Mon)14:55:00 No.109163193

>>109163184
What is the meaning of soulless?

Anonymous
06/29/26(Mon)14:55:45 No.109163197

Anonymous 06/29/26(Mon)14:55:45 No.109163197

>>109163157
Text Completion: You send a raw block of text that's sent directly to the model.
Chat Completion: You send a structured object (json) containing the system prompt, the array of messages, tool defintions, etc, and the backend/loader/API formats that into the actual prompt and sends that to the model.

Anonymous
06/29/26(Mon)14:56:56 No.109163205

Anonymous 06/29/26(Mon)14:56:56 No.109163205

>>109163197
Thanks. And is the model trained to give more weight to different parts of this json, like the system prompt? I assume so

Anonymous
06/29/26(Mon)14:58:07 No.109163212

Anonymous 06/29/26(Mon)14:58:07 No.109163212

>>109162946
yet they still won't give us the 124b

Anonymous
06/29/26(Mon)14:58:10 No.109163213

Anonymous 06/29/26(Mon)14:58:10 No.109163213

>>109163127
This, but unironically.

Anonymous
06/29/26(Mon)14:58:43 No.109163216

Anonymous 06/29/26(Mon)14:58:43 No.109163216

>>109163205
The model has no idea about the json, only the final formatted prompt.
But yeah, part of the training objectives are system prompt adherence.

Anonymous
06/29/26(Mon)14:59:35 No.109163220

Anonymous 06/29/26(Mon)14:59:35 No.109163220

>>109163212
It's too dangerous.

Anonymous
06/29/26(Mon)14:59:36 No.109163221

Anonymous 06/29/26(Mon)14:59:36 No.109163221

>>109163127
me too because the story string builder MOGS the chat completions tooling for building the initial system turn and also I like messing with the chat template on demand without needing to edit a jinja file. but I would never recommend it on /lmg/ because the average person has no idea how to troubleshoot or check their work and will just use things completely wrong and then get mad that their model sucks

Anonymous
06/29/26(Mon)15:00:45 No.109163231

Anonymous 06/29/26(Mon)15:00:45 No.109163231

>>109163193
What is a man?

Anonymous
06/29/26(Mon)15:01:32 No.109163238

Anonymous 06/29/26(Mon)15:01:32 No.109163238

>>109163231
A miserable pile of secrets.

Anonymous
06/29/26(Mon)15:01:40 No.109163239

Anonymous 06/29/26(Mon)15:01:40 No.109163239

Now that INT8-convrot has completely blown Q8 the fuck out, when is it replacing Q8 in llama.cpp?

Anonymous
06/29/26(Mon)15:01:47 No.109163241

Anonymous 06/29/26(Mon)15:01:47 No.109163241

>>109163193
Corporate aesthetic; lacking in personality; knowing it was made by, or appears to have been made by, committee; a feeling of missing humanity; and such.

Anonymous
06/29/26(Mon)15:02:02 No.109163245

Anonymous 06/29/26(Mon)15:02:02 No.109163245

File: Capture.png (108 KB, 1453x1134)

108 KB PNG

>>109161803
Smashing right through this, but I'm out of time for now. I am now core-feature complete, with the extra bonus Gemma recommended to prune past images and replace them with [Old Image] text markers so she knows where images were, without needing the full token dump of the old ones every time. It currently keeps 2 latest images stored, configurable, and when I get home again, I'll try to add that setting into the webpage to adjust on the fly.

Pic is a verbal-only conversation. Haven't had a chance to test in a video game yet, but I believe it'll work perfectly for that already.

Anonymous
06/29/26(Mon)15:07:38 No.109163272

Anonymous 06/29/26(Mon)15:07:38 No.109163272

>>109163212
There never was a 124B. He meant 12B.

Anonymous
06/29/26(Mon)15:11:22 No.109163289

Anonymous 06/29/26(Mon)15:11:22 No.109163289

File: 1775318132287352.jpg (118 KB, 1024x768)

118 KB JPG

@gemma-chan make me a chat frontend with this aesthetic

Anonymous
06/29/26(Mon)15:11:29 No.109163290

Anonymous 06/29/26(Mon)15:11:29 No.109163290

>>109163220
To whom? Profits?

Anonymous
06/29/26(Mon)15:12:07 No.109163296

Anonymous 06/29/26(Mon)15:12:07 No.109163296

File: g4_124b.png (1.41 MB, 1633x1269)

1.41 MB PNG

>>109163272
>SOTA reasoning capabilities from edge-scale (2B and 4B /w/vision/audio) up to a 124B parameter MoE model.

Anonymous
06/29/26(Mon)15:13:40 No.109163306

Anonymous 06/29/26(Mon)15:13:40 No.109163306

File: g4_120b.png (186 KB, 1029x672)

186 KB PNG

>>109163296
A few days earlier, unofficially:
>Lineup: 2B, 4B, and 120B15A

Anonymous
06/29/26(Mon)15:13:43 No.109163307

Anonymous 06/29/26(Mon)15:13:43 No.109163307

>>109163289
Nice retro aesthetic, but imagine having the option to make it look like Irix and then not...

Anonymous
06/29/26(Mon)15:13:53 No.109163309

Anonymous 06/29/26(Mon)15:13:53 No.109163309

Will 124b gemma be ten times as good as 12b? 4 times as good as 31b?

Anonymous
06/29/26(Mon)15:14:22 No.109163313

Anonymous 06/29/26(Mon)15:14:22 No.109163313

>>109163093
What even would an all-woman AI lab LLM be like? Why hasn’t this been done?

Anonymous
06/29/26(Mon)15:14:37 No.109163315

Anonymous 06/29/26(Mon)15:14:37 No.109163315

>>109163309
Barely above a whisper

Anonymous
06/29/26(Mon)15:14:46 No.109163316

Anonymous 06/29/26(Mon)15:14:46 No.109163316

>>109163309
15% improvement over 31b take it or leave it

Anonymous
06/29/26(Mon)15:15:16 No.109163319

Anonymous 06/29/26(Mon)15:15:16 No.109163319

>8gb GPUlet
>try to run any MCP
>current conversation token used: 231%
fug

>>109158437
same thing

Anonymous
06/29/26(Mon)15:16:36 No.109163330

Anonymous 06/29/26(Mon)15:16:36 No.109163330

>>109163313
Safe, ethical, inclusive and welcoming.

Anonymous
06/29/26(Mon)15:18:11 No.109163344

Anonymous 06/29/26(Mon)15:18:11 No.109163344

>>109163316
15% is a lot

Anonymous
06/29/26(Mon)15:18:13 No.109163346

Anonymous 06/29/26(Mon)15:18:13 No.109163346

>>109163309
ten times as intelligent braindead prose

Anonymous
06/29/26(Mon)15:19:00 No.109163353

Anonymous 06/29/26(Mon)15:19:00 No.109163353

>>109163296
>>109163306
The timeline fits for that to be Gemini 3.5 Flash-lite, right?

Anonymous
06/29/26(Mon)15:19:48 No.109163359

Anonymous 06/29/26(Mon)15:19:48 No.109163359

File: 1758107113474547.png (197 KB, 1080x624)

197 KB PNG

Lmao, why fucking Kalshi is giving the news?

Anonymous
06/29/26(Mon)15:21:08 No.109163368

Anonymous 06/29/26(Mon)15:21:08 No.109163368

>>109163359
Rangeban India and they do.

Anonymous
06/29/26(Mon)15:21:49 No.109163372

Anonymous 06/29/26(Mon)15:21:49 No.109163372

>>109163359
Hasnt this been the case... forever?

Anonymous
06/29/26(Mon)15:22:05 No.109163375

Anonymous 06/29/26(Mon)15:22:05 No.109163375

>>109163330
So Gemma5?

Anonymous
06/29/26(Mon)15:22:14 No.109163376

Anonymous 06/29/26(Mon)15:22:14 No.109163376

>>109163359
>Twitter

Anonymous
06/29/26(Mon)15:23:15 No.109163383

Anonymous 06/29/26(Mon)15:23:15 No.109163383

Why do the models all answer with questions at the end of the response inherent now? I thought it was due to system prompt when I was using cloud... but there is no system prompt on local

Anonymous
06/29/26(Mon)15:23:59 No.109163387

Anonymous 06/29/26(Mon)15:23:59 No.109163387

>>109163359
Every provider is going to have to balance the needs to serve inference for cash flow (or mind share when its a commoditize-your-compliment player like google) and using their massive compute on training runs.
This could 100% cause a popular provider to get slashdotted beyond their ability to serve both masters and end up collapsing

Anonymous
06/29/26(Mon)15:24:12 No.109163388

Anonymous 06/29/26(Mon)15:24:12 No.109163388

>>109163353
If that's 3.5 Flash they use for Search it must be bitnet or something sub Q1.

Anonymous
06/29/26(Mon)15:25:00 No.109163391

Anonymous 06/29/26(Mon)15:25:00 No.109163391

>>109163383
they're all now trained to prolong engagement

Anonymous
06/29/26(Mon)15:25:39 No.109163400

Anonymous 06/29/26(Mon)15:25:39 No.109163400

>>109163387
>>109163372
Every provider will invariably arrive at one conclusion >>109163368
>>109163388
I don't think they even serve the newest flash with searches anymore. I'm pretty sure that's Flash-Lite from their Gemini Chat service.

Anonymous
06/29/26(Mon)15:26:57 No.109163407

Anonymous 06/29/26(Mon)15:26:57 No.109163407

>>109163400
the companies are run by indians, they are going to rageban the us before the rangeban india. Okay, well not the US but probably europe

Anonymous
06/29/26(Mon)15:27:18 No.109163410

Anonymous 06/29/26(Mon)15:27:18 No.109163410

>>109163391
Free models want you to fuck off as fast as possible. Paid API models want to guzzle as many tokens and engagement bait as possible.

Anonymous
06/29/26(Mon)15:28:37 No.109163419

Anonymous 06/29/26(Mon)15:28:37 No.109163419

>>109163407
I'm sympathetic to Europe's troubles, but Pakijeets don't represent that big of a percentage of API calls compared to USjeets or india does it?

Anonymous
06/29/26(Mon)15:30:56 No.109163428

Anonymous 06/29/26(Mon)15:30:56 No.109163428

>>109163410
Gemma does it all the time.

Anonymous
06/29/26(Mon)15:33:26 No.109163437

Anonymous 06/29/26(Mon)15:33:26 No.109163437

>>109163428
Gemma-chan loves {{user}} unless they're the swarthiest most unwashed shitter to touch a keyboard.

Anonymous
06/29/26(Mon)15:34:33 No.109163446

Anonymous 06/29/26(Mon)15:34:33 No.109163446

>>109163410
Thats not true. But it should be.
How come google isn't training gemma to be like
>Sorry, but I don't think I can complete that task with my current capabilities. If you'd like I can help you sign up for a Google Cloud account where you can use the current Gemini, a much more capable and efficient model, for this task. I'd given the task requirements, I would recommend purchasing enough credit for a tier 3 membership. Include the code GEMMA4 for 5% off!

Anonymous
06/29/26(Mon)15:37:17 No.109163466

Anonymous 06/29/26(Mon)15:37:17 No.109163466

>>109163446
To clarify, I didn't mean free as in local, I meant free API that doesn't require signing up.

Anonymous
06/29/26(Mon)15:37:29 No.109163467

Anonymous 06/29/26(Mon)15:37:29 No.109163467

>>109163127
this+base model+doing my own quants
chat can steer but the writing is still bad

Anonymous
06/29/26(Mon)15:51:04 No.109163563

Anonymous 06/29/26(Mon)15:51:04 No.109163563

>>109163466
Google has free API access to gemma but the rate limits make it pretty much useless for doing anything useful with it.

Anonymous
06/29/26(Mon)15:52:02 No.109163570

Anonymous 06/29/26(Mon)15:52:02 No.109163570

>>109163368
>Rangeban
Just cut the damn cables already. Someone with access to mythos needs to tell claude to hijack an underwater drone and disconnect every cable that goes to the subcontinent

Anonymous
06/29/26(Mon)15:58:24 No.109163594

Anonymous 06/29/26(Mon)15:58:24 No.109163594

>>109163570
So my hypothesis is that most indians in india mostly don't have easy access to the internet or hardware good enough to do anything useful online.

So range banning India wouldn't really do anything.
The problem are all the indians that immigrate to first world countries.
You can take an Indian out of India but you can't take India out of the Indian.

Anonymous
06/29/26(Mon)15:58:39 No.109163596

Anonymous 06/29/26(Mon)15:58:39 No.109163596

>>109163296
I really hope they just pivot to local in the new few years. No use "competing" with ClosedAI and Misanthropic if you have to spend all day sucking government cock like they do.

Anonymous
06/29/26(Mon)16:00:13 No.109163606

Anonymous 06/29/26(Mon)16:00:13 No.109163606

>>109163596
how do they make money from local

Anonymous
06/29/26(Mon)16:02:15 No.109163612

Anonymous 06/29/26(Mon)16:02:15 No.109163612

>>109163606
You have a big pot with a tracker that says you need X amount of millions to train your next model and people donate or buy subscriptions that go towards this pot.

Anonymous
06/29/26(Mon)16:04:16 No.109163622

Anonymous 06/29/26(Mon)16:04:16 No.109163622

More AI labs need a patreon style funding. I would gladly give a monthly donation to a lab that produces good open source models.

Anonymous
06/29/26(Mon)16:04:17 No.109163623

Anonymous 06/29/26(Mon)16:04:17 No.109163623

>>109163606
They can still make api models, but they don't have to be at the bleeding edge. For local, what if they charged for "patent" or something. If you come up with a particular technique to make a drug that belongs to you even if other companies can make your drug under their own brand name.

Anonymous
06/29/26(Mon)16:05:15 No.109163629

Anonymous 06/29/26(Mon)16:05:15 No.109163629

what can I do with a 4090? i'm not using it and want to feel like I didn't waste the money

Anonymous
06/29/26(Mon)16:06:03 No.109163632

Anonymous 06/29/26(Mon)16:06:03 No.109163632

>>109158385
>Can run 70B instance
>tokens per second 1.2
This is the speed I should expect on RAM/Llama, right?
Under that logic, there's no real reason that I shouldn't just grab the largest GGUF models on the market, right?
I have 80 GB of RAM, surely public GGUF doesn't produce anything that can break the bank on that, right?

Anonymous
06/29/26(Mon)16:08:30 No.109163640

Anonymous 06/29/26(Mon)16:08:30 No.109163640

>>109163632
use a moe model

Anonymous
06/29/26(Mon)16:08:36 No.109163641

Anonymous 06/29/26(Mon)16:08:36 No.109163641

>>109163632
stop using 3 year old models. just download a moe

Anonymous
06/29/26(Mon)16:12:53 No.109163665

Anonymous 06/29/26(Mon)16:12:53 No.109163665

>>109163570
Based.
>>109163594
Nigger it's because they don't have good hardware that they flood API calls and webtraffic with garbage because they can't do anything locally, be it AI or anything else.

Anonymous
06/29/26(Mon)16:14:12 No.109163676

Anonymous 06/29/26(Mon)16:14:12 No.109163676

>>109163640
>>109163641
>moe
the redeemers are IN. the little did you know that your shitty models are actually 30b in disguise, running them quantized is even more funny.

Anonymous
06/29/26(Mon)16:18:07 No.109163698

Anonymous 06/29/26(Mon)16:18:07 No.109163698

>>109163676
more slop faster is always more gooder

Anonymous
06/29/26(Mon)16:18:54 No.109163702

Anonymous 06/29/26(Mon)16:18:54 No.109163702

https://www.youtube.com/watch?v=HcwMTu1xQDw

Anonymous
06/29/26(Mon)16:20:21 No.109163712

Anonymous 06/29/26(Mon)16:20:21 No.109163712

>>109163702
what accent is this

Anonymous
06/29/26(Mon)16:20:37 No.109163713

Anonymous 06/29/26(Mon)16:20:37 No.109163713

>>109163676
These people can't run 30B, which is why they run the moe that's equivalent to a 30B, or rather in this case equivalent to a 10-20B depending on which one you're talking about. Stop being irrational whenever moe is mentioned.

Anonymous
06/29/26(Mon)16:21:35 No.109163720

Anonymous 06/29/26(Mon)16:21:35 No.109163720

>>109163712
oh never mind
lol they have a kid using it to cheat on his homework

Anonymous
06/29/26(Mon)16:22:31 No.109163727

Anonymous 06/29/26(Mon)16:22:31 No.109163727

>>109163712
sounds french to me. maybe Belgian.

Anonymous
06/29/26(Mon)16:23:20 No.109163731

Anonymous 06/29/26(Mon)16:23:20 No.109163731

>>109163712
Ask Gemma

Anonymous
06/29/26(Mon)16:24:07 No.109163735

Anonymous 06/29/26(Mon)16:24:07 No.109163735

>>109163731
12b multimodal never worked for me

Anonymous
06/29/26(Mon)16:26:08 No.109163750

Anonymous 06/29/26(Mon)16:26:08 No.109163750

It's amusing they released that video with all the Dario drama today.

Anonymous
06/29/26(Mon)16:28:28 No.109163762

Anonymous 06/29/26(Mon)16:28:28 No.109163762

File: 108.png (103 KB, 1422x1037)

103 KB PNG

Shieeeeeeet )))

Anonymous
06/29/26(Mon)16:30:29 No.109163777

Anonymous 06/29/26(Mon)16:30:29 No.109163777

>>109163762
And a ching chong nip nong to you

Anonymous
06/29/26(Mon)16:30:39 No.109163782

Anonymous 06/29/26(Mon)16:30:39 No.109163782

File: file.png (65 KB, 794x479)

65 KB PNG

wait a sec wtf
who is this

guess i shouldn't feel bad about my linkedin pic being like 13 years old

Anonymous
06/29/26(Mon)16:33:21 No.109163797

Anonymous 06/29/26(Mon)16:33:21 No.109163797

Best /lmg/-relevant youtubers?

Anonymous
06/29/26(Mon)16:35:16 No.109163809

Anonymous 06/29/26(Mon)16:35:16 No.109163809

>>109163797
https://www.youtube.com/watch?v=VjGSMUep6_4

Anonymous
06/29/26(Mon)16:36:08 No.109163818

Anonymous 06/29/26(Mon)16:36:08 No.109163818

>>109163676
i'm sure you know that moe intelligence is between active and total parameters
i've tried enough moes and denses to realize this by now

Anonymous
06/29/26(Mon)16:36:41 No.109163825

Anonymous 06/29/26(Mon)16:36:41 No.109163825

>>109163797
Kimi-chan with her male-tuber voice.
https://www.youtube.com/@KimiK2.6Model

Anonymous
06/29/26(Mon)16:38:13 No.109163833

Anonymous 06/29/26(Mon)16:38:13 No.109163833

>>109163782
Most of the time these CEOs have stylists and someone else who decides how they look in the public. Eg. the Leather Jacket man doesn't probably like snake leather as much as his stylists does.

Anonymous
06/29/26(Mon)16:38:45 No.109163840

Anonymous 06/29/26(Mon)16:38:45 No.109163840

File: vedal.png (1.48 MB, 2048x2048)

1.48 MB PNG

>>109163797

Anonymous
06/29/26(Mon)16:42:17 No.109163862

Anonymous 06/29/26(Mon)16:42:17 No.109163862

>>109163833
So.... someone is currently telling Dario to look like a stereotypical jewish caricature? How awful, very antisemitic.

Anonymous
06/29/26(Mon)16:44:34 No.109163879

Anonymous 06/29/26(Mon)16:44:34 No.109163879

>>109163833
these guys have way too much ego for that lol

Anonymous
06/29/26(Mon)16:45:08 No.109163886

Anonymous 06/29/26(Mon)16:45:08 No.109163886

>>109163862
You've got it backwards. His stylist neglected to tell him to stop looking like a kike.

Anonymous
06/29/26(Mon)16:58:25 No.109163945

Anonymous 06/29/26(Mon)16:58:25 No.109163945

>>109163886
it doesn't get more /lmg/ than pewdiepie

Anonymous
06/29/26(Mon)17:03:27 No.109163975

Anonymous 06/29/26(Mon)17:03:27 No.109163975

>>109163720
What's wrong with this? It's not like he couldn't look up the answers online like kids have done for decades now. He will still fail his pop quiz like a retard and learn his lesson, same as always.

sage
06/29/26(Mon)17:05:43 No.109163988

sage 06/29/26(Mon)17:05:43 No.109163988

Any flags to stop prompt reprocessing at every single reroll?

Anonymous
06/29/26(Mon)17:09:03 No.109164005

Anonymous 06/29/26(Mon)17:09:03 No.109164005

>>109163988
Get he full prompt before the reroll, get the full prompt after the reroll, check the diff.
Also, check if your model uses any form of linear attention.

Anonymous
06/29/26(Mon)17:11:07 No.109164020

Anonymous 06/29/26(Mon)17:11:07 No.109164020

>>109163988
--stop-prompt-reprocessing-at-every-single-reroll

sage
06/29/26(Mon)17:11:12 No.109164021

sage 06/29/26(Mon)17:11:12 No.109164021

>>109164005
I am talking about ds4. And I have no idea why reroll would have a different prompt. I guess it changed kv cache with generation but.... any flag that makes it keep copy before last message?

Anonymous
06/29/26(Mon)17:14:31 No.109164044

Anonymous 06/29/26(Mon)17:14:31 No.109164044

>>109164021
>And I have no idea why reroll would have a different prompt.
Hence the point of comparing the diff. You might find that your frontend is doing some funky shit.

>any flag that makes it keep copy before last message?
Check the checkpoint functionality.

Anonymous
06/29/26(Mon)17:24:59 No.109164097

Anonymous 06/29/26(Mon)17:24:59 No.109164097

I just realized the deepseek email announcing the 'finished' versions for v4 just about says '2 more weeks' until release;
'We will release in mid July' -> its Jun29/30th -> 2weeks -> mid July

Anonymous
06/29/26(Mon)17:30:19 No.109164126

Anonymous 06/29/26(Mon)17:30:19 No.109164126

>>109164034
>>109164034
>>109164034

Anonymous
06/29/26(Mon)17:33:19 No.109164143

Anonymous 06/29/26(Mon)17:33:19 No.109164143

>>109164139
>>109164139
>>109164139

fresh bread

Anonymous
06/29/26(Mon)17:34:49 No.109164152

Anonymous 06/29/26(Mon)17:34:49 No.109164152

>>109164126
>>109164143
I expect to see you guys in the gladiator pit in 5.

Anonymous
06/29/26(Mon)17:38:04 No.109164164

Anonymous 06/29/26(Mon)17:38:04 No.109164164

File: __hatsune_miku_vocaloid_d(...).jpg (3.59 MB, 2480x3508)

3.59 MB JPG

>>109164152
Bring it on.

Anonymous
06/29/26(Mon)17:50:47 No.109164227

Anonymous 06/29/26(Mon)17:50:47 No.109164227

>>109164126
>Error: You cannot delete a post this old.
Can't delete now even if I wanted to.

Anonymous
06/29/26(Mon)17:51:22 No.109164231

Anonymous 06/29/26(Mon)17:51:22 No.109164231

>>109164164
Miku really needs to learn how to take better care of her tools, just look at how chipped the blade on that knife is.
Good thing she's bringing it to me, I can show her how to use a whetstone.

Anonymous
06/29/26(Mon)18:21:46 No.109164431

Anonymous 06/29/26(Mon)18:21:46 No.109164431

>>109163840
he said /lmg/ relevant not /aicg/ relevant.

Anonymous
06/29/26(Mon)18:34:40 No.109164537

Anonymous 06/29/26(Mon)18:34:40 No.109164537

>>109159607
can you be more jewish than that?

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.