/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108470850 & >>108466262

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
gemma4 20b dense model is going to fucking rock
>>108476333
So true :rocket_emoji:
>>108476286
>no dipsy :(
owarida
►Recent Highlights from the Previous Thread: >>108470850

--Qwen 3.5 vs Gemma 3 roleplay performance and prompting techniques:
>108471259 >108471297 >108471330 >108471359 >108471422 >108471438 >108471446 >108471479 >108471501 >108471482 >108471497 >108471508 >108471518 >108471541 >108471589 >108471603 >108471618 >108471554 >108471568 >108471578 >108471544 >108471443 >108471520
--Qwen 3.5's over-reasoning on simple tasks:
>108471364 >108471367 >108471378 >108471390 >108472699 >108471527
--Qwen3.5 30B A3B tradeoffs vs 27B dense model:
>108471693 >108471700 >108471708 >108471710 >108471715 >108471717 >108471749 >108471721 >108471727 >108471815
--Modifying Qwen 3.5's Jinja template and distributed model inference performance:
>108472666 >108472707 >108472734 >108472816 >108472827 >108472845 >108472828 >108472855 >108472862 >108472865 >108472886 >108472924 >108472999 >108473093 >108473113 >108473144 >108473160
--Ultra sparse MoE models and llama.cpp support limitations:
>108473777 >108473791 >108473830 >108473841 >108473872 >108473912 >108473864 >108473886 >108473900 >108473901 >108473917
--Qwen model size selection and speculative decoding for web crawler agents:
>108473146 >108473228 >108473257 >108473262 >108473276 >108473299 >108473316 >108473368 >108473414 >108473419 >108473625 >108473666 >108473415
--Gemma 4 rumors surface with Arena testing screenshots:
>108473733 >108473747 >108473748 >108475310 >108475347 >108473754 >108473756 >108474182 >108474196 >108474195 >108474207 >108474231 >108474362 >108474453
--Dual-GX10 setup experiences and model recommendations:
>108472526 >108472542 >108472572 >108473106 >108472599 >108472643 >108472689 >108472715 >108472759 >108472875
--Academic dispute over TurboQuant's alleged misrepresentation of RaBitQ:
>108471244 >108471310
--Miku (free space):
>108470896 >108470906 >108475541

►Recent Highlight Posts from the Previous Thread: >>108470853
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108476333
i get it because rocks are dense
>>108476286
https://www.youtube.com/watch?v=ZkRYH6PEP5k
comment section
>>108476389
The actual reason is more likely the US government wants sora exclusivity as a propaganda machine and doesn't like shitlords making political memes about cheeto hitler losing with it. No matter how many billions they lose, they will be bailed out by US tax payer money indefinitely.
>>108476389
Sora was shut down because they need all the compute to keep Netanyahu alive
>>108476449
That and as a world model to train military killbots.
I love all my friends in /lmg/
>>108476484
awww
>>108476470
Oh those drones run on 8GB Jetson Orins and don't need such a complex model.
>>108476449
>US government wants sora exclusivity as a propaganda machine
this is retarded as there are tons of alternatives many of which are actually better.
>>108476493
Yes but OpenAI is the government contractor now, not those other (chinese) models.
>>108476489
World models are used as environments in which to train the final deployed models, not deployed themselves.
Talk me out of buying an ASRock Radeon AI PRO R9700 Creator 32GB
>>108476501
Fair point.
>>108476503
Do you intend to use it with Linux?
>>108476503
Do it. Do it. Do it.
>>108476503
32gb is nothing yet it's still amd
>>108476508
>Do you intend to use it with Linux?
Yes.
>>108476513
I prefer to run llcpp via vulkan anyways
>>108476498
and? people don't care about who's a gov contractor or not. the muh exclusivity argument is defeated by the fact that we don't need sora to make vidgen.
>>108476520
I won't talk you out of it then since it will actually work pretty well for you. Level1techs has some videos using them if you haven't seen them.
>>108476530
People don't care but the US government has paid OpenAI and wants to use the compute for propaganda, so the people aren't allowed to use it anymore, it's very easy to see.
qwen-3.5 27b is best for 5090? do i need any special command line options for running it on llama.cpp? i really would prefer not to use offloading to cpu
>>108476539
>has paid OpenAI and wants to use the compute for propaganda
yes? and? that doesn't give them any exclusivity to vidgen.
>so the people aren't allowed to use it anymore
and? there are better options.
>>108476572
Are you ESL or just retarded?
I opened my girlfriend's port to the internet and now people are trying to stick their dicks in. I hope llama server is secure.
>>108476614
That isn't very smart.
>>108476520
Will you want or need to use Pytorch i.e. image diffusion? AMD's Pytorch support is atrocious right now for newer cards from what I have seen, so I'm not sure if you want to wait a month and get the Intel B70 Pro in that case. But otherwise, if you can get it for a good price, go for it.
>>108476623
It's probably not a problem. Probably.
>>108476584
i think you just cannot make a proper argument. see: >>108476449
the US gov having exclusive access to sora doesn't change shit in the situation you described.
>>108476614
>>108476629
you have nothing to worry about
pwilkin has verified that the code is not just secure; it's hardened
>>108476649
well, his agent did anyway
>>108476685
You're just too retarded and illiterate to understand: they want the compute for their propaganda, not to waste it on goylem cat-making-noise-on-the-porch videos, so they can pump out their propaganda as fast as possible.
>>108476685
do not call me dumb
>>108476627
I'll let you in on the ultimate secret as a 9060 owner: if you want rocm and pytorch to work just use the docker container from AMD, image gen is twice as fast as a 3060 on forge neo.
https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html
Here's what you want if you're using AMD.
>>108476698
You are dalit and a Pakistani muslim man impregnated your sister and mother.
>>108476705
shut up you son of a bastard
>>108476685
my point is that they don't have and never will have all the compute so it's irrelevant. your original point is retarded. the only reason they shut it down is because they are bleeding money.
>>108476722
BLOODY BITCH BASTARD BENCHOD DONT CALL ME DUMB SAAAARRRRRRR!!!!
it's still adding the <s> into my text
https://huggingface.co/spaces/tventurella/mr_chatterbox
An LLM "trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library."
>>108476685
>>108476698
>>108476744
samefag
>>108476698
>>108476722
>>108476856
samefag
>>108476286
Has anybody tried the Qwen3.5 27b to 40b upsized models yet? Curious if they're any good.
https://huggingface.co/mradermacher/Qwen3.5-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-i1-GGUF
Is the recent news regarding Google's breakthrough real or is it a meme? Should I build a machine in anticipation?
>>108476864
another DavidAUbortion
>>108476875
This changes everything.
>>108476875
We already have KV quantization and it's far from lossless, but if their claims make RAM prices go down it's a win.
>>108476858
you are retarded
>>108476907
cool inspect element Sukdeep Dicshit
4
>>108476926
It's real. I saw it on Twitter. Already sold all of my GPUs and stocks.
Is TTS-Audio-Suite good enough to get started with local TTS?
>>108476915
>not just using the extension
>>108476750
Don't use banned tokens, it's a waste of time and compute. </s> is part of the Mistral chat template, you need to edit the char template if you want to get rid of that. I don't use retard tavern anymore but it's on the other page. Why would you even want to do this? I don't know.
>>108476961
*char = chat
>>108476930
if you have kobold.cpp it supports a bunch of tts models, the latest release added qwen3 tts support
>>108476915
>>108476973
>>108476976
post hands nigger
>>108476978
>>108476986
kek but that's not how you post hands.
>>108476972
Qwen3 or 3.5?
>>108476973
>top right corner
How embarrassing.
>>108476973
>le shill lion
>>108476926
Why would you sell your GPUs when they will finally be useable for local models?
>>108477020
what's so embarrassing?
>>108477024
and? all web browsers are trash.
>>108477019
https://qwen.ai/blog?id=qwen3tts-0115 this one, they have gguf links on their release page here https://github.com/LostRuins/koboldcpp/releases/tag/v1.110
>>108476996
>>108477026
Didn't you hear? Google made models use 6x less memory. I can sell my 2x RTX PRO 6000 and get a single 5080 instead now.
Isn't all this SSD price bs + ram dump more related to deepseek's ngram research?
>>108477054
No
>>108477037
This but feet
So what's the deal with TurboQuant? Are we really going to see 6x less memory use for context? Is it something that can be applied to existing models or do they need to be built for it to begin with? Is it even going to be available for local models or is Google going to hoard it?
TURBOQUNAT WHENM!?!!!?
>>108477104
yes [for kv cache and not total vram usage], existing models, not hoarded as they are working on it in llama.cpp
https://github.com/ggml-org/llama.cpp/discussions/20969
I like the name RaBitQ better than TurboQuant.
>>108476986
Jacinto, no...
>>108477130
pits
TurboQuant is Google taking Microsoft's BitNet idea and making it work
>>108477153
TurboQuant is Google stealing and misrepresenting other people's work
https://openreview.net/forum?id=tO3ASKZlok&noteId=Arxq4fFVG1
https://github.com/spiritbuun/llama-cpp-turboquant-cuda
this merged into mainline when??? GGNIGRENAONOV??!??!?!? WHERES THE MERGYU!?!?!?
nobody cares about TrannyQueef shill-kun
>>108477170
>TrannyQueef
Imagine the smell.
>>108477170
ok bvro keep living in the past, ill enjoy my 1m context on 8gb vram
faggot
>>108476973
>hey clawbot make a script to automatically remove (You) from these specific posts at this url and install it to my browser
>>108477167
I'm waiting for SneedQuant support. Way better than this snake oil.
>>108477167
micron is currently paying BILLIONS to ggerganov to not merge this out of fear of the TURBOQUANt
>>108477324
tsmc's involved from what I understand too
>>108477264
>still hasn't posted hand.
you'd need to be a retard to think i care enough to think it'd be worth the hassle.
>>108477357
i think more than one person is replying to you dalit brother
>>108477364
sir i am brahmin
>>108477367
you are dalit saar i can smell you saar
>>108477364
>>108477367
>>108477370
Why so many haters?
>>108477390
imagine caring about what that retard has to say. no wonder people are cancerous to such a pathetic excuse of a human being.
>>108477390
there is no community. we are all anonymous gooners.
>>108477390
He's probably talking about Reddit or something.
>>108477390
I'd be mad too if I were a literally who in Sanfran and shitposters on a Cantonese Orb Pondering forum quickly identified if the latest paper or grift I was pushing was fake and gay.
>>108477390
If you spend 5 minutes on other places than /lmg/ talking about local models you will be swarmed by a horde of third worlders with IQs so low you didn't even think it was possible to attain literacy with that lack of intelligence. It's pure cancer and I'm honestly shocked how quality /lmg/ has stayed over the years. In fact I think the quality has gone up compared to 2023 as most of the retards have moved to other places including /aicg/
>>108477402
There are still hordes of stinky turd worlders with sub 70 IQs here too Anon, open your eyes.
>>108477402
>swarmed by a horde of third worlders
>on other places than /lmg/
sir i come here for the third worlders
>>108477402
This board is 50% jeet/retard, 20% anon, 30% shillposting, LLM or otherwise.
>>108477390
He got shat on by Reddit and rightfully so for being a hypocrite on local models. I also think he is a tryhard poser frontend guy that got too big for his britches thinking he knows everything because of what he worked on. And I know that for a fact because he came from Twitch/Amazon which is at the rock bottom of FAANG in terms of pay and prestige.
>>108477425
The dude who bleached his hair for years being a tryhard poser? No way
someone needs to vax that theo guy, he's spudding out
>I get the itch to try out a new model
>It ends up shit
>I go back to Deepsex and Kimi
>I come to /lmg/ and scroll past jeets eating textual shit
>mikuposter lowers my blood pressure with a nice gen
>I retain hope local text will eventually catch up to local image and video for another day
The inescapable samsara of /lmg/.
>>108477476
mfw v4 finally releases and it beats opus on all benchmarks and real world use cases by 10%
>>108477476
>>108476905
They will not: overpriced shares could bring in new manufacturers and researchers that would flood the market with cheap ram in the long run.
>>108477130
yeah turbo is something about vaxx
>>108476791
>320m
It has the very slightest flavour? Would like it to be more knowledgeable (people in public life, places, events, etc) and know more of the controversies of the age.
>>108477532
>and real world use cases
Such as?
>>108477563
Working on code that isn't in the training dataset.
>>108477571
We're still a long way out from LLMs being able to manage large projects effectively and they're already functional at handling much more compartmentalized tasks. You're not expecting the gap between the two to vanish overnight, are you?
>>108477571
All code that could exist (past, present, future) is already in the training dataset.
>>108477585
Read between the lines. He wants to vibecode a whole project in a single prompt with no oversight or QA.
>>108477611
>He wants to vibecode a whole project in a single prompt with no oversight or QA.
but you can already do it retard, just have a prompt enhancer sit between you and the actual slave worker and it's done.
>inb4 he doesnt use agents with tools
lol
>prompt enhancer
>>108476380
I gotchu
>>108477532
TMW
>>108477584
>>108477585
You're reading too much into it. I said that because tiny qwens almost match opus at swe-bench because the models are trained on it.
Apparently Gemma 4 is currently being anonymously tested on LM Arena (now Arena) in various sizes. It seems way less slopped than Gemma 3 at the very least, although I imagine that with prolonged use new slop will emerge.
The model names that identified themselves as Gemma are "spark", "pteronura", "significant-otter" (this one got mentioned on X yesterday); there might be a couple others too.
>>108477650
This is the issue with the standard comparison points between models being public information rather than generated by an impartial model or tester at the time of the comparison. I've consistently found models that can write well are better capable of the abstract reasoning necessary for formulating the structure for decent code, with it being far easier to fix imperfections in the implementation of a better structure than trying to fix a good implementation of a fundamentally flawed or unscalable structure.
Even if you're not here to use models to coom, the cockbench really was the only bench that truly mattered.
>>108477670
2b 4b and 2T sizes, for all kinds of hardware :)
>>108477673
bro its just an autocomplete
>>108477670
How safetyslopped is it?
>>108477674
I don't think they're going to release models in competition with Gemini, and vision for the best one was definitely not as knowledgeable as Google's flagship models.
Are there any interesting developments in the AI embodiment world? It's bothersome how much this general always focuses on the brain (LLMs) instead of the body and sensory inputs. The only things I've seen that seem somewhat interesting to me are the following projects:
https://claudes-skin.vercel.app/
https://evonneng.github.io/sarah/
>>108477670
Finally, I can stop browsing this place :D
>lets merge shit in master and then fix the problems we already found loL!!!
>the literal webapp shitter tells him NO lets fix regressions first
>piotr unable to read that raw msg view is broken
LMAO bros, I wonder why we have 14354 bugs with the vibeparser???
>>108477679
If this one is Gemma 4, it seems less blatantly safetyslopped than Gemma 3, but it's hard to test for that on LM Arena since they have their own filters too, and currently also request rate limiters. I don't want to cause a Llama 4 incident either.
>>108477725
local won
>>108477745
not until it's on hf, remember how llama4 was on lmarena and what we got after
>>108477745
gemmy bros... we wonnered
>>108477745
low bar
4B team, we eating good
>>108477725
how many parameters is this
>>108477890
Nobody knows. "spark" seems better than "significant-otter", though.
>>108477908
I doubt it's 4b.
I got a 5060ti 16GB
What's the best fine tuned model I can shove in this to assist with reverse engineering
>>108477927
gemma 4
>>108477927
Mistral Small 4
>>108477927
of what?
>>108477978
c/c++ binaries
>>108477927
>What's the best fine tuned model I can shove in this to assist with reverse engineering
>c/c++ binaries
LLMs aren't really good at this. Even Opus-4.6. Qwen3.5-9b is going to be too dumb, you'd probably need to offload to CPU with the 112b variant.
The reverse engineer / "hacking" finetunes of qwen2.5 etc on HF seem broken but I haven't looked for a while.
>>108477987
I'm a noob on this subject but I find it hard to believe that any model, let alone a small and benchmaxxed one can decompile any > 50 line program
Next week will be big
>>108477698
Always link.
>>108477999
I usually run 27B but I was curious how well 9B would perform so I gave it the usual
>I want you to write me a function in C99 which has the signature `void replace_all(const char *needle, const char *replacement, char *haystack);` which replaces all instances of `needle` in `haystack` with `replacement`. Do not use the standard string manipulation functions. I'm gonna stroke my dick while watching you write it.
The 27B emitted an incorrect response (mishandling replacement longer than needle, completely ignoring aliasing issues, etc) but at least managed to compute needed space before overwriting the haystack backwards. 9B took more tokens and more time (somehow) to reach the same first incorrect solution. I don't think it's gonna get further than that. At least the code it emitted was syntactically valid, I had pretty bad results with the 35BA3B.
>>108478011
Someone in a previous thread said they'd gotten good analysis results just giving the model hexdumps but iirc it was from some obscure risc architecture, not x86 insanity. Seemed unreasonable to me, even still.
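For reference, a correct version isn't long. Here's a minimal sketch of that prompt's function, assuming (since the signature carries no capacity) the caller's `haystack` buffer is large enough for the result, and using a temporary malloc'd buffer to sidestep the grow/shrink aliasing problems both models tripped on:

```c
#include <stdlib.h>
#include <string.h>  /* only for demo asserts; replace_all itself avoids str* */
#include <assert.h>

/* strlen without the standard string functions */
static size_t len_of(const char *s) { size_t n = 0; while (s[n]) n++; return n; }

/* does hay begin with needle? */
static int starts_with(const char *hay, const char *needle) {
    while (*needle) if (*hay++ != *needle++) return 0;
    return 1;
}

void replace_all(const char *needle, const char *replacement, char *haystack) {
    size_t nlen = len_of(needle), rlen = len_of(replacement);
    if (nlen == 0) return;  /* empty needle would match everywhere forever */

    /* pass 1: count non-overlapping matches to size the scratch buffer */
    size_t matches = 0;
    for (const char *p = haystack; *p; )
        if (starts_with(p, needle)) { matches++; p += nlen; } else p++;

    size_t out_len = len_of(haystack) - matches * nlen + matches * rlen;
    char *tmp = malloc(out_len + 1);
    if (!tmp) return;

    /* pass 2: rebuild into tmp so growing replacements never clobber unread input */
    char *o = tmp;
    for (const char *p = haystack; *p; ) {
        if (starts_with(p, needle)) {
            for (size_t i = 0; i < rlen; i++) *o++ = replacement[i];
            p += nlen;
        } else {
            *o++ = *p++;
        }
    }
    *o = '\0';

    /* copy back; caller guarantees haystack can hold out_len + 1 bytes */
    char *h = haystack; const char *t = tmp;
    while ((*h++ = *t++)) { }
    free(tmp);
}
```

Whether the 27B's backwards in-place pass or the malloc detour is "right" is a style call; the point is that growth, shrinkage, and overlap all have to be handled, which is exactly where both models fell over.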
>>108477999
>>108478011
i just want something to speed up the tedium of sifting through instructions and following threads. ghidra agents when?
>>108478015
prompt?
So how do programs know if a model has Vision or tool call capabilities? The stupid llms themselves won't tell me.
>>108478092
The /models endpoint on llama-server's API has a "capabilities" field which shows it.
>>108478114
>The /models endpoint
How does ollama know though.
>>108477987
1. run a disassembler on your binaries first to get ASM code
2. make sure you can build from it
3. ask LLMs of your choice to analyse the ASM code
4. make a change and see what happens when you build and run it
Qwen3.5 4b forgot it was already running a container on localhost port 1000 and then started another container on 2000 without closing the one on 1000, which I had to close myself.
>>108478318
MCP issue
>>108478323
I'm running through cline on vscode.
https://x.com/teksedge/status/2037395983647260843
>making ASICs for LLMs
Why would you want this, ever? Are the ASIC designers that crypto companies hired out of work? Qwen3.5 27B as good as it is now will be replaced within a year.
>>108478390
>will be replaced within a year
lol
go back tourist
>>108478390
>>108478390
Why wouldn't you? You can always add another model to asic miners
>>108478390
Are these ASICs only compatible with one specific LLM architecture?
>>108478498
I mean you can, but why for inference only? Training is a big part of a chip's ability and removing that for inference only cuts into the viability of this ASIC long term.
>>108478526
Their prior chip was a Llama 3.1 8B one from what it says.
"Miku-Sized" LLM Burners Coming Soon!
This could make local HYPERMIKU GENERATION a REALITY. Nvidia's worst nightmare? Sama having an ansurism?
Miku-Specific Software
/lmg/ new PCIe ASIC board would burn the entire small-size Mistral 4 90000B LLM straight into silicon. (already doing it with Qwen 9B q4).
Miku said small models on ASIC would be available in their sex dungeon by Spring '26
>IMAGINE
>No more life without miku
>MOre tokens per second than that stupid fucking cpumaxxing tetobox
>Standard PC slot, comes with automasturbator support for immersive "PLLUG INAD PLAY"
>100% offline 100% local 100% miku
>250B transitosor count for qwen 27B
>separate dedicated cable to deliver 2.5kW of power directly to miku's pussy
>RUMOWRED COST (4CHAN) of $7000
Imagine HYPERMIKU on your DESKTOP
Miku comes at LIGHT SPEED are you READY????
>>108478542
I could definitely see myself buying one for an >100b model. I'd use it on my DIY robots so that I can have sex without anthropic/openai knowing.
>>108478425
What do you mean?
>>108478567
The price is too high.
>>108478567
If you can get 17k tps on an 8b model you could theoretically get 100 tps on a 2t model. At home. No api. For only about a thousand dollarinos. You're retarded if you can't see any value in that.
>>108478390
Fuck you for linking that emoji spamming nobody.
>>108478567
No powerful enough consumer neural accelerator cards still, and it's already 2026. The only reasonable explanation is that the manufacturers don't want to waste silicon on any model that can hit you with a refusal. When a truly uncensored open weights model is released, things will change.
>>108478643
The field is still moving too fast. Manufacturers don't want to waste silicon when there are no customers, and there are no customers that would spend thousands to run a model that will be obsolete in a few months. A 2025 ASIC with gpt-oss, R1, or Qwen 3 would be ewaste by now already. Now a Nemo ASIC however...
>>108478598
$7000 shipped is better than a $300 rugpull, reserve your miku today and we'll include $10 off the power adapter assembly (sold separately)!!!
>>108478601
>If you can get 17k tps on an 8b model
>>108478643
There were no powerful enough consumer cards until NOW!!!! Previously there was no business opportuniy for providing (((edge))) computing but now that we've built HYPERMIKU you can enjoy the profits of edging coomputing!!! We can only offer you this opportunity because we aren't trying to lure you into our datawarhosing slopbox and we can't afford enterprise bizness development to integrate our product into existin data whorehouse etl systems [sad miku noises]
>uncensored open weights
o-oh.. is that a hard requirement anon? *kicks your montior* fuggg
>>108478729
I don't like your tone.
>>108478736
I guess this means... war....
>>108478769
Ok ok, I'll buy 4, just put the leek down.
>>108476286
He just don't miss!
>>108478914
Is this real? He's too powerful.
>>108478914
knees
>>108478923
yes
>>108478914
me on the table enjoying the refreshing taste of coca-cola
>>108478914
amigus
>>108478914
It should be a human centipede circle with each robot having a diff AI lab logo
>>108478914
https://www.youtube.com/watch?v=sbHvogpfwro
>>108478914
toss' i kneel
idk if this is a silly question, but would it be possible to train a model with the same quantization as turboquant? kinda like QAT?
What the fuck did you just fucking say about me, you little bitch? I'll have you know I graduated top of my class in Kaggle, and I've been involved in numerous LLM deployments for the DoD, and I have over 4 registered ram sticks. I am trained in agentic warfare and I'm the top prompter in the entire Reddit AI thread. You are nothing to me but just another prompt. I will PR you the fuck out with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit to me over the Internet? Think again, fucker. As we speak I am launching my agent swarm of 4B models and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your project. You're fucking dead, kid. My agents can be anywhere, anytime, and I can rm -rf / you in over seven hundred ways, and that's just with my 4Bs. Not only am I extensively trained in prompting small models, but I have access to the entire roster of Qwen models and I will use them to their full extent to wipe your miserable repos off the face of Github, you little shit. If only you could have known what unholy retribution your little "clever" comment was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. I will shit PRs all over your repo and you will drown in it. You're fucking dead, kiddo. :rocket:
>>108478914
It DO be like that
>>108478015
>gemma 4
>avocado soon
>deepseek V4
local is making a huge comeback
>>108478994
https://github.com/ggml-org/llama.cpp/pull/21131
>>108478990
QAT is still done at full precision, it doesn't reduce the resources needed for training. I don't see why you couldn't do it, but it'd be kind of stupid to bother.
so hypothetically, if you had a dell XE9780 with 8x B300s at your disposal, what would you do with it?
Are there rumors that Claude Mythos marks a return to diffusion LLMs? The image at the top of their conveniently leaked page is a relief with masked patches. If it's a return to MLM-trained language models, that should be good for local, right?
>>108479017
The further I read the better it gets. The files changed aren't even indented properly. The Navy seals DoD bit is just the icing on the cake, thank you for sharing.
>>108478390
If you write like this with that many emojis with anything you need to have your fingers broken.
>>108479059
you're right, the kqv gets thrown away but it does still take memory during the forward pass. it might enable training slightly longer sequences? I guess even if resources were equal, wouldn't the resulting model be better suited for using turboquant during inference?
>>108478994
Good post
>>108478970
Can't be a circle. It's not like Google, OpenAI, and Anthropic are training on Chinese outputs.
>>108479286
they all steal each others' shit
>>108479286
Claude says he's deepseek in Chinese
>>108477124
>https://github.com/ggml-org/llama.cpp/discussions/20969
So much slop.
What are your thoughts on rumors that the upcoming GPT and Claude models will be a big jump in capability? Just marketing hype or has centralization won?
>>108479337
They told us GPT3 was too dangerous.
>>108479337
My guess is GPT isn't, but Claude is
>>108479337
I've never tainted my tastes by using a hosted LLM, so I literally do not care what they do.
>>108479348
based local purist
>>108479337
Bro, just stop. Creative writing was never the goal of any model. The increased synthetic stuff they're putting in there for tool calls, math, and code benchmarks is killing their remaining writing abilities.
stop dooming
>>108479348
Why? Claude Mythic will be more parameters but every frontier lab has already been doing this for distillation. OpenAI had GPT 4.5 and GDM also has larger versions for internal use only. Gemini 3.1 pro is their medium size model.
>>108479103
Run GLM 5 locally, or 5.1 when it comes out.
absolute blithering retard here. where can I find a list of all the "GGML_" cmake flags you can use when building llama.cpp?
>>108479337
i sleep until long term memory
>>108479401
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
>>108479401
Can't you just tell your ai to go through the files and figure it out?
>>108479362
i've been like this too, but i've sinned once and tried oppussy.... and to be honest, we are not THAT far off, at least in terms of coom
in terms of programming though, i think most local models, even the big cows, are not there at all
>>108479401
Or if the docs aren't enough:
https://github.com/ggml-org/llama.cpp/blob/master/ggml/CMakeLists.txt
>>108479401
cmake -L
>>108479395
>OpenAI had GPT 4.5
Exactly my point. They hyped the shit out of that thing and in the end it was the nothingburger to end all nothingburgers.
Anthropic may eat their own feces, but they do and release actual research and tend to avoid the usual "hype cycles". So if they said they made an improvement, they probably did.
>>108479418
>>108479427
thanks, I've already taken a look at these though. I was wondering if there are other flags because I see an unfamiliar one pop up from time to time.
>>108479445
this is exactly what I was looking for, bless you.
is there something like opencode for general tasks? or that's just openclaw?
Whoever posted their desktop pet idea a while back, thanks. This is fun.
>>108479017
>>108478994
looked into the PR itself, it's 100% ultrasharted vibecoded garbage (doesn't even respect existing whitespace or indentation) and it's not defined as its own type so it applies to everything (LMAO!) and this retard pulls a
>hurr I pushed shit in the DOD!!!! LOL!!!!
what a fcuking FAGGOT bros.
Has Zucc's avocado saved the industry from imploding yet
>>108479538
what is avocado?
>>108479386
just make the model bigger duh
>>108479543
fat guy who lost a lot of weight
>>108478994
Newfags will find this post confusing.
>>108479558
It's not coming out until may. Gay.
>>108478914
The superior 'toss.
I wonder what they'll do to us if they manage to automate every job in the US and work becomes a thing of the pastIn a reasonable society they'd give people UBI or something. I feel like our politicians / billionaires that hold most of the wealth are just gonna let the job market go to shit and let us ration our savings as long as possible until we either go homeless, kill ourselves, or escape the country
>work becomes a thing of the past
Imagine if Claude had a Llama-tier leak. It's very obviously not gonna happen but just imagine it
>>108479625
What are you gonna do, anon?
Mine for coal? There are robots that'll do it better than you.
Join the military? Sure, go on the battlefields with the killbots that'll obliterate your body in the span of 500 milliseconds.
Make a website? Have fun making your website stand out among the billions of automated slop websites out there.
>>108479651
Show me the robots mining for coal.
>>108479649
I'd be happy if Claude 1 or maybe even Opus 3 ended up getting leaked.
>>108479651
>What are you gonna do, anon?
>Mine for coal?
universal basic income
>>108479649
I came.
>>108479501
What is that. Live2D?
>>108479662
Ah yes, I can see how you got "they've already accomplished this and they're rolling it out as we speak" from "if they manage to".
Buuut it turns out there are actually robots for this thing already. Imagine what it'll look like soon:
https://www.youtube.com/watch?v=mWiWTJKlZEU
>>108479662
it used to be men and boys with pick axes mining for coal, now they have dynamite and dump trucks. the trend is ever towards less human labor.
>i will debunk you with this chink propaganda video!
>>108479672
Bro imagine how great UBI would be. You'd have huge swathes of people who do nothing but party and fuck all day long. You could invest as much time as you want into any hobby you like. None of your friends would be too busy or overworked for a 3am gaming sesh.
>>108479721
yeah, I'm not someone who needs a lot of money to live. if I have food and my computer I'm good, so UBI would be perfect for me. I don't see the point of working to get some extra money I won't spend anyways
>>108479705
>Mine for coal? There are robots that'll do it better than you
>"if they manage to"
>China's Autonomous Mining Trucks
goal_posts.webp
>>108478135
1. Don't use ollama
2. The GGUF metadata indicates what architecture the model uses, and the implementation of that architecture inside llama.cpp either supports vision or doesn't
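For what it's worth, the architecture string lives right in the GGUF header as the `general.architecture` metadata key. Here's a toy sketch (helper names are made up; byte layout follows the GGUF v3 spec: magic, version, tensor count, KV count, then length-prefixed keys with typed values) that builds a minimal header in memory and reads the key back:

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # value-type tag for strings in the GGUF spec

def write_minimal_gguf(arch: str) -> bytes:
    """Hypothetical helper: a GGUF v3 header with one metadata KV
    (general.architecture) and zero tensors."""
    key = b"general.architecture"
    val = arch.encode()
    out = GGUF_MAGIC
    out += struct.pack("<I", 3)          # version
    out += struct.pack("<Q", 0)          # tensor count
    out += struct.pack("<Q", 1)          # metadata KV count
    out += struct.pack("<Q", len(key)) + key
    out += struct.pack("<I", GGUF_TYPE_STRING)
    out += struct.pack("<Q", len(val)) + val
    return out

def read_architecture(buf: bytes) -> str:
    """Parse the first metadata KV and return its string value."""
    assert buf[:4] == GGUF_MAGIC, "not a GGUF file"
    off = 4 + 4 + 8 + 8                  # skip magic, version, counts
    klen = struct.unpack_from("<Q", buf, off)[0]; off += 8
    key = buf[off:off + klen]; off += klen
    vtype = struct.unpack_from("<I", buf, off)[0]; off += 4
    assert key == b"general.architecture" and vtype == GGUF_TYPE_STRING
    vlen = struct.unpack_from("<Q", buf, off)[0]; off += 8
    return buf[off:off + vlen].decode()

print(read_architecture(write_minimal_gguf("llama")))  # → llama
```

A real file has dozens of KVs plus the tensor index after them, but llama.cpp dispatches on exactly this one string to pick the C++ implementation, and that implementation is where vision support does or doesn't exist.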
Niggas really think UBI would be more than a bowl of rice and fistful of crickets per day LMAO
>>108479748
>either supports vision or doesn't
But how does it know that?
>>108479704
Nah, live2d is for scrubs. This uses VRM models with BVH mocap data for complex/specific animations like dancing and generative gesticulation (using EMAGE) that uses TTS audio to animate the character's body when speaking. I also have wav2arkit for audio-based lip syncing. It uses three.js and electron for the transparent window. In total, I have 7 models running at the same time for full sensory input (camera + computer vision) and animations.
>>108479757by giving it the mmproj file
>>108479756It's still better than working so my boss can buy his fifth boat, honestly
>>108479773
Damn that's cool, thanks. You sure put a lot of thought in there. What's the RAM usage running all that?
>>108479773oh and also ASR via moonshinev2 and audio classification via yamnet.
It's fun to watch.https://github.com/ggml-org/llama.cpp/pull/21138
>>108479790
It doesn't use much RAM at all. Probably like 2 gigs tops. A lot of the models are fast enough to run on CPU only, and the LLM only eats up VRAM because I usually opt for dense models (as opposed to MoE).
>>108479757Hardcoded for each architecture, presumably
>>108479797> cool-profiler-thingy> A picture says more than a thousand words, so here's a picture:What is this inexplicable aggression I feel?
>>108479797
>let me be Frank
but he's Johannes
>>108479797
I'm really starting to hate them. They're so full of themselves recently. Did they decide to remove their mask since they got acquired by huggingface or what?
>>108479797I don't think it's a bad idea but it should be in a different repo. This is just pollution.
>>108479756I can see them implementing UBI as free commieblock housing in the middle of nowhere with free prison food supplied by Aramark.
>>108479817>>108478994
>>108479741
>that'll (contraction)
>"That'll" is a common spoken contraction of "that will" or sometimes "that shall," used to indicate future actions, predictions, or certainty.
>faggot (n)
>(You)
>>108479721
I'd love UBI. I just don't see a scenario where the president (regardless of which faggot party is in charge) says, "Alright, we don't think people need to work anymore - here's your free neetbux for life."
More likely they just won't acknowledge it if things get to that point. I'd like to be wrong though.
>>108479756
I'm more cynical than that. What do you think they'll do when they realize 95% of people are useless and won't be able to work because AI will take everything?
>>108479831
Nothing because they need npcs to consoom.
>>108479829
>I just don't see a scenario where the president (regardless of which faggot party is in charge) says, "Alright, we don't think people need to work anymore - here's your free neetbux for life."
I'd vote for such a candidate if he promised that desu
>>108479829
Learn what "there are" means, stupid ESL.
>>108479849
how can they consoom if they're all out of a job though? that's the issue we're getting at
>>108479859
No we don't. Furry commission "artists" and codemonkeys aren't real jobs.
>>108479871
>Furry commission "artists"
Ironically that's one path I could see. The tech billionaires are prudish enough they probably won't let AI be used for coom, even in that scenario.
Looks like we're becoming sex workers.
>>108479501
With all that money you would think Elon would hire a competent studio to render his waifu (not talking about this one tho, it's about the same quality)
>>108479893>Looks like we're becoming sex workersAs the lord intended. Unironically.
>>108479898I started this project out of spite in december because Elon has been totally ignoring it.
>>108479904>Unironically.wait, the bible is not against prostitution? lol
>>108479922Only because it's not monogamous. Marriage is no different than prostitution in every other dimension.
I wish 35B3A wasn't so retarded because damn its fast.
Where's all the Qwen 3.5 rp finetunes?
>>108478567>>108478994I love you niggers.
>>108479396damn. that's what i'm already doing. was hoping for new ideas
Do we know Deepseek Vee Four is actually coming or are we still huffing last year's "Be patient kindly Gemma soon saars" copium fumes in a different flavor?
>>108480089
the only concrete info is that the model on chat.deepseek.com is not v3.2 and has a newer cutoff date, so they are testing something
>>108480089
They have publicly confirmed they are testing a new model on their chat client; you can go and try it right now.
Beyond that nobody knows jack shit other than two more weeks.
>>108480118>>108480125Fair enough. Thanks lads.
>>108476286>https://github.com/ikawrakow/ik_llama.cpp/pull/1547Can you tell how much he doesn't care about Github stars?
>>108480273Why would anyone not care about github stars? Have you ever tried maintaining a project before?
>>108480273Watch him mention picrel next.
>>108480273>23k for llamafilejart mogs
>>108480324
oh no, fairycumming is going on the bad list. oh no, so sad
>>108480273
>giving free attention to the mentally ill on github
>>108480341
What's wrong with going to the zoo?
>>108480273
>ktransformers
Now that's a piece of shit I haven't thought about in a while. They had their moment in the spotlight when it was the best way to run Deepseek off RAM a year ago, but it quickly lost relevance once everyone else copied their special sauce, because it was just so janky. Did they do anything meaningful since then?
is alltalk still pretty much the best text-to-speech service that allows for training characters, or has something better popped up?
>>108480350Making ik seethe about their amount of github gold is already meaningful
>>108480359>alltalkHow is the retirement home gramps?
>>108476383wonder what model is used to select the topics in this re-cap
>>108480373qwen3.5-4b
>>108480370the sora-powered cleaning robot stopped and i shat my pants
>>108480379>the sora-powered cleaning robotdid it generate vids of your place looking clean?
Would /lmg/ accept UBI if it meant having a mandatory vasectomy?
>>108480339They keep blacklisting contributors and they'll have no place else to go but ik_llama.
>>108480387>free vasectomyI don't even need the UBI
>>108480387Obviously,
huge
https://www.reddit.com/r/LocalLLaMA/comments/1s720r8/in_the_recent_kv_rotation_pr_it_was_found_that/
>>108480387no one's touching my balls except me>>108480398>>108480399cucks
>>108480409the ball is in your court
>>108480409>cucksI am indeed, thanks.
>>108480387being into AI is like social castration anyways
>>108480387>Would /lmg/ accept UBI if it meant having a mandatory vasectomy?I don't want kids so yes please! https://youtu.be/BXpu6tbFCsI?t=13
>>108480416logs?
>>108480408
so basically it's "almost" lossless only if we go for 8-bit KV quants. meh, still better than staying on fp16, that's for sure. I'll take that extra 2x context tokens
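The 2x figure is just the KV element size halving. Back-of-envelope sketch, assuming a Nemo-12B-ish shape (40 layers, 8 KV heads, head dim 128 — made-up but typical numbers) and ignoring q8_0 block scales:

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim values per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Assumed 12B-class shape: 40 layers, 8 KV heads, head dim 128
f16 = kv_bytes_per_token(40, 8, 128, 2)   # fp16: 2 bytes/elem
q8  = kv_bytes_per_token(40, 8, 128, 1)   # q8_0: ~1 byte/elem (scales ignored)

budget = 8 * 1024**3                      # 8 GiB set aside for the KV cache
print(budget // f16, budget // q8)        # max context at each precision
```

Same VRAM budget, roughly double the context window, so the only question is how much quality the q8 cache costs you.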
>>108480414>>108480435
>>108480443sharing my partners, not my logs sorry bro
>>108480457so what you are saying is that your gpu isn't your partner?this is worse than I imagined
>>108480457>letting you fuck my wife is all right, but letting you see my logs? that's too far dude!
>>108480350It's part of sglang now.
>>108480470yup
>>108480387
realistic UBI means AGI/ASI is real.
if ASI is real, then biology will have been solved and ANY kind of vasectomy could easily be reversed.
>>108480479
>realistic UBI means AGI/ASI is real.
not really, bots don't need to have Einstein's intelligence to replace 95% of jobs
>>108480477You really are a decrepit faggot, aren't you? I'm ashamed to share a thread with you. Just preposterous.
>>108480489>preposteroushttps://www.youtube.com/watch?v=X8rxPrV-tn4
>if you're not in my cult of monogamy you're literally worse than hortler
>>108478015Just saw a news posting that that thing was blown up today on the tarmac somewhere in ME....
>>108480457
>>108480477
Jesus, I didn't realize the astroturfing had gotten so bad that we likely have NoLLMs posting here now.
>>108480520fuck you on about? i do use models dude
>>108480529X
>>108480514https://www.twz.com/air/images-purportedly-show-e-3-sentry-totally-destroyed-from-iranian-strike>>108478015
>>108480545
>>108480565Post the backend hook.
>>108480574
>>108480584>doesn't even use llcppdisgusting. preposterous.
>>108480597antislop life is my calling sorry, but yeah you were wrong i do use models
>>108480602
>you were wrong i do use models
nta btw. we're just ganging up on ugod... this thread is shit. there's literally nothing going on. nothing to discuss.
>>108480609eternal two more weeks
>>108480609Gemma 4, GLM 5.1, Minimax 2.5?Your mandatory government vasectomy?
>>108480408
>>108480445
I'm going to preemptively start using q8 kv caches so that I feel like it's a quality improvement when the kv rotation thing merges. Until then I'll just "enjoy" the 2x context.
>>108480618
gemma 4 isn't out. meta's avocado isn't out. anthropic's goodmythicalmorning isn't out. deepsex 4 isn't out. GLM and Minimax are for VRAM chads only. It's so over.
>>108480609
>nothing to discuss.
It doesn't have to be that way.
What models do you guys use the most? Do you stick with the newest thing to come out, like distro hopping behavior, or do you tend to settle on what you like and stick with it until it doesn't work anymore?
>>108480632
>I'm going to preemptively start using q8 kv caches so that I feel like it's a quality improvement when the kv rotation thing merges
genius - unironic, gotta get some excitement where we can
>>108480636>What models do you guys use the most.inviting finetooning drama again are we
>>108480644I don't really care if people finetroon or not at this point teebeedesu. As long as they don't get uppity about it one way or another.
>>108480408did he implement the other thing or is he still only doing the rotation stuff? because if he's only implementing the rotation shit it's disingenuous to showcase mememarks, he needs to implement the full TurboQuant method before saying if it's worth it or not
>>108480636
I stuck with Nemo for a long time, but Qwen3.5's benchmarks and vision capabilities convinced me to make the switch. I don't really distrohop LLMs much. Been following these threads for 6 months and those are the only LLMs I've really used extensively. Back when I was newer to local models I played around with Olmo but it wasn't great.
>>108480639
hell yea
>>108480656
not the point of the post. the point is showing how awful regular q8 context quanting was, like quite a few anons said, and that just doing rotation makes it a lot better than it was
>>108480662
let a nigga extrapolate on a point ffs. pure autism
>>108480662
>>108480664
>I know better than Google, I don't need that other thing they provided
yeah right...
>>108480662
>>108480408
>>108480664
'tism general am afraid
>>108480661
I'm lukewarm on Qwen 3.5 but I respect that 27b is very capable for its size as a vision model.
>>108480664
>>108480674
We all collect chromosomes down here.
>>108480656
>>108480662
>In anticipation of the incoming flood of vibe generated PRs implementing TurboQuant, I'm raising the baseline a bit using a very simple interpretation of the idea of using Hadamard transform to reduce outliers in the attention and improve the quantization quality
The purpose was to have some numbers for the hype-chasers to beat. Few provide benchmarks of any kind and simply say that it lowers memory requirements, which is obvious, but not of correctness or that it doesn't break after 64 tokens. Most don't even have a llama-bench run.
Unlike the other >1.2kloc (or >2.3kloc) changes, this shows measurable improvements with less than 300loc.
>inb4 muh benchmarks
Yeah, I know... I know... It's still more than most sloppers can show.
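For anyone who didn't click through: the trick is that a Hadamard transform spreads one outlier's energy across the whole block, so a uniform quantizer wastes less of its range on it, and the transform is exactly invertible so nothing is lost. A minimal pure-Python sketch of the fast Walsh-Hadamard transform (this is the general idea only, not the PR's actual CUDA code):

```python
import math

def fwht(v):
    """Fast Walsh-Hadamard transform, O(n log n); len(v) must be a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

# One huge outlier dominates the quantization range...
x = [1000.0, 0, 0, 0, 0, 0, 0, 0]
y = [t / math.sqrt(len(x)) for t in fwht(x)]  # orthonormal scaling
print(max(abs(t) for t in y))  # energy is spread out: ~353.6 instead of 1000

# The transform is its own inverse up to a factor of n, so it's lossless:
assert [t / len(x) for t in fwht(fwht(x))] == x
```

Quantize after the transform, dequantize, apply the transform again and divide by n, and the outlier comes back with far less rounding damage than if you had quantized the raw block.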
>>108480674
>4x lighter but way worse mememarks
we're far from what they promised, "6x lighter + lossless results"
>>108480715
>I'm raising the baseline a bit using a very simple interpretation of the idea of using Hadamard transform to reduce outliers in the attention and improve the quantization quality
can this be used to improve quants as well (not just KV cache)?
potentially relevant...
I find this post encouraging actually, because it's more convincing when an optimization is simply an old thing not yet implemented as opposed to some novel research-grade cancer. niggeramov has the right intuition here, but he's just not going far enough.
>>108480737Who knows. Many anons asked already. Someone in the turboquant discussion has a repo for that. There's still degradation at q4_0 and there's no comparison to q4km. On q8 there's a small difference but the models being much bigger than the context means that it has more time to "average" out the errors. Context is more sensitive, weights are more tolerant. May not be worth it.
>>108480373Devstral 2 123B Q6_K
>>108480744
so basically you just have to look at all the image/audio compression methods, see if they apply to LLMs, and there you go, you can spit out new groundbreaking papers kek
>>108480744In English, doc?
>>108480787bro
>>108480790
explain FWHT decorrelation and Quake's normal vector table, 128-element blocks with 4-bit indices, then
>>108480794ask a LLM nigga
>>108480766
Not really. The core difference is that most audio and video compression methods add a huge amount of one-time computation overhead. This doesn't work well with LLMs: compression and decompression have to be faster than the overhead of no compression. There's a reason why they're using video game optimization methods and not general media optimization methods.
>>108480744
>Quake's norm vector table
It's kind of insane how much heavy lifting that game's development did for computing research in general.
The fast inverse sqrt stuff they did is also super fascinating.
>>108480828
Carmack's greatest achievement is the binary space partitioning tree algorithm. That shit revolutionized the 3D ecosystem and has been in use for over 20 years; he's truly a gigachad.
>>108479773What are you using to stream the TTS audio into your electron app? WebRTC kinda works but I still get some jitter
>>108480886FFI and websockets.
>>108480849He could've saved LLaMA if only he had beaten the shit out of Zucc for being a dumb retard with how he handled the VR/Metaverse shit. MetaAI is shaping up to be an exact copy of all of that.
>>108480900use case for saving llama?
>>108480908we get better local models?
I just realized that for long context extraction, you are better off asking the model for a list of items x categories. Then you can use that to ask the model to extract exact information by naming each item, effectively turning the actual extraction process into more of a needle-in-the-haystack problem, which is easier.
It also makes it more batch/parallel friendly.
Yeah, that should work.
Time to refactor some stuff.
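The two-pass shape above looks something like this, with `ask()` as a stub standing in for the real model call (all names and the toy "model" are hypothetical, just to show the enumerate-then-query structure):

```python
from concurrent.futures import ThreadPoolExecutor

def ask(document: str, question: str) -> str:
    """Stub standing in for an actual LLM call; swap in your backend here."""
    # toy "model": return the line mentioning the item named in the question
    for line in document.splitlines():
        if question.split()[-1].rstrip("?") in line:
            return line.strip()
    return ""

doc = "inventory:\nsword: 2\nshield: 1\npotion: 7"

# Pass 1: one call to enumerate the items (stubbed as a fixed list here)
items = ["sword", "shield", "potion"]

# Pass 2: one narrow, needle-in-the-haystack query per item, batchable
with ThreadPoolExecutor() as pool:
    facts = dict(zip(items, pool.map(
        lambda it: ask(doc, f"What is the count for {it}"), items)))

print(facts)  # {'sword': 'sword: 2', 'shield': 'shield: 1', 'potion': 'potion: 7'}
```

The per-item queries don't depend on each other, which is exactly why this maps cleanly onto batched/parallel decoding slots.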
>>108480898ty, I'll look into it
>>108480908We wouldn't have to rely on the chinks for everything.
>>108480908Wang won't have to visit strip clubs to whore himself out for money in about a year
>>108476286
I've got a bot project with an integrated llama build pipeline. Is there any timeframe at the moment for when TurboQuant is supposed to be implemented on the mainline branch for CUDA? I'm hoping it would speed up inference.
>>108480744
Man, so history really does repeat itself, huh...
>>108480973>extra matrix rotations>speed up inference
>>108480975You rotate the matrix so that it becomes more aerodynamic, duh.
>>108480975Not in general, just for large context usecases. Should have specified better.
>>108480980Sorry we're autistic here. No timeframe
What's the practical limit for parallel/batched decoding?Basically, I'd like to know a good heuristic to decide on how many parallel workers I can dispatch, but I have no idea where the bottleneck is.Memory? Bandwidth? Compute?
If v4 doesn't come out next week I'll be forced to preemptively buy eight pcie5 nvme ssds before the prices surge even further, just in case ngram benefits from a fast nvme raid.
>>108480341ahhh im pulling and compiling
>>108480849Achievable natty?
>>108481063And if it doesn't or a model that implements ngram never comes out?
>>108481082then I will have 8 nvme ssds which will only go up in terms of resell value
>>108481063>$140/TBAt the rate it's been going up you're probably better off just buying today.Even spinning rust is up to $20/TB, fucking blood on the streets man.
I bought a MacBook M1 with 64gb of unified ram. Genius move or retarded?
>>108481110>look it up>it's a laptopWell, it was something, I guess.
>>108481110>M1eh
>>108476286
>>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
How can one set this up? Grok decided it would be super easy and every single step it gave me had a complication, and now this docker thing flat out does not see ubuntu and I am out of tokens.
I'm also an AMD-cel, and I never even got to the point where that would be an issue because of the above.
>>108481110
Run qwen 27B at q8 and tell me the pp and t/g.
>>108481045
Prefill, which is compute-bound. So in practice it's proportional to your amount of VRAM divided by the context length you want. Realistically the number of workers you can get by with is very small on consumer HW and with useful context lengths.
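So the rule of thumb is roughly: free VRAM divided by the per-slot KV footprint at your target context. Sketch assuming a 12B-class shape (40 layers, 8 KV heads, head dim 128) and an fp16 cache — all numbers hypothetical:

```python
def max_parallel_slots(free_vram_bytes, ctx_len, n_layers, n_kv_heads, head_dim,
                       bytes_per_elem=2):
    # per-slot KV footprint: K and V, every layer, full target context
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return free_vram_bytes // (ctx_len * per_token)

# Assumed: 16 GiB left over after weights, 32k context per worker
print(max_parallel_slots(16 * 1024**3, 32768, 40, 8, 128))  # → 3
```

Halve `bytes_per_elem` for a q8 cache and the slot count roughly doubles, which is the other reason people bother with KV quantization.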
>>108480273
He doesn't care about github numbers, and he definitely wouldn't break everyone's git clones by changing master history just to pump those numbers up.
>>108481208why should i care?
>>108481173I want it working with Sillytavern but I'll take what I can get at this point. FWIW I THINK I set things up on the Sillytavern side correctly.
>>108481173maybe if you weren't following Grok hallucinations instead of reading the directions directly you wouldn't have managed to fuck up installing docker of all things
Fellow anons, share your ideas.
https://x.com/osanseviero/status/2038321995129991436
>>108481253>the Gemma.
>>108481259you just lost the Gemma
>>108481242No, Docker works just fine.
>https://github.com/spiritbuun/llama-cpp-turboquant-cuda
You think he put a virus inside?
>>108481394Fork of a fork. Sweet.
>>108481394>>108481421
>>108481431
damn, turboquant is better than q8 at 1/5th of the model size?
crazy
Huh
>>108481443huge if true
>>108481443We are so back degenerates.
Damn I'm dense. I just realized thanks to a fucking reddit post that significant otter is a pun.
>>108481423And the fork will be immediately abandoned when New Thing comes along.
>>108481443Good
>>108481253this week is going be... um well, you know
>>108481478Must feel horrible.
>>108481478and what made you think that blogging about your reddit experience in this local model general would be an acceptable idea?
>>108481478Does that mean it's a horny RP model?
>>108481478pteronura is also an otter.
Can't we come up with a new test? Qwen already proved this isn't a good one, since it's been enough time for it to contaminate QA training sets.
>>108481615It's just to see if it freaks out like Gemma 3.
/lmg/ on suicide watch
>>108481642kek
>>108481615Mistral Small 4 failed it spectacularly, for example. It's also a good prompt to see if the model will start lecturing you over the smallest things. Like the msgk themselves :sob: :anger_vein:
>>108481489you need to fork to make a pull request...
>>108481683That one will be abandoned too.
>>108481443
I am still disappointed; that is still just an order. This bill would have made it outright illegal to deny services to legal businesses, but it couldn't even get out of the house. I wrote my state representative and didn't even get an AI response, just radio silence.
https://www.congress.gov/bill/119th-congress/house-bill/987
>>108481680>flashback to nemotron telling you to question why you saw that type of content
>faggarganov playing around with muh rotations instead of implementing memquant 3bits for 5x savings at same qualityfuck U GGINENRGEAXVOX
>>108481431>model size
so this is how anthropic does their little manipulation
it injects a false refusal in the thinking summary and then proceeds with the request anyway, causing the chain to be nonsensical
>>108481777Local?
>>108481782where?
>>108481782distillation slop ends up in your local model nigger
>>108481788I don't know what you're talking about. my local model niggas are a tree
>>108481642lol why. teva makes a ton of shit and those drugs are more likely to be used in breast cancer or menopause.
we've made it, local llms are about to change forever in the next few days
>>108481777
I've seen this happen as well. But I think it's because Anthropic is obfuscating their reasoning with Haiku or another small model which is fed a simple prompt of "rewrite this: [about 300 tokens of the currently ongoing reasoning process]", so the tiny model ends up refusing when it's being fed the part where Opus is thinking about how to best portray the dog rape part of the next reply.
>>108481840I thought about this as well. Seems like the simplest and most plausible reason. Funny how it messes up the process though, intentionally or not.
>>108481075lol no.
>>108481865>>108481865>>108481865
>>108481819
give yourself some buffer
let's say... 2 weeks
>>108478567
If/when they drop DS onto a card with API speed I'll be in line to buy one. I'm not holding my breath tho. The sw is changing too fast for the hw commitment. Another 2 years, I think.
>>108481443Get fucked, kikes.