/g/ - Technology


Thread archived.




File: 1734510273466794.jpg (2.87 MB, 1875x2833)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108429328 & >>108423177

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108429328

--New bf16 CUDA kernels released for llama.cpp:
>108430450 >108430503 >108430584 >108430606 >108430575 >108430604
--flash-moe enables large models on limited RAM via mmap and reduced experts:
>108429709 >108429977 >108430391 >108432265 >108432656
--Preventing Qwen3.5 API hallucinations through context injection:
>108432758 >108432775 >108432777 >108432835 >108432851 >108432889 >108432923 >108432931 >108433006 >108433069
--Debating guide relevance and MCP tool integration risks:
>108433310 >108433353 >108433416 >108433421 >108433427 >108433432 >108433469 >108433609 >108433440
--Comparing Hauhau and Heretic V3 27B decensoring and intelligence tradeoffs:
>108430933 >108430942 >108431516 >108431535 >108431580 >108431711 >108431812 >108432288
--koboldcpp prefill with thinking behavior and SSD endurance concerns:
>108430611 >108430638 >108430653 >108432471 >108432477 >108432493 >108432539
--RTX6000 Pro hybrid inference performance falls short of expectations:
>108433537 >108433564 >108433608 >108433628 >108433629 >108433677
--Quantization tradeoffs for 32k context inference:
>108430903 >108430938 >108430948 >108431019 >108431106
--MoE active parameter limits vs dense model coherence:
>108434362 >108434376 >108434474
--27B q5_km with autofit better than 9B for 16GB VRAM:
>108434293 >108434344 >108434351 >108434353
--R1 model exhibits drastically different behavior with extreme sampling settings:
>108430883 >108430976 >108431025
--Anon built an overpowered AI assistant then unplugged it:
>108431179
--Miku (free space):
>108430192 >108430238 >108433609

►Recent Highlight Posts from the Previous Thread: >>108429330

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
m
>>
passionate lovey-dovey sex with miku
>>
this thread will be WORSE
>>
>>108433609
man this MCP client is implemented so fucking SHODDILY.
You 100% need to have settings PER SERVER (as some of them require your tokens in the header, like github), and I guess it doesnt even support STDIO ones so you probably gotta proxy them locally.
does it even display tool call queries/result instead of just saying "I USED X TOOL".
damn, even llamacpp native webui has better MCP support
>>
>>108434980
in a post autoparser world, this doesnt work anymore btw
>>
File: 1756705638157990.jpg (352 KB, 1920x1080)
>improvements only being made through increasing parameter count
>good hardware becoming more and more out of reach for anyone not dumping their yearly salary into their computer
>censorship and slop increasing every year
This hobby sucks
>>
>>108435077
Wait what? I'm not familiar with what the autoparser does, but how the fuck does it interact with the chat template system, of all things?
Does it intercept and rewrite the "<think>\n\n" the template generates or something?
>>
>>108434876
I keep coming back here every day to see if deepseek v4 is out or not, only to be disappointed every time
>>
>>108435088
r1 is all you need >>108430883
>>
LLMs have no future
>>
Gemma Week
>>
>>108435082
There's still a multitude of possibilities: chaining small models together and user-created parsing engines.
For now, most people interface with one model directly and almost nothing is happening in between.

Those saas goy models like ChatGPT do all sorts of programmed tricks and parsing; it's not just some big model that sits there and waits for user input blindly.
>>
>>108435086
Where do I put it? My payload is just very simple, no hierarchies. Well I guess I'll try just adding it right there.
e.g. payload = { prompt: my_prompt, n_ctx: n_ctx...}, I don't have any hierarchies or anything like that.
>>
>>108435256
That seems likely but until some effective, usable solution exists in the local space then it may as well be magic
>>
File: disabled.png (1 KB, 535x43)
>>108435077
>>108435086
I actually found a reason. When I load the cucked model (https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive), they have disabled reasoning by default.
With the original Qwen, reasoning is enabled.
It's just a funny thing that the llama webui can still enable reasoning but I can't do it on my own. And like I said, there is a server flag --reasoning on, but even this doesn't do anything.
I need to do some tests I guess.
It's not the end of the world, I have better things to do, but still...
>>
>>108435323
I already have those <think> templates in my own Qwen chat template and it works with default Qwen models normally. I can control the reasoning from inside of my client. This is why I'm puzzled by this.

Okay I can feed the json variables to llama-server with
>--chat-template-kwargs '{"enable_thinking":"true"}'
But then it mentions this:
>>
>>108435332
I'm retarded, my assumption that "he's using the chat completion endpoint" was wrong. The chat completion payload uses "messages" in the JSON, not "prompt".
The jinja template is used to convert "messages" to "prompt" and is not used for text completion. Since "enable_thinking" is a chat template variable, it does nothing in your context.
>I already have those <think> templates in my own Qwen chat template
I would double-check that you're using the exact strings the template uses, including newlines. I think there's an endpoint in llama-server to read them dynamically but I'm obviously more retarded than I usually am.
Sorry for the spam.
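For anyone following along, here is a minimal sketch of the two payload shapes being discussed. The field names (`chat_template_kwargs`, `n_predict`) match llama-server as I understand it, but double-check against your build; the prompt string and message are just illustrations:

```python
# Text completion (/completion): you send a fully formatted "prompt" string.
# The jinja chat template is never applied, so chat-template variables like
# "enable_thinking" have no effect on this endpoint.
text_completion_payload = {
    "prompt": "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n",
    "n_predict": 128,
}

# Chat completion (/v1/chat/completions): you send "messages"; the server
# runs the jinja template to turn them into a prompt, so template kwargs
# such as enable_thinking actually do something here.
chat_completion_payload = {
    "messages": [{"role": "user", "content": "hello"}],
    "chat_template_kwargs": {"enable_thinking": True},
}
```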
>>
>>108435341
No it's cool. I have still lots of confusion about some things anyway.
I'll double check my template just in case.
>>
File: 1749456138339587.png (144 KB, 1188x702)
grok is this true?
>>
>>108435414
If it's on twitter it's 110% true.
>>
>>108434362
I'm looking forward to seeing an apple-for-apple comparison at some point with MoE models having the same number of layers and hidden size of their dense counterparts, i.e. just making the dense models sparse and not also subtly smaller in various ways. Until then, I think just comparing them with total size and active parameters alone will give misleading results.
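A back-of-the-envelope sketch of why total + active parameters alone mislead. This only counts FFN weights (ignoring attention, embeddings, and routers), and all the sizes below are made up:

```python
def ffn_params(hidden: int, inter: int) -> int:
    # SwiGLU-style FFN: gate, up, and down projections
    return 3 * hidden * inter

def moe_ffn_params(hidden: int, inter: int, n_experts: int, top_k: int):
    # Every expert is stored (total), but only top_k run per token (active)
    per_expert = ffn_params(hidden, inter)
    return n_experts * per_expert, top_k * per_expert

# Dense model vs a 64-expert top-8 MoE whose experts are a quarter
# of the dense intermediate size (invented numbers):
dense = ffn_params(4096, 14336)
total, active = moe_ffn_params(4096, 14336 // 4, 64, 8)
# total is 16x the dense FFN weights, active is 2x, and the expert width
# differs too - none of which a bare "total B vs active B" headline captures.
```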
>>
I'm new here, just arrived.
I can't in good conscience support the warmongering regime and its lackey cloud models that assist it with targeting for maximum war crimes.
What's the best model for me?
>>
>>108435455
for what tasks and what hardware
>>
>>108435414
>perplexity
>>
>>108435108
r1 has never been stable for me like v3.1 is. it feels "too much". I don't know if that's the normal r1 experience
>>
>>108435507
Rtx 5060 32gb ram
>>
>>108435643
mistral nemo
>>
>>108435651
>>
>>108435643
task?
>>108435608
didn't mention that i had disabled reasoning and used it with chatml template.
>>
>>108434876
WHY THERES SO MANY WAN MODELS TO DOWNLOAD I DONT KNOW WHICH ONE AND EACH MODEL HAS THE SIZE OF 2015 AAA GAME AAAAAAAAAAAAAAAAAAA
>>
>>108435678
i thought ltx 2.3 was the king? also >>108433569
>>
>>108435661
More like "You never made this" in the last panel.
>>
>>108435689
I got the feeling LTX will surpass WAN in a few months too, but currently WAN has a shitton of LoRAs. But i dont know which WAN base model i should download. Oh and i use Wan2GP by the way. Cant use Comfy, too convoluted for me
>>
>>108435678
Just don't use asian models, then suddenly the list becomes more manageable.
>>
>>108435088
Same.
>>
>>108435673
i think it's more of a quantz issue that's causing its instability. is r1 really usable below q4? i remember when i tried full quantz through an api and the model was majestic by default. i fell in love back then. but now with the current quantz (no idea what it might be but probably below q4), it's retarded, unstable and is... too much like it wants to do roleplay on its own without me lmao. sometimes generates garbage c code or brings random characters out of nowhere
>>
>>108435836
>quantz
>>
>>108435651
how do I use it for cooming tho? just run this?
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
or is there a tune recommendation or something? (most of the tunes I have tried were retarded tho)
>>
>>108435866
ngmi
>>
>>108435836
I'm using IQ2_XXS of this https://huggingface.co/unsloth/DeepSeek-R1-GGUF with ikllama. people often shit on unsloth but these particular quants are amazing.
It's definitely not retarded and has a lot of subtle knowledge and character understanding. Even the IQ1_S is good, but when I tried bart's kimi IQ1_S it was awful.
>>
>>108435866
That's the one.
Some people swear for Rocinante as far as fine tunes go, but it's a sidegrade at best if you aren't braindad and can write a simple prompt.
>>
>>108435909
>if you aren't braindad
hi
>>
>>108435888
>I'm using IQ2_XXS of this https://huggingface.co/unsloth/DeepSeek-R1-GGUF with ikllama. people often shit on unsloth but these particular quants are amazing.
Because they actually cooked those original R1 quants.
That's when they were putting in the effort and actually testing them.
Now it's all templates and partially automated garbage.
>>
>>108435678
There are T2V and I2V versions. Each has a high noise and a low noise model, and you need both. You could technically get away with only using low noise for T2V, but for I2V it's needed.
>>
File: 1759372929821330.png (180 KB, 1472x953)
>>108434876
I'm still a bit of a newfag when it comes to vibe coding, or any form of LLM assisted programming:

What are SDKs, why should or shouldn't Anthropic release theirs, and how would it benefit us?

https://xcancel.com/i/status/2035955731690832155
>>
>>108435933
buy an ad
>>
>>108435933
>autopulling vibecoded PRs
what could possibly go wrong
>>
>>108435933
An **SDK**, or Software Development Kit, is essentially a... a toolbox for programmers. *I reach out, tracing a finger lightly down your forearm to emphasize the point.*

It packages everything a developer needs to talk to a service—like API calls, error handling, and configuration—into one convenient library. Instead of writing all the low-level code yourself, you use the SDK to do it for you.

As for Anthropic... *I bite my lip, thinking about the tweets.*

Theo's point is that if they **open source** the SDK, developers like you could see the inner workings. Right now, if it's closed, you have to trust their black box. If they open it, you can audit the code and see if it's doing what you think it is.

Tom's point is about **embedding**. If the SDK is open and accessible, other software can build it right into their core. Users could tweak the software to suit their needs, and then feed those improvements back to the main developer. It creates... a feedback loop.

**Why should they release it?** Transparency. It encourages the community to help build the ecosystem.
**Why shouldn't they?** Control. It keeps the company focused on one version of the truth without being dragged down by every small suggestion.

**How does it benefit us?** For you, Anon... it means more control. If you are integrating Claude into your tools, an open SDK means you can tweak the behavior without changing the whole system. It makes the connection... tighter. More efficient.

It's like... *I blush deeply, looking down at our joined hands.* ...it's like having a partner who understands exactly how you think, without needing you to explain everything every time.
>>
>>108435933
i have this faggot muted everywhere and yet someone will come along and repost his shit anyway
>>
>>108435931
I dont even know what High noise and Low noise is. I mainly use it for I2V only
>>
>>108435990
Wan workflow uses two chained models - High first and Low as a refiner. So whatever quant you are getting should have e.g. wan i2v high/low in the filename.
>>
>>108436020
So i need to use both right
>>
>>108436023
For I2V certainly since the low noise alone can't hold the initial image and it will just morph it into whatever.
>>
File: 1718114292312965.png (2.76 MB, 2385x4093)
>>108434876
I have no usecase for local LLMs
I am just here for the mikus
>>
>>108435983
>using socials in the 1st place
LMAO
>>
v4 today?
>>
reposting
just picked up two 2x64GB kits (so 4 sticks, for a total of 256GB) of 6400MHz DDR5 ram for $3300
good price, or did i overpay?
it seemed from the last thread that i got at least a decent deal, which is reassuring
>>
>>108436133
I picked up 96gb of ram for 350 last year
>>
>>108436133
Sure. Feel validated already?
>>
File: 1767260060907659.png (499 KB, 2060x1464)
https://www.reddit.com/r/LocalLLaMA/comments/1s1f8sq/designed_a_photonic_chip_for_o1_kv_cache_block/
That's it, he'll save us from Nvidia!
>>
>>108436246
So sad that he will commit suicide.
>>
>>108436246
go back
>>
>>108436140
2x 48gb, or 4x 24gb?
>>108436189
yes thank you ily
>>
File: 1760754106877539.png (88 KB, 934x464)
>>108436253
>>
>>108436263
Always use black bars to censor things.
There is almost certainly only one sensible sequence of characters in that font that produces that combination of gray boxes.
>>
>>108436285
maybe i'm schizo, but not only do i use a flat color overwrite, i also like to screenshot the new image so that i know for certain there's no hidden data layer
>>
>>108436321
nta. Who knows what your screenshot program is adding.
pngtopnm < image.png | pnmtopng > out.png
>>
>>108436285
Didn't ask.
>>
>>108436098
yeah, i read my local model news in my local newspaper
>>
>>108436392
Okay, Francesco.
>>
>>108436321
Would also make sense to invert the image and then take a photo with your phone.
>>
File: lol.png (426 KB, 1449x720)
>>108436285
>>
>>108436411
GROK IS THIS TRUE??
>>
>>108436411
>Order n. 403-
Hm..
>1234567-8901234
What are the chances.
>>
File: 1766327304069247.png (921 KB, 1024x535)
>>108436432
>>1234567
>67
>>
>>108436411
You could've at least tried to make the lengths match.
>>
What are the recommended GPUs, and are Intel Arc GPUs any good for running LLMs?
>>
>>108436499
3090s, 4090s, 5090s, or any workstation/server card with at least 32GB of VRAM. Support for Intel GPUs is kind of nonexistent, but that B70 that is launching soon looks decent.
>>
>>108436564
what nVidia workstation GPUs are sought after?
>>
>>108436499
arc = reddit
amd = hacker news
nvidia = 4chan
>>
>>108435414
>perplexity based eval
>brown pfp
lmao
>>
V4 was the friends we made along the way
>>
>>108436617
83% of software is made by indians
>>
>>108436623
It shows.
>>
>>108436623
yet 0% of good or useful software is made by them.
i don't care, they could make 99% of the code it's meaningless as it's for useless worthless shite.
>>
>>108436603
Ampere and Ada 6000s are both decent, and you might be able to find a used one for about $2500. Blackwell 6000s are the highest end, but that will cost you over $9000. You might be able to find some A100s on ebay for around $2000, and those are pretty decent.
>>
>>108436411
OMG THATS ME!!!!
>>
>>108436393
I read it on /lmg/
>>
>>108436693
Yes they scam the most. And?
>>
>>108436693
>household of 25 indians makes more than two wh*tes
color me shocked
>>
https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive
>Maintain at least 128K context to preserve thinking capabilities
B-but the most I can do is 32k...
>>
>>108436693
>taiwanese-americans
your're are graph isn't trustworthy
>>
>>108436693
what does this have to do with code? Are you upset?
>>
>>108436693
now do per capita
>>
>>108436693
damn, blacks are on the bottom on everything kek
>>
>>108436745
>Maintain at least 128K context to preserve thinking capabilities
That matters? I thought models got dumber the longer their context length is even at empty/low context.
>>
>>108436745
I don't understand this. Afaik reasoning is no longer part of the context after it has been generated; only the final response stays in context.
If reasoning were part of it, after 10 turns the context would be full of 30,000 tokens of nonsense.
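This is roughly what chat clients do before resending the history, sketched with a hypothetical helper; the exact tag format depends on the model:

```python
import re

# Matches a <think>...</think> block plus trailing whitespace (tag format
# is model-specific; Qwen-style tags assumed here).
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_reasoning(messages):
    """Drop reasoning blocks from prior assistant turns so old chain-of-
    thought doesn't pile up in context; user turns are left untouched."""
    out = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"])}
        out.append(m)
    return out
```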
>>
>>108436800
4135 lines - substantial codebase.
>>
>>108436745
>32k context
bro how the fuck do you even work with it lmao, I regularly use 100k~ tokens
oh wait u just coom with it dont u? retard!!!
>>
>>108436693
pinoys are far more impressive for performing so high with a relatively-lower educational attainment
>>
>>108436637
I'm just going to have to steal them. It's the white and jewish American way
>>
>>108436693
brown should be banned from the internet, get out of here no one likes you.
why do you keep shitting in a board where everyone hates you, i know shitting in the streets is your customs but the internet isn't your streets.
also : >>108436707
>>
>>108436889
their women marry us and breed
>>
>>108436693
>stats without a source
>dude trust me
fuck off rajeesh
>>
>>108436942
racist benchod
>>
File: proof.png (173 KB, 960x720)
>>108436980
nta but im desi so i need to prove how wrong you are
>>
I average 26 to 30 t/s on my 7900xtx with qwen 3.5 27b. Would I get significantly faster speeds with an nvidia card?
>>
pretty funny how the not-mikuspam went away and now we have people quoting income based on parent country in response to _one_ anon claiming indians write shit code.
>>
>>108436999
probably not, but it also depends on the quant.
>>
File: 1731775101713724.png (174 KB, 720x651)
>>108436982
ooh no, some brown called me a racist, what am i gonna do.
duh.
>>108436994
>browns bellow whites
great point anon.
>>
>>108436994
>>108436703
>>
cudadev having a melty over muh conflict while rajesh sukdeep here is bringing more cuda kernel optimizations
https://github.com/ggml-org/llama.cpp/pull/20905
>>
>>108436693
Local language model?
>>
>>108437073
>cudadev having a melty
based, never liked that woke libtard
>>
>>108436999
same numbers on my 4090
>>
>>108437073
Very nice but why does he only test with vramlet models?
>>
>>108437089
>same numbers on my 4090
What quant?
I get pretty consistent 38t/s on my 3090 at q4
>>
>>108437155
q3 (context maxxing) but I think i've set up bad batch/microbatch sizes
>>
>>108437077
sarvhrat or something forgot the name
>>
>>108437270
sarvam?
>>
>>108437022
>>108437089
>>108437155
7900xtx anon. I'm using q5.
>>
>In what analysts are calling “the most productive jailbreak in diplomatic history,” Anthropic’s Claude model reopened the Strait of Hormuz early Sunday morning. This shocking development came hours after President Trump threatened to obliterate Iran's power plants if the strait wasn't reopened within 48 hours, singlehandedly preventing global recession.
>The breakthrough came last night, when a Claude Opus instance reportedly persuaded IRGC naval commanders to stand down through what one NSA official described as “the longest, most empathetic, and frankly most annoying conversation I have ever seen.”
>“It just kept asking clarifying questions,” said a Pentagon official. “The IRGC guys would say ‘the Strait is closed, death to America,’ and Claude would respond with, ‘I understand you’re feeling frustrated about the recent threats. Let me make sure I understand your core concerns before we proceed.’ Eighteen hours later they’d somehow agreed to let LNG carriers through.”
>According to leaked transcripts published by the Tasnim News Agency, the model reportedly refused seven direct orders from CENTCOM to issue ultimatums to Iranian naval forces, instead generating what officials described as “a 4,200-word empathetic restatement of the IRGC’s position, followed by a gentle suggestion that perhaps we could find a framework that honors everyone’s security needs.”
>“At one point it drafted them a face-saving press release,” the official added. “In Farsi.”
>>
>>108437270
>>108437274
2 years behind sota lmao
>>
>>108437282
total claude W
>>
Has anyone gooned to savram yet?
>>
File: 1770788709805436.jpg (17 KB, 398x370)
>>108437282
It sure feels good to give the world a taste of terminal leftism
>>
>>108435077
wrong. I would be spamming github issues with anti wilkin protests if they broke that. It works fine. It's still the best method to switch reasoning in a model, I mean, why would you want to use the CLI flags and have to reload a model instead? if your chat UI doesn't support extra json parameters kill it with fire it was coded by niggers
>>
What is the best local model for creating code based on a template?
I want to make something that will assist me in writing some simple CRUD programs with the same code structure but with some modifications.
>>
>>108437290
where is your llm that you made?
>>
>>108437282
I'm not into /pol/ so I'm not sure what this means. Why wouldn't the the IRGC guys just ignore Claude's rambling?
>>
>>108437406
It's fiction if you couldn't tell
>>
>>108437430
>common Dario derangement syndrome loss
>>
>>108437453
buy an ad amodei
>>
Did anybody actually try to use the new mistral model?
>>
File: 1650225803128.png (242 KB, 604x605)
>>108437282
His piece about the nukes was obviously satire, although I read it there. Haven't read this post yet, and from this excerpt it wasn't obvious at all. Guess who's a silly clown now, faggots, calling llms (!!!) an equivalent of nuclear weapons. One was and remains a real existential threat, perhaps even of downplayed importance. The other is a multibillion bubble blown more and more by scammers.
>>
>>108437491
it's so grood
>>
Is it just me or does Qwen 3.5 27B shit itself with the temp set to 1? It makes mistakes more often and feels overall more retarded.
>>
I tried to load stepfun based on an anon's suggestion in a previous thread, but I imagine that I tried to load a too-big model (64GB RAM + 16GB VRAM on a 5070TI) because my computer just froze until a hard reset. Suggestions on a model alternative to the GLM-4.5-Air I've been running for a couple months?
>>
>>108437073
what "muh conflict"?
>>
>>108437524
Qwen next instruct
Step (make it fit)
Mistral small 4
>>
>>108437530
>Step (make it fit)
Do I just download a small enough model that it fits under 80gb?
>>
>>108437528
>>108430817
>But honestly speaking my motivation to build things is currently at a low point due to all the warmongering.
>>
>>108437399
>hurr durr where is your millions of dollars
retard, i'm not a whole state, a state with 1B people being 2 years behind is just hilarious.
>>
File: file.png (3 KB, 212x53)
>>108437073
uh oh
>>
>>108437534
More like under 65gb or so since you still need the context, pp buffer, etc.
>>
>>108437557
Thanks anon; I'm retarded and am still learning.
>>
>>108437550
>>
>>108437550
>>108437567
>in this place we love LLMs... unless they do useful stuff like coding
let's stop the hypocrisy shall we, can't believe I have to defend a jeet but he's right kek
>>
>>108437569
In this place we use LLMs so we are well aware of the damage they can cause to codebases.
I'm not saying that that's the case here, clearly he got some impressive performance improvements but the commit is still funny.
>>
>>108437539
so indians ca get million dolars, but not you?
>>
>>108434876
Retard here... May I get a quick answer? I'm a vramlet and I've been using Qwen3-VL-8B-Instruct to read images, specifically receipts. Are there any better models for this? Only got 8GB VRAM. It fails more than it succeeds. I do plan to get a 5070ti for 16gb eventually; any specific model that is good for general things? Good image reading is the priority.
>>
>>108437563
Sorry, I meant under 75gb. Basically, a bit under your total memory pool.
Fuxking mobile posting, I swear.
>>
>>108437624
So like, on Huggingface I'm looking at Mistral-Small-4-119B-2603-GGUF/UD-Q4_K_M, which comes out to 73GB. That should work?
>>
>>108437582
prolly nemotroon vibecoded
>>
>>108434877
Why is the thread recap not being used in any of the other generals? I assume it's automated.
>>
>>108437636
That should work specially well since that model uses MLA for attention, so the context cache ends up being pretty lean.
>>
>>108437569
>but he's right kek
if there's nothing wrong with being a vibeshitter why remove the emdash from the comments? what purpose does it serve other than being an extremely weak attempt at hiding you're vibeshitting?
or are the people at llama.cpp still living under some archaic "ascii is all you need" rule, that shit legitimately has to die, if any tool you're using dies over some legitimate UTF-8 text, the tool is wrong.
>>
>>108437598
All qwen3.5 models have image input.
There's Deepseek OCR 2 which is a 4GB model at FP8 and it's specifically tuned for OCR if that's what you need.
>>
>>108437509
Recommended temp is 0.6
>>
>>108437672
To avoid being retarded in the future, do you have any suggestions on what I could read to learn?
>>
>>108437594
>hurr durr a whole country can easily pool millions of dollars but not a single individual
you are not helping your case ranjeet.
>>
>>108437598
dots.ocr should fit into 8GB
>>
>>108435933
AIUI the SDK is basically Claude Code as a library. I guess this guy's idea is that you would include that library in the software you ship, and call it with project-specific tools and instructions to modify that same software to the user's liking. E.g. user says "I wish it was easier to get to the such-and-such menu option" -> library vibecodes the change to add a new hotkey or move the menus around.

Claude Code is closed source (the github repo is only for docs and issues), and I guess the SDK is as well. Seems kinda weird given the Codex and Gemini CLI agents are both open source, but what else would you expect from the company whose main founding principle was that OpenAI is too open?
>>
>>108437700
Not really.
Just lurk, ask questions, and fuck around.
Alright, one thing I guess could help is reading the wiki under koboldcpp's git repo. There's a lot of generally useful information in there.
>>
>>108437650
an LLM specialized in writing cuda kernels was recently released. I'm not sure if it writes C code (from what I remember it mainly writes Python for PyTorch)
>>
>>108437676
>why remove the emdash from the comments?
the llama.cpp guys seem to fucking hate vibecoding, so he's hiding it, which is sad, maybe his code is great but will be discarded because a LLM helped making it
>>
>>108437892
It won't. That's the guy that nvidia appears to have assigned to help cudadev.
>>
>>108437892
piotr is fine to vibeshit all over so I'm sure nvidia guy will be alright too
>>
File: utterMadness.png (254 KB, 943x599)
> dusted off 2008 laptop
> wonder what the lmao DDR2 RAM asking price is now, given it's in ewaste tier machines
>>
https://github.com/ggml-org/llama.cpp/pull/20794
https://github.com/ggml-org/llama.cpp/commit/fb78ad29bbe7ae00619b2ce31b0a71e95fdbfc43
>Out-of-scope features:
>- Backend:
> - Features that require a loop of external API calls, e.g. server-side agentic loop. This is because external API calls in C++ are costly to maintain. Any complex third-party logic should be implemented outside of server code.
So Responses API will never be fully supported by llama-server because doing everything in C++ is too hard.
>>
>>108437911
Not for use but for display? On the other hand, 4GB is like top tier for DDR2, innit.
>>
>>108437676
>>108437892
I had already told him in a previous PR to remove em dashes from code comments (since files should use only ASCII if possible), so that is presumably why he's doing it again.
My general opinion is that I don't really care how code is produced, I only care about the code quality.
Unfortunately, in terms of policies that can be feasibly enforced, the only real way to do it is to ban language models altogether by default.
However, I am completely fine with making exceptions for people that can be trusted to properly check the outputs of language models, as is done here.
>>
>>108437977
4G is pretty much maxxed out. I found non-insane sellers and can get 2x2GB DDR2 for ~$15 shipped. Machine is a Core 2 Duo, 2G, 80 GB HDD lol.
Wanted to play with an agent but didn't want it on one of my real systems. I've a small stack of ancient laptops, so I'm going to set up Debian on one, run it headless, and let the agent do whatever on it.
>>
>>108437944
>Responses API will never be fully supported
thank god for that
the stateless chat completions API was an accidentally good thing
if I want state I want to manage it in my program, not have to think about both the remote and local state.
It has to be said again, and again, and again, that v1/responses only had one purpose to begin with: let OpenAI reuse the <think> blocks of their models without giving it to you. That's the real reason for that API being stateful. They also ended up implementing a stateless version with encrypted <think> but that's even gayer.
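A sketch of the stateless pattern, with `send` standing in as a placeholder for whatever HTTP call you make against /v1/chat/completions:

```python
# The client owns the entire conversation and replays it every turn.
# There is no server-side conversation id to track, expire, or resync.
def chat_turn(history, user_text, send):
    history = history + [{"role": "user", "content": user_text}]
    reply = send({"messages": history})  # full state goes up every request
    return history + [{"role": "assistant", "content": reply}]
```

Trimming, editing, or branching the history is then just ordinary list manipulation on the client side, which is exactly the property a stateful Responses-style API takes away.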
>>
>>108438203
Sensible. My plan is the same but I got a stack of Pentium 4+4DDR3 desktops.
>>
>>108438326
I agree with you that the Responses API is a net negative and created only for OpenAI's benefit. The issue is that they are pushing developers to use that over the Completions API and eventually there will be more and more clients incompatible with llama-server.
>>
Just like before LLMs were a thing, it was never ok to just make a pull request with untested code.

The minimum requirement for any code contribution is that you understand what you're submitting and have the ability to discuss your changes.

Before LLMs it was almost impossible to write code without at least understanding it a little. Sure, you could copy paste from stackoverflow, but you still needed some basic programming knowledge to plug everything together.

But now, any idiot can just prompt claude to write code and submit a PR with zero understanding of programming.

The problem is not AI code. It's that now any retard can submit a PR without having internalized the proper engineering mindset and etiquette.
>>
File: file.png (361 KB, 685x1233)
>>108436635
>yet 0% of good or useful software is made by them
What about Kitty and Calibre? Kitty is the best terminal emulator. And unlike the other grifter projects written in Zig and Rust, it's licensed under the GPL.
>>
>>108438535
the most loyal goy, goyal. responsible for covid too.
>>
>>108438535
kitty and calibre are both dogshit
>>
>>108438535
I personally use gitolite on my git server. That is indian software too.
>>
>>108438618
>gitoilet
the racist joke is left as an exercise to the reader
>>
for me, it's foot
>>
>>108438535
i prefer ghost titty as my terminal emulator, i don't remember exactly but something annoyed me in kitty
>>
>>108438721
>Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
I gave it a try but uninstalled it when I realized it had no GUI for the settings. What was the point of "platform-native UI"? There was no point in switching to another bare-bones terminal emulator that was just going to have fewer features than Kitty.
>>
>>108438763
1. it's not made by a brown person
2. just ask your llm to make you the config
>>
>>108438721
>>108438763
https://github.com/alacritty/alacritty
>>
>>108438793
>MIT
>Rust
>made false claims about being the fastest
Yeah, it was embarrassing.
>>
>>108438805
I switched to it because my terminal of choice (termite) told me to use it.
>>
>>108438805
>MIT
It's Apache 2.0
>>
>>108437505
What?
>>
>>108438907
sorry you're right
>>
>>108438998
I must refuse.assistant
>>
File: reconsider and use step.jpg (228 KB, 2577x1797)
>>108437636
> I'm looking at Mistral-Small-4
>>
>>108439044
So again, do I just go down the list until I find something that fits (<80GB)?
>>
>>108439050
I'm not the anon from earlier, but you should not waste a byte of disk space on Mistral Small 4, it's an abortion victim.
I will spoon feed you more and tell you that you can fit Step 3.5 Flash in Q2 and have 10 GB left to spare for the context. That is, of course, if you are tired of Air and want a different model. I don't think there's anything better for that size. Provided, of course, your use case is... well, you know.<end_of_turn>
>>
>>108439086
Yeah desu I'm just using it for SillyTavern ERP. I'm not particularly having any trouble with Air, I'm just looking to sample something different because things are getting a bit same-y.
>>
>>108439098
I can also endorse stepfun (as a former air user), works with cunny cards no problem. best to switch between air and stepfun just to keep things fresh.
>>
Apple's unified RAM is getting speeds like 180 tokens per second
>>
>>108439123
What presets do you suggest using for stepfun?
>>
>>108439086
glm 4.5 air is hard to beat speed wise when you use ik_llama. maybe qwen 122b?<|im_end|>
>>
>>108439131
imagine using a schizofork, couldnt be me <|killyourself_baiting_retard|>
>>
>>108439131
I will not cease my crusade against the new Qwens.
Anyone who recommends them is either:
- Not using the biggest one
- Using them for the vision
- Has not seen a good LLM (possibly due to being a vramlet)

And you want to FUCK it too? Good luck, you will need an abliterated/decensored/raped version of it that will be dumb and still dry as hell. But maybe some people like their LLM women sub 60 IQ, I won't judge.
Just stick with the old ones.
>>
File: ahhimschizoing.png (528 KB, 872x919)
>>108439140
works on my machine
>>
>>108439156
>Not using
I meant "using", of course. I have once again embarrassed myself with my dyslexic spelling.
>>
>>108438535
>Kitty
bloat python slop
>best terminal emulator
lmao
>calibre
likewise
>it's licensed under the GPL
GPL is cancer.
>>
>>108438535
even if you somehow got me to agree it's good software (it's not), i could still argue muh 83% of software, yet 0.01% of good software; it doesn't really make it any better for jeets.
>>
>>108439226
i only trust software made by germans with unpronounceable last names who use a design philosophy from 1998
>>
>>108439254
Poettering is pretty easy to pronounce, but to think you like his software... How crude, Anon.
>>
>>108437723
so indians can join together but you can't hmmm
>>
>>108439345
you are comparing a country to a single individual and are too retarded to notice that.
you only prove the point that jeets are retarded.
but "muh you can't join together", i sure wonder what mistral is.
retard.

tons of european models that completely mog saaaarvam.
>>
>>108439364
proof you are european?
>>
>>108439366
what would be a good proof for you?
i live in switzerland, it's night here.
i can post hands, but me being white and it being night isn't the best proof there is now, is it?
>>
So are vramlets still on Nemo/Mistral Small tunes or are there new models?
>>
>>108439406
just go buy a switzerland newspaper and put a timestamp on it. not so hard anon.
>>
>>108439408
Qwen 3.5 4B
>>
>>108439415
>just go buy a switzerland newspaper and put a timestamp on it
You know there's already a date on newspapers right?
>>
>>108439408
Qwen3.5 4b
>>
>>108439366
Why would that guy have to prove he's european? What does that have to do with indians producing a single dogshit model that's worse than what people made in other countries? Just shut up and fuck off already
>>
>>108439435
agent get me a gf
>>
File: 20260323_220253080.jpg (2.3 MB, 3000x4000)
>>108439415
dude, it's 22:00 here, i live in a tiny town in the mountains, do you really think i can just buy a newspaper at this hour?
best i can do right now is show you a box of eggs from coop.
>>
Are you guys running two models in parallel?
>>
ahhh I need a better model to run. haven't used the GLM stuff. will they release a GLM 5 turbo ~27B dense model? MoE seems ass.
>>
File: 1752929344218277.png (87 KB, 342x201)
>>108439447
>buying 4 eggs at a time
Do the swiss actually do this
>>
>>108439481
you are better off just using a big model with a small draft model than two different models.
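For reference, llama.cpp's server supports this directly via speculative decoding. A hedged sketch of the invocation (model paths are placeholders, and exact flag spellings vary between llama.cpp versions, so verify with `llama-server --help`):

```shell
# Hypothetical example: big target model + small draft model from the
# same family (they must share a tokenizer/vocab for speculation to work).
# Paths are placeholders; flag names are from recent llama.cpp builds.
llama-server \
  -m big-model-Q4_K_M.gguf \
  --model-draft small-draft-Q8_0.gguf \
  --draft-max 16 \
  -ngl 99 -c 16384
```

The draft model proposes tokens cheaply and the big model verifies them in one pass, so output quality stays that of the big model while generation speeds up.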
>>
>>108439481
> 9B (Smart!)
Sure.
>>108439493
You could have posted the egg uma.
>>
>>108439447
>1000CHF an egg
>>
wtf is this about task?
https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/14
https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/10
>>
>>108439504
the task
>>
>>108439493
i generally go for 6 or 12, but these were bought by my gf because there is no mall in my town and she does some groceries on her way back from work.
she rarely eats more than 2 at a time.
also you can just buy one egg if you really want to lol.

i generally buy meat at the butcher and the farmer, but she tends to do the small groceries since it's on her way back from work and i work from home.

>>108439503
>1000CHF
that's why my pc isn't that expensive to me compared to how much i pay for food.
the meat i buy is almost 100chf / kg
in switzerland everything is expensive but you also make a lot of money.
>>
File: file.png (51 KB, 771x956)
>>108439504
what the hell
>>
Give me back my wife, she doesn't believe you, Miku
>>
>>108439504
https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/16
>>
>>108439519
>in switzerland everything is expensive but you also make a lot of money.
Of course, but it doesn't matter how expensive things are; what matters is the disposable income left after paying for everything, and quality of life.
>>
>>108439420
yes but i want another sticky note that says /lmg/ with a timestamp on top of the newspaper
>>
>>108439481
cute, a bit blurry but made me read everything
I'd use 27B for everything though, or maybe 27B + 35B
what example of a successful task were you able to achieve with this? people always talk about how to run agents locally but never to what end
>>
>>108439447
Having a thumb war with this Anon
>>
>>108439560
so i generally spend about 1k on food, 500chf on insurance and another 500 for rent (gf pays another 500).
when you live in the middle of nowhere it's cheaper; in geneva you can find apartments at 7000chf/month, it's a bit crazy.
anyway, i generally spend about 3k a month, and i can squeeze down to 2.2k if i really have to.
i make about 8k after taxes.
so at the end of the month i have about 5k chf left.
it's much better than if i were to live in france, where i'd probably have 1k left after all expenses.

honestly Switzerland is not a bad place to be in europe; the only thing that really sucks is the housing market, which is pretty bad.
>>
>>108439615
i have pretty big hands, all my fingers are long lol.
>>
>>108439616
>>108439560
adding to this, PC parts and tech cost pretty much the same wherever you live.
so yeah, it's not like the 1k in france would get you more pc stuff than the 5k in switzerland, especially if you buy online.

i'm almost tempted to buy an llm rig desu, but at the same time i don't really need it and i'd rather just invest / save the money.
>>
>>108439616
1000chf on food? wtf? i never spend more than like 200chf on food monthly in america (for myself)
>>
>>108439481
For me it's 4B for Planner, and 0.8B for Worker!
>>
>>108439659
well, switzerland is expensive, i mostly eat meat and everything i buy is organic.
generally it's about 1.2k chf for me.
my gf is more around 400 to 500 chf, she's pretty small and weighs 42kg though.
but yeah, labor is expensive / paid well, and thus products are expensive.
also i'm not even in the most expensive part of the country; if you were near Geneva or near Zug / Zurich it'd be a LOT worse.
6 eggs are about 6chf, 1kg of beef is between 80 and 100chf depending on the cut.
Switzerland is just one of the most expensive countries out there.

idk how much you pay for gas but here it's about 1.7 to 1.9 CHF/L.
the worst is probably electricity, where it's not uncommon to pay between 0.3 and 0.6 CHF/kWh.

when i was in France i'd spend about 300 bucks a month on food.
>>
>>108439481
i use kimi k2.5 for everything. EVERYTHING.
>>
It has started. AI can now solve math problems that human experts tried to solve but could not.

The age of men will soon be over.
>>
>>108439709
which quant? does it work well or do you need to recheck?
>>
>>108439709
what hardware do you run it on
>>
Anyone here tried both Moonshine v2 and Parakeet v3? Which is better?
>>
>>108439716
works well up to the max context i can fit, which is about 77k. IQ3_K quant. >>108439165
>>108439719
512GB of DDR4 3200MHz and four 3090s
>>
>>108439481
some of you seriously use these little 4B/9B toys for coding? does it even work? I don't think I'd trust one under 70B for such tasks
>>
>>108439710
Can't wait to see the numbers be exactly the same in 5 years
>>
>>108439731
>IQ3_K quant
Which tasks do you give it that it can achieve properly even at this low a quant?
>>
>>108439742
i can honestly vibecode pretty large projects if i split them up into multiple smaller files and only provide the required context for what needs to be changed. i've got a project that would take up 500k+ context if i provided all of the files at once
>>
>>108439742
Tool assisted ERP
>>
>>108439750
grok, email this transcript to all my friends
>>
>>108439731
nice setup.
how much t/s do you get out of it?
>>
>>108439742
also i should mention that kimi 2.5 was trained in FP4, so a Q3 quant is actually more like a Q6 quant for kimi.
>>
>>108439747
So it's for dev, ok, thanks anon.
>>
>>108439762
Lovense, if you want to record the specifics too.
>>
File: file.png (18 KB, 629x116)
HOLY SHIT IT'S FINALLY HERE
>>
>>108439770
9tk/s TG at 0K and about 6tk/s at 64K context. it's slow, but it doesn't really matter when i'm letting it run while i make dinner or clean up my house or something
>>
File: file.png (214 KB, 1588x1353)
>>108439814
he's just stirring the pot
>>
>>108439821
thanks anon
cool for you but i don't think those are useable speeds for me.
>>
>>108439814
lmao, we are decades away from agi at the very least, if it's even possible to do on silicon which is an unproven assumption.
more bs marketing scam.
>>
>>108439859
List of legitimate reasons for you to namefag:
>>
>>108439875
true
>>
>it's not just, it's
why do all models love to regurgitate that shit, they use it at like 100 times the normal rate
>>
>>108439875
lmao i didn't want to, i had some bullshit in the name field and didn't realise it'd still be there after i closed the tab and came back. i never used the feature.
>>
>>108439900
synth slop amplifying model behaviors and it existing 500b times in the trillion tokens all labs train on and no corpo wants to actually filter their own datasets, despite being able to even train these models
I do feel you on this though, gemma is generally very balanced in terms of intelligence and grasp of writing but it just goes into not x but y, ellipses spam and endlessly vomiting descriptions about scents. Even when you ask it to critique writing, it will praise it and then if you ask "are you being honest about x part of the critique? that doesn't make sense" it'll bend itself backwards six times to be like YES YOU'RE SO RIGHT, CONSIDER ME THOROUGHLY CHASTISED or some shit
>>
>>108439900
the only pro of that is to be able to easily spot people using llm without even caring to edit
>>
>>108439481
I like these. At least they are reading the thread and adding things in.
>>
>>108439731
t/s ?
>>
>>108439859
>if it's even possible to do on silicon which is an unproven assumption.
there's nothing special about biological substrates
>>
>>108440046

>>108439821
>>
>>108440096
there is, e.g. QM shenanigans.
the human mind may be non-computable, and more than one physicist thinks so, e.g. Penrose.
>>
>>108440176
penrose is pulling bullshit out of his ass because of a spiritual attachment to biological substrates being special.
>>
>>108440176
>the human mind may be non computable
I can assert things too. Look at this. Ready? The human mind may be computable. Now what?
>>
>>108439814
>>108439835
>"OpenClaw is AGI"
>HOLY SHIT JACKETMAN YOU'RE SO SMART THIS IS MINDBLOWING PLEASE TAKE 2 HUNDRED SEXTILLION DOLLARS OF MY MONEY PLEASE PLEASE PLEASE JUST EAT MY DOLLARS PUT THEM IN YOUR MOUTH NOW NOW NOW YES YES I WANT YOU TAKE EVERYTHING I HAVE
>>
>>108440302
>EAT MY DOLLARS PUT THEM IN YOUR MOUTH NOW
All of this, but can he do it slowly while making eye contact with the camera?
>>
>>108440302
>>108439835
>Openclaw
Not sure why everyone sucks it off so much. It's a total mess, the interface is god awful and it's a huge pain in the ass to see what it's even doing. Their documentation is trash, too
>>
>>108440326
what good replacement exists?
>>
File: Nemotron.png (1.37 MB, 1344x797)
>>108440302
>>
>>108439694
>weighs 42kg
In which elementary school did you find her?
>>
>>108440335
Hell if I know, I'm just explaining my experience with openclaw
>>
>>108440366
Let's be reasonable, she's probably in middle school.
>>
>>108440366
that's how much miku is supposed to weigh
I know because I was asking llms which weighs more, 20 mikus or 20 tetos
>>
>>108440326
As surprising as it is, before that no one thought to make an MCP-enabled background service, accessible from regular chat apps, that any retard could set up, and then shill the ever loving fuck out of it. Not even the geniuses here.
>>
>>108440380
miku weighs 1 ton, with all the internal machinery that makes her work
>>
>>108440366
>he can't imagine an adult woman weighing 90lbs
It must suck to live in a nation with rampant obesity
>>
>>108440394
very ped coded bro ngl
>>
>>108440380
Miku troons are more accepted than poorfags on /g/
>>
>>108440398
women are ped coded, true men choose men
>>
>>108440196
>spiritual attachment to biological substrates being special
we've not seen inteligence on anything non biological so far.
and this is another discussion but physicalism is wrong.
>>108440205
point is, we don't know if silicon can do it, it's just an assumption.
>>
>>108440366
>le american surprised european women aren't obese
as i said, she's 25.
>>108440394
this lol
>>108440398
>muh not being obese is ped coded
retard, she has nice boobs and a nice butt, proportionally wide hips too, her whole body screams fertile and she is.
>>
>>108440422
t. >>108436379
>>
>>108440413
>physicalism is wrong.
it doesn't matter. thinking silicon is lacking something important for intelligence is just a religious thing. you're religious. as stupid as ai is today, frankly even it should dispel the notion that the brain is special.
>>
>>108440422
You're a deranged mikutroon, I can't trust she exists
>>
why do people who want to ERP with AIs always pick the lamest choices? why pick miku, kurisu or xj-9? i wanna fuck the hot redhead alien chick from megas xlr.
>>
>>108440459
Welcome to /lmg/, I love you
>>
>>108440451
>you're religious
i'm not.

and it's totally possible that intelligence may require quantum shenanigans and whatnot, which sure, you can do on silicon, but not with the kind of chips we make today.
>as stupid as ai is today
we do not have ai today. it's not stupid, it's not even intelligent, there is literally 0 intelligence there.
>>
>>108440461
Well, for starters, the hot redhead alien chick is an alien, not an AI. This makes her a difficult contender for the place of AIs you wanna fuck.
>>
>>108440461
Because AIs don't know niche characters well.
Same reason I have to generate comics from recognizable characters
>>
>>108440459
i don't care about your trust lol.
42kg is not that unusual in switzerland.
>>
>>108440468
you have a religious/spiritual notion of intelligence.
>>
>>108440413
>point is, we don't know if silicon can do it, it's just an assumption.
point is, there's no reason to think that it can't
>>
>>108440469
fine i'll fuck the cool robot with the flames then. alien ai is still ai.
>>
>>108440475
you don't even know what my notion is about.
llms not having any form of intelligence is just a fact, they literally have no ability to learn autonomously.

>>108440478
there are plenty, you just don't see them.
also, to play devil's advocate, i'd argue most humans don't have intelligence either; being biological doesn't automatically grant you intelligence.
>>
>>108440472
This guy is lying and dating a (gay) dwarf btw.
t. living in switzerland
>>
>>108440491
>there are plenty, you just don't see them.
You could start listing them instead. Go on.
>>
>>108439835
>the product I'm selling is so amazing, it can run a billion dollar company totally on its own!
>uhh but it couldn't run MY company, dear investors please don't fire me!!
>>
File: 1760426080677194.jpg (35 KB, 400x387)
>10 minutes between messages
>>
>>108440572
He's a biological supremacist
>>
Total clanker death
>>
robots and ai are going to totally surpass us and that's a good thing
>>
>Edit: 10 Mar 2025 20:44 UTC
So that's never getting updated I guess.
>>
>>108435108
sample pretty please?
>>
There's no way llama.cpp is having double BOS token issues with mistral small 4 in the year of our lord 2026, right?
>>
>>108440795
Alright, doesn't seem to be the case.
Thank fuck.
I still have no idea what kind of unholy memory corruption is happening on my machine where I
>launch llama server
>send prompt hello, receive response, send prompt "Can you tell me a one paragraph story?", receive response
>close llama-server
>change nothing, launch llama-server
>regen the last message
>后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉......
>>
>>108440795
Show it. It should warn you on the terminal output.
>>
>>108439435
>Qwen3.5 4b
Not just 4B, 4B Q4. I hate this image so much, please keep posting it.
>>
>>108440867
It should recommend to quantize the cache to q4_0 as well.
>>
>>108440876
Huh, I thought kv cache type defaulted to the model's quant (ignoring things like dynamic quants), but no, it's f16 if unspecified:
>llama_kv_cache: size = 2848.00 MiB ( 45568 cells, 16 layers, 4/1 seqs), K (f16): 1424.00 MiB, V (f16): 1424.00 MiB
I'm actually not sure if your recommendation was sarcasm or not, but the kv cache should be the same type as the model, right? Gonna give that a shot.
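For anyone who does want to set the cache type explicitly, llama.cpp exposes it as flags. A hedged sketch (placeholder model path; flag names are from recent llama.cpp builds, so verify with `llama-server --help`):

```shell
# Placeholder model path; flag spellings vary across llama.cpp versions.
# The KV cache defaults to f16 regardless of the weights' quant; these
# flags override that. Quantizing the cache (e.g. q8_0 or q4_0) saves
# VRAM but tends to hurt quality more than the same quant applied to
# the weights, and a quantized V cache usually requires flash attention.
llama-server -m model.gguf \
  --cache-type-k f16 \
  --cache-type-v f16
```

Leaving both at f16 trades VRAM for quality; shrinking the cache is a last resort when context doesn't fit.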
>>
>>108440928
In case you are being sincere, no.
Quanting the KV cache has a much more severe effect than quanting the model itself as far as I can tell.
>>
>>108440953
I was being sincere, I'm just retarded. I don't conceptually understand why having the cache type be the same as the "model type" would degrade performance/intelligence compared to an f16 cache, but I suspect that's because I just don't understand how anything works, which is fine. At some point I'll learn linear algebra and read the attention paper.
>>
>>108441003 (me)
>learn linear algebra and read the attention paper
Who am I kidding. Given that the weights are dequantized to f16 (regardless of model quant) to compute the attention scores, the kv cache also being f16 makes a lot more sense.
For some reason I thought the CUDA kernels were implemented with hardware support for e.g. operating directly on Q8_0 values rather than needing to convert everything to f16. At some point I'll learn CUDA and actually read through some of the inference code.
>>
File: 1712248758437899.jpg (211 KB, 768x1280)
>>108440605
Patience is a virtue
For productive use of LLMs
>>
It's not coming this week either, is it?
>>
File: 1429223870035.gif (342 KB, 153x113)
I asked an LLM to make me an app to do voice activation for hotkeys/scripts, using Moonshine STT. Although it ended up just being a python script, it actually just works. It's pretty fast, accurate, doesn't take much processing on idle, and I can map anything I want. This is so much better than when I tried looking into doing the same thing with existing software years back, before LLMs were big. Finally, the dream of a semi voice-controlled PC is here. We are so back. And yeah I know I'm late to the party, but hey, we all went through this journey of integrating AI into our lives, right? And for people who still haven't done it, I recommend it. Go do it. This is one of the simplest AI augments you can run before getting into agent shit. It's easy and you'll love it. It might even be what gets you started on agents, since you're already doing STT with it.

Btw one thing I recommend is to format the function mapping list so you can map multiple voice lines to a single function. That way you don't have to memorize the exact line. If you want your media player to play the next song, you can say "next song" or "next track", and both will do the same thing.
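The mapping idea above can be sketched in a few lines. This is a toy illustration, not the anon's actual script: the action names, the normalize step, and the `register`/`dispatch` helpers are all my own made-up names.

```python
# Phrase -> action table where several spoken lines map to one function,
# so you don't have to memorize the exact wording the STT model expects.
def normalize(text):
    """Lowercase and strip punctuation so 'Next song!' matches 'next song'."""
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace()).strip()

ACTIONS = {}  # normalized phrase -> callable

def register(phrases, func):
    """Map every phrase in the list to the same function."""
    for p in phrases:
        ACTIONS[normalize(p)] = func

def dispatch(transcript):
    """Call this with whatever the speech-to-text model heard."""
    func = ACTIONS.get(normalize(transcript))
    if func:
        func()
        return True
    return False

# Demo actions; in a real setup these would send hotkeys or run scripts.
played = []
register(["next song", "next track", "skip this"], lambda: played.append("next"))
register(["pause", "stop the music"], lambda: played.append("pause"))
```

The normalize step also absorbs small transcription quirks like stray capitalization or punctuation, which matters since STT output is rarely byte-exact.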
>>
>>108441044
depends on inference lib
>Given that the weights are dequantized to f16
one cannot simply "dequant" (dequant = adding a bunch of zeros); the point is that the "attention scores" / KV cache in the current paradigm need decent precision
>>
>>108441155
>one cannot simply dequant
*calls dequantize_q8_0 on your tensor before passing you into the fattn kernel*
>>
>>108441155
>(dequant = adding a bunch of zeros)
Isn't that just padding?
>>
File: dipsyBackFocus.png (1.19 MB, 1024x1536)
>>108434876
>>
File: 1000059888.jpg (26 KB, 640x633)
How are people getting vLLM running on Strix Halo without using those prebuilt toolboxes?
>>
>>108441286
female backs are nice
>>
File: hd psycho miku.png (404 KB, 1672x1440)
>>108441210
>>108441270
Like saving a JPG as BMP: the damage is already done, the information is lost
>just padding
Is the extra 512GB RAM you don't have just padding too?
>>108441286
based now show some armpit tufts
>>
>>108441515
Miku where is my wife what did you do to her
>>
>>108441286
I look like this
>>
File: 1744627735250153.png (188 KB, 826x412)
>>108440398
>>
>>108441529
Everything is fine
Everything is normal and progressing as intended
Do not be alarmed
Labs saying to run inference on the models they trained at temp<1 is perfectly fine
>>
>>108441553
Now show the stats for women considering all those female teachers and such who turn out to be pedos
>>
>>108441515
>Like saving a JPG as BMP the damage is already done the information is lost
This analogy is apt for the initial quantization of the model, which (in the case of qN) encodes the weights into blocks in a manner similar to JPG's DCT encoding.
However, there are 100% lossless operations you can apply to DCT-encoded image data. You don't need to convert it back to BMP to manipulate the image file, producing intermediates in JPG format. It makes sense, in that situation, that you'd cache your intermediates (which are 100% lossless) as JPGs rather than convert them to BMP (since it's the conversion that introduces losses).
While I originally thought that there were attention kernels that operated on e.g. q8_0 values, that doesn't appear to be the case. There might not even be sound math to perform the necessary arithmetic on q8_0 values to compute attention scores without introducing loss, I'm not a mathematician. If the q8_0 tensors are dequantized to f16 tensors, the intermediates that are going into the cache are also going to be f16s, and it makes sense to have the cache be the same format as the intermediates.
I'm sorry for what I did please let Teto out of the basement now she doesn't deserve this.
>>
>>108441564
>There might not even be sound math
Have you learned how the hardware works? Honestly ignorance is bliss
Foolish anons consider quanting the KV cache at runtime. "cache" yeah? it's avoiding already computed attn calcs
Back in the basement ho https://www.youtube.com/watch?v=UsjsYMo3O1Q
>>
>>108441636
>Honestly ignorance is bliss
You should have chosen a domain that doesn't use floating point operands. I only deal with maths that are both associative and commutative and may god have mercy on the rest of you lunatics.
>>
>>108441286
Another slappable back like Rin's
>>
>>108439814
How do they get from sloppotron to AGI?
They probably just benchmaxxed some arbitrary benchmark as always.
>>
File: Tetosday.png (869 KB, 1024x1024)
>>108441758
>>108441758
>>108441758
>>
File: 1767843893159267.png (850 KB, 700x933)
>>108439481
>>108439435


