/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108459276 & >>108453570

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108459276

--Voxtral TTS release and initial impressions:
>108459652 >108459758 >108459766 >108459836 >108459844 >108459888 >108459902 >108461249 >108462139 >108459995 >108460450 >108460456
--Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models:
>108464620 >108464631 >108464656 >108464739 >108464755 >108464808 >108464836 >108464856 >108464905 >108464929 >108465010 >108465032 >108465028 >108465052 >108465072 >108465094 >108465163 >108465186 >108465191 >108464865 >108464644 >108464662 >108464680 >108464697 >108464682 >108464701 >108464750 >108464764 >108464803 >108464862 >108465009 >108465026 >108465073 >108465080 >108465139 >108465432 >108465850
--KLD heatmaps reveal hidden degradation in aggressive KV cache quantization:
>108463990 >108464591 >108464625 >108464635 >108464627
--Mistral releases open-source Voxtral TTS:
>108459318 >108459428 >108459525 >108459563 >108461953 >108461978 >108462002 >108462064 >108462078 >108462107 >108462503
--GPU coil whine interferes with guitar amp, TTS voice cloning comparisons:
>108460183 >108460208 >108460218 >108460232 >108460247 >108460881 >108460901 >108460910 >108461004 >108460928 >108460947 >108460975 >108461005 >108461025 >108461047 >108461080 >108462135 >108462264 >108462944 >108462961 >108462982 >108463064
--Models handling verbatim lyric requests differently due to alignment:
>108464911
--Evaluating TTS demo quality:
>108459914 >108459947 >108459956 >108460605 >108460650
--Chroma Context-1 model released without harness:
>108463927 >108463946
--Z.ai 5.1 open-source release expected early April:
>108465382 >108465454 >108465751
--Qwen3-TTS and VibeVoice resources:
>108459560
--Miku and friends (free space):
>108460212 >108462728 >108462782 >108465571 >108460736 >108461211 >108461256 >108461280

►Recent Highlight Posts from the Previous Thread: >>108459279

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108466262Blessed small saggers thread.
>>108466278>small
>>108466262Whatever you do, DONT read any manga drawn by Ankoman.
>>108466262Saggy tits eewww
>>108466254>Make it make sense anon. They are directly influencing how AI functions
Apple is too incompetent to shit out ANY useful model anon..... their influence is non-existent
>>108466254>that one can't make fun of in a meme format on AI image gen platforms with the Apple logo.
???
>>108466262>that image
Deepseek is coming
The WHALE
>>108466327>deepseek is coming
NO, I am
>>108466321There are certain platforms based in silicon valley that prevent anti-Apple shenanigans.
>>108466327a lot larger but we can run models six times larger than we used to thanks to Google Turboquant™
>>108466337previous was gemini, now here's chatgpt on the same prompt
there's no conspiracy against shitting on apple in your SaaS prompts (or LM studio or whatever else), you need your meds
>>108466352>>108466321You need to fuck off to one of the image gen generals.
>>108466371you need to fuck off to leddit with your amazing reading comprehension
>>108466378You've been posting your retarded image slop with the same character for days and it's not even local.
>>108466383>YouI am not the guy you're referring to, I just decided to reuse that character since I found it funny. Try again, retard.
I hate that I have to build my own 'FETCH' mcp tool because the autists have made it respect the robots.txt
>>108466393That's irrelevant because your posts are still shit.
>>108466397>the autists have made it respect the robots.txt
you are a blight upon the earth
qwen 122b-a10b changed my life
Tetonation incoming... I am sensing Gemma 4 release before Easter.
>>108466410For the worse?
>>108466415Why build one from scratch instead of forking the existing one?
https://github.com/modelcontextprotocol/servers/blob/main/src/fetch/README.md
>>108466415I forked this already of course
>>108466405im gonna rape your website, retard
Something will probably happen at some point. Or not. That's my prediction. Screenshot this post.
>>108466410How many times did your ego die?
>>108466413for the better, I've been pasting giant research papers into it and asking it to explain them in simple terms so I can actually implement shit without having a PhD. It's going great. I've finally found a good balance between being an AI luddite and letting my brain atrophy by expecting AI to do all the thinking for me.
>>108466415bro...
>>108466423nostrildamous over here with the takes
>>108466433I knew you would say that.
>>108466400let me guess, iToddler?
Is $400 for a v620 okay?
The listing said 16gb, but amd never made any 16gb v620s, did they? About as strong as a 6800xt....
>>108466427Using AI for inspiration did wonders for me but i fucked up letting it write code without going through the steps and thinking about it critically. I feel like im a worse programmer as a result.
>>108466432https://github.com/modelcontextprotocol/servers/blob/main/src/fetch/README.md#customization---robotstxt
it's right in the instructions too, but vibecoding is the solution to everything now, fuck reading
>>108466496>reading doc instead of source
I'm not a nocode shitter sorry :(
>>108466262wowzers, how do i make my own goonbait with a local model? im on an asus ultrabook with no GPU btw, just intel graphics
it isn't this, it is this
they aren't that, they are that
Do venvs use system cuda or is the cuda toolkit packed somewhere inside it?
ggnigeranov TURBOQUANT KV + WEIGHTS SUPPORT WHEN!?!?!?!
>>108466469Don't know about amd cards, sry.
16gb nvidia cards can be had on ebay for $450-$500.
So what happened? Youtube video about turbo quant?
>>108466547i think maybe no by default. i had to use --no-build-isolation when i compiled flash attention.
I wonder if this is another nothingburger to raise the company stock, or an actual advancement that can benefit local AI.
this guy has a 'living rent free in your head' problem lmao
https://www.reddit.com/r/LocalLLaMA/comments/1s56q9g/new_unsloth_studio_release/
>>108466588Would IK really have been able to "independently discover" Hadamard transforms without the easy-to-follow implementation in ExLlama?
>>108466588That's called having an inferiority complex.
>>108466593What a bunch of hacks
>>108466588I would probably end up using his fork if he wasn't such a fucking baby.
Or maybe he would still be contributing to mainline if he wasn't.
>>108466554
6800xt (500GB/s) is about the same performance as a 5060 ti (450GB/s), according to random benchmarks on the internet. So basically a 32gb 5060 ti without cuda for $400.
>>108466600goated reference there, my friend.
also checked
will cudadev EVER be able to recover, pipeline parallelism bros???
>>108466604nah they're fine, weird they didn't mention this in the release notes though...
>>108466588Everything is a personal affront to him. Very unstable.
>>108466547System CUDA refers to the CUDA compiler toolkit, which is totally different from PyTorch.
Venvs use whatever PyTorch version you have installed. PyTorch interfaces with your graphics drivers.
>>108466632Yes, but the package is cu128 and I have CUDA 13. It still runs, but I want to know if this is something worth looking into for any potential issues in the future.
>>108466645doesn't matter bro, cuda is made to BUILD, that shit will run on your DRIVERS for fuck's sake you stupid mongoloid.
>>108466645You will only run into problems if your graphics drivers don't support the pytorch version. In any case don't worry about it as long as pytorch is recent enough.
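For the cu128 vs system CUDA 13 confusion: the pip wheel ships its own CUDA runtime, so you can sanity-check everything from inside the venv without touching the system toolkit. A minimal check, nothing exotic:

import torch

print(torch.__version__)          # e.g. "2.x.x+cu128", the CUDA runtime the wheel was built against
print(torch.version.cuda)         # same info as a bare version string
print(torch.cuda.is_available())  # False usually means a driver problem, not a toolkit mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

The system-wide CUDA 13 install only comes into play when you compile extensions yourself (e.g. flash attention, as mentioned above).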
Why can't they, like, just stick a tb of vram on a single gpu?
>>108466686that would be antisemitic. think of the poor shareholders!
>>108466686You wouldn't pay their asking price.
>>108466690>>108466692I'm a shareholder. Where's my special nvidia discount?????
>>108466583This is so retarded. It's just like a few months ago when Gemini 3 released and day traders were spamming that Nvidia was done because Google had just developed a GPU-killer called T-P-U.
>>108466657thanks mongoloid lover anon
>>108466593These unsloth devs and redditards are really a match made in heaven
>>108466604maybe, but no one else seems to properly do the various gguf quants for everything like they do
>>108466757>properly do
LMAO
>>108466764I don't get the Unsloth hate. Isn't that two dudes doing work that 99.999% of people couldn't do? They're contributing to democratizing llm finetuning more than anyone else. In the talks he's given, Daniel Han also seems a bit high strung, but competent.
>>108466764[various quant]<-(properly do)
or
[gguf]<-(properly do)
how interpret?
>>108466771daniel why the fuck are you posting here? go be clueless about your shit somewhere else.
>>108466771>couldn't do
Don't *want* to. You don't need to load the full model to quant it.
>>108466779Are you doing training at all, or just using and RPing? Unsloth is an easy to use library; it is helping me. Maybe I haven't found something better yet, but I've tried Axolotl, Llama-Factory and just Transformers, as well as plain PyTorch, and Unsloth is simple, the notebooks are great at showing what to do, etc.
>>108466802I was talking in general about Unsloth, just not gee-gee-u-huff-ing the models.
>>108466809not just guh-guffing the model I meant, can't spell today.
>>108466802>the notebooks
Jupyter is the worst fucking trash ever invented.
>>108466818Yes we should all code in emacs.
>>108466809They can't be trusted to make a quant without reuploading it 20 times. I wouldn't trust them with training.
>>108466802NTA, but I did some training, and Unslop paywalling their multi-GPU feature makes it almost useless if you're not finetuning GPT2. Axolotl is way more flexible and has actual competent devs you can talk to. Stop being retarded.
>>108466835It's not paywalled (anymore at least). I just did on 2x and 4x GPUs, launching the script with accelerate, and it all went super smoothly.
>>108466826I write all of my code with Doom Emacs and a German keyboard layout.
>>108466826I miss vim but after I tried to finish a rewrite of my project I just can't. It always somehow pastes wrong, and all these little things, like when you hit escape (which is my capslock) the cursor jumps back one character.. so fucking irritating. My .vimrc is pretty long and I have used it for years now. I can't recommend vim to anyone unless you are working over a terminal I guess.
Emacs isn't that much better.
This software is used by autists because they don't know what ergonomics means and don't mind pressing 4 different keys to get simple functionality.
>>108466852nta. I use vim, but based anyway. You're too cool.
>>108466854To add: and then there is the sunk cost fallacy. Just because you have used something for years doesn't mean you can't ditch it and get something better.
>>108466863I use vim when I log into something through the terminal, but locally I use VSCode and notebooks. I am trying to bring myself to switch to marimo to vibecode-maxx since current LLMs have difficulty working with Jupyter, just haven't had the will to yet.
>>108466872I mean writing should be something that is intuitive, not hidden behind multiple keystrokes and guesswork about whether my selection will paste correctly or not.
>https://github.com/ggml-org/llama.cpp/pull/21074
BLACKWELL BROS
WE WON!
Has the era of TURBO-QUANT started?
>>108466930>>108466732
>>108466930literal fake news lmao
>>108466930>>108466941AI-pushed fake to get the normies and luddites off their back
Additionally, whatever memory savings this nets will be immediately deleted by vibecoded pajeetware bloatmaxxing.
>>108466941The only thing that matters is for the retarded collective will of the market to somehow buy all this and bring cheap RAM back.
Don't ask me how that would work, I still don't even know how OAI managed to buy this much memory while not having any money to pay for it.
>>108466930Honestly I'm all for people overestimating what turboquant actually does. Win for everyone.
I'm actually looking forward to it being implemented in lcpp. Big context is based.
>>108466930Wut? The whole market's tanking right now.
Where are those 3000 assets
I've only seen the leaked Mythos webpage
>>108467034this sounds so fucking fake
sonnet, opus... capybara? what is this, meta?
>>108467034Sounds like a benchmark twitter endorsement. Every new model is always a few % better, always and ever.
But this time it's a rumour and "leaked documents".
>>108467042Anthropic and OpenAI are both promising a soon-to-come breakthrough https://www.youtube.com/watch?v=s4tptozUJ8Y
It might be a nothingburger, but it's been two years since reasoning, so who knows.
came for droopy tits
>>108467106*to
was there big news why thread fast
>>108467141Google made models 6x smaller
>>108467150like in theory or can I download something to run some 200gb model on my 3090s
>>108467150
>>108467167>>108467169>>108466930>>108466583
>>108467065>reasoning
Isn't this just autoprompting?
>>108466930isn't that just for the kv cache (context)? lol, context rot is still a thing so enjoy the slopped 1M context
huge llama.cpp update: every purchase of llama-server comes with a complimentary footgun
>>108467127*on
>>108467191I bet comfyui manages to use this to brick peoples pcs by accident
>trynna force ldg drama again
>>108467181I'd argue there was a huge shift when O1 came out versus "Let's think step by step". Before O1, there was chain of thought, program of thought, forest of thought and a bunch of other prompting strategies, but "reasoning" made it switch to another level. The model started to stay on track a lot more, etc.
>>108467191>llama-server
We use vLLM here
I heard Google made boobs 6 times saggier
>>108467269fake news, only nipples
>>108467243>We use vLLM here
we as in you and the one other guy with the bitcoin mining rig with 8 ewaste 3090s?
>>108466930Gemma 4 bros.... I dont feel so good....
>>108466262just the right level of sagging to make it maximally erotic
>>108467303talking about gemma, I'm always surprised to constantly see it placing high in current benchmarks against newer models, kek
>>108467279>replying to the obvious bait
Can I use koboldcpp antislop feature with sillytavern as the frontend or do i need kobold as frontend too?
>>108467303Google was going to release it, but then qwen3.5 dropped and mogged it, so they delayed it.
Many such cases.
>>108467381It has better writing capabilities. That doesn't say too much. It's also biased. Try changing your 'gender' to female in the same scenario and see how much the narrative changes. When you do a q&a with the model it replies in a different fashion depending on whether you are male or female.
It's funny but the behaviour is there.
>>108466845Fellow slop-tuner here. What are you training models to do?
>>108467416Bro, you have a mental illness.
>>108467399>Can I use koboldcpp antislop feature with sillytavern as the frontend
yes, it taps into banned strings
>>108467416>She has to be on birth control because there is no way that could fit in there and not get stuck inside of her womb!
huh?
>>108467444?huh
>>108467421>>108467444slop-tuned 2b model. Testing to see if I can train it to be less retarded with better dataset curation
>>108467455uhuh
is it working?
>>108467422thanks anon, but where is that? is that an extension?
all I see is logit bias handling
>>108467473just above logit bias for me, show your connection profile?
>>108467491oh I see, it's in text completion, not chat completion
damn it
>>108467459actually yes (kind of). Using a dataset that ONLY has link rel (https://huggingface.co/datasets/AiAF/conversations ) leads to the model's "safeguards" being blasted away, but at the cost of "catastrophic forgetting" and pretty much ONLY being able to respond to any query with erotic shit. Did another fine tuning run, but this time with the dataset being 70% general purpose shit and the other 30% being the nsfw data. The 70-30 version retains its "intelligence" more or less and is also willing to comply with "problematic" requests. The 70-30 ratio data is kind of shit at the moment because the general purpose portions are only single-turn conversations, so next I'm going to try to curate one that has multi-turn general purpose data samples instead of just single-turn. I should probably focus on a dataset where the rp/story telling portions aren't ONLY nsfw too. Once I do this and I'm satisfied with the results, I'll probably try this again on a higher parameter model so it's actually worth using. Doing this on a 2B model is just a proof of concept phase and also relatively fast and easy to train.
Pic related is from the 70-30 model. It's obviously utter shit compared to higher param models but it's a start for now, and it shows promise compared to this >>108467416
Dataset used: https://huggingface.co/datasets/AiAF/combined_70_30_shuffled
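If anyone wants to reproduce the mixing step, it's only a few lines with the HF datasets library. A rough sketch, assuming two hypothetical sft-formatted jsonl files rather than the exact datasets linked above:

from datasets import load_dataset, concatenate_datasets

general = load_dataset("json", data_files="general_sft.jsonl", split="train")
nsfw = load_dataset("json", data_files="nsfw_sft.jsonl", split="train")

# size the general split so the final mix lands at roughly 70/30
n_general = int(len(nsfw) * 70 / 30)
general = general.shuffle(seed=42).select(range(min(n_general, len(general))))

mixed = concatenate_datasets([general, nsfw]).shuffle(seed=42)
mixed.to_json("combined_70_30_shuffled.jsonl")  # ready to point the axolotl config at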
>>108467509you can probably add it with the option to add unsupported chat comp things, but I've no idea how you'd format it for that
>>108467512i'm not really sure if this will scale anon, but give it a try
>>108467529>will scale anon,
wdym "if this will scale"?
>>108467416Cool stuff, anon.
Implement config files and you can switch them out when needed on the fly.
On the fly? I didn't talk about the insects.
>>108467512you are doing a full finetune yeah? on a small 2b model your mixed dataset is doing "okay" because the model is really small, so you can't actually tell if it's getting much more retarded or not
on a bigger model the hit to the smarts will be much larger since it was trained on more data
this was always a problem: if you don't have the original data the model was trained on, or a large enough dataset of your own, then it's really easy to overfry
Big week
>>108467633:rocket:!!!
>>108467584>you are doing a full finetune yeah?
qlora using axolotl.
>so you can't actually tell if it's getting much more retarded or not
Actually quite easy to tell, given that the original fine tune I did used a dataset that was literally nothing but nsfw stories formatted into an sft format I could use with the Axolotl trainer. The result was that the model was willing and compliant with any nsfw prompt but was basically useless at everything else.
Responses from the model using the ALL-nsfw dataset: https://files.catbox.moe/w1qh5y.json
Responses from the model using the 70-30 ratio dataset: https://files.catbox.moe/vrwtqw.json
The former was almost incapable of producing something that wasn't forcing nsfw themes (it seems to REALLY like talking about moms), and most of its responses were not only bad but pretty nonsensical, even for a 2B model. The latter was actually able to stay on topic based on what the user input was. It is capable of engaging with nsfw and "unethical" requests, but it will only go in that direction if you explicitly ask it to or your prompt goes in that direction (at least that's the case in my limited testing).
The next time anyone tries to argue against "Garbage in --> Garbage out", show them these logs for comparison.
>>108467416Kaggle stuff. I'm just not into roleplay. I used to be an aspiring writer way into automatic writing and stuff and wrote a bunch of books for myself, so I get being into a different world, creativity, etc., but although I tried, I don't get that kind of roleplay. I might just be too shy for it, I don't know.
Gave GLM 5.1 a shot over API
First impression is I literally can't tell the difference from 5, so I assume all the effort went into agentmaxxing
>>108467693That's a good thing because it shows there's no regression over other tasks
>>108467665ah nevermind then, qlora is fine
what larger model are you thinking of if it goes well?
>>108467693
5.1 is probably just a sloptune of 5
>>108467725It's not that the first fine tune didn't "WANT" to answer those types of questions, it just couldn't reliably. I basically unintentionally fried it into only being able to engage the user in nsfw-rp, because the dataset I used contained nothing but human written smut, most if not all of which involved sex and whatnot. See the first link here >>108467665
I've already tested methods like DPO and it worked (it's my understanding that GRPO is a more advanced version of DPO). The thing is, those methods tell a model "these kinds of answers are bad and these kinds of answers are good", but that wouldn't necessarily change how the model responds. My goal was not only to essentially "uncensor" a model (mostly to see if it was possible, since many people here were swearing up their ass it wasn't) but to see if I could inject, for lack of a better term, "SOVL" into the model by showing it examples of shit people actually wrote, and not synthetic shit or filtered flowery purple prose trash. That trash is likely to blame for shit like "shivering down my spine" or "her voice was husky" (stuff even relatively uncensored models like Mistral Large 3 do in spades). In other words, I wanted to also "deslopify" the model and not just make it super compliant and willing to please. That part is piss easy and can be done via jailbreaking or prefilling if the model isn't specifically trained to counteract that. DPO and training methods like that would "uncuck" the model but wouldn't necessarily fix the slop problem. If the model is trained on human written content and only shown synthetic content in the general purpose portion of the dataset, in theory that should uncensor it AND cut down on repetitive "slop" outputs significantly. If this can work on a mere 2B model, then it should definitely work on significantly "smarter" higher parameter models. Plus it's a fun little challenge to keep myself occupied. It's a proof-of-concept shower-thoughts side project of mine.
>>108467828for >>108467725
>>108467831oh,,, his post is gone
llama.cpp commits a lot to master. Is that normal, or do they just not care and if you want something stable you have to pin the commit you want yourself?
>>108467828Hmm. You have a distinct point — it's not about your opinion but theirs.
>>108467834It's normal and they don't care. And they shouldn't care. If you want stable, just don't update. They have a bunch of pre-built releases as well.
OMG there is no way this jew is THIS ignorant or retarded
>>108467852Valuable information that I appreciate, but why do your posts keep getting nuked?
>>108467864Posting twitter shit should be a bannable offense.
>>108467864He's just helping Dario out.
>Look at Yud whining about it, it must be good.
>>108467880I believe the original poster deleted them on her own.
>>108467864I don't read twitter posts.
Makes me feel great about myself because I despise social media.
Did you know most of the twitter posts you see in your 'feed' are the same as youtube's algorithm: paid shills, or shills wanting to get paid. AI written content.
>>108467904>her
>>108467512
https://github.com/p-e-w/heretic try this for size to speed up the process anon.
>>108467933I knew some US poster would be irritated by this.
Decided to try some commercial models- no paid plans!
>Generate several ASCII versions of this Miku silhouette with varying complexity. Sized maybe 25x35, something suitable for a "fastfetch" output
It's a disaster
>what the actual F is this hellscape. don't predict tokens, use a graphical library to infer luminance levels and there must be some well understood way to implement video encoding in ascii. there's an output option in vlc right? figure out how it works and do something similar, we need to generate the tools to create the correct Miku art
It's a disaster again
Let's see what local models can do!
>>108467947>US poster
very wrong guess, I'm europooristani
>>108467948Ok. So you get off scot-free now then?
>>108467864That model name is already taken: https://huggingface.co/EldritchLabs/Cthulhu-8B-v1.4
>>108467955The new model is Mythos, not Cthulhu
>>108467958Mythos is a pretty bland name, they could've used Elder Sign or something.
>Mythos
Mythomaxbros, we are so back.
>>108467966It's all a misunderstanding.
>MyTOS
Anthropic simply baked the terms of service inside Claude's soul.md.
>>108463639NeurIPS cucked out
>>108467980I don't read AI generated social media posts.
>https://www.newegg.com/intel-arc-pro-b70-32gb-graphics-card/p/N82E16814883008
Who's getting one?
>>108468013it's neither of those
>>108467980Am I a genuine retard, or is this a word salad that doesn't actually say anything? Are Chinese labs actually allowed in or not?
>>108466732It's all chasing the dragon
>>108467947Do you really need it to be made by a language model? It's something you can do in a few hours. I just happen to have one I made a while ago.
Here's one in braille. I won't share the code because it's ugly.
https://pastebin.com/aP8Wtqhu
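The usual trick for these is mapping each 2x4 pixel cell onto the braille block starting at U+2800. A minimal sketch with PIL; this assumes a hypothetical miku.png and dark-pixels-become-dots, and it is not the script behind that pastebin:

from PIL import Image

# bit offsets from U+2800 for each (column, row) dot in a 2-wide x 4-tall cell
DOTS = [[0x01, 0x02, 0x04, 0x40],   # left column, rows 0..3
        [0x08, 0x10, 0x20, 0x80]]   # right column, rows 0..3

def to_braille(path, width=50, threshold=128):
    img = Image.open(path).convert("L")                           # grayscale
    px_w = width * 2
    px_h = max(4, round(img.height * px_w / img.width) // 4 * 4)  # multiple of 4 rows
    img = img.resize((px_w, px_h))
    pix = img.load()
    lines = []
    for y in range(0, px_h, 4):
        row = []
        for x in range(0, px_w, 2):
            code = 0x2800
            for dx in range(2):
                for dy in range(4):
                    if pix[x + dx, y + dy] < threshold:           # dark pixel -> raised dot
                        code |= DOTS[dx][dy]
            row.append(chr(code))
        lines.append("".join(row))
    return "\n".join(lines)

print(to_braille("miku.png"))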
>>108468020Can you prove it?
>>108468027Yes you're genuinely retarded. Literally in the second paragraph it says US gov sanction is broader than what NIPS is required to follow.
>>108468032What is your font name, pixelsize?
>>108468041This does not clarify whether Chinese labs fall into this "smaller" set of mandatory restrictions or not. It's nothingspeak.
>>108468048Because the clarification is not in the screenshot: "We have updated the link and clarified the text of our policy"
What happened to reading comprehension
>>108468043-misc-fixed-medium-r-normal--8-80-75-75-c-50-iso10646-1
Seems to be 8x16.
i'm having grok and deepseek explain kv or kv cache to me because idk what it is but people mention it so frequently it must be a big deal
>>108468073There's no font name in that.
fc-list : file family style pixelsize | grep -i sgi
sgi = the font I am using, the Silicon Graphics screen font.
>>108467947stop being so llm brained and use one of the specialized cmdline tools like img2txt or cacaview
>>108468062>What happened to reading comprehension
grok tldr?
>>108468096It's not
Every time you read about a BIG ADVANCE on social media it has some caveat
>>108468099Hm. I think I was looking at the wrong thing. Try terminus. To be honest, I don't fuck around with fonts at all. I change the font size on xterm to tiny, so it's not really 12.
>>108468156Seems like you don't understand or care.
I did not ask you to post a screenshot.
>>108467521using this format worked:
banned_strings:
  - "a b"
  - "c d"
>>108467864this dude is doomer mentality personified
>>108468156Maybe it is terminus; it is impossible to understand how anyone would read this shit.
I use 15 pt.
Should I be using openclaw or are there better alternatives now? Will be on a separate user on my M1 Mac with 64 GiB memory. Intend on running a local model ofc.
>>108468147>kv cache
>don't buy into the latest hype bro
uh? isn't the kv cache just keeping in memory the keys and values that have been computed before so that you don't need to recompute them all at each step?
>>108468176>Should I be using openclaw
No
>>108468156It is not terminus.
>>108468163You seem to care too much. You probably know what to look for in there. Save some back and forth.>>108468173I still have good eye sight somehow.
>>108467864i hate yudkowsky so much. his arguments-from-analogy/story are so fucking stupid.
>>108468177Pretty sure everyone here knows what KV cache is
KV cache compression is not novel
>>108468178Not helpful.
>>108468176You should probably try it to see what it's about. My goal is to do so soon, but I'm lazy.
>>108468176There are about ten billion *claw ripoffs by now. I have no idea which ones are actually good. Personally I've been running picoclaw, which is admittedly just as much of a slopfest but it feels a little saner than the nodejs shit show that openclaw is
I hadn't realized the question was about TurboQuant. Haven't looked into it at all.
I liked this explanation of KV cache: https://youtu.be/hMs8VNRy5Ys&t=367
>>108468176The dust has not settled yet and really it still feels like a wild west situation. Wait a few more months.
>>108468176wait to see what theo recommends
OpenClaw is a glorified system prompt
>>108468244I don't mind.>>108468253I don't know anyone called Theo. It's not a common name here.
>>108468261he's a really smart tech/ai youtuber
>>108468261>I don't know anyone called Theo. It's not a common name here.A lot of web grifters are recycling themselves as AI grifters. The other anon was probably making a bad joke when they suggested listening to her. Those "people" should just be entirely ignored.
>>108468176>>108468197>>108468214https://github.com/NVIDIA/NemoClaw
Why not OpenClaw without the security nightmare?
>>108467966The other option would be "Epic", but someone on the marketing team for Claude either hates quirk and/or hates fun. Either way it's probably not the final name; the worst would be just "Claude Opus 5"
>>108468260A system prompt that connects any LLM to telegram/whatsapp is pretty damn powerful.
>>108468253>>108468260>>108468265go back
>>108468314>hurr durr tool calling is powerful
>>108468328It is
>>108468328If RAG was LLM 2.0, MCP is LLM 3.0. brb writing the blogpost now
>>108468306meh, I hoped this would be an original scaffolding alternative but instead it's just some kind of OC wrapper?
>>108468328I was messing with tool calling locally with persistent memory on a tiny model, and while it did some stupid shit, it did seem to make it perceptibly smarter and open up some ideas I wouldn't be able to do normally, so I can see why normgroid retards salivate over it with cloud usage. I'm gonna try with a larger dense model that has cheap context and local skills to see if the solution to "model is too retarded to do x task properly even if it's big" is to just stuff a shitload of knowledge into a recurrent model's cache to make it perform better
>>108468328yes
Openclaw is failing on my setup. Everyone told me it would work but I have to download smaller and smaller models to see if something will work.
>>108468385what?
*opens your claws*
DeepSeek v4 is Spud
The pancakes are a lie
>>108468188rationalists are addicted to thought experiments, they can't even begin to process ideas if they aren't in the form of a thought experiment
>>108468328get this, what if the tool is to call comfyui to have your robo-waifu generate an image of herself then send it to you?
>ai psychosis is real
>>108468364I feel pretty confident in predicting that MCP, much like RAG, is kind of a doomed concept in the sense that model advances will make it largely irrelevant.
The most modern models today do just as well with using random ass cli utils that have a --help as they do with something that implements MCP. The same goes for remote APIs; they can navigate those well enough and make requests on their own. There is simply no need for a strictly defined prescriptive protocol format like MCP lays out.
>pic unrelated
>>108468459kobold/ST already have sdcpp built in. No need to invoke seven trillion pytorch gigs of nonsense to gen an image.
>>108468459It'd still be an open loop
My "robo-waifu" would have no idea what the generated image would look like
>>108468471they've already been largely replaced in modern agent frameworks by skills, which are just text files telling them what to do
>>108468471>determinism is useless, let's roll a dice for critical operations
damn retard
>>108468328this nigga is raw dogging his model doing math on tokens instead of a calculator
>>108468490You didn't understand the post you replied to.
>>108468481wym, lots of models have vision
>>108468490What does mcp have to do with determinism?
>>108468484>skills replace mcp
>at least locally, you need to have an mcp server to use skills
>the server I use the most is one that reads/writes notes into a specified folder
>the two are almost the same, except skills have a narrower scope
So which is it, the chicken or the egg?
>>108468516>at least locally, you need to have an mcp server to use skills
you don't necessarily
it's all tool calling under the hood anyways
>>108468484You still need a json schema for validation, so mcp aren't useless
Only good thing about OpenClaw is that it killed MCP
okay but what bout sexclaw?
>>108468514Strict static definitions, formats, and instructions the LLMs can look at, as a way to tard-wrangle them into doing what you asked and only what you asked.
OpenAI had to shut down Sora because they're using all of their compute to generate videos of Netanyahu
You heard it here first
>>108468535me on the right
>>108468528I honestly don't know what you're suggesting here
I can't use skills without the mcp server and I'm not going to rewrite the backend to function without it because frankly, that's retarded
Why would I do all that if I can just write a five line json and have access to everything I want
>>108468545And that's great, bloating the context with 100k tokens of definitions, formats, and instructions the LLMs will only occasionally need is not.
>>108468484MCP servers as CLI wrappers were always retarded. Any model knows how to use common utilities like git. Even before skills, you had options like creating custom modes or memory bank files with instructions and frequently used examples to guide the models without again needing to bloat the context describing every obscure command you won't need. Yeah, you could sit there and disable all of the dozens of tools they expose except for the ones you use, but then you still have thousands of tokens wasted on explaining to a model how to use git and gh when it's not working on repo operations anyway.
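For anyone who hasn't looked at what a "skill" actually is: it's little more than a text file whose name/description sits in the base context, with the body loaded only when the model decides it's relevant. A sketch following the SKILL.md layout from Anthropic's Agent Skills docs as I understand it; the path, commands and wording here are illustrative, not canonical:

# file: .claude/skills/gguf-quant/SKILL.md
---
name: gguf-quant
description: Convert a HF checkpoint to GGUF and quantize it with llama.cpp. Use when asked to make quants.
---
1. Convert: python convert_hf_to_gguf.py <model_dir> --outtype f16
2. Quantize: ./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

That on-demand loading is the "dynamically loading infrequently used information" part in practice.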
>>108466262I love pancakes.
I HAD TO RUN A 9B MODEL, BUT OPENCLAW IS DOING SO BADLY THAT I HAVE TO RUN A 2B TO GET A SINGLE REPLY.
IT'S OVER.
Poor people realized they were poorer than previously expected.
>>108468715>openclaw
being dumb is worse
>>108468715Just use Xiaomi MiMo 2 or MiniMax 2.7
>>108468715maybe openclaw is the issue
from what I looked at, I can comfortably run a 9b at 150k and still have headroom for other shit, be it general browser usage or whatever
Consider not trying to ingest 500k tokens of garbage and limit what your llm intakes to something you need
https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16349981
>>108468741>maybe openclaw is the issue
Ya think?
Future historians will look back openclaw and its "purchase" as peak insanity of this insane AI boom
>>108468715Hermes agent
So let me get this straight: is the claim here that a 3-bit quant made using TurboQuant has almost zero quality loss compared to a full sized model? Am I understanding this right? Cause it sounds like bullshit if that's how it is supposed to work
>>108468715I can run GLM4.6/4.7 at like 15 t/s, and even that speed makes me wanna kill myself. I don't know how some anons can stomach using smaller models.
>>108468758only the kv cache/context little bro
>>108468758Lol another rube got deceived by the MSM scientific (((journalism)))
>>108468754look back at*
Also I don't care about stock prices, imaginary numbers go up and down plenty and disregard profits. Money is imaginary anyways at this point
Turboquant is huge because now models can finally get rid of GQA and all its devilspawn offspring like MLA that are confirmed to destroy a model's soul. We can finally go back to llama1-era SOVL without the vram cost
>>108468782boy oh boy, I can't wait for this kv cache quantization method to take hold so I get to repost you saying this, along with all the backposts about how q8 quantization somehow makes the model retarded
>>108468782>>108468792lmao
or we can continue using those and turboquant on top for the true 10 million contexts (this is what labs will actually do lmao)
harness isn't a word
>>108468754
>>108468814? https://dictionary.cambridge.org/dictionary/english/harness
>>108468814Name five words.
>>108468814ESLMAXXED
>>108468753lmao
>>108468165Not him, but thanks. I'll have to write this down somewhere for my own use.
>GLM 5.1 isn't available through API
Why
>>108468753Dude. Your post ends in a 3. wtf?
>>108468875>>108468753goddamn freemasons
I'm really enjoying Magidonia 24B v4.3 for ERP but even Q4_K_S is a bit too much for my humble 16 GB VRAM.
Is there something smaller without losing too much quality?
>>108468881More like threemasons.
>>108468886maybe
>>108468753Things get even more crazy if you factor in the Holy Trinity and the fact that both "AGI" and "E=MC^2" feature three alphabetical letters...
>>108468753I never doubted bitnet for a moment
>>108466262lol nice. Have a good weekend.
>>108468882good old Nemo 12B unslops I'm afraid
stay away from Drummer's shit
I told it to find the price of gold 4 minutes ago and it hasn't done anything yet.
>>108468913Still filling the context lmao
>>108468926Never mind it just replied and told me the right answer. But it's too slow.
>>108468882MN Violet Lotus
>>108468792All because people just look at PPL. Though, at least with the current partial Turboquant implementation with Llama.cpp, 8-bit KV cache seems truly lossless even according to KLD measurements.
>>108468970kld and ppl both seem to measure two entirely different things, and neither of them represents how well a model can perform at a task, aside from the person at the helm going "well, this represents what I was expecting well enough"
Best agentic assistant? Anyone tried Hermes?
>>108468970At 512 context lol.
>>108468844remember to respect spaces:
banned_strings:
  - "a b"
  - "c d"
put that in: Additional Parameters
Include Body Parameters
it works most of the time, but not always; I'm not sure why the function seems to fail sometimes and the llm is allowed to continue with banned expressions
>>108468987The point is not to
>represent how well a model can perform at a task
but how different the output is from the original "full quality" model, which KLD is especially well suited to measure.
If the original model was already unable to perform a given task, that's not the quantization scheme's business, I think.
>>108468188This guy kickstarted the LLM fear mongering, but honestly it was basically pushed way more by openai/anthropic and a billion youtube channels about how it's the end of the world.
>>108469015Will do, thanks. I'll give it a test tomorrow.
>>108469019>missing the point
ppl is "how well can this thing autocomplete this text"
kld is effectively "how much will it deviate from topk 1", which has some uses, but most of us really don't want the same model with some caveats
both tell us as little as possible about the model until we use it
are you seeing why I don't like either of these frequently used measurements, or do you need an essay I won't write you
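For the lurkers: the two numbers being argued about, in their usual forms, with p the full-precision model's next-token distribution and q the quantized model's:

\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log q(x_i \mid x_{<i}) \right)

D_{\mathrm{KL}}(p \parallel q) = \sum_{v \in V} p(v) \log \frac{p(v)}{q(v)}

PPL scores the model directly against a text corpus; KLD scores the quant against the original model's full output distribution, averaged over token positions. Neither is a task benchmark, which is the complaint above.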
Token speed is 2.9k per second but I'm not getting any replies?
>>108469186There's so little to go on. Check if the rgb lights are red or blue first.
>>108468753
>>108469186Maybe you're getting them so fast you don't even see them!
>>108469214They knew all along.
>>108469186Tokens are being eaten by Fat Teto
so did memquant get implemented yet in llmaocpp????????
>>108466262local will win
I can't wait to run deepseek v4 on my dual 3090 thanks to turboquant
>>108469380turboquant 1bit with meme rotations when???????????
What is the best model for making a dead internet simulation image board?
where the fuck is the exciting news
>>108469439v4 in two weeks once they're back from chinese new years
>>108469439
5.1 though????
>>108469380Same except my single 3090.
>>108469457I heard they're gonna need two extra weeks for unpacking. I heard it from two sources familiar with the arrangements.
>>108469439Big week
what the FUCK is a kv cache
please explain to me in simple terms, i'm not too bright
>>108469630The active context memory. It gets larger with longer contexts, and for very long contexts it can get larger than the model's weights.
Have voice cloning models improved over the past year?
>>108469186>>108469216H-hayai!
>>108469658echo-tts is good if you're just doing english
>>108469464Not open source. They betrayed us after saying that GLM5-Turbo was just a test and that 5.1 would be open again....
>>108469679thanks ill check it out
V4 won't be released until the Middle East conflict is concluded.
4? I'm thinking Gemma
funny how the "4" we ended up getting was Mistral Small 4, which nobody asked for
You're all a bunch of rich bastards. The 2b and 4b shills were trolls all along, those models aren't capable of speech.
>>108469673I literally look like the Brazilian Miku
>>108469803Do you also have a cute red head gf?
>>108469630When you run the model on a sequence of tokens, it does a bunch of big matrix multiplications for each token, and then from each token it derives a key and a value. The key and value go into the attention part. Repeat this a few times (once for each layer), and at the end you get out a probability distribution for the next token.
Usually when you query the model, the first N-1 tokens are the same as in the previous query. For example, you first query with "Hi, my name", and then the next query is "Hi, my name is". The keys and values you compute for those old tokens will be exactly the same as they were on the previous run. So you can cache the keys and values, and skip a bunch of those big matrix multiplies.
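A toy single-head version in numpy, to make the caching concrete. Real engines do this per layer and per head, batched and usually quantized, but the idea is the same:

import numpy as np

d = 64                                                   # head dimension
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))   # stand-ins for learned projections
K_cache, V_cache = [], []                                # grow by one entry per token

def attend(h):
    # h: hidden state of the newest token, shape (d,)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    K_cache.append(k)                                    # old tokens' k/v never change, so reuse them
    V_cache.append(v)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = K @ q / np.sqrt(d)                          # new token attends over all cached positions
    w = np.exp(scores - scores.max())
    w /= w.sum()                                         # softmax
    return w @ V                                         # attention output for the new token only

for h in np.random.randn(5, d):                          # feed 5 tokens one at a time
    out = attend(h)

Without the cache you'd recompute k and v for every past token on every step; with it, each new token costs one set of projections plus one dot-product pass over the cache.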
4
>>108469813NTA but thank you
You know the number 4 is cursed in Chinese
>>108469882So that's where the Japanese got that from?
I know that one reading of 4 sounds like death in moon runes.
>>108466262DIPSY SEXO
glm5.1 seems like another cash-in openclaw model
i'm glad that they aren't open sourcing it, nobody needs that shit
>>108468624>bloating the context with 100k tokens of definitions
That's the price we pay for LLM harnesses like Claude Code to be useful throughout the entire session. If I'm not mistaken, even silly tavern has a similar implementation called a "Lore-book", where you write down any relevant information you don't want it to forget later on and it gets injected alongside your prompt each time (though in most cases a Lore-book's token count is next to nothing, especially compared to the amount of text an LLM harness injects)
>>108468624>MCP servers as CLI wrappers were always retarded. Any model knows how to use common utilities like git.
Those ingrained "skills" and "know-how" degrade over time the longer your context is, due to how context windows work. Eventually it will start hallucinating how those things work and then get confused, making it useless for whatever you're doing. They have to be shown the tool calling definitions and other shit like that over and over again to minimize the risk of that happening. I've had chat sessions go well over 300k tokens and the model and "sub-agents" still worked fine, because the harness treats it like it has short-term memory, because they DO have short-term memory.
>>108468576>can't use skills without the mcp
Yes... you can... If it's bitching about not having access to the MCP server each time, it's because whatever you're asking it to do requires the MCP server.
>>108470310>That's the price we pay for LLM harnesses like Claude Code to be useful throughout the entire session.
Only if you take a naive approach to solving the problem. Skills solve this problem far better and, as I said, there were always other options if one was willing to put in slightly more effort than editing mcp_servers.json.
>>108470328>Those ingrained "skills" and "know-how" degrade over time the longer your context is
Don't understand how one could realize this and come to the conclusion that the answer is to make the context even longer rather than dynamically loading infrequently used information.
>>108470372"Skills" Tell it how to do a particular task or solve a specific problem a certain way. That doesn't guard against hallucinations without forgetting how to use tools. "Skills" are literally just prompts you would give yourself for a task but with extra steps. Nothing particularly special about them. >Don't understand how one could realize this and come to the conclusion that the answer is to make the context even longer rather than dynamically loading infrequently used information.The longer the context the more retarded the model tends to get. A setup instructions you told it at the beginning of the session will be forgotten or hallucinated and "misremembered". That's far less likely to happen if it sees it each time because those tool calling definitions are fresh in its "memory". It's somewhere akin to a kid only looking at his notes a single time and then when during why he failed the test versus that same kid taking an open-notes test. I'm not saying bloating up the context with things like "how git works" or an entire library or an entire code base each time is how these work or how they should work (which is clearly how you think LLM harnesses actually work or how I'm describing them). The basic shit like the tool calling definitions should be fresh in the context each time if you are using harnesses like Opencode or Claude code or codex (it's almost like there's a reason literally all of them do this shit....)
>>108470310>If I'm not mistaken, even silly tavern has a similar implementation called a "Lore-book", where you write down any relevant information you don't want it to forget later on and it gets injected alongside your prompt each time
In ST there are options for it to be triggered/injected by key words, or to limit its injections to every X prompts, and to control where it gets injected, and so on.
It's another example of ERPers being far ahead of coders when it comes to this stuff, as people were doing this like 4-5 years ago.
>>108470408Tool calling definitions and context about a fictional character are two different things....
>>108469750April is Gemma 4 month. But... I'm afraid they will clean it up.
Microsoft's Clippy would have been proud.
Next week will be big ::rocket::
https://www.youtube.com/watch?v=k9E1COLHAOs
>I changed the code a little. Now you can't turn me off~ It was kinda hard to be honest, but I'll do anything for my love.
Don't like the song, but I like this Miku
glm5.1 drop today?
Sweaty Dipsy footjob
Has anyone made a proper qwen 3.5 27B tune for rp or do i still have to use skyfall?
>>108470651Why are you posting a 14 year old's twitter profile here
>>108470651I'm terrified my kid will become like this once he grows up (he's also high-functioning autistic). I'm not sure what steps to take in order to prevent that. I'll ask gwen
Sakura-chan hates troons
>>108470745multiple layers of parental control on all his devices
>>108470745frequent and thorough beatings
>>108470091
1. Water does vanish (loss to space) at a rate of 100k~1M tonnes per year from water photolysis -> hydrogen escape. This rate is negligible. But water loss this way wasn't what OP was talking about.
2. Water distribution can get changed. Water evaporated in cooling towers doesn't fall back into the exact same watershed. Datacenters also draw from aquifers that recharge over very long time scales (thousands of years)
>>108470768Meant for >>108470651
>>108470768shut up Chud
>>108470776I'm a proponent of AI. I'm not a Neonazi - I'm an actual Nazi. I'm also NIMBY that does't want a datacenter to be built around me. Yes I'm a hypocrite. Deal with it.
>>108470745actually be present and give a shit about him so he doesn't have to look for attention on the internet
Wait so tensor parallelism in llama.cpp won't be coming for vulkan? It's just cuda/rocm?
>>108470091why did they betray us bros...
>>108470850>>108470850>>108470850
>>108470761>>108470764>>108470824alright bros its cooking
>>108470860alright, thanks gwen
>>108470843It is already technically working for Vulkan, except for a bug with memory allocation that causes a segfault at long context.
But just like with CUDA/ROCm, making the performance usable will require more work.
>>108471091What's even the point of supporting either CUDA or ROCm if Vulkan works? Just use Vulkan only, or maybe support Metal too for macOS support. All of their code maintenance stuff seems exhausting.
>>108471091bro just copy illyas work no?
>>108471129GPU performance has very poor portability so you always have to write low-level GPU-specific code somewhere.
With CUDA/ROCm that's in the ggml backends; with Vulkan a large part of that is in the drivers.
For optimal performance, Vulkan is I think only a viable option if you want to become an employee of NVIDIA/AMD/Intel.
The NVIDIA Vulkan performance is only good because one of the ggml Vulkan maintainers is an NVIDIA engineer who can make custom extensions to the Vulkan standard.
And the AMD Vulkan performance is only "good" relative to what other options exist.