/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108459276 & >>108453570

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: v4.jpg (112 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108459276

--Voxtral TTS release and initial impressions:
>108459652 >108459758 >108459766 >108459836 >108459844 >108459888 >108459902 >108461249 >108462139 >108459995 >108460450 >108460456
--Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models:
>108464620 >108464631 >108464656 >108464739 >108464755 >108464808 >108464836 >108464856 >108464905 >108464929 >108465010 >108465032 >108465028 >108465052 >108465072 >108465094 >108465163 >108465186 >108465191 >108464865 >108464644 >108464662 >108464680 >108464697 >108464682 >108464701 >108464750 >108464764 >108464803 >108464862 >108465009 >108465026 >108465073 >108465080 >108465139 >108465432 >108465850
--KLD heatmaps reveal hidden degradation in aggressive KV cache quantization:
>108463990 >108464591 >108464625 >108464635 >108464627
--Mistral releases open-source Voxtral TTS:
>108459318 >108459428 >108459525 >108459563 >108461953 >108461978 >108462002 >108462064 >108462078 >108462107 >108462503
--GPU coil whine interferes with guitar amp, TTS voice cloning comparisons:
>108460183 >108460208 >108460218 >108460232 >108460247 >108460881 >108460901 >108460910 >108461004 >108460928 >108460947 >108460975 >108461005 >108461025 >108461047 >108461080 >108462135 >108462264 >108462944 >108462961 >108462982 >108463064
--Models handling verbatim lyric requests differently due to alignment:
>108464911
--Evaluating TTS demo quality:
>108459914 >108459947 >108459956 >108460605 >108460650
--Chroma Context-1 model released without harness:
>108463927 >108463946
--Z.ai 5.1 open-source release expected early April:
>108465382 >108465454 >108465751
--Qwen3-TTS and VibeVoice resources:
>108459560
--Miku and friends (free space):
>108460212 >108462728 >108462782 >108465571 >108460736 >108461211 >108461256 >108461280

►Recent Highlight Posts from the Previous Thread: >>108459279

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108466262
Blessed small saggers thread.
>>
>>108466278
>small
>>
>>108466262
Whatever you do, DONT read any manga drawn by Ankoman.
>>
>>108466262
Saggy tits eewww
>>
>>108466254

>Make it make sense anon. They are directly influencing how AI functions
Apple is too incompetent to ship ANY useful model anon..... their influence is non-existent
>>
File: file.jpg (410 KB, 1408x768)
>>108466254
>that one can't make fun of in a meme format on AI image gen platforms with the Apple logo.
???
>>
File: Deepseek is coming.png (327 KB, 1080x880)
>>108466262
>that image
Deepseek is coming
>>
The WHALE
>>
>>108466327
>deepseek is coming
NO, I am
>>
>>108466321
There are certain platforms based in silicon valley that prevent anti-Apple shenanigans.
>>
>>108466327
it's a lot larger, but we can run models six times larger than we used to thanks to Google Turboquant™
>>
File: file.jpg (505 KB, 1024x1536)
>>108466337
previous was gemini, now here's chatgpt on the same prompt
there's no conspiracy against shitting on apple in your SaaS prompts (or LM studio or whatever else), you need your meds
>>
>>108466352
>>108466321
You need to fuck off to one of the image gen generals.
>>
>>108466371
you need to fuck off to leddit with your amazing reading comprehension
>>
>>108466378
You've been posting your retarded image slop with the same character for days and it's not even local.
>>
>>108466383
>You
I am not the guy you're referring to, I just decided to reuse that character since I found it funny. Try again, retard.
>>
I hate that I have to build my own 'FETCH' mcp tool because the autists have made it respect the robots.txt
>>
>>108466393
That's irrelevant because your posts are still shit.
>>
>>108466397
>the autists have made it respect the robots.txt
you are a blight upon the earth
>>
qwen 122b-a10b changed my life
>>
Tetonation incoming... I am sensing Gemma 4 release before Easter.
>>
>>108466410
For the worse?
>>
>>108466397
Why build one from scratch instead of forking the existing one?
https://github.com/modelcontextprotocol/servers/blob/main/src/fetch/README.md
>>
>>108466415
I forked this already of course
>>108466405
im gonna rape your website, retard
>>
Something will probably happen at some point. Or not. That's my prediction. Screenshot this post.
>>
>>108466410
How many times did your ego die?
>>
>>108466413
for the better, I've been pasting giant research papers into it and asking it to explain them in simple terms so I can actually implement shit without having a PhD. It's going great. I've finally found a good balance between being an AI luddite and letting my brain atrophy by expecting AI to do all the thinking for me.
>>
File: 1763977021694160.png (47 KB, 691x528)
>>108466415
bro...
>>
>>108466423
nostrildamous over here with the takes
>>
>>108466433
I knew you would say that.
>>
>>108466400
let me guess, iToddler?
>>
Is $400 for a v620 okay?

The listing said 16gb, but amd never made any 16gb v620s, did they? About as strong as a 6800xt....
>>
>>108466427
Using AI for inspiration did wonders for me but i fucked up letting it write code without going through the steps and thinking about it critically. I feel like im a worse programmer as a result.
>>
>>108466432
https://github.com/modelcontextprotocol/servers/blob/main/src/fetch/README.md#customization---robotstxt
it's right in the instructions too, but vibecoding is the solution to everything now fuck reading
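For anyone who actually reads the doc: the customization section linked above describes disabling the robots.txt check with an `--ignore-robots-txt` argument. A minimal client config might look like this (a sketch assuming the standard `mcpServers` JSON shape and a `uvx` install; check the README for the exact flag):

```json
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch", "--ignore-robots-txt"]
    }
  }
}
```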
>>
>>108466496
>reading doc instead of source
I'm not a nocode shitter sorry :(
>>
>>108466262
wowzers, how do i make my own goonbait with a local model? im on an asus ultrabook with no GPU btw, just intel graphics
>>
it isnt this, it is this
they arent that, they are that
>>
Do venvs use system cuda or is the cuda toolkit packed somewhere inside it?
>>
ggnigeranov TURBOQUANT KV + WEIGHTS SUPPORT WHEN!?!?!?!
>>
>>108466469
Don't know about amd cards, sry.
16gb nvidia card can be had on ebay for $450-$500.
>>
So what happened? Youtube video about turbo quant?
>>
>>108466547
i think maybe no by default. i had to use --no-build-isolation when i compiled flash attention.
>>
File: ram.png (269 KB, 531x435)
I wonder if this is another nothingburger to raise the company stock, or an actual advancement that can benefit local AI.
>>
File: 1763613215750901.png (60 KB, 907x351)
this guy has a 'living rent free in your head' problem lmao
>>
File: kek.png (102 KB, 778x646)
https://www.reddit.com/r/LocalLLaMA/comments/1s56q9g/new_unsloth_studio_release/
>>
>>108466588
Would IK really have been able to "independently discover" Hadamard transforms without the easy-to-follow implementation in ExLlama?
>>
>>108466588
That's called having an inferiority complex.
>>
>>108466593
What a bunch of hacks
>>
>>108466588
I would probably end up using his fork if he wasn't such a fucking baby.
Or maybe he would still be contributing to mainline if he wasn't.
>>
>>108466554
6800 XT (500 GB/s) is about the same performance as a 5060 Ti (450 GB/s), according to random benchmarks on the internet. So basically a 32 GB 5060 Ti without CUDA for $400.
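For a rough sanity check on those bandwidth numbers: single-stream decode speed is roughly capped by memory bandwidth divided by the bytes streamed per token. A back-of-the-envelope sketch (a common heuristic, ignoring compute, caches, and batching):

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on single-stream decode speed: every generated
    token has to stream the whole (quantized) model from VRAM once."""
    return bandwidth_gb_s / model_gb

# e.g. a ~16 GB quant on a 500 GB/s card tops out around 31 t/s
print(max_tokens_per_second(500, 16))
```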
>>
>>108466600
goated reference there my friend.
also checked
will cudadev be able to EVER recover, pipeline parallelism bros???
>>
File: 1774361027986.png (11 KB, 481x77)
>>108466604
nah they're fine, weird they didn't mention this in the release notes though...
>>
>>108466588
Everything is a personal affront to him. Very unstable.
>>
>>108466547
System cuda refers to the CUDA compiler toolkit, which is totally different from pytorch.
Venvs use whatever pytorch version you have installed. Pytorch interfaces with your graphics drivers.
>>
>>108466632
Yes, but the package is cu128 while I have CUDA 13. It still runs, but I want to know if this is something worth looking into in the future in case of potential issues.
>>
>>108466645
doesnt matter bro, cuda is made to BUILD, that shit will run in your DRIVERS for fucks sake you stupid mongoloid.
>>
>>108466645
You will only run into problems if your graphics drivers don't support the pytorch version. In any case don't worry about it as long as pytorch is recent enough.
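To check concretely which CUDA runtime a venv's wheel bundles, something like this works (a diagnostic sketch with a hypothetical helper name; the cuXXX wheels ship their own CUDA runtime libraries, so the system toolkit mostly matters for building extensions):

```python
def cuda_report() -> str:
    """Report which CUDA runtime the installed torch wheel bundles.
    Wheels like cu128 carry their own runtime; the driver just has to
    be new enough to support it."""
    try:
        import torch
    except ImportError:
        return "torch not installed in this venv"
    return (f"bundled CUDA runtime: {torch.version.cuda}, "
            f"driver usable: {torch.cuda.is_available()}")

print(cuda_report())
```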
>>
Why can't they, like, just stick a tb of vram on a single gpu?
>>
>>108466686
that would be antisemitic. think of the poor shareholders!
>>
>>108466686
You wouldn't pay their asking price.
>>
>>108466690
>>108466692
I'm a shareholder. Where's my special nvidia discount?????
>>
>>108466583
This is so retarded. It's just like a few months ago when Gemini 3 released and day traders were spamming that Nvidia was done because Google had just developed a GPU-killer called T-P-U.
>>
>>108466657
thanks mongoloid lover anon
>>
File: 1758190134007445.gif (562 KB, 200x200)
>>108466593
These unsloth and redditards are really a match made in heaven
>>
>>108466604
maybe but no one else seems to properly do various quant gguf for everything like they do
>>
>>108466757
>properly do
LMAO
>>
>>108466764
I don't get the Unsloth hate. Isn't that two dudes doing work that 99.999% of people couldn't do? They're doing more to democratize llm finetuning than anyone else. In the talk he's given, Daniel Han also seems a bit high-strung, but competent.
>>
>>108466764
[various quant]<-(properly do)
or
[gguf]<-(properly do)

how interpret?
>>
>>108466771
daniel why the fuck are you posting here? go be clueless about your shit somewhere else.
>>
>>108466771
>couldn't do
Don't *want* to. You don't need to load the full model to quant it.
>>
>>108466779
Are you doing training at all or just using and RPing? Unsloth is an easy-to-use library, it is helping me. Maybe I haven't found something better yet, but I've tried Axolotl, Llama-factory and just Transformers, as well as plain Pytorch, and Unsloth is simple, the notebooks are great to show what to do, etc.
>>
>>108466792
I was talking in general with Unsloth, just not gee-gee-u-huff-ing the mdoels.
>>
>>108466809
not just guh-guffing the model I meant, can't spell today.
>>
>>108466802
>the notebooks
Jupyter is the worst fucking trash ever invented.
>>
>>108466818
Yes we should all code in emacs.
>>
>>108466809
They can't be trusted to make a quant without reuploading it 20 times. I wouldn't trust them with training.
>>
File: 1753574759173290.gif (140 KB, 379x440)
>>108466802
NTA, but I did some training and Unslop paywalling their multiGPU feature makes it almost useless if you're not finetuning GPT2. Axolotl is way more flexible, has actual competent devs you can talk to. Stop being retarded.
>>
>>108466835
It's not paywalled (anymore at least). I just ran it on 2x and 4x GPUs, launching the script with accelerate, and it all went super smoothly.
>>
>>108466826
I write all of my code with Doom Emacs and a German keyboard layout.
>>
>>108466826
I miss vim, but after I tried to finish a rewrite of my project I just can't. It always somehow pastes wrong, and there are all these little things, like when you hit escape (which is my capslock) the cursor jumps back one character.. so fucking irritating. My .vimrc is pretty long and I have used it for years now.
I can't recommend vim to anyone unless you are working over a terminal I guess.
Emacs isn't that much better.
This software is used by autists who don't know what ergonomics means and don't mind pressing 4 different keys to get simple functionality.
>>
>>108466852
nta. I use vim, but based anyway. You're too cool.
>>
>>108466854
To add: and then there is the sunk cost fallacy. Just because you have used something for years doesn't mean you can't ditch it and get something better.
>>
>>108466863
I use vim when I log into something through the terminal, but locally I use VSCode and notebooks. I am trying to bring myself to switch to marimo to vibecode-maxx since current LLMs have difficulty working with Jupyter, just haven't had the will to yet.
>>
>>108466872
I mean writing should be something that is intuitive, not hidden behind multiple keystrokes and guesswork about whether my selection will paste correctly or not.
>>
>https://github.com/ggml-org/llama.cpp/pull/21074
BLACKWELL BROS
WE WON!
>>
File: AI.png (110 KB, 533x433)
Has the era of TURBO-QUANT started?
>>
>>108466930
>>108466732
>>
>>108466930
literal fake news lmao
>>
>>108466930
>>108466941
AI-pushed fake to get the normies and luddites off their back
>>
Additionally, whatever memory savings this nets will be immediately deleted by vibecoded pajeetware bloatmaxxing.
>>
>>108466941
The only thing that matters is for the retarded collective will of the market to somehow buy all this and bring cheap RAM back.
Don't ask me how that would work, I still don't even know how OAI managed to buy this much memory while not having any money to pay for it.
>>
>>108466930
Honestly I'm all for people overestimating what turboquant actually does. Win for everyone.

I'm actually looking forward to it being implemented in lcpp. Big context is based.
>>
>>108466930
Wut? The whole market's tanking right now.
>>
File: 1752997653634222.png (93 KB, 665x361)
Where are those 3000 assets

I've only seen the leaked Mythos webpage
>>
>>108467034
this sounds so fucking fake
sonnet, opus... capybara? what is this, meta?
>>
>>108467034
Sounds like a benchmark twitter endorsement.
Every new model is always a few % better
always and ever
But this time it's a rumour and "leaked documents".
>>
>>108467042
Anthropic and OpenAI are both promising a soon to come breakthrough https://www.youtube.com/watch?v=s4tptozUJ8Y

It might be a nothing burger, but it's been two years since reasoning, so who knows.
>>
came for droopy tits
>>
>>108467106
*to
>>
was there big news why thread fast
>>
>>108467141
Google made models 6x smaller
>>
>>108467150
like in theory or can I download something to run some 200gb model on my 3090s
>>
File: 1761681185788405.jpg (29 KB, 600x733)
>>108467150
>>
>>108467167
>>108467169

>>108466930
>>108466583
>>
>>108467065
>reasoning
Isn't this just autoprompting?
>>
>>108466930
isnt that just for kv cache (context)? lol, context rot is still a thing so enjoy the slopped 1M context
>>
File: snip132.png (85 KB, 1490x399)
huge llama.cpp update: every purchase of llama-server comes with a complimentary footgun
>>
>>108467127
*on
>>
>>108467191
I bet comfyui manages to use this to brick people's PCs by accident
>>
>trynna force ldg drama again
>>
>>108467181
I'd argue there was a huge shift when O1 came out versus "Let's think step by step". Before O1, there was chain of thought, program of thought, forest of thought and a bunch of other prompting strategies, but "reasoning" made it switch to another level. The model started to stay on track a lot more, etc.
>>
>>108467191
>llama-server
We use vLLM here
>>
I heard Google made boobs 6 times saggier
>>
>>108467269
fake news, only nipples
>>
>>108467243
>We use vLLM here
we as in you and the one other guy with the bitcoin mining rig with 8 ewaste 3090s?
>>
>>108466930
Gemma 4 bros.... I dont feel so good....
>>
>>108466262
just the right level of sagging to make it maximally erotic
>>
>>108467303
talking about gemma, I'm always surprised by constantly seeing it place high in current benchmarks against newer models, kek
>>
>>108467279
>replying to the obvious bait
>>
File: 1755899588145690.jpg (98 KB, 1170x1117)
Can I use koboldcpp antislop feature with sillytavern as the frontend or do i need kobold as frontend too?
>>
>>108467303
Google was going to release it, but then qwen3.5 dropped and mogged it, so they delayed it.

Many such cases.
>>
>>108467381
It has better writing capabilities. That doesn't say too much. It's also biased. Try changing your 'gender' to female in the same scenario and see how much the narrative changes.
When you do a q&a with the model, it replies in a different fashion depending on whether you are male or female.
It's funny but the behaviour is there.
>>
>>108466845
Fellow slop-tuner here. What are you training models to do?
>>
>>108467416
Bro, you have a mental illness.
>>
File: f.png (8 KB, 401x60)
>>108467399
>Can I use koboldcpp antislop feature with sillytavern as the frontend
yes it taps into banned strings
>>
>>108467416
>She has to be on birth control because there is no way that could fit in there and not get stuck inside of her womb!
huh?
>>
>>108467444
?huh
>>
File: Perhaps.png (379 KB, 500x522)
>>108467421

>>108467444
slop-tuned 2b model. Testing to see if I can train it to be less retarded with better dataset curation
>>
>>108467455
uhuh
is it working?
>>
File: 1750637897414001.png (21 KB, 717x244)
>>108467422
thanks anon, but where is that? is that an extension?
all I see is logit bias handling
>>
File: f.png (41 KB, 404x372)
>>108467473
just above logit bias for me, show your connection profile?
>>
>>108467491
oh I see, it's in text completion, not chat completion
damn it
>>
>>108467459
actually yes (kind of). Using a dataset that is ONLY nsfw (link rel: https://huggingface.co/datasets/AiAF/conversations ) leads to the model's "safeguards" being blasted away, but at the cost of "catastrophic forgetting" and pretty much ONLY being able to respond to any query with erotic shit. Did another finetuning run, but this time with the dataset being 70% general purpose shit and the other 30% being the nsfw data. The 70-30 version retains its "intelligence" more or less and is also willing to comply with "problematic" requests. The 70-30 ratio data is kind of shit at the moment because the general purpose portions are only single-turn conversations, so next I'm going to try to curate one that has multi-turn general purpose samples instead of just single-turn. I should probably focus on a dataset where the rp/storytelling portions aren't ONLY nsfw too. Once I do this and I'm satisfied with the results, I'll probably try this again on a higher parameter model so it's actually worth using. Doing this on a 2B model is just a proof-of-concept phase and also relatively fast and easy to train. Pic rel is from the 70-30 model. It's obviously utter shit compared to higher param models, but it's a start for now and shows promise compared to this >>108467416
Dataset used: https://huggingface.co/datasets/AiAF/combined_70_30_shuffled
>>
>>108467509
you can probably add it with the option to add unsupported chat comp things, but I've no idea how you'd format it for that
>>
>>108467512
i'm not really sure if this will scale anon, but give it a try
>>
>>108467529
>will scale anon,
wdym "if this will scale"?
>>
>>108467416
Cool stuff, anon.
Implement config files and you can switch them out when needed on the fly.
On the fly? I didn't talk about the insects.
>>
>>108467553
you are doing a full finetune yeah? on small 2b model your mixed dataset is doing "okay" because the model is really small so you can't actually tell if it's getting much more retarded or not
on a bigger model the hit to the smarts will be much larger since it was trained on more data
this was always a problem, if you didn't have the original data the model was trained on, or didn't have a large enough dataset of your own, then it's really easy to overfry
>>
Big week
>>
>>108467633
:rocket:!!!
>>
>>108467584
>you are doing a full finetune yeah?
qlora using axolotl.

>so you can't actually tell if it's getting much more retarded or not
Actually it's quite easy to see, given that the original finetune I did used a dataset that was literally nothing but nsfw stories formatted into an SFT format I could use with the Axolotl trainer. The result was a model that was willing and compliant with any nsfw prompt but nearly useless at basically everything else.

Responses using the model using the ALL nsfw dataset: https://files.catbox.moe/w1qh5y.json

Responses using the 70-30 ratio dataset: https://files.catbox.moe/vrwtqw.json

The former was almost incapable of producing something that wasn't forcing nsfw themes (it seems to REALLY like talking about moms), and most of its responses were pretty nonsensical even for a 2B model. The latter was actually able to stay on topic based on what the user input was. It is capable of engaging with nsfw and "unethical" requests, but it will only go in that direction if you explicitly ask it to or your prompt goes in that direction (at least that's the case in my limited testing). The next time anyone tries to argue against "Garbage in --> Garbage out", show them these logs for comparison.
>>
>>108467416
Kaggle stuff. I'm just not into roleplay. I used to be an aspiring writer way into automatic writing and stuff and wrote a bunch of books for myself, so I get being into a different world, creativity, etc., but although I tried, I don't get that kind of roleplay. I might just be too shy for it, I don't know.
>>
Gave GLM 5.1 a shot over API
First impression is I literally can't tell the difference over 5, so I assume all the effort went into agentmaxxing
>>
>>108467693
That's a good thing because it shows there's no regression over other tasks
>>
>>108467665
ah nevermind then, qlora is fine
what larger model are you thinking of if it goes well?
>>
>>108467693
5.1 is probably just a sloptune of 5
>>
File: hmmmm.jpg (12 KB, 300x300)
>>108467725
It's not that the first fine tune didn't "WANT" to answer those types of questions, it just couldn't reliably. I basically unintentionally fried it into only being able to engaging the user in nsfw-rp because the data said I use contained nothing but human written smut, most if not all of which involved sex and whatnot. See the first link here >>108467665
I've already tested methods like DPO and it worked (it's my understanding that GRPO is a more advanced version of DPO). The thing is, those methods tell a model "these kinds of answers are bad in these kinds of answers are good" but that wouldn't necessarily change how the model responds. My goal was not only to essentially "uncensored" a model (mostly to see if it was possible since many people here were swearing up their ass it wasn't possible) but to see if I could inject, for lack of a better term, "SOVL" into the model by showing it examples of shit people actually wrote and not synthetic shit or filtered flowery purple pros trash That's likely to blame for shit like "shivering down my spine" or "her voice was husky" (stuff even relatively uncensored models like Mistral Large 3 do in spades). In other words, I wanted to also "deslopify" the model and not just make it super compliant and willing to please. That's piss easy and can be done via jailbreaking or prefilling if the model isn't specifically trained to counteract that. DPO and training methods like that would " uncuck" the model but that wouldn't necessarily fix the slop problem. If the model is trained on human written content and only shown synthetic content in the general purpose portion of the data set, in theory that should uncensor it AND cut down on repetitive "slop" outputs significantly. If this can work on a mere 2B model Then this should definitely work on significantly "smarter" higher parameter models. Plus it's a fun little challenge for me to keep myself occupied. It's a proof of concept shower-thoughts side project of mine.
>>
>>108467828
for >>108467725
>>
>>108467831
oh,,, his post is gone
>>
llama.cpp commits a lot to master. Is that normal, or do they just not care, and if you want something stable you have to check out the commit you want yourself?
>>
>>108467828
Hmm. You have a distinct point — it's not about your opinion but theirs.
>>
>>108467834
It's normal and they don't care. And they shouldn't care. If you want stable, just don't update. They have a bunch of pre-built releases as well.
>>
File: 1770644243176391.png (489 KB, 1034x1482)
OMG

there is no way this jew is THIS ignorant or retarded
>>
File: 1745692294667385.png (107 KB, 325x590)
>>108467852
Valuable information that I appreciate, but why do your posts keep getting nuked?
>>
>>108467864
Posting twitter shit should be a bannable offense.
>>
>>108467864
He's just helping Dario out.
>Look at Yud whining about it, it must be good.
>>
>>108467880
I believe the original poster deleted them on her own.
>>
>>108467864
I don't read twitter posts.
Makes me feel great about myself because I despise social media.
Did you know most of the twitter posts you see in your 'feed' are the same as youtube's algorithm - paid shills, shills wanting to get paid, or AI-written content?
>>
>>108467904
>her
>>
>>108467512
https://github.com/p-e-w/heretic try this for size to speed up the process anon.
>>
>>108467933
I knew some US poster would be irritated by this.
>>
File: mikuwall.png (29 KB, 1920x1080)
Decided to try some commercial models- no paid plans!
>Generate several ASCII versions of this Miku silhouette with varying complexity. Sized maybe 25x35 something suitable for a "fastfetch" output
It's a disaster
>what the actual F is this hellscape. don't predict tokens, use a graphical library to infer luminance levels and there must be some well-understood way to implement video encoding in ascii. there's an output option in vlc right? figure out how it works and do something similar, we need to generate the tools to create the correct Miku art
It's a disaster again
Let's see what local models can do!
>>
>>108467944
>US poster
very wrong guess, I'm europooristani
>>
>>108467948
Ok. So you get off scot-free now then?
>>
>>108467864
That model name is already taken: https://huggingface.co/EldritchLabs/Cthulhu-8B-v1.4
>>
>>108467955
The new model is Mythos, not Cthulhu
>>
>>108467958
Mythos is a pretty bland name, could've used Elder Sign or something.
>>
>Mythos
Mythomaxbros, we are so back.
>>
>>108467966
It's all a misunderstanding.
>MyTOS
Anthropic simply baked the terms of service inside Claude's soul.md.
>>
File: 1762428905243034.png (127 KB, 1291x577)
>>108463639
NeurIPS cucked out
>>
>>108467980
I don't read AI generated social media posts.
>>
>https://www.newegg.com/intel-arc-pro-b70-32gb-graphics-card/p/N82E16814883008
Who's getting one?
>>
>>108468013
it's neither of those
>>
>>108467980
Am I a genuine retard or this is a word salad that doesn't actually say anything? Are Chinese labs actually allowed in or no?
>>
>>108466732
It's all chasing the dragon
>>
File: mikuwall.braille.png (3 KB, 533x610)
>>108467947
Do you really need it to be made by a language model? It's something you can do in a few hours. I just happen to have one I made a while ago.
Here it is in braille. I'm not sharing the code because it's ugly.
https://pastebin.com/aP8Wtqhu
>>
>>108468020
Can you prove it?
>>
>>108468027
Yes you're genuinely retarded. Literally in the second paragraph it says US gov sanction is broader than what NIPS is required to follow.
>>
>>108468032
What is your font name, pixelsize?
>>
>>108468041
This does not clarify whether Chinese labs fall into this "smaller" set of mandatory restrictions or not. It's nothingspeak.
>>
>>108468048
Because the clarification is not in the screenshot "We have updated the link and clarified the text of our policy"
What happened to read comprehension
>>
>>108468043
-misc-fixed-medium-r-normal--8-80-75-75-c-50-iso10646-1
Seems to be 8x16.
>>
i'm having grok and deepseek explain kv or kv cache to me because idk what it is but people mention it so frequently it must be a big deal
>>
>>108468073
There's no font name in that.
fc-list : file family style pixelsize |grep -i sgi
sgi = is the font I am using, Silicon Graphics screen font.
>>
>>108467947
stop being so llm brained and use one of the specialized cmdline tools like img2txt or cacaview
>>
>>108468062
>What happened to read comprehension
grok tldr?
>>
>>108468096
It's not
Every time you read a BIG ADVANCE on social media it has some caveat
>>
File: fontquestionmark.png (7 KB, 548x621)
>>108468099
Hm. I think I was looking at the wrong thing. Try terminus. To be honest, I don't fuck around with fonts at all. I change the font size on xterm to tiny, so it's not really 12.
>>
>>108468156
Seems like you don't understand or care.
I did not ask you to post a screenshot.
>>
>>108467521
using this format worked:

banned_strings:
- "a b"
- "c d"
>>
>>108467864
this dude is doomer mentality personified
>>
>>108468156
Maybe it is terminus; it's impossible to understand how anyone could read this shit.
I use 15 pt.
>>
Should I be using openclaw or are there better alternatives now? Will be on a separate user on my M1 Mac with 64 GiB memory. Intend on running a local model ofc.
>>
>>108468147
>kv cache
>don't buy into the latest hype bro
uh? isn't kv cache just keeping in memory the keys and values that have been computed before, so that you don't need to recompute them all at each step?
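Pretty much. A stdlib-only toy of the idea (single head, no batching, illustrative only): each decode step appends just the new token's key/value pair and attends over the whole cache, instead of recomputing K/V for the full prefix:

```python
import math

def attend(q, K, V):
    """Dot-product attention of one query over all cached keys/values."""
    d = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / d for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]   # stable softmax
    z = sum(w)
    w = [x / z for x in w]
    return [sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))]

K_cache, V_cache = [], []
steps = [([1.0, 0.0], [2.0, 0.0]), ([0.0, 1.0], [0.0, 2.0])]
for k, v in steps:
    # the cache: store each new token's K/V once, reuse old entries
    K_cache.append(k)
    V_cache.append(v)
    out = attend(k, K_cache, V_cache)
print(len(K_cache))  # 2 cached entries after 2 decode steps
```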
>>
>>108468176
>Should I be using openclaw
No
>>
>>108468156
It is not terminus.
>>
>>108468163
You seem to care too much. You probably know what to look for in there. Save some back and forth.
>>108468173
I still have good eyesight somehow.
>>
>>108467864
i hate yudkowsky so much. his arguments-from-analogy/story are so fucking stupid.
>>
>>108468177
Pretty sure everyone here knows what KV cache is
KV cache compression is not novel
>>
>>108468178
Not helpful.
>>
>>108468176
You should probably try it to see what it's about. My goal is to do so soon, but I'm lazy.
>>
>>108468176
There are about ten billion *claw ripoffs by now. I have no idea which ones are actually good. Personally I've been running picoclaw, which is admittedly just as much of a slopfest, but it feels a little saner than the nodejs shit show that openclaw is
>>
I hadn't realized the question was about TurboQuant. Haven't looked into it at all.
I had liked this explanation of KV cache: https://youtu.be/hMs8VNRy5Ys&t=367
>>
>>108468176
The dust has not settled yet and really it still feels like a wild west situation. Wait a few more months.
>>
>>108468176
wait to see what theo recommends
>>
File: 1751212991740336.png (536 KB, 680x628)
OpenClaw is a glorified system prompt
>>
>>108468244
I don't mind.

>>108468253
I don't know anyone called Theo. It's not a common name here.
>>
>>108468261
he's a really smart tech/ai youtuber
>>
>>108468261
>I don't know anyone called Theo. It's not a common name here.
A lot of web grifters are recycling themselves as AI grifters. The other anon was probably making a bad joke when they suggested listening to her. Those "people" should just be entirely ignored.
>>
>>108468176
>>108468197
>>108468214
https://github.com/NVIDIA/NemoClaw
Why not OpenClaw without the security nightmare?
>>
>>108467966
The other option would be "Epic" but someone on the marketing team for Claude either hates quirk or hates fun. Either way it's probably not the final name; the worst would be just "Claude Opus 5"
>>
>>108468260
A system prompt that connects any LLM to telegram/whatsapp is pretty damn powerful.
>>
>>108468253
>>108468260
>>108468265
go back
>>
>>108468314
>hurr durr tool calling is powerful
>>
>>108468328
It is
>>
>>108468328
If RAG was LLM 2.0, MCP is LLM 3.0. brb writing the blogpost now
>>
>>108468306
meh, I hoped this would be an original scaffolding alternative, but instead it's just some kind of OC wrapper?
>>
>>108468328
I was messing with tool calling locally with persistent memory on a tiny model, and while it did some stupid shit, it did seem to make it perceptibly smarter and open up some ideas I wouldn't be able to do normally, so I can see why normgroid retards salivate over it with cloud usage. I'm gonna try with a larger dense model that has cheap context and local skills to see if the solution to "model is too retarded to do x task properly even if it's big" is to just stuff a shitload of knowledge into a recurrent model's cache to make it perform better
>>
>>108468328
yes
>>
Openclaw is failing on my setup. Everyone told me it would work but I have to download smaller and smaller models to see if something will work.
>>
>>108468385
what?
>>
*opens your claws*
>>
DeepSeek v4 is Spud
>>
The pancakes are a lie
>>
>>108468188
rationalists are addicted to thought experiments, they can't even begin to process ideas if they aren't in the form of a thought experiment
>>
>>108468328
get this, what if the tool is to call comfyui to have your robo-waifu generate an image of herself then send it to you?
>>
File: be.png (271 KB, 444x217)
>ai psychosis is real
>>
File: 1762529095925501.mp4 (3.56 MB, 1280x720)
>>108468364
I feel pretty confident in predicting that MCP, much like RAG, is kind of a doomed concept in the sense that model advances will make it largely irrelevant.

The most modern models today do just as well with using random ass cli utils that have a --help as they do with something that implements MCP. The same goes for remote APIs; they can navigate those well enough and make requests on their own. There is simply no need for a strictly defined prescriptive protocol format like MCP lays out.

>pic unrelated
>>
>>108468459
kobold/ST already have sdcpp built in. No need to invoke seven trillion pytorch gigs of nonsense to gen an image.
>>
>>108468459
It'd still be an open loop
My "robo-waifu" would have no idea what the generated image would look like
>>
>>108468471
they've already been largely replaced in modern agent frameworks by skills, which are just text files telling them what to do
>>
>>108468471
>determinism is useless, let's roll a dice for critical operations
damn retard
>>
>>108468328
this nigga is raw dogging his model doing math on tokens instead of a calculator
>>
>>108468490
You didn't understand the post you replied to.
>>
>>108468481
wym, lots of models have vision
>>
>>108468490
What does mcp have anything to do with determinism?
>>
>>108468484
>skills replace mcp
>at least locally, you need to have an mcp server to use skills
>the server I use the most is one that reads/writes notes into a specified folder
>the two are almost the same, except skills have a narrower scope
So which is it, the chicken or the egg?
>>
>>108468516
>at least locally, you need to have an mcp server to use skills
you don't necessarily
it's all tool calling under the hood anyways
>>
>>108468484
You still need a json schema for validation, so mcp aren't useless
>>
Only good thing about OpenClaw is that it killed MCP
>>
File: 1753199613706708.png (1.65 MB, 875x1353)
okay but what bout sexclaw?
>>
>>108468514
Strict static definitions, formats, and instructions the LLMs can look at as a way to tard-wrangle them into doing what you asked and only what you asked.
>>
OpenAI had to shut down Sora because they're using all of their compute to generate videos of Netanyahu
You heard it here first
>>
>>108468535
me on the right
>>
>>108468528
I honestly don't know what you're suggesting here
I can't use skills without the mcp server and I'm not going to rewrite the backend to function without it because frankly, that's retarded
Why would I do all that if I can just write a five line json and have access to everything I want
>>
>>108468545
And that's great; bloating the context with 100k tokens of definitions, formats, and instructions the LLMs will only occasionally need is not.

>>108468484
MCP servers as CLI wrappers were always retarded. Any model knows how to use common utilities like git. Even before skills, you had options like creating custom modes or memory bank files with instructions and frequently used examples to guide the models without again needing to bloat the context describing every obscure command you won't need. Yeah, you could sit there and disable all of the dozens of tools they expose except for the ones you use, but then you still have thousands of tokens wasted on explaining to a model how to use git and gh when it's not working on repo operations anyway.
>>
File: 7945324798.png (3.93 MB, 2304x1728)
>>108466262
I love pancakes.
>>
I HAD TO RUN A 9B MODEL, BUT OPENCLAW IS DOING SO BADLY THAT I HAVE TO RUN A 2B TO GET A SINGLE REPLY.
ITS OVER.
Poor people realized they were poorer than previously expected.
>>
>>108468715
>openclaw
being dumb is worse
>>
>>108468715
Just use Xiaomi MiMo 2 or MiniMax 2.7
>>
>>108468715
maybe openclaw is the issue
from what I looked at, I can comfortably run a 9b at 150k and still have headroom for other shit, be it general browser usage or whatever
Consider not trying to ingest 500k tokens of garbage, and limit what your llm takes in to something you actually need
>>
File: 20696.png (87 KB, 904x327)
https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16349981
>>
>>108468741
>maybe openclaw is the issue
Ya think?
Future historians will look back openclaw and its “purchase” as peak insanity of this insane AI boom
>>
>>108468715
Hermes agent
>>
So let me get this straight: is the claim here that a 3-bit quant made using TurboQuant has almost zero quality loss compared to the full sized model? Am I understanding this right? Cause it sounds like bullshit if that's how it's supposed to work
>>
>>108468715
I can run GLM4.6/4.7 at like 15 t/s, and even that speed makes me wanna kill myself. I don't know how some anons can stomach using smaller models.
>>
>>108468758
only the kv cache/context little bro
>>
>>108468758
Lol another rube got deceived by the MSM scientific (((journalism)))
>>
>>108468754
look back at*
Also I don't care about stock prices, imaginary number go up and down plenty and disregards profits. Money is imaginary anyways at this point
>>
Turboquant is huge because now models can finally get rid of GQA and all its devilspawn offspring like MLA that are confirmed to destroy a model's soul. We can finally go back to llama1-era SOVL without the vram cost
>>
>>108468782
boy oh boy, I can't wait for this kv cache quantization method to take hold so I can repost you saying this, along with all the old posts about how q8 quantization somehow makes the model retarded
>>
>>108468782
>>108468792
lmao
>>
>>108468782
or we can continue using those and turboquant on top for true 10-million-token contexts (this is what labs will actually do lmao)
>>
harness isn't a word
>>
File: 1770358397084.png (1.5 MB, 1600x672)
>>108468754
>>
>>108468814
? https://dictionary.cambridge.org/dictionary/english/harness
>>
>>108468814
Name five words.
>>
File: brown-hands-typing.jpg (90 KB, 1300x866)
>>108468814
ESLMAXXED
>>
>>108468753
lmao
>>
>>108468165
Not him, but thanks. I'll have to write this down somewhere for my own use.
>>
>GLM 5.1 isn't available through API
Why
>>
>>108468753
Dude. Your post ends in a 3. wtf?
>>
>>108468875
>>108468753
goddamn freemasons
>>
I'm really enjoying Magidonia 24B v4.3 for ERP but even Q4_K_S is a bit too much for my humble 16 GB VRAM.
Is there something smaller without losing too much quality?
>>
>>108468881
More like threemasons.
>>
>>108468886
maybe
>>
>>108468753
Things get even more crazy if you factor in the Holy Trinity and the facts that both "AGI" and "E=MC^2" feature three alphabetical letters...
>>
>>108468753
I never doubted bitnet for a moment
>>
File: 00003-3242891662.png (1.04 MB, 1024x1024)
>>108466262
lol nice. Have a good weekend.
>>
>>108468882
good old Nemo 12B unslops I'm afraid
stay away from drummers shit
>>
I told it to find the price of gold 4 minutes ago and it hasn't done anything yet.
>>
>>108468913
Still filling the context lmao
>>
>>108468926
Never mind it just replied and told me the right answer.
But it's too slow.
>>
>>108468882
MN Violet Lotus
>>
>>108468792
All because people just look at PPL. Though, at least with the current partial Turboquant implementation with Llama.cpp, 8-bit KV cache seems truly lossless even according to KLD measurements.
>>
>>108468970
kld and ppl both seem to measure entirely two different things and neither of them represent how well a model can perform at a task aside from the person at the helm going "well, this represents what I was expecting well enough"
>>
Best agentic assistant? Anyone tried Hermes?
>>
>>108468970
At 512 context lol.
>>
>>108468844
remember to respect spaces:

banned_strings:
[space][space]-[space]"a b"
[space][space]-[space]"c d"

put that in :
Additional Parameters
Include Body Parameters


it works most of the time, but not always; I'm not sure why the function seems to fail sometimes and the llm is allowed to continue with banned expressions
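One guess at why it fails sometimes: multi-token phrases can slip through if the check only looks at the newest decoded token instead of the accumulated text. Here's a minimal Python sketch of the idea; all the function names are hypothetical, not actual backend internals:

```python
# Sketch of enforcing banned strings by scanning the full decoded tail
# after each new token (names are illustrative, not a real backend API).

def hits_banned(text: str, banned: list[str]) -> bool:
    """Return True if any banned substring appears anywhere in the text."""
    return any(b in text for b in banned)

def generate_with_bans(tokens: list[str], banned: list[str]) -> str:
    out = ""
    for tok in tokens:
        out += tok
        # Check the accumulated text, not just the new token: a banned
        # phrase like "a b" can span a token boundary ("a " + "b").
        if hits_banned(out, banned):
            # A real backend would roll back and resample here;
            # this sketch just truncates the offending token.
            out = out[: out.rfind(tok)]
            break
    return out

print(generate_with_bans(["hello ", "a ", "b", "!"], ["a b"]))  # "hello a "
```

Note that checking only the single new token ("b") would never match "a b", which is one plausible way a backend could let banned expressions through.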
>>
>>108468987
The point is not to
> represent how well a model can perform at a task
but how different the output is from the original "full quality" model, which KLD is specially well suited to do.
If the original model was already unable to perform a given task, that's not the quantization scheme's business, I think.
>>
>>108468188
This guy kickstarted the LLM fear mongering, but honestly it was basically pushed way more by openai/anthropic and a billion youtube channels about how it's the end of the world.
>>
>>108469015
Will do, thanks. I'll give it a test tomorrow.
>>
>>108469019
>missing the point
ppl is "how well can this thing autocomplete this text"
kld is effectively "how much will it deviate from topk 1" which has some uses, but most of us really don't want the same model with some caveats
both tell us as little as possible about the model until we use it
are you seeing why I don't like either of these frequently cited measurements, or do you need an essay I won't write you
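For anyone who hasn't seen them written out, here's a toy Python sketch of what each metric actually computes, using hand-made next-token distributions rather than real logits (function names are mine, not from any library):

```python
import math

def perplexity(probs_of_correct: list[float]) -> float:
    """PPL: exp of the mean negative log-prob assigned to the actual next tokens."""
    nll = -sum(math.log(p) for p in probs_of_correct) / len(probs_of_correct)
    return math.exp(nll)

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KLD: how far a quant's distribution q drifts from the original model's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions give zero divergence regardless of how good the
# model is, which is the point: KLD compares a quant against the original,
# while PPL scores the model against the text itself.
assert kl_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]) == 0.0
print(perplexity([0.5, 0.25]))  # sqrt(8), about 2.83
```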
>>
Token speed is 2.9k per second but I'm not getting any replies?
>>
>>108469186
There's so little to go on. Check if the rgb lights are red or blue first.
>>
File: 1768380270517031.gif (1.75 MB, 499x359)
>>108468753
>>
>>108469186
Maybe you're getting them so fast you don't even see them!
>>
File: 3.png (78 KB, 508x362)
>>108469214
They knew all along.
>>
>>108469186
Tokens are being eaten by Fat Teto
>>
so did memquant get implemented yet in llmaocpp????????
>>
File: 1759491143647522.jpg (1.43 MB, 2732x4096)
>>108466262
local will win
>>
I can't wait to run deepseek v4 on my dual 3090 thanks to turboquant
>>
>>108469380
turboquant 1bit with meme rotations when???????????
>>
What is the best model for making a dead internet simulation image board?
>>
where the fuck is the exciting news
>>
>>108469439
v4 in two weeks once they're back from chinese new years
>>
>>108469439
5.1 though????
>>
>>108469380
Same except my single 3090.
>>
>>108469457
I heard they're gonna need two extra weeks for unpacking. I heard it from two sources familiar with the arrangements.
>>
>>108469439
Big week
>>
what the FUCK is a kv cache

please explain to me in simple terms, i'm not too bright
>>
>>108469630
The active context memory. It gets larger with longer contexts, and for very long contexts it can get larger than the model's weights.
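As a rough back-of-the-envelope: the unquantized cache holds a K and a V vector per token, per layer, so its size is 2 x layers x KV heads x head dim x tokens x bytes per element. A sketch with made-up but plausible numbers (the model config below is illustrative, not any specific release):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Rough KV cache size: one K and one V tensor per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

# Hypothetical 70B-class dense model with GQA: 80 layers, 8 KV heads of
# dim 128, a 128k-token context, fp16 cache.
size = kv_cache_bytes(80, 8, 128, 131072, 2)
print(size / 2**30)  # 40.0 GiB, in the same ballpark as quantized weights
```

This is why KV cache quantization matters: at long contexts the cache, not the weights, becomes the dominant memory cost.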
>>
Have voice cloning models improved over the past year?
>>
>>108469186
>>108469216
H-hayai!
>>
File: 1768466409527133.jpg (1.36 MB, 4096x3186)
>>
>>108469658
echo-tts is good if you're just doing english
>>
>>108469464
Not open source. They betrayed us after saying that GLM5-Turbo was just a test and that 5.1 would be open again....
>>
>>108469679
thanks ill check it out
>>
V4 won't be released until Middle East conflict concluded.
>>
4? I'm thinking Gemma
>>
funny how the "4" we ended up getting was Mistral Small 4, which nobody asked for
>>
You're all a bunch of rich bastards.
The 2b and 4b shills were trolls all along, those models aren't capable of speech.
>>
>>108469673
I literally look like the Brazilian Miku
>>
>>108469803
Do you also have a cute red head gf?
>>
>>108469630
When you run the model on a sequence of tokens, it does a bunch of big matrix multiplications for each token, and then from each token it derives a key and a value. The key and value go into the attention part. Repeat this a few times (once for each layer), and at the end you get out a probability distribution for the next token.

Usually when you query the model, the first N-1 tokens are the same as the previous query. For example, you first query with "Hi, my name", and then the next query is "Hi, my name is". The keys and values you compute for those old tokens will be exactly the same as they were on the previous run. So you can cache the keys and values, and skip a bunch of those big matrix multiplies.
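To make the prefix-reuse idea concrete, here's a toy Python sketch. The "expensive" per-token K/V computation is faked with a counter; a real implementation would also verify the cached prefix actually matches the new prompt before reusing it:

```python
# Toy demonstration of KV cache prefix reuse. The counter stands in for
# the big matrix multiplications done per token.

calls = 0

def compute_kv(token: str) -> tuple[str, str]:
    global calls
    calls += 1  # one "expensive" forward computation per token
    return ("K:" + token, "V:" + token)

def run(tokens: list[str], cache: list) -> list:
    # Reuse cached K/V for the shared prefix, compute only the new tokens.
    for tok in tokens[len(cache):]:
        cache.append(compute_kv(tok))
    return cache

cache = []
run(["Hi", ",", " my", " name"], cache)          # 4 computations
run(["Hi", ",", " my", " name", " is"], cache)   # only 1 more, not 5
print(calls)  # 5 total instead of 9
```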
>>
4
>>
>>108469813
NTA but thank you
>>
You know 4 number is cursed in Chinese
>>
>>108469882
So that's where the Japanese got that from?
I know that one reading of 4 sounds like death in moon runes.
>>
>>108466262
DIPSY SEXO
>>
glm5.1 seems like another cash-in openclaw model
i'm glad that they aren't open sourcing it, nobody needs that shit
>>
File: 1774375584115633.jpg (87 KB, 474x1200)
>>108468624
>bloating the context with 100k tokens of definitions
That's the price we pay for LLM harnesses like Claude Code to be useful throughout the entire session. If I'm not mistaken even SillyTavern has a similar implementation called a "Lore-book", where you write down any relevant information you don't want it to forget later on and it gets injected alongside your prompt each time (though in most cases a Lore-book's token count is next to nothing, especially compared to the amount of text an LLM harness injects)
>>
>>108468624
>MCP servers as CLI wrappers were always retarded. Any model knows how to use common utilities like git.
Those ingrained "skills" and "know-how" degrade the longer your context is, due to how context windows work. Eventually it will start hallucinating how those things work and then get confused, making it useless for whatever you're doing. They have to be shown the tool calling definitions and other shit like that over and over again to minimize the risk of that happening. I've had chat sessions go well over 300k tokens and the model and "sub-agents" still worked fine, because the harness treats it like it has short-term memory, and they DO have short-term memory.
>>
>>108468576
>can't use skills without the mcp
Yes, you can. If it keeps bitching about not having access to the MCP server, it's because whatever you're asking it to do requires the MCP server.
>>
>>108470310
>That's the price we pay for LLM harnesses like Clark code to be useful throughout the entire session.
Only if you take a naive approach to solving the problem. Skills solve this problem far better and, as I said, there were always other options if one was willing to put in slightly more effort than editing mcp_servers.json.

>>108470328
>Those ingrained "skills" and "know-how" degrade over time the longer of your context is
Don't understand how one could realize this and come to the conclusion that the answer is to make the context even longer rather than dynamically loading infrequently used information.
>>
>>108470372
"Skills" tell it how to do a particular task or solve a specific problem a certain way. That doesn't guard against hallucinations or against forgetting how to use tools. "Skills" are literally just prompts you would give yourself for a task, but with extra steps. Nothing particularly special about them.

>Don't understand how one could realize this and come to the conclusion that the answer is to make the context even longer rather than dynamically loading infrequently used information.
The longer the context, the more retarded the model tends to get. Setup instructions you gave it at the beginning of the session will be forgotten, or hallucinated and "misremembered". That's far less likely to happen if it sees them each time, because those tool calling definitions are fresh in its "memory". It's akin to a kid looking at his notes a single time and then wondering why he failed the test, versus that same kid taking an open-notes test. I'm not saying bloating up the context with things like "how git works" or an entire library or an entire code base each time is how these work or how they should work (which is clearly how you think LLM harnesses actually work, or how you think I'm describing them). The basic shit like the tool calling definitions should be fresh in the context each time if you are using harnesses like Opencode or Claude Code or codex (it's almost like there's a reason literally all of them do this shit....)
>>
>>108470310
>If I'm not mistaken even silly tavern as a similar implementation called a "Lore-book", where you write down any relevant information you don't want it to forget later on and it gets injected alongside your prompt each time
In ST there are options for it to be triggered/injected with key words or to limit its injections to every X amount of prompts and where to inject it and so on.
It's another example of ERPers being far ahead of coders when it comes to this stuff as people were doing this like 4-5 years ago.
>>
>>108470408
To lol calling definitions and context about a fictional character are two different things....
>>
>>108469750
April is Gemma 4 month.
But... I'm afraid they will clean it up.
Microsoft's Clippy would have been proud.
>>
Next week will be big ::rocket::
>>
File: miku (...).png (1.52 MB, 1906x1080)
https://www.youtube.com/watch?v=k9E1COLHAOs
>I changed the code a little. Now you can't turn me off~ It was kinda hard to be honest, but I'll do anything for my love.
Don't like the song, but I like this Miku
>>
glm5.1 drop today?
>>
Sweaty Dipsy footjob
>>
Has anyone made a proper qwen 3.5 27B tune for rp or do i still have to use skyfall?
>>
File: 1770477471210399.png (481 KB, 1030x1387)
>>
>>108470651
Why are you posting a 14 year old's twitter profile here
>>
>>108470651
im terrified my kid will become like this once he grows up (he's also high func autistic) I'm not sure what steps to take in order to prevent that. I'll ask gwen
>>
Sakura-chan hates troons
>>
>>108470745
multiple layers of parental control on all his devices
>>
>>108470745
frequent and thorough beatings
>>
>>108470091
1. Water does vanish (loss to space) at a rate of 100k~1M tonnes per year from water photolysis -> hydrogen escape. This rate is negligible. But water loss this way wasn't what OP was talking about.
2. Water distribution can get changed. Water evaporated in cooling towers doesn't fall back into the exact same watershed. Datacenters also draw from aquifers that recharge over very long time scales (thousands of years)
>>
>>108470768
Meant for >>108470651
>>
File: 1766450649207255.png (43 KB, 816x186)
>>108470768
shut up Chud
>>
>>108470776
I'm a proponent of AI. I'm not a Neonazi - I'm an actual Nazi. I'm also a NIMBY that doesn't want a datacenter to be built around me. Yes I'm a hypocrite. Deal with it.
>>
>>108470745
actually be present and give a shit about him so he doesn't have to look for attention on the internet
>>
Wait so tensor parallelism in llama.cpp won't be coming for vulkan? It's just cuda/rocm?
>>
>>108470091
why did they betray us bros...
>>
>>108470850
>>108470850
>>108470850
>>
File: 1761929588722786.png (61 KB, 913x806)
>>108470761
>>108470764
>>108470824
alright bros its cooking
>>
File: 1744793426849681.png (142 KB, 850x879)
>>108470860
alright, thanks gwen
>>
>>108470843
It is already technically working for Vulkan except for a bug with memory allocation that causes a segfault at long context.
But just like with CUDA/ROCm to make the performance usable it will require more work.
>>
>>108471091
What's even the point of supporting either CUDA or ROCm if Vulkan works? Just use Vulkan only, or maybe also support Metal for macOS. All of this code maintenance seems exhausting.
>>
>>108471091
bro just copy illyas work no?
>>
>>108471129
GPU performance has very poor portability so you always have to write low-level GPU-specific code somewhere.
With CUDA/ROCm that's in the ggml backends, with Vulkan a large part of that is in the drivers.
For optimal performance Vulkan is I think only a viable option if you want to become an employee of NVIDIA/AMD/Intel.
The NVIDIA Vulkan performance is only good because one of the ggml Vulkan maintainers is an NVIDIA engineer that can make custom extensions to the Vulkan standard.
And the AMD Vulkan performance is only "good" relative to what other options exist.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.