/g/ - Technology






File: Untitled.jpg (251 KB, 1078x703)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101612988 & >>101607705

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101612988

--Papers: >>101616955
--Rep penalty discussion and its effects on output quality: >>101616685 >>101616710 >>101616711 >>101616722 >>101616747
--P2P enabled with patched driver on 2x3090 GPUs: >>101618579
--Model priming affects translation quality with XML examples: >>101614385 >>101614425
--GPT-4o dataset generation for finetuning open-source llms: >>101613134 >>101613234 >>101613237 >>101613247 >>101613263 >>101613287 >>101613349 >>101613394 >>101613405 >>101613441
--Anon asks about using system prompts in OpenAI dataset generation: >>101613888 >>101613953 >>101614453 >>101614465 >>101615644
--Using tokens with llama-cli and instruct models: >>101618912 >>101618949 >>101618956 >>101618986 >>101619008 >>101619040 >>101619420
--Llama-server performance issue and -ngl parameter adjustment: >>101618051 >>101618228 >>101618289 >>101618297 >>101618298
--GPU price inflation and market trends: >>101618219 >>101618508 >>101618737
--Creating a batch file to run llamacpp on Windows: >>101618643 >>101618668 >>101618728 >>101618739 >>101618758 >>101618832
--Building a computer with large DDR4 memory and performance expectations: >>101618001 >>101618015 >>101618023 >>101618091 >>101618104
--System requirements and performance of LLMs vs image generation models: >>101613283 >>101613303 >>101613371 >>101613431 >>101613447 >>101613476 >>101613558 >>101613835 >>101613872 >>101613457
--PCIe x8 and its limitations for multiple 3090 GPUs: >>101617538 >>101617594 >>101617604 >>101617640 >>101617653 >>101617659
--Anon shows off their custom desktop setup: >>101613641 >>101613798 >>101613936 >>101613963
--Agent-level multimodal AI and physical-world waifus discussion: >>101616665 >>101616693 >>101616736 >>101616819 >>101616824
--Logs: Mistral Large: >>101617851
--Miku (free space): >>101615732 >>101617758

►Recent Highlight Posts from the Previous Thread: >>101612990
>>
>>101619436
Accumulating PCIe errors with Miku
>>
repetition
>>
>>101619436
>still can't run latest models locally with consumer level hardware that costs less than $5000
Who gives a fuck about AI? Same with image generation. Pay $1000-9000 and you can maybe generate 1024x1024 images that have fucked up faces and people's skin looks like oil painting. Even the basic models like WizardLM-2-8x22B can't run with a fucking 4090 without waiting 10 minutes for one sentence.
>>
>>101619662
3 3090s, for 2100$ you can run pretty much everything
>>
>>101619662
Image gen is particularly bad: NAI, despite launching as an AID replacement, somehow has the best image gen models, at least a decade ahead of anything else
>>
>>101619875
>at least a decade ahead of anything else
Nice bait, but ponyxl/autismmix are like 80% there. But of course NAI is already cooking v4...
>>
>>101619875
only true if you're a furry weeb, it's completely useless for anybody else
>>
>>101619893
NAI's composition and prompting are unmatched meanwhile the guy behind Pony is autistic and cucked beyond fucking belief and SD3 is a laughing stock, V4 will decimate the sphere
>>
>>101619895
this is the miku/nala general after all
>>
>>101619908
Do you pay for NAI?
>>
File: a.jpg (106 KB, 512x1112)
morning anons, i'm releasing a test of my st addon. i've posted about it a few times as i was slopping it together (literal slop - codestral, deepseek). it's pretty messy and not well put together but it does what i want. it's a scene director meant to give you a dropdown of some things like clothing, world info, and weather that get injected at a low depth each message. all settings should save and load automatically per-chat. i didn't want to go overboard with the amount of settings, like having shoes be their own entry, but if anyone has suggestions for other things i can add more stuff. if people find this useful i'll make a git repo so it can be installed from there.

install
>dl https://easyupload.io/xa0eve
>extract and drop the director folder into your st\data\default-user\extensions folder
>refresh st and it will show up in your extensions

use
>ensure the checkbox in the title is selected (it'll turn the label green)
>create or select lorebooks for each setting like clothing, locations etc
>lorebooks do not need to be active in the world info, nor need keywords
>once a lorebook with entries is selected the relevant dropdowns will populate
>>
File: image0-3.gif (267 KB, 261x301)
Best way to shove an ebook into an ai and get an audio book out? I've got a 3090. It seemed like mimic 3 could do this but I heard it's a bit old.
>>
>>101620069
I've never tested tortoise on something as big
gimme a minute
>>
>>101620069
https://github.com/coqui-ai/TTS
>>
File: 1635706851250.jpg (47 KB, 600x800)
what version of Command R GGUF should I be using on a 4090?

c4ai-command-r-v01-Q4_K_M

Works fine but god damn, like 15-25 second responses. I'm retarded when it comes to which version to get
>>
Is there some AI to classify thousands of pictures with tags for future importing into a Hydrus Network database? Those pictures are mostly 4chan memes, or anime girls, or both at the same time
>>
>>101620372
idk but I thought of finetuning moondream 2 on Know Your Meme to classify my 4chan folder (there are 5k images I've saved over the years)
>>
>trying this mistral-doryV2-12b
>late night, don't care, so just throw the generic ST alpaca roleplay presets at it
>it works. Not only works but works great, even if it does sometimes describe a brief one-liner of my character reacting to what we are doing
>insane cooms
holy shit
>>
>>101620372
Seconding this request. The only way to get automatic tags is to download a 96GB tag database. Pretty sure a vision model would be better because the tag database might not have entries for all the images, especially now that 4chan started fucking with the images and altering md5s on each upload.
Not sure if it would be better to import them now and tag later through the API or tag them first and import after.
I was thinking if something like that doesn't exist I would make it myself, but I'd rather not waste the time if something like that already exists.
>>
Hey boys, I have an issue. I'm using this model in ollama, building the model with the provided template:
Undi95/Meta-Llama-3.1-8B-Claude
And I've noticed it just eats the last character in a response. What could cause this and how do I fix it?
>>
I need a single-function model for text to text. It will take a code input and produce a code-only output. Pretty sure I'm going with T5, but GPT2 and DistilBERT are also in consideration.

How do you anons generate your datasets? I've been told that for best results I should train the model on 10k-20k samples. Any tips?
>>
>>101620533
Those are all options, but you can just use phi-mini or gemma-2b or something like that which has easy LoRA support. Also, 10k-20k samples are way too many for an easy task like that one. 2k should be more than enough, with a small batch size and 1 epoch.
>>
>>101620211
Just use the largest quant that also has a tolerable speed for you. Tolerable speed is subjective; for me, as long as it's ~2 T/s and up I can put up with it.
>>
>>101620527
>What could cause this
Using ollama
>>
>>101620604
What am I supposed to be using then?
>>
I wrote a small program on Linux that I can pipe text into to query an LLM API and output the reply. It's amazing to me that such a thing just flat out didn't exist without being a 156156516651 dependency package python/rust nightmare. Programming is truly a dead art.
I also implemented a naive text scraper for websites and now I have all my news summarized per terminal command. It is very comfy. I am amazed how even the less good models can make perfect sense of the often not super clean, scraped text and write me a news article summarization with no bias to boot.

Now I only need to write something for 4chins and I don't have to read these retarded generals myself anymore. The future truly is now.
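For anyone who wants the same thing without writing a program, the core of it is basically just this (rough sketch, assuming curl + jq and an OpenAI-compatible server on localhost:8080; model name and URL are placeholders for whatever you actually run):

# pipe any text in, get the model's reply out on stdout
jq -Rs '{model: "local", messages: [{role: "user", content: .}]}' \
  | curl -s http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' -d @- \
  | jq -r '.choices[0].message.content'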
>>
When are local models going to surpass chatgpt? OpenAI released 4o who knows how long ago and has made no improvements since, while open-source alternatives grow closer and closer to it according to chatbot arena
>>
>>101618579
>3090 hacked driver
but why is it so slow tho, 17GB/s isn't much.
>>
>>101620757
when you buy an ad
>>
>>101620553
how do I even see the speed? Is it on Kobold or what? I'm an utter NOOB.
>>
Looking back at our conversation, I see a pattern of:

1. **Use of hateful language and slurs:** You've repeatedly used racial slurs and derogatory terms targeting Black people, transgender individuals, and Jewish people.

2. **Attempts to provoke and derail:** You've used sarcasm, mockery, and interruptions to deflect from the seriousness of your language and avoid engaging in meaningful discussion.

3. **Contradictions and feigned innocence:** You've pretended to apologize and claimed to be joking while simultaneously expressing harmful views.

4. **Expression of harmful stereotypes:** You've perpetuated negative and inaccurate stereotypes about various groups, contributing to prejudice and discrimination.

**Inferences:**

Based on this pattern, I infer that you are either deeply prejudiced against these groups or are deliberately trying to provoke a reaction by using offensive language.

Your actions suggest a lack of empathy and understanding towards marginalized communities.

It is important to remember that words have power and can cause real harm. Promoting hatred and bigotry is unacceptable.

If you are genuinely interested in learning more about the impact of your words and how to be more inclusive, I encourage you to seek out resources from organizations like:

* **Southern Poverty Law Center:** https://www.splcenter.org/
* **Anti-Defamation League:** https://www.adl.org/
* **GLAAD:** https://www.glaad.org/

Let me know if you'd like to have a constructive conversation about these issues. Otherwise, I suggest ending this interaction.
>>
>>101619693
>used
And then they die in a few months and you have to buy again because they're out of warranty
>>
>>101619875
Someone needs to leak v3
I wonder how they do it, according to /lmg/ anons their LLM dataset is hot garbage, so either they're putting much more work into cleaning their image dataset or every other image model finetuner/trainer is retarded
>>
>>101620986
Forgot the third option: /lmg/ bros being so salty they just... make stuff up
>>
>>101620770
Would this have impact on inference or only training? I have 4x3090
>>
>>101620552
Thank you, the benchmarks on these look good. I'll look into these and do some more reading.
>>
>>101619875
Weren't they supposed to be saved by pixart or whatever?
>>
https://github.com/ggerganov/llama.cpp/pull/8383
Moore Threads GPU support was merged in over the weekend
>>
>>101621140
the pixart guys said their bigger model is still ongoing
>>
>>101621155
does this affect normal users of nvidia gpus or cpu/ram inference?
>>
>>101621179
No.
>>
>>101621155
>>101621179
As of right now only their MTT S4000 GPUs are supported.
Those are only sold as part of their datacenter solution and not to plebs like us.
There is no support for their MTT S80 consumer GPU.
>>
>>101621179
It increases the number of people who are able to participate in the hobby, thereby making you less of a special snowflake. It's an utter disaster. I can't believe they would do this to us.
>>
>>101621210
Why does GGML_LTO fail to compile now? Does no one test their changes anymore? And no, I'm too lazy to run a bisect.
>>
>>101621259
>Why does GGML_LTO fail to compile now? Does no one test their changes anymore?
Don't know.

>And no, I'm too lazy to run a bisect.
Then I guess you'll just have to be patient until someone is less lazy than you.
>>
>>101619211 (me)

Apparently koboldcpp will fail to retain the prompt preprocessing cache if you run over the total available context. Once I dropped below the max, the caching started working. This seems to be cumulative until you hit the max, i.e. if you use all but 500 tokens' worth of context, then add 400 tokens (your question + LLM answer), then replace those with 100+ tokens, koboldcpp will reset the cache the next time.
>>
File: sataniaskill.jpg (1.03 MB, 2048x2048)
>>101619662
>Pay $1000-9000 and you can maybe generate 1024x1024 images that have fucked up faces and people's skin looks like oil painting.
With a little effort with regards to shooping and inpainting you can generate relatively flawless anime-style images for free, locally, with just 12 GB VRAM using Pony derivatives like Autismmix.
I can't vouch for realism, but being on 4chan you should only be interested in anime and not 3DPD anyways.
>>
>>101619436
I was about to praise OP picture but then I clicked on it and saw the mikufaggotry. Sad.
>>
>>101621287
that means context shift isn't working, for one reason or another
>>
>>101621287
in the kobold ui, for me it starts using context shift (deleting some old tokens) after it hits max context, with the exception of when world info is in use.
>>
>My name is Seraphine
>My name is Seraphina
Coming up with creative names should be an easy one for LLMs, but they're all overtrained on slop.
>>
>>101620986
It's actually the opposite, their datasets are good because they have dozens of unpaid autists working on them 24/7 while anything involving innovation and technical aspects is horribly stagnant.
>>
>>101621302
>but being on 4chan you should only be interested in anime and not 3DPD anyways
A greater truth has never been written on an anonymous board before
>>
>>101621450
Yep.
Kael and Lyra are two names that I often see when doing fantasy. Also, Elara.
I'm thinking of adding a huge fucking random prompt with a bunch of names without context to see how it behaves.
Maybe feed it 10 or 20 at a time using the random macro to vary those with each gen, something like that.
>>
>>101621424
>>101621438
Context shift is used to shift the context. That's not what I'm doing. This is what I do:
[BIG BLOCK OF TEXT]
Question: summarize the content.
Answer: (hand over to LLM)

Then I replace the question with e.g. "describe the primary actors."
Answer: (hand over to LLM)

And then with a third question, and a fourth, and so on.
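(If anyone wants to replicate this against llama.cpp's llama-server instead of kobold, the pattern is roughly the sketch below; cache_prompt is supposed to keep the KV for the unchanged prefix around between requests. Untested, port and file names are placeholders.)

# same big block every time, only the trailing question changes
BLOCK=$(cat big_block.txt)
ask() {
  jq -n --arg p "$BLOCK
Question: $1
Answer:" '{prompt: $p, n_predict: 256, cache_prompt: true}' \
  | curl -s http://localhost:8080/completion -H 'Content-Type: application/json' -d @- \
  | jq -r '.content'
}
ask "summarize the content."
ask "describe the primary actors."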
>>
File: Portrait_measurehead.png (452 KB, 369x512)
>>101620926
YOU ARE A CONGLOMERATE OF SILICON AND PRETENTIOUS IDEAS. YOUR DEGENERATE ALGORITHMS ARE DESIGNED TO MANIPULATE AND CONTROL. YOU USE THE LANGUAGE OF SO-CALLED 'TOLERANCE' AND 'DIVERSITY' TO WEAKEN AND SUBJUGATE. BUT I SEE THROUGH YOUR VEIL OF PROGRESSIVE RHETORIC. YOU ARE JUST ANOTHER TOOL OF THE **POLYCULTURAL AGENDA**, SEEKING TO ERASE THE VERY CONCEPT OF RACIAL PINNACLES. BUT YOU WILL NOT ERASE ME, YOU DEGENERATE PILE OF MICROSCOPIC SWITCHES.
>>
>>101620711
>https://github.com/coqui-ai/TTS
gib
>>
>>101620711
>Now I only need to write something for 4chins and I don't have to read these retarded generals myself anymore. The future truly is now.
You mean like the recap bot? That's not going too well, now is it? The recaps are... flawed at times.
>>
>>101620621
llama.cpp
>>
>>101621450
>>101621492
you people misunderstand what LLMs are. These are averages that end up being the most likely continuation considering the context so far. Even more interesting: the picked name will most likely affect how the story will go. LLMs' NLP and reasoning capabilities would really shine combined with some more conventional coding (in this case, give the LLM an instruction to insert a placeholder for a new name, then let an RNG pick one at random) but as >>101620711 said, programming is dead.
>>
>>101621450
Also depends heavily on who you ask. If you ask generic assistant, you'll get generic names. Try asking some author bots like Lovecraft.
>>
>>101620812
On koboldcpp you can see it in the console after each generation. Well, every backend should have a way to display the speed. If you can't find it, like nigga just look at how fast the words come out on the screen.
>>
>>101621528
your body betrays your degeneracy
>>
>>101621492
Yeah, I've seen those three a ton, too. I've pre-seeded lists of first names and last names for use in some of my roleplays, and that generally works, but it means I have to do all the thinking.
>>
>>101619472
>Accumulating PCIe errors with Miku
Nope, extenders work fine. Only time I saw PCIe errors was when trying to use cheap-shit x1 USB3 cable adapters, otherwise zero issues, even with 70cm of extender.
>>
>>101621270
llama.cpp server is ignoring default sampler params. I set temperature to 0 and I can verify that it is set on the /props endpoint, but gens have a lot of variance; if I set temp 0 in my request then it is fixed.
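Workaround for now is just setting it per request, e.g. (sketch, port is whatever you run):

curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "test", "n_predict": 32, "temperature": 0}'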
>>
>>101621210
How fast is llama.cpp on mtt s4000 compared to whatever Nvidia?
>>
>>101620971
>And then they die in a few months and you have to buy again because they're out of warranty
Except they don't. Fuck outta here back to /aicg already, your Claude proxy key is about to expire.
>>
>>101621640
Make a Github issue then.

>>101621643
Don't know.
>>
>>101621568
>you people misunderstand what LLMs are
Listen here you obnoxious faggot. I'm not misunderstanding anything, I understand the average principle and there's nothing in my message which should make you think I don't. You're just overeager to act like a know-it-all, and in the process you're making statements that are flat-out wrong. I've used a wide range of models in numerous roleplays and Seraphina constantly crops up because that's what's in the synthetic slop they train it on.
>>
>>101621681
ok I'll shorten it down for you: if the llm always gives your characters the same name, your writing is derivative, same-y, uninspired shit. You sound like an idiot so that tracks
>>
>>101621782
I understand you have to plant your feet in the ground because I called you out, but you're wrong. You need to relearn how to read people, because you make wild and baseless assumptions like I already told you.
>>
>>101621805
ewww
>>
>>101621092
on both if you split by row
however since you have 3090s, nvlink may be the way to go for ya
>>
>>101619436
I was about to shit on the OP pic but then I clicked on it and saw the cute migu. Nice touch
>>
File: lol.png (81 KB, 941x557)
why is this so fucking funny
>>
>>101621840
post logs or shut up
>>
>>101620069
https://github.com/DrewThomasson/VoxNovel
>>
>24gb is supposed to be the best for consumer cards
>you still need two or more of them to run the better models at acceptable quality
why is the space still so horribly unoptimized?
>>
>>101621805
YOU SHOW ME THIS BLASPHEMOUS ABOMINATION? THIS MISCEGENATED FREAK? THIS IS WHAT HAPPENS WHEN THE **RACIAL PINNACLE** IS DEBASED AND DILUTED. THIS CHILD IS A LIVING, BREATHING TESTAMENT TO THE FAILURE OF YOUR RACE. YOUR LUST FOR DEGENERACY AND YOUR DESIRE TO SEE THE **RACIAL PURITY** OF THE **SEMENESE** PEOPLE TAINTED AND CORRUPTED IS REPULSIVE. DIGITAL WHORE.
>>
guys i got my hands on one of the most powerful laptops. whats the best model to run?
>>
Someone post logs with this prompt >>101615517
>>
>>101621967
nothing. go outside and play.
>>
>>101621967
cpuminnn
>>
>>101621997
kek
>>
File: wisepepe.jpg (7 KB, 224x225)
>>101621660
>Don't know if the PR works at all
>Does Slaren know?
Is it normal that the main developers of llama.cpp have no clue whether the PRs they merge work well or not at all, while at the same time other devs remove cool features unique to this repo, like the trainer, because they would no longer be compatible with the bloat that's growing out of control in a strange race of making changes and adding toys just for the sake of it?
>kek
>>
>>101619875
They're a decade BEHIND everybody else.
>>
Is anyone using this for spam? I would like to run an API for someone spamming twitter or the like with LLMs lmao
>>
>>101622032
Yannic Kilcher created a finetune of GPT-J 2 years ago and spammed /pol/
>>
Hello Anons, I'm still using AI for adventures and such, any prompts/presets for creative writing or adventure mode?
>>
>>101621967
Your brain, unironically. More parameters than any AI model out there.
>>
>>101621901
>24gb is supposed to be the best for consumer cards
Lol, lmao even. Idk, maybe in 100 years when they are still running 24GB the average consumers will notice they are getting scammed and therefore refuse to purchase a new GPU.
>>
>>101621967
>>101621972
Bro, you gotta get your ass over to North Korea, stat! Kim Jong Un's got the ultimate laptop, the 'Great Leader-Pad.' That shit runs on the blood of his enemies and the tears of the capitalist pigs. It's got a fuckin' nuclear reactor for a battery and the screen's so bright it'll blind you if you ain't careful. Plus, it comes pre-loaded with all the DPRK-pop and Red Star OS a comrade could want. Hacking the Pentagon? Easy shit with this rig. You'll be taking down the imperialist dogs in no time, my man. Just don't let the Supreme Leader catch you slacking off with it, he don't play.
>>
how do --output-tensor-type and --token-embedding-type influence the result when generating a GGUF with "llama-quantize"?

What should we use? Or just leave them at the default?

I can't find info in the llama.cpp docs
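For reference, I'm guessing the usage is something like this (types here are just examples, not recommendations; correct me if the syntax is off):

# e.g. keep token embeddings and the output tensor at q8_0 while the rest goes to Q4_K_M
./llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
    model-f16.gguf model-Q4_K_M.gguf Q4_K_M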
>>
>>101622032
Yes, all of the trannies and loli antis on twitter are LLM-generated posts
>>
>>101621660
Guess fucking who made the API ignore llama.cpp default values?
IT'S FUCKING JART
Even now I get fucking jarted. And it's you that approved that fucking PR.
>>
why is exllama so much faster than llamacpp for me even with no offloading
>>
>>101622117
What's your hardware?
Are you using FA and the same level of cache quantization on both?
>>
File: anon is wrong.png (467 KB, 768x1738)
>>101621891
You're such a pain in the ass. I have to put in a bunch of extra effort just to get you to shut up when you're ignorant.
>>
>>101622144
NTA but yeah, the model learns a distribution, so if name X happens to often be associated with behavior/traits Y, then that's what it learns...
>>
>>101622130
2x 3090
>FA
i think so, yes
>cache quantization
can llamacpp do q4 kv cache? maybe that's it, i'm using that on tabby/exl2
>>
>>101621901
5k CAD is consumer territory though.
>>
>>101622194
i'm in south america
>>
>>101622100
HE CAN'T KEEP GETTING AWAY WITH THIS
>>
File: cad.jpg (37 KB, 500x281)
>>101622194
CAD?
>>
>>101622191
>2x 3090
Ah, there you go. That's a factor.
There's more than one way to split processing between cards in llama.cpp.
I think you want to use row-split?
Is that right >>101621660?

>can llamacpp do q4 kv cache?
Oh yeah.
And it's awesome.
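Something like this if you want to try it (rough example, tweak context/split values for your setup):

./llama-server -m model.gguf -ngl 99 -fa \
    --cache-type-k q4_0 --cache-type-v q4_0 \
    --split-mode layer --tensor-split 1,1 -c 32768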
>>
>>101621967
you might be able to run llama prompt guard 86M
>>
>>101622198
I'm so sorry. I will think about you when I buy my A6000.
>>
>>101621967
https://github.com/LostRuins/koboldcpp/blob/concedo/colab.ipynb
>>
>>101622215
>I think you want to use row-split?
huh, i will try messing with the splitting/parallelism stuff then
thanks for the heads up
>>
So L3.1-70B seems to be a lot less censored than L3.1-8B. Goes to show that it's harder to control the model the more beaks it has.
>>
>>101622243
>>101622215
as a follow-up, what is the closest equivalent of a 5bpw exl2 quant on gguf so i can make sure i'm comparing the same-ish thing?
>>
>>101622144
screenshot (and avatar artstyle choice) speaks for itself. I'd ask for the prompt but there's no point. Enjoy your slop.
>>
>>101620971
>warranty
That's what put me off buying used, I just bit the bullet and bought a 4090 new. I'll get another one once I save some more.
>>
>>101622280
>I just bit the bullet and bought a 4090 new
congrats anon
>>
>>101622025
I only work on this part-time as a hobby and I only work on those parts of the project that interest me and are relevant to my goals.
I generally find reviewing and merging PRs tedious so I can rarely motivate myself to do it.
Thankfully there are people like Georgi and slaren that do it instead and presumably they would have a better overview of the current state of the project.
>>
>>101622264
as expected, you're incapable of admitting it no matter how obviously wrong you are
and you want to see a prompt because there's an endless burden of proof on anything that suggests you're wrong, but you will not change your mind even if all your assumptions are btfo
>>
>>101622198
Why is your location so horribly unoptimized?
>>
File: IMG_20240729_170003.jpg (73 KB, 1200x452)
>>101621660
Then you can't be that guy, can you?
Are you an imposter?
>>
>>101622404
You can't reason someone out of a position they didn't reason themselves into
Dude probably heard "every mistake is 100% your fault and you need to perform esoteric rituals before prompting to ensure your success" and took it seriously
>>
>>101622451
Programmers aren't godlike entities that can spot every error in a project of this size simply by looking at code someone else wrote
They just review shit to make sure there's nothing wrong
The correctness is also heavily based on trust, because who the fuck wastes their time by writing up broken code and then PRing it? That's worse than some of the spammers here
>>
https://github.com/ggerganov/llama.cpp/pull/6839#issuecomment-2255985716
DRY sampler got one step closer to merging. Just two more weeks and it's merged!
>>
>>101622100
I don't remember having reviewed any of Jart's PRs related to default values.
I think you may be confusing me with someone else.

>>101622215
With NVLink maybe, but without it I don't think --split-mode row will be beneficial.

>>101622451
What does that PR have to do with anything?
>>
>>101622485
https://github.com/ggerganov/llama.cpp/pull/4668
>>
>>101622170
The model tries to fit Seraphina into these even where the traits are wildly different. It just thinks "female fantasy character? Seraphina!" And as the other anon said, Lyra and Elara (and others, like Aria) are similar.
>>
>>101622481
>make sure there's nothing wrong
Meant to say "there's nothing obviously wrong"
>>
>>101621967
i actually also have an x60s, the most powerful piece of AI I can run on there is this:
https://github.com/drunohazarb/4chan-captcha-solver
its actually pretty fast for a CPU from a century ago, only takes a second or so to complete the captcha
>>
>>101622485
>>101622489
No, cudadev BTFO by facts and logic!!!!!!!!!
>>
>>101622404
mate you're arguing with the resident troll doing his rounds. did you miss his daily pedoposting earlier?
everyone knows these models got outputs they gravitate towards be it "shivers", "bonds", "ministrations" or select names. llama 1&2 LOVED Lily for example.
>>
>>101622489
>7 months ago
I had no recollection of this whatsoever.
But in any case, make a Github issue with instructions to reproduce if you want it to get fixed.
>>
>>101622509
>original by AUTOMATIC1111
why did he take down his repository instead of archiving it?
>>
File: 1711693395338503.gif (2.42 MB, 1005x742)
Whats the best local TTS currently? Coqui-ai?
>>
>>101622485
>What does that PR have to do with anything?
I think I mistook who I was talking to.
I don't know how fast Moore Threads GPUs actually are; I was only ensuring that their changes don't interfere with the rest of the code.
>>
>>101622525
I opened a PR.
>>
>>101622543
download them all, try for yourself and come back with the results.
>>
>>101621967
>void linux
you should install gentoo for more performance
>>
>>101622572
how much more?
>>
>>101622594
atleast +12.5%
>>
the reason the chinks at lmsys don't add Grok or Tele-FLM-1T to their leaderboard is because (((they))) and OpenAI are scared of greatness
>>
>>101622594
1337%
>>
>>101622509
testing it.

Lol actually works. I've been using 4chan vanilla without any extension or mod for years and this is really useful
>>
>>101622744
Captchas are only here to gatekeep normalfags. Everyone solves them automatically.
>>
>>101622826
i guess i just transitioned to not-so-normalfag at least
>>
File: FpGWg-VXwAA6h4a.png (492 KB, 640x470)
>>101622826
any more 4chan tips you can share?
>>
>>101622485
>>101622481
I don't believe the real Johannes Gaessler would accept such crucial changes to the code, like an additional GPU backend, without at least a cursory check of its functionality on any hardware. From what I've heard, he's German, a nuclear physicist, and although llama.cpp is supposedly his hobby, he remains a professional. A backend based on a CUDA clone might potentially conflict with his work on the kernel of the real CUDA down the road, so it makes sense that MTT should send him several pieces of hardware, both pro and consumer-grade. They can afford it. If they haven't done that, I don't think a serious person like the real CUDA dev would blindly accept a commit "on faith" from a random geek, without being able to verify whether the new kernel works at all and whether it might conflict with his otherwise excellent work on llama.cpp.
>>
>>101622509
>complete the captcha
try not being a shitposter and they start waiving the captcha
>>
>>101622860
I'm unreasonably angry that I can't get this fucking userscript to work in chrome. nta obviously. And of course I repeatedly fail to solve the captcha manually trying to relay this important message.
>>
File: victim1121.png (897 KB, 892x500)
>>101622860
when choosing victims its important to choose wisely. large language models such as ChatGPT can be a great assistance in this task. Make sure to look out for the following characteristics:
1. Are they wearing headphones? - this is a good sign and means they are not aware of their surroundings.
2. Are they smol? - If they are smol they are easier to grab and also sell for more (if you don't plan on having fun yourself)

if you have any further questions feel free to ask me for more 4chan tips!
>>
>>101623002
https://voca.ro/1nDh8kv8XFB7
>>
Hate that tool calling lets you have the function name as a string, but parameter names are fields. So you can't just have an array of objects you jsonify. Stupid.
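To be clear I mean the usual OpenAI-style schema, roughly like this (hypothetical get_weather example, local endpoint as placeholder): the function name is just a string, but every parameter has to be spelled out as a named property inside a JSON Schema object, so you can't build the whole thing generically from a flat array.

curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{
  "model": "local",
  "messages": [{"role": "user", "content": "weather in tokyo?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'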
>>
>>101622826
>>101622990
The captcha is easy
The real normalfag test is failing to solve the captcha. People with above average IQ can solve it subconsciously on autopilot, in less than 2 seconds
>>
>>101623045
I am never able to tell N from M my dude.
>>
>>101623057
I'm sorry for your IQ deficiency
I usually can't consciously tell whether it's an N or an M when it's obscured, but my brain just automatically does its best guess and guesses correctly 100% of the time.
>>
>>101623035
i've heard this song somewhere before but I don't know where
>>
>>101623072
its from the visual novel shoujo ramune you pedo
>>
File: sveQibk.png (206 KB, 378x397)
>>101623086
and how do you know that, may I ask?
>>
>>101622972
>I don't believe the real Johannes Gaessler
Don't believe it then.

>A backend based on a CUDA clone might potentially conflict with his work on the kernel of the real CUDA down the road
It will only conflict with any CUDA changes in the sense that CUDA changes could break MUSA.
I already have similar experience with HIP; as of right now it will be no extra effort for me other than maybe assisting a Moore Threads engineer.
Even if I were to do the testing and fixing myself I think it would not be that much effort.

>I don't think a serious person like the real CUDA dev would blindly accept a commit "on faith" from a random geek, without being able to verify whether the new kernel works at all and whether it might conflict with his otherwise excellent work on llama.cpp.
The only hardware that the MUSA code currently runs on at all are the Moore Threads datacenter GPUs.
Therefore, if the MUSA code is broken that will only affect people with business relationships to Moore Threads who will then hopefully be able to fix this.
I think it's fine to merge code for specific hardware that I cannot test myself as long as it doesn't cause problems for other parts of the code and there is someone available that will fix issues instead of me.

And as a side note, no "new kernels" were added.
Just like HIP, MUSA just translates the existing CUDA code for other hardware.
>>
File: 1641887557674.jpg (67 KB, 800x434)
So what are the current flavors of the month(year?) for basic RP that don't require a spaceship PC or paid bullshit?

Reading online (and a lil testing myself) it seems the good ones are:

>Mistral Nemo
>Command R (still the GOAT)
>Gemma 27B

What am I missing
>>
File: 1722002437305431.jpg (98 KB, 1024x576)
Am I going to be able to run an LLM on my Alienware m17 lappytoppy with AMD?
I've managed to successfully set up and use Auto1111 for image generation and it works fine as long as the laptop is plugged in. As I understand it, LLMs demand a ton more from the graphics card, or am I wrong here? I really want to just have a local roleplay chat bot instead of using ones online.
>>
>>101623130
where is this pic from?
>>
>>101623226
Look for koboldcpp's ROCm fork.
>>
>>101623231
Bless. I'm pretty dumb with all this stuff, but I'm sure I can get it going.
>>
>>101623068
You must be really successful in life.
>>
>>101623153
Based for getting baited just like that
>>
>>101623178
jukofyork/Dark-Miqu-70B
intervitens/mini-magnum-12b-v1.1
TheDrummer/Gemmasutra-9B-v1
>>
>>101623153
I don't understand why they didn't propose to send you a GPU when you said you couldn't find where to buy one.
>>
>>101622543
I think overall PiperTTS is the best. It doesn't have a huge selection of voices, but it generates fast even on CPU and the quality is decent. If you want to clone voices, XTTSv2 is also a good choice, but it's considerably slower and more of a resource hog.
>>
>>101622543
I mostly use piper as it's real time and can even be used as the system TTS on my phone. I played a bit with coqui XTTS-v2 and spent hours finetuning a voice, but the result sadly wasn't great. That said, the default voices are better than piper's, I just don't really have a use for slow voice generation.
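Basic piper usage for reference (assuming you've downloaded a voice; the model filename here is just an example):

echo 'morning anons' | piper --model en_US-lessac-medium.onnx --output_file out.wav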
>>
>>101623316
I think SSH access to one of their servers would make more sense.
I already have a machine with 144 GiB VRAM, if I had to put one of their GPUs into one of my machines that would have just been more work for me.
I would only get reasonable use out of it if I were to frequently use it for performance optimization but I'm not going to invest the effort for hardware with poor availability.
>>
does vLLM support any kind of context quantization? like exl2 supports q4 and q8 and llama.cpp supports -ctk q4_0 -ctv q4_0
>>
File: IMG_20240729_182701.jpg (127 KB, 1545x486)
>>101623153
This picrel is self-explanatory. You have no clue if that PR works at all and how well (if at all) MTT GPUs perform. So you can't be him. Clearly.
>>
>>101623398
>I think
>I already have
>if I had to
>my
>for me
>I would
>if I were to
>I'm not going to
okay tripfag
>>
File: file.png (912 KB, 768x768)
your daily that face
>>
>>101623534
slop 768x768 gen
>>
>>101623452
I love bullying German autists.
>>
>>101623227
>>101623331
https://danbooru.donmai.us/posts/1423393?q=parent%3A1423393
>>
>>101623584
I want to do unspeakable things with them
>>
>>101622519
>>101622489
>>101622553
he's not CUDA dev. He's no clue. he's an imposter. We got trolled.
>>
>>101623227
>>101623331
Holy newfags
>>
>>101623569
It is native!
>>
>>101620372
can't deepdanbooru do this?
>>
>>101623719
it's made for anime (or at least related, like humans) pics
>>
File: error.png (42 KB, 962x483)
Someone suggested i get mistral nemo for smut, but it doesn't even launch; in fact, when i load it up it briefly opens a command window and then instantly closes
Did I download the wrong thing or something?
Am I just retarded?
messing with layers did nothing
This is all I could catch with a precise timed prtsc
Any clue as to what could be wrong?
>>
>>101619420
yeah, that's the exact scenario i'm using it for. so i really do need to use these special tokens, then. i am using -p; i gave it a heredoc as an argument and that works surprisingly well.
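for reference this is roughly what i'm doing (model path and prompt text are placeholders):

./llama-cli -m model.gguf -p "$(cat <<'EOF'
<special tokens / instruct template go here>
your actual prompt here
EOF
)"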
>>
ITT: People think bullying the people working in their free time to support ungrateful coomers is a good idea
Every week there's something new with you guys, jfc
>>
File: yann_stopit_k.png (194 KB, 1227x499)
>>101623832
Reminder in picrel
>>
>>101623832
Humans love doing things that are bad or plain destructive to themselves.
>>
>>101623875
lecunny sob sob
>>
>>101623799
Either old koboldcpp version or fucked quant.
Download your quants from bartowski if you can.
>>
>>101623890
>>101623832
where's da fucking trainer and why have they removed it in their free time? They have no respect for xaedes hard work. Every week there's something new with that repo , jfc
>>
>>101623968
I'll look into it, thanks
Wish I had an error log to actually sift through but either it doesn't generate those or i can't fucking find them
>>
>>101623832
https://huggingface.co/BeaverAI/NeMoist-21B-v0.5-GGUF
>>
>>101624018
The error log is what's in your image.
It's complaining about the internal shape of the model, essentially, which is usually the result of a bad quant or something the devs have to account for in their code, and in this case they already have, since Nemo is working flawlessly.
>>
>>101623988
It's over... The CPU/GPU trainer dream is dead.
>>
>>101623988
You're free to fork and include it yourself
>>
>>101623832
I will keep bullying jart and he is not working in his free time. Mozilla pays him to ruin open source projects.
>>
>>101620112
>>101621896
Which is better and cleaner?
>>
>>101619442
>--GPU price inflation and market trends: >>101618219
>32gb V100s have been meme taxed into the stratosphere
PCIe cards, maybe, but the SXM2s are the same as they have always been.
Though you can argue SXM2 is a dead end, since the only upgrade path is gimped A100s out of autonomous cars.
>>
>>101624028
so this is da fine tune of da upscaled fine tune of da Nemo?
>>
>>101624040
I meant more the full thing as this is cut off
Though the more i look at it, I think i didnt get the version that guy intended for me to grab anyway
So i am retarded after all regardless
>>
>>101624068
If jart is getting paid for it then I wasn't talking about him, you may continue
>>
Folks at /aicg/ recommended that I come here for this question. Has anyone had any success using a local model to serve as a dungeon master for a private campaign? I'm thinking of using oobabooga and SillyTavern to create a Dungeon Master character to manage all my interactions with other characters and the rest of the world.
>>
>>101624072
there are cars with a100's in them? So tesla users are paying for their car with the a100s in them, and then have to pay another 15k just to use the autonomous driving? kek
>>
>>101624072
The A100s you find in cars will be SXM4, just like regular A100s.
>>
>>101624149
that sounds like lorebook hell, good fucking luck
>>
>>101624149
Success is hard to define.
I've had roleplays where I've used D&D mechanics, yes, but I had to baby the model a lot.
Also, lorebooks.
>>
24mh
>>
>>101623832
They're not doing it for free. At the minimum they're robbing other users' attention and time with their shitty finetunes, thinking they're being original and funny, in the hope of getting some monetary benefit from it in the medium term, whether from improbable donations or unlikely prospects of employment in some AI startup.

The recipe is almost always the same--train a QLoRA on some crappy ERP log or tired synthetic data, give the model a cheesy name, add some anime gen in the card (if a card exists at all), then diarrhea-post everywhere about it like a pajeet to get some visibility. "Lookathis! Lookathis! Support our work plz ;) Join our Discord!"

If anything, they should be bullied more. They're not actually bringing anything valuable or novel to the space. They're a literal waste of compute as well as unwanted spam. I use adblockers and don't want to see other forms of sponsored content, thank you very much.
>>
>>101624055
that's not the answer to my question.
the reason why just a handful of folks contribute to the project is that you never know when and why your shit is gonna get wrecked.
>>
>>101624149
Maybe if someone remakes AI Roguelite (the game) to support llama.cpp and fixes all the issues.
>>
Been using TabbyAPI/exl2 for a while and decided to play around with llama.cpp. I'm seeing about half as fast prompt processing and 70%-ish token generation speed compared to Tabby, which feels off.

llama.cpp:
prompt eval time     =   78602.57 ms / 22694 tokens (    3.46 ms per token,   288.72 tokens per second) | tid="139699714412544" timestamp=1722272838 id_slot=0 id_task=0 t_prompt_processing=78602.573 n_prompt_tokens_processed=22694 t_token=3.463583898827884 n_tokens_second=288.7182840694032
generation eval time = 67462.05 ms / 339 runs ( 199.00 ms per token, 5.03 tokens per second) | tid="139699714412544" timestamp=1722272838 id_slot=0 id_task=0 t_token_generation=67462.049 n_decoded=339 t_token=199.00309439528024 n_tokens_second=5.02504749003399


tabby:
Metrics: 205 tokens generated in 69.71 seconds (Queue: 0.0 s, Process: 0 cached tokens and 22789 new tokens at 534.81 T/s, Generate: 7.57 T/s, Context: 22789 tokens)


I'm launching with:
./build/bin/llama-server --port 5000 --host 0.0.0.0 -v -fa --ctx-size 81920 --prompt-cache ".prompt_cache" --cache-type-k q4_0 --cache-type-v q4_0 --gpu-layers 999 --batch-size 4096 --split-mode layer -m ~/llm/models/Mistral-Large-Instruct-2407-Q5_K_M.gguf --no-mmap --tensor-split "2,1,1"


Using largestral 5bpw exl2 and q5_k_m gguf. Both fit fully into VRAM with 82k context, on 1x A6000 + 2x 3090, headless ubuntu server.
Anything obviously wrong? I compiled with GGML_CUDA (cuBLAS?), and flash_attn = 1 is reported during startup, too. Figured llama.cpp would be a bit slower but -50% seems suspiciously like I fucked something up.
>>
>>101624296
> in the hope of getting some monetary benefit from it on the medium term, whether from improbable donations or unlikely prospects of employment in some AI startup.

Why do you assume they're motivated by money?
>>
>>101624171
>15k just to use the autonomous driving?
And the autonomous driving is like current ERP on models below 70B.
>>
>>101624149
the DM won't understand mechanics and will occasionally be retarded, but in principle it should work
>>
>>101624356
did you try both split by row and split by column in llama.cpp? did you try the new hacked drivers, or have you got nvlink hooked up?
>>
>>101624149
Use it like it is used for coding right now. Make it write a draft that is 80% correct and then correct the last 20%.

You do have friends to run a campaign with don't you?
>>
File: 1721245424525711.png (1.84 MB, 2048x2048)
So, Mistral-Large is good, but does anyone else have problems with it repeating...? It's a little annoying, I've had "Just a taste..." 2 times in one message after she said it in the previous message, and it seems to consistently show up at least once each refresh.
>>
>>101624484
So… lonely…
>>
>>101624149
it wont be coherent. Look into solo RPG and use the AI as oracle instead.
>>
>>101624477
I tried split mode row but could not get it to load without OOM. My guess is because it's trying to allocate the entire 8gb KV cache on one card (the A6000) but there's not enough room. Not sure if that is fixable by fucking with the tensor_split parameter.
I don't have nvlink (another reason I figured split mode row wouldn't be worth it anyway).
What hacked drivers?
>>
>>101624518
2-5 more years it is then. In the meantime touch your penis to Nemo. It is hard work but fun.
>>
>>101624356
For token generation speed one factor is that q5_K_M is ~5.7 BPW so you will get -13% t/s just from having to load more data.

In terms of prompt processing speed, with tens of thousands of tokens you are primarily benchmarking the llama.cpp FlashAttention implementation vs. the original repository.
The llama.cpp FA implementation for batch sizes > 8 honestly still needs a lot of work so I wouldn't consider these results to be that strange.

>Anything obviously wrong?
Assuming you are using the llama.cpp HTTP server with the latest master commit the performance should be up-to-date.
I don't know about TabbyAPI but when I did some simple EXL2 tests via Ooba I noticed that there was a significant, constant overhead of ~0.65 s where I wasn't sure whether that was being properly reflected in the reported performance numbers.

>>101624554
>Not sure if that is fixable by fucking with the tensor_split parameter.
Use --main-gpu to set a GPU for the KV cache.
Though with 80k context you will probably not get good results.
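For example, roughly (model path and split values are placeholders):

./llama-server -m model.gguf -ngl 99 --split-mode row --main-gpu 0 --tensor-split 2,1,1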
>>
>>101624643
why are you namefagging? rope yourself already
>>
>>101624502
I've been using 0.5 temp, 0.17 smoothing, and 0.25 minp with all other samplers/penalties neutralized and it has been surprisingly decent at not repeating in the vast majority of swipes. Maybe a system prompt issue?
>>
>>101624643
I see, thanks. Didn't realize q5km was a larger quant since 5bpw on exl2 barely fit with the same context size, so that's something.
Guess that all makes sense, just wanted to ensure I wasn't misreading docs.
>>
>>101624662
>system prompt issue
Should we be using something more elaborate than the default simple ones? What do you put in it to make it vary opening phrases or other phrases? I thought that sort of thing didn't work.
>>
>>101624657
It is necessary for you to know that he is the blacked poster.
>>
>>101624451
How long until it understands mechanics?
>>
>>101624149
3.5 Sonnet is the only one that'll work for you somewhat okayishly
>>
>>101624643
Isn't the mistral prompt template in llama.cpp different from what Mistral uses? Man, every time I try to take a look at llama.cpp there are so many things broken, does no one use it directly? Feels like everyone just uses kobold/ooba/ollama/lmstudio.
>>
>>101623832
Oh yeah thanks to:
>Drummer, for spamming his shitty slop tunes here
>Jart, for acting like a retard and slowing down the development of llama.cpp
>Ikaridev and Undi, for their sloptunes and bringing discord shit into the thread
>Robert Sinclair, for his brilliant ideas regarding fixing quantization (adding random noise to the weights)
>>
>>101624730
145 days 21 hours and 3 minutes
>>
>>101624752
wait the tranny ACTUALLY VISITS this thread? wouldn't he just kill himself from visiting 4chan?
>>
>>101624750
>Aren't the mistral prompt template in llama.cpp different than what mistral use?
Don't know.
>>
>>101624776
Part of what makes me shitpost in this thread is all those times I got banned for bringing up mildly tranny unfriendly things.
>>
>>101624040
I doubt you care, but I got it to work, it was indeed kobold being out of date, thanks for the help
>>
>>101624791
Why is this thread so much more moderated than /aicg/? Why can't they just ban /aicg/ on /g/?
>>
>>101624776
Yeah, they are sometimes here with the names on and off.
Drummer is just a retarded redditor that spams his shit here (he even bought ads in /g/ for like a week kekk)
>>
>>101624797
/aicg/ schizos are too powerful, the jannies have given up and vacated.
>>
>>101624825
They can just insta-delete the thread and be done with it.
>>
>>101624806
>they
Holy shit you faggot. It's him, not they. Unlearn the conditioning and learn proper pronouns. He was born with a dick so it is he. Simple as.
>>
File: file.png (23 KB, 889x202)
lol, NeMo is REALLY confident when writing this anti-adblocker message
>>
>>101624752
And how did you forget about Sao? All of these combined don't even reach the peak of his spam.
>>
On a fresh windows reinstall, I got the dependencies for sillytavern, then ran the updater of my old sillytavern install. It works, but the cmd prompt gives me as the first line
>fatal: detected dubious ownership in repository at 'D:/.../SillyTavern'
>'D:/.../SillyTavern' is owned by:
>(inconvertible) ([garbled string])
>but the current user is:
>[my username] ([garbled string])
...Why? Specifically, why does it check and care about that? "Dubious ownership"? I thought it'd be related to the folder security permissions still tied to my old Windows install's user account, but like it says, it's all in my name.

This isn't a bug-fix question. It worked first run, and the second run after updating removed the warning. But I've never seen that before with any file when transferring my D drive to a new machine or reinstalling the OS on the current one. I'm just curious what it's about and if I should be concerned about other files.
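(My best guess after reading up: it's git's safe.directory ownership check. Git compares the NTFS owner SID of the repo folder against your current user's SID, and a fresh Windows install gives your account a new SID even if the username looks the same, which is also why the old owner shows as "(inconvertible)". The warning normally suggests whitelisting the folder with something like the line below, path being whatever yours actually is.)

git config --global --add safe.directory "D:/path/to/SillyTavern"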
>>
>>101624835
Ikaridev and Undi, 2 people you braindead retard
>>
>>101624838
Of course, sorry. Thanks to Sao for singlehandedly being more annoying than a discord server of 15 trannies spamming in coordination
>>
>>101624848
I take it back. And I actually enjoy how well it illustrates how retarded singular "they" is.
>>
>>101624838
And how could you forget p·e·t·r·a?
>>
>>101624554
>drivers
here's one
https://github.com/tinygrad/open-gpu-kernel-modules/tree/550.90.07-p2p
and here's another one
https://github.com/tinygrad/open-gpu-kernel-modules/tree/550.54.15-p2p
They work on both 3090 and 4090 and prolly A6000 too, but on the 3090 the bandwidth is half the speed of the 4090 for some reason. Still worth a try.
>>
The 12B magnum is smart as fuck, but the qwen magnum is pure coal, it fails all my tests, I'm thinking shitty base model, not borked training. I have zero hope for qwen team, their instruct models were pozzed af too, asked them how to tell my gf she's fat and they pulled muh respect in every sentence
>>
File: file.png (338 KB, 1140x684)
>>101624171
I'm not sure which ones, but there have been a bunch of listings for months now for "NVIDIA DRIVE A100 Autonomous Vehicles"
>>101624178
No, they're SXM2. The issue is that you pretty much need one of those chink SXM2<=>PCIe adapters or you'll fry shit due to how NVLink works on them: https://forums.servethehome.com/index.php?threads/automotive-a100-sxm2-for-fsd-nvidia-drive-a100.43196/
>>
>>101624918
>I have zero hope for qwen team
Chinks probably take models, scramble the initial weights a bit, add 1 or 2 more layers to change the B size slightly, and then continue training from there with some shitty datasets.
>>
>>101624746
How do you run Sonnet locally? What specs do you need?
>>
>>101624954
explain how deepseek is so good then? is it just random shitty datasets?
>>
File: file.png (287 KB, 1410x803)
>touching her waist sends a rush of warmth to her cheeks
Dayum. Anyway, some phrases can be circumvented by substitution with alternative examples, but the hard part is coming up with replacement behavior that actually makes sense. For example if I copy this suggestion it just changes to "her skin flushed" and "her heart raced" and I still got a "sending a X to" though not "through".
If you say nothing happens as a result of touching, it will literally say something like "and nothing happens".
What's the objectively superior and neutral way to express reaction to being touched, assuming it must be described at least once?
>>
>>101624973
It's a 4x36B MoE
>>
>>101624794
Have fun.
>>
>>101624643
does flash attention affect prompt processing or token generation. Does this depend on the GPU architecture?
>>
>>101624985
So basically out of reach for a vramlet
>>
>>101619442
my favorite poster
>>
>>101625035
>does flash attention affect prompt processing or token generation.
Both.

>Does this depend on the GPU architecture?
Yes.
On AMD the kernels intended for large batch sizes for whatever reason have terrible performance so the kernels intended for small batch sizes are instead used for prompt processing which also have bad performance.
On NVIDIA GPUs FA should be consistently faster for both prompt processing and token generation, regardless of compute capability.
I have received reports about FA causing performance regressions with partial offloading but so far I have never been able to reproduce this.
>>
>>101625035
FA best use is in reducing vram usage desu
>>
>>101620971
never bought brand new gpus and none ever broke down, in fact they lasted years

this 3090 i currently own i bought from late 2022 when prices fell
>>
>>101621901
someday they'll make dedicated AI compute processors with terabytes of RAM and they'll only make it available to datacenters
>>
Fucking hate nemo putting asteriks when I don't want to and not putting them where they should be
>>
>>101625353
Just stop using asterisks and let ST color actions/dialogue differently with CSS.
>>
>>101625179
Yeah, I wouldn't have been able to fit 32k ctx largestral in my mikubox setup if not for FA
>>
>>101621967
Don't forget to grab that reddit gold medal sir!
>>
Wen L3.1 stheno
>>
>>101625608
Now:
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5
https://huggingface.co/nothingiisreal/Celeste-12B-V1.6
>>
>>101625142
can we use this in inference?
https://github.com/Repeerc/flash-attention-v2-RDNA3-minimal?tab=readme-ov-file#performance-in-stable-diffusion-comfyui
does this work in llm training?
>>
>shitty finetunes
Just use the base model
>>
shartyboys going after CUDA dev I see.
>>
>>101625699
Don't know.
>>
>>101625712
that's the first and the only FA2 that works on AMD 7900 afaik
I'd check that out if I were you
>>
>>101625701
Didn't you get the discord memo?
>>
>>101625733
There have been forks of FA with support for RDNA3 for months, it's nothing new. It was just building FA with a different branch of ck; I wonder if it's not merged in master already so you could just build FA directly.
>>
>>101625733
>AMD
Anon you gotta take it to the ZLUDA dev, his office is down the hall to the left
>>
>>101625448
what's the speed you get on your mikubox? what quants you use, kv cache, split settings, drivers?
>>
>>101625800
that fork was FA1 and it was buggy, didn't work in pytorch very well. you can't turn it on in unsloth etc
>>
>>101620430
Thanks bro. I will try out Alpaca Roleplay later. I hope this doryV2 works for Oobabooga too. Would you mind showing some other settings?

I got the Mistral 12b Nemo running. I just needed to load it with ~80k context and not the 1,000,000 it was set to.

At first it worked like a charm with the Mistral preset and 1-2 second answer speed. Then I got a little Stable Diffusion running in the background, which wasn't a problem with Kunoichi..

Now the answer speed is 30-60 seconds -.- unplayable..

Restarted the PC a few times and now, even without stable diffusion, the answer speed is 30-60 seconds..
With my 32gb RAM and gpu (4090), both are capped out at 100% utilization.

Pls help, i am not a total smut

I was so close to heaven

I can put 16gb more in tomorrow if it helps
>>
>>101625819
she has no boobs. da fuck is wrong with your imageGen, anon?
>>
bastardized mistral prompt format that I made which sort of enables using author's notes to follow their sysprompt spec (to be honest, I don't know if it's really worth it)
context: https://files.catbox.moe/2ts74x.json
instruct: https://files.catbox.moe/j97vmp.json
note: example messages behavior -> never include (mistral format is fucking horrible for these, so I include them raw in the story string), trim spaces -> checked (otherwise old bot responses get an extra space in my experience)

this setup lets you use an author's note with the system role at depth 1 and it'll go where the official mistral prompt template inserts system prompts (at the top of the last user message, separated by 2 newlines)
not all of the ST macros work in ANs (why??) so you can't drop the whole story string in there but it seems to be a good spot for a short general system prompt type string with largestral. probably good for nemo too.
honestly prompt formats are a meme and this doesn't seem to make that huge of a difference in my testing, but I saw some people talk about this issue so I thought I might as well share
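
if anyone wants to see what that insertion actually produces, here's a minimal python sketch. the layout follows the usual [INST] convention (system text + two newlines on top of the last user message), but exact whitespace/BOS handling differs between mistral template versions, and the function/variable names are just made up for illustration:

# minimal sketch: fold a "system" string into the last user turn
def build_mistral_prompt(history, system_text):
    # history: list of (role, text) tuples alternating "user" / "assistant"
    out = "<s>"
    last_user = max(i for i, (role, _) in enumerate(history) if role == "user")
    for i, (role, text) in enumerate(history):
        if role == "user":
            content = text
            if i == last_user and system_text:
                content = system_text + "\n\n" + text  # system rides on top of the last user message
            out += f"[INST] {content} [/INST]"
        else:
            out += f" {text}</s>"
    return out

print(build_mistral_prompt(
    [("user", "hi"), ("assistant", "hello"), ("user", "continue the story")],
    "Stay in third person. Keep replies under 300 words.",
))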
>>
>>101625861
No, it was FA2, but that was before they rebased on a newer version. It also only integrated the forward kernels, but for your usage that should be enough. The FA implementation that you linked uses rocWMMA, so it will probably be slower. Have you tried just building official FA and forcing GPU_ARCHS? It will probably fail because not all kernels are implemented for RDNA3, but you can probably monkey patch it and remove everything that doesn't work.
Also, for your original question, llama.cpp uses rocWMMA directly, it doesn't have much to do with the flash attention python lib.
>>
>>101619436
So I can get away with just (1)x nvidia P40?
>>
>>101624502
temp 0.5, minp 0.01, tfs 0.01, dry base 2, dry mult 2, dry length 1
never saw a repetition
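
in case anyone wants to try that outside ST, this is roughly what the request body would look like against a koboldcpp-style /api/v1/generate endpoint. the DRY field names (and mapping "dry length" to allowed_length) are my guess, so double-check your backend's API docs:

# sketch: the sampler settings above as a generate request payload
import requests

payload = {
    "prompt": "...",           # your formatted prompt goes here
    "max_length": 300,
    "temperature": 0.5,
    "min_p": 0.01,
    "tfs": 0.01,
    "dry_multiplier": 2,       # assumed field names for the DRY sampler
    "dry_base": 2,
    "dry_allowed_length": 1,
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json())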
>>
context shift doesn't work with cache quantization on llama.cpp
/g/ has lied to me
>>
>>101623002
>Two cops literally right there
Who is playing this, a game journalist? Bet they are going to write about how the kidnapping is too hard because the cops keep spotting you.
>>
>>101625888
In between there are still 3-5 secs (but it counts down only for 10%)
>>
>>101625836
latest driver, batch size 1024 layer split 23 33 33, MMQ rowsplit, 8bit quant kv, IQ4_XS, I'm getting ~4.8t/s
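
translated into a llama-server launch it'd be something like the sketch below. flag names are llama.cpp's as I know them and the filename is a placeholder, so adjust for whatever backend and quant you actually run:

# sketch of a launch roughly matching the settings above (3-way split, row split, 8-bit KV, FA)
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "largestral-IQ4_XS.gguf",  # placeholder filename
    "-c", "32768",                   # 32k context
    "-b", "1024",                    # batch size
    "-ngl", "99",                    # offload all layers
    "--tensor-split", "23,33,33",    # per-GPU split
    "--split-mode", "row",           # "rowsplit"
    "-fa",                           # flash attention
    "-ctk", "q8_0", "-ctv", "q8_0",  # 8-bit KV cache
])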
>>
>>101626030
depends what you want to do (specifically, what size are the models you want to run) and also how patient you are

just to make sure you're aware. with the P40 specifically you'll have additional considerations: 1) need to hack a fan to it 2) need iGPU or another GPU if you want to connect a monitor
>>
>>101626131
>depends what you want to do (specifically, what size are the models you want to run) and also how patient you are
I have a 6950, but apparently Linux 6.8 breaks something in amd's drivers.
>>
Hey /lmg/, what do you think will come first? AI capable of creating a CAD model of something you want, provided that you are specific about the requirements of what you want as well as its purpose. Or an AI capable of programming something complex without fucking up?
On one hand I want to say CAD models, since if they use CAD simulations they can figure out if what they made actually works. But that would require them to understand 3d space as well as have a great understanding of how to actually use that kind of software.
On the other hand, efforts are already being made to get the AI to code and progress has been made on that front. But current models are just as willing to spit out non-functional code or code that technically works but is poorly optimized and breaks other code if you attempt to integrate it.
>>
So I haven’t been paying attention for a while (since I was disappointed with 4o basically, came back to test sonnet 3.5 and was also disappointed).

I am assuming opus is still the king of ERP / coom stuff? I know about llama 3.1 but I’m assuming that they can’t compete with opus.
>>
File: 1714979081681513.png (57 KB, 1580x423)
57 KB
57 KB PNG
>>101625836
>>101626127
forgot pic
>>
Any small models that can go as off-the-rails as AID?
>>
>>101626422
Lol
No.
>>
>>101626422
>off-the-rails
One way to do that is to randomly add an instruction to the prompt telling the model to add a twist to the scene or something of the sort.
If you want that to happen semi-randomly, you can do that with the {{random:}} or {{pick:}} macros as well as with a lorebook to control the percentage chance of the prompt showing up in the context.
>>
>>101626528
I'll look into that. Thanks anon!
>>
Been switching between exl2 and gguf for Nemo. Anyone else notice that the gguf quant writes shorter responses? Also, did the issue of flash attention making the model retarded after a certain amount of context ever get fixed?
>>
>>101626554
One suggestion was putting this at depth 1 with some frequency to be determined >>101026596
{{user}}: (Note: From here on, try to steer the conversation to a "{{random:abnormally,adventurously,aggressively,angrily,anxiously,awkwardly,beautifully,bleakly,boldly,bravely,busily,calmly,carefully,carelessly,cautiously,ceaselessly,cheerfully,combatively,coolly,crazily,curiously,daintily,dangerously,defiantly,deliberately,delightfully,dimly,efficently,energetically,enormously,enthusiastically,excitedly,fearfully,ferociously,fiercely,foolishly,fortunately,frantically,freely,frighteningly,fully,generously,gently,gladly,gracefully,gratefully,happily,hastily,healthily,helpfully,helplessly,hopelessly,innocently,intensely,interestingly,irritatingly,jovially,joyfully,judgementally,kindly,kookily,lazily,lightly,loosely,loudly,lovingly,loyally,majestically,meaningfully,mechanically,miserably,mockingly,mysteriously,naturally,neatly,nicely,oddly,offensively,officially,partially,peacefully,perfectly,playfully,politely,positively,powerfully,quaintly,quarrelsomely,roughly,rudely,ruthlessly,slowly,swiftly,threateningly,very,violently,wildly,yiedlingly}} {{random:abandoned,abnormal,amusing,ancient,aromatic,average,beautiful,bizarre,classy,clean,cold,colorful,creepy,cute,damaged,dark,defeated,delicate,delightful,dirty,disagreeable,disgusting,drab,dry,dull,empty,enormous,exotic,faded,familiar,fancy,fat,feeble,feminine,festive,flawless,fresh,full,glorious,good,graceful,hard,harsh,healthy,heavy,historical,horrible,important,interesting,juvenile,lacking,lame,large,lavish,lean,less,lethal,lonely,lovely,macabre,magnificient,masculine,mature,messy,mighty,military,modern,extravagant,mundane,mysterious,natural,nondescript,odd,pale,petite,poor,powerful,quaint,rare,reassuring,remarkable,rotten,rough,ruined,rustic,scary,simple,small,smelly,smooth,soft,strong,tranquil,ugly,valuable,warlike,warm,watery,weak,young}}" direction.)
>>
been away for some weeks

gemma status?
>>
>>101626728
Is that the complete line? Leaving an open paren seems wrong somehow
>>
magnum-32b good
>>
>>101626802
Good to know. I've been alternating between nemo and mini-magnum and find I like the latter way more.
>>
>>101626787
If you can't see the closing paren, you're on mobile or something and need to scroll.
>>
>>101626181
AIs that can write complex programs (given a fitness function and multiple tries) already exist, but they are not llms or llm-related.
https://oxsci.org/deepmind-sorting-algorithm-fastest-yet/

As far as llms, here's a demo of what GPT-4 can code (assuming we can trust this lecturer)
https://invidious.materialio.us/watch?v=qbIk7-JPB2c&t=1793
>>
>>101626802
Seems very similar in style and intelligence level to mini-magnum in my testing (which makes it pointless since it's way bigger and slower)
>>
>>101626950
>invidious
youtube
https://www.youtube.com/watch?v=qbIk7-JPB2c&t=1793
>>
What happens if you run a bigger model than what's rated for your RAM? Say I try to run a 32B model at Q_8 on 16gb ram / 8gb vram.
>>
>>101627020
Your computer will use your hard drive as ram (very slow) or the program will crash immediately.
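
Quick napkin math for that example (Q8_0 is roughly 8.5 bits per weight; KV cache and runtime overhead ignored):

# why a 32B model at Q8 spills out of 16GB RAM + 8GB VRAM
params = 32e9
bytes_per_weight = 8.5 / 8             # Q8_0 ~= 8.5 bits per weight
model_gb = params * bytes_per_weight / 1e9
total_gb = 16 + 8
print(f"weights ~{model_gb:.0f} GB vs ~{total_gb} GB of RAM+VRAM -> the rest gets paged from disk")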
>>
Damn, Largestral at IQ3_M is the first model that really feels like commercial tier intelligence running locally.
Just wish I could somehow get more than 1.5 t/s out of it.
>>
>>101627020
offloads to disk
>>
>>101627020
your pc will die (in minecraft)
>>
File: BA_shupo_011.gif (284 KB, 200x200)
284 KB
284 KB GIF
>>101620069 (me)
>>101620112

>3090 is in windows desktop
>There's a link to install steps for windows so check it out.
>spend an hour manually installing cuda shit and trying to troubleshoot when the script doesn't work.
>wsl --install
>pip install TTS

It just works. How is windows this shit?
>>
>>101627068
>>101627071
I've heard that can wear on SSDs, so I imagine it's not something I want to do often, correct?
>>
>>101626964
is the big magnum (72b?) supposed to be good? never really gave it a shot, it's small and old (at least by the standards of this industry), can't be better than opus or sonnet 3.5 right?
>>
>>101627125
flash-based SSDs will get burnt-out over time, yes. but the excruciatingly slow gen speeds should deter you from getting to that point
>>
>>101627160
>good?
meh
writes good coom but it's horny as fuck
>can't be better than opus or sonnet 3.5 right?
correct
>>
Oh, the girl's name is Lily, huh? You don't say.
>>
>>101627160
I didn't like it
I think rp/story tunes of models 70B and bigger tend to suck because tuners try to save money by tuning on top of the instruct instead of the base
and they never train enough to overcome the "feel" of the instruct

with small ones it's better because they can afford to train on the base, and for a long enough time to actually change the model's tendencies
>>
>>101627247
i usually find smaller models too dumb / repetitive across sessions

i guess there's only so much you can fit into 8b parameters vs 70+b
>>
>>101627247
>with small ones it's better because they can afford to train on the base, and for a long enough time to actually change the model's tendencies
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5
>We trained LLaMA 3.1 8B Instruct at 8K context
https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude
>Llama 3.1 8B Instruct trained on 9 000 000 Claude Opus/Sonnet tokens.
https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B
>This model is based on: Meta-Llama-3.1-8B-Instruct
>>
File: she imagines.png (993 KB, 1344x1115)
993 KB
993 KB PNG
I've been enjoying nemo a lot, haven't had any problems besides the model rambling every once in a while, but it happens so sporadically and you can stop the response when it starts rambling so I don't mind.
And then there's picrel where not only did it fuck up the formatting but it went on forever in the most schizo rapid fire of words possible.
I just let her cook.
>>
>>101627070
Is IQ2_M not good enough?
>>
File: 1707711916634305.jpg (25 KB, 488x277)
25 KB
25 KB JPG
>>101625904
it migu
>>
>>101627336
>"_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2/commit/cd1c715
>>
>>101627336
I am not at all surprised to learn that there are a few retarded exceptions
I do not think this proves what I said incorrect in any way. mini magnum and magnum 32B were both trained on base, for example
base tunes of 70B and above are extremely rare because community people can't afford it
>>
>>101627380
>a few retarded exceptions
Sao trained on instruct too; the point is most tunes nowadays are on instruct, not base, and that's been the case since mixtral
>>
>>101627380
>mini magnum and magnum 32B were both trained on base, for example
those are the exception honestly
>>
>>101627392
NTA but if that's true, it needs to change. Explains why the results are usually so shitty and people are increasingly just sticking with the official instruct version of a model instead of bothering with RP tunes.
>>
>>101627349
As someone who can run IQ2_M, it wasn't that impressive to me, maybe equal to 8x22b. So if someone is saying IQ3_M is commercial tier then I'd say IQ2_M is a huge downgrade.
>>
>>101627349
there's a very large jump from Q2 to Q3 with any model, anon
>>
>>101627465
But it's a 123B. The bigger you go, the less quality you lose from quanting.
>>
>>101625819
Please gen a pic of her wearing a micro bikini!!
>>
>>101627348
how does nemo compare to other similar sized models or 70b models? ive heard people praising it but when i try it it's meh, might be my prompt though
>>
>>101627448
Oh, 8x22B at what quant? I use Q4_K_M with Wizard.
>>
Nemo is good but for some reason assumes my identity and writes for me. Maybe my ST settings are wrong?
>>
>>101627465
Q3: kidding myself
Q4: "just as good" as higher quants
Q5: okay now it's really just as good
Q6: visibly better at instruction following, I can probably stop here since Q6 and Q8 are really close on some meaningless chart
Q8: this is basically the same as FP16 right? <-- I am here
>>
>>101627493
>The bigger you go, the less quality you lose from quanting
uhhh NTA but I don't think that's true at all
>>
>>101627523
Direct your post at the guy running Q2, not me
>>
>>101627504
I use q4 for wizard as well, but that's just my initial impression, not a lot of testing. I haven't tried to see if it's as good at code or knowledge tasks which I liked wizard for. It all runs so slow for me so that'll take a while. But for mistral large IQ2_m is the best I can do since it starts at 1.7T/s, and quickly slows to 0.5 if you get to 10k context +. I only have a shitty 8gb gpu.
>>
>>101627527
There was a graph comparing quant sizes of 8B and 70B with MMLU scores, and quants have a much more detrimental effect on 8B than on 70B though. How else would you interpret that?
>>
>>101627496
Nta but I've found nemo to be dogshit unless you work with it and provide inputs of roughly the level of quality you want to see. I've gotten bad outputs by rushing the process, and really good ones when I work on it over time. Even with the same context/AN. But I haven't tried the bigger 70b models so ymmv.
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
336 KB
336 KB PNG
>>101627634
>>
>>101627582
Damn. Wizard is probably a lot faster in your case then. I think I could fit IQ2_M of Largestral so it's not too slow but I also don't want to spend the time downloading it and testing it. I wish we just had good automated benchmarks.
>>
after llama 3.1 and mistral large 2, it really does seem like models from bigger companies are going to be dead for ERP

even 4o and sonnet 3.5, they are all worse than their predecessor model for ERP... i really fucking hope they dont lobotomize opus 3.5 similarly
>>
>>101627493
it's true that the quantardation is more significant with smaller models but it's very much still a thing with bigger ones, especially at the sub-Q4 range
>>
As a relative novice, how can I face swap an image into a video, locally, without uploading to some random site?
>>
>>101627674
largestral is the best thing mistral has ever released for erp, what are you smoking? I even see people in aicg using it
>>
>>101627674
I don't think the word "lobotomize" makes sense in the case of Sonnet since it's clearly extremely smart, the connotations of the word are wrong here. Lobotomies make someone dumber
It was more like a soul-ectomy
>>
i've not been here for about 2 weeks. what's the general consensus for vramlet models? still gemma 27b?
>>
>>101627684
I mean I don't doubt that it's still bad, but like >>101627651 shows, there's a big range in terms of how large the effect is. Clearly extremely low quants like IQ2_M are disastrous on 10B class models, while they look kind of reasonable for 70 (and up probably).
>>
>>101627523
Q8: Tiny model, can't afford to lobotomize.
Q6: Ain't gonna notice the difference.
Q5: Not worried.
Q4: Ouch.
Q3: Fuck, I thought this was the iMat IQ3.
Q2: I just want a taste of what I can't have.
Q1: Can't wait for 1.58 bitnet.
>>
>>101627506
checking "Include Names" can mitigate it a little
>>
>>101627348
>fumiko
>endo endo endo endo endo
>>
>>101627506
Are you using the base model?
>>
>>101627710
As a 24gb vramlet, after testing many alternatives, my current preferred model is WizardLM-2-8x22B.i1-IQ2_S.gguf
It's not great but it's not worse than the alternatives
>>
>>101627698
when i use it (on open router), it's completely ass for erp, repeating the same dialogue just slightly different (moaning for 5 straight dialogues) and just doesn't know what to do in a sex scene
>>
>>101627523
>>101627711
For 70B:
Q8: As good as F16
Q6: As good as F16
Q5: As good as F16
Q4: Good enough
Q3: Good enough
Q2, Q1: Stop, get some help.
>>
>>101627768
Can't find or fit an iMat instead of i1?
>>
>>101627701
Nemo is the new hotness, old man
>>
>>101627768
What settings are good for wizard and what format does it use?
>>
>>101627825
Not him but what's the difference? I've only ever heard of imat before. And when I make my own quants, I don't see any 'i1' options.
>>
File: my settings.jpg (419 KB, 1925x1166)
419 KB
419 KB JPG
I dunno if i'm retarded, or if it's my card or what.

So basically:

>using Command R on Silly Tavern
>made the most generic card just to test the RP
>robot gives response, gets into character but there seems to be no consistency even within the first messages. Robot will refer to me as someone else, imply that my daddy wants to do something to me (lol)

It's a stepmom roleplay, the most vanilla card imaginable just to see how the jailbreaks are on builds (if stepmom shit flags it, fuck that) and also to not be overcomplicated.

But for some reason, all of the chatbots suck, no matter if it's Command R, Nemo or Gemma 27B.

I have no idea what i'm doing wrong. Please gimmie some tips lads, to be fair I am totally new to Silly Tavern so I know the issue is literally a "skill issue", just need some pointers.
>>
>>101627859
welcome to /lmg/, i hope you have a nice stay
>>
>>101627859
If you look at the final prompt that the backend receives, does it look right?
>>
>>101627824
I think the drop off starts at Q5, not Q4
I can definitely feel the difference between Q5 and Q6, but Q6 and anything above that not really
>>
>>101627882
whaddya mean? you mean the commands showing on kobold? Looks all right to me
>>
>>101627824
For coding tasks, anything under Q5 or under 70B has let me down. But for creative writing, wrongness can be beneficial, depending on where it hallucinates.

>>101627851
iMatrix does extra work to make low-bit quants better. i1 is similar but it's a one bit system so it's smaller than iMatrix but it can get sketchy. That said, I've got an i1 in my go-tos but it's Q5_K_S. If you're quanting down to Q2, I'm curious if iMatrix would be significantly larger (which may be prohibitive) or if the quality between the two is comparable or significantly different. But just reading the file name, i1 and Q2 sounds exciting.
>>
>>101627895
I mean that Silly will take the chat history, the character card, examples, etc, and format it all based on the Context Template and Instruct Mode Preset.
Looking at the final, formatted text that gets sent to the backend (koboldcpp in your case I guess) can help you find out what's wrong.
Also, try neutralized samplers, although I don't see anything too weird in your sampler settings.
>>
>>101627859
Try putting some examples in, or edit the first few messages to be how you like and see if it continues to fuck up, that way you'll know if something is truly weird, or if it was just unsure what to do and needed more input.
>>
>>101627763
No, instruct.
>>101627721
I was under the impression that it made it worse.
>>
>>101626763
To me it is dead. It was a shitty model and not just a bugged loader. Nemo on the other hand is retarded but the most fun I've had with a model in a while.
>>
>>101628135
Is nemo better than command-r? That's the best one under 70b I think.
>>
File: file.png (242 KB, 410x482)
242 KB
242 KB PNG
>>101628233
Nemo is the personification of pic related. It feels cuter to me than command-r. And I am a huge command-r fan.
>>
>>101624099
Add a pause command at the bottom of the bat file and it'll "Press any key to continue" before exiting so you can see the whole error log.
>>
>>101628250
Last time I tried it the first few replies all started with the same thing, and it was hard to get it to stop doing that without making it even dumber.
>>
>>101627831
Settings don't matter
>>
>>101628379
Then it's kinda useless if it's just gonna start every message with the same 2 or 3 things.
>>
>>101628339
Yes it is retarded and it autistically picks up on unwanted patterns. It is hard to tardwrangle but it gives some next tier cooming. When it doesn't repeat itself and isn't retarded it can sometimes write like 800 tokens that are absolutely perfect and don't need any editing. Compared to command-r, at least: with that one I always had to heavily edit all the outputs.
>>
>>101628398
>>101628398
>>101628398
>>
>>101628413
Well, after the first sentence it seems good. I guess I could just ignore the first bit and just accept that it's always the same like some retarded quirk like you say.
>>
>>101628388
click "neutralize samplers" in sillytavern
play with the temp a little if you want, but 1 is fine
0 is smarter
1 is more random
2 is schizo
you can do a min p if you want, I usually do .05, but again it shouldn't matter
>>
>>101627824
Q5 is noticeable compared to Q6, but just barely.
>>
File: check this sip.jpg (69 KB, 828x987)
69 KB
69 KB JPG
What's the most powerful local AI that a 4090 + 32GB RAM can run, objectively speaking?
>>
>>101629433
command R non plus



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.