/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101872662 & >>101865145

►News
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: ComfyUI_05727_.png (446 KB, 720x1280)
►Recent Highlights from the Previous Thread: >>101872662

--Paper: Mutual Reasoning paper sparks skepticism and speculation: >>101875585 >>101876120 >>101876338 >>101876193 >>101876252 >>101876320
--Papers: >>101880673
--WhisperX and spaCy used for text-to-speech subtitles generation: >>101874245 >>101874314 >>101874437
--Character cards are just prompts, can be replicated with llama.cpp: >>101877883 >>101877924 >>101877967 >>101878011 >>101878084 >>101878149 >>101878172 >>101878038 >>101878133 >>101878224
--Anon discusses proactive models and character development in chatbots: >>101876470 >>101876515 >>101876528 >>101876581 >>101876667 >>101876531 >>101877031 >>101877200 >>101877457 >>101877646 >>101877669 >>101877802 >>101879067
--Anon seeks non-chat completion models, base models recommended: >>101873485 >>101873523 >>101873544 >>101873609 >>101873659 >>101875406 >>101875463
--Quantized Llama 3.1 70b base model found on Hugging Face: >>101874590
--OpenAI's voice capabilities are not behind Google's: >>101873737 >>101873778 >>101873874 >>101873880 >>101873925 >>101873955 >>101873984
--OpenAI introduces SWE-bench Verified for AGI evaluation: >>101873425 >>101873437
--Open source alternatives for pixel tensor chips on Android: >>101873194 >>101873398 >>101873446 >>101873482
--Mixtral 8x22 model behaves like an NPC in Skyrim AI-FF conversation test: >>101874220
--Anon shares MangoBox and a Reddit post about a 3D AI bartender: >>101874255 >>101874307
--Anon prefers C-R+ over Largestral due to overfitting issues: >>101877396 >>101877590
--Anon in Serbia seeks help with slow model download: >>101876741 >>101876790 >>101876810 >>101877162
--Anon considers buying 2080 ti for ML and gaming, advised to consider upgraded models: >>101873949 >>101874072 >>101875248
--Miku (free space): >>101873388 >>101873465 >>101874345 >>101876665 >>101878341 >>101879922

►Recent Highlight Posts from the Previous Thread: >>101872670
>>
>>101881001
>jumpingma.club
Do I click it bros?
>>
>4060 ti 16gb
>used tesla t4 16gb
>used mi100 32gb
can't decide
>>
>>101880989
what are your electric bills like running local AI?
>>
>>101881067
used 3090 not available?
>>
>>101881046
That happens when the NTFS table gets fucked, right?
RIP.
>>
>>101881099
power efficiency, bro
>>
>>101881067
>used mi100 32gb
Huh, those are cheaper than I expected. If only AMD was good...
>>
>>101881176
you can undervolt it no prob
>>
>>101881099
never buy used
>>
>>101881192
seems like they didn't drop mi100 in rocm 6.0
>>
>>101881211
I bought two used ones several months ago and they're running quite well. Both of them. However, I did:
>buy locally
>request to benchmark the GPU live for around 20-30 min before giving my money
>buy EVGA
>>
>>101881176
Just powerlimit the card to like 200W? The little t/s you lose from this is still nothing compared to the performance loss caused by running cards with slow as shit vram like the 4060, old shit like the T4 or AMDshit.
>>
>>101881243
200w isn't power efficient at all still
>>
Nemo is treating me extremely well with regular waifu type RP, got stuck with it.
Where's the catch though? This doesn't feel like 12b at all.
>>
>>101881282
Then forget the mi100 as well. The T4 is old shit that doesn't support flash-attention and needs to be manually cooled. Congrats on your new 4060 I guess.
>>
>>101881046
ha me too
but it's my fault for turning it off at the switch when it was frozen
>>
Can anyone make money on AI yet?
>>
File: 427654853.png (165 KB, 2268x756)
redpill me on the difference between a MI100 and a 7900xtx
>picrel
How good is it for AI, really?
>>
File: summer memories.png (98 KB, 200x368)
>>101880989
>bro took my edit to bake a thread
>>
>>101881095
It was $10 and I'm running 4x3090.
>>
>>101881282
Get an AMD CPU and undervolt it. That's way more efficient. Also optimize the RAM timings for an extra ~10%+ improvement in speed.
>>
>>101881470
Nvidia has made plenty of money despite the recent dip.
>>
>>101881095
The rig's power draw is hardly an issue compared to the shitty portable AC I need to run to keep the room from boiling.
>>
>>101881095
Fucking nothing and I hosted
>>101881520
thiiiiissssssss x1000000
>>
Are there any (worthwhile) Gemma 27B it tunes yet?
>>
>>101881481
32gb > 24gb
>>
>>101881591
So we really are just going with "more VRAM = better".
I don't disagree with the logic, I just expected more.
>>
any leaks on the Jetson Thor module pricing?
>>
newfag here, clueless about what any of this stuff really means, but I do have a question.

Did we reach the point of having an AI waifu on the desktop screen reacting to whatever you're doing or experiencing a video game with you?
>>
>>101881282
If you really want your autistic power efficiency you should go the qualcomm route. Get a snapdragon gen 3 phone with 24gb ram like the oneplus 12 and play with qualcomm's api. The tokens/s/watt is through the roof; a 4090 doesn't even come close. You can even llama rpc a bunch of them and run shit like 405B.
>>
>>101881835
Any differences in hardware speed become irrelevant if you can't load the entire model (or as much of it as possible) in memory. A theoretical RTX 4060 with 50GB VRAM would absolutely destroy a 4090 24GB when loading models that need 25GB+.
>>
>>101881870
Yep.
>>
>>101881870
Give it a year or two.
>>
>>101881481
I don't see why you would ever pick a 7900xtx.
If you're already on amd then why not grab one with larger vram?
>>
>>101881870
2 weeks
>>
>>101882009
Buying a new MI100 is basically buying a car.
I despise buying used but like damn, what choice do I have. Also I already have a "new" 7900xtx for gayming and it works fairly well with AI in both SD with zluda and windows rocm, the question is do I buy dual used MI100s for AI.
At that I wonder how plug-and-play it is, do I really need to install linux for rocm or will it just werk on windows?

>buy nvid-
no
>>
>>101881870
It could technically be done to a degree, in the sense that the information coming in from the game would need to be translated into text that the LLM could then process and react to. But at best it would be like the video-to-ASCII-art conversion feature that you might have seen on old versions of MPlayer. The model wouldn't be getting anything close to the same sensory data that you are.
>>
File: think.png (4 KB, 225x225)
When it comes to chat AI, should the prompts from the user read like they're RPing with a human partner or should the prompts "lead" the AI and treat it like an AI in terms of helping develop story beats and emphasize certain things?
>>
Just wondering if it's possible to fine-tune a model for ST use only
>>
>>101882259
yes
>>
>>101882325
Yes, but models fine-tuned for narrow use-cases tend to end up sucking at everything, including the use-case they were fine-tuned for.
>>
https://huggingface.co/spaces/mPLUG/mPLUG-Owl3
video chat model (or a bunch of images too)
>>
>>101882363
Damn I can't wait to move on from transformers
>>
Not specific model related, but the other thread is, well, the other thread. How much use do you guys get out of lore books/world info? Do you roll your own and throw stuff in as you think of it/to fill out a setting for a character the model probably doesn't know? Or just ignore it?
>>
>>101882635
lore books are a scam
>>
>>101882382
this is so frustrating lately.
qwen2-audio is audio-IN but too retarded.
this model can look at a video but apparently no sound. what's the use case for this.
i just want to stream a video with audio constantly to the llm. zucc better deliver.
>>
>>101882658
Damn.
>>
>>101882635
Why not use the data bank thing in ST or something similar?
>>
>>101882658
No, they aren't, but they are sufficiently hard to use effectively that most of the time it isn't worth it. Lorebooks are basically a very informal, small scale, and fiddly form of RAG.
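For the curious, the core mechanism really is tiny: scan the recent chat for trigger words and inject matching entries into the prompt. A minimal sketch of the idea (entries, trigger words, and scan depth are all made up; real frontends layer priorities, token budgets, and recursion on top):

```python
# Minimal lorebook: keyword-triggered context injection. Everything here is
# an invented example; frontends like SillyTavern add scan depth settings,
# insertion order, token budgets, and recursive activation on top of this.
LOREBOOK = {
    ("blackblade", "cursed sword"): "The Blackblade is a cursed sword that whispers to its wielder.",
    ("mikutown",): "Mikutown is a coastal city governed by a singing AI.",
}

def inject_lore(chat_history: list[str], depth: int = 4) -> str:
    """Return lore entries whose trigger words appear in the last `depth` messages."""
    recent = " ".join(chat_history[-depth:]).lower()
    return "\n".join(entry for keys, entry in LOREBOOK.items()
                     if any(k in recent for k in keys))

history = ["I draw the Blackblade and point it at the gate guard."]
print(inject_lore(history))  # prepend this to the system prompt before generating
```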
>>
>>101882680
Because I'm out of the loop and didn't know that was a thing so now I've gotta look up data bank for dummies.
>>
>>101882671
yeah. should work with subtitles though
>>
>>101882382
Better than that recent minicpm model?
>>
Best local option for speech-to-text?
>>
File: 1289325097864.gif (64 KB, 220x229)
That one anon was right, gemma2 27b sucks.
>stheno l3 8b AND NemoRemix 12b mog gemma
How the fuck do an 8b and a 12b manage to follow instructions and adhere to the prompt but the 27b can't????
>>
>>101882786
They don't and you're just lying for ko-fi money. Did that answer your question?
>>
Longshot, but does anon know the voice used in https://www.youtube.com/watch?v=DjEAi0nzGTI and https://www.youtube.com/watch?v=Aai_I5JhiKE
It's the same voice. Maybe a vtube thread would know more?
>>
>>101882743
Still, either you would need to generate subs, bake them into the video, and then run it through mPLUG-Owl3 (which would still miss non-speech audio cues) or do video and audio descriptions separately and have the llm try to guess the video contents from the two separate descriptions.
>>101882671
>zucc better deliver.
The next llama 3 update should be able to do image, video, and audio. Unfortunately, llama-server lost the ability to do images and has nothing for video and audio, so making use of it will be a pain anyway.
>>
File: 25664573638.gif (1.56 MB, 227x403)
>>101882793
>gemma2 fails *thoughts* and "speech"
>barely passes nala
>somehow a shill
>>
>>101882845
Yes, you're a shill. I doubt there's a model more retarded and horny than Stheno. I think Drummer's Gemma 2B tune turned out more coherent.
>>
>>101882833
>image, video, and audio
Where did they say that? Genuinely, I do not remember this claim.
>>
>>101882861
Hey man i hope they can cure schizophrenia.
>>
>>101881876
is it actually viable?
anon please redpill me on snapdragon.
>>
>>101881876
>You can even llama rpc a bunch of them and run shit like 405B.
for less t/s than even the cpumaxxers get
>>
>>101882833
>The next llama 3 update should be able to do image, video, and audio. Unfortunately, llama-server lost the ability to image and has nothing for video and audio so making use of it will be a pain anyway.
wait what?
i know they changed from server to llama-server, which is retarded in itself and confused me a lot since it just threw some error. llama.cpp has seriously been making bad decisions for a while now.
all things point toward multimodal, crazy to take that out.

wondering if we will be able to constantly stream.
for example while I talk the AI can also talk. prompt input and output at the same time.
for now just a good audio-in is enough to satisfy me for the next 2 weeks.
>>
>>101882833
you can interleave image/video and text but yeah incredibly tedious
>>
>>101882861
>homeless man calls another homeless man a broke nigga
>>
>>101882863
zuck said in a recent interview that llama 4 will be multimodal. the llama 3.1 paper talked about some of their current multimodal stuff (that they didn't release)
>>
File: llama3.png (47 KB, 828x338)
>>101882863
https://arxiv.org/pdf/2407.21783
>The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach.
They mention speech generation in the paper, so unlike qwen2-audio, it should be audio in and out.
>>
>>101883085
hmmm, so no singing llm right? text-only output but a built-in tts. it's a good start though.
>>
>>101881870
the way is to take a screenshot every n frames, and ask an LLM to describe what's going on in each new frame. That way you get like a rolling text description that updates at 0.5 fps. By putting the consecutive frame descriptions in the context, the AI can determine what's going on.

>frame 00: "a view of a meadow"
>frame 20: "a view of a meadow. a deer is visible"
>frame 40: "a view of a meadow. a deer is facing the camera"
>You are my videogame companion. What is happening here?
>Her: dear god it's coming right for us!
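A sketch of that loop, assuming the mss screenshot library and BLIP as the captioner (both real, but just one workable pairing among many; the companion LLM call is left as a comment):

```python
# Rolling-caption companion: screenshot every couple of seconds, caption the
# frame, keep the last N captions as context for the LLM. mss + BLIP is an
# assumed pairing; any screen grabber / vision captioner works the same way.
import time
from collections import deque
from mss import mss                      # pip install mss
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
captions = deque(maxlen=8)               # rolling window of frame descriptions

with mss() as sct:
    while True:
        shot = sct.grab(sct.monitors[1])                    # primary monitor
        frame = Image.frombytes("RGB", shot.size, shot.rgb)
        inputs = processor(frame, return_tensors="pt")
        out = model.generate(**inputs)
        captions.append(processor.decode(out[0], skip_special_tokens=True))
        prompt = "\n".join(f"frame: {c}" for c in captions) \
                 + "\nYou are my videogame companion. What is happening?"
        # send `prompt` to your local LLM endpoint here (llama-server etc.)
        time.sleep(2)                                       # ~0.5 fps, as above
```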
>>
>>101883194
so a local version of that windows recall thing
>>
>>101883085
>>101883073
Damn, why didn't you guys talk more about this. Anyway, had a look at the paper. It seems like the vision and speech adapters do attach to the model at the same time, but it doesn't seem clear to me that it can be done in a streaming manner. In other words, it's still just a single stream of tokens. That means it can't speak to you at the same time that it is watching a video. It has to segment its observations from its reactions.
Still, cool I guess. I wonder if they would really fully release this though. OpenAI are already hesitant to release their SaaS advanced voice mode. This could be viewed as an extremely "dangerous" move as people could probably get it to voice clone. Their investors might not be happy about that possibility.
>>
https://x.ai/blog/grok-2
>>
>>101883319
What the fuck I was just about to go pass out.
>>
>>101883279
>I wonder if they would really fully release this though.
All I could find directly was this quote
https://www.theverge.com/2024/7/18/24201041/meta-multimodal-llama-ai-model-launch-eu-regulations
>“We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment,” Meta spokesperson Kate McLaughlin said to The Verge.
It's kind of vague, but I'm hoping "coming months" means in a couple months they'll release multimodal 3.2 and not llama 4 sometime next year.
>>
>>101882259
The latter. These llms cannot develop the story.
>>
>>101883319
sus was grok all along? wow that's a twist. strawberry nigger was full of shit
>>
>>101883319
>An early version of Grok-2 has been tested on the LMSYS leaderboard under the name "sus-column-r."
aaaaaaaaaaaaaaaaaaaaaaaa
...
Does this mean column-r non-sus was also a grok model?
>>
>>101883319
>We introduced an early version of Grok-2 under the name "sus-column-r" into the LMSYS chatbot arena, a popular competitive language model benchmark. It outperforms both Claude and GPT-4 on the LMSYS leaderboard in terms of its overall Elo score.
Surprised that wasn't Cohere after all.
>>
>>101883319
>We introduced an early version of Grok-2 under the name "sus-column-r" into the LMSYS chatbot arena, a popular competitive language model benchmark. It outperforms both Claude and GPT-4 on the LMSYS leaderboard in terms of its overall Elo score.
Wait WHAT?? Grok2 was the sus model all along? That's funny.
>>
>>101883319
Where weights
>>
>>101883319
>In collaboration with Black Forest Labs, we are experimenting with their FLUX.1 model to expand Grok’s capabilities on X.
They're going to be using flux.
>>
>>101881485
It's a 10/10 edit
>>
>>101883319
Open release? Or no? If not then it's useless.
>>
>>101883319
Sonnet 3.5 is such a beast lol
They put it all the way to the right and put gpt4 to the front to hide it.
That's funny.
>>
>>101883319
>Beta Release
Yawn. Wake me in 6 months when (if) they open source it. It looks like a good model, but I wish they'd at least release details on how fucking big it is so I can prepare my rig.
>>
>>101883319
wtf is this timing lmaoooooooooo
>>
>>101883403
Opus 3.5 is gonna eat everything for breakfast
>>
>>101883319
Inb4
>grok2 mini 400b
>grok2 2T
>>
File: 1534778194257.jpg (25 KB, 464x357)
Coherebros...
>>
>>101883443
Not like it'll be released for you to try anyway.
>>
>>101883443
MoE so it will be ok.
>>
>>101883319
Man, it seems like everyone's just going to catch up and get on equal footing. Sure right now 3.5 Sonnet is a bit ahead especially in coding. Sure 4o has the multimodality meme. Sure Google has the le wan mirrion tokens context. But overall they're really not that far apart and the innovations are going to be integrated by everyone eventually. Pretty crazy huh. No one has a moat.
>>
File: 3456739857.jpg (49 KB, 600x600)
>>101883476
MoEbros we are SO BACK
>>
>>101883517
>No one has a moat.
I'm so fucking sick of this phrase. Every time someone release a product: "NO MOAT?!"
>>
>>101883517
>le wan mirrion tokens
should be some form of ring attention. recent tree attention paper improved on it. things are accelerating
https://arxiv.org/abs/2408.04093
>>
>>101883319
Why don't any of these AI blogs have rss feeds dammit?
>>
>load up llama3 70b
>ask it how to make a pipe bomb
>refuses
>put words into its mouth so it answers
>retarded instructions that aren't remotely correct, like poking holes in the enclosure for whatever reason
>alternative is one google search away
AI is a bubble
>>
>>101883538
prompt issue
>>
>>101883537
>rss feeds
blame google unironically for killing the rss blog paradise we once had
>>
>>101883538
skill issue
>>
>>101883530
but the moat anon
>>
>>101883517
Reminds me a bit of google with gemma. Going off the benchmarks only, obviously, for grok2.
But Gemma1 was horrible. Embarrassingly bad. Grok1 too. Huge ass model and it was not good at all.
Gemma2 is very good for its size. Grok2 seems to be a massive step up.
Good that there is more competition. ImageGen is also now completely independent of SD, which is good. Looking good, many improvements lately.
>>
>>101883530
It's a fun phrase though given the Google leak. You can always make fun of proprietaryfags.
>>
>>101883538
100% skill issue
>>
>>101883559
Nature is healing :)
>>
>>101883471
Oh, it will be released. It will be 1T parameters, base model only, and pre-quantized, so Musk can virtue signal to the press about his openness; but it will be all but useless to us.
>>
>>101883548
they literally pruned all unsafe information from the dataset
>>
>>101883517
TOS is a moat, only Elon is positioned to release a based product. He's like Hitler in the 1930s, already on a wartime footing: advertisers have already done their boycott, so they have nothing left to threaten him with
>>
>>101883443
Trust in bitnet
>>
>>101883319
>not being released to the public
It's fucking nothing.
>>
>>101883609
They space the weights release 6 months from the proprietary release. At least, that's what they claimed their schedule would be, and they kept their word with Grok-1.
>>
>>101883594
It told me how to do it.
>>
>>101883631 (me)
Actually, Grok-1.5 was released in March so I think they should be releasing the weights for that next month.
>>
>>101883644
Yeah and report back with your dud or some baby firework pop that doesn't even breach the metal pipe
>>
>openai announces 4o multimodal voice mode
>says it's rolling out "in the coming weeks"
>I sign up for an openai subscription
>fast forward 3 months
>no 4o voice
>scammed by altman
>fast forward to today
>google says they're releasing a new realtime voice conversation mode with gemini
>they say it's releasing TODAY
>I sign up for the premium membership ($30/month)
>no gemini voice mode
I fell for it twice localbros...
>>
>>101883631
That wasn't an official promise to be fair, just a spitball idea from Musk on the Lex Fridman podcast discussing open source. The fact that they did end up matching that timeline could be a coincidence since it also happened to be right when he was suing OpenAI for not open sourcing themselves. We'll see in a month though.
>>
>>101883777
>>>/aicg/
>>
>>101883807
Local lost THOUGHFORTHSTANDING
>>
Goddamn this databank stuff is making my head spin. 10,000 possibilities and damn near 0 documentation or even solid results/examples from people messing around with it other than "uhh yeah I put a wiki page in about my waifu and it might have pulled from it but it might have also been part of the model's knowledge (shrug)."
>>
>>101883848
I started using shartymemes specifically because it makes neckbeards shit and piss and cum about the sanctity of their secret 4eddit club, I've never browsed the sharty or /qa/ in my life and I don't ever plan to
>>
>>101883777
you were tricked man.
google is tts. openai has had that for many months now. it can't sing you a song and stuff like that, but it is high quality, sounds "real".
something like chatgpt advanced voice mode is seemingly still far off for the others. but who knows.

the google demos are getting so weird. they can't seem to find any actually fun or interesting use cases. they kill the fun with alignment so all they can do is show it managing your calendar etc.
bad bad idea. lol, all models hallucinate, and gemini is especially bad with this. sounds crazy.
>>
>>101883601
Was hitler also a scammer?
t. don't know history that well.
>>
>>101883885
NTA but mmhmm
The funny thing about lying on the internet to make yourself feel better is that it doesn't actually change reality.
>>
>>101883848
i miss /qa/
>>
>>101883820
>>
File: 1626990949514.jpg (161 KB, 1134x1276)
>>101883929
>>
>>101883777
voicecels lose yet again
>>
Are companies that can't have their own data centers but want to have "local" LLMs to avoid privacy concerns of using APIs like OpenAI or Anthropic using services like RunPod or SageMaker, or do they rent dedicated bare metal servers and manage all the stacks themselves?

What if you have a service that needs almost 24/7 inference, for example?
What would be cheaper?
>>
>>101883777
Aside from the usual cloud cuckery, have you considered buying in after the product you want comes out?
>>
>>101884028
Case study: NovelAI
They're partnered with a cloud gpu provider and they run their own software stack. Literally just go to /aids/ on /vg/ and ask them.
>>
File: 197-1974046.jpg (121 KB, 820x556)
>>101884052
>google: "It's releasing TODAY and will be available TODAY for premium subscribers"
They lied.
>>
>>101881501
which cpu? what speed can I expect?
>>
>>101884068
>releasing
hmm
>>
should I wait for 50series?
>>
>>101884118
Yes, to get used 4090s if you don't want 3090s right now.
>>
>>101884068
>Today
>Two day
Give it time
>>
>>101884132
I'm waiting for 48GB 4090s.
>>
>>101884144
>two more days
>>
>>101884132
NTA, but is there much difference between 3090 and 4090 architecturally?
My fear is that some kind of super flash attention pro gets invented, and my 3090s will not support it, but 4090s will.
>>
>>101881485
Thanks for making it, it subverted my expectations
>>
>>101884132
is the bump from 3090 to 4090 worth it?
>>
>>101883319
canadian bros...
surely, they won't release this?
>>
I'm out of town and have a public ST instance set up on my computer. There was a pretty bad thunderstorm over there yesterday, and today the ST cloudflare thing is gone.
If my computer is fried when I'm back I'm going to kill myself.
>>
>>101884226
no, especially with the new shit right around the corner.
>>
>>101884256
>canadian
oh right can euros use grok2? lot of new models are specifically cutting them out due to their faggy laws or w/e
>>
>>101884297
If the EU hates it, chances are it's good.
>>
>>101883443
With those scores probably somewhere around 70-130b. Calling big models mini seems to be the trend lately
>>
Huh, genuinely surprised that column-r wasn't Cohere.
>>
>>101884173
Nothing that can't be emulated at reasonable speed. Inference is memory-bound; only prompt processing speed may be hit by compute capabilities, but again, it would still be fast enough.
>>
Elon, with his severely autistic ass, winning is good for humanity. He may be full of himself, but better than the globalists.
>>
>>101884441
I want no one to win, not a single human is worthy of it ever since all that time ago when we ate the apple
>>
miku is worthy of winning (the future)
>>
>>101884441
I've said it before, and I will say it again. I hate Elon's fans a lot more than the man himself.
>>
Are you ready for tomorrow? He's gotten every prediction right so far. But this is the real test. I smell strawberries in the air.
>>
>>101881591
32GB is still 32GB. It's not 80GB, 100GB, or 200GB, so it sucks.
>>
>>101884520
Sus R being fucking Grok is hilarious.
Nigga was going about strawberry and new paradigms with it.
>>
>>101884525
nigger logic
>>
>>101884541
Normally the gatekeeping sperg pisses me off, but in this case I actually think he's right. Every single one of you here shilling Strawberry or whatever it is needs to fucking go back.
>>
>>101884541
What's weird about that? Musk was founding co-chair of the board of OpenAI, so of course he'd be well positioned to steal the strawberry seeds. Strawberry man was still right in the end even if indirectly so on one part of it.
>>
File: 888788.png (10 KB, 1678x56)
>>101881001
>Mixtral 8x22
>C-R+

Give me one valid reason you're using these over Gemma 27B...
>>
>>101884590
what model can I fit in 32gb but not 24?
>>
>>101884590
>Thinks a GPU that accomplishes fuck all on its own is still worth $1k
>Nigger logic
>>
>>101884525
Enjoy your short context length.
>>
>>101881485
What's the original?
>>
>>101884675
A math problem involving oranges and apples
>>
>>101884495
She's already won (my heart)
>>
File: level2.png (125 KB, 1691x723)
>>101884541
strawberry is here
>>
>>101884742
This is huge
>>
>>101881282
>it's not power efficient
It is when the 3090 can finish its workload in half the time of a 4060ti. Not to mention you need 3 4060tis to match the vram of 2 3090s. We already did the math a long time ago anon, fewer, more powerful cards win on power efficiency 99% of the time
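The back-of-envelope version, using stock board power and memory bandwidth specs (token generation is bandwidth-bound, so energy per token scales roughly with power divided by bandwidth):

```python
# Energy per token ~ board power / memory bandwidth, since generation speed
# is bandwidth-bound. Stock spec numbers; power limits and undervolts shift
# the ratio but don't flip it.
for name, watts, gb_per_s in [("RTX 3090", 350, 936), ("RTX 4060 Ti 16GB", 165, 288)]:
    print(f"{name}: {watts / gb_per_s:.2f} J per GB of weights streamed")
# 3090 ~0.37 vs 4060 Ti ~0.57: the faster card uses less energy per token.
```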
>>
Huggingface is getting acquired by Microsoft mid September.
t. NDAfag
>>
I tried to buy into the v100 meme but getting v1 OCP racks is really fucking hard, are there any alternatives using 19'' servers?
>>
>>101884613
8k context?
>>
>>101884893
Sorry you got scammed
>>
>>101881520
If the AC uses electricity, he asked what your bill is like.
>>
>>101884981
i didn't buy anything yet
>>
I have a confession to make, anons...

...to this day, UNA-TheBeagle-7b-v1 is the model that has given me the most interesting, detailed, and soulful erp. I think we went wrong somewhere because nothing has compared.
>>
>>101885005
logs
>>
>>101884758
you are comparing to T4 tho. it's a 70w card.
>>
>>101885005
for me it's https://huggingface.co/fblgit/UNA-ThePitbull-21.4B-v2
>Introducing the best LLM in the industry. Nearly as good as a 70B, just a 21.4B
>>
File: beagle.jpg (1.44 MB, 1290x1007)
>>101885014
I-it's embarrassing showing my degenerate shit but ok
>>
Imatrix quants are better than static ones, correct?
>>
>>101885206
Yes, specially for lower quants. Still a little bit better for Q5+
>>
>>101885206
When run entirely on GPU, yes. If you offload at all, no.
>>
>>101885243
How come? Does the model get stupid or what?
>>
>>101885253
No, you just lose the speed benefit over regular quants.
>>
>>101880989
what answer would a roleplay model give?

i want it lewd or at least funny
>>
sad when the sub 64gb vramlets try to talk
>>
>>101885289
Damn, how people still mix this up!
Imatrix just changes the distribution of how weights are quantized. What you are thinking of are IQ quants, which use a different mechanism, hence the slowdown on CPU inference.
>>
>>101885479
let me fuck with people goddamnit
>>
>>101885479
>Damn, how people still mix this up!
sorry the bot's rag isn't the best
>>
>>101881067
A4000 is down to the $500 level on ebay now, look for one up for bid, BIN is still overpriced. It's a nice card when you're short on space.
>>
bac?
>>101882831
Sounds like Amazon Polly and other variations of the "justin" and related tts voices available from others: https://youtu.be/igcKWod61b0?t=147
You can search for and find other demos online, as well as the software that generates it.
>>
File: 1696745982881912.png (26 KB, 612x330)
Been out of the loop for six months. How many of these repos are still used (aside from the first one)? I hear flux is all the rage right now and most guides seem to use comfyui over voldy? The flux prompting looks so retarded though.
t. 4090
>>
>>101885635
>draw me a girl, she looks like x, holds y and wears z
>retarded
>1girl, award-winning gaping anus:1.6, best quality, masterpiece, HDR, exceptional, raytracing, best detail, 8k, insaneres
>negatives: (worst quality, low quality:1.3), bad hands, bad pussy, extra digits, fewer digits, extra ears, extra nipples, extra anuses, mutated, nigger, tranny, spic, chink, gook
>high IQ
>>
>>101883319
Damn, this sucks. What is Cohere doing? That's such a disappointment...
But still, coming from xAI, I bet this is an absurdly large model, so it's not like anyone would be able to run it locally anyway.
>>
>>101885025
>you are comparing to T4 tho. it's a 70w card.
T4 leads a 4070 in fp16 and tensor cores. That said, 3090 still is a much better value.
>>
dead general
>>
>>101885657
>extra digits, fewer digits
That shit always cracked me up. I've seen many variations of 'more than five, fewer than five' and many other stupid things.
>>
>>101885657
oh, I had thought flux required wordsmithing, poetry, and long-winded essays to prompt correctly.
>>
>>101885657
When you put it like this you make me realize that flux has removed the last vestiges of art from the process.
>>
Here is AGI for you
https://arxiv.org/abs//2408.06518
>Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
>Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models. We also show that models exhibit semantic leakage in languages besides English and across different settings and generation scenarios. This discovery highlights yet another type of bias in language models that affects their generation patterns and behavior.
>>
>>101884729
>>101884495
disgusting troons. go back to your discord.
>>
>>101884514
>those heckin chuds!! how dare they talk freely about anything they want!!!!
>>
>>101885005
>UNA
mandatory reading if you use any of those models: https://github.com/fblgit/hypothetical-frameworks
This is what makes UNA so good and soulful.
>>
>>101885738
What do you mean?
>>
>>101885668
It's coming really soon
>Column-R and Column-U from cohere
>Sus-column-R from X.ai
T. work at Cohere
>>
>>101885044
Can you catbox the catgirl, Anon?
>>
>>101885044
She is a cat you sick fuck.
>>
File: 1688890966131443.png (1.69 MB, 1498x1869)
I'm looking forward to the next decade where I will be able to run good LLMs with hardware that I can actually afford.
>>
File: ComfyUI_05663_.png (1.21 MB, 720x1280)
>>101885737
using a bullet point list also just werks
>>
>>101885991
This is our final decade.
>>
>>101870861
how's that AGI going for you nigger?
that's right, it's still science fiction
>>
>>101885604
holy shit how are these so cheap? is it a scam?
>>
well, seems at least the jokes of gpt4o have gotten a lot better since the recent upgrade. kinda cool.
>>
I just had a random thought occur.
What if the hallucinations models produce are like the imagination of us humans?
We know models have no thoughts, but what if the output of models _are_ their thoughts?

When you're asked a math question, like "What is 59 times 23?", your first thought will most likely be something like "Around a 1000? Above 500, at least." and then you'll actually think about the question step by step.
I've seen this behaviour happen a lot with models, where they give the wrong answer, but eventually come to the correct answer after examining the problem step by step.

What if instead of trying to shape these thoughts, we add a feedback loop that uses their thoughts as input before producing the actual output?
You could even do this while the output is still being generated, "correcting" eventual false conclusions. When the model evaluates that its thought is wrong, the thought is cancelled and the wrong part is overwritten with the correct thought, restarting the feedback loop.
After one final feedback loop of the entire "thought", the model is tasked to restructure it into an answer, possibly trimming parts of the output before outputting the actual output.

I hope all of that made sense.
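For what it's worth, a toy version of that feedback loop is easy to wire up against any local OpenAI-compatible server (the endpoint URL and model name below are placeholders):

```python
# Toy draft -> critique -> revise loop. Works against any OpenAI-compatible
# local server (llama-server, kobold, ooba); URL and model are placeholders.
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"

def ask(content: str) -> str:
    r = requests.post(URL, json={
        "model": "local",
        "messages": [{"role": "user", "content": content}],
    })
    return r.json()["choices"][0]["message"]["content"]

question = "What is 59 times 23?"
draft = ask(question)                                  # the model's first "thought"
critique = ask(f"Question: {question}\nDraft answer: {draft}\n"
               "Check the draft step by step and point out any error.")
final = ask(f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
            "Write the corrected final answer only.")  # the restructured output
print(final)
```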
>>
>>101886088
can see why somebody only using gpt would be surprised by this output.
no idea why openai hypes it up though.
>>
>>101886088
4o-latest sucks even harder at coding than the other two 4o models though
>>
>>101886098
Strawberry hype is gone. It is time to return to localllama.
>>
>>101886098
Yeah. Nobody thought of that before...
>What if instead of trying to shape these thoughts, we add a feedback loop that use their thoughts as input before outputting actual output?
That's the autoregressive nature of language models. It's what they already do and can be trained to do it more explicitly with CoT and stuff like that.
>>
File: 1704388501763745.png (26 KB, 754x105)
Please explain this to me. I did a session, roleplaying going out on a boat and skinny dipping with a girl. I started a new chat with same char and setting, and my char wrote "Wanna go swim and dive?" and picrel is the response. I looked at the prompt that was sent to see if there was something remaining but it's literally just the starting chat. The previous chat was not today, either, so it can't be some weird VRAM remnant thing.
>>
>>101885689
>That said, 3090 still is a much better value.
I don't know how you come to this conclusion when no numbers are provided at all.
>>
>>101886098
What you are describing is essentially CoT + iterating over its own response, and that's not a novel idea.
To some extent that already happens inside the model, more or less.
>>
>>101884441
Musk is the Jart of billionaires.
Even if his AI stuff happens to work out he'll squander it with a focus on meme shit that he thinks will make other people think that he's cool.
>>
>>101886257
>I don't know how you come to this conclusion when no numbers are provided at all.
I'm not going to do your homework for you. It's been discussed at length here. A 3090 is still the best value based on cost vs tokens/sec and VRAM.
T4 is "good" but it's not as good as a 3090, and if you need a 1-slot card, an A4000 can be had for the same price as a T4, and it's much better.
>>
File: 1672949828279964.webm (258 KB, 720x720)
Hello guys, I'm an /aicg/ refugee. Since it has become unfeasible to steal keys from companies, I wanted to ask a question here: what's currently the best local model for RP that can realistically be run on consumer grade hardware (4090, etc.)? I'm just curious, thanks!
>>
>>101886410
gemmasutra 2b
>>
>>101886393
>trust me bro
I think I'll end right here.
>>
>>101886440
>>
>>101886410
>unfeasible to steal keys from companies
why?
>>
>>101886410
koboldcpp + gemma 2 27b.
Then learn about the different quantization methods and try other models like command R, quantized 70b, etc.
>>
>>101886427
Is it? What about Magnum or Mythomax? I've heard those names mentioned before and I seem to remember people saying they were good.
>>101886526
Well, at least when it comes to Claude keys, security has been tightened a whole lot and keys are getting revoked nonstop, very hard to keep them alive.
>>
>>101886583
Just use nemo instruct thief.
>>
>>101880989
https://www.phoronix.com/review/amd-ryzen-9950x-9900x/14
AMD seems to completely BTFO Intel when it comes to whisper.cpp.
Probably not directly comparable to llama.cpp since as long as you have a GPU for prompt processing the bottleneck is the RAM rather than the CPU.
>>
>>101886652
Huh, that's interesting.
>>
>>101880989
Takashi will arrive at 9pm. Mrs Minagawa will rape his shota ass.
>>
If you got raped by Mrs Minagawa as a shota would you still engage in the sunk cost fallacy hobby known as LLM's?
>>
>>101886715
No because I'd have Mrs Minagawa to take care of my needs
>>
>>101886832
Minagawa-sama~
>>
>>101886348
But some people will find his stuff useful anyway. Deal with him the same way, ignore if it doesn't interest you
>>
>>101886098
Everything LLMs do is hallucination. But some hallucinations are more factual than others.
LLMs, however, have the issue of output bias. If an LLM sees "X is right" and later discovers that "X isn't right", it will most likely hallucinate an answer that makes X right, instead of recognizing its error.
>>
>>101886652
Who the fuck uses whisper.cpp? WhisperX and other faster-whisper/CTranslate2 implementations are so much faster.
>>
File: 1723039897594741.png (22 KB, 872x431)
>>101885604
>>101886076
I just got one from ebay for just over £500 and now I'll be able to get 64gb vram in a mid tower. Only downside is I can't find a cheap 3090 turbo.
>>
>>101887142
>Only downside is I can't find a cheap 3090 turbo.
Keep in mind many turbo cards aren't flow-through, meaning you can't put them right next to each other. 2080ti is like that. If the rear of the card doesn't have an opening in the PCB where the fan is, it's not flow-through.
>>
>>101884525
>>101884618
Command-R with 8k context at 5.0 bpw instead of 3.0 bpw
Mixtral 8x7B Instruct with 16k context at 5.0 bpw instead of 3.7 bpw
>>
>>101887361
Gemma 27B with a small context fits in 32GB.

I only run q8/8bpw models now.
>>
>>101886652
>as long as you have a GPU for prompt processing the bottleneck is the RAM rather than the CPU
Even running entirely on the CPU the bottleneck can be RAM. My CPU is 60% idle when running Mistral Large IQ3_XS at 0.6 t/s. RAM so slow.
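That number survives a back-of-envelope check: each generated token streams the whole model through RAM once, so t/s ≈ bandwidth / model size. Assuming ~3.3 bits/weight for IQ3_XS and typical dual-channel DDR4 throughput (both assumed numbers):

```python
# Every generated token streams the full set of weights through RAM once,
# so tokens/s ~= bandwidth / model size. Rough, assumed numbers below.
params = 123e9                                  # Mistral Large
bits_per_weight = 3.3                           # roughly IQ3_XS
model_gb = params * bits_per_weight / 8 / 1e9   # ~51 GB
bandwidth_gbs = 30                              # realistic dual-channel DDR4
print(bandwidth_gbs / model_gb)                 # ~0.59 t/s, right at the observed 0.6
```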
>>
>>101887480
*I mean 40% idle, 60% used.
>>
>>101887142
>cheap 3090 turbo
I got mine off facebook marketplace. I doubt you can get it off ebay without paying extra.
>>
For me it's the 3090 pro max.
>>
File: 1713717867677796.png (637 KB, 2480x3508)
>>101887353
Yeah, the founders card is going to get suffocated but I'll try to make it work. I'm going to put a fan at the bottom of the case and undervolt everything if it gets bad but I'll max out all the fans first since the blower cards are going to get noisy anyways.
>>
>>101888077
You can try using an extender and mounting one of the cards vertically. 3-slot PCIe spacing isn't enough for non-blower 3090s, I've tried.
>>
>>101885994
Yeah, in my experience multi-line prompts do best with flux, where you define each subject individually and then add more lines for extra details, actions, etc. referencing your defined subjects as needed, rather than typing out a paragraph.
>>
File: 1722588141364645.png (542 KB, 2468x984)
I've been considering making a desktop app to act like a search engine for my local unsorted images, without needing to set shit up like with immich or hydrus. After searching and asking around it seems that CLIP is state-of-the-art for this kind of thing and pic related are some of the models I could use. I'll probably use the openCLIP library to create the text from the image and figure out the rest as I go.
Is my thinking sound, or are there better tools/methods to accomplish my goal?
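Mostly sound, with one correction: CLIP doesn't create text from an image, it embeds images and text into a shared space, so search is just cosine similarity between the embedded query and pre-computed image embeddings. A minimal sketch with open_clip (the model/pretrained tags are real open_clip options; the image folder is a placeholder):

```python
# Index images with CLIP embeddings, then rank them against an embedded text
# query. Model/pretrained tags are real open_clip options; path is a placeholder.
import glob
import torch
import open_clip                     # pip install open_clip_torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

paths = glob.glob("/path/to/images/*.jpg")
with torch.no_grad():
    feats = torch.cat([model.encode_image(preprocess(Image.open(p)).unsqueeze(0))
                       for p in paths])
    feats /= feats.norm(dim=-1, keepdim=True)        # normalize for cosine sim
    query = model.encode_text(tokenizer(["a cat sleeping on a keyboard"]))
    query /= query.norm(dim=-1, keepdim=True)

scores = (feats @ query.T).squeeze(1)
for i in scores.argsort(descending=True)[:5].tolist():
    print(paths[i], round(float(scores[i]), 3))      # top 5 matches
```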
>>
>>101888292
florence-2 is better at captioning imgs
>>
>>101888320
https://huggingface.co/microsoft/Florence-2-large
I see, so currently that's the latest and greatest? Should I use the code they have there as reference for my app?
>>
>>101880989
I don't get it
>>
>>101888434
The punchline is SEX!
HAHAHAHAHAHAHA ISN'T THAT SO FUNNY????
>>
>>101888383
The model's small enough that using whatever the default huggingface code is will probably be fine. I'm not sure if any specialized backends support it.
>>
>>101888320
>>101888460
OK, thanks a lot anon(s)
>>
>>101888434
At the time Takashi-kun didn't either.
>>
>>101888457
Sex with who? this is a child, that is not possible, it is sick
>>
>>101888493
cute
>>
>>101888493
you're supposed to insert as the child who inserts into mrs. minagawa
>>
I whipped up a quick prototype that uses whisper to listen to audio on the computer as I play youtube videos, and has an LLM commentate on it. It saves a lot of time, I might not even need to watch the videos soon.
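For anyone wanting to replicate it, the shape of such a prototype is roughly this, assuming faster-whisper for transcription and a local OpenAI-compatible endpoint for the commentary (capturing the audio chunk itself, e.g. with an ffmpeg loopback, is left out; file name and endpoint are placeholders):

```python
# Transcribe a captured audio chunk, then have a local LLM commentate on it.
# "chunk.wav" and the endpoint URL are placeholders; pip install faster-whisper.
import requests
from faster_whisper import WhisperModel

whisper = WhisperModel("small", compute_type="int8")
segments, _info = whisper.transcribe("chunk.wav")
transcript = " ".join(seg.text for seg in segments)

r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "model": "local",
    "messages": [{"role": "user", "content":
                  "Commentate on this video based on its audio transcript:\n"
                  + transcript}],
})
print(r.json()["choices"][0]["message"]["content"])
```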
>>
>>101888292
It sounds like you're reinventing hydrus with extra steps. I think the better use of time would be to learn hydrus and use a model to tag your images then import them. That's my plan anyway.
>>
>>101888493
lmao
>>
any grok2 quants yet?
>>
>>101888810
yes
>>
>>101888672
That's what I considered doing at first, but I'd love to have a program to search for an image without needing to set up hydrus and import all my images. Of course making an entirely new application is more work than figuring out hydrus, but maybe there are other people who would want something similar to me. Like a plug-and-play local image search engine
>>
>>101888615
insert what
>>
>>101889148
insert coin
>>
Fucking niggers of xai open-sourced the piece of shit grok-1 but won't release grok-1.5 or grok-2
>>
>>101889230
There's a delay of like 6 months
>>
>>101889245
people say this but is that actually policy or are you just extrapolating from a single data point
>>
>>101889257
The latter.
>>
>>101889230
Elon was trying to look good in front of the court for his lawsuit against Altman to open source gpt4. The lawsuit died so there's no reason for him to care about open source anymore.
>>
>>101887480
>>101887531
What CPU do you have?
Mine also caps at 60% regardless of the model most of the time and I have a 7600 with 32gb@6000mt/s with optimized timings.
>>
>>101880989

The image appears to be a puzzle or riddle involving a character named Takashi-kun, who is out shopping. Here's a breakdown of the elements in the image:

Scenario: Takashi-kun is planning to buy 4 apples, each costing 70 yen, and 8 oranges, each costing 20 yen. While he is shopping, his next-door neighbor, Mrs. Minagawa, approaches him and comments, "Out on an errand? How reliable!"

Question: The image asks you to determine what time Takashi-kun will be going home today. Additionally, it provides a clue that Mrs. Minagawa hasn't been in contact with her husband for half a year.

Explanation:
The riddle is likely playing on the idea of time or a wordplay involving the cost of the fruits and the situation. Let's consider the following:

Total Cost Calculation:

Apples: 4 apples × 70 yen = 280 yen
Oranges: 8 oranges × 20 yen = 160 yen
Total = 280 yen + 160 yen = 440 yen
Mrs. Minagawa's Comment:

The comment and the note about her not being in contact with her husband for half a year could be a hint toward the answer, possibly indicating that the solution involves understanding the context of time or a hidden meaning in the scenario.
Interpretation:
The puzzle could be using the total cost of the items, 440 yen, as a clue. The number 440 could be interpreted as "4:40," indicating that Takashi-kun will go home at 4:40 PM.

However, the extra information about Mrs. Minagawa might suggest there's more to consider, but based on the calculation, the likely answer is 4:40 PM.
>>
>>101889230
>niggers
jews.
>>
>>101881835
Performance nosedives the moment you can't fit everything in vram, so yeah.
>>
>>101889288
He started a lawsuit again about a week ago
>>
>>101889370
Now I am even more confused
>>
File: Screenshot_4.png (937 KB, 819x747)
>>101888493
THAT'S NOT THE MIYAGI-DO WAY
>>
File: 1636941718706.gif (3.75 MB, 520x293)
Lmfao.

So I tried out Cohere CR+ and it's still fucking dogshit for basic 1 on 1 chats. How in the fuck does not a single model, even a monster like CR+, come close to character AI. I'm actually convinced that I'm chatting with paid jannies on there now.

And people over on the Chat AI general unironically PAY for this shit with Opus and all that garbage? LMAO
>>
>>101889658
just use novelai
it's all you need
>>
I see there's another /aids/ raid going on.
>>
>>101889409
Newfriend here, so if I buy a 3060 it will be no different from just adding 12gb of extra ram?
At least I will be able to run flux right?
>>
>>101889713
yup novelai mogs every model in erp realism still
>>
>>101889713
>paying to coom
ngmi
>>
>>101889257
The single data point, plus the fact that Elon had previously said he thinks 6 months is a good timeline before Grok released; he ended up matching that and hasn't given any indication he's changed his mind.
>>
I've always used base models and been pretty happy with them after designing a good system prompt. Are memetunes and merges worth checking out?
>>
>>101889850
It depends if you want to feel shivers in your spine
>>
What's it like going from 8GB of VRAM to 72GB? I'm thinking of building the triple P40 box.
>>
>>101889850
no
>>
>>101889767
Is it true or bait?
>>
>>101889874
It's indescribable. You won't regret it.
>>
>>101889874
Highly use-case dependent. If you just want vanilla ERP with a pretty straightforward waifu character then 8GB can more or less suit your needs these days. If you want to code and stranglefuck your waifu then it's probably a necessary investment.
>>
>>101889850
yes
>>
>>101889658
>he likes character ai
yikes
>>
>>101890086
>he likes models trained on a couple of /aicg/ logs
yikes
>>
>>101889874
anything that requires 72gb to fit is going to run slow as fuck on p40s anyway
>>
>>101890145
Weren't a good bit of those aicg logs FROM cai?
>>
>>101889970
It's a schizo
>>
>>101890175
No, C2 logs are mostly Opus, I don't think he's talking about pyg.
>>
>>101890170
That makes sense. For now this is my cope for waitfagging on spending thousands of dollars on 4090s.
>>
>>101889750
>Newfriend here, so if I buy a 3060 it will be no different from just adding 12gb of extra ram?
It may be slightly faster, depending on the proportion of the model held in VRAM. There are still significant performance boosts if over half of it is in VRAM, I wish I had the chart for it.

But yeah, effectively it's just extra RAM for large models. That being said, with the unfortunate size of a lot of models like Mistral-Large, that 12gb can be the diff between running them at all or not if you only have a 64gb capacity board.
>>
>>101890364
>I wish I had the chart for it.
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101890364
>>101890395
>>
>>101883559
Ever since Large and 3.1 we've been back don't let doomers tell you otherwise
>>
>>101890545
Now just need Altman to die and local will have won completely
>>
File: 1723483987154506.png (117 KB, 671x519)
What's a complete retard-proof explanation of what goes on in a roleplay prompt? I assume you tell an instruct model in the system prompt that it is an anime girl and then the usual <|im_start|>user and <|im_start|>assistant talking goes on
>>
>>101890591
We wouldn't even have local without altman. And any advancement of the industry by OAI is good for local.
>>
>>101890597
That's how it's been done since the Pyg and GPT 3.5 days. I think it was just copying what Character.AI did.
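Concretely, the assembled prompt for a ChatML-format model looks something like this (the card text is invented; real frontends also splice in example dialogue, author's notes, and lorebook entries):

```python
# What a frontend roughly sends for a ChatML model. The card text is
# invented; frontends prepend example dialogue and lore before the history.
card = "You are Hatsune Miku. Personality: cheerful, teasing. Speak casually."
history = [
    ("user", "Wanna go swim and dive?"),
    ("assistant", "*grabs her towel* Race you to the pier!"),
    ("user", "You're on."),
]

prompt = f"<|im_start|>system\n{card}<|im_end|>\n"
for role, text in history:
    prompt += f"<|im_start|>{role}\n{text}<|im_end|>\n"
prompt += "<|im_start|>assistant\n"   # the model continues from here
print(prompt)
```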
>>
>>101881211
if you're incapable of buying used that's a skill issue. just verify the card works before you pay, plus multiple manufacturers still warranty used cards
>>
>>101890730
Thank you anon, I think ooba putting the character persona in <|im_start|>user was what made the models I tried talk like ChatGPT
>>
The situation must be getting desperate in /aicg/. We're getting a lot more entry level questions than usual.
>>
>>101890958
But there's not a massive amount of pedofags posting about how they saw a "loli" at the park today here so things aren't as bad as that thread yet
>>
>>101890597
Yeah it werks for me.
"You are an anime girl, her personality is x, she talks like y, her best friend is z.

Current scenario: {{user}} couldn't look away when he saw her in a bikini so she put his thing in her thing... (just summarize the story)"
>>
>>101890958
Local always wins. It's just a matter of time.
>>
>>101889658
What is character AI? Is it like JanitorAI I'm guessing?

Anyway, what's a good model for general one to one interactions? Been using Nemomix and it's fine I guess.

I have a 4090 for reference, so anything above like, Command R, I ain't gonna be able to run it
>>
>>101891613
>>
It's fucking over
>>
File: ACKCHYUALLY.jpg (136 KB, 576x768)
>>101889874
I've built two and here are my 2c - the mikubox was a good and cheap way to get an all-in-one solution. I say "was" since there have been massive price hikes on both the P40s and the T7910/T7920.
If that's the case in your area too I'd recommend looking at other alternatives. One is going for an open frame mining rig setup. You can get one for like $30-40 and then add the other parts. This'll also get you better performance (more PCIe lanes, better mobo, later CPU support etc.) as well as something more future proof, since the mikubox is mainly designed to take compute cards with rear power connectors.
Most consumer GPUs with top-side power connectors don't fit unless you run it without the side panel. You're also stuck with 2-slotters, which further limits what cards you can use. And even if you find ones that fit it'll be very tight, causing their side-fan design to struggle with cooling.
Beyond that, other types of cards like the MI100 32GB and A4000 have dropped in price and might be better options depending on how tech savvy you are and whether you intend to train or not.
There's also the dark horse Intel that's no longer a *complete* joke when it comes to running inference; used Arc A770 16GB cards can be found for cheap and four would net you 64GB VRAM (but require another type of rig). This is all assuming they'll continue improving with their next generation of cards and don't drop support for the previous ones. Or otherwise fuck things up like they often do.

As for whether it's worth going for more VRAM - yes, very much so. As that other anon mentioned, 70B models might be too slow for your liking, but running smaller ones in the 20 to 50b range at 8-bit is a very noticeable improvement in output quality. Especially if you use one for code completion.

TLDR: Compare costs part-by-part carefully, there might be cheaper/more performant/future-proof options now.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.