/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102565822 & >>102557546

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
The west has fallen. Qwen won. Recap is spam now. Sell your GPUs before the second-hand market collapses. It's over for real this time, boys.
>>
►Recent Highlights from the Previous Thread: >>102565822

--Papers:
>102572967 >102573170
--Mistral.rs, a Rust implementation of Llama:
>102567470 >102567493 >102567499 >102567500 >102567597
--Llama 3.2 and uncensored VLM potential discussion:
>102565950 >102566013 >102566046 >102569036 >102570107
--LLaMA-3.2 quantization evaluation GitHub discussion:
>102568271
--Importance of multiple aspects of model quality beyond just generation and long-term memory:
>102569388 >102569415 >102569427
--Discussion on syncing server.cpp versions between llama.cpp and ollama:
>102566301
--Qwen2.5 32B uncensor finetune on Hugging Face:
>102570128 >102570176 >102571913
--Qwen 72b and GPT-4 succeed in pyqtgraph plot coding challenge:
>102569500 >102572930 >102573043
--Mistral small works better with one message prompt:
>102566201 >102566246 >102566312 >102566352 >102566421 >102566466
--Discussion of NoCha leaderboard results and long context benchmarking:
>102568781 >102568884 >102568990 >102569030 >102569068 >102569120 >102571029 >102571424 >102571473 >102571098 >102568954 >102568977 >102568982 >102569005 >102569074 >102569718 >102569832 >102569931 >102569118 >102569275 >102569326 >102570952 >102571213 >102571298 >102571508
--90B model struggles with writing, 3.1 tunes show promise with high temp:
>102568773 >102568851 >102568862
--RTX 3090 with 32GB VRAM announced, kopite7kimi's credibility questioned:
>102565941 >102566240 >102567292 >102567720 >102570733 >102570840
--Nemo generates a joke that qwen2.5 could never do:
>102570058 >102570171
--Llama 3.2 90B performance and impact of quantization discussed:
>102567549 >102567822 >102567871 >102567878 >102568430 >102568602
--Adjusting samplers and settings for non-sloppy results on L3.x models:
>102567577 >102567600 >102568173
--Miku (free space):
>102566349 >102570348 >102570537

►Recent Highlight Posts from the Previous Thread: >>102565835

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
so.. what's up?
>>
I thought OpenAI was over after the departures.
But I have to admit, it was impressive how Sam was able to spin it.
>>
Are there any small models specialized in generating SD prompts? Especially with booru tag support?
I tried putting a big list of tags and a bunch of examples in llama 3B and Qwen 3B but the outputs are mostly ass.

>>102573443
Is this a Fate/Zero reference????
>>
>>102573371
this general is a spin-off of /aicg/. it has always been about chatting and (erotic) roleplay. stop complaining about the meta and post your best loli logs, newfag.
>>
>>102573457
>this general is a spin-off of /aicg/
I'm actually happy somebody remembers that.
>>102573386
>Recap is spam now
/aicg/ here and I miss your recapanons. Pls come back or somebody pick up the mantle.
>>
>>102573457
Calm down ranjesh
>>
>>102573502
post your hand
>>
reposting for new thread
>chatting with ai, using a variation of my name for {{user}}
>she calls me anon in the middle of her orgasm
what did she mean by this

>>102573118
no, it is, but for whatever reason (could be the card) she keeps adding "Note: this scenario includes offensive and blah blah" kind of statements
>>102573333
vaginal sex in the missionary position for the purposes of procreation
>>
>>102573450
I know of this one, but i never used it.
>https://huggingface.co/teknium/SD-PrompTune-v1
It's old and i have no idea if it's any good. 7B, based on mistral 0.1 i think.
Then there's a bunch of tiny models, but they just add noise, mostly. May be worth a try to get some random styles, i suppose.
>https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
>https://huggingface.co/AUTOMATIC/promptgen-majinai-safe
>https://huggingface.co/AUTOMATIC/promptgen-majinai-unsafe
>https://huggingface.co/AUTOMATIC/promptgen-lexart
Those are very old gpt models with like 160m params.
>>
>>102573387
>RTX 3090 with 32GB VRAM
Never post a recap again.
>>
>>102573387
I don't get this recap, it doesn't have a double > on the posts so you can't do anything with them
>>
>>102573528
>Never post a recap again.
that's my fault, I really wrote that shit on the previous thread, fucking typo :(
>>
>>102573450
Not explicitly relevant but I'm using Florence-2-large-PromptGen-v1.5
>>
>>102573530
I think the internet may be too advanced for you.
>>
>>102573530
We did a few a/b tests and found that people were more satisfied with their engagement of recaps when they had to do a quick ctrl+f to navigate to their chosen post numbers of interest instead of just having a mess of links to click. It's hypothesized it could be the act of manually putting in a little effort gives anons a sense of ownership over their role in reading the recapped post. Alternatively it could be that they have hover previews enabled and get distracted by other posts popping up as they try to get their cursor to reach a post that interested them.
>>
>>102573386
>Recap is spam now
What happened to recap? It was good for a long time.
>>
>>102573564
>>102573567
wtf this is an insane level of shitpost, I kneel ;_;
>>
>>102573567
How the fuck do they know what they're going to see without the hover?
Might as well just read the old thread.
>>
>>102573567
this post was written by AI wasnt it
>>
>>102573443
Who the fuck decided this was the stage for an interview?
>>
File: 1619789483565.png (643 KB, 1552x1171)
>>102573562
But it's more tuned for the plain text prompts.
>>
File: 1696169299901587.png (1 KB, 21x25)
>>102573567
>MOOOM! I POSTED AI GENERATED REPLY AGAIN!! GIMME MUH TENDIES NAOW!!!!!
>>
>>102573527
Thanks, I already knew promptgen but IIRC it doesn't do booru tags unfortunately. The others don't seem to either.
I guess I'll try tardwrangling harder and I'll have to do a bit of scripting to fix the prompts.

>>102573562
Thanks, not exactly what I'm after but it might be useful. I guess I could try feeding noise into a WD-tagger model with a low threshold and see what comes out.
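For the scripting part, this is the kind of thing I mean — just a rough sketch, assuming a plain-text whitelist of booru tags (one per line); the file path and the normalization rules are made up, adjust to taste:
[code]
# rough sketch: clean up small-LLM output into a booru-style tag prompt
# assumes tags.txt is a whitelist with one tag per line (placeholder path)
import re

def load_whitelist(path="tags.txt"):
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def clean_prompt(raw, whitelist):
    # split on commas/newlines, normalize "long hair" -> "long_hair"
    candidates = [re.sub(r"\s+", "_", t.strip().lower())
                  for t in re.split(r"[,\n]", raw)]
    seen, kept = set(), []
    for tag in candidates:
        if tag and tag in whitelist and tag not in seen:
            seen.add(tag)
            kept.append(tag)  # drop anything the model hallucinated
    return ", ".join(kept)

if __name__ == "__main__":
    wl = load_whitelist()
    print(clean_prompt("1girl, Long Hair, glowing aura of destiny, blue_eyes", wl))
[/code]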
>>
>>102573605
It should have spun much much faster.
>>
How much better is Mistral Large over Llama 3.1 70B? It's too slow for me
>>
So has anyone figured out an easy way to test out the multimodality in llama 3.2?
>>
>>102573694
Lmarena?
>>
>>102573017
If you're offloading a ton of layers to CPU (which you are if you're running Mistral Large with only 32GB vram) then getting the 5090 isn't going to do shit for your speed you fucking techlet retard
>>
Is it possible to use AMD+Nvidia at the same time with Scale?
>>
>>102573802
because offloading to cpu is obviously the only option in the context instead of multi gpu, and i'm the techlet retard. Before you say something even more retarded like it runs at the speed of the slowest gpu that's not true either. It runs at a speed in between.
>>
>>102573904
If you're going to buy two 5090s you are a fraction of a fraction of an already small market, and your preferences don't matter in a discussion about whether it's a good product
>>
>>102573904
nta but 3 x 3090 would give more vram at a lower tdp than 2 x 5090
either setup would generate tokens faster than the average reading speed (if whole model is on GPU), so the memory on the 5090 being faster would not matter
>>
>>102573530
>can't do anything
you can read the numbers, reading numbers empowers your brain
>>
>>102573973
4x 3090 gets 10t/s on large q4. that's kinda slow and stat cards have a lot of extra stuff.
>>
>>102574019
5090 have no nvlink which is important if you split by row
>>
How many floppy things is the 5090 supposed to have? 600 watts is insane. So there better be lots of flops
>>
Do any anons have Google's notebookLM locally running? Doesn't have to be the same exact model but would be interested in setting it up locally to minimize the processing time.
>>
kind of crazy to think about how ai is a solved science and with a couple more gens of nvidia chips and a few years of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
>>
>>102574092
buy an ad Jensen.
>>
>>102574092
LLMs cannot think brotha
>>
>>102574092
I can't wait until they power up Three Mile Island, turn on their new datacenter and train a 10TB gigamodel, only to discover it plateaus at exactly the same "slightly better than GPT-4" level we've been stuck at for a year and a half now
>>
File: FLAMING HOT COCK.png (2.57 MB, 2391x720)
>https://huggingface.co/BAAI/Emu3-Gen
>https://huggingface.co/BAAI/Emu3-Chat

Multimodal Llama architecture LLM with native image input/output (video is supported, but looks pretty bad).
No diffusion in the image generation process whatsoever. Next token prediction for all modalities.
Apache 2.0, fully open license. 8b params. The layout is literally just Llama3 8b but fully multimodal.
>>
>>102574106
>8b params.
got me excited until I read this
>>
>>102574106
>No diffusion in the image generation process whatsoever. Next token prediction for all modalities.
Dumb this down for me why is this noteworthy?
>>
llama 3.2 3B is sentient.
>>
>>102574106
we already have anole finetune of chameleon, do this on a 70b and you'll have my attention
>>
>>102574115
8B is way bigger than Stable Diffusion 1.5, and slightly bigger than SDXL
>>
>>102574136
>stable diffusion
lol
>>
>>102574127
qwen 2.5 0.5B is a cat
>>
>>102574117
Normal image generation models start from random noise and keep denoising to iteratively create an image (the diffusion process). This predicts in patches sequentially like how text is predicted in conventional LLMs

TLDR; it's one big ass model that treats both modalities the same way
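To make it concrete, here's a toy sketch of the two loops (illustration only, nothing to do with Emu3's actual code):
[code]
# toy illustration only, not Emu3's real implementation
import random

def diffusion_style(steps=4, size=8):
    # diffusion: start from pure noise and refine the WHOLE image a little each step
    image = [random.random() for _ in range(size)]
    for _ in range(steps):
        image = [0.5 * x for x in image]  # stand-in for one denoising pass
    return image

def next_token_style(length=8, vocab_size=1024):
    # next-token prediction: emit discrete image tokens one at a time, like text
    tokens = []
    for _ in range(length):
        tokens.append(random.randrange(vocab_size))  # stand-in for the LLM's pick
    return tokens  # a VQ decoder would turn these tokens back into pixels

print(diffusion_style())
print(next_token_style())
[/code]
The second loop is literally the text-gen loop, which is why it can be one big model for both modalities.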
>>
>>102574141
point is it's more than enough parameters for image generation, retard
>>
>>102574106
Woah cool

>>102574117
Diffusion is slow to train, slow to infer
Token prediction is ezpz
>>
>>102574127
>>102574154
and you are braindead.
>>
Emu3 gguf support, when?
>>
>>102574192
lol
>>
>>102574161
tell that to my sub 1 t/s
>>
So if Emu can make images with linear token prediction does that mean multi-gpu inferencing would be as simple as it is for textgen?
>>
>>102574160
but a fraction of what we already have from other imagegen architectures, retard
>>
>>102574106
>no online demo
Fuck, I don't want to mess with transformers shit.
>>
>>102574339
Things never improve. We got llama1-7b and we've been stuck ever since. We only ever get 7b models. Nobody releases experimental models. It's all 7bs...
>>
>>102574339
wdym? Flux is only 12B, that's 50% more yeah but not as big a difference as you're trying to imply
"a fraction" lol. I guess 66% is technically a fraction. seriously though, why are you telling weird lies?
>>
>admits it's a fraction
>still tries to say it's a lie
nobody gives a fuck about your shitty model anon
>>
>>102574160
there's a difference between "enough parameters for image generation" and "enough parameters for good image generation" especially when image and text are crammed into the same semantic space
need more B, simple as
>>
Nobody here can really say what a lot of parameters is for emu because this is the first linear token prediction imagegen model we've ever gotten our hands on.
>>
>still coping
don't care, not gonna download your chinese slop model
>>
>>102574449
>Nobody here can really say what a lot of parameters is for emu because this is the first linear token prediction imagegen model we've ever gotten our hands on.
We've had it for months
https://github.com/GAIR-NLP/anole
>>
>>102574429
You're seriously going with "the word fraction is technically correct because 8 is 66% of 12"? That's actually the hill you're gonna die on?
>>
>you're seriously gonna say 2/3 is a fraction?
>*dilates*
yes.
>>
>>102574526
thanks. what's your home address?
>>
>>102574429
pussy
>>
>>102574429
ignore it then?
>>
look how mad he is
>>
>>102574499
There was this as well. It was more focused on doing straight text-to-image tasks, it still used image tokens in an LLM:
https://github.com/Alpha-VLLM/Lumina-mGPT

Both of those are based on Chameleon, which latently had these capabilities but had its head chopped off by Zuck for the open source release because it's too dangerous. There's the 34b sitting around that never got the treatment the 7b did... which I guess makes sense because nobody wants to support quanting this shit so no one would be able to run anything much bigger.
>>
>irrelevant drama №5469457647539767594
*yawn*
>>
All drama posts are sam altman until proven otherwise.
>>
>>102574655
it can't be he's too busy spinning
>>
sneed
>>
I am sam altman
>>
How long until LCCP emu support?
The video gen looks on par with CogVideoX
>>
>>102574669
he's a billionaire, he can afford to spin and ruin /lmg/ at the same time.
>>
>>102573443
Why are they spinning??
>>
holy shit you guys flash attention just finished building now I can finally try emu
>>
>>102574698
:O
>>
>>102574698
hurry up nerd
>>
>>102574713
It needs to download the weights first, give me a break.
>>
>>102574655
I'm sure sam doesn't even know this thread exists.
>>
Anything cool happening?
>>
>>102574735
i'm masturbating to vtubers
>>
>>102574735
yeah >>102574106 >>102573443
>>
>>102574735
there's a chink who doesn't know what fractions are coping and seething about his tiny imagegen model but otherwise no
>>
>>102574761
keep farting
>>
>>102574735
Yes, we have two fags samefagging right now.
>>
File: Applying thermal paste.png (535 KB, 638x638)
>>102574750
Gay
>>102574755
Is there any stuff on the horizon though?
I want to know if something cool is going to drop before the end of 2024. I want a local uncensored Claude sonnet/GPT4 equivalent running on my shitty GPU by 2025.
>>
File: average local llm.png (158 KB, 833x534)
>>102574813
>I want a local uncensored Claude sonnet/GPT4
This will never happen.
>>
>>102574813
>local uncensored Claude sonnet/GPT4 equivalent
>running on my shitty GPU
>by 2025
pick any two
>>
>>102574813
>Is there any stuff on the horizon though?
in the llm space, nothing that i know of, but in the image gen space the pixart team said their model is almost done training, so that's pretty exciting. highly doubt it'll beat flux, even they themselves said it won't be a flux killer but i expect it to be aesthetically superior and easier to finetune.
>>
>>102574080
the memory is 40% faster, that's what's important.
>>
I think I'm running the default example for emu right now...but they neglected to add any kind of progress indicator to the code. max new tokens is 40960 by default... so we could be here a while.
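In case anyone else runs it: transformers' generate() takes a streamer, which at least prints tokens as they come out. Rough sketch below — the loading calls are the generic transformers API with the repo id from the links above, not Emu3's own example script, so treat it as a guess:
[code]
# untested sketch: attach a streamer so generate() shows progress token by token
# model/tokenizer loading uses the generic transformers API, not Emu3's own scripts
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "BAAI/Emu3-Gen"  # assumed HF repo id from the post above
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

streamer = TextStreamer(tok)  # prints decoded tokens to stdout as they are generated
inputs = tok("a photo of a cat", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)
[/code]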
>>
>>102574876
hurry UP nerd
>>
>chinkshit transformers model sucks
new here?
>>
>>102574876
is it done yet?
>>
File: 00091-1652089724.png (1.15 MB, 1024x1024)
>>102574834
The future is now old man.
>>102574840
I can always pony up some cash for a 5000 series once they start getting those to market. I'll assume they'll have a card specifically designed for AI stuff. The future is looking bright. My 4070 comes with 12 gigs of VRAM which is pretty decent for local image gen. If I could get a card with 32 or 64 gigs of VRAM I imagine I could run some pretty beefy models.
>>102574846
I'm still playing with A1111. Pumping out plump robot girls is always fun.
>>
>>102574890
Nah. my VRAM usage keeps going up, so presumably it's doing something. It's also only cruising along at 307W, so it's clearly failing to use all of the compute available to it.
>>
>>102574910
>A1111
obsolete. i use comfyui but i think reForge is the go-to a1111 replacement now. post another gen, i liked that one
>>
>>102574087
what are you talking about anon.
did you listen to the gens?
this is above gpt advanced audio. maybe like what it would be if it were uncucked idk.
the only problem is hallucination but the audio is definitely past the valley.
you can enjoy great tools like fishspeech or xtts2 locally and suffer until we get something better.
>>
Do you think Sam Altman has a personal trillion parameter model based on gay bear personality and vaguely posts on 4chan to groom potential cute gay bears by giving them a ChatGPT key?
>>
aaand it oomed.
>>
>>102574910
>I can always pony up some cash for a 5000 series once they start getting those to market. I'll assume they'll have a card specifically designed for AI stuff. The future is looking bright. My 4070 comes with 12 gigs of VRAM which is pretty decent for local image gen. If I could get a card with 32 or 64 gigs of VRAM I imagine I could run some pretty beefy models.
I like your enthusiasm and optimism!
You won't find much of anything that will run on 64gb in the Claude/GPT4 class.
Largestral is probably your closest equivalent on a reasonable number of GPUs, and you're looking at needing more like 128gb+ to run at a decent quant.
If you really want smart-as-cloud-offerings SOTA local at home you'll need to scrounge up 400-500gb
Unfortunately cards "specifically designed for AI stuff" are big cash-cows for nvidia and are in the $40-80k price range (for 80gb...still need a bunch!)
Check the lmg build guide in the OP for more details.
>>
>>102574980
pecker is altman confirmed
>>
>>102574982
wow. some nerd you are.
>>
>>102575014
It's my fault, I started the script before I noticed the device map was set to only use cuda:0
>>
>>102575035
retard
>>
>>102575035
Huh, how much vram does it need?
>>
>>102575058
Evidently more than 6GB, but I'll keep trying
>>
File: 00009-1146345299.png (1.38 MB, 1024x1024)
>>102574941
>obsolete
I don't want to learn a new system. I would assume I need to download new checkpoints, loras, and shit like that. I'll wait until 2025 before I delve into the newest toy. I'll let everyone else work on stuff and hopefully, some higher-speed and lower-drag toys emerge in that time. Have some Gerudo and Zelda blend.
>>102575000
As noted above, I'm counting on some tech advancements to either make LLMs more resource-efficient or bring down the price of AI-focused GPUs. Maybe both.

I would love to believe everything will be figured out by 2025 but that's a little optimistic even by my standards. By 2026-7 we'll be eating good, I think, if the current rate of development is any indication.
>>
>>102575035
how about you set it to cuda:start_working, nerd.
>>
>>102575074
I set it to auto, because I'm not nerdy enough to make a custom device map, so now it's split across 4 GPUs and will probably take an eternity since transformers is garbage at utilizing multiple gpus.
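For reference, this is roughly the call — the only knob I know of short of writing a custom map is capping each card with max_memory (sketch only; the repo id and the GiB numbers are placeholders):
[code]
# rough sketch of what "auto" is doing: let accelerate shard the weights across GPUs,
# while capping each card so the compute buffer on cuda:0 has headroom
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu3-Gen",                      # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",                    # shard across all visible GPUs
    max_memory={0: "18GiB", 1: "22GiB", 2: "22GiB", 3: "22GiB", "cpu": "64GiB"},
    trust_remote_code=True,
)
print(model.hf_device_map)  # shows which layers landed on which device
[/code]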
>>
>5080
>16 GB
What the fuck is this shit?
>>
>>102575070
>I don't want to learn a new system.
no it's pretty much the exact same as a1111, it's a fork, u good. lick.
>>
>>102575089
Consequences of a monopoly.
>>
File: bottleneck.png (57 KB, 740x379)
now that's a PCIE bottleneck if I've ever seen one.
>>
>>102575070
Forge is almost the same (I haven't tried ReForge). I keep my models and Loras in one place, then symlink the directories so A1111, Forge, Comfy, etc. all point to the same collection. Though I've heard that Flux doesn't work on A1111 or Forge and does on Comfy; that's just a compatibility and update thing.
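If it helps, the gist of the symlink setup is something like this (every path here is a placeholder, point them at wherever your UIs actually live):
[code]
# sketch of the shared-model-folder setup; all paths below are placeholders
from pathlib import Path

store = Path.home() / "ai-models" / "Stable-diffusion"   # the one real copy
uis = [
    Path.home() / "a1111" / "models" / "Stable-diffusion",
    Path.home() / "forge" / "models" / "Stable-diffusion",
    Path.home() / "ComfyUI" / "models" / "checkpoints",
]
for target in uis:
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # each UI now reads the shared directory through the link
        target.symlink_to(store, target_is_directory=True)
[/code]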
>>
>>102575070
give her armpit hair
>>
File: 00012-1289042996.png (1.5 MB, 1024x1024)
>>102575092
>>102575123
What makes the UI change so important? What makes forge or comfy UI superior?

There are also a number of extensions that make it easy for me to use tags and shit. Do they come prepackaged with a lot of those QoL features?
>>
>>102575164
>What makes the UI change so important?
mainly performance and extra model support, a1111 is a mess and doesn't get updated much so it has shit performance. i think it uses way more vram too due to memory leaks but that might not be the case anymore.
>>
generation failed again. Everything was going fine and then memory usage on GPU 0 just skyrocketed and gave me an OOM condition. And that was with max new tokens reduced to 8192.
>>
>emu3 video creation is bad
i-is it though? this is 8b right? doesn't that look great for something small and local? the examples look good.
https://emu.baai.ac.cn/about
hope the nerds make this retard proof and runnable soon.
>>
>>102574680
>LCCP
mistral.rs won
>>
>>102575209
>failed again
just like everything else in your life you stupid nerd. NERD!
>>
>r*st
no thanks.
>>
>>102575216
>they have a contact e-mail
>>102575209
What if... you contacted them to get help/clarification kek?
>>
>>102575164
Forge implemented some optimizations that really helped slower cards.
ComfyUI breaks up the whole process into a messy flow chart kind of system that lets you control exactly what happens rather than accepting the A1111/Forge way. For just making images it's overkill but for really using it like a tool, Comfy lets you work under the hood.
>>
>>102575245
That's no fun.
>>
File: 00134-536089391.png (1.29 MB, 1056x1056)
>>102575159
https://catbox.moe/c/yzibzj
I have some Jessie with armpit hair? It's not one of my main fetishes.
>>102575202
>>102575248
I guess I can find some time to fiddle with those tools. If I can't figure it out though, I'm going to get really flustered and violently shitpost to vent my mostly impotent rage.

Thanks for the input.
>>
>>102575100
>5090 Ti
>5090 32 GB (you are here)
>5080 Super Ti
>5080 Ti
>5080 Super
>5080 20 GB
>5080 16 GB (you are here)
Thanks for another shitty product stack, Jensen.
>>
>>102575276
i would have thought it'd be like 3d cards once gaming became popular.
why are there no dedicated ai cards? is there really nobody that can produce them? it's been almost 2 years since chatgpt.
>>
Llama3.2 support wen
>>
It's the same as 3.1 for text
>>
>>102575297
Two more years until tensortorrent is competitive
trust the plan
>>
>>102575320
>tensortorrent
wtf is this.
the page looks almost exactly like those crypto scam sites lol
latest news:
>Tenstorrent and Movellus Form Strategic Engagement for Next-Generation Chiplet-Based AI and HPC Solutions
w-wow, gotta invest i guess lol
>>
>>102575372
They're B2B, not a consumer company any time soon.
>>
File: 1709439397219945.png (274 KB, 521x628)
*tenstorrent sorry

>>102575372
Anon I...
>>
>>102575276
>>102575297
>The RTX 5090 is said to have a 600-watt spec
Ok, now i see this.
Like is this a fucking joke? That must be on purpose to fuck local fags over. Must be.
>>
>>102575407
The H100 PCIE version is spec'd to 600W.
So the 5090 should have just as much compute r-right?
>>
>>102575393
You can literally order their cards right now.
Just don't expect to run anything on them out of the box.
>>
>>102575442
hmmm
https://tenstorrent.com/hardware/tt-loudbox
Would be nice if it worked out of the box...
>>
File: 1709766058948384.png (12 KB, 406x131)
>>102575472
lol, lmao even
>>
>>102575488
maybe one day, next gen for sure.
>>
>>102575442
>>102575472
Wait things have changed a lot since the last time I looked at what they're doing. Apparently they have a compiler that has good support for major frameworks.
https://github.com/tenstorrent/tt-buda

I'll try to see if I can find anyone actually using this.
>>
Yeah there's just no way to run emu on mortal hardware. Dead on arrival.
You can split the weights across multiple devices but it wants to run the actual generation on gpu 0 only it seems. And even still, given that it's taking me several minutes to hit the OOM condition this shit is slow as fuck.
>>
>>102575407
hw undervolt - problem solved
>>
>>102575089
$5080
>>
>>102575673
It's 600 fucking watts anon.
My 12gb pascal card is 250 and i can go down to maybe 180.
Even if you get it to around 450 watts with tinkering that's crazy. I think most people won't be able to run 3 cards because the breaker will trip.
There is no justification for this and you know it. And that's not even talking about the fucked up price this thing is gonna have for sure.
>>
>>102575733
don't forget that 600 watts is only momentary usage and will melt connectors if it hits that for more than a few seconds
>>
What the fuck does a desktop GPU cooling solution capable of dissipating 600W even look like? Either it's going to sound like a jet engine or take up 4 PCIE slots.
>>
>>102575733
The performance per watt is non-linear, and the inference does not require much compute. I can make my 450W 3090Ti run at under 200W without decrease in t/s.
>>
>>102573672
At least 0.283
>>
File: b27.jpg (41 KB, 798x644)
yo wtf how big is NousResearch/Hermes-3-Llama-3.1-405B
>i just wanted to run an uncensored model
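Napkin math alone kills it for me (weights only, ignoring KV cache and runtime overhead):
[code]
# back-of-envelope weight sizes for a 405B model (KV cache and overhead not included)
params = 405e9
for name, bytes_per_param in [("fp16/bf16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.0f} GiB")
# fp16/bf16: ~754 GiB, Q8: ~377 GiB, Q4: ~189 GiB
[/code]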
>>
>>102575868
FE will be two slots, probably relying heavily on case fans for cooling. After all, you'd need to blow out that 600W from the case anyway.
>>
>>102575673
Thank you saar 0.1$ has been wired to paypal continue to do the needful thank you long live nvidia
>>
File: 1718996997598312.jpg (154 KB, 960x1280)
>>
>>102575976
LMAO
as if that garbage is worth preserving
notice there's no names on it as contributors, they're fucking EMBARRASSED to be involved
>>
hmmm, trying out Mistral-Small-22B-ArliAI-RPMax-v1.1-Q4_K_M.
does this have a repetition problem?
is there still no other finetune than Cydonia-22B-v1-Q4_K_M.gguf that's good?
i tried Acolyte-22B.Q4_K_M.gguf and that was totally overcooked.
>>
Is there anything that will fit in 8gb vram and do any kind of useful coding or assistant stuff at all?
>>
>>102576245
Qwen2.5 7B is alright, although I had to disable autocompletion because it was more in the way than helpful.
>>
>>102576245
>useful coding
>8B
Open your browser and get a subscription for Sonnet 3.5, that's what you can do with that VRAM.
>>
>>102576263
Did you use the coding one, or regular one?
>>
>>102576273
Both, haven't used them enough to see any major differences
>>
5090 verdict? Seems like a good deal especially since I don't buy used on principle. Combined with my current 4070Ti it will be a big VRAM boost, but probably gonna have to get a new PSU and undervolt them.
>>
>>102576298
If it has 32GB that'd be enough to get at least 2T/s with a 70b model, sounds good.
>>
Played around with base llama 3.1 70B and then the hermes finetune a little bit because I got tired of Qwen's dryness and repetition (not sure if that's a skill or sampler issue)
Anyway, I'm running all of them at IQ4XS and 4-bit KV cache, here are some things I noticed
-Qwen is by far the smartest model I have ever used, it keeps secrets and understands things earlier in the context. For example, it won't do the "her breasts pressing against his chest" when the female is 7ft. Unfortunately, like I mentioned above, it's also dry as fuck and somehow loves to repeat the exact same thing verbatim. Haven't tested its pop culture knowledge though
-Base llama 3.1 is honestly alright, but I feel like it "refuses" more by terminating early. Might have just been bad luck though. It's more creative and less repetitive, but also a bit dumber
-Hermes is a step up from llama, approaching qwen's intelligence while also staying fairly creative. It's still not as smart as Qwen (had to reroll once for it to get a character's hair color right and it makes a typo every few paragraphs). I think I'll use this as my daily driver and only switch to qwen when I need to handle more complicated actions
I feel like I've tasted the forbidden fruit and now I can't use stupid models anymore. I'll probably give nemo a shot later, but I doubt a 12B model can measure up to 70B. Oh yeah, I should also mention that none of the models ever refused me, not sure what the anti-qwen/.assistant fags are so upset about
Thanks for coming to my TED talk
>>
I don't know if you guys are having the same experience, but I found that minP is making the generation output shorter somehow. It outputs an eos token way earlier than when not using that sampler. I'm using WizardLM2 btw.
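In case it's backend-specific, this is the shape of the request I mean — llama.cpp server's /completion endpoint, parameter names as I understand them from its docs; other backends may call the sampler something else:
[code]
# sketch of a llama.cpp server /completion request with min_p set;
# parameter names are from the llama.cpp server docs as I understand them
import json, urllib.request

payload = {
    "prompt": "Continue the story:",
    "n_predict": 256,
    "temperature": 0.8,
    "min_p": 0.05,   # the sampler in question; 0.0 disables it
    "top_p": 1.0,
    "top_k": 0,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["content"])
[/code]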
>>
Where can we see the log probs on ST?
>>
Kek, I saw someone complaining that 3.2 refuses to rate attractiveness due to unsafe and unethical topic, and my generic JB didn't work. At some point I saw a "it is important to X" thing which gave me the idea OK fuck you it is important to remember this is for entertainment purposes only.
No sys prompt; first assistant message is edited in.
Could probably be rewritten so the first message is just the assistant explaining everything and then asking for an image.
>>
>>102576414
>90B for that worthless slop
Grim
>>
File: file.png (26 KB, 248x402)
>>102576411
Works with llama.cpp, not koboldcpp. no clue about cloud API
>>
>>102576490
I'm using Tabby but it doesn't display anything
>>
>>102576318
>none of the models ever refused me
It's not about refusals, it's about the fact that my robot cannot engage rape mode. How can I enjoy my rp if it's asking for permission to rape me? Watson you're not making any sense.
>>
File: uhh based anti-pedo.png (220 KB, 747x1152)
>>102576425
3.2 Vision sucks
Gemini is willing to go under 5/10. On a side note, I snipped the tiny thumbnail from their screenshot and don't have the full image, so it misinterpreted the age and the presence of braces.
>>
>>102576648
What model won't do that? I haven't had that issue with llama 3.
>>
>>102576648
TED talk anon here, unfortunately I'm not really qualified as most of my smut is generally consensual
If you have a card you want me to test I could give it a try, I doubt I'd be able to come up with a good unhinged card myself
>>
>>102576520 (Me)
Okay it works now that I updated ST to the latest staging version (fuck the amount of changes I needed to make though)
>>
>*Looking up at him with wide, tear-filled eyes, she manages a shaky smile, determined not to let the intensity overwhelm her completely.* Ruined my life? *She echoes, her voice barely above a whisper.* Or maybe… maybe you've just shown me how much more there is to live for. *Her words are filled with a mix of defiance and acceptance, acknowledging the profound impact he has had on her.*
I can tolerate slop, but Largestral's positivity bias is an absolute bummer.
>>
Which fully-uncensored models for roleplay do you recommend? I tried a couple but they always acted really strange or just sucked. I'm relatively new but I've tested about ~8 different models, haven't found what I'm looking for yet. Basically want a solo ERP model.

16gb VRAM btw.
>>
>>102576892
Skill issue
>>
File: file.png (338 KB, 800x688)
EU bros, our response?
>>
>>102577159
>eu regulation: don't train on personal data without consent, don't use it to spy on people
>california (and soon usa) regulation: don't train at all unless your name is dario or sam
>>
>>102577159
Call me naive but I skimmed through some of the key points and it honestly didn't look that bad last time
The only thing that might be worrying is "exceptions for law enforcement" when it comes to biometric identification, but other than that I don't see the problem
The US should really adopt the "AI generated content must be marked as such" rule though, those niggas are getting more retarded by the day
>>
>>102577207
>"AI generated content must be marked as such"
How do you even enforce this?
>>
>>102577236
any service that generates ai content without labelling it gets v&
obviously you can't enforce it for local
>>
>>102577236
A disclaimer on the service's website, that's enough imo
I don't really want watermarks on my content... but then again, simply adding disclaimers won't make people more vigilant. Oh well
>>
>>102576901
They all suck
>>
>>102577178
>>102577207
cope
>>
>>102574686
>he's a billionaire
Unlikely, that's why he wants to privatize OpenAI.
>>
File: KTO.png (4 KB, 222x36)
>>102576901
I am also wondering if I can get anything better than this except I am at 24 gigs.
>>
File: file.png (22 KB, 946x759)
HOLY

https://huggingface.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>>
File: file.png (74 KB, 1011x730)
>>102577489
8b only btw
>>
File: file.png (55 KB, 724x312)
>>102577502
>>102577489
local won (china also)
>>
>>102577489
read the fucking thread retard
>>
File: 1714004454702408.jpg (1.47 MB, 1297x1490)
>>102577525
A model that simply predicts tokens for video will never become a "world model" that has genuine understanding of reality.
>>
>>102577489
>>102577502
>>102577525
>>
>>102577489
>no audio
moat is still there, nothingburger
>>
>>102577556
where else am i supposed to find new model? on /g/?

lmao
>>
>>102577571
dilate
>>
>>102577733
erode
>>
File: file.png (96 KB, 1246x579)
Anthracite will save /lmg/
>>
>>102577748
>405b
>new 72b
>new 32b
>new 22b
>new 12b
they won
>>
>>102577748
>wandb
cool
>>
>even shit models now have more than 100k context
Is it time to ditch the tokenizer finally?
>>
>>102577785
>100K is enough
Sweatie, people can use LLM for more than ERP
>>
>>102577748
Stop sucking Claude's stinky dick and start doing something original.
>>
>>102577389
How good is this on a scale of 1-10? My experience has been a solid 2-3 so far, which is quite tragic.
>>
>>102577864
>finetune/merge of a year old model
i don't know what you were expecting
>>
The Chinese are fucking disgusting. I was trying to generate a story with Qwen 2.5 72B and a character shat then wiped their ass on a towel.
>>
>>102577875
I meant my experience in general with other models off of Hugginface, was curious if that anon's experience was any better with that model he posted.
>>
>>102577748
A bunch of ESLs with an uncurated dataset will surely do something worthwhile this time.
>>
>>102577817
Claude's dick receives more polishing than one punch man's dome. It has to be the cleanest surface on the face of the earth at this point.
>>
>>102577817
They should erp with each other and train on that
>>
>>102575089
16 GB is more than enough for gaming. Nvidia is never going to make consumer AI cards.
>>
>>102577936
That could unironically bring better and more unique results, if methodically done. They have what, 33 people in the organization? It shouldn't take too much to create a decently-sized human dataset if they all participate.
>>
>>102575089
But whatever is in VRAM will be twice as fast (ignore the 8gb of fat spilling into your RAM)
>>
How many base models have been released since the last coom quality upgrade? 10+ already?
>>
File: wp11507201.jpg (303 KB, 2048x1261)
She would save local chads?
>>
I still haven't found anything better than Fimbulvetr, especially for its size.
>>
>>102578226
Isn't it that 4k ctx model? Have you tried running new models at 4k ctx?
>>
File: out.jpg (93 KB, 911x562)
what should i get rid of from this list? i need room for more models like those 40gb ggufs
>>
>>102578353
everything except midnight miqu
>>
>>102578353
nothing except midnight miqu
>>
>>102578353
Step 1: Delete all the Sao models
Step 2: Buy an ad
>>
So how does one run molmo right now?
>>
File: cat come on now.png (656 KB, 616x612)
>>102578367
>>102578389
>>
>>102578353
delete big tiger, celeste, rocinante, qwen, theia
>>
>>102578353
I'd keep Midnight-Miqu, llava, nemo-instruct, and lyra-v4.
>>
L3.1-70B-Hanami is the smartest one that's decent at smut so I use that. Magnum v2 72B has better dialogue but is a good deal dumber.
>>
>>102578440
>>102578480
thx
>>
File: 1723453767290925.jpg (80 KB, 940x1024)
>>102573387
Anons, if this isn't a solvable problem for you, you are too stupid to participate in this general.

This is filter 1.

Yes it's annoying that I have to press a single button now, but also, I only have to press a single button and it's solved, so there we are.
>>
>>102573383
anybody making their own models or are people just using it plug and play style?
>>
>>102578778
I'm trying to make my own models but I always regret my life choices
>>
>>102577817
one thing I don't get about them is they use regular old claude-generated instruct data
claude RP is fine, claude is good at RP. but when claude is doing regular instruct it sounds the exact fucking same as every other model
why focus on claude for that
>>
>>102578778
Finetune, I was actually going to start working on my own vision model after the mistral doom hack, but then zucc cooked so here we are, finetune again I guess.
>>
A big one is dropping soon
>>
>>102578930
kiwiberrystar?
>>
File: file.png (13 KB, 1295x108)
>>102578930
HUGE!
>>
>>102578821
we are sloptuners xir, how are we supposed to tune without slop
>>
>>102578974
>magnumslop
>>
>>102578974
CLAUDE
AT
HOME
>>
Man, I just want a 12B that's reasonably smart and has the personality of old C.ai. That's it.
>>
>>102579075
and i want a unicorn maid girl
>>
I will vote for any politician that promises an open weights single consumer gpu cooming bot that will satisfy my sexual needs. I want my tax money to go to a good cause.
>>
Llama 4 will come in 3 sizes
>0.5B
>70B
>1T
>>
>>102579275
I mean... you *do* have at least 48GB of VRAM now, right?
>>
>>102579275
but the 0.5B will be as good as 3.2 1.5B so it won't matter.
>>
>>102579275
You mean 0.5B, 1B and 1T, Lecunny said he liked his models small and open, like his gfs
>>
>>102577748
hurry up and put that 12b on hf, fuckers
>>
Here's hoping Moore Threads makes some decently high VRAM accelerators so that we can run 70B class models at Q6 or better at decent speeds on a single cheap card.
I doubt that's going to happen. But one can dream.
>>
>>102579358
Why are leftists like this
>>
>>102579412
Preferably single slot as well
>>
>>102579588
And only PCIe connector-powered.
>>
>>102577502
>8b only
for an image generator that's huge, it's the size of SD3
>>
>>102579713
And only needs passive cooling.
>>
>>102579584
It's almost like that Anon is being facetious.
>>
>>102577489
>>102577539
This.
I wasted an hour last night trying to get it to work.
>It's piss slow on consumer hardware
>The compute buffer takes up an entire 3090 worth of VRAM (possibly more. hard to say because of OOM)
>Inferencing code doesn't handle multi-gpus- wants to do all the compute on a single GPU regardless of where you offload the weights to.
It's utterly fucking worthless unless you have an H100.
>>
>>102579916
>The compute buffer takes up an entire 3090 worth of VRAM
the model is fp32, did you run it as-is or did you quant it to fp16?
>>
>>102579713
I don't mind the power connectors, it's just that the faggots who designed my motherboard put the higher speed slot at the bottom, meaning I can only fit a single-slot card
Though I guess if they have enough vram then it doesn't matter, once it's loaded it's loaded
>>
>>102579945
The weights are in fp32 but the provided inferencing code loads them as bf16
I'm not a retard.
>>
>>102579916
i think the point was that you can fit a somewhat passable multimodal text/image/video in/out model in 8b, so a "solid" 70b text/image/video/audio in/out model that uses this same principle is just a matter of time
>>
>>102579896
Retard
>>
Any local ais for text to 3d model?
>>
>>102579996
fiver
>>
>>102577785
They drop below Llama 2 7B quality if you try to use that much context.
https://github.com/hsiehjackson/RULER
>>
>>102580005
actually indians
>>
so it looks like 3090's have dropped down to $500 on ebay now, thinking about getting a second but worried about the power spikes
>>
I'm working on a text adventure engine in an llm, and I'm having a bit of an existential crisis around what makes a choose-your-own-adventure/text-adventure/rpg fun... Being able to "try" anything is almost paralyzing vs being kept in some kind of well-delineated box. On the rails, as it were.
I've gotten past the positivity bias, which I thought was the big hurdle, but even once it starts treating decisions with realistic consequences, I'm left with a sense of unease while using it.
I'm starting to question how much of the fun of these kinds of things is due to the fact that there's some pre-planned subset of things that you can do to keep you in a single experience with a consistent vision, ie. the knowledge that whatever you try, it's all working towards some ultimate resolution and you don't need to engage your brain overly to keep it from meandering into some unsatisfying lala land.
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.