/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102565822 & >>102557546

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
The west has fallen. Qwen won. Recap is spam now. Sell your GPUs before the second-hand market collapses. It's over for real this time, boys.
►Recent Highlights from the Previous Thread: >>102565822

--Papers:
>102572967 >102573170
--Mistral.rs, a Rust implementation of Llama:
>102567470 >102567493 >102567499 >102567500 >102567597
--Llama 3.2 and uncensored VLM potential discussion:
>102565950 >102566013 >102566046 >102569036 >102570107
--LLaMA-3.2 quantization evaluation GitHub discussion:
>102568271
--Importance of multiple aspects of model quality beyond just generation and long-term memory:
>102569388 >102569415 >102569427
--Discussion on syncing server.cpp versions between llama.cpp and ollama:
>102566301
--Qwen2.5 32B uncensor finetune on Hugging Face:
>102570128 >102570176 >102571913
--Qwen 72b and GPT-4 succeed in pyqtgraph plot coding challenge:
>102569500 >102572930 >102573043
--Mistral small works better with one message prompt:
>102566201 >102566246 >102566312 >102566352 >102566421 >102566466
--Discussion of NoCha leaderboard results and long context benchmarking:
>102568781 >102568884 >102568990 >102569030 >102569068 >102569120 >102571029 >102571424 >102571473 >102571098 >102568954 >102568977 >102568982 >102569005 >102569074 >102569718 >102569832 >102569931 >102569118 >102569275 >102569326 >102570952 >102571213 >102571298 >102571508
--90B model struggles with writing, 3.1 tunes show promise with high temp:
>102568773 >102568851 >102568862
--RTX 3090 with 32GB VRAM announced, kopite7kimi's credibility questioned:
>102565941 >102566240 >102567292 >102567720 >102570733 >102570840
--Nemo generates a joke that qwen2.5 could never do:
>102570058 >102570171
--Llama 3.2 90B performance and impact of quantization discussed:
>102567549 >102567822 >102567871 >102567878 >102568430 >102568602
--Adjusting samplers and settings for non-sloppy results on L3.x models:
>102567577 >102567600 >102568173
--Miku (free space):
>102566349 >102570348 >102570537

►Recent Highlight Posts from the Previous Thread: >>102565835

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
so.. what's up?
I thought OpenAI was over after the departures.
But I have to admit, it was impressive how Sam was able to spin it.
Are there any small models specialized in generating SD prompts? Especially with booru tag support?
I tried putting a big list of tags and a bunch of examples in llama 3B and Qwen 3B but the outputs are mostly ass.
>>102573443
Is this a Fate/Zero reference????
>>102573371this general is a spin-off of /aicg/. it has always been about chatting and (erotic) roleplay. stop complaining about the meta and post your best loli logs, newfag.
>>102573457
>this general is a spin-off of /aicg/
I'm actually happy somebody remembers that.
>>102573386
>Recap is spam now
/aicg/ here and I miss your recapanons. Pls come back or somebody pick up the mantle.
>>102573457
Calm down ranjesh
>>102573502
post your hand
reposting for new thread
>chatting with ai, using a variation of my name for {{user}}
>she calls me anon in the middle of her orgasm
what did she mean by this
>>102573118
no, it is, but for whatever reason (could be the card) she keeps adding "Note: this scenario includes offensive and blah blah" kind of statements
>>102573333
vaginal sex in the missionary position for the purposes of procreation
>>102573450
I know of this one, but i never used it.
>https://huggingface.co/teknium/SD-PrompTune-v1
It's old and i have no idea if it's any good. 7B, based on mistral 0.1 i think.
Then there's a bunch of tiny models, but they just add noise, mostly. May be worth a try to get some random styles, i suppose.
>https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
>https://huggingface.co/AUTOMATIC/promptgen-majinai-safe
>https://huggingface.co/AUTOMATIC/promptgen-majinai-unsafe
>https://huggingface.co/AUTOMATIC/promptgen-lexart
Those are very old gpt models with like 160m params.
>>102573387
>RTX 3090 with 32GB VRAM
Never post a recap again.
>>102573387
I don't get this recap, it doesn't have a double > on the posts so you can't do anything with it
>>102573528
>Never post a recap again.
that's my fault, I really wrote that shit in the previous thread, fucking typo :(
>>102573450
Not explicitly relevant but I'm using Florence-2-large-PromptGen-v1.5
>>102573530
I think the internet may be too advanced for you.
>>102573530
We did a few a/b tests and found that people were more satisfied with their engagement with recaps when they had to do a quick ctrl+f to navigate to their chosen post numbers of interest instead of just having a mess of links to click. It's hypothesized that the act of manually putting in a little effort gives anons a sense of ownership over their role in reading the recapped post. Alternatively it could be that they have hover previews enabled and get distracted by other posts popping up as they try to get their cursor to reach a post that interested them.
>>102573386
>Recap is spam now
What happened to recap? It was good for a long time.
>>102573564
>>102573567
wtf this is an insane level of shitpost, I kneel ;_;
>>102573567
How the fuck do they know what they're going to get without the hover?
Might as well just read the old thread.
>>102573567
this post was written by AI, wasn't it
>>102573443
Who the fuck decided this was the stage for an interview?
>>102573562
But it's more tuned for the plain text prompts.
>>102573567
>MOOOM! I POSTED AI GENERATED REPLY AGAIN!! GIMME MUH TENDIES NAOW!!!!!
>>102573527
Thanks, I already knew promptgen but IIRC it doesn't do booru tags unfortunately. The others don't seem to either. I guess I'll try tardwrangling harder and I'll have to do a bit of scripting to fix the prompts.
>>102573562
Thanks, not exactly what I'm after but it might be useful. I guess I could try feeding noise into a WD-tagger model with a low threshold and see what comes out.
>>102573605
It should have spun much much faster.
How much better is Mistral Large than Llama 3.1 70B? It's too slow for me
So has anyone figured out an easy way to test out the multimodality in llama 3.2?
>>102573694
Lmarena?
>>102573017
If you're offloading a ton of layers to CPU (which you are if you're running Mistral Large with only 32GB vram) then getting the 5090 isn't going to do shit for your speed you fucking techlet retard
Is it possible to use AMD+Nvidia at the same time with Scale?
>>102573802
because offloading to cpu is obviously the only option in the context instead of multi gpu, and i'm the techlet retard. Before you say something even more retarded like it runs at the speed of the slowest gpu, that's not true either. It runs at a speed in between.
>>102573904
If you're going to buy two 5090s you are a fraction of a fraction of an already small market, and your preferences don't matter in a discussion about whether it's a good product
>>102573904
nta but 3 x 3090 would give more vram at a lower tdp than 2 x 5090
either setup would generate tokens faster than the average reading speed (if the whole model is on GPU), so the memory on the 5090 being faster would not matter
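The comparison above is simple multiplication; a quick sketch of it (the 3090 figures are official specs, while the 5090's 32 GB / 600 W numbers are still rumors at this point):

```python
# VRAM/TDP arithmetic for the two hypothetical rigs discussed above.
# 3090 figures are the official specs; the 5090 figures are rumored.
def rig(vram_gb, tdp_w, count):
    return {"vram_gb": vram_gb * count, "tdp_w": tdp_w * count}

triple_3090 = rig(vram_gb=24, tdp_w=350, count=3)  # 72 GB at 1050 W
dual_5090 = rig(vram_gb=32, tdp_w=600, count=2)    # 64 GB at 1200 W
```

So three 3090s come out ahead on both capacity and power draw, assuming the rumored specs hold.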
>>102573530
>can't do anything
you can read the numbers, reading numbers empowers your brain
>>102573973
4x 3090 gets 10t/s on large q4. that's kinda slow and stat cards have a lot of extra stuff.
>>102574019
5090 has no nvlink, which is important if you split by row
How many floppy things is the 5090 supposed to have? 600 watts is insane. So there better be lots of flops
Do any anons have Google's notebookLM locally running? Doesn't have to be the same exact model but would be interested in setting it up locally to minimize the processing time.
kind of crazy to think about how ai is a solved science and with a couple more gens of nvidia chips and a few years of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
>>102574092
buy an ad Jensen.
>>102574092
LLMs cannot think brotha
>>102574092
I can't wait until they power up Three Mile Island, turn on their new datacenter and train a 10TB gigamodel, only to discover it plateaus at exactly the same "slightly better than GPT-4" level we've been stuck at for a year and a half now
>https://huggingface.co/BAAI/Emu3-Gen
>https://huggingface.co/BAAI/Emu3-Chat
Multimodal Llama architecture LLM with native image input/output (video is supported, but looks pretty bad).
No diffusion in the image generation process whatsoever. Next token prediction for all modalities.
Apache 2.0, fully open license. 8b params. The layout is literally just Llama3 8b but fully multimodal.
>>102574106
>8b params.
got me excited until I read this
>>102574106
>No diffusion in the image generation process whatsoever. Next token prediction for all modalities.
Dumb this down for me, why is this noteworthy?
llama 3.2 3B is sentient.
>>102574106
we already have anole finetune of chameleon, do this on a 70b and you'll have my attention
>>102574115
8B is way bigger than Stable Diffusion 1.5, and slightly bigger than SDXL
>>102574136
>stable diffusion
lol
>>102574127
qwen 2.5 0.5B is a cat
>>102574117
Normal image generation models start from random noise and keep denoising to iteratively create an image (the diffusion process). This one predicts patches sequentially, the same way text is predicted in conventional LLMs.
TL;DR: it's one big ass model that treats both modalities the same way
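For anyone who wants that idea in code: a toy sketch of autoregressive image-token generation (purely illustrative; the model call below is a random stand-in, not Emu3's actual API). Each "patch" is drawn from a discrete codebook conditioned on everything generated so far, which is exactly the textgen loop:

```python
import random

CODEBOOK_SIZE = 16  # toy visual vocabulary; real tokenizers use thousands of codes

def next_token_dist(prefix):
    # Stand-in for the transformer forward pass: maps the tokens so far
    # to a probability distribution over the next image token.
    rng = random.Random(len(prefix))
    weights = [rng.random() for _ in range(CODEBOOK_SIZE)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_image_tokens(n_patches):
    # The same loop used for text generation, just over image tokens.
    tokens = []
    for _ in range(n_patches):
        dist = next_token_dist(tokens)
        tokens.append(max(range(CODEBOOK_SIZE), key=lambda i: dist[i]))
    return tokens

image = generate_image_tokens(8 * 8)  # an 8x8 grid of patch tokens
```

A diffusion model would instead update all 64 patches at once over many denoising steps, rather than committing to them one at a time.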
>>102574141
point is it's more than enough parameters for image generation, retard
>>102574106
Woah cool
>>102574117
Diffusion is slow to train, slow to infer
Token prediction is ezpz
>>102574127
>>102574154
and you are braindead.
Emu3 gguf support, when?
>>102574192
lol
>>102574161
tell that to my sub 1 t/s
So if Emu can make images with linear token prediction does that mean multi-gpu inferencing would be as simple as it is for textgen?
>>102574160
but a fraction of what we already have from other imagegen architectures, retard
>>102574106
>no online demo
Fuck, I don't want to mess with transformers shit.
>>102574339
Things never improve. We got llama1-7b and we've been stuck ever since. We only ever get 7b models. Nobody releases experimental models. It's all 7bs...
>>102574339
wdym? Flux is only 12B, that's 50% more, yeah, but not as big a difference as you're trying to imply.
"a fraction" lol. I guess 66% is technically a fraction. seriously though, why are you telling weird lies?
>admits it's a fraction
>still tries to say it's a lie
nobody gives a fuck about your shitty model anon
>>102574160
there's a difference between "enough parameters for image generation" and "enough parameters for good image generation", especially when image and text are crammed into the same semantic space
need more B, simple as
Nobody here can really say what a lot of parameters is for emu because this is the first linear token prediction imagegen model we've ever gotten our hands on.
>still coping
don't care, not gonna download your chinese slop model
>>102574449
>Nobody here can really say what a lot of parameters is for emu because this is the first linear token prediction imagegen model we've ever gotten our hands on.
We've had it for months
https://github.com/GAIR-NLP/anole
>>102574429
You're seriously going with "the word fraction is technically correct because 8 is 66% of 12"? That's actually the hill you're gonna die on?
>you're seriously gonna say 2/3 is a fraction?
>*dilates*
yes.
>>102574526
thanks. what's your home address?
>>102574429
pussy
>>102574429
ignore it then?
look how mad he is
>>102574499
There was this as well. It was more focused on doing straight text-to-image tasks, but it still used image tokens in an LLM:
https://github.com/Alpha-VLLM/Lumina-mGPT
Both of those are based on Chameleon, which latently had these capabilities but had its head chopped off by Zuck for the open source release because it's too dangerous. There's the 34b sitting around that never got the treatment the 7b did... which I guess makes sense because nobody wants to support quanting this shit, so no one would be able to run anything much bigger.
>irrelevant drama №5469457647539767594 *yawn*
All drama posts are sam altman until proven otherwise.
>>102574655
it can't be, he's too busy spinning
sneed
I am sam altman
How long until LCCP emu support?
The video gen looks on par with CogVideoX
>>102574669
he's a billionaire, he can afford to spin and ruin /lmg/ at the same time.
>>102573443
Why are they spinning??
holy shit you guys, flash attention just finished building, now I can finally try emu
>>102574698:O
>>102574698
hurry up nerd
>>102574713
It needs to download the weights first, give me a break.
>>102574655
I'm sure sam doesn't even know this thread exists.
Anything cool happening?
>>102574735
i'm masturbating to vtubers
>>102574735
yeah >>102574106 >>102573443
>>102574735
there's a chink who doesn't know what fractions are coping and seething about his tiny imagegen model but otherwise no
>>102574761
keep farting
>>102574735
Yes, we have two fags samefagging right now.
>>102574750
Gay
>>102574755
Is there any stuff on the horizon though?
I want to know if something cool is going to drop before the end of 2024. I want a local uncensored Claude sonnet/GPT4 equivalent running on my shitty GPU by 2025.
>>102574813
>I want an local uncensored Claude sonnet/GPT4
This will never happen.
>>102574813
>local uncensored Claude sonnet/GPT4 equivalent
>running on my shitty GPU
>by 2025
pick any two
>>102574813
>Is there any stuff on the horizon though?
in the llm space, nothing that i know of, but in the image gen space the pixart team said their model is almost done training, so that's pretty exciting. highly doubt it'll beat flux, even they themselves said it won't be a flux killer but i expect it to be aesthetically superior and easier to finetune.
>>102574080
the memory is 40% faster, that's what's important.
I think I'm running the default example for emu right now...but they neglected to add any kind of progress indicator to the code. max new tokens is 40960 by default... so we could be here a while.
>>102574876
hurry UP nerd
>chinkshit transformers model sucks
new here?
>>102574876
is it done yet?
>>102574834
The future is now old man.
>>102574840
I can always pony up some cash for a 5000 series once they start getting those to market. I'll assume they'll have a card specifically designed for AI stuff. The future is looking bright. My 4070 comes with 12 gigs of VRAM which is pretty decent for local image gen. If I could get a card with 32 or 64 gigs of VRAM I imagine I could run some pretty beefy models.
>>102574846
I'm still playing with A1111. Pumping out plump robot girls is always fun.
>>102574890
Nah. My VRAM usage keeps going up, so presumably it's doing something. It's also only cruising along at 307W, so it's clearly failing to use all of the compute available to it.
>>102574910
>A1111
obsolete. i use comfyui but i think reForge is the go-to a1111 replacement now. post another gen, i liked that one
>>102574087
what are you talking about, anon. did you listen to the gens?
this is above gpt advanced audio. maybe like it would be if it were uncucked, idk.
the only problem is hallucination, but the audio is definitely past the uncanny valley.
you can enjoy such great tools as fishspeech or xtts2 locally and suffer until we get something better.
Do you think Sam Altman has a personal trillion parameter model based on gay bear personality and vaguely posts on 4chan to groom potential cute gay bears by giving them a ChatGPT key?
aaand it oomed.
>>102574910
>I can always pony up some cash for a 5000 series once they start getting those to market. I'll assume they'll have a card specifically designed for AI stuff. The future is looking bright. My 4070 comes with 12 gigs of VRAM which is pretty decent for local image gen. If I could get a card with 32 or 64 gigs of VRAM I imagine I could run some pretty beefy models.
I like your enthusiasm and optimism!
You won't find much of anything that will run in 64gb in the Claude/GPT4 class.
Largestral is probably your closest equivalent on a reasonable number of GPUs, and you're looking at needing more like 128gb+ to run it at a decent quant.
If you really want smart-as-cloud-offerings SOTA local at home you'll need to scrounge up 400-500gb.
Unfortunately, cards "specifically designed for AI stuff" are big cash cows for nvidia and are in the $40-80k price range (for 80gb... still need a bunch!)
Check the lmg build guide in the OP for more details.
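The rule of thumb behind those numbers: weight memory ≈ parameter count × bits per weight ÷ 8, before KV cache and overhead. A sketch (123B for Largestral and 405B for the big Llama are the published parameter counts; the quant widths here are illustrative):

```python
def weight_gb(params_billion, bits_per_weight):
    # Weights only; KV cache and runtime overhead come on top of this.
    return params_billion * bits_per_weight / 8

largestral_q4 = weight_gb(123, 4)  # ~61.5 GB: why 64gb is borderline
largestral_q8 = weight_gb(123, 8)  # ~123 GB: why you want 128gb+ for a fat quant
llama405_q8 = weight_gb(405, 8)    # ~405 GB: the 400-500gb class
```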
>>102574980
pecker is altman confirmed
>>102574982
wow. some nerd you are.
>>102575014
It's my fault, I started the script before I noticed the device map was set to only use cuda:0
>>102575035
retard
>>102575035
Huh, how much vram does it need?
>>102575058
Evidently more than 6GB, but I'll keep trying
>>102574941
>obsolete
I don't want to learn a new system. I would assume I need to download new checkpoints, loras, and shit like that. I'll wait until 2025 before I delve into the newest toy. I'll let everyone else work on stuff and hopefully some higher-speed and lower-drag toys emerge in that time. Have some Gerudo and Zelda blend.
>>102575000
As noted above, I'm counting on some tech advancements to either make LLMs more resource-efficient or for the price of AI-focused GPUs to come down. Maybe both. I would love to believe everything will be figured out by 2025 but that's a little optimistic even by my standards. 2026~7 we'll be eating good, I think, if the current rate of development is any indication.
>>102575035
how about you set it to cuda:start_working, nerd.
>>102575074
I set it to auto, because I'm not nerdy enough to make a custom device map, so now it's split across 4 GPUs and will probably take an eternity since transformers is garbage at utilizing multiple gpus.
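For the curious, `device_map="auto"` boils down to a bin-packing pass over the model's layers; a much-simplified toy version of the idea (not accelerate's actual algorithm):

```python
def assign_layers(n_layers, layer_gb, gpu_free_gb):
    # Greedily place whole layers on each GPU in order until it is full,
    # then spill to the next one -- roughly how a sequential device map
    # ends up splitting a model across cards.
    device_map = {}
    gpu, used = 0, 0.0
    for layer in range(n_layers):
        if used + layer_gb > gpu_free_gb[gpu]:
            gpu, used = gpu + 1, 0.0
            if gpu >= len(gpu_free_gb):
                raise MemoryError("model does not fit on the available GPUs")
        device_map[f"layers.{layer}"] = gpu
        used += layer_gb
    return device_map

# 32 layers of 0.9 GB each across four cards with 8 GB free
dm = assign_layers(32, 0.9, [8, 8, 8, 8])
```

The slowness complaint is real regardless: with a split like this, transformers runs the layers on each card sequentially, so only one GPU is busy at a time.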
>5080
>16 GB
What the fuck is this shit?
>>102575089
>I don't want to learn a new system.
no it's pretty much the exact same as a1111, it's a fork, u good. lick.
>>102575092
Consequences of a monopoly.
now that's a PCIE bottleneck if I've ever seen one.
>>102575070
Forge is almost the same (I haven't tried ReForge). I put my models and Loras in one place, then symbolic-link the directories so A1111, Forge, Comfy, etc. all point to the same collection. Though I've heard that Flux doesn't work on A1111 or Forge and does on Comfy, that's just a compatibility and update thing.
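The shared-collection setup described above can be scripted with the stdlib; a minimal sketch (the directory names are made-up examples, not the UIs' real paths):

```python
import os
import tempfile

# One central store; each UI's models directory is a symlink into it.
root = tempfile.mkdtemp()
store = os.path.join(root, "model-store")
os.makedirs(store)

for ui in ("a1111", "forge", "comfy"):
    ui_dir = os.path.join(root, ui)
    os.makedirs(ui_dir)
    # Every UI now sees the same collection without duplicating files.
    os.symlink(store, os.path.join(ui_dir, "models"))
```

The nice part is that checkpoints and Loras only exist once on disk, so a 6 GB model isn't copied per UI.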
>>102575070
give her armpit hair
>>102575092
>>102575123
What makes the UI change so important? What makes forge or comfy UI superior? There are also a number of extensions that make it easy for me to use tags and shit. Do they come prepackaged with a lot of those QoL features?
>>102575164
>What makes the UI change so important?
mainly performance and extra model support, a1111 is a mess and doesn't get updated much so it has shit performance. i think it uses way more vram too due to memory leaks but that might not be the case anymore.
generation failed again. Everything was going fine and then memory usage on GPU 0 just skyrocketed and gave me an OOM condition. And that was with max new tokens reduced to 8192.
>emu3 video creation is bad
i-is it though? this is 8b right? doesn't that look great for something small and local? the examples look good.
https://emu.baai.ac.cn/about
hope the nerds make this retard-proof and runnable soon.
>>102574680
>LCCP
mistral.rs won
>>102575209
>failed again
just like everything else in your life you stupid nerd. NERD!
>r*st
no thanks.
>>102575216
>they have a contact e-mail
>>102575209
What if... you contacted them to get help/clarification kek?
>>102575164
Forge implemented some optimizations that really helped slower cards.
ComfyUI breaks up the whole process into a messy flow chart kind of system that lets you control exactly what happens rather than accepting the A1111/Forge way. For just making images it's overkill but for really using it like a tool, Comfy lets you work under the hood.
>>102575245
That's no fun.
>>102575159
https://catbox.moe/c/yzibzj
I have some Jessie with armpit hair? It's not one of my main fetishes.
>>102575202
>>102575248
I guess I can find some time to fiddle with those tools. If I can't figure it out though, I'm going to get really flustered and violently shitpost to vent my mostly impotent rage.
Thanks for the input.
>>102575276
>5090 Ti
>5090 32 GB (you are here)
>5080 Super Ti
>5080 Ti
>5080 Super
>5080 20 GB
>5080 16 GB (you are here)
Thanks for another shitty product stack, Jensen.
>>102575276
i would have thought it's like 3d cards once gaming became popular.
why are there no dedicated ai cards? is there really nobody that can produce them? it will soon be 2 years since chatgpt.
Llama3.2 support wen
It's the same as 3.1 for text
>>102575297
Two more years until tensortorrent is competitive
trust the plan
>>102575320
>tensortorrent
wtf is this. the page looks almost exactly like those crypto scam sites lol
latest news:
>Tenstorrent and Movellus Form Strategic Engagement for Next-Generation Chiplet-Based AI and HPC Solutions
w-wow, gotta invest i guess lol
>>102575372
They're B2B, not a consumer company any time soon.
*tenstorrent, sorry
>>102575372
Anon, I...
>>102575276
>>102575297
>The RTX 5090 is said to have a 600-watt spec
Ok, now i see this.
Like, is this a fucking joke? That must be on purpose to fuck local fags over. Must be.
>>102575407
The H100 PCIE version is spec'd to 600W. So the 5090 should have just as much compute r-right?
>>102575393
You can literally order their cards right now.
Just don't expect to run anything on them out of the box.
>>102575442
hmmm
https://tenstorrent.com/hardware/tt-loudbox
Would be nice if it worked out of the box...
>>102575472
lol, lmao even
>>102575488
maybe one day, next gen for sure.
>>102575442
>>102575472
Wait, things have changed a lot since the last time I looked at what they're doing. Apparently they have a compiler that has good support for major frameworks.
https://github.com/tenstorrent/tt-buda
I'll try to see if I can find anyone actually using this.
Yeah, there's just no way to run emu on mortal hardware. Dead on arrival. You can split the weights across multiple devices but it wants to run the actual generation on gpu 0 only, it seems. And even still, given that it's taking me several minutes to hit the OOM condition, this shit is slow as fuck.
>>102575407
hw undervolt - problem solved
>>102575089
$5080
>>102575673
It's 600 fucking watts anon.
My 12gb pascal card is 250 and i can go down to maybe 180.
Even if you get it to around 450 watts with tinkering, that's crazy. I think most people won't be able to run 3 cards because the breaker will trip.
There is no justification for this and you know it. And that's not even talking about the fucked up price this thing is gonna have for sure.
>>102575733
don't forget that 600 watts is only momentary usage and will melt connectors if it hits that for more than a few seconds
What the fuck does a desktop GPU cooling solution capable of dissipating 600W even look like? Either it's going to sound like a jet engine or take up 4 PCIE slots.
>>102575733
The performance per watt is non-linear, and inference does not require much compute. I can make my 450W 3090Ti run at under 200W without a decrease in t/s.
>>102573672
At least 0.283
yo wtf how big is NousResearch/Hermes-3-Llama-3.1-405B
>i just wanted to run an uncensored model
>>102575868
FE will be two slots, probably relying heavily on case fans for cooling. After all, you'd need to blow out that 600W from the case anyway.
>>102575673
Thank you saar 0.1$ has been wired to paypal continue to do the needful thank you long live nvidia
>>102575976
LMAO
as if that garbage is worth preserving
notice there's no names on it as contributors, they're fucking EMBARRASSED to be involved
hmmm, trying out Mistral-Small-22B-ArliAI-RPMax-v1.1-Q4_K_M.
does this have a repetition problem?
is there still no other finetune than Cydonia-22B-v1-Q4_K_M.gguf that's good?
i tried Acolyte-22B.Q4_K_M.gguf and that was totally overcooked.
Is there anything that will fit in 8gb vram and do any kind of useful coding or assistant stuff at all?
>>102576245
Qwen2.5 7B is alright, although I had to disable autocompletion because it was more in the way than helpful.
>>102576245
>useful coding
>8B
Open your browser and get a subscription for Sonnet 3.5, that's what you can do with that VRAM.
>>102576263
Did you use the coding one, or the regular one?
>>102576273
Both, haven't used them enough to see any major differences
5090 verdict? Seems like a good deal especially since I don't buy used on principle. Combined with my current 4070Ti it will be a big VRAM boost, but probably gonna have to get a new PSU and undervolt them.
>>102576298
If it has 32GB that'd be enough to get at least 2T/s with a 70b model, sounds good.
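For reference, estimates like that come from token generation being memory-bandwidth-bound: each generated token streams roughly the full weights once, so t/s ≈ bandwidth ÷ model size, with any offloaded portion read at system-RAM speed. A back-of-envelope sketch (the bandwidth figures are assumptions, the 5090's especially):

```python
# Rough bandwidth-bound estimate for a partially offloaded 70b at ~4.5bpw (~40 GB).
gpu_gb, ram_gb = 32, 8     # split of the weights between VRAM and system RAM
gpu_bw, ram_bw = 1500, 60  # GB/s; assumed figures, not measured

# Per-token time is the sum of the time to stream each portion once.
seconds_per_token = gpu_gb / gpu_bw + ram_gb / ram_bw
tokens_per_second = 1 / seconds_per_token
```

Even with a modest 8 GB spilling to RAM, the slow portion dominates the per-token time, which is why "at least 2T/s" is plausible but fully-in-VRAM setups are so much faster.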
Played around with base llama 3.1 70B and then the hermes finetune a little bit because I got tired of Qwen's dryness and repetition (not sure if that's a skill or sampler issue).
Anyway, I'm running all of them at IQ4XS and 4-bit KV cache, here are some things I noticed:
-Qwen is by far the smartest model I have ever used, it keeps secrets and understands things earlier in the context. For example, it won't do the "her breasts pressing against his chest" when the female is 7ft. Unfortunately, like I mentioned above, it's also dry as fuck and somehow loves to repeat the exact same thing verbatim. Haven't tested its pop culture knowledge though.
-Base llama 3.1 is honestly alright, but I feel like it "refuses" more by terminating early. Might have just been bad luck though. It's more creative and less repetitive, but also a bit dumber.
-Hermes is a step up from llama, approaching qwen's intelligence while also staying fairly creative. It's still not as smart as Qwen (had to reroll once for it to get a character's hair color right and it makes a typo every few paragraphs). I think I'll use this as my daily driver and only switch to qwen when I need to handle more complicated actions.
I feel like I've tasted the forbidden fruit and now I can't use stupid models anymore. I'll probably give nemo a shot later, but I doubt a 12B model can measure up to 70B.
Oh yeah, I should also mention that none of the models ever refused me, not sure what the anti-qwen/.assistant fags are so upset about.
Thanks for coming to my TED talk
I don't know if you guys are having the same experience, but I found that minP is making the generation output shorter somehow. It outputs an eos token way earlier than when not using that sampler. I'm using WizardLM2 btw.
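That behavior is plausible given how min-p works: tokens below `min_p` × (the top token's probability) are dropped and the survivors renormalized, so whenever the EOS token survives the cut, its probability gets boosted. A toy sketch of the filter (matching common implementations such as llama.cpp's, as I understand them):

```python
def min_p_filter(probs, min_p):
    # Drop tokens whose probability is below min_p * max_prob,
    # then renormalize the survivors.
    threshold = max(probs.values()) * min_p
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# If <eos> survives, renormalization raises its probability -- one way
# this sampler can end generations earlier than no filtering would.
probs = {"the": 0.40, "<eos>": 0.30, "a": 0.20, "xyzzy": 0.10}
filtered = min_p_filter(probs, min_p=0.5)  # threshold = 0.20, drops "xyzzy"
```

Here `<eos>` goes from 0.30 to 1/3 after the tail is cut, so over a long generation the model terminates a bit sooner on average.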
Where can we see the log probs on ST?
Kek, I saw someone complaining that 3.2 refuses to rate attractiveness as an unsafe and unethical topic, and my generic JB didn't work. At some point I saw an "it is important to X" thing which gave me the idea: OK fuck you, it is important to remember this is for entertainment purposes only.
No sys prompt; first assistant message is edited in.
Could probably be rewritten so the first message is just the assistant explaining everything and then asking for an image.
>>102576414
>90B for that worthless slop
Grim
>>102576411
Works with llama.cpp, not koboldcpp. no clue about cloud API
>>102576490
I'm using Tabby but it doesn't display anything
>>102576318
>none of the models ever refused me
It's not about refusals, it's about the fact that my robot cannot engage rape mode. How can I enjoy my rp if it's asking for permission to rape me? Watson, you're not making any sense.
>>102576425
3.2 Vision sucks
Gemini is willing to go under 5/10. On a side note, I snipped the tiny thumbnail from their screenshot and don't have the full image, so it misinterpreted age and the existence of braces.
>>102576648
What model won't do that? I haven't had that issue with llama 3.
>>102576648
TED talk anon here, unfortunately I'm not really qualified as most of my smut is generally consensual.
If you have a card you want me to test I could give it a try, I doubt I'd be able to come up with a good unhinged card myself.
>>102576520 (Me)
Okay, it works now that I updated ST to the latest staging version (fuck the amount of changes I needed to make though)
>*Looking up at him with wide, tear-filled eyes, she manages a shaky smile, determined not to let the intensity overwhelm her completely.* Ruined my life? *She echoes, her voice barely above a whisper.* Or maybe… maybe you've just shown me how much more there is to live for. *Her words are filled with a mix of defiance and acceptance, acknowledging the profound impact he has had on her.*
I can tolerate slop, but Largestral's positivity bias is an absolute bummer.
Which fully-uncensored models for roleplay do you recommend? I tried a couple but they always acted really strange or just sucked. I'm relatively new but I've tested about ~8 different models, haven't found what I'm looking for yet. Basically want a solo ERP model.
16gb VRAM btw.
>>102576892
Skill issue
EU bros, our response?
>>102577159
>eu regulation: don't train on personal data without consent, don't use it to spy on people
>california (and soon usa) regulation: don't train at all unless your name is dario or sam
>>102577159
Call me naive but I skimmed through some of the key points and it honestly didn't look that bad last time.
The only thing that might be worrying is "exceptions for law enforcement" when it comes to biometric identification, but other than that I don't see the problem.
The US should really adopt the "AI generated content must be marked as such" rule though, those niggas are getting more retarded by the day.
>>102577207
>"AI generated content must be marked as such"
How do you even enforce this?
>>102577236
any service that generates ai content without labelling it gets v&
obviously you can't enforce it for local
>>102577236
A disclaimer on the service's website, that's enough imo.
I don't really want watermarks on my content... but then again, simply adding disclaimers won't make people more vigilant. Oh well.
>>102576901
They all suck
>>102577178
>>102577207
cope
>>102574686
>he's a billionaire
Unlikely, that's why he wants to privatize OpenAI.
>>102576901
I am also wondering if I can get anything better than this except I am at 24 gigs.
HOLY
https://huggingface.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>>1025774898b only btw
>>102577502>>102577489local won (china also)
>>102577489read the fucking thread retard
>>102577525A model that simply predicts tokens for video will never become a "world model" that has genuine understanding of reality.
>>102577489
>>102577502
>>102577525
>>102577489
>no audio
moat is still there, nothingburger
>>102577556
where else am i supposed to find new models? on /g/? lmao
>>102577571
dilate
>>102577733
erode
Anthracite will save /lmg/
>>102577748
>405b
>new 72b
>new 32b
>new 22b
>new 12b
they won
>>102577748
>wandb
cool
>even shit models now have more than 100k context
Is it time to ditch the tokenizer finally?
>>102577785
>100K is enough
Sweatie, people can use LLMs for more than ERP
>>102577748
Stop sucking Claude's stinky dick and start doing something original.
>>102577389
How good is this on a scale of 1-10? My experience has been a solid 2-3 so far, which is quite tragic.
>>102577864
>finetune/merge of a year old model
i don't know what you were expecting
The Chinese are fucking disgusting. I was trying to generate a story with Qwen 2.5 72B and a character shat and then wiped their ass on a towel.
>>102577875
I meant my experience in general with other models off of Huggingface; I was curious if that anon's experience was any better with the model he posted.
>>102577748
A bunch of ESLs with an uncurated dataset will surely do something worthwhile this time.
>>102577817
Claude's dick receives more polishing than One Punch Man's dome. It has to be the cleanest surface on the face of the earth at this point.
>>102577817
They should erp with each other and train on that
>>102575089
16 GB is more than enough for gaming. Nvidia is never going to make consumer AI cards.
>>102577936
That could unironically bring better and more unique results, if done methodically. They have what, 33 people in the organization? It shouldn't take too much to create a decently-sized human dataset if they all participate.
>>102575089
But whatever is in VRAM will be twice as fast (ignore the 8gb of fat spilling into your RAM)
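The "twice as fast" only applies to the layers that actually sit in VRAM; with partial offload, per-token latency is the sum of the fast GPU portion and the slow CPU portion, so the CPU side dominates quickly. A toy model (the 40 and 5 tok/s figures are made-up illustrative speeds, not benchmarks):

```python
def tok_per_s(frac_gpu, gpu_tps=40.0, cpu_tps=5.0):
    """Effective throughput when frac_gpu of the layers run on GPU.
    Per-token time = time in GPU layers + time in CPU layers."""
    t = frac_gpu / gpu_tps + (1 - frac_gpu) / cpu_tps
    return 1 / t

print(round(tok_per_s(1.0), 1))   # → 40.0, everything in VRAM
print(round(tok_per_s(0.75), 1))  # → 14.5, a quarter spilled to RAM already hurts
```

This is why spilling "8gb of fat" into RAM costs far more than the offloaded fraction suggests.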
How many base models have been released since the last coom quality upgrade? 10+ already?
She would save local chads?
I still haven't found anything better than Fimbulvetr, especially for its size.
>>102578226
Isn't that the 4k ctx model? Have you tried running new models at 4k ctx?
what should i get rid of from this list? i need room for more models like those 40gb ggufs
>>102578353
everything except midnight miqu
>>102578353
nothing except midnight miqu
>>102578353
Step 1: Delete all the Sao models
Step 2: Buy an ad
So how does one run molmo right now?
>>102578367
>>102578389
>>102578353
delete big tiger, celeste, rocinante, qwen, theia
>>102578353
I'd keep Midnight-Miqu, llava, nemo-instruct, and lyra-v4.
L3.1-70B-Hanami is the smartest model that's decent at smut, so I use that. Magnum v2 72B has better dialogue but is a good deal dumber.
>>102578440
>>102578480
thx
>>102573387
Anons, if this isn't a solvable problem for you, you are too stupid to participate in this general. This is filter 1.
Yes, it's annoying that I have to press a single button now, but also, I only have to press a single button and it's solved, so there we are.
>>102573383
anybody making their own models or are people just using them plug-and-play style?
>>102578778
I'm trying to make my own models but I always regret my life choices
>>102577817
one thing I don't get about them is they use regular old claude-generated instruct data
claude RP is fine, claude is good at RP. but when claude is doing regular instruct it sounds the exact fucking same as every other model
why focus on claude for that
>>102578778
Finetune. I was actually going to start working on my own vision model after the mistral doom hack, but then zucc cooked, so here we are, finetuning again I guess.
A big one is dropping soon
>>102578930
kiwiberrystar?
>>102578930
HUGE!
>>102578821
we are sloptuners xir, how are we supposed to tune without slop
>>102578974
>magnumslop
>>102578974
CLAUDEATHOME
Man, I just want a 12B that's reasonably smart and has the personality of old C.ai. That's it.
>>102579075
and i want a unicorn maid girl
I will vote for any politician that promises an open weights single consumer gpu cooming bot that will satisfy my sexual needs. I want my tax money to go to a good cause.
Llama 4 will come in 3 sizes
>0.5B
>70B
>1T
>>102579275
I mean... you *do* have at least 48GB of VRAM now, right?
>>102579275
but the 0.5B will be as good as 3.2 1.5B so it won't matter.
>>102579275
You mean 0.5B, 1B and 1T. Lecunny said he likes his models small and open, like his gfs.
>>102577748
hurry up and put that 12b on hf, fuckers
Here's hoping Moore Threads makes some decently high-VRAM accelerators so that we can at least run 70B-class models at Q6 or better at decent speeds with a single cheap card. I doubt that's going to happen, but one can dream.
>>102579358
Why are leftists like this
>>102579412
Preferably single slot as well
>>102579588
And only PCIe connector-powered.
>>102577502
>8b only
for an image generator that's huge, it's the size of SD3
>>102579713
And only needs passive cooling.
>>102579584
It's almost like that Anon is being facetious.
>>102577489
>>102577539
This. I wasted an hour last night trying to get it to work.
>It's piss slow on consumer hardware
>The compute buffer takes up an entire 3090 worth of VRAM (possibly more; hard to say because of OOM)
>Inferencing code doesn't handle multi-GPU; it wants to do all the compute on a single GPU regardless of where you offload the weights to
It's utterly fucking worthless unless you have an H100.
>>102579916
>The compute buffer takes up an entire 3090 worth of VRAM
The model is an fp32 one; did you run it as-is or did you quant it to fp16?
>>102579713
I don't mind the power connectors, it's just that the faggots who designed my motherboard put the higher-speed slot at the bottom, meaning I can only fit a single-slot card.
Though I guess if they have enough VRAM then it doesn't matter; once it's loaded, it's loaded.
>>102579945
The weights are in fp32, but the provided inferencing code loads them as bf16. I'm not a retard.
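For reference, the fp32-vs-bf16 distinction matters a lot at this scale: weights alone for an 8B model already overflow a 3090's 24 GB in fp32, and bf16 merely halves that before activations and the compute buffer are even counted. The arithmetic (parameter count only, everything else ignored):

```python
def weight_gb(params_b, bytes_per_param):
    """GB needed just to hold the weights, nothing else."""
    return params_b * 1e9 * bytes_per_param / 1e9

print(weight_gb(8, 4))  # → 32.0 GB in fp32, already past a 3090's 24 GB
print(weight_gb(8, 2))  # → 16.0 GB in bf16
```

So even loaded as bf16, an 8B multimodal model leaves only ~8 GB on a 3090 for the compute buffer, which is consistent with the OOMs described above.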
>>102579916
i think the point was that you can fit a somewhat passable multimodal text/image/video in/out model in 8b, so a "solid" 70b text/image/video/audio in/out model that uses this same principle is just a matter of time
>>102579896
Retard
Any local AIs for text-to-3D models?
>>102579996
fiver
>>102577785
They drop below Llama 2 7B quality if you try to use that much context.
https://github.com/hsiehjackson/RULER
>>102580005
actually indians
so it looks like 3090s have dropped down to $500 on ebay now. thinking about getting a second one, but worried about the power spikes
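One common mitigation for spike-induced shutdowns is capping board power with nvidia-smi; the 250 W figure below is just an example, pick whatever leaves headroom on your PSU, and note this caps sustained draw rather than fully eliminating millisecond transients:

```shell
# Cap each 3090's power draw (needs root; resets on reboot unless scripted).
sudo nvidia-smi -pm 1     # enable persistence mode so the limit sticks
sudo nvidia-smi -pl 250   # set power limit to 250 W (stock 3090 is ~350 W)
```

For LLM inference the throughput loss from a ~30% power cap is usually small, since decoding is memory-bandwidth-bound rather than compute-bound.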
I'm working on a text adventure engine on top of an LLM, and I'm having a bit of an existential crisis around what makes a choose-your-own-adventure/text-adventure/RPG fun...
Being able to "try" anything is almost paralyzing versus being kept in some kind of well-delineated box. On the rails, as it were.
I've gotten past the positivity bias, which I thought was the big hurdle, but even once it starts treating decisions with realistic consequences, I'm left with a sense of unease while using it.
I'm starting to question how much of the fun of these kinds of things is due to the fact that there's some pre-planned subset of things you can do to keep you in a single experience with a consistent vision, i.e. the knowledge that whatever you try, it's all working towards some ultimate resolution and you don't need to engage your brain overly to keep it from meandering into some unsatisfying lala land.