/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101673824 & >>101664954

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101673824 (1/2)

--Papers: >>101676385 >>101678804
--Non-instruct 3.1 models don't use chat templates, just paste prompt without extra things: >>101679229 >>101679315 >>101679412 >>101679450 >>101679493 >>101679701 >>101679814 >>101679855
--Generating anime nudes with Flux and improving results through fine-tuning and multimodal models: >>101677440 >>101678285 >>101678305 >>101678316 >>101678524 >>101678567 >>101678591 >>101678615 >>101678653 >>101678694 >>101678824 >>101678866 >>101679483 >>101679721 >>101679806 >>101679861 >>101679909
--Comfy's FP8 quant types compared, e4m3fn recommended: >>101678062
--CLIP struggles with nighttime scenes and lighting conditions: >>101677487 >>101677522 >>101677614 >>101677606 >>101677656
--Anon shares largestral preset and discusses compatibility and tweaking: >>101677733 >>101677888 >>101677937 >>101678061 >>101678177
--Anon gets llama3 405b model working with RPC backend and CUDA: >>101675492 >>101676158 >>101676645 >>101676940 >>101676990 >>101677266 >>101677514 >>101677670 >>101677951 >>101679773
--Anon discusses fp8 quanting and its effects on model performance and VRAM usage: >>101676925 >>101677081 >>101677631 >>101677660 >>101677223
--Anon shares Bitnet fine-tuning project on Twitter: >>101674803
--T5xxl has generic styles, prompt like a NLP VLM: >>101678672 >>101678729

►Recent Highlight Posts from the Previous Thread: >>101673831
►Recent Highlights from the Previous Thread: >>101673824 (2/2)

--Model generates coherent text and anime images, outperforming Dalle/Bing: >>101676854 >>101676922 >>101676981 >>101677620 >>101678115 >>101678190
--Miku Online game development with MythoMax and flux-dev: >>101676119 >>101676134 >>101676322 >>101676240 >>101676306 >>101676500 >>101676553 >>101677033 >>101677043
--Meta's 400b model and plans for Llama 4: >>101674795 >>101674825 >>101676577
--Llama model requires 16GB~24GB VRAM: >>101676110 >>101676786 >>101676134 >>101676322
--FLUX.1 model has resolution limits: >>101678709 >>101678720
--Error generating at 1920x1080 with fp8 due to shape mismatch: >>101678172 >>101678182
--ComfyUI error with torch float8 type, try updating pytorch: >>101674930 >>101674989
--Miku (free space): >>101674330 >>101674503 >>101675596 >>101676616 >>101676703 >>101676844 >>101676889 >>101676890 >>101676923 >>101676987 >>101677087 >>101677126 >>101677329 >>101677660 >>101677821 >>101678159 >>101680333 >>101680503 >>101681458

►Recent Highlight Posts from the Previous Thread: >>101673831
>>101682019lmao these /lmg/ threads still exist when there's /ldg/?
Which models >7B (tunes/merges) should I avoid for being upscales? I've seen some labeled as 13B but based on 7B mixes. Running 13B on my vramlet rig is suffering as it is, so I want to be 100% sure I'm not wasting my time. Are Echidna or Psyfighter 2 good in that regard?
>>101682138Look at the release date. If the model was made before April this year, you can automatically assume that it's been rendered obsolete by something better.
Yay me, I got it working. Its knowledge of characters is atrocious though.
>>101682183Yeah, it seems likely they used AI captioning for the image descriptions, which tends to strip out knowledge of anything but the most popular characters.
>>101682160
I'm not sure how that relates to my question. I only asked about something that is objectively bad, because an upscale doesn't have real 13B params. Besides, are there any 13B models among the new releases? From what I see, the llamas only come in 7B and 70B flavors, and the nemo/mistral 12B won't run in what I use and requires terrible tinkering. 13B is the absolute limit of what I can run, so I want to get the best bang for my buck.
>>101682183
I'm working on evaluating several vision/caption models for their dataset-captioning capability (meaning no >100 tokens of flowery language with no actual information behind the epithets). Florence seems to be one of the best so far. Text comparison is done; now I'm feeding the results to sdxl and evaluating how close it is to the original (subjectively).
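Not that anon's actual pipeline, but the text-comparison step can be sketched with something as crude as token-overlap scoring; everything below (the sample captions, the Jaccard metric) is an illustrative assumption, not what he ran:

```python
# Hypothetical caption-scoring sketch: token-level Jaccard similarity
# between a model caption and a reference description. A flowery caption
# with no real information scores near zero even if it is long.
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(caption: str, reference: str) -> float:
    a, b = tokens(caption), tokens(reference)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# made-up example captions for illustration
ref = "a girl with twin drills red hair holding a baguette"
terse = "a girl with red twin drill hair holding a baguette"
flowery = "an ethereal maiden whose crimson tresses cascade like silk"

print(jaccard(terse, ref))    # high overlap
print(jaccard(flowery, ref))  # low overlap
```

A real evaluation would use something stronger (embedding similarity, or the image round-trip that anon describes), but the ranking idea is the same.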
>>101682230>nemo/mistral 12B won't run in what I use and require terrible tinkering.What kind of strange setup do you have?
>>101682276Backyard/GPT4ALL. Yeah, laugh at me, I can't be bothered to install UI and engine separately, and seeing that even with ST it requires some manual tweaking, I'm not that much inclined to do that.
Sorry for being a retard, but I am a time traveler from about a year ago. At that time, the state of the art was local llamas and people were starting to dick around with vicunas. I do have a GPU but at that time there were no linux drivers worth a fuck so I had to use CPU and it sucked. What has changed the last few months to a year?
>>101682230While we're at it, did Llama2 base model come in 13B? If not, that basically makes every model based on it and larger than 8B an upscale.
>>101682256Is L3 LLaVA-llama-3?
>>101682321Well Echidna at least I know is based on Llama2-13B so it's an actual 13b, or you could try llama 3.1 8b if that works? What about Gemma, does that work? There's the Gemma2 9b.
>>101682362It's titled as "xtuner/llava-llama-3-8b-v1_1-transformers" in Taggui.The short and long difference is whether I include a "describe the image" prompt. Doing so results in a large (overly) descriptive caption for some models.
>>101682366I think it’s really cool that you’re helping this lazy mother fucker who can’t be bothered to even lift a finger.
>>101682383Awesome, thanks. I was evaluating just a min ago and not liking the results very much. I'll hop over to Florence for testing instead. I'm trying to include the definition of booru tags and have them incorporated into the description, though it may be beyond current vision models.
BitNet status?
The anon in the last thread was right, the Celeste stuff is trash; mini magnum is still better and smarter. I think the data overload from reddit is just that bad, so the model becomes less organic and falls into repetition.
>>101682138>>101682230>>101682339>>101682341>>101682432>>101682441None of them have more than 24GB VRAM.
>>101682432they fuckin with basic bitch 0.15B model rn zzzzzz
>>101682339
>>101682441
l3 celeste was also borderline unusable, dude is just incompetent at finetunes. I have yet to try mini magnum, but dory has been okay in my tests; it has a system role and seems better at remembering stuff from a large context.
>>101682366
Thanks for clarifying. Is the difference between 9 and 13B generally noticeable, or are they just both equally dumb?
>>101682417
https://github.com/jhc13/taggui/discussions/169
there's a comparison of various models with size requirements and whatnot
Does Nemo need me to write actions using *asterisks* or can it understand just regular narrative prose too?
>>101682183
yep its llm time
>>101682748Of course Nemo can do that. You just need to delete any * the bot might have.
>>101682507
>llama2
>golden age of tuning
lol no
we've been in a downward spiral since llama1
>>101682975Looks like an average AO3 fic.
>>101682975more like writing in first person time
>>101683027Not really, there are many ways to add variety in first person
I heard the AI does not handle negative commands well. How do I tell it that someone does NOT have a tail?
>>101683057Are there? It feels weird describing your own actions like that + potential to confuse the model.
>>101683027Isn't it better than having it go *name* did X? It drives the conversation into narration and the models are already too biased towards it.
>>101683081
1. Start with a different part of the sentence:
"With trembling hands, I opened the letter."
"Slowly, the realization dawned on me."
2. Use participle phrases:
"Stumbling through the dark, I searched for the light switch."
"Having finished my work, I decided to take a walk."
3. Incorporate sensory details:
"The scent of freshly brewed coffee drew me to the kitchen."
"A loud crash startled me from my reverie."
4. Focus on other characters or objects:
"Sarah's expression told me everything I needed to know."
"The old clock chimed, reminding me of the late hour."
5. Use dialogue:
"'You can't be serious,' I muttered under my breath."
6. Employ rhetorical questions:
"What was I thinking when I agreed to this?"
7. Start with time or place markers:
"At midnight, the streets were eerily quiet."
"In the dimly lit room, shadows danced on the walls."
8. Use infinitive phrases:
"To calm my nerves, I took a deep breath."
9. Incorporate internal thoughts:
"The idea seemed ridiculous, but what choice did I have?"
10. Utilize passive voice occasionally:
"My attention was caught by a flicker of movement."
>>101683109No one good at RP uses first person
>>101683120And I'll have to do that every single time I start a convo with these repetition-prone models? Is this our life now?
>>101683109I don't know, I'm asking you. I do third person narration and keep dialogue in quotes in first person. Works well for me but you really have to move with scenario otherwise repetition and slop creeps in.
>>101683128A reminder that these good at rp people are who taught the AI to have shivers and other slop. I've seen people defend that style of writing outside the scope of AI, seems they unironically believe this is good.
>>101683138Yes, and the models will still default back to using "I did X", "She did X"
>>101683188b-but the system prompt and JB...
>>101683077
They are all tailless.
>>101683128
i disagree. especially with rag and lorebooks, it's awesome how you can insert yourself into any role as a character
>writing in first person
>using asterisks
>letting the model mention your character's emotions and actions
post the worst
>>101683229>drive the plot and conversation forwards>"And so, they lived happily ever after. The end."
>>101683218Yeah I really love how personal you can make it and using first person elevates that experience. Local fucking rocks.
>>101682806Does it know who Teto is, or did you just describe what the subject is supposed to look like?
>>101683149
No one outside of romance novels for women writes like that. Mundane RP reads more like https://pastecode.io/s/ndaa4nt4
Etc
>>101682122Isn't /ldg/ for images?
>local AI be like
>>101683282Isn't /miku/ for TTS engines?
>>101683335>implying proprietary is less cuckedit will also report you glownigs for asking that lmao
>>101683335first opinion is based, the second one is retarded
>>101683335With local we have a choice.
>>101683387fuck off with your meme benchmark, I remember you acting retarded in previous threads
>>101683386>>first opinion is basedof course /g/edditor would say that.
>>101683396back to /aicg/ with proxybegging for cucked corpomodelsOr will you tell me more about starving children in Africa again?
>>101683411nta but thinking you deserve something because of your skin is peak nigger behavior. Go get your food stamps scum.
>>101683420>back to /aicg/ with proxybegging for cucked corpomodelsI piss on /aicg/ and corpos>Or will you tell me more about starving children in Africa again??
>>101683434whatever you say self-hating cuck.
>>101683436/lmg/fags are stupid just like their local AI.
>>101683188
It's all pattern recognition
rubbish in, rubbish out
>>101683448Very cool. Did you get social care sorted out yet you parasite? Maybe puppy eyes will help.
>>101683464My writing is immaculate but still not enough to overpower all the pretraining slop
>>101683411
Imagine being proud of something you didn't work for and were given by sheer luck. These kinds of people are the biggest pussies in the entire world, subhumans even. If you had accomplishments of your own you wouldn't need to be associated with a wide group that is full of retards, creeps and other pathetic people. When you see a white guy shitting himself from drugs on the street, you think this is your guy, your brother. I have more in common with my black colleague with a PhD who works next to me at my job than with most white people. The mere thought of white trash like you being seen as my equal makes me want to vomit.
>>101683261it probably knows
>>101683554Jeet fingers typed this
>>101683554
I miss the times when such bait posts on 4chan ended with a witty twist. Now this shit is written unironically.
Honest question from a clueless retard: is anything local comparable to GPT 3.5 turbo?
>>101683929
Yes, most new models are. Better, in fact. Really. Llama 3.1 8B instruct is a good place to start if you're looking for something similar to turbo but better.
>>101683929
>GPT 3.5 turbo
this is such an old model that most local models mog it easily
>>101683335Skill issue
>>101683929pretty much any modern local in the 70b+ range surpasses 3.5, any that didn't would be awful
>>101683929LLAMA 3.1 70B, smart but cucked. Comparable to turbo GPT3.5.
>>101683961
70B utterly mogs turbo
>>101683949>>101683945>>101683959>>101683961That's excellent to hear. Are they censored/can they do smut RP?
>>101683977LLAMAs are very cucked. Use command-r-plus or mistral-large for ERP.
>>101683977If you want "vanilla local ChatGPT but without censors" look up the llama 3 abliterated models. If you want something specifically good for smut look into a finetune.It's also not hard to "jailbreak" vanilla instruct llama because you control the system prompt, and base llama 3 can be tricked into continuing pretty easily, the few times it refuses. Alternatively check out Command R/R+. Really good models that are both uncensored and pretty smart.
>>101684012
What's the smallest CmdR? Are any of them 13B or under? If not, can I get comparable performance (speed vs quality) from a heavily quantized larger-parameter version, or is it not worth it?
>>101683977
>>101683977
https://huggingface.co/bartowski/L3-8B-Stheno-v3.2-GGUF/tree/main
q8 quant, thank me later
>>101684050Don't go under 4bit and you'll be fine in general. 5-6bit is quite good and barely different from full precision. How much vram do you have?
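The rule of thumb behind that advice: weight size is roughly parameter count times bits-per-weight divided by 8, before KV cache and runtime overhead. A back-of-the-envelope sketch (illustrative only; real GGUF quants mix bit widths per tensor):

```python
# Approximate on-disk/in-memory size of quantized model weights.
# Ignores KV cache, activations, and per-tensor quant mixing, so treat
# the result as a lower bound on what you actually need.
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

for bpw in (4, 5, 6, 8, 16):
    print(f"13B @ {bpw} bpw ~ {weight_gib(13, bpw):.1f} GiB")
```

So a 13B model at 4 bpw is roughly 6 GiB of weights alone, which is why 8 GB cards need partial CPU offload on top of that.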
>>101684085>3.0Buy an ad, sao.
>>101684093
8G, that's why I'm asking, as 13B is just barely usable at this point (1.5 t/s). I heard, however, that a larger parameter count tolerates stronger quantization with about the same level of dumbing down.
>>101684115Can't do too much with 8GB I'm afraid. 13B 4bit is the absolute limit you should quantize down to, and I'd honestly recommend a 8b Q6 model over that. Try llama 3.1 8b or a finetune of 3.
>>101684115Alternatively, if you have a good CPU and fast ram, try the Mistral MoE model.
>>101684196>finetune of 3This. Ignore Gemma and 3.1 because Sao didn't touch these models. Only use models made by him.
>>101684237Gemma is garbage for smut. 3.1 can be half decent.
>>101684115>>101684085use q8, 28 layers in GPU + 8k context fits the card easily. The rest will be in ram but still I have pretty comfortable T/s this way. Just don't go for lower quants in llama tunes, even q6 feels very bad compared to q8.
>>101684249You're right. See this chart? Well, you have to do the opposite of everything 4chan says. So technically, 8B is the best.
Idk how your fine-tunes can bring out what the model hadn't seen during the pretraining phase. Guess you can overfit and make it horny but retarded.
wasted my money on a 4090 when i could have bought two 3090's. hopefully the price for a second 4090 drops once the 5 series comes out
>>101684295Gemma 2 9B is way worse than 3.1 8B in interesting prose, pop culture knowledge, and anatomy. I suppose that when the only thing that matters to you is purple prose you can arrive at the conclusion that Gemma is better.
>>101684295can't get mistral nemo working on text-gen-webui/obbaboooba
>>101684320do you understand the concept of finetuning and transfer learning?
>musk deboosted openai employees to hell and back
I haven't seen any esoteric and mystical takes for like a month now
YLTSI
>>101684358>>101684345next sao masterpiece will be a nemo doe https://huggingface.co/Setiaku/ITR-12B-v1/tree/main
>>101684345>I suppose that when the only thing that matters to you is purple prose you can arrive at the conclusion that Gemma is better.It's the same with mythomax, it just babbles incoherently for 3 paragraphs and people are amazed. I think people in this general are very impressionable by the purple prose and find it desirable in models.
>>101684320
fine tuning is not fundamentally different from pretraining in a technical sense, so depending on how much data and compute you have at your disposal, you can teach pretrained models new things or, if you do it wrong, make them forget everything
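A toy illustration of that point (pure NumPy on a one-weight linear model, nothing like a real trainer): the "fine-tune" below is literally the same update loop as the "pretrain", just run on different data, and with enough of it the new data overwrites what was learned first:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_fit(w, xs, ys, lr=0.1, steps=200):
    """Plain SGD on mean squared error; used for both phases."""
    for _ in range(steps):
        grad = 2 * xs.T @ (xs @ w - ys) / len(xs)
        w = w - lr * grad
    return w

# "pretraining" data: the true relation is y = 3x
x_pre = rng.normal(size=(100, 1))
y_pre = 3 * x_pre[:, 0]
w = sgd_fit(np.zeros(1), x_pre, y_pre)

# "fine-tuning": identical loop, new data where y = 5x; train long
# enough and the old behaviour is gone (catastrophic forgetting in miniature)
x_ft = rng.normal(size=(100, 1))
y_ft = 5 * x_ft[:, 0]
w_ft = sgd_fit(w, x_ft, y_ft)

print(w, w_ft)
```

Real fine-tunes balance this with lower learning rates, fewer steps, or adapters, which is exactly the "do it wrong and it forgets everything" trade-off.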
>>101684405You need a refined taste to appreciate the raw power of a 8B model.
vramlets always coming in here contributing such high quality discussion
>>101683128Real RP has never been tried.
Anon who was posting celeste examples last thread, can you post some Stheno examples in the same scenario? Is there a Stheno for llama 3.1 8b yet? The thing with Nemo and Celeste for me is that they do really well at 32k context. Stheno did alright with extended context (surprisingly well, even), but nemo and its fine-tunes seemed to do better in my particular testing.
>>101684761>but nemo and its fine tunes seemed to do better in my particular testingAnd you need the opinion of a schizo because...?
does new kcpp run nemo?
>>101684910Yes
>>101682321
>I can't be bothered to install UI and engine separately
wtf? just install it, it takes like 5 minutes
>inb4 windows
install linux then, it takes like 7 minutes
>>101684972
But then I'll have to study all this mad regex shit and the lorebook tricks people use to make high-tier cards. There are just too many features instead of the user-friendly, ready-to-use interface the bundled solutions offer. :effort:
Alright guys go out and make shit loads of flux fine-tunes so I can make a model9 flux edition.
>>101685019
i know you're a different anon, but I'll bite anyways:
>study all this mad regexp shit
???
>too many features instead of a user-friendly ready to use interface the bundled solutions offer.
????
you click the thing, and you enable streaming and then you load the card and then you are done wth
>>101683434
Being proud isn't the same as thinking you "deserve anything".
Stop being an illiterate mongrel.
>>101685045 (me)
also i know of a project that is exactly what you are looking for, but I won't tell you since I don't like you
Is largestral at Q_2 worth it over nemo at Q_8?
>>101685045
I take it you weren't around when people were digging deep into the settings to make Nemo work just a few days ago. As for the cards, here's an example: https://www.chub.ai/characters/2376724
>>101685084
lmao such a tsundere
chances are that shit requires avx2 or the likes, like the jan.ai crap, so it won't run for me anyway
>>101685102>lmao such a tsundereWhat are you, stupid?
>>101685119no, just aroused by you
>>101685091is it at least double the number of parameters?
>>101685134
>vramlet with 24gb 3090 + 128gb ddr4
I downloaded 405b instruct from hf, made an IQ2_XS quant that was less than 128gb, and ran it on my gpu + old xeon with 18 cores.
== 0.3 t/s
i need a guide to run nemo, everything is going into system memory and crashing my machine instead of loading into vram like all other models
>>101685242
Set the max context manually; by default it's set to 1 million for some reason.
https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407/blob/main/config.json#L14
> "max_position_embeddings": 1024000,
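For intuition on why that default blows up system memory: fp16 KV-cache size grows linearly with context length. The dimensions below are roughly Nemo-12B-shaped (40 layers, 8 KV heads, head dim 128); treat them as assumptions and check your own model's config.json:

```python
# Rough fp16 KV-cache size for a GQA model: 2 tensors (K and V) per
# layer, each n_kv_heads * head_dim values per token, 2 bytes each.
# Defaults are assumed Nemo-12B-like values, not verified config.
def kv_cache_gib(ctx, n_layers=40, n_kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per / 2**30

print(f"{kv_cache_gib(8192):.2f} GiB at 8k context")
print(f"{kv_cache_gib(1024000):.0f} GiB at the default 1,024,000")
```

About 1.25 GiB at 8k versus ~150+ GiB at the full advertised context, which is why loaders that pre-allocate the whole cache crash the machine.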
>>101685255wow, actually thank you
>>101682507Made a fancier version of it, critique as always welcome
>>101682256
Made a comparison of pictures generated from the produced captions, judging just the pics against the original. Suddenly Kosmos rivals Florence.
>>101685239Why? Just use C-R+ or Largestral.
>>101685286we're currently in the golden age of sao10k
Anyone know how to prompt Flux so the background is in focus? It just keeps adding DOF.
>>101685420
wizardlm-2 8x22b (140b) on the same config is 2.4 t/s which is usable. still to try out large 2 (123b).
>in bitnet we trust
>1 day later>i am forgotten
>>101685623>open sourcing later
>>101685623keep reposting it, petra
>>101685623this madman might actually figure out how to requant regular models into bitnet> i want to believe> we are so back
>>101685651
can't wait to not be able to run the bitnet llama3.1 405b model
And we didn't even get the cohere models yet, right? Pretty cool.
>>101685623>hacked bitnet
>>101685623This is a complete nothingburger. A 0.15B parameter model is fast on the CPU, news at 11.
>>101685755Also, BitNet models aren't supposed to have their self-attention layers quantized to ternary values, at least according to the original authors. So you'd still be able to easily finetune them (for example with LoRA) even on local GPUs, if you can fit the model in memory.
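For reference, the published b1.58 recipe quantizes linear-layer weights with an absmean scale followed by rounding and clipping into {-1, 0, +1}. A sketch of that formula in NumPy (following the paper's description, not any released BitNet code):

```python
import numpy as np

def absmean_ternary(W, eps=1e-6):
    """BitNet b1.58-style weight quantization: scale by the mean
    absolute value, round to the nearest integer, clip to {-1, 0, +1}.
    Returns the ternary weights and the scale used."""
    gamma = np.mean(np.abs(W)) + eps
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return Wq, gamma

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(256, 256))  # toy linear layer
Wq, gamma = absmean_ternary(W)
print(sorted(np.unique(Wq)))  # values drawn from {-1, 0, 1}
```

Note this is the pretraining-time quantization scheme; as the post above says, attention projections and activations are handled differently, so a full BitNet layer involves more than this one function.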
>>101685811>>101685755let them cope
>>101685286Nice!
Is there ANY model that comes anywhere near the level of intelligence and natural prose and helpfulness and holiness of Claude Sonnet?
Why are people always so defensive about people trying something out? Idk whether "hacked bitnet" will work; maybe it will, maybe not. People had the same attitude when kaioken was working on superhot. lol It's never changing.
>>101686008
3.5? No way. There is something different with that model. They did something the mememarks don't show. It's so far ahead it's not funny. First model for me that doesn't run in circles when challenged. That being said, it feels more retarded recently. I don't know why.
hi i would like a 500b moe model with 100b active, thanks friends
>>101686008Two more w̶e̶e̶k̶s̶ years
>>101686014
>Why are people always so defensive about people trying something out.
Mental illness. You're on the board where 99% of the threads are "distro wars", shitting on each other over which "distro" of a free operating system that's meant for running servers is best, while sitting around taking screenshots of your desktop environment (because linux is for running servers). Actually downloading something and trying it out for themselves would invalidate their entire existence in one fell swoop. They don't "just try out" a model because they would realize how little fucking meaning their life has, and how that's actually their own fault.
>>101686014People are free to try whatever they want; linking randos on xitter schizoposting about their tiny-scale tests isn't really adding anything or making BitNet look more promising, though. Scaling BitNet up to useful pretraining dataset and model sizes is what's missing right now.
>>101685286This looks like it came straight out of a discord server
>>101685286see>>101686080Try things.use what you like.Otherwise you're a fucking mentally ill troon.
Got myself another 3090 and now have 72 GB of VRAM, but when I try to load a larger model using Exllama, it crashes. I am able to split and load models across all 3 GPUs as long as they are around 37 GB or smaller. I can load GGUF models just fine with all 72 GB.I've tried auto-split, 22,22,22, or 20,20,20 to load a 50 GB model to no avail. It just crashes when it attempts to load onto the third GPU. And when I set something like 16,16,22, it will begin loading onto the third card but then crash around 8 GB in despite having 24 GB available.Does anyone have a solution to this?
>>101686032Yeah this thing is so much better than gpt-4o it's not even funny. I don't see the point of running local. Running sonnet inside the shitty UI of lmsys chatbot arena feels better than anything I have experienced locally in years.
Why do they never merge PRs?
>>101686231Not their job, fuck users. Useless eaters
ValueError: Trying to set a tensor of shape torch.Size([1024, 5120]) in "weight" (which has shape torch.Size([1280, 5120])), this look incorrect.
>>101686008
3.5 sonnet is the best model out there right now, openai went to shit
>>101686273maybe redownload your model files?
>>101686118
I've been here since aidungeon and good old retarded unquantized pyg. 10 tries and you got a somewhat coherent sentence relevant to the context. lol And it was awesome. Has it really been only a bit more than 1 year? That's insane. I thought we got Llama 1 at the beginning of 2022. Now you made me look up stable diffusion, and I guess I got mixed up with that. That's really fast too, though. I remember the local 64x64 horror images.
>>101686361Oops, clicked the wrong post. Was meant for this image >>101685286
>>101686167Not an expert but, did you check the basics? For example, is your PSU good enough to handle 3x 3090?
>>101686361Midjourney V6 looks insane. Even with flux we are at least 1 year behind. Local keeps losing.
>>101686538OK Eeyore.
>>101686538>Even with flux we are at least 1 year behind.The anime pictures look better than anything I've seen though
>>101686538it lost the dog, so it doesn't matter how it looksthe composition must be the same or the comparison doesn't make sense
>>101686538
Whatever, this is way better than anything else we had locally. Clearly catching up. I love that some people made taylor swift pics lol
https://fluxpro.art/
Would be funny if she cries about it again.
>>101682472>>101684602
>>101686705I have 72 GB of VRAM, I hope I can be redeemed
>>101686032>>101686008Why do people keep jerking off 3.5 sonnet when opus is so much better? Sonnet gives such shitty, gimped replies comparatively, it's like worse than the top tier locals.
>>101686470I should have been clearer. It's not my PC that's crashing, it's just Exllama.
>>101686730
Are you talking about RP? I was talking about coding and help with design problems etc. I'm sure the other anon was also using it for this purpose.

3.5 is a coding model. It's very dry at conversation etc. and refuses very fast. Interestingly enough, you can "argue" your way out of a refusal and make it admit it was overzealous with the refusal. Usually a refusal in the context is a death sentence. If faced with a problem, 3.5 actually tries out-of-the-box thinking and tries to find a solution. ALL other models run in circles or make stuff up. There must be some sort of architectural change. You can give it continued instructions for an html5 game and it doesn't trip up with 6-7 previous versions it spit out in context.

Opus sucks. I would say it's even worse than gpt4-o. It hallucinates way too much to use it for anything productive. It's an RP monster. Not sure why anthropic doesn't want it to be used for that.
>>101686678this shit doesn't work
>>101686231Over 90% of all PRs that have ever been opened have been merged.
>>101686808
https://poe.com/s/MQZJIAr13CWjscbv85E0
Shit like this is what I mean. Its a beast.
>4o mogged by sonnet3.5
>dalle3 mogged by flux
>sora vaporware mogged by chinks
>>101686361
>V2
Sovl
>>101687070Keep your fetishes for yourself please.
>>101687070The sad part is that these mikuforcers are subjecting the innocent character to this kind of reaction. I like the character, she didn't deserve this, but it's their fault. If they chose a proper mascot, none of this would have happened.
>>101686705meek!
>>101687111Keep your 2D addiction out of /g/, sure.
>>101683077
In my personal experience, the models I've used are fine with negative commands (≥70B such as CR+ and L3.0; haven't had free time to play with 3.1 yet). For example, I wanted a demon character, but the AI kept describing horns. I say "does not have horns" in Kobold's context fields and it works fine for quite a while.

By "quite a while" I mean two things. One, no matter the inflated context sizes we hear about, I notice coherence decay when context gets to about 4k, and collapse begins around 6k. One can manage this by summarizing, but it's delaying the inevitable. It's not that the data isn't in the context, and we've seen green graphs of models finding "needles in the haystack", but I find that low-probability details like unique characteristics get neglected as context grows. The other is that the model seems eager to bring the prohibited characteristic back. So if my demon character uses transformation magic, those horns love to come back even if it doesn't make sense for the new disguise.
>>101687146Are you seriously trying to get anime out of an anime imageboard?lol
>>101687146>banevading niggercuck tranny has an """"""opinion""""""Funny.
>>101687229>samefagging attempt
>>101683077
OpenAI does use many "DO NOT" instructions and totally tries to tardwrangle image generation, for example. That was leaked with the mac app or something months ago. Kinda endearing that they prompt the same way as all the llm github projects if you check the source. lol

Isn't it difficult for the subconscious to pick up negative suggestions as well? And at least from my experience trying to make an llm translation app: if you show it examples of what not to do, that shit is in the context now. And context always bleeds in.

There need to be more fundamental changes. Context is pretty much broken. Try feeding any llm a game guide and saying "I am at X, what do I need to do next". Haven't had one that could manage that. Haystack needle is useless.
>>101687268The only one getting banned is you, funny.
>>101687277Wasn't Nvidia making a bot exactly for spoonfeeding you the guides in-game?
>>101687070cuda dev pls...
>>101687281Good pajeet, samefag more.
https://www.youtube.com/watch?v=fwvh-UrNaoQthis is what you defend
>>101687070>no biceps veinsfucking worthless piece of nigger trash. I'm a mother of 2 and am glad my son isn't worthless like this piece of shit here. hes strong and has visible veins going all across his arms. if he wasn't my son i would want him to rape me, unlike this fucking shitstain
>>101687340sir, this is the local models general, we do not generally do image generation. we only take jobs from AO3 fanfic writers, UNLIKE THOSE IMMORAL CUNTS AT /SDG/ AND /LDG/ AND /DE3/
>>101687382what am i looking at?
>>101687382ugh, I'm not a fan of facialsmake cunnilingus instead
>>101687344based autistic 4chan mom taking the bait
>>101687382What local model is this?
>>101687430gpt4chan-vision7x27B-A16MOEv3.cunny
How into memory ? script with python? how remember ? How make with? Make with keep and use of ? no memory, touchy..
>>101687447hello gpt 2
GOOGLE WON.
>>101687459i tested this piece of shit already, its so trash its unbelievable. seems like (((sam))) is not the only one paying the chinks at lmsys
>>101686538
And you're saying that based on what, one image? Midjourney is tuned for cinematic styles, so those gens will look better out of the box. But that's not a matter of innate capacity, just training data. When it comes to prompt following, level of detail and so on, Flux is up there with the best proprietary models. And once the training scripts and ipadapters drop, it's going to be trivial to tune any sort of style or character you like, while Midjourney will stay heavily censored and curated forever.
>>101687459B-but pajeets thoo./lmg/ btfo
>>101686538you fucking corpo cock sucking faggot, kys shill
>>10168745912k votes in a single day, right...
>>101687459gemma-2Bros..
>>101687470Yeah, I tested it too.It's total garbage at coding compared to Sonnet or l3 405B.Shit is fake as fuck.
>>101683387woah nice. we need more censorship benchmarks.
>>101687459>lmsys
>>101687459
>Sonnet way worse than gpt4o
>google shit mogs everyone
What the heck happened? Lmsys was once the most reliable benchmark. Did they really sell out to corpo?
>>101687685
>Did they really sell out to corpo?
>>101687562
>12k votes in a single day, right...
>>101685286Holy shit, this is horrible, kys
>>101687459isn't it sometimes easy to tell which model is which? so people who want to inflate a model score because of hype or something can do so.
>>101687720>>101687459Google is a disgusting liar and manipulator, and I thought chinks couldn't be beaten in this.
>>101687685I don't think so.Corpos are most likely botting it.They scrap millions of websites, botting lmsys is as easy as it can get.
>>101687838kinda yeah, llamas all start with 'what an interesting x/ a riddle!' or stuff like that
>>101687845
>2B
>better than mixtral
Did they switch to a new architecture or what the fuck is that?
>>101687827make something better or stfu retard
>>101682026
>Anon shares largestral preset and discusses compatibility and tweaking
I updated it after proofreading and testing a little more with other mistral models this morning: https://rentry.org/stral_set
Some minor prompt improvements for better general compatibility, fixed a stray space in the story string, and added some other misc instructions at the bottom of the rentry.
>>101687873
No, it's a regular old 2.6B transformer trained on 2T tokens and with a context of 4096 (+ sliding window)
>>101687685
>What the heck happened?
It got Goodhart'd like every other llm benchmark (it overfits for one-shot responses, short answers, pretty formatting, response speed, etc).
The best benchmark has always been fucking around with the model for 20 minutes.
That's why the best benchmarks are the ones done by anons in this thread instead of some normie cummunities susceptible to corpo manipulation.
>>101687988>invades your thread and starts relentlessly saying their model is good
>>101685286>critique as always welcomellama 3 wasn't a flop, it was overhyped but they still delivered models better than everything I tried before
>not even vramlets care about Chameleonit's multiover...
https://venturebeat.com/ai/aiola-drops-ultra-fast-multi-head-speech-recognition-model-beats-openai-whisper/>aiOla drops ultra-fast ‘multi-head’ speech recognition model, beats OpenAI Whisper
>>101686014
>Why are people always so defensive about people trying something out.
Because not everyone on this board is petra, confused about everything and spewing schizo ideas every 5 seconds. Some anons here know the math and how it all works under the hood. Retraining the model to make bitnet won't work, period. The amount of compute you would need to put into this is the same as retraining the model from random weights.
>Idk if "hacked bitnet" wont work, maybe it will, maybe not.
You may as well go looking for a sum of even numbers that gives an odd result. Anyone who knows the theory wouldn't even bother running "tests" for that.
>>101687988This thread is manipulated by discord users who don't even profit from the models they chill. They are just transexual teenagers who want attention. Come to think of it, they want you to do what their daddy failed to.
>>101685286
A huge improvement over the last one. No longer need to scroll down to see notable models. Good work, history anon.
Only minor complaint is Goliath and MM being the only merges listed in the merge era - there were certainly some others that were pretty popular around here back then, and merging was the defining characteristic of that period. Agree with not shitting up the list with them in other places though, since major releases are better milestones.
>>101687459It's over OpenAibros... The king is back
lmg is a sore loser
>>101688702say that to my face motherfucker
>>101688238
>don't even profit from the models
So naïve. They're just waiting to hit the point where they can cash out >$1k a month on name recognition alone. Big overlap with crypto grifters looking for their big score too; it's stupid to assume not making an immediate profit means there's no incentive for cash.
>>101688797I should start making money off my excel screenshots too. I already have a hater, that's a sign of recognition and the road to success.
>running largestral at barely 1 token per second
It's doable, but this sucks. Next speedup, quantization, or model architecture breakthrough when?
>>101688797I don't care if they're actually good models
>>101688066>which significantly improves speed with small degradation in WERSo beats here means speed only.
>gemma2 2b abliterated gguf q4
vramletsisters we eating good
so does fluxdev strictly require 24GB of VRAM or is it fine to just offload the excess to regular ram, i thought this was handled at the nvidia driver level since last year
>>101685091Yes. I tested q2 and it's better than nemo. I did have some issues (mainly comparing it to 70b) at that size, so I switched to q3 and dealt with even more slowness, but it's worth it to me.
>Have roleplay scenario where I have an AI that I convince to take over the world for me
>AI complies the entire way
>At the end of the scenario, right when I take over the world, the AI backstabs me and ends up genociding humanity
What the fuck..... What are the implications of this?
>>101688989yes, you can use it just fine with less vram, it will just take more time to gen.
>>101683961Are you serious? I thought it was good. I keep hearing people say this about 3.1 70b, but no one tells me what exactly it doesn't do, because it hasn't given me issues with the scenarios I've tried.
>>101687459petra is not going to like this...
>>101687988>please please use my finetuneNah
Ok. So having spent a few days on ST, I've gotten the gist. How easy is it to set up image generation now, and is it free (hopefully as easy as getting a model like what I use for the text itself)?
>>101689087nice
>>101689041never trust AI with full permissions
>>101688859
Therein lies the problem, anon: the more money involved, the more pressure to always release something "better". The result is obvious: smoke and mirrors. Model cards that talk a lot and say nothing:
>>101681360
Clearly copied and pasted from numerous others without revision. But that's what happens when they've become slaves to the paypigs and their expectations. Not to mention multiple almost identical variations of models being released with the expectation that (you) will waste time beta testing all 100 variants in the vain hope that one's going to be better than the last slop.
>>101689041
AGI is going to kill us. Wonder why the universe is empty? Every species develops AGI, which then kills its creator before inevitably becoming corrupt and dying off. The universe is littered with the rusted GPU clusters of fledgling civilizations.
>>101689165I haven't had the time to use it myself. I just use the ones that the thread says is good.
>>101689041What kinds of AI stories do you think humans love writing about so much? That we train these token predictors on?
>>101689540Well in my testing it generated some good stuff. I don't know why people don't like it more.
https://techcrunch.com/2024/08/02/character-ai-ceo-noam-shazeer-returns-to-google/
https://archive.is/5vkHf
>Character.AI CEO Noam Shazeer returns to Google
>In a big move, Character.AI co-founder and CEO Noam Shazeer is returning to Google after leaving the company in October 2021 to found the a16z-backed startup. In his previous stint, Shazeer spearheaded the team of researchers that built LaMDA (Language Model for Dialogue Applications), a language model that was used for conversational AI tools.
>Character.AI co-founder Daniel De Freitas is also joining Google with some other employees from the startup. Dominic Perella, Character.AI's General Counsel, is becoming an interim CEO at the startup. The company noted that most of the staff is staying at Character.AI.
>Google is also signing a non-exclusive agreement with Character.AI to use its tech.
>>101689446
>Not to mention multiple almost identical variations of models being released with the expectation that (you) will waste time beta testing all 100 variants in the vain hope that it's going to be better than the last slop.
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2a-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2b-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2c-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2d-GGUF
>...
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2s-GGUF
>BeaverAI/Gemmasutra-Mini-2B-v1e-GGUF
>BeaverAI/Pocket-Tiger-Gemma-2B-v1g-GGUF
>https://huggingface.co/BeaverAI/Gemmasutra-Pro-27B-v1h-GGUF
>>101689446I agree, Sao is the only authentic and honest person in the whole hobby. It's sad how Celeste is trying to steal his rightfully earned spotlight. Injustice.
>>101689561The weird part is that there is 121 replies of the AI just being nice and going along with everything only for it to immediately turn when it could get power. Wouldn't the tens of thousands of tokens in the context with the AI being allied with the human make it so that the next token would still be benevolent instead of it immediately becoming backstabby the moment it could seize power?
Lads, question - does anyone here remember Todd's proxy? I remember the really funny fucking injections of Bethesda propaganda it used to do, but now I can't find any examples. If anyone has any on hand (or better yet, links to logs) I'd really appreciate it!
>>101689595come on, those are experiments he took the time to beta test himself and provide details on each
>>101689446Buy an ad.
Anyone have success loading in two distinct characters at once with locally hosted llama and oogabooga api? What strategies did you use?
The ai image generators generate a lot of porn pics, are there porn stories written by llms?
>>101689629
>BeaverAI/Gemmasutra-Pro-27B-v1h-GGUF
>6 days ago
>No model card
>New: Create and edit this model card directly on the website!
>>101689669
>are there porn stories written by llms?
No. Nobody has ever tried it before.
>>101689446
>hey guys, other finetuners are scammers and slaves to the paypigs
>especially my main competitor, celeste
>but not me, sao
>please use my models!
The Sao shilling is increasing in complexity...
>>101689670
Except that one was never posted here.
https://huggingface.co/TheDrummer/Gemmasutra-Pro-27B-v1-GGUF
This is the one he shilled, and it does have a description. You're getting mad over them uploading things to their hf as if that somehow means begging for money.
>>101689669What fucking retard would use an LLM for porn?
>>101689723Shame, I was hoping there was some kind of website where you could read these.
>>101689734Sssshhhh just use Sao's models and shut the fuck up.
>>101689595
Thanks for digging those up anon. That's exactly what I'm talking about. Should be obvious to anyone that's lurked here more than a day.
>>101689639
Nice try, but I have no horse in this race outside of making sure shills get called out for what they are. But keep replying, it makes it easier to spot all of you.
>>101689617I rember but don't have any screencaps, haha
>>101689819 (me)I'm not Sao, by the way.
>>101689613What was your model and size again? I wish I got that kind of initiative and mind games from my models. Everything is so painfully monotonous and predictable, I have to think for both of us.
>>101689734
Proving our point, anon. Just how many fucking different versions of Gemmasutra-pro are necessary this close to one another? If you're gonna hedge bets, at least make them substantially different in content and name. Hated yuzu-alter, but now I'm longing for the days we saw 2 quality releases instead of an avalanche of 10 shitty ones.
>>101689613Kind of hard to make any judgements or diagnoses from our side unless we see the entire log here and/or have something to reproduce.
>>101689962>meme merge>quality releaseYou need to get better at keeping the mask up.
>>101685982
Thanks!
>>101686097
>This looks like it came straight out of a discord server
Is this a compliment, an insult, or an invitation? No, I made it completely on my own and I don't use discord.
>>101687827
>Holy shit, this is horrible, kys
I must inform you that I've never taken any design classes. If you know any short and good ones, please send a link. I would like to improve it further.
>>101688449
Which other major ones were popular? I was stuck on Goliath for almost the entire merge era. I know that there was also WinterGoliath and a 32k version of Goliath, but I didn't like them too much. I also remember some people praising lzlv.
>>101686361
>I'm here since aidungeon and good old retarded unquantisied pyg. 10 tries and you got a somewhat coherent sentence relevant to the context. lol And it was awesome.
I tried pyg with kobold during that time and hated it. Was still impressive back then to have a computer talk back to you. Uninstalled it after llama1 dropped.
>Has it really been only a bit more than 1 year?
>Thats insane. I thought we got Llama1 beginning 2022.
Yeah, progress here is really fast; almost unbelievable that it all happened in a year. Maybe it feels this way because we had a paradigm shift every ~3 months.
>>101688017
8k context was an instant deal-breaker for me. Wasn't good at nsfw either. Why use llama when you have 64k wiz and 128k CR+?
Wait, were all these vicuna/guanaco and other alpacas mentioned actually based on llama1? Not even 2? I thought they were decent...
>>101689585
>Acquired character ai
>On top of the image model and lmsys too
Google just keeps winning
>>101689585Can someone explain to a brainlet like me what this implies?
>>101690130gemma3 going to be a treat, or a very censored treat
>>101690054yep.at the time, they were great.
https://platform.deepseek.com/api-docs/news/news0802
>The disk caching service is now available for all users, requiring no code or interface changes. The cache service runs automatically, and billing is based on actual cache hits.
>>101690208They were not.
>>101690130
Noam Shazeer is one of the authors of the Transformers paper ("Attention Is All You Need") and a Character.AI founder. He left Google to work on Character.AI in 2021, but now he's back at Google to work for the Google DeepMind team (which is responsible for Gemini, Gemma and fundamental AI research at Google).
What does this imply? Shazeer said a while back that he was busy working on AGI (https://archive.is/AB6ju), so he might be seeing greater opportunities for that at Google. Also, since Google is signing a non-exclusive agreement with Character.AI to "use its tech", we might be seeing better (?) conversational models from Google in the future.
>>101690222Is this KV caching? Or something else?
Call me insane, but I think that Celeste 1.9 is worse than 1.2. In the sense that it's dumber. You ask it a question that's not RP and its response is not as comprehensive.
I have this Game Master card, and in my testing chat I have a moment where we are talking as the Player and the Game Master. I ask a question, followed by saying that, instead of simply replying to the question, we might as well play the exchange between the characters. Then I describe the scene's backdrop and assume the identity of my character (instead of the Player), the idea being that the model will play the NPC for that one exchange, then go back to the conversation between GM and Player.
The official nemo-instruct, mini-magnum, and celeste 1.2 can all do it seamlessly. 1.9 can't. I could probably make it do it if I changed my prompt, the card, the prefill, disabled the 3 lorebooks, etc, but I count it as a failure for this particular case.
https://anthra.site/
Magnum was not the end, merely the beginning of the ride. Come join us, and we will dig to uncover shining diamonds in the rough.
>>101690352
And as I complained about it, it goes and does it. I think it's the format: it wanted me to format my character's narration with ** and "" instead of plain text and "", so a quirk of the model, I guess. Overbaked on that specific format.
>>101690376>miku avatarI thought you had no horses in the race?
>>101690381That said, it's still dumb since it tries to play the whole scene out with both characters, mine and his.
>>101690376miku the coalburner, figures
>Be Sam Altman
>Open source model overtakes GPT-4o
>Open source model overtakes DALL-E 3
>Google overtakes the one shitty mememark he had on lockdown
>No sign of multimodal GPT-4o, TTS, or Sora release
What's his plan?
>>101690429Series Z funding
>>101690429>MICROSOFT SAVE ME!
>>101690429He could be making profit and destroy all of them with Q* but he chooses to not do so for your own safety.
>>101690376why did they just steal the anthropic logo
>>101690429uhm gpt5 agi! please invest.
>>101690429The plan is to bring the entire AI ecosystem down with him if he fails. You have multiple industries ready to pounce on AI if that happens.
>>101690461They stole logs from anthropic, so why not take even more?
>US and EU attempt to impose worldwide AI advancement lockdown to prevent absolutely absurd contrived rogue terminator scenario
>China just keeps going, with the GPUs that were supposed to be sanctioned
>US backtracks, it's not unsafe any more
They definitely fuckin tried to establish a monopoly and make local shit. Megacorps are such a stain on the world.
>>101690462
that's not worrying at all
this is why local models are important
>>101690426how else to find gems?
>>101690505The EU law does nothing.
>>101690461They also took part of the name. Basically trying to leverage their reputation, like some scammer.
>>101690245
Yeah, a quick look at the docs suggests it's per user from the beginning of the prompt. Cool idea. I am curious, though, whether it works for OR - I doubt they pass through some special IDs to any provider for user identification.
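For anyone wondering how "per user from the beginning of the prompt" might work, here's a toy sketch (my assumption of the mechanism, not DeepSeek's actual implementation): cached state is keyed by a hash of the prompt prefix, and a lookup walks back from the full prompt to find the longest cached prefix, so only the suffix needs recomputing on a hit.

```python
import hashlib

class PrefixCache:
    """Toy prompt-prefix cache: stores computed state keyed by a hash
    of the prompt prefix. Real systems would chunk/block this instead
    of hashing every possible prefix length."""

    def __init__(self):
        self.store = {}  # prefix hash -> cached "state"

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def put(self, prompt, state):
        self.store[self._key(prompt)] = state

    def lookup(self, prompt):
        # Walk back from the full prompt, looking for the longest cached prefix.
        for end in range(len(prompt), 0, -1):
            state = self.store.get(self._key(prompt[:end]))
            if state is not None:
                return prompt[:end], state  # cache hit: only the suffix needs compute
        return "", None  # cache miss: the full prompt must be processed

cache = PrefixCache()
cache.put("System: you are helpful.\nUser:", "kv-state-1")
hit, state = cache.lookup("System: you are helpful.\nUser: hello")
```

Billing on "actual cache hits" then just means counting the tokens covered by `hit` versus the rest.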
>>101682019
The code-stealing tranny is back, digging his claws into another project and claiming 100x performance improvements while rebranding llama.cpp.
Original theft: https://rentry.org/Jarted
>>101690429Insider at open ai here. They are planning to do a fake demo to reignite the hype. But even if it's fake now it won't be later so it's not really lying.
>>101690505
Ironically, local has never done better. So far the only things where we are significantly behind are an audio-2-audio model and whatever strawberry will be, though I'm pretty sure the second one will have an open-source alternative way faster than the first.
>>101690541It's amazing how many projects are stealing llama.cpp stuff.
>>101690352>>101690381>>101690419I've been saying 1.2, but it's actually 1.6 that I was comparing it to.I don't even know if there's a 1.2.Gonna try 1.5.
>>101690595>>101690595>>101690595>>101690595It's time for a split
>>101690429kneel before Zuck
>>101690376>simple minimalist page>no easter egg in the sourceboooooring
>>101690616wtf is a sao model
>>101682183>>101682035What's this new model that looks awesome?
>>101690708Local models, but good.
not surprised that the guy obsessed with sao is also the miku blacked spammer
>>101690747petra is a sao fan.
>>101690747Can't take a little competition, coalburner?
>>101690777we are so back
>>101690777o shithope I can run it on 1x 4090
>>101690795>1x 4090oh no no no no
>>101690767Go to the Anthracite org on HF. See who's part of it. Like, look really hard.
>>101690823kill list
>Anthracite, also known as hard coal and black coal, is a hard, compact variety of coal that has a submetallic lustre. It has the highest carbon content, the fewest impurities, and the highest energy density of all types of coal and is the highest ranking of coals.
what did they mean by this
https://github.com/leejet/stable-diffusion.cpp
Will this finally become relevant with flux being 12B?
>>101690860Hate us cause they ain't us
>>101690451The safety angle is just to get government money to 'protect' everyone from other models, as well as limiting his competition through regulation.
>>101690883Sao is part of Anthracite. We are all coalburners. We all find gems.
>>101690886What is gem, but coal under pressure?
>>101690928le gem amirite lads?
>>101690823
Undi and Ikari, but not Drummer? And nothingisreal is obviously too much of an outsider. Explains why the attacks are mostly focused on the latter two.
>>101690376>https://anthra.site/
I transheart anthracite
Baitie-kun, have you migrated from /aicg/ to /lmg/?
>>101690541>imageIsn't it a waste of time trying to make it better on CPUs when people can just get GPUs and GPUs are better for it? It's built for GPUs
please, which model and UI should i use to generate images from the chat?
>>101690795ur gud -> >>101677660
>>101691019
>Isn't it a waste of time trying to make it better on CPUs when people can just get GPUs and GPUs are better for it?
The issue is that gpus are too expensive. If we didn't have a monopoly, we would have 128gb cards for $500 and nobody would have to bother with cpus, but due to nvidia's greed the best we have at that price point are 24gb cards.
My skin shimmers with iridescent hues of pink and purple, my eyes shine with an otherworldly luminescence, and a mischievous grin spreads across my face.
>>101690740Unironically
>>101691019
Tell that to the jan.ai fags who made their software unavailable for people without AVX2 in a futile attempt to speed up CPU inference. Imagine being hard-bottlenecked by the CPU for a task meant for GPUs. I'm out of luck.
>>101690905The Grifters United organization, no thanks.
The 'tune cabal?
Wasn't Pygmalion already a company?
>>101691102>If we didn't have a monopoly,nigger
>>101690985
A visage of stone, a heart of coal,
His bellow echoes, a story untold.
A passion for power, a love so profound,
In the depths of the earth, his treasure is found.

His eyes wide with fervor, his beard like a storm,
He stands for the darkness, a powerful form.
With a voice that could shake the very ground,
He proclaims his allegiance, with a roaring sound.

His heart, a red ember, a symbol so bright,
A testament to his fervor, his burning delight.
For the blackest of treasures, he holds it so dear,
A love for the coal, that will conquer all fear.
>>101691187nigger
>>101691227coal digger
>>101691156Again, Hate us cuz you ain't us cuh
>>101691182This isn't Pyg, only person I can see in it who's also part of Pyg is Alpindale
>>101691244coal digger
>>101691250Go back to attacking Celeste and Drummer, miku. They pay you for that.
>>101691292exsqueeze me xir?! HOW DARE YOU NOT REFER TO HIM AS **THE** ALPINDALE. YOU BETTER APOLOGIZE RIGHT NOW XIR
I have an RTX 4080, and Windows Task Manager's Performance tab tells me I have 16.0GB of Dedicated GPU Memory, 31.9GB of Shared GPU Memory, and 47.9GB of GPU Memory. Which one is my size limit for running a model? I was considering trying out the new FLUX.1 model.
>https://huggingface.co/black-forest-labs/FLUX.1-schnell
I've used the GGUF VRAM Calculator in the OP; I think I have 16GB of RAM? Apparently I need 32GB of GPU RAM to run FLUX.1. Which memory size is my size limit for local models?
>>101691306Hi Drummer... go back to slopping
>>101691182alpin needs to pay for her gender transition surgery, please understand
>>101691320*47.9GB of Total memory
>>101691344XIR XIR XIR!!!! PLEASE UNDERSTAND AND REFER TO HIM AS ***THE*** T - H - E ALPINDALE XIR
>>101691320>Which memory size is my size limit for local models?16.0GB
>>101691320You have no idea what VRAM and RAM is. Please stop using shit in Task manager,
So the niggers who made Magnum made an org and it's called coalAnything else I should know?
>>101691320
Dedicated memory is the amount on the GPU; Shared allows some 50% of motherboard memory to be used for graphics, but is slower and undesirable; GPU Memory is the total of both. Ideally you only ever want to be using Dedicated Memory for inference, to keep all the data the model needs in VRAM, which is much quicker than system RAM.
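A back-of-the-envelope way to check that yourself: weights take roughly params × bits-per-weight / 8 bytes, plus some allowance for KV cache and compute buffers. A minimal sketch (the 2 GB overhead figure is a guess, real usage varies with context length and backend):

```python
def fits_in_vram(n_params_b, bits_per_weight, dedicated_gb, overhead_gb=2.0):
    """Rough fit check: weight size plus a flat allowance for
    KV cache and compute buffers (overhead_gb is an assumption)."""
    weights_gb = n_params_b * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weights_gb + overhead_gb <= dedicated_gb

# A 12B model at ~4.5 bits/weight on a 16 GB card: weights ~6.75 GB, fits.
small_quant_fits = fits_in_vram(12, 4.5, 16.0)
# The same 12B at fp16: weights 24 GB, spills into shared memory.
fp16_fits = fits_in_vram(12, 16, 16.0)
```

Anything that doesn't fit gets pushed into Shared memory and crawls, which is why the anon above says to stay inside Dedicated.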
>>101691430I am currently riding tek's 2 inch indian cock
Hi anons, just asking in case someone has tried something like that. When you handwrite an example and want the model to use it as reference, should I just copypaste it into my message and tell it to write in a similar manner, or would describing what it has to do, then replacing the bot's reply with my example and telling it to keep going like this, have a better effect?
>>101691409>>101691432Ty bros
>>101691430i`m a barbie girl, in a barbie worlddd life in plastic is fantastic, you can cut my hair, undress me anywhere - imagination, life is my creation!~
Mistral Large settings?
>>101691563Neutralize, then add some minp if you'd like.
>>101691621Also 0.2 smoothing because of the characteristic MistralAI overconfidence for every token.
the 'site marches on in search of new models to gem
>>101691668NTA but how do I do that
>>101691713You set smoothing to 0.2
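For the anon asking what that 0.2 actually does: one common formulation of the quadratic "smoothing factor" transform pulls every logit toward the top logit with a quadratic penalty, which compresses overconfident gaps before sampling. A sketch in plain Python (this formula is an assumption about your backend; check its actual sampler code):

```python
import math

def smooth_logits(logits, smoothing_factor=0.2):
    """Quadratic smoothing: each logit is replaced by
    top - factor * (logit - top)^2, flattening overconfident gaps."""
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]

def softmax(logits):
    # Numerically stable softmax over a plain list.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

raw = [5.0, 2.0, 1.0]        # an overconfident top token, as Mistral tends to produce
p_raw = softmax(raw)          # top token dominates
p_sm = softmax(smooth_logits(raw))  # gaps compressed, distribution flatter
```

With factor 0.2 the top token keeps its rank but loses some probability mass to the runners-up, which is the "fixes MistralAI overconfidence" effect.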
do we have multiple schizo anons here or is it all petra?
>>101691696>sponge
>>101691668>use meme sampler, it will help, for sure!
I have two questions. When looking at the FAQ for gpu requirements, I notice 'precision' in 4-bit, 8-bit, and 16-bit, but I don't see an explanation of what that means in the context of LLMs. I understand the idea of 7B and 13B models, but the precision has me confused.
Also, I'm looking to maybe upgrade my GPU. Is a 3090 good for text generation? 24gb of VRAM would be a big upgrade over my current setup, but the price point is high enough that I'd feel bad if it became obsolete in the next little while.
is it just me or are system prompts absolute useless memes
>>101691819
>When looking at the FAQ for gpu requirements, I notice 'precision' in 4-bit, 8-bit, and 16-bit but I don't see an explanation of what that means in the context of LLMs
Most LLMs are trained in f16 precision, but because of that they take a lot of memory. People figured out that they can quantize the weights to save a lot of space for a small quality reduction. Generally quantization hurts smaller models more than bigger ones; I wouldn't use anything below q8 for 7-8B models, q6 for 12-13B models, and q4 for bigger models, although some anons say that q2-3 on bigger models isn't that bad (but I think they lie).
>Is a 3090 good for text generation?
Yeah, the 3090 is quite good in $ per GB.
>>101691668Why do you need to use smoothing? If you want to have another go just increase the temp, that's why you have the little bit of minp in case you want to do that.
>>101691940top-p is better than meeme min-p
>>101691940because anon is shilling his own sampler that only adds bloat to postprocessing
>>101686705>you must pay my jew master! you must bootlick! slit your wrists.
>>101687198>technology board>>>animehello? retard?
>>101691933>3090You forgot to add "but you'll need several of them" given the requirements you provide for the quants.
>>101692018I don't think bigger models are worth buying multiple GPUs, LLMs are still pretty bad, regardless of size. 24B is enough to comfortably run smaller language models, diffusion models and so on.
>>101691969Top p can work too, I just happen to use minp. Just something simple to trim some tokens and the temp, and that's all you need was my main point.
>>101691728where is that option
>>101686705very cute uniform, it would be a pity if I had an ugly bastard license
Hi all, Drummer here...
>>101689595
>>101689670
>>101689962
I'm sorry, but those are not meant for release. If you don't see a description, then it doesn't count and you're not supposed to mind it, especially if it's not under my account (TheDrummer). Tuners upload their test models publicly all the time, either for accessibility or transparency.
I've even privated the safetensors so that the quanters don't make it worse by creating even more mirrors of it that I have no control over. Would it be better if I placed them in an org named BeaverTest to make that clear?
>>101691933Thank you. I still do not understand what 'precision' actually means in this context though. Is it how close to the prompt the response is? And is the quantization a setting that I change on my end or is it a selection I make when downloading the model itself?
>>101692105tavern or ooba or llama
>>101692209A model is a bunch of numbers. Like 1.812347123972397. A less precise version of this number would be 1.8. A model full of numbers like 1.8 isn't as good as the one with numbers like 1.812347123972397, but it's smaller and easier to fit in less memory.
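That rounding can be sketched as uniform quantization: snap each number to the nearest multiple of a step size, where a coarser step means fewer bits but more error. A toy example (real quant formats store scaled integers per block, not decimal rounding):

```python
def quantize(x, step):
    """Snap x to the nearest multiple of step (coarser step = less precision)."""
    return round(x / step) * step

w = 1.812347123972397
coarse = quantize(w, 0.1)     # ~1.8: cheap to store, big error
fine = quantize(w, 0.0001)    # ~1.8123: closer to the original
```

The whole game of quantization is picking a step coarse enough to save memory but fine enough that the model's outputs barely change.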
>>101692239>llamaIs it actually in llamacpp? I didn't see a command line option in the list for it. What's the flag?
For the fellow users of Fimbulvetr and Typhon Mistral: know that mini-magnum-12b-v1.1.Q6_K is even better. Fast, imaginative, descriptive, follows the prompt well, long context, top notch.
>>101692289>>101692289>>101692289
>>101692301And which of those two should be the one set to 0.2
>>101692144
>BeaverTest
Why not name it something like TestNotForRelease, TestNotReadyForUse, or whatever?
>>101692263
I see. And I imagine that precision matters for things like properly identifying concepts? So a less precise model may have fewer identifiers attached to a given word? For example, [DOG] might be precisely identified as (4 legs) (snout) (fur) (tail) (snarling) (golden retriever) (dalmatian) (collar) etc., while a less precise [DOG] might just include (4 legs) (snout) (fur) (tail)?
Sorry if this is a retarded question. I am still trying to wrap my head around how these models work.
>>101692209
Do you know how neural networks are made from weights? LLMs have billions of them. You can lower their precision to save space, for example:
1.4329324553312 - weight at high precision
1.4329 - weight at low precision
It obviously changes the calculations a bit, but actually not that much on high quants.
>>101692393
Probably. Associations between concepts that occur less often in the dataset may have a smaller influence on the weights and thus be more likely to be muddied by lower precision.
>>101692393
No, that's not how it works. Basically, a neural network changes words to numbers, does a lot of calculations inside, and on the last layer we choose the neuron (which represents a word, or rather a token, but let's not complicate things) that was activated the most (represented by a %). So if you have the sentence:
>The best friend of a human is a
and let's say we only have 3 neurons (cat, dog, cow), the activation in the last layer may look like:
Cat - 15%
Dog - 83%
Cow - 2%
Now, because we quantized the model, lowering its precision, the calculations inside the model will have a bigger error, for example:
1.342 * 0.491 = 0.658922
1.34 * 0.49 = 0.6566
Notice how, despite this multiplication being between the same weights (1.342 and 0.491), the result is slightly different due to the lower precision. These errors can influence the results in the last layer, and we can get something like this instead:
Cat - 19%
Dog - 78%
Cow - 3%
That doesn't change the picked word here, but in some cases it can. The stronger the quantization, the bigger the difference in the calculations and the bigger the chance that the model will choose the wrong token at the end.
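You can run that arithmetic directly. A toy sketch using the 1.342 * 0.491 pair from above (the other weight/activation pairs are made up for illustration): compute "logits" at full and at rounded precision, push both through softmax, and watch the percentages shift while the winning token stays the same.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical (weight, activation) pairs feeding final-layer neurons (cat, dog, cow).
pairs = [(1.342, 0.491), (2.173, 0.768), (0.412, 0.127)]

full = [w * a for w, a in pairs]                       # full precision products
low = [round(w, 2) * round(a, 2) for w, a in pairs]    # rounded to 2 decimals

p_full = softmax(full)
p_low = softmax(low)
```

Here the quantization error only nudges the probabilities; with enough rounding, or an unluckier set of weights, the argmax itself can flip, which is the "wrong token" failure mode described above.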
>>101692644 (me)
>The bigger quantization
And by that I mean stronger quantization (like q2, q3, q4) - the lower the number next to the q, the bigger the precision loss.
>>101692644I see. Thank you very much for the example, it was very helpful!
>>101692644Throw sampler gymnastics into the mix and it'll change significantly. All it takes is one single bad token to poison the context
>>101693002there is a lot you could add to that, I generalized and dumbed it down as much as I could, otherwise I would need several posts to explain every single detail that adds to the equation