/lmg/ - a general dedicated to the discussion and development of local language models.

Evil Teto Edition

Previous threads: >>107717246 & >>107709248

►News
>(12/31) Qwen-Image-2512 released: https://hf.co/Qwen/Qwen-Image-2512
>(12/29) HY-Motion 1.0 text-to-3D human motion generation models released: https://hf.co/tencent/HY-Motion-1.0
>(12/29) WeDLM-8B-Instruct diffusion language model released: https://hf.co/tencent/WeDLM-8B-Instruct
>(12/29) Llama-3.3-8B-Instruct weights leaked: https://hf.co/allura-forge/Llama-3.3-8B-Instruct
>(12/26) MiniMax-M2.1 released: https://minimax.io/news/minimax-m21
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>107722977
>Evil Teto Edition
I can see where you took the inspiration.
>>107722977
Can you use different Nvidia GPUs together with CUDA?
>>107723052
Yes.
>>107723052
of course not
>>107722992
Elarabros...
>>107723052
Sometimes
Is Ozone here to stay? I had totally forgotten it existed before AI. I remember being a kid when there was this huge scare about the ozone hole and global warming.
>>107722992
I get much better results when it randomly pulls from a list of substitutions.
>>107723052
As long as they are newer than Turing.
>>107723059
>>107723060
>>107723067
Guys, pls. Like, if I have a 16GB P5000 in an old workstation, would it make sense to add a 12GB 3060 Ti that's just gathering dust?
>>107723108
No. CUDA support is deprecated for Pascal.
>>107723108
Legit answer: no, because your old shit won't have drivers compatible with the new one.
>>107723102
Ah, shame.
>>107723082
Ozone, you say? You want the whole deal?
>>107723114
Uh, it still supports CUDA 12.
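For what it's worth, llama.cpp's CUDA backend does run across mixed GPU generations, and its `--tensor-split` flag takes per-GPU proportions. A toy sketch, assuming you simply weight by VRAM (the real flag accepts arbitrary ratios and normalizes them):

```python
def tensor_split(vram_gb):
    """Proportional per-GPU weights, in the style of llama.cpp's
    --tensor-split flag (which normalizes whatever ratios you pass)."""
    total = sum(vram_gb)
    return [round(v / total, 2) for v in vram_gb]

# e.g. a 16GB card plus a 12GB card -> roughly "--tensor-split 0.57,0.43"
print(tensor_split([16, 12]))
```

In practice you can just pass the raw VRAM numbers (`--tensor-split 16,12`) and let llama.cpp normalize them; the helper is only to show what the ratio means.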
>>107723133
Why even ask if you won't listen to the answers?
Does someone know a local uncensored chatbot model in the 12-24B range for a 5060 Ti 16GB?
>>107723152
Drummer.
So guys. We all bitch about slop and so on. But am I the only one who really likes talking to characters roleplayed by AI? Once you get past the shitty alignment issues, I really get the feeling that they are more interesting than most real people. Also, to be absolutely clear, I don't use any worthless shittunes.
>>107723152
Go for Nemo.
>>107723162
No. They're vapid and they all have one of like three personalities.
>>107723162
If you made them or selected them according to your interests, it's obvious you would find them interesting.
>>107723176
Too dumb.
>>107723180
Funny thing is, I didn't. And when I asked it to RP some waifus I liked... I actually didn't enjoy those as much as random waifus.
>>107723186
Well, it's the best you're going to get with that hardware.
>>107723146
Sorry, I'm dumb and incompetent.
>>107723212
>>107723118
►Recent Highlights from the Previous Thread: >>107717246

--Multimodal AI progress disparity: image/video vs text generation challenges:
>107721240 >107721272 >107721289 >107721382 >107721399 >107721415 >107721644 >107721489 >107721518 >107721552 >107721525 >107721777 >107721599
--Qwen-Image 2512 model release and developmental journey:
>107719430 >107719475 >107719500 >107719475 >107719481 >107719929 >107720284 >107720309 >107720339
--Deepseek model quant compatibility and bug troubleshooting:
>107720142 >107720169 >107720178 >107720214 >107720191 >107720279
--Quantization benchmarking methods for language models:
>107718638 >107718682
--Solar-Open-100B model release and community interest in uncensored variants:
>107719372 >107719424 >107719411
--New 500b MOE model announced, GGUF support questioned:
>107720510 >107720553 >107720624
--AI startup strategies in China: breakthrough models vs short-term gains:
>107722114 >107722383 >107722436 >107722461 >107722531
--96GB VRAM optimization strategies for large models:
>107717404 >107717410 >107717414 >107717490 >107720591
--Multilingual model VAETKI-112B-A10B with 112.2B parameters announced:
>107719493
--Exploring Hunyuan motion 1.0 with UE5 integration and VRAM needs:
>107718150 >107718171 >107718298
--Google's cautious approach to releasing powerful open AI models vs open-source competition:
>107718631 >107718739 >107718769 >107718982 >107718986
--Moonshot AI's K3 model scaling ambitions and market positioning:
>107722422 >107722488 >107722536 >107722906
--K-EXAONE-236B-A23B model announcement on Hugging Face:
>107719396
--Miku (free space):
>107717575 >107717643 >107718169 >107719149 >107719481 >107719742 >107719929 >107720110 >107722205 >107722934

►Recent Highlight Posts from the Previous Thread: >>107717250

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107723152
It is scientifically impossible to make a model better than Nemo in that single-GPU range now. Everything is safety- and ScaleAI-maxxed now. You would need a radical shift in AI culture where it is suddenly OK to make models for coomers.
>>107723227
Why are they kissing? They are both girls.
>>107723218
Shame, but thanks.
>>107723152
https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
Get Q8.
>>107723176
>>107723233
The recommended models list has Mistral Small over Nemo; does that still work with 16GB?
>>107723252
Better than just Nemo?
>>107723284
No.
>>107723284
Yes.
>>107723284
Maybe.
>>107723284
I don't know.
>>107723318
>>107723313
>>107723307
>>107723295
Goys...
>>107723327
It's hornier. It's definitely not smarter.
>>107723284
No.
>>107723252
Die, faggot.
>>107723227
Lovely pic, recapanon, tho Teto's raised and frankly masculine hand makes her seem dominant, which we all understand isn't how it goes down.
Happy New Year /lmg/
>>107723333
Nemo is already a horndog.
Don't forget to upgrade before it's too late.
>>107723371
>5090 for 5000
That sounds like a joke.
>>107723371
Fucking paperwork BS literally made me unable to upgrade; been waiting for my money for 6 months now.....
>>107723371
People already pay more than that.
>>107723240
>They are both girls.
Teto is a chimera.
>>107723352
Are you sure? This is one of the gens I got while going for >>107712939
>>107723382
But you have a few more pieces of the special sand.
>>107723233
Ministral 3 14B seems OK and not safemaxxed... when it works.
Unfortunately, it's generally retarded for the first few messages even at low temperature, and it just wants to italicize everything and use its own dialogue format unless you keep editing messages until it eventually gets it. Character adherence is generally not good either; it turns even shy girls into sluts. I'm not sure what went wrong when Mistral trained the model(s). Hopefully they'll fix the issues in the next version.
What if they just want to get rid of 50 series stock before 60 drops?
>>107723417
blo...
>>107723417
>before 60 drops?
Anon... there will be no 60, only RTX PRO.
>>107713630
I want to go make my own quants. Can you spoonfeed me the command you use for making them?
>>107723397
>inverted roles erotic
Which one feels right to you?
Also curious how they were made; what is model vs postprocess?
They're really cool; they convey a lot with a limited palette and precise geometry. Could easily be album covers.
Now bend over.
GLM lost.
>b-but lmarena doesn't matter
Cope. LMArena is the only benchmark all the big players care about. Chinkcels fucked up big time with 4.7, just like they fucked up with DS 3.1.
>>107723382
Why is it outside your system?
>>107723544
Based bharati.
>>107723547
Flipping for profits.
>>107723544
LMArena is people who can't afford to run inference anywhere else. They do one drive-by question and run. If the model yaps lots of nonsense, the thirdies score it high.
Is this a good use of AI?
https://litter.catbox.moe/qeoazf77nhay7bx4.mp4
>>107723186
How much RAM do you have? You could run GLM 4.5 Air slowly if you have an absolute assload of system RAM. Gemma is much smarter at 27B (and way easier to run; it should run great), but it's safetycucked.
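Rough back-of-the-envelope for the "assload of system RAM" route; a sketch assuming weight size ≈ params × bits-per-weight / 8, ignoring KV cache and context growth, so treat the answer as optimistic:

```python
def fits(params_b, bpw, vram_gb, ram_gb, overhead_gb=4.0):
    """Very rough check: do quantized weights plus a fixed overhead
    fit in combined VRAM + system RAM? Ignores KV-cache growth."""
    size_gb = params_b * bpw / 8
    return size_gb + overhead_gb <= vram_gb + ram_gb

# GLM 4.5 Air is ~106B params; at ~4.5 bpw on 16GB VRAM + 64GB RAM
print(fits(106, 4.5, 16, 64))
```

If it fits only by spilling heavily into system RAM, expect the "slowly" part: token speed is roughly bounded by how fast the CPU side can stream the offloaded weights.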
>>107723233
This is the truth of the matter, sadly. There are barely any mid-size models out there anymore; they don't want people having access to anything that isn't either entirely unfeasible to run or braindead retarded.
>>107723382
This nigga does furry RP for sure. They're all millionaires for some reason.
>>107723517
Oh hey, that's also my gen.
>Which one feels right to you?
I'm sure they switch.
>Also curious how they were made what is model vs postprocess?
Just model.
noob vpred 1.0
(flat colors:1.2), silhouette, black background, red body, aqua body
outline in neg
>>107723574
>is this a good use
Better than some, because it's funny to witness the seethe.
>>107723371
Already way too poor to buy a GPU; I have to use runpod and rent one for $0.40/hour. Bonus is the code becomes deployable and I get to add Docker containers to my resume.
Software is done for in the next year or two; I'm switching to learning model design. We should probably have a thread for model design discussions--once we actually have enough local skill to do it.
>>107723574
This reminds me of the good old "the internet is for porn" machinima song. The world revolves around pussy; what can you do?
>>107723612
>Model design
Do we even have access to enough datasets to do any sort of real training? We've got Books3 and that's about it; "Open"AI made sure any others were annihilated or totally closed off.
>>107723595
Based.
>>107723574
Behold, the true face of humanity!
>>107723574
Let them SEETHE.
>>107723544
Yes sir, llama4 very good in the arena.
>>107723409
>{{char}} has giant tits, plump lips, a fat ass and likes giving head and swallowing cum
(700 tokens describing her underpants and private parts)
>{{char}} is shy
This is your card.
Nemo is like a retarded, perverted 80-year-old man with dementia. Use Mistral Small instead.
>>107723574
What the fuck is this shit at the end of the video?
>>107723689
>>107723633
There are a lot of datasets these days, but for something competing with OpenAI I don't really know. The skills imparted from improving existing models may be very valuable.
>>107723574
At the very least they should be categorized rating:g,s,q+ for user experience.
>>107723680
I do wish we had gotten the pre-release versions, not "Maverick Experimental", which has some safety baked in.
>>107723707
>There are a lot
Are there? Are you including synthetic ones? Because those won't help the problem; they're what created it in the first place. We need high-quality human data.
>>107723371
Damn, if I cared more about imgen/videogen I might've bought a spare 5090 just in case.
>>107723574
>>107723720
Man... so sloppy... I'd honestly take a Mixtral response over this any day.
>>107723574
No.
There's a gorillion pictures and videos of vanilla slop. "AI" ought to be used for niche fetishes where the amount of available material is low, especially if you try to find something that contains multiple of them at the same time.
>>107722977
>wednesday
>>107723755
This man's got a point.
>>107719372
uohhhhhhh! brother's soft round belly! erotic! ToT
>>107723684
Even after removing pretty much anything vaguely sex-related, Ministral 3 remains easily triggered compared to other official instruct models. A 3-word card is not a realistic use case.
>>107723755
Truth fucking nuke. Let me use it to make stomach-growling content; it's not like anyone else is gonna.
My brain is happy with 4.7. My penis is not sore and therefore disappointed.
>>107723574
Porn, porn, fat porn, gay porn, porn, porn, porn, political joke, big-ass porn, fat porn, gay porn, meme, porn, lesbian porn, random guy, giant goblin at the stadium, fat porn, political joke, gay porn, 2 random guys, porn, nazi porn, porn, porn, sportsball.
Pretty representative of what the people actually want.
>>107723755
The point is they are editing other people's pictures, uploaded to X, without consent.
>>107723574
Cherry-picked, blergh. I wanna see from x.ai what the top referenced gens are over a period.
>>107723668
>gooners DIYing what they want from the performers in public comments
Wow, that's gotta be demoralising; hope they realise and find a better path. Thx for all the training data, ladies.
>>107723768
Is this the sign of a truly depraved mind or a highly censored one? Hard to tell.
Also, what about EXAONE and that 500B?
>>107723755
TRVKE
I only use AI to generate footjob porn of anime characters wearing ribbed or frilled socks.
>>107723828
>Is this the sign of a truly depraved mind or a highly censored one? Hard to tell.
It's my personal opinion that a lightly censored (not lobotomized) model is better than any entirely uncensored one. It'll take things in unexpected lurid directions (like belly licking to avoid cock in this one), add tension and buildup as it hesitates on the lewd stuff before giving in, etc.
>>107723835
>add tension and buildup as it hesitates on the lewd stuff
Oh, fuck you. You reminded me of 2024 and all those models that were beating around the bush for 10k tokens.
>>107723689
The video at the top is actual leaked footage from an AI lab funded by DARPA. The first time that video was posted was before 2021.
>>107723828
>Footfag
>Stockings
Not obscure at all; get out, poser.
>>107723858
>>107723849
Well, that's why I said lightly censored. Just a little hesitation is good: enough that it'll go for sex without doing the 2024 cloud-model shit you mentioned, but not so little that it just hops immediately into SEEEEEEEEEX AHHHHH SEXO DA.
>>107723858
Yeah, amputee tentacle porn is the bare minimum to qualify.
>>107723877
If ribbed socks are all you can get off on, then I GUESS that's pretty obscure and you can join the club... I GUESS.
>>107723888
Feet are literally the most common fetish, cmon, anon.
>>107723574
I kneel, Elon.
>>107723808
>Also what about exaone and that 500B?
I only do models I can run in llama.cpp. Someone very quickly added Open Solar, so I tried it: https://github.com/ggml-org/llama.cpp/pull/18511
>>107723835
Picking cock gets you pic related, so maybe there's some merit to this.
>>107723751
Yes, they were quite sloppy, but they were fun models and didn't seem to refuse anything that passed through LMArena's moderation. I would have liked to test them with a custom prompt.
>Uses Q8_0 for embed and output weights.
So do I use Q3_K_XL over Q4_K_S?
Can someone spoonfeed me?
>>107723905
Is this text completion / not chat-tuned? I didn't figure there were many models like that left.
>>107723951
Pretty much every model except for gpt-oss works just fine with text completion.
>>107723971
Yeah, it's more about not being assistant-slopped, I guess. How do you run completion inference? What's your setup?
>>107724007
Mikupad; it's in the OP.
>>107723921
>embed and output weights.
Those are like 2GB at Q8.
>>107723921
>Q3_K_XL over Q4_K_S
Doesn't really matter.
Bigger on disk / in memory = generally better output.
Some quants have performance impacts; YMMV.
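Ballpark for the Q3_K_XL-vs-Q4_K_S question. The bits-per-weight figures below are rough assumptions, not exact GGUF math (real sizes depend on the per-tensor mix, such as the Q8_0 embed/output weights), so check the actual file sizes on the repo page:

```python
# approximate average bits-per-weight per quant type (assumed values)
BPW = {"Q3_K_XL": 3.8, "Q4_K_S": 4.6, "Q8_0": 8.5}

def approx_size_gb(params_b, quant):
    """Weights-only size estimate: params * bits-per-weight / 8."""
    return round(params_b * BPW[quant] / 8, 1)

for q in BPW:                      # a 12B model like Nemo
    print(q, approx_size_gb(12, q), "GB")
```

The point of the thread's rule of thumb falls out directly: the bigger file is the higher-bpw quant, and within the same family, higher bpw generally means better output.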
>>107724106
I kept seeing that unsloth fucks their quants up. So I went to bartowski, but he's got so many variants it's difficult to know which to prioritize.
I'm having trouble finding models that can consistently create 2560x1440 wallpapers; anyone here have any idea?
>>107724148
Gen smaller and then use an AI upscaler.
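The naive baseline for "gen smaller, then upscale" (e.g. gen at 1280x720, then 2x to 2560x1440) is nearest-neighbour; an AI upscaler (ESRGAN-style) replaces this pixel repetition with synthesized detail. A stdlib-only toy on a 2D grid:

```python
def upscale_nn(pixels, factor):
    """Nearest-neighbour upscale of a 2D pixel grid: each source
    pixel becomes a factor x factor block. An AI upscaler would
    synthesize new detail here instead of repeating pixels."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]

print(upscale_nn([[1, 2], [3, 4]], 2))
```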
>>107724136
They do often have to reupload stuff, but most of it is because of the fucked-up chat templates. llama.cpp only supports a subset of Jinja features, so the official templates, meant to be used with Python, don't always work properly, especially when used with tool calls.
>>107724136
Just make your own, ffs.
>>107724136
>which to prioritize
Figure out the biggest thing you can run reasonably for your use case with your CPU/GPU/RAM config.
unsloth have shown themselves incompetent repeatedly; personally I run barty's GLM 4.7 I-quants.
How much RAM + VRAM you have is what matters.
>>107724169
>most of it is because of the fucked up chat templates.
>flashbacks to their few-megs quants
>>107724208
Can you give the command you use to make them? Anon's original problem was what type of quant is best. Sure, he can make his own quant, but he still doesn't know what makes for the best quant.
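For reference, the usual llama.cpp flow is two steps: `convert_hf_to_gguf.py` to get a full-precision GGUF, then `llama-quantize` to produce the quant type you want. A sketch that just builds the two commands (the paths are hypothetical; it assumes you run from a llama.cpp checkout with binaries built):

```python
from pathlib import Path

def quant_commands(model_dir, quant="Q4_K_M"):
    """Build the two llama.cpp commands: HF weights -> GGUF, then
    quantize. Paths assume a built llama.cpp checkout as cwd."""
    name = Path(model_dir).name
    f16 = f"{name}-f16.gguf"
    out = f"{name}-{quant}.gguf"
    return [
        ["python", "convert_hf_to_gguf.py", model_dir, "--outfile", f16],
        ["./llama-quantize", f16, out, quant],
    ]

for cmd in quant_commands("Mistral-Nemo-Instruct-2407"):
    print(" ".join(cmd))
```

Note that `llama-quantize` also accepts an `--imatrix` file; importance-matrix calibration is part of why quanters' low-bit uploads differ from naive conversions.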