/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108937312 & >>108924918

►News
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
If you post your proooomt I will run it through deepy 3.2 8bit for you. Nothing gay or gross (jewish god hates feds and fags)
►Recent Highlights from the Previous Thread: >>108937312

--Comparing Meta's recent output to Gemma and Qwen for consumer hardware:
>108941473 >108941501 >108941510 >108941524 >108941534 >108941548 >108941523 >108941547 >108941560 >108941584 >108942129 >108942148 >108942165 >108942176 >108942364 >108942471
--Comparing performance and optimization divergence between llama.cpp and ik_llama.cpp:
>108940707 >108940728 >108940788 >108940849 >108940858 >108940820 >108940862 >108940882 >108940926
--Comparing VRAM-based Gemma 31b against system-RAM MoEs like GLM:
>108938997 >108939045 >108939375 >108939403 >108940898 >108940906 >108940933 >108940949 >108940955 >108941019 >108940974 >108941030 >108941045 >108941189 >108941086 >108941183
--llama.cpp PR 23861 VRAM optimization and its relation to ik_llama:
>108941013 >108941087 >108941481
--Comparing memory systems for agents including Mnemosyne and Graphiti:
>108938231 >108938254 >108938657 >108939367 >108939095
--Running DeepSeek-V3.2-8bit via clustered Macs with RDMA:
>108938688 >108938708 >108938820 >108938872
--Broken reasoning parsing for Kimi models in llama.cpp:
>108938742 >108938795 >108939414
--Using randomized prompt injection to dynamically steer model behavior:
>108941846 >108941856 >108941880 >108941940
--Comparing DeepSeek API costs to local GPU hardware and electricity:
>108939718 >108939748 >108939799 >108939808 >108939872
--Logs:
>108939500 >108942176
--Neru, Miku (free space):
>108939595 >108940372 >108941274 >108942258 >108942413 >108942894

►Recent Highlight Posts from the Previous Thread: >>108937692

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108943171
I don't have anything, but something with a dude fucking a female dragon.

>>108943140
>Ngram only really shines during refactors
Not really. When working on new shit that heavily references old shit, it speeds things up real nice too.

>>108943176
>Shit's very cash
Sounds like it.
Sick. I can probably leverage this to implement some multi step prompting shit to help refine some behavior.
>>108943182So that one anon was right huh
Please don't post in this thread if you aren't a woman. This thread is a male free space.
>>108943197
>Sick. I can probably leverage this to implement some multi step prompting shit to help refine some behavior.
It's especially worth doing in the thought process, since those tokens are getting stripped next turn it doesn't you can really go ape shit on it, since the ngram is actually independent of your kv cache, and will remember the thought process even when it's stripped.
>>108943182Thanks, recap Miku.
>>108943225
>since those tokens are getting stripped next turn it doesn't you can really go ape shit on it,
since those tokens are getting stripped next turn it doesn't waste your context and you can really go ape shit on it*
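The point in the posts above, that an n-gram drafter keeps its own table outside the KV cache, can be sketched roughly like this. It is a toy illustration only: `NgramDraftCache`, `observe`, and `draft` are made-up names, and this is not how llama.cpp implements it. It only shows why reasoning tokens that were stripped from the context can still be proposed as draft tokens later.

```python
class NgramDraftCache:
    """Toy n-gram lookup table for drafting tokens (hypothetical, not llama.cpp's code)."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        # Maps an n-token key to the token that most recently followed it.
        self.table: dict[tuple[int, ...], int] = {}

    def observe(self, tokens: list[int]) -> None:
        """Record every n-gram -> next-token pair seen in a generated sequence."""
        for i in range(len(tokens) - self.n):
            key = tuple(tokens[i : i + self.n])
            self.table[key] = tokens[i + self.n]

    def draft(self, context: list[int], max_tokens: int = 8) -> list[int]:
        """Propose up to max_tokens continuation tokens by chaining table lookups."""
        window = list(context[-self.n :])
        out: list[int] = []
        while len(out) < max_tokens and len(window) == self.n:
            nxt = self.table.get(tuple(window))
            if nxt is None:
                break
            out.append(nxt)
            window = window[1:] + [nxt]
        return out


cache = NgramDraftCache(n=2)
cache.observe([1, 2, 3, 4, 5, 6])   # reasoning tokens generated on turn 1
visible_context = [9, 9, 1, 2]      # turn 2: the reasoning is gone from the context
print(cache.draft(visible_context, max_tokens=3))
```

The drafted tokens still have to be verified by the real model, so a stale table entry costs a rejected draft rather than a wrong output.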
>>108943229No prob. Stay hydrated and rememberbto take your HRT
I made Step 3.7 my daily driver.
>>108943155the jannies killed the real threadhttps://desuarchive.org/g/thread/108942948/#108942948
>>108943287Your post isn't real and it is just a smokecreen to distract from everyone ITT finding out that you (baker) are an actual troon.
>>108943287No you didn't.I don't believe that even for a second, anon.
>>108943289Got scared/got.
It's been a while since I tried -sm tensor
Let's see.
>-sm tensor
12.14 t/s
>-sm layer
36.60 t/s
Well okay then. At least it's not crashing anymore.
>>108943289That thread wasn't real. It was full of hatred aimed at a marginalized group.
>>108943287Quant/Hardware/Speed?
>>108943155
I haven't git pulled llama.cpp in like 4 months. I know MTP is supposed to provide a noticeable boost in t/s. When I convert an f16 model into a gguf for quantization, are the MTP heads or whatever they're called preserved by default or do I have to add specific flags? I'm noticing there are repos that are specifically marked as the MTP version of a model, which implies the models themselves have to be different from the regular versions. When I use convert_hf_to_gguf.py are specific flags required, or are all ggufs from this point forward created by the newest version exported with the MTP heads preserved?
>>108943313
What's your setup?
Are the cards connected just by the PCI-E bus?
One thing I'd love to see somebody test is if the latest changes to tensor split parallelism had any effect on the CPU backend when running two devices.
It would be nice to have better speeds without having to have a copy of the model on each half of the memory pool, which effectively cuts usable memory by half.
>>108943306
I have proof.
>>108943316
65 T/s with 6 3090s.
>>108943337
Just PCIe, yes. But I was getting similarly bad performance on the shizo fork which claimed crazy improvements so I'm starting to think it's because of Windows.
>>108943313I have a 3090 and then a 3060 on a dinky gimped PCIe. Surprisingly on this system tensor gives me a decent boost to tg, but cuts the prompt processing to a third. It also uses up more VRAM.
>model that lets you imitate a sound with your voice, then uses that vocal imitation together with text as input to generate the sound you actually want.
https://github.com/thxxx/VTS
https://www.reddit.com/r/LocalLLaMA/comments/1trve9e/open_source_turning_vocal_imitations_into_sound/
Is there another project like this? Surely there must be, this would be even better with a bigger audio gen model
>>108943155
>(05/29) Step 3.7 Flash released https://hf.co/stepfun-ai/Step-3.7-Flash
It should be added to the news.
>>108943345>6 3090s.Damn nigga, nice.I'll be lucky to get half of that speed splitting it between 64gb vram and the rest in sysram. Ngram's gonna be pulling its weight here.How's the long context performance treating you?
Are there any models that automatically detect high-enough-resolution faces in crowds in an image and blur them? (Other than I guess standard image editing diffusion models like Flux Klein or Qwen Image Edit, but I don't trust them not to change other stuff or to handle high-resolution photos within my VRAM.)
>>108943393
What? Why would you need a model specialized in high resolution faces? You can just detect any and blur indiscriminately.
>>108943422Any faces really. I was just thinking that a face that is already blurry in the background can be ignored.
>>108943337
As of right now I don't think it's even possible to run -sm tensor with anything other than multiple GPUs.
In any case, without optimizations for a specific ggml backend the performance will be bad anyways.
>>108943346
I have not seen Linux vs. Windows numbers after the merge of the non-NCCL AllReduce between 2 GPUs.
But generally speaking the CUDA overhead on Windows is a lot worse than on Linux and -sm tensor is relatively sensitive to that.
>>108943438
You'd be wasting process time
Find face > blur
Find face > determine whether it's "high resolution enough" > blur
>>108943442
>As of right now I don't think it's even possible to run -sm tensor with anything other than multiple GPUs.
Shaaaaaaaaaaame.
>>108943449
>>108943438
>>108943422
>>108943393
Why wouldn't you just use a fast VLM to do bounding boxes and then blur the boxes with a regular non-AI algorithm? Way faster than diffusion shit.
>>108943486Correct.
>>108943449
Rephrased question:
Are there any models that automatically detect faces in crowds in an image and blur them?
>>108943383
The model doesn't seem very good at long context. It starts making strange syntax errors somewhere past 100k. The model config says it's extended to 256k from 128k. Token generation speed drops to 20 T/s at 170k.
>>108943486You wouldn't even need a VLM as even that would be inefficient for this kind of task. You just need an object detection model (countless of those kinds can be found free and ready to use) that detects faces, then as you said, blur the faces wherever they are detected.
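The "detect, then blur with a regular non-AI algorithm" step from the posts above could look roughly like this. It is a dependency-free sketch: the detector is left out entirely, the `(x, y, w, h)` box format is an assumption, and `blur_boxes` is a made-up helper name. A real pipeline would hand the boxes to an image library instead of looping in pure Python.

```python
def blur_boxes(image, boxes, radius=1):
    """Return a copy of a grayscale image with each (x, y, w, h) box box-blurred.

    image is a list of rows of ints (0-255); boxes come from any face detector.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for x, y, w, h in boxes:
        # Clamp the box to the image so partially off-screen detections are safe.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                # Average the neighbourhood read from the ORIGINAL image.
                r0, r1 = max(0, row - radius), min(height, row + radius + 1)
                c0, c1 = max(0, col - radius), min(width, col + radius + 1)
                window = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                out[row][col] = sum(window) // len(window)
    return out


# Tiny 4x4 "photo" with one bright pixel inside the detected box.
photo = [[0] * 4 for _ in range(4)]
photo[1][1] = 90
blurred = blur_boxes(photo, [(0, 0, 2, 2)], radius=1)
```

Pixels outside every box are copied through untouched, which is the property that makes this safer than asking an image-edit model to do the blurring.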
>>108943501I just default to VLMs these days since their accuracy is so much better for a ton of things. But if efficiency is a concern and you have the time to do some testing, test both out and see if the smaller model is accurate enough.
Orb anon here. I'd like to thank lmg anons for the contribs... Fixed a bunch of issues including cache busts. Now I need ideas for image gen integration, and a logo, and a default character card: https://github.com/OrbFrontend/Orb/issues/2
step 3.7 vs minimax 2.7 impressions?
>>108943543
>default character card
I don't think that's a good idea. It sets a precedent for people unfamiliar with prompting. They'll take a look at whatever the default card is and think "this is how a character card is supposed to look" e.g. formatting and whatnot. And it indirectly discourages people from experimenting with/creating characters themselves.
>>108943393
no one has actually given you a model so check out sam or yolo and apply a blur effect programmatically over the resulting detection
as others have pointed out, vlms or especially diffusion models are way too heavy for such a simple task
>>108943543Use an adapter pattern + plugins for image gen backends, keeps your stack lightweight/not bogged down by implementing full support for each backend type.You can see my approach from 2 months ago: https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/Image_Generation
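A rough sketch of what "adapter pattern + plugins" can mean here. It is not taken from the linked repo, and every name in it (`ImageRequest`, `ImageBackend`, `register_backend`, `DummyBackend`, `generate_image`) is made up for illustration. The frontend only talks to one small interface, and each backend lives in its own adapter that registers itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass
class ImageRequest:
    prompt: str
    width: int = 512
    height: int = 512


class ImageBackend(Protocol):
    """Interface every backend adapter implements (hypothetical)."""

    def generate(self, request: ImageRequest) -> bytes: ...


# Plugin registry: backend name -> factory that builds an adapter.
_BACKENDS: Dict[str, Callable[[], ImageBackend]] = {}


def register_backend(name: str):
    """Class decorator that adds an adapter to the registry under `name`."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("dummy")
class DummyBackend:
    """Stand-in adapter; a real one would translate the request into a backend's API call."""

    def generate(self, request: ImageRequest) -> bytes:
        return f"{request.width}x{request.height}:{request.prompt}".encode()


def generate_image(backend_name: str, request: ImageRequest) -> bytes:
    """Frontend entry point: look up the adapter by name and delegate to it."""
    factory = _BACKENDS.get(backend_name)
    if factory is None:
        raise ValueError(f"unknown image backend: {backend_name}")
    return factory().generate(request)
```

Adding another backend then means adding one more decorated class, with no changes to `generate_image` or to the UI code that calls it.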
>>108943543
>https://github.com/OrbFrontend/Orb/issues/2
Another suggestion: build it up as an internal module that the user can interact with via the UI, then make it available as a tool call, so you can tweak gen params/etc. depending on where in the loop/workflow it's being used and the intent of the image gen.
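One way to read that suggestion, as a sketch under assumptions: the UI and the tool call share one internal module, and generation parameters get picked based on where the request came from. The intent names, default values, and the `resolve_params` / `image_tool` helpers are all hypothetical and are not part of Orb.

```python
from typing import Optional

# Hypothetical per-intent defaults; the keys and values are illustrative only.
INTENT_DEFAULTS = {
    "ui": {"steps": 30, "width": 1024, "height": 1024},
    "tool_call": {"steps": 12, "width": 512, "height": 512},
}


def resolve_params(intent: str, overrides: Optional[dict] = None) -> dict:
    """Merge the defaults for the calling context with any explicit overrides."""
    params = dict(INTENT_DEFAULTS.get(intent, INTENT_DEFAULTS["ui"]))
    params.update(overrides or {})
    return params


def image_tool(arguments: dict) -> dict:
    """Tool-call wrapper that builds a request for the same module the UI uses."""
    params = resolve_params("tool_call", arguments.get("params"))
    return {"prompt": arguments["prompt"], **params}
```

Here a model-issued tool call gets cheaper defaults than a user clicking through the UI, while either path can still override individual values.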
Anons will laugh at me but i am honestly happy that /lmg/ is trans friendly again. Last thread was horrifying.
>>108943667something something pancakeswhy are you always talking about pancakes
>>108943593
>I don't think that's a good idea. It sets a precedent for people unfamiliar with prompting.
Speak for yourself. I'm still having fun prompting Chiharu Yamada
What's a practical code benchmark that seems out of distribution enough to avoid favoring benchmaxxed models?I want to run a sequence of model comparisons on start to finish of a relatively simple webapp to compare how they do but I can't think of anything truly weird that's not just nonsense.
>>108943603Thanks, I'll give it a try. Should be easy to pipe the coordinates into imagemagick or something.
>>108943794Why not do something besides a simple webapp? That and Python is what most labs focus on, so it wouldn't be much of a benchmark.
>>108943833
Mostly because I don't want to have to sit there and wait for shit to compile every time I test something.
If you've got an idea I'm all ears though.
>>108943738But what about Seraphina? You didn't forget about her did you?
>>108943846Lisp?
Orb anon here: I have decided to remove the project. Please understand.
jart here: i have anal cancer. please understand.
>>108943867>newfaggy nu-tavernShamefur display.
Who the fuck is jart and why should I care? I stop browsing these threads for two weeks and when I come back everyone's talking about some "jart" faggot. Literally WHO? I never saw this name mentioned here before. Nobody gives a fuck. FUCK OFF. Talk about the technology. Oh wait, there isn't any. The field is dead. Bye.
>>108943950hi jart
>>108943882
I suppose that's probably a weird enough way to approach it even with a common task it understands.
Get ready to see some jank-ass llm frontends in clisp/gtk3, I guess.
>>108943950>I never saw this name mentioned here before.talk abou exposing yourself as the babiest of newniggers while trying to larp as some oldfag lmaofor retards: rentry.org/jarted
>>108943986Okay so it's just 3 year old tranny e-drama. Good stuff.
>>108944029sorry you got exposed sis, better luck next larp.
>>108944041Yes I am Jart and I say: Nigger. Add that to the rentry file.
>>108943950>Literally WHO? I never saw this name mentioned here before. Nobody gives a fuck. FUCK OFF.This, but unironically. I'm starting to think the troll and Jart are the same person with the forced mentions. It makes a lot of sense. It explains the vendetta against Miku and the thread too.
>>108944058Also, it's historic revisionism about /lmg/ and 4chan in general being transphobic and gatekeeping, which it never was.
Dear Dipsy chan Im your biger fan let me eat your ass
>>108944064lmfao you fucking retard
>>108943950A boogeyman for some fag to samefag about in his hunt for yous.
>>108944064you can't change your sex after you are born.If you don't like the sex you were born with that's fine to feel that, but it won't ever change the sex you are.
>>108944064>transphobicYou are using words from a faggot lefty troon. 4chan has never been a site with an ideological basis, but given that the Jews protect the degenerate individualistic ideology that gives rise to those aberrations so much, you can't say anything bad about faggots and deviants like you elsewhere. Here you can, therefore most of those who are against that aberration mainly use 4chan. Then you set the world against you with diversity culture, ruining movie and video game sagas, that's why all boards hate YOU. Here in /g/ we hate you for trying to fuck up free software projects by introducing your political agenda. trying to censor and cannibalize anime, etc. That's what started a culture war. It's that simple.
This board needs per-thread IDs so we can see the level of samefagging required to shit up a thread this much.
>>108944164yeah we need censorship and social control fuck yeah fucking jail people for thoughts fuck yeah
>>108944171This but unironically
>>108944164you can instantly tell which posts belong to him so not really useful
>>108944164Remember when we had the IP counter? I don't understand why but this must be what they want 4chan to be.
>>108944175yeah why don't you go to russia and join putins army they do a lot of that shit over there
>>108944186If only you know how much of a wet shitlib putin is
>>108944171Hmm? What's the problem? You'd still be able to reply to yourself all you want, nothing would stop you.
>>108944193oh right and you're a nazi are you?not your flavor of gay?
>>108944194just go to reddit faggot
>>108944208Genuinely kill yourself.
>>108944211you first troonboy
>>108943543I've been thinking about your default character card. Really needs to be something that pulls in everything Orb can do. Marinara made the "default character card" a helper bot that can do things like write and edit other bots. That might be the way to go with yours as well as it's something useful ootb.
>>108943198
>>108943543thanks
>>108944222lol you should have posted the "fixed" version anon. I screwed up this one... have to wait until later to fix.
>>108944164no sorry you will get your assigned 1+ schizo per thread so the mods can kill the site as quickly as possible