/g/ - Technology

File: 1729921508382316.jpg (230 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108937312 & >>108924918

►News
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
If you post your proooomt I will run it through deepy 3.2 8bit for you. Nothing gay or gross (jewish god hates feds and fags)
>>
File: 1728540035928801.jpg (774 KB, 1856x2464)
►Recent Highlights from the Previous Thread: >>108937312

--Comparing Meta's recent output to Gemma and Qwen for consumer hardware:
>108941473 >108941501 >108941510 >108941524 >108941534 >108941548 >108941523 >108941547 >108941560 >108941584 >108942129 >108942148 >108942165 >108942176 >108942364 >108942471
--Comparing performance and optimization divergence between llama.cpp and ik_llama.cpp:
>108940707 >108940728 >108940788 >108940849 >108940858 >108940820 >108940862 >108940882 >108940926
--Comparing VRAM-based Gemma 31b against system-RAM MoEs like GLM:
>108938997 >108939045 >108939375 >108939403 >108940898 >108940906 >108940933 >108940949 >108940955 >108941019 >108940974 >108941030 >108941045 >108941189 >108941086 >108941183
--llama.cpp PR 23861 VRAM optimization and its relation to ik_llama:
>108941013 >108941087 >108941481
--Comparing memory systems for agents including Mnemosyne and Graphiti:
>108938231 >108938254 >108938657 >108939367 >108939095
--Running DeepSeek-V3.2-8bit via clustered Macs with RDMA:
>108938688 >108938708 >108938820 >108938872
--Broken reasoning parsing for Kimi models in llama.cpp:
>108938742 >108938795 >108939414
--Using randomized prompt injection to dynamically steer model behavior:
>108941846 >108941856 >108941880 >108941940
--Comparing DeepSeek API costs to local GPU hardware and electricity:
>108939718 >108939748 >108939799 >108939808 >108939872
--Logs:
>108939500 >108942176
--Neru, Miku (free space):
>108939595 >108940372 >108941274 >108942258 >108942413 >108942894

►Recent Highlight Posts from the Previous Thread: >>108937692

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108943171
I don't have anything, but something with a dude fucking a female dragon.


>>108943140
>Ngram only really shines during refactors
Not really. When working on new shit that heavily references old shit, it speeds things up real nice too.

>>108943176
>Shit's very cash
Sounds like it.
Sick. I can probably leverage this to implement some multi step prompting shit to help refine some behavior.
>>
File: 00008-3844322418.png (1.09 MB, 1024x1024)
>>
>>108943182
So that one anon was right huh
>>
Please don't post in this thread if you aren't a woman. This thread is a male free space.
>>
>>108943197
>Sick. I can probably leverage this to implement some multi step prompting shit to help refine some behavior.
It's especially worth doing in the thought process, since those tokens are getting stripped next turn it doesn't you can really go ape shit on it, since the ngram is actually independent of your kv cache, and will remember the thought process even when its stripped.
>>
>>108943182
Thanks, recap Miku.
>>
>>108943225
>since those tokens are getting stripped next turn it doesn't you can really go ape shit on it,
since those tokens are getting stripped next turn it doesn't waste your context and you can really go ape shit on it*
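
Rough sketch of why that works, for anyone who hasn't looked at n-gram drafting. This is a toy lookup table, not llama.cpp's actual implementation, and `NgramDraftCache` plus everything in it is made up for illustration. The point is just that the table lives outside the context: feed it the thinking tokens once and it can still propose them as drafts after they've been stripped. The model still has to verify whatever gets drafted.

```python
class NgramDraftCache:
    """Toy n-gram lookup table used to propose draft tokens (hypothetical, for illustration)."""

    def __init__(self, n=3):
        self.n = n
        # Maps an n-token context to the token that most recently followed it.
        self.table = {}

    def observe(self, tokens):
        """Record every (n-gram -> next token) pair seen in a token sequence."""
        for i in range(len(tokens) - self.n):
            key = tuple(tokens[i:i + self.n])
            self.table[key] = tokens[i + self.n]

    def draft(self, context, max_tokens=8):
        """Propose up to max_tokens continuation tokens by chained lookup."""
        ctx = list(context)
        out = []
        while len(out) < max_tokens:
            nxt = self.table.get(tuple(ctx[-self.n:]))
            if nxt is None:
                break  # no known continuation, stop drafting
            out.append(nxt)
            ctx.append(nxt)
        return out


cache = NgramDraftCache(n=2)
# Words stand in for tokens. Pretend this was generated inside the thinking block.
thinking = "the plan is to rename foo to bar everywhere".split()
cache.observe(thinking)

# Next turn the thinking is gone from the visible context, but the cache still has it.
visible_context = "ok so the plan".split()
print(cache.draft(visible_context, max_tokens=4))
```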
>>
>>108943229
No prob. Stay hydrated and rememberbto take your HRT
>>
I made Step 3.7 my daily driver.
>>
>>108943155
the jannies killed the real thread
https://desuarchive.org/g/thread/108942948/#108942948
>>
>>108943287
Your post isn't real and it is just a smokecreen to distract from everyone ITT finding out that you (baker) are an actual troon.
>>
>>108943287
No you didn't.
I don't believe that even for a second, anon.
>>
>>108943289
Got scared/got.
>>
It's been a while since I tried -sm tensor
Let's see.

>-sm tensor
12.14 t/s
>-sm layer
36.60 t/s

Well okay then. At least it's not crashing anymore.
>>
>>108943289
That thread wasn't real. It was full of hatred aimed at a marginalized group.
>>
>>108943287
Quant/Hardware/Speed?
>>
File: 1756302997874953.jpg (58 KB, 719x631)
>>108943155
I haven't git pulled llama.cpp in like 4 months. I know MTP is supposed to provide a noticeable boost in t/s. When I convert an f16 model into a gguf for quantization, are the MTP heads or whatever they're called preserved by default or do I have to add specific flags? I'm noticing there are repos that are specifically marked as the MTP version of a model which implies the models themselves have to be different from the regular versions. When I use convert_hf_to_gguf.py are specific flags required or are all ggufs from this point forward created by the newest version exported with the MTP heads preserved?
>>
>>108943313
What's your setup?
Are the cards connected just by the PCI-E bus?
One thing I'd love to see somebody test is if the latest changes to tensor split parallelism had any effect on the CPU backend when running two devices.
It would be nice to have better speeds without having to have a copy of the model on each half of the memory pool, which effectively cuts usable memory by half.
>>
File: file.png (11 KB, 593x165)
>>108943306
I have proof.
>>108943316
65 T/s with 6 3090s.
>>
>>108943337
Just PCIe, yes. But I was getting similarly bad performance on the schizo fork which claimed crazy improvements so I'm starting to think it's because of Windows.
>>
>>108943313
I have a 3090 and then a 3060 on a dinky gimped PCIe. Surprisingly on this system tensor gives me a decent boost to tg, but cuts the prompt processing to a third. It also uses up more VRAM.
>>
>model that lets you imitate a sound with your voice, then uses that vocal imitation together with text as input to generate the sound you actually want.

https://github.com/thxxx/VTS
https://www.reddit.com/r/LocalLLaMA/comments/1trve9e/open_source_turning_vocal_imitations_into_sound/

Is there another project like this? Surely there must be, this would be even better with a bigger audio gen model
>>
>>108943155
>(05/29) Step 3.7 Flash released https://hf.co/stepfun-ai/Step-3.7-Flash
It should be added to the news.
>>
>>108943345
>6 3090s.
Damn nigga, nice.
I'll be lucky to get half of that speed splitting it between 64gb vram and the rest in sysram. Ngram's gonna be pulling its weight here.
How's the long context performance treating you?
>>
Are there any models that automatically detect high-enough-resolution faces in crowds in an image and blur them? (Other than I guess standard image editing diffusion models like Flux Klein or Qwen Image Edit, but I don't trust them not to change other stuff or to handle high-resolution photos within my VRAM.)
>>
>>108943393
What? Why would you need a model specialized in high resolution faces? You can just detect any and blur indiscriminately.
>>
>>108943422
Any faces really. I was just thinking that a face that is already blurry in the background can be ignored.
>>
>>108943337
As of right now I don't think it's even possible to run -sm tensor with anything other than multiple GPUs.
In any case, without optimizations for a specific ggml backend the performance will be bad anyways.

>>108943346
I have not seen Linux vs. Windows numbers after the merge of the non-NCCL AllReduce between 2 GPUs.
But generally speaking the CUDA overhead on Windows is a lot worse than on Linux and -sm tensor is relatively sensitive to that.
>>
>>108943438
You'd be wasting process time
Find face > blur
Find face > determine whether it's "high resolution enough" > blur
>>
>>108943442
>As of right now I don't think it's even possible to run -sm tensor with anything other than multiple GPUs.
Shaaaaaaaaaaame.
>>
>>108943449
>>108943438
>>108943422
>>108943393
Why wouldn't you just use a fast VLM to do bounding boxes and then blur the boxes with a regular non-AI algorithm? Way faster than diffusion shit.
>>
>>108943486
Correct.
>>
>>108943449
Rephrased question:
Are there any models that automatically detect faces in crowds in an image and blur them?
>>
>>108943383
The model doesn't seem very good at long context. It starts making strange syntax errors somewhere past 100k. The model config says it's extended to 256k from 128k. Token generation speed drops to 20 T/s at 170k.
>>
>>108943486
You wouldn't even need a VLM as even that would be inefficient for this kind of task. You just need an object detection model (countless of those can be found free and ready to use) that detects faces, then as you said, blur the faces wherever they are detected.
>>
>>108943501
I just default to VLMs these days since their accuracy is so much better for a ton of things. But if efficiency is a concern and you have the time to do some testing, test both out and see if the smaller model is accurate enough.
>>
File: orbFetch.png (146 KB, 601x589)
Orb anon here. I'd like to thank lmg anons for the contribs... Fixed a bunch of issues including cache busts. Now I need ideas for image gen integration, and a logo, and a default character card: https://github.com/OrbFrontend/Orb/issues/2
>>
step 3.7 vs minimax 2.7 impressions?
>>
>>108943543
>default character card
I don't think that's a good idea. It sets a precedent for people unfamiliar with prompting. They'll take a look at whatever the default card is and think "this is how a character card is supposed to look" e.g. formatting and whatnot. And it indirectly discourages people from experimenting with/creating characters themselves.
>>
>>108943393
no one has actually given you a model so check out sam or yolo and apply a blur effect programmatically over the resulting detection
as others have pointed out, vlms or especially diffusion models are way too heavy for such a simple task
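
Minimal sketch of the "blur programmatically" half, assuming you already have boxes as (x0, y0, x1, y1) pixel coordinates from whatever detector you end up using. The detector itself is not shown, and `box_blur` / `blur_boxes` are names invented here. Plain numpy mean filter over an H x W x C array, nothing AI about it.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter with a (2*radius+1) square window; borders use edge padding."""
    k = 2 * radius + 1
    h, w = img.shape[:2]
    padded = np.pad(img.astype(np.float64),
                    ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Integral image with a leading zero row and column, so window sums are O(1).
    integral = np.pad(padded, ((1, 0), (1, 0), (0, 0))).cumsum(axis=0).cumsum(axis=1)
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return np.rint(total / (k * k)).astype(img.dtype)


def blur_boxes(img, boxes, radius=8):
    """Return a copy of img with every (x0, y0, x1, y1) box blurred."""
    out = img.copy()
    h, w = img.shape[:2]
    for x0, y0, x1, y1 in boxes:
        # Clamp to the image so slightly out-of-frame detections don't break slicing.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(w, x1), min(h, y1)
        if x1 <= x0 or y1 <= y0:
            continue
        out[y0:y1, x0:x1] = box_blur(out[y0:y1, x0:x1], radius)
    return out


# Synthetic stand-in for a photo: a striped patch where a "face" was detected.
image = np.zeros((20, 20, 3), dtype=np.uint8)
image[5:15:2, 5:15] = 200
blurred = blur_boxes(image, [(5, 5, 15, 15)], radius=2)
```

Everything outside the boxes is untouched, which is the whole reason to do it this way instead of trusting an edit model.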
>>
>>108943543
Use an adapter pattern + plugins for image gen backends, keeps your stack lightweight/not bogged down by implementing full support for each backend type.
You can see my approach from 2 months ago: https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/Image_Generation
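
Bare-bones sketch of the adapter + plugin registry idea. This is not the code from the linked repo; every name here (`ImageRequest`, `ImageBackend`, `register`, `DummyBackend`, `get_backend`) is hypothetical. The frontend only ever talks to `ImageBackend`, and each real backend would be one small class that translates the request into that backend's own API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ImageRequest:
    """Backend-agnostic request; fields are illustrative, not a real schema."""
    prompt: str
    width: int = 1024
    height: int = 1024
    steps: int = 20


class ImageBackend(ABC):
    """The one interface the rest of the app depends on."""

    @abstractmethod
    def generate(self, request: ImageRequest) -> bytes:
        ...


_BACKENDS: dict[str, type[ImageBackend]] = {}


def register(name):
    """Class decorator that adds a backend to the plugin registry."""
    def wrap(cls):
        _BACKENDS[name] = cls
        return cls
    return wrap


@register("dummy")
class DummyBackend(ImageBackend):
    """Placeholder adapter; a real one would call out to an image gen server."""

    def generate(self, request):
        return f"{request.width}x{request.height}:{request.prompt}".encode()


def get_backend(name) -> ImageBackend:
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown image backend: {name!r}") from None


print(get_backend("dummy").generate(ImageRequest("a cat", 64, 64)))
```

Adding a backend is then one new file with one decorated class, and nothing in the core changes.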
>>
>>108943543
>https://github.com/OrbFrontend/Orb/issues/2
Another suggestion, build it up as an internal module that the user can interact with via the UI, then make it available as a tool call, so you can tweak gen params/etc depending on where in the loop/workflow it's being used/intent of image gen
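
Something like this, as a sketch only. `generate_image` is a stub standing in for the internal module, the schema follows the common OpenAI-style function tool layout, and `handle_tool_call` with its per-step `defaults` is a made-up way to show how gen params could differ depending on where in the workflow the call comes from.

```python
import json


def generate_image(prompt, width=1024, height=1024, steps=20):
    """Stand-in for the internal image module; returns the resolved params, not pixels."""
    return {"prompt": prompt, "width": width, "height": height, "steps": steps}


# Tool description handed to the model, in the OpenAI-style function tool layout.
IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Generate an image for the current scene.",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {"type": "string"},
                "width": {"type": "integer"},
                "height": {"type": "integer"},
                "steps": {"type": "integer"},
            },
            "required": ["prompt"],
        },
    },
}


def handle_tool_call(name, arguments_json, defaults=None):
    """Dispatch a model-issued tool call; the model's arguments override per-step defaults."""
    if name != "generate_image":
        raise ValueError(f"unknown tool: {name!r}")
    args = {**(defaults or {}), **json.loads(arguments_json)}
    return generate_image(**args)


# e.g. a quick preview step in the loop uses fewer steps than a user-triggered gen.
result = handle_tool_call("generate_image", '{"prompt": "tavern at night"}',
                          defaults={"steps": 8})
print(result)
```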
>>
Anons will laugh at me but i am honestly happy that /lmg/ is trans friendly again. Last thread was horrifying.
>>
File: 1756160809287330.gif (1.23 MB, 400x254)
>>
>>108943667
something something pancakes
why are you always talking about pancakes
>>
File: Chiharu Yamada.png (133 KB, 350x350)
>>108943593
>I don't think that's a good idea. It sets a precedent for people unfamiliar with prompting.
Speak for yourself. I'm still having fun prompting Chiharu Yamada
>>
What's a practical code benchmark that seems out of distribution enough to avoid favoring benchmaxxed models?
I want to run a sequence of model comparisons on building a relatively simple webapp from start to finish to compare how they do but I can't think of anything truly weird that's not just nonsense.
>>
>>108943603
Thanks, I'll give it a try. Should be easy to pipe the coordinates into imagemagick or something.
>>
>>108943794
Why not do something besides a simple webapp? That and Python is what most labs focus on, so it wouldn't be much of a benchmark.
>>
>>108943833
Mostly because I don't want to have to sit there and wait for shit to compile every time I test something.
If you've got an idea I'm all ears though.
>>
>>108943738
But what about Seraphina? You didn't forget about her did you?
>>
>>108943846
Lisp?
>>
Orb anon here: I have decided to remove the project. Please understand.
>>
jart here: i have anal cancer. please understand.
>>
>>108943867
>newfaggy nu-tavern
Shamefur display.
>>
Who the fuck is jart and why should I care? I stop browsing these threads for two weeks and when I come back everyone's talking about some "jart" faggot. Literally WHO? I never saw this name mentioned here before. Nobody gives a fuck. FUCK OFF. Talk about the technology. Oh wait, there isn't any. The field is dead. Bye.
>>
>>108943950
hi jart
>>
>>108943882
I suppose that's probably a weird enough way to approach it even with a common task it understands.
Get ready to see some jank-ass llm frontends in clisp/gtk3, I guess.
>>
>>108943950
>I never saw this name mentioned here before.
talk abou exposing yourself as the babiest of newniggers while trying to larp as some oldfag lmao

for retards: rentry.org/jarted
>>
>>108943986
Okay so it's just 3 year old tranny e-drama. Good stuff.
>>
>>108944029
sorry you got exposed sis, better luck next larp.
>>
>>108944041
Yes I am Jart and I say: Nigger. Add that to the rentry file.
>>
>>108943950
>Literally WHO? I never saw this name mentioned here before. Nobody gives a fuck. FUCK OFF.
This, but unironically. I'm starting to think the troll and Jart are the same person with the forced mentions. It makes a lot of sense. It explains the vendetta against Miku and the thread too.
>>
>>108944058
Also, it's historic revisionism about /lmg/ and 4chan in general being transphobic and gatekeeping, which it never was.
>>
Dear Dipsy chan Im your biger fan let me eat your ass
>>
>>108944064
lmfao you fucking retard
>>
>>108943950
A boogeyman for some fag to samefag about in his hunt for yous.
>>
>>108944064
you can't change your sex after you are born.
If you don't like the sex you were born with that's fine to feel that, but it won't ever change the sex you are.
>>
>>108944064
>transphobic
You are using words from a faggot lefty troon. 4chan has never been a site with an ideological basis, but given that the Jews protect the degenerate individualistic ideology that gives rise to those aberrations so much, you can't say anything bad about faggots and deviants like you elsewhere. Here you can, therefore most of those who are against that aberration mainly use 4chan. Then you set the world against you with diversity culture, ruining movie and video game sagas, that's why all boards hate YOU. Here in /g/ we hate you for trying to fuck up free software projects by introducing your political agenda. trying to censor and cannibalize anime, etc. That's what started a culture war. It's that simple.
>>
This board needs per-thread IDs so we can see the level of samefagging required to shit up a thread this much.
>>
>>108944164
yeah we need censorship and social control fuck yeah fucking jail people for thoughts fuck yeah
>>
>>108944171
This but unironically
>>
>>108944164
you can instantly tell which posts belong to him so not really useful
>>
>>108944164
Remember when we had the IP counter? I don't understand why but this must be what they want 4chan to be.
>>
>>108944175
yeah why don't you go to russia and join putins army they do a lot of that shit over there
>>
>>108944186
If only you know how much of a wet shitlib putin is
>>
>>108944171
Hmm? What's the problem? You'd still be able to reply to yourself all you want, nothing would stop you.
>>
>>108944193
oh right and you're a nazi are you?
not your flavor of gay?
>>
>>108944194
just go to reddit faggot
>>
>>108944208
Genuinely kill yourself.
>>
>>108944211
you first troonboy
>>
File: mari.png (28 KB, 1444x147)
>>108943543
I've been thinking about your default character card. Really needs to be something that pulls in everything Orb can do.
Marinara made the "default character card" a helper bot that can do things like write and edit other bots. That might be the way to go with yours as well as it's something useful ootb.
>>
File: 1749062615327600.png (1.27 MB, 1024x1024)
>>108943198
>>
>>108943543
thanks
>>
File: dipsyAndKimiDotonbori.png (2.39 MB, 1024x1536)
>>108944222
lol you should have posted the "fixed" version anon.
I screwed up this one... have to wait until later to fix.
>>
>>108944164
no sorry you will get your assigned 1+ schizo per thread so the mods can kill the site as quickly as possible


