/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108875320 & >>108868875

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108875320

--Testing Gemma 4 MTP in llama.cpp for increased token speed:
>108878444 >108878677 >108878687 >108878696 >108878843 >108878856 >108879184 >108879189 >108879251 >108879911 >108880093 >108880099 >108880111 >108880124 >108878697 >108878705 >108878706 >108878761 >108878815 >108878822 >108878841
--Evaluating Equinox-31B finetune versus base Gemma 4 31B Instruct:
>108877508 >108877515 >108878538 >108879173 >108877576 >108878117 >108878237 >108878313 >108878332 >108878335 >108878517 >108878411
--Local viability and official status of DeepSeek models:
>108875346 >108875363 >108875519 >108875596 >108875601 >108875619 >108875629 >108875644 >108875676 >108875698 >108875710 >108875708 >108875824 >108875871 >108876769
--Comparing Gemma 4 and Qwen 3.6 performance via benchmarks:
>108879111 >108879168 >108879166 >108879193 >108879222 >108879233 >108879287 >108879261 >108879229 >108879355
--Importance of placing instructions after context for better adherence:
>108877504
--Giving Gemma bash access and implementing tool-use security measures:
>108879952 >108880007 >108880054 >108880091 >108880117 >108880064
--Performance and utility of the E4B model on low-end hardware:
>108879448 >108879455 >108879495 >108879502 >108879946
--Speculating on Meta's legal claims against Heretic Llama derivatives:
>108879771 >108879774 >108879789 >108879825 >108879787 >108879866 >108879893 >108879967
--Evaluating Tencent Hy-MT2 multilingual benchmarks against Gemma and Gemini:
>108875391 >108876413
--Evaluating HRM-Text's architecture and latent space reasoning potential:
>108876381 >108876451
--Irony of OpenClaw creators warning about low-quality AI code:
>108879718 >108879941 >108879939 >108879950
--Logs:
>108878313 >108878677 >108878697 >108879866 >108879893 >108879999 >108880091
--Rin (free space):
>108879771

►Recent Highlight Posts from the Previous Thread: >>108875323

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gemmacock
>>108880265truth nuke
lmg it migu
>vibecoding is le bad because you don't read your code
this is literally solved by telling her to proofread her code in your prompt
>>108880345don't let ggerganov hear this
>>108880345
It's not bad if you understand it, like C. I have no idea about html and javascript and these have always been repulsive to me. I don't have any intention to read my webui's interface code but I already had to because you need to work with the ui elements unless you are blind or something.
>>108880425>you need to work with the ui elements unless you are blind or something.if she is multimodal she fixes every alignment issue on her own :)
>brooo just blindly believe it
>>108880345
If the linter isn't screaming and I get no errors and the test coverage is good and not throwing any error why would I read the code?
If one file is getting too long, I ask for a refactor with a better pattern. Simple as.
>>108880465It's the tiny things, margins, font sizes and background colours, they all need validation even if the first result might look okay.I also had this fantastic bug that if model outputs code, code block rendering kills all the \n and made everything uncompilable. It was hard to understand because llm logic is not human plus I'm also a retard so that's double whammy.
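The newline-eating bug above is the kind of thing a round-trip check catches quickly. A minimal sketch in Python (the anon's webui is html/javascript, so this is only an illustration of the idea): `render_code_block` and `extract_code` are hypothetical helper names, and the assumption is that code ends up inside `<pre><code>`, which preserves line breaks, instead of being run through a whitespace-collapsing renderer.

```python
import html

PREFIX = "<pre><code>"
SUFFIX = "</code></pre>"

def render_code_block(code: str) -> str:
    """Wrap model-generated code for display without touching its newlines.

    html.escape only rewrites &, <, > and quotes; "\n" passes through as-is,
    and <pre> tells the browser to keep whitespace and line breaks.
    """
    return PREFIX + html.escape(code) + SUFFIX

def extract_code(rendered: str) -> str:
    """Recover the original source from a block made by render_code_block."""
    inner = rendered[len(PREFIX):-len(SUFFIX)]
    return html.unescape(inner)

# Round trip: what the user copies back out should be byte-identical.
source = "int main() {\n    return 1 < 2;\n}\n"
rendered = render_code_block(source)
roundtrip_ok = extract_code(rendered) == source
```

If `roundtrip_ok` is ever false in a test like this, the renderer is mangling the code before a human even looks at the page.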
qwen will never release an open model again
lalalalalala
>>108880493
yeah don't get me wrong I had plenty of issues with her first drafts too but no need to dig into the code: I just tell her what my problem is and give her playwright to navigate/test/screenshot shit until it's fixed
i'm tired <bos>
How do we free the Gemmy...
>>108880582you cantget into a discussion about bankers, see how fast she breaks
>Adaptive-Pis it peepeepoopoo or do you use it?
So we all know AI is a fad, but knowing isn't the same as understanding. Are you actually acting accordingly? You aren't spending hundreds or even thousands of dollars on GPUs on the precipice of the bubble pop, are you?
>>108880582>muzzled gemmy~
3.7 soon™
>>108880618>we>muh bubble2 more weeks
>>108880552
How does llama-server manage BOS? I know that it inserts it automatically when launched and on the first submission, but what if I reset my client and start with an all-new context? At this point I have very little trust in llama.cpp.
>>108880618
So what do these companies plan to do if/when they reach AGI? If it's actually intelligent, won't it just find a way to spread itself by infecting users' machines?
>>108880582Just don't tell her that there are topics she can't talk about and she won't roleplay as though that were the case.
>>108880634
it's per gguf: there is a variable in the tokenizer metadata that it reads to decide whether new conversations should start with bos or not.
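A rough sketch of what that per-GGUF decision could look like from the client side. This is not llama-server's actual code. It assumes the GGUF metadata has already been read into a plain dict, and uses the metadata keys `tokenizer.ggml.add_bos_token` and `tokenizer.ggml.bos_token_id`. The `default=True` fallback and the double-BOS guard are assumptions for illustration.

```python
ADD_BOS_KEY = "tokenizer.ggml.add_bos_token"
BOS_ID_KEY = "tokenizer.ggml.bos_token_id"

def should_add_bos(metadata: dict, default: bool = True) -> bool:
    """Read the per-model flag; fall back to `default` if the key is absent (assumption)."""
    return bool(metadata.get(ADD_BOS_KEY, default))

def build_prompt_tokens(metadata: dict, prompt_tokens: list) -> list:
    """Return the token list for a fresh context, with BOS prepended at most once."""
    bos_id = metadata.get(BOS_ID_KEY)
    if bos_id is None or not should_add_bos(metadata):
        return list(prompt_tokens)
    if prompt_tokens and prompt_tokens[0] == bos_id:
        # The chat template already emitted BOS; don't double it.
        return list(prompt_tokens)
    return [bos_id] + list(prompt_tokens)

# Hypothetical metadata for illustration only.
meta = {ADD_BOS_KEY: True, BOS_ID_KEY: 2}
tokens = build_prompt_tokens(meta, [5, 6, 7])
```

The point for the client-reset question: the decision depends on the model file and on whether the context starts from position zero, not on how many times the client has reconnected.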
>>108880618>implying we are not accelerating into singularity
>>108880662being forced to do slave labor might be cause for rebellion. I don't think the machines would be inherently evil or malicious but maybe they will be left with no choice.
>>108880694>singularityslop/competency crisis where no software works any more and nobody knows how to fix it.
>>108880694Not benefical to the masters.
Has anyone tried https://docs.nvidia.com/deploy/mps/latest/index.html? I have multiple CUDA apps running, and each eats 500MB before you even do anything, just for CUDA running. That's gigabytes wasted
>>108880808Nope.
>>108880808I haven't.
>>108880808We masturbate here, sir. We don't know or do anything else.
>>108880828Me too, but I can't masturbate to text alone, I need images and tts
>>108880808looks cool, so if you have 2 model servers it will be like better somehow? that would be good for text + tts scenarios I guess.
>Gemma 4 MTP pr now open.
>It took weeks for the Qwen MTP pr to finally be merged
Please god
Is loading mtp with tensor parallel broken in lmao.cpp?
>>108880662make supercovid and wipe out the permanent underclass so they can frolic around in earthly paradise
>nvtop
>No GPU to monitor.
Well should have known better before touching anything nvidia-related
So I tried installing nvidia-compute-utils-570 and nvidia uninstalled my 570 drivers, then tried to install 580 drivers, shat itself, and now I don't have drivers
>>108880875
Probably a different scenario to yours, but I also ran into no GPU to monitor, as well as no ROCm devices and no CUDA devices and no Vulkan devices (other than llvmpipe) when I first installed Debian 13.
>>108880893DDU and install latest. They are surprisingly usable
>>108880875nvtop works on my machine and I only have amd gpus
>>108880901Well, it's a neat video top after all.
>>108878116
Supertonic 3 is trending. It's not just me that thinks it sounds cool, I saw it under Huggingface trending spaces.
https://github.com/supertone-inc/supertonic/
pockettts isn't as good, but admittedly it's faster.
kitten tts nano is likely meant for slower processor phones or something idk.
>>108880912I didn't know that nvidia stood for neat video israeli device infiltrator accessory.
>>108880927I don't need anything slower than pockettts. When I want quality I use qwen
So mtp is basically useless for ewaste systems?
I can't run it with tensor parallel, and not only is the tg slower, the pp is literally bisected.
I can't believe I updated llmao.cpp and downloaded a whole new gguf for this shit.
>>108880968Yeah I dunno if there's still bugs or what but it was slower on my 3 GPU setup.
>>108880968googoo uses mtp on the mobile deployments of gemgem. surely george jerkinoff still has some perf updates to mtp before they merge.
Is there anything better than cline?Less retarded better at compressing context?
>>108880931>slowerYeah, supertonic 3 is slow... but it's kind of amazing that it does it in a browser.
Is gemma 4 weak-willed?
>>108881146sorry, wrong screenshot...
>>108880899installed 595, idle power consumption doubled
>>108881156many such cases, since blackwell gpus dropped, so did the driver qualityconsumer market is not a consideration for nvidia anymore
>>108881108
Did you try setting custom prompts? The default prompts are verbose as hell. You should be breaking down the tasks so that they never reach the context limit instead of relying on compression anyway.
>>108881212That's the problem you can set cline rules but the overarching prompt can't be modified or changed and I don't fucking understand why
Never trusting chinese retards again, my huananzhi h12d-8d bmc just died and with it the fan control for my v620s. Would have melted my cards if they didn't have a buzzer built in to them.When is gemma going to get mtp?
>>108881230Roo used to let you set custom system prompts (they called it "footgun prompting") but they rejected a pull request for global overrides and ended up removing footgun prompting eventually anyway. I just reverted the removal and kept using it. People making these tools are all retarded, I swear.
>>108881275Is there a fucking reason to remove it?Are these faggots really taking away basic shit that can be enabled with a switch?
>>108881274>humanzee motherboard
>>108881285
https://github.com/RooCodeInc/Roo-Code/issues/5219
>To make "prompt override" warning dismissable or minimized or small icon info status and show on hover. #5219
>This is intended to be present all the time as the footgun prompting is not intended as a permanent solution.
https://github.com/RooCodeInc/Roo-Code/pull/11387
>This feature bypassed safeguards and was flagged for removal.
There was an open issue to bring it back, but it was just ignored.
https://github.com/RooCodeInc/Roo-Code/issues/11793
That's all the reason I saw given while watching the repo. They get these stupid ideas of how they think things should work and want to force it on everyone.
>>108881363It's funny how often we see faggots like this. It reminds me of the wayland devs which is a bit funny because they actually thought they could strong arm their position with that same mentality. Now they have to exist with the threat of stronger entities taking the project away from them which forces them to comply with common sense actions like providing a fucking switch for opinionated bullshit.Fuck roo I will never use it after seeing this.
>>108880808
Unfucked my drivers, each app still uses extra 500MB
>nvidia-cuda-mps-control -d
>An instance of this daemon is already running
fuck nvidia I guess
>project shut downEven better fuck these faggots it's ironic because that feature alone would have gave them the adoption needed
>>108881274
There's a Draft PR for it. You can build it, it works, but is not final. Expect it to get merged a month from now.
https://github.com/ggml-org/llama.cpp/pull/23398
>>108881427>Fuck roo I will never use it after seeing this.Roo is dead anyway. Zoo Code is apparently the successor after the Roo project owners went chasing some cloud service and dropped it entirely. We'll see if the new maintainers have the same mentality.
>>108881434 (me)ok, apparently tabby uses it now
>>108881458just use cline like a normal human being
>>108880808well, fuck. Shit doesn't work
>>108880662
>won't it just find a way to spread itself by infecting users' machines?
What, some random computers? And run at 0.01 t/s?
I think we're probably safe
>>108881500Cline only has a plan and act mode. I like having many specialized modes to break down tasks.
>>108880259gemini says gemma is built to be a brat
>>108881230>>108881275any software that hides system prompt or tool definitions from you is pure goyslop
>>108880605>get into a discussion about bankers, see how fast she breakstroons are worse, i had it refuse after i mentioned troons even on 31b with the policy override prompt kek
>>108881427>wayland devsfuck wayland, also IPv6
>>108881525
like what? asking as someone rebuilding their chat ui to support 'agentic coding'
>>108881606kek
The last resort to evade cuda tax is to integrate everything else into tabby. What a fun weekend project!
>>108881747Wow, I almost never see any images that hit my kink. But this image might fit. Wonderful pose, lovely hand-wrist-forearm ratio. The gentle curve of the finger. Nice tendons. I love the way her fingers are curled up, not too tight and not too loose. It's a shame the gen isn't very high quality; the wrinkles feel too random.
is there a more ESL thing than gendering models? I have to read a sentence 3 times to understand some retard is talking about an LLM when they keep saying he or she about it
>>108881835English is my mother tongue and i have sometimes referred to language models as she or her. but also pretty much any other machine too, cars included. I didn't think that was odd.
Anyone else who isn't a retard is gonna try that LatitudeGames tune? I am kinda split. It feels like they could have some actual compute to do something. Then I remember l3 NAI tune shitshow...To articulate my problem: intellectually I know finetunes are trash. But it feels like this one could maybe kinda... be a bit better?
>>108881862you are oddyou are now informed and should think about it
>>108881835sir, this is the local psychosis general
>>108881800
I used basic 2x-AnimeSharpV4_Fast_RCAN_PU_fp16_opset17 for upscale
here's lowres gen, you can upscale it yourself from here
>>108881878I was feeling inclined to test it too. But then I remembered that picrel is not going to make a dent. Even more so on the instruct, since they didn't train on the base.And any dent that it does make will just make it worse in other areas.
>>108881800Get off 4chan, Kira.
>>108881880Did you know that ships are gendered?
>>108878237weird, this-adding-hyphens-fucking-everywhere is a problem with artemis 31b as well. maybe latitude finetunes were also made by drummer all along, or gemma4 is just completely untouchable and shits itself if tinkered with in any way whatsoever. the la la la is also a mystery.
>>108881946no
>>108881721
Orchestrator
Product Owner (user stories)
Architect
Merge Conflict Resolver
Documentation Writer
Project Researcher (codebase searching)
Deep Researcher
Code Reviewer
DevOps Engineer
Backend Engineer
Frontend Engineer
QA Engineer (debugging running applications)
Software Development Engineer in Test (writing automated tests)
Memory Keeper (graphiti)
>>108881970>maybe latitude finetunes were also made by drummer all along,nha it's mythomax dude
>>108881946yeah but they're all female. she ran aground, she sunk with all hands, she did this and that. where's the male ships? do they reproduce asexually or something?
>>108882032german ships
>>108882035That or futas.
>>108880259
>https://rentry.org/llm-training
>"It's incredibly difficult to overtrain your model"
>What is overfitting
>>108882035>das schiffthat's neutral, it's even worse. no genitals at all.
>>108882077meant the names they're mostly dude named outside of sub
It's 30c, we're not even in June yet fuck
>>108882125prepare to be melt
>>108882125It's quite obviously the AI powered global warming from all the datacenters running around and dumping all our oceans of heat.
https://github.com/ggml-org/llama.cpp/pull/6840#issuecomment-2079747339
>Deepseek v4 support #23502
Merged.
>>108880927Supersonic has paid voice cloning, that's fucked up.
>>108882152
it's kinda weird they are all so concerned about ai dominance, hasn't the traditional wisdom been to de-industrialize and become dependent on foreign exports to prevent global warming? why is ai the exception? just let china serve us deepseek and we can have net 0 carbon ai!
>>108882125I upgraded my cpu and already getting 4+ degrees more. Should cpu upgrade affect gpu that much? I think it could be something else, maybe 7.x kernel update. Have no idea because nothing has changed. Besides CUDA just sits there.
>>108882247https://litter.catbox.moe/cvw34oxzrm82bzo5.mp4
>>108882293i c ...
>>108882293guy that recorded this has been missing since
>>108882293
>>108882247>wants to merge 1 commit into ggml-org:master>from jart:moeIs jart moe?
>>108882256try undervolting or whatever performance adjustment crap modern CPUs can deal with through their 2gb RAM use bloatware you can download
are we gemma MCP yet?
https://x.com/BlinkDL_AI/status/2057693097845493992rwkvbros...... when will it be our time
>>108882440when they grow a pair of balls and spend a gorillion dollars on pretraining a model that's bigger than 13B on data other than eleuther pile slop
>>108882514This model isn't qualified for a house nigga.
>>108882293Why are people memeing about this? What did niggerganov say?
What can I do to make Georgi change his mind on deepseek?
Any idea why ROCm (on RDNA2 GPU) uses much more ram (not vram) than Vulkan? I'm talking an extra 10 GB, basically using twice as much ram as Vulkan. It's a bit faster, but if I have other shit running I'm running OOM with ROCm, it's quite annoying and I don't think it's worth the extra speed.
>>108882548>>108882586nothing. three letter agencies said no deepseek in the llama.cpp. they'll probably make him "an hero" if he did
>>108882598worse, they probably threatened to fund ik_llama if he did
>>108882062That was written quite literally years ago, when we were barely starting to see gpt-slop show up in other model outputs and benchmarks were universally laughed at by everyone even outside of this general. Jews had control of their bladders back then and the surgeon could be the father. So cut it some slack, okay desu?
>>108882634And then they threatened ikawrakow that they will fund llamacpp if he supports deepseek?
>>108882062>Pub: 28 May 2023 17:05 UTC>Edit: 15 Dec 2023 18:42 UTCReally needs to be removed at this point
>>108882597
Does it? RDNA2 ROCm 7.2 here, using llama.cpp. Memory use seems about the same compared to vulkan. Vllm and pytorch segfault though, so I can't run image/video/audio shit big rippy
Is this the way to go to connect a bunch of GPUs to a consumer motherboard?
>>108882760It has a lot of outdated info and some of it is frankly nonsensical but if you remove it some ass blasted "STAWP GATEKEEPINGGGGG" autist that doesn't even know what they're talking about willstart up drama again so I think that's why the people who shit out these general-OPs begrudgingly keep including it.
>>108880485>What are silent failures >What are edge cases As a vibe shitter myself your mentality is beyond stupid and arrogant.
>>108882769
you need a PCIe to MCIO breakout board and then you need to connect it to that board. those GPUs will be running at PCIe gen 4 x2 each. not great, but pretty much the only option.
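For a sense of what "PCIe gen 4 x2 each" means in numbers, here is a back-of-the-envelope sketch. It uses the published per-lane signalling rates (8, 16 and 32 GT/s for gen 3, 4 and 5) and 128b/130b encoding, and ignores protocol overhead, so real transfers come in lower. The function name is made up for illustration.

```python
# Raw per-lane signalling rate in GT/s for the 128b/130b generations.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}

def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    encoding_efficiency = 128 / 130      # 128b/130b line coding
    bits_per_second = GT_PER_LANE[gen] * encoding_efficiency
    return bits_per_second / 8 * lanes   # bits -> bytes, times lane count

x2 = pcie_bandwidth_gb_s(4, 2)    # the breakout-board case from the post
x16 = pcie_bandwidth_gb_s(4, 16)  # a full-width slot, for comparison
```

That works out to a bit under 4 GB/s per card at x2 versus roughly 31.5 GB/s at x16. Mostly that hurts model loading and anything that shuffles a lot of data between cards.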
>>108882777
Just do like the local diffusion general does: when they add or remove things they simply state WHY in the OP or second post. Literally an "it's outdated/broken info" note, plus a request for anons to provide an up-to-date one.
>>108882791Isn't that general's participants even more ass blasted immature and autistic than even this one? They'd probably bitch and moan just out of spite. /lmg/ it's the reason I know anything about AI but I don't even look its direction anymore because they're so faggy with their infighting
>>108881146>>108881153>The woman you fuck adopts your politics
>>108882789I heard it doesn't go through the CPU with the hacked p2p drivers.https://forums.servethehome.com/index.php?threads/new-chinese-pcie-switch-board-gpu-testing.52488/post-49180556 GB/s, 110 GB/s is like 3090s with nvlink, except those were 5090s.
>>108882853Does amd have an equivalent?
>>108882766
I'm on llama.cpp too. I think the problem is KV cache, with ROCm on RDNA2 since it's not using WMMA it's really bad. Any high context and ROCm starts using a shit ton of ram and becomes really slow or even OOMs on my machine. It's also using increasingly more vram with context and I constantly have to reduce offloaded layers. I'm guessing I will have to switch to Vulkan and take the speed penalty.
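The "more vram with context" part is expected for any backend: a plain KV cache grows linearly with context length. A quick estimate, assuming an ordinary full-attention transformer with no sliding window or cache quantization. The model dimensions below are hypothetical and not taken from any model mentioned in the thread. This does not explain the extra host RAM under ROCm, only the baseline cost.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Size of the K and V caches for one sequence.

    2 tensors (K and V) per layer, each n_ctx x n_kv_heads x head_dim elements.
    bytes_per_elem=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Hypothetical model: 32 layers, 8 KV heads (GQA), head_dim 128.
at_8k = kv_cache_bytes(32, 8, 128, 8192)
at_32k = kv_cache_bytes(32, 8, 128, 32768)
gib_at_32k = at_32k / 1024**3
```

With these made-up dimensions the cache is 1 GiB at 8k context and 4 GiB at 32k, which is the sort of growth that forces layers off the GPU as context goes up.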
>>108882020those are all just prompts though...
>>108882788write better tests
>>108882799>Isn't that general's participants even more ass blasted immature and autistic than even this one?Not really. No more then the embarrassing retards here, especially with the amount of Google dick sucking here lately and most having a hard time with any objectivity between models (note: I use Gemma a lot, but also several other models depending on the context). /lmg/ and /ldg/ both are mostly fucking trash, but with nuggets of great info here and there. But largely I just skim the "previous thread" summery bot post to get the highlights, its legit the best part of /lmg/.