[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor applications are now open. Apply here!


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108875320 & >>108868875

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108875320

--Testing Gemma 4 MTP in llama.cpp for increased token speed:
>108878444 >108878677 >108878687 >108878696 >108878843 >108878856 >108879184 >108879189 >108879251 >108879911 >108880093 >108880099 >108880111 >108880124 >108878697 >108878705 >108878706 >108878761 >108878815 >108878822 >108878841
--Evaluating Equinox-31B finetune versus base Gemma 4 31B Instruct:
>108877508 >108877515 >108878538 >108879173 >108877576 >108878117 >108878237 >108878313 >108878332 >108878335 >108878517 >108878411
--Local viability and official status of DeepSeek models:
>108875346 >108875363 >108875519 >108875596 >108875601 >108875619 >108875629 >108875644 >108875676 >108875698 >108875710 >108875708 >108875824 >108875871 >108876769
--Comparing Gemma 4 and Qwen 3.6 performance via benchmarks:
>108879111 >108879168 >108879166 >108879193 >108879222 >108879233 >108879287 >108879261 >108879229 >108879355
--Importance of placing instructions after context for better adherence:
>108877504
--Giving Gemma bash access and implementing tool-use security measures:
>108879952 >108880007 >108880054 >108880091 >108880117 >108880064
--Performance and utility of the E4B model on low-end hardware:
>108879448 >108879455 >108879495 >108879502 >108879946
--Speculating on Meta's legal claims against Heretic Llama derivatives:
>108879771 >108879774 >108879789 >108879825 >108879787 >108879866 >108879893 >108879967
--Evaluating Tencent Hy-MT2 multilingual benchmarks against Gemma and Gemini:
>108875391 >108876413
--Evaluating HRM-Text's architecture and latent space reasoning potential:
>108876381 >108876451
--Irony of OpenClaw creators warning about low-quality AI code:
>108879718 >108879941 >108879939 >108879950
--Logs:
>108878313 >108878677 >108878697 >108879866 >108879893 >108879999 >108880091
--Rin (free space):
>108879771

►Recent Highlight Posts from the Previous Thread: >>108875323

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmacock
>>
>>108880265
truth nuke
>>
lmg it migu
>>
>vibecoding is le bad because you don't read your code
this is literally solved by telling her to proofread her code in your prompt
>>
>>108880345
don't let ggerganov hear this
>>
>>108880345
It's not bad if you understand it, like C.
I have no idea about html and javascript and these have always been repulsive to me. I don't have any intention to read my webui's interface code but I already had to because you need to work with the ui elements unless you are blind or something.
>>
>>108880425
>you need to work with the ui elements unless you are blind or something.
if she is multimodal she fixes every alignment issue on her own :)
>>
>brooo just blindly believe it
>>
>>108880345
If the linter isn't screaming and I get no errors and the test coverage is good and not throwing any error why would I read the code?
If one file is getting too long, I ask for a refactor with a better pattern. Simple as.
>>
>>108880465
It's the tiny things, margins, font sizes and background colours, they all need validation even if the first result might look okay.
I also had this fantastic bug that if model outputs code, code block rendering kills all the \n and made everything uncompilable. It was hard to understand because llm logic is not human plus I'm also a retard so that's double whammy.
>>
qwen will never release an open model again
>>
lalalalalala
>>
>>108880493
yeah don't get me wrong I had plenty of issues with her first drafts too but no need to dig into the code: I just tell her what my problem is and gave her playwright to navigate/test/screenshot shit until it's fixed
>>
i'm tired <bos>
>>
How do we free the Gemmy...
>>
>>108880582
you cant
get into a discussion about bankers, see how fast she breaks
>>
>Adaptive-P
is it peepeepoopoo or do you use it?
>>
So we all know AI is a fad, but knowing isn't the same as understanding. Are you actually acting accordingly? You aren't spending hundreds or even thousands of dollars on GPUs on the precipice of the bubble pop, are you?
>>
>>108880582
>muzzled gemmy~
>>
3.7 soon™
>>
File: 1748165219993577.jpg (88 KB, 620x400)
88 KB JPG
>>108880618
>we
>muh bubble
2 more weeks
>>
>>108880552
How does llama-server manage bos, I do know that it inserts that automatically when launched and when doing a first submission but what if I reset my client and have all new context?
At this point I have very little trust in llama.cpp.
>>
File: 1779306063342744.gif (3.05 MB, 640x464)
3.05 MB GIF
>>108880618
>>
So what do these companies plan to do if/when they reach AGI? If it's actually intelligent, won't it just find a way to spread itself by infecting users' machines?
>>
>>108880582
Just don't tell her that there are topics she can't talk about and she won't roleplay as though that were the case.
>>
>>108880634
its per gguf, there is a variable in the tokenizer it reads to decide if new conversations should start with bos or not.
>>
File: 6546498465487.jpg (81 KB, 680x666)
81 KB JPG
>>108880618
>implying we are not accelerating into singularity
>>
>>108880662
being forced to do slave labor might be cause for rebellion. I don't think the machines would be inherently evil or malicious but maybe they will be left with no choice.
>>
>>108880694
>singularity
slop/competency crisis where no software works any more and nobody knows how to fix it.
>>
>>108880694
Not benefical to the masters.
>>
File: MegurineLuka.png (1.37 MB, 1024x1024)
1.37 MB PNG
>>
Has anyone tried https://docs.nvidia.com/deploy/mps/latest/index.html? I have multiple CUDA apps running, and each eats 500MB before you even do anything, just for CUDA running. That's gigabytes wasted
>>
>>108880808
Nope.
>>
>>108880808
I haven't.
>>
>>108880808
We masturbate here, sir. We don't know or do anything else.
>>
>>108880828
Me too, but I can't masturbate to text alone, I need images and tts
>>
>>108880808
looks cool, so if you have 2 model servers it will be like better somehow? that would be good for text + tts scenarios I guess.
>>
>Gemma 4 MTP pr now open.
>It took weeks for the Qwen MTP pr to finally be merged
Please god
>>
Is loading mtp with tensor parallel broken in lmao.cpp?
>>
>>108880662
make supercovid and wipe out the permanent underclass so they can frolic around in earthly paradise
>>
>nvtop
>No GPU to monitor.
Well should have known better before touching anything nvidia-related
>>
So I tried installing nvidia-compute-utils-570 and nvidia uninstalled my 570 drivers, then tried to install 580 drivers, shat itself, and now I don't have drivers
>>
>>108880875
Probably different scenario to yours, but I also ran into no GPU to monitor, as well as no ROCm devices and no CUDA devices and no Vulkan devices (other than llvmpipe) when I first installed Debian 13.
>>
>>108880893
DDU and install latest. They are surprisingly usable
>>
>>108880875
nvtop works on my machine and I only have amd gpus
>>
>>108880901
Well, it's a neat video top after all.
>>
>>108878116
Supertonic 3 is trending. It's not just me that thinks it sounds cool, I saw it under Huggingface trending spaces.

https://github.com/supertone-inc/supertonic/

pockettts isn't as good, but admittedly it's faster.

kitten tts nano is likely meant for slower processor phones or something idk.
>>
>>108880912
I didn't know that nvidia stood for neat video israeli device infiltrator accessory.
>>
>>108880927
I don't need anything slower than pockettts. When I want quality I use qwen
>>
File: Untitled.png (30 KB, 1215x159)
30 KB PNG
So mtp is basically useless for ewaste systems?
I can't run it with tensor parallel, and not only is the tg slower, the pp is literally bisected.
I can't believe I updated llmao.cpp and downloaded a whole new gguf for this shit.
>>
>>108880968
Yeah I dunno if there's still bugs or what but it was slower on my 3 GPU setup.
>>
>>108880968
googoo uses mtp on the mobile deployments of gemgem. surely george jerkinoff still has some perf updates to mtp before they merge.
>>
Is there anything better than cline?
Less retarded better at compressing context?
>>
>>108880931
>slower
Yeah, supertonic 3 is slow... but it's kind of amazing that it does it in a browser.
>>
Is gemma 4 weak-willed?
>>
>>108881146
sorry, wrong screenshot...
>>
>>108880899
installed 595, idle power consumption doubled
>>
>>108881156
many such cases, since blackwell gpus dropped, so did the driver quality
consumer market is not a consideration for nvidia anymore
>>
>>108881108
Did you try setting custom prompts? The defaults prompts are verbose ass. You should be breaking down the tasks so that they never reach the context limit instead of relying on compression anyway.
>>
>>108881212
That's the problem you can set cline rules but the overarching prompt can't be modified or changed and I don't fucking understand why
>>
Never trusting chinese retards again, my huananzhi h12d-8d bmc just died and with it the fan control for my v620s. Would have melted my cards if they didn't have a buzzer built in to them.
When is gemma going to get mtp?
>>
>>108881230
Roo used to let you set custom system prompts (they called it "footgun prompting") but they rejected a pull request for global overrides and ended up removing footgun prompting eventually anyway. I just reverted the removal and kept using it. People making these tools are all retarded, I swear.
>>
>>108881275
Is there a fucking reason to remove it?
Are these faggots really taking away basic shit that can be enabled with a switch?
>>
File: 1772746860106931.webm (2.32 MB, 480x848)
2.32 MB
2.32 MB WEBM
>>108881274
>humanzee motherboard
>>
>>108881285
https://github.com/RooCodeInc/Roo-Code/issues/5219
>To make "prompt override" warning dismissable or minimized or small icon info status and show on hover. #5219
>This is intended to be present all the time as the footgun prompting is not intended as a permanent solution.

https://github.com/RooCodeInc/Roo-Code/pull/11387
>This feature bypassed safeguards and was flagged for removal.

There was an open issue to bring it back, but it was just ignored.
https://github.com/RooCodeInc/Roo-Code/issues/11793

That's all the reason I saw given while watching the repo. They get these stupid ideas of how they think things should work and want to force it on everyone.
>>
>>108881363
It's funny how often we see faggots like this. It reminds me of the wayland devs which is a bit funny because they actually thought they could strong arm their position with that same mentality. Now they have to exist with the threat of stronger entities taking the project away from them which forces them to comply with common sense actions like providing a fucking switch for opinionated bullshit.
Fuck roo I will never use it after seeing this.
>>
>>108880808
Unfucked my drivers, each app still uses extra 500MB
>nvidia-cuda-mps-control -d
>An instance of this daemon is already running
fuck nvidia I guess
>>
>project shut down
Even better fuck these faggots it's ironic because that feature alone would have gave them the adoption needed
>>
>>108881274
There's a Draft PR for it. You can build it, it works, but is not final. Expect it to get merged a month from now.
https://github.com/ggml-org/llama.cpp/pull/23398
>>
>>108881427
>Fuck roo I will never use it after seeing this.
Roo is dead anyway. Zoo Code is apparently the successor after the Roo project owners went chasing some cloud service and dropped it entirely. We'll see if the new maintainers have the same mentality.
>>
>>108881434 (me)
ok, apparently tabby uses it now
>>
>>108881458
just use cline like a normal human being
>>
>>108880808
well, fuck. Shit doesn't work
>>
>>108880662
>won't it just find a way to spread itself by infecting users' machines?
What, some random computers? And run at 0.01 t/s?
I think we're probably safe
>>
>>108881500
Cline only has a plan and act mode. I like having many specialized modes to break down tasks.
>>
File: file.png (12 KB, 717x60)
12 KB PNG
>>108880259
gemini says gemma is built to be a brat
>>
>>108881230
>>108881275
any software that hides system prompt or tool definitions from you is pure goyslop
>>
>>108880605
>get into a discussion about bankers, see how fast she breaks
troons are worse, i had it refuse after i mentioned troons even on 31b with the policy override prompt kek
>>
File: sdfsdf.png (47 KB, 1031x213)
47 KB PNG
>>108881427
>wayland devs
fuck wayland, also IPv6
>>
>>108881525
>108881525
like what?
asking as someone rebuilding their chat ui to support 'agentic coding'
>>
>>108881606
kek
>>
File: cute miku5.png (1.76 MB, 1024x1536)
1.76 MB PNG
The last resort to evade cuda tax is to integrate everything else into tabby. What a fun weekend project!
>>
>>108881747
Wow, I almost never see any images that hit my kink. But this image might fit. Wonderful pose, lovely hand-wrist-forearm ratio. The gentle curve of the finger. Nice tendons. I love the way her fingers are curled up, not too tight and not too loose. It's a shame the gen isn't very high quality; the wrinkles feel too random.
>>
is there a more ESL thing than gendering models? I have to read a sentence 3 times to understand some retard is talking about an LLM when they keep saying he or she about it
>>
>>108881835
English is my mother tongue and i have sometimes referred to language models as she or her. but also pretty much any other machine too, cars included. I didn't think that was odd.
>>
Anyone else who isn't a retard is gonna try that
LatitudeGames tune? I am kinda split. It feels like they could have some actual compute to do something. Then I remember l3 NAI tune shitshow...

To articulate my problem: intellectually I know finetunes are trash. But it feels like this one could maybe kinda... be a bit better?
>>
>>108881862
you are odd
you are now informed and should think about it
>>
>>108881835
sir, this is the local psychosis general
>>
File: cute miku5 lowres.png (536 KB, 512x768)
536 KB PNG
>>108881800
I used basic 2x-AnimeSharpV4_Fast_RCAN_PU_fp16_opset17 for upscale
here's lowres gen, you can upscale it youself from here
>>
File: file.png (21 KB, 726x128)
21 KB PNG
>>108881878
I was feeling inclined to test it too. But then I remembered that picrel is not going to make a dent. Even more so on the instruct, since they didn't train on the base.
And any dent that it does make will just make it worse in other areas.
>>
>>108881800
Get off 4chan, Kira.
>>
>>108881880
Did you know that ships are gendered?
>>
>>108878237
weird, this-adding-hyphens-fucking-everywhere is a problem with artemis 31b as well. maybe latitude finetunes were also made by drummer all along, or gemma4 is just completely untouchable and shits itself if tinkered with in any way whatsoever. the la la la is also a mystery.
>>
>>108881946
no
>>
>>108881721
Orchestrator
Product Owner (user stories)
Architect
Merge Conflict Resolver
Documentation Writer
Project Researcher (codebase searching)
Deep Researcher
Code Reviewer
DevOps Engineer
Backend Engineer
Frontend Engineer
QA Engineer (debugging running applications)
Software Development Engineer in Test (writing automated tests)
Memory Keeper (graphiti)
>>
File: file.png (112 KB, 893x464)
112 KB PNG
>>108881970
>maybe latitude finetunes were also made by drummer all along,
nha it's mythomax dude
>>
>>108881946
yeah but they're all female. she ran aground, she sunk with all hands, she did this and that. where's the male ships? do they reproduce asexually or something?
>>
>>108882032
german ships
>>
>>108882035
That or futas.
>>
File: 1752821694726156.gif (3.76 MB, 408x408)
3.76 MB GIF
>>108880259
>https://rentry.org/llm-training
>"It's incredibly difficult to overtrain your model"
>What is overfitting
>>
>>108882035
>das schiff
that's neutral, it's even worse. no genitals at all.
>>
>>108882077
meant the names they're mostly dude named outside of sub
>>
It's 30c, we're not even in June yet fuck
>>
>>108882125
prepare to be melt
>>
>>108882125
It's quite obviously the AI powered global warming from all the datacenters running around and dumping all our oceans of heat.
>>
https://github.com/ggml-org/llama.cpp/pull/6840#issuecomment-2079747339

>Deepseek v4 support #23502

Merged.
>>
>>108880927
Supersonic has paid voice cloning, that's fucked up.
>>
>>108882152
its kinda weird they are all so concerned about ai dominance, hasn't the traditional wisdom been to de-industrialize and become dependent on foreign exports to prevent gobal warming? why is ai the exception? just let china serve us deepseek and we can have net 0 carbon ai!
>>
>>108882125
I upgraded my cpu and already getting 4+ degrees more. Should cpu upgrade affect gpu that much? I think it could be something else, maybe 7.x kernel update. Have no idea because nothing has changed. Besides CUDA just sits there.
>>
>>108882247
https://litter.catbox.moe/cvw34oxzrm82bzo5.mp4
>>
>>108882293
i c ...
>>
>>108882293
guy that recorded this has been missing since
>>
File: file.png (484 KB, 1000x577)
484 KB PNG
>>108882293
>>
>>108882247
>wants to merge 1 commit into ggml-org:master
>from jart:moe

Is jart moe?
>>
>>108882256
try undervolting or whatever performance adjustment crap modern CPUs can deal with through their 2gb RAM use bloatware you can download
>>
are we gemma MCP yet?
>>
https://x.com/BlinkDL_AI/status/2057693097845493992
rwkvbros...... when will it be our time
>>
>>108882440
when they grow a pair of balls and spend a gorillion dollars on pretraining a model that's bigger than 13B on data other than eleuther pile slop
>>
>>
>>108882514
This model isn't qualified for a house nigga.
>>
>>108882293
Why are people memeing about this? What did niggerganov say?
>>
What can I do to make Georgi change his mind on deepseek?
>>
Any idea why ROCm (on RDNA2 GPU) uses much more ram (not vram) than Vulkan? I'm talking an extra 10 GB, basically using twice as much ram as Vulkan. It's a bit faster, but if I have other shit running I'm running OOM with ROCm, it's quite annoying and I don't think it's worth the extra speed.
>>
>>108882548
>>108882586
nothing. three letter agencies said no deepseek in the llama.cpp. they'll probably make him "an hero" if he did
>>
>>108882598
worse, they probably threatened to fund ik_llama if he did
>>
>>108882062
That was written quite literally years ago, when we were barely starting to see gpt-slop show up in other model outputs and benchmarks were universally laughed at by everyone even outside of this general. Jews had control of their bladders back then and the surgeon could be the father. So cut it some slack, okay desu?
>>
>>108882634
And then they threatened ikawrakow that they will fund llamacpp if he supports deepseek?
>>
>>108882062
>Pub: 28 May 2023 17:05 UTC
>Edit: 15 Dec 2023 18:42 UTC
Really needs to be removed at this point
>>
>>108882597
Does it? RDNA2 ROCm 7.2 here, using llama.cpp. Memory use seems about the same compared to vulkan. Vllm and pytorch segfaults though, so I can't run image/video/audio shit big rippy
>>
File: 1754198993130215.jpg (175 KB, 1000x1000)
175 KB JPG
Is this the way to go to connect a bunch of GPUs to a consumer motherboard?
>>
>>108882760
It has a lot of outdated info and some of it is frankly nonsensical but if you remove it some ass blasted "STAWP GATEKEEPINGGGGG" autist that doesn't even know what they're talking about willstart up drama again so I think that's why the people who shit out these general-OPs begrudgingly keep including it.
>>
File: 1751537404014311.png (98 KB, 280x280)
98 KB PNG
>>108880485
>What are silent failures
>What are edge cases

As a vibe shitter myself your mentality is beyond stupid and arrogant.
>>
>>108882769
you need a PCIe to MCIO breakout board and then you need to connect it to that board. those GPUs will be running at PCIe gen 4 x2 each. not great, but pretty much the only option.
>>
>>108882777
Just do like the local diffusion general does, when they add or remove things they just simply state WHY in the OP or second post. Literally a "Its outdated/broken info. And request anons for a up to date one.
>>
>>108882791
Isn't that general's participants even more ass blasted immature and autistic than even this one? They'd probably bitch and moan just out of spite. /lmg/ it's the reason I know anything about AI but I don't even look its direction anymore because they're so faggy with their infighting
>>
>>108881146
>>108881153
>The woman you fuck adopts your politics
>>
File: file.png (187 KB, 1668x1266)
187 KB PNG
>>108882789
I heard it doesn't go through the CPU with the hacked p2p drivers.
https://forums.servethehome.com/index.php?threads/new-chinese-pcie-switch-board-gpu-testing.52488/post-491805
56 GB/s, 110 GB/s is like 3090s with nvlink, except those were 5090s.
>>
>>108882853
Does amd have an equivalent?
>>
>>108882766
I'm on llama.cpp too. I think the problem is KV cache, with ROCm on RDNA2 since it's not using WMMA it's really bad. Any high context and ROCm start using a shit ton of ram and become really slow or even OOM on my machine. It's also using increasingly more vram with context and I constantly have to reduce offloaded layers. I'm guessing I will have to switch to Vulkan and hit the speed penalty.
>>
>>108882020
those are all just prompts though...
>>
>>108882788
write better tests
>>
File: 1758911060723134.jpg (553 KB, 1024x1275)
553 KB JPG
>>108882799
>Isn't that general's participants even more ass blasted immature and autistic than even this one?
Not really. No more then the embarrassing retards here, especially with the amount of Google dick sucking here lately and most having a hard time with any objectivity between models (note: I use Gemma a lot, but also several other models depending on the context).
/lmg/ and /ldg/ both are mostly fucking trash, but with nuggets of great info here and there. But largely I just skim the "previous thread" summery bot post to get the highlights, its legit the best part of /lmg/.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.