[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


[Advertise on 4chan]


File: 3ssion.jpg (217 KB, 1024x1024)
217 KB
217 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108362305 & >>108356979

►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: dssssssss.jpg (101 KB, 854x854)
101 KB
101 KB JPG
►Recent Highlights from the Previous Thread: >>108362305

--llama.cpp reasoning budget sampler breaking tool calling workflows:
>108363630 >108363637 >108363647 >108363707 >108363721 >108363731 >108363741 >108363776
--Reasoning budget sampler for controlling Qwen 3.5 token usage:
>108362684 >108362795 >108363032 >108363053 >108363081 >108363112 >108363151 >108363187 >108363198 >108363229 >108363317
--Google releases WAXAL African language speech dataset amid Gemma 4 delays:
>108362761 >108362813
--High-memory LLM configurations and GPU utilization:
>108364020 >108364064 >108364392 >108364404 >108364422 >108364455 >108364481 >108364549 >108364598 >108364503 >108364926 >108365150 >108366414 >108366536
--Mistral-Large-3-675B-Instruct-2512 model obscurity and technical details:
>108365246 >108365259 >108365294 >108365285 >108365426
--Voice conversion methods and limitations with Qwen3-TTS:
>108363196 >108363211 >108363225 >108363263 >108363267 >108363290 >108363378
--Performance differences between llama-cli and llama-server:
>108363483 >108363549 >108363644 >108364517 >108364542 >108364669
--Qwen3.5-27B performance discrepancy due to quantization confusion:
>108367280 >108367297 >108367305 >108367311 >108367328
--String ban robustness and regex ban PR for ik_llama.cpp:
>108363666
--Comparing bare metal and VM performance benchmarks:
>108364326
--Anthropic and Meta lobbying for AI regulations:
>108362986
--MCP server persistence issues with llama.cpp frontend:
>108363692
--PocketTTS.cpp Windows compatibility fixes shared:
>108365171
--Miku (free space):
>108365163 >108366572 >108366629 >108367228 >108366923 >108367052

►Recent Highlight Posts from the Previous Thread: >>108362965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>ai models are getting more and more intelligent by time
then why these smart models cant describe something as simple as smell of mikupussy anymore? bring a 2023 model and ask it to describe the smell of mikupussy and see it yourself
>>
>>108368243
>tl;dr ozone and leeks with a hint of musk and vanilla
What is it supposed to smell like?
>>
>wow, this model puts out some sweet writing I could never do myself
>writes plot like a woman
can't have everything
>>
File: 1728100691817.jpg (33 KB, 540x528)
33 KB
33 KB JPG
>>108368283
>What is it supposed to smell like?
no idea. just wanna see the models describe it
>>
>>108368329
they seem to be struggling to do that and the newer the model is, the lesser it suits to my tastes
>>
>>108368283
like the short circuit from a dumb boomer at starbucks dropping their coffe onto their laptop
>>
>>108368309
nevermind
>(also, please don't write the plot like a woman. The prose is good, but try to stay consistent with the themes. No, X won't come back apologizing next day (Y will have to reach him), and no, Y won't magically understand everything instantly)
I can't believe it worked
>>
When they will start installing dedicated ai cp on each phone and pc?
>>
>>108368469
>cp
When pedophiles start ruling the world. Wait...
>>
>>108368469
when they want you gone and cant find anything to get you on
>>
>>108368469
>ai rm -rf
>>
File: 1770754456808040.png (512 KB, 743x932)
512 KB
512 KB PNG
>>108368475
Debunked.
>>
where fears and lies
melt away

music will tie
wonk uoy naht noitceffa erom deen i
>>
File: 1745708976003853.png (131 KB, 1149x490)
131 KB
131 KB PNG
>>108368243
>>
>>108368672
slop
>>
>>108368679
>slop
Define it.
>>
>>108368469
The FBI and CIA has been doing this to troublemakers for years. If they really want you then they are going to get you.
>>
Not local, but anyone knows why I can't use GPT-5.4 Pro on openrouter? It says I have insufficient credits but my balance is positive
>>
>>108368733
>>>/g/aicg/
>>
>>108368243
Mikupussy smells like BLACK BULL semen
>>
File: grok.png (328 KB, 915x675)
328 KB
328 KB PNG
>>108368243
>>
What's the advantage of saving the cache? The model still needs to reprocess everything, no?
>>
>>108368746
Now this is slop.
>>
>>108368753
>What's the advantage of saving the cache?
Not having to reprocess the whole thing.
>The model still needs to reprocess everything, no?
Not if you have/load a previous cache.
But are you talking about the rnn/ssm state from the new qwen models or the save/restore you can do with the /slots/n/action={save|restore} endpoint? Both should work.
>>
>>108368746
Without last 3 lines, I like.
>>
Moonshot will announce Kimi K3 on GTC on March 18th
>>
Hunter Alpha and Healer Alpha are both from Zhipu
>>
File: breakallthethings.png (212 KB, 1224x1022)
212 KB
212 KB PNG
lol, breakages caused by the vibeshitter are endless and still are to be fully fixed, this one must have flown under the radar because almost none of us run models like Kimi locally. If you use more uncommon models, you'd be better off not merging any of the parser related commits still.
this is the power of agentic niggers and claude code. this is why we must gatekeep this thread away from telling people how to vibecode. they need to eat razor blades instead.
>>
File: mikupussz.png (19 KB, 804x296)
19 KB
19 KB PNG
>>108368243
Can't get the ozone out of it.
>>
>>108368835
I wouldn't mind if these were DS because 3.0 was rather crappy, and they tuned it into greatness. Unless they're back to being completely unmemorable like the pre-3.0 era (though I know this is /lmg/ and some anons used their small coder models), 4.0 can be uninspiring but technologically novel and they'll bring it home with 4.1 or R2.
>>
>>108368848
That shit should have never been implemented on the server. That's client-side stuff. The problem started before he got involved, but he's definitely not helping.
>>
how do i make qwen3.5 27B not think for 10000 tokens?
>>
>>108368672
>no ozone
trash
>>
>>108368929
good system prompt + pwilkin's new vibeshitted reasoning budget + end phrase :D
I LOVE VIBEGARBOJ
>>
>>108368929
turn the reasoning off with edited template
>>
>>108368929
Prefill <think></think>
>>
File: 1639692511780.jpg (249 KB, 1000x998)
249 KB
249 KB JPG
>>108365171
Thanks for the information/update. I almost missed this because I was working on ASR-related AI stuff the other day. I pushed the changes into the main repo with some minor edits. Onnx runtime should now default to using the more updated version cmake pulls by default, so you won't have to pull in the dll yourself manually.

Very interested to see what the performance looks like on other machines. If you could share a screenshot of the --profile and include what CPU you have for reference I'd greatly appreciate it.

https://github.com/VolgaGerm/PocketTTS.cpp

>>108368198
Also thanks anon for the threadly qrd. I would have missed the update otherwise, lel.
>>
What an absolutely worthless thread we have today. I hope blacked miku spam returns to show mikutroons their place.
>>
be the chang you want to xi
>>
>>108368283
That is a trick question. The vocaloid's pussy is actually a dick. The riddle demonstrates how deeply ingrained gender roles are in society, often causing people to assume that a long green haired person is actually a woman when in reality it is a troon.
>>
>>108369090
kek
>>
anyone use local for real work and not just fucking around?
>>
>>108369108
I used to use it for RP, nowadays it's mainly for personal information (like some law stuff, or finance stuff since I'm investing) obviously with web search / fetching.
For my own projects free tiers fo gemini are usually enough (gemini pro / flash), never ran out of flash usage.
For actual work at my company, we have the company provided Amazon Q with sonnet 4.6 (no opussy because they're big nosed sadly)
>>
>>108369108
GLM 4.7 is perfectly fine for programming
>>
>>108369108
I've been using it recently for asking stupid programming-related questions and generating example snippets.
I copy-pasted a Javascript SSE parser out of it, which isn't really complicated but it's less thinking to read and fix the solution (e.g. the AbortController was instantiated, but not plumbed through to fetch) than to write it from nothing.
It's debatable whether you could call anything I do "real work", though.
>>
File: teto.jpg (493 KB, 1040x1422)
493 KB
493 KB JPG
>vibe-ported Qualcomm charge control from Android to Linux using Qwen3.5-35B-A3B
wish me luck, my phone about to turn into Galaxy Note 7
>>
>>108369180
cellphones sure would be more useful if you could just boot linux on them
>>
>>108369180
You should use 27B unless you REALLY need speed.
>>
>>108369206
It is such a weird thing how dense model fetish was created by frivolous 3090 purchases.
>>
>>108369205
You can though you just need specific phones
>>
>>108369245
yeah but I mean all of them like you can install it on a pc instead of windows, the list of phones that exist vs ones you can run linux on is microscopic
>>
File: pmos.png (60 KB, 848x615)
60 KB
60 KB PNG
>>108369205
yeah, the Android kernel support model is so retarded.
Thankfully a few older SoCs are pretty well supported on upstream Linux, you can boot mainline Linux on them, even stuff like GSM, GPS and hardware acceleration work. They are all still buggy though, so close yet so far from making it a daily-drive'able phone.
>>108369206
nah, i'm on 1060
>>
>>108369180
>Qualcomm charge control
What are you doing, exactly. Are you trying to get wireless charging controls working on your desktop or something? I don't get it.
>>
File: moonshine2.webm (460 KB, 1280x720)
460 KB
460 KB WEBM
My demo of Moonshinev2 ASR.

https://files.catbox.moe/t5tr26.webm
>>
>>108368825
Do you think it's Hunter or Healer? Gotta be Hunter, right?
>>
>>108369287
Cool
>>
>>108369273
disabling charging after reaching a certain percentage to not wear down the battery. Linux already has current control for this Qualcomm charger, but there's a separate on/off charging bit that never got implemented (but it is used by Oneplus Android Kernel) that from that i've read could disable battery charging entirely and allow to power the SoC without funneling all the power through the battery first.
Should prolong the battery life if it really works like that. Batteries for older phones are a commodity, original replacements are still sold by Oneplus, but they are all new-old stock from 2020 that already sit at 0% at some warehouse and degrade.
>>
>>108369289
>>108368835
>>
talking head sota?
>>
>>108368825
God I wish we got an upgrade to K2.5 that fixes its abhorrent writing style. Right now I'm stuck between GLM5 for writing and K2.5 for image recognition/vision.
If they fix K2.5's stupid ADHD style of writing, it'd be close to endgame for me.
>>
>>108368835
4.9 pls. Just a different slop profile and about 500% less determinism please.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.