/g/ - Technology

File: perfecional.png (1.06 MB, 768x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109101986 & >>109098000

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
►Recent Highlights from the Previous Thread: >>109101986

--Recommending models for RTX 4050 and discussing Gemma 4 depurpling:
>109104682 >109104688 >109104707 >109104756 >109104803 >109104834 >109104929 >109104838 >109104849 >109104871 >109104809 >109104856 >109104867
--Feasibility and bottlenecks of pooling VRAM via RPC over gigabit networks:
>109106828 >109106858 >109106872 >109106891
--Using 2.5 mproj to give K2-Thinking vision capabilities:
>109103511 >109104588 >109105603
--DeepSeek-V4-Flash-Base GGUF reports and architecture naming issues:
>109104143 >109104818 >109104965
--MTP speed benchmarks for Gemma-4 using Vulkan on RX6700XT:
>109102307 >109103451 >109107003 >109108056
--Optimizing settings for Gemma-4 models on low-VRAM hardware:
>109102301 >109102361 >109102385 >109102398 >109102405 >109102402 >109102429 >109102434
--Gemma 4's tendency toward robotic prose with long system prompts:
>109103211 >109103223 >109103241 >109103258 >109103266 >109103382 >109103689
--Desire for smaller zai models and Gemma-4-12B performance:
>109106452 >109106505 >109106547 >109106569 >109106628 >109106654 >109106674 >109106723 >109107585
--Comparing Fable 5 to OSS and discussing Anthropic's ID verification:
>109103890 >109103940 >109104006 >109103944 >109104032 >109104082 >109104391 >109106627 >109105684
--Discussing high-quality MoE models and MoE vs dense architectures:
>109104895 >109104901 >109104918 >109104962 >109105038 >109105053 >109105093 >109104992 >109105095 >109105118 >109105388 >109105428 >109106706 >109106766
--Viability of running llama.cpp across mixed Metal and ROCm devices:
>109106205 >109106300 >109106340 >109106329 >109106347
--Discussing a neural network that converts images into playable games:
>109107514 >109107570
--Logs:
>109103511 >109104424 >109104803 >109104809 >109106627
--Miku (free space):
>109103689

►Recent Highlight Posts from the Previous Thread: >>109101988

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballz
>>
>no DSv4 yet
>no M3 sparse attention yet
>no one looking at PRs
more like LAMEa.cpp amirite
>>
so what comes after Mythos?
a new super model capable of what exactly?
and after that? if this is an eternal race, what are the future capabilities? programming languages are limited and all the security holes should get filled at some point.
then what does the model do? and how does it improve? to do WHAT? invent new programming languages so it can hack it and then it can create the shields for whatever it invented?
i don't fully get it
>>
>>109108395
>so what comes after Mythos?
Thread
>>
>>109108395
>so what comes after Mythos?
legends
>>
anyone using gemma31 for translations, especially long ones (5000-10000+)
is it good?
>>
anons, when some of you say you're using multiple agents, do you mean:
- sequentially, basically every iteration checking the one before for anything wrong
- at the same time
?
>>
Relative noob here, just perfected my SillyTavern frontend.

What CLI do you guys use for your Gemmy? Gemini is telling me to use Aider.
>>
>>109108422
pi.dev, then whatever plugins you like. only one I've been using is pi-fff with the override for better find and grep
>>
>>109108388
good thing forks exist and you can literally use them right now before waiting months for those fags to implement it
>>
>>109108422
opencode is good
>>
70b dense
>>
>>109108472
>2 t/s is slow
>waits 5 hours for a (you)
>>
File: dsv4lite logs teto.png (118 KB, 1578x474)
>>109108388
Using the PR, I'm liking how DS v4 lite writes its in-character thinking and story completions, but I can't stand waiting 10 seconds for each story continuation to begin in mikupad, even when no tokens in the prompt have changed since the previous generation. Sucks to be poor running GPU+CPU.
>>
>>109108346
>currybook
cringe
>>
>>109108531
So far I'm liking it at high temp for rp/stories and how much more efficient and nicer the thinking is compared to some other models.
>>
loli feet
>>
405b dense
>>
>>109108449
But anon, I use AMD.
>>
>Gemma-4-125B-IT
>Still can't do tool calls successfully
>>
Who with a big rig is using a q4+ quant of glm5.2? Is it worthwhile vs k2.7-code for code, planning and logic work?
I'm running low on disk space to be quanting yet another model if it isn't a pretty significant jump.
>>
I am requesting the Ace song guy to train on Tupac, we are long overdue for a Hit Em Up part 2 and general Tupac revival.


