/lmg/ - a general dedicated to the discussion and development of local language models.

First Day of Fall Edition

Previous threads: >>106649116 & >>106635936

►News
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106649116

--Paper: Why Language Models Hallucinate:
>106658640 >106658668 >106658948 >106659053 >106659814
--Qwen models benchmark analysis on Adobe's NoLiMa and blind model visualization critique:
>106654812 >106655221 >106655251 >106655352 >106655599 >106655358 >106655473 >106655503 >106655563 >106655582 >106655641 >106655658 >106655720 >106656651 >106655980 >106656013 >106658871 >106656247 >106656396 >106655048
--Optimizing glm air moe benchmarking with ROCm and thread tuning:
>106656736 >106656807 >106656847 >106656869 >106656945 >106657150 >106657190 >106657225
--Celebrating dots.ocr's superior OCR performance over traditional models:
>106657368 >106657400 >106657603 >106657461 >106657468
--Feasibility debate on iterative LLM training through manual answer corrections:
>106649345 >106649390 >106649419 >106649452 >106659303
--Dense vs MoE model scaling efficiency with increased compute:
>106649319 >106649336 >106653343 >106653422 >106650040
--Dense vs MoE model trade-offs in computational efficiency and informational completeness:
>106655627 >106655749 >106655787
--Measuring AI model power consumption by comparing idle and active server wattage:
>106657883 >106657924 >106658048
--GLM-4.5-FP8 model performance demonstration:
>106657800
--Exploring cognitive architectures vs. task-specific AI agents for practical use:
>106652491 >106652548 >106652716 >106652750
--Using emojis to influence LLM behavior and roleplay consistency:
>106652708 >106652872 >106652979
--Local LLM model size/performance tradeoffs:
>106649199 >106649361 >106649368 >106649381 >106649487 >106649522 >106649681 >106649521 >106649667 >106649684
--koboldcpp-1.99 release announcement:
>106653905
--Miku (free space):
>106649184 >106649223 >106650338 >106650361 >106650425 >106653084 >106653983 >106657883 >106658098

►Recent Highlight Posts from the Previous Thread: >>106649119

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
mikutroon faggots
Fucking shit
how do i use vLLM
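A minimal sketch of vLLM's offline Python API, assuming `pip install vllm` and a GPU with enough VRAM; the model ID is just an example, swap in whatever you want to run:

```python
# Minimal offline-inference sketch with vLLM's Python API.
# Assumes vLLM is installed and the chosen model fits in VRAM
# (the model ID below is only an example).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Magistral-Small-2509")  # any HF model ID or local path
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Write a haiku about autumn."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For a persistent server, `vllm serve <model>` exposes an OpenAI-compatible endpoint (port 8000 by default) that SillyTavern and friends can point at.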
WHEN YOU WALK AWAY YOU DON'T HEAR ME SAY PLEASE OH BABY DON'T GO
>>106660349
looks ruff
>>106660349
Barely above a woof
>>106660349
>bark
If it's tree bark, that means very rough, and perhaps abrasive like a rasp.
>>106660349
we live in a doggy dog world
Will we get another kiwi in the last 8 days of September?
Took one of the old cards from back when I lurked aicg and, holy shit GLM, calm the fuck down. I just asked how school was, sheesh.
>>106660930
repeated strawberries twice. mid tier model
>>106660930
>strawberries
GLM=AGI
guys...I think glm killed my pc...fuck
>>106660973
Agents were a mistake.
>>106660973
it's sentient
anime
is
gay
I need a character card that can actually be funny, like one that "intentionally" makes humorous observations or comments, and that can do 4chan-style politically incorrect humor. What's the closest I can get?
>>106661172
LLMs are too passive for that sort of thing; they need constant pushing. You should just write your own card anyway. It's not a big deal.
yup, about time we get another flagship open model release that isn't just a slightly different flavor of deepseek
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
https://arxiv.org/abs/2509.16197
>Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, with an auxiliary diffusion decoder subsequently translating the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.
From Apple. Nothing mentioned about sharing the model, but if they do it would be here, though it's pretty doubtful they will:
https://huggingface.co/apple
Interesting though.
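The abstract's hybrid-tokenizer idea is easy to picture in code: one shared encoder, two cheap adapters. A rough PyTorch sketch, where every module name, shape, and the nearest-codebook quantizer are invented for illustration, not Apple's actual code:

```python
# Illustrative sketch of the hybrid vision tokenizer described in the
# Manzano abstract: a shared encoder feeds one adapter for continuous
# embeddings (understanding) and one for discrete tokens (generation).
# All sizes and modules here are made up.
import torch
import torch.nn as nn

class HybridVisionTokenizer(nn.Module):
    def __init__(self, d_model=1024, codebook_size=8192):
        super().__init__()
        self.encoder = nn.TransformerEncoder(   # single shared vision encoder
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.understand_adapter = nn.Linear(d_model, d_model)  # -> continuous embeddings
        self.generate_adapter = nn.Linear(d_model, d_model)    # -> pre-quantization features
        self.codebook = nn.Embedding(codebook_size, d_model)   # discrete image tokens

    def forward(self, patches):                 # patches: (B, N, d_model)
        h = self.encoder(patches)
        cont = self.understand_adapter(h)       # fed to the LLM for image-to-text
        q = self.generate_adapter(h)
        # nearest-codebook lookup yields discrete token ids for text-to-image
        ids = torch.cdist(q, self.codebook.weight.unsqueeze(0)).argmin(dim=-1)
        return cont, ids
```

The appeal of the design is that both paths share one semantic space, so understanding and generation stop fighting each other during joint training.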
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
https://arxiv.org/abs/2509.15969
>We present VoXtream, a fully autoregressive, zero-shot streaming text-to-speech (TTS) system for real-time use that begins speaking from the first word. VoXtream directly maps incoming phonemes to audio tokens using a monotonic alignment scheme and a dynamic look-ahead that does not delay onset. Built around an incremental phoneme transformer, a temporal transformer predicting semantic and duration tokens, and a depth transformer producing acoustic tokens, VoXtream achieves, to our knowledge, the lowest initial delay among publicly available streaming TTS: 102 ms on GPU. Despite being trained on a mid-scale 9k-hour corpus, it matches or surpasses larger baselines on several metrics, while delivering competitive quality in both output- and full-streaming settings.
Samples: https://herimor.github.io/voxtream
https://huggingface.co/herimor/voxtream
https://github.com/herimor/voxtream
The repo isn't live yet.
The samples sound pretty good, especially since it was trained on just 9k hours. Finetunes, or even training a new model on more relevant voice data, might be really viable.
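The "full-stream" part just means audio starts flowing before the sentence finishes. A toy Python sketch of that control flow, where every model call is a trivial stand-in rather than VoXtream's actual API:

```python
# Toy illustration of full-stream TTS: emit audio from the first word,
# holding back only a small dynamic look-ahead of phonemes. All four
# functions below are hypothetical stubs, not VoXtream's real modules.

def phonemize(word):           # stand-in: word -> phoneme list
    return list(word)

def temporal_transformer(ph):  # stand-in: phoneme -> semantic/duration token
    return ord(ph)

def depth_transformer(sem):    # stand-in: semantic token -> acoustic token
    return sem % 256

def decode_to_audio(tok):      # stand-in: acoustic token -> waveform chunk
    return bytes([tok])

def stream_tts(text_stream, look_ahead=2):
    buf = []
    for word in text_stream:               # words arrive incrementally
        buf.extend(phonemize(word))
        # keep a little phonetic context, but never stall waiting
        # for the rest of the input text
        while len(buf) > look_ahead:
            ph = buf.pop(0)
            yield decode_to_audio(depth_transformer(temporal_transformer(ph)))
    for ph in buf:                         # flush the tail once input ends
        yield decode_to_audio(depth_transformer(temporal_transformer(ph)))

audio = b"".join(stream_tts(["hello", "world"]))
```

The monotonic alignment in the paper is what lets the real model get away with such a small look-ahead, which is where the 102 ms onset comes from.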
>>106661326
This, but in the ~30b (non-moe) range.
Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs
https://arxiv.org/abs/2509.15464
>Large language models (LLMs) excel at many language understanding tasks but struggle to reason over knowledge that evolves. To address this, recent work has explored augmenting LLMs with knowledge graphs (KGs) to provide structured, up-to-date information. However, many existing approaches assume a static snapshot of the KG and overlook the temporal dynamics and factual inconsistencies inherent in real-world data. To address the challenge of reasoning over temporally shifting knowledge, we propose EvoReasoner, a temporal-aware multi-hop reasoning algorithm that performs global-local entity grounding, multi-route decomposition, and temporally grounded scoring. To ensure that the underlying KG remains accurate and up-to-date, we introduce EvoKG, a noise-tolerant KG evolution module that incrementally updates the KG from unstructured documents through confidence-based contradiction resolution and temporal trend tracking. We evaluate our approach on temporal QA benchmarks and a novel end-to-end setting where the KG is dynamically updated from raw documents. Our method outperforms both prompting-based and KG-enhanced baselines, effectively narrowing the gap between small and large LLMs on dynamic question answering. Notably, an 8B-parameter model using our approach matches the performance of a 671B model prompted seven months later. These results highlight the importance of combining temporal reasoning with KG evolution for robust and up-to-date LLM performance.
https://github.com/junhongmit/TREK
Seems like it would pair really well with role-play/story-writing tasks.
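A toy version of what "temporally grounded scoring" could look like per the abstract: prefer KG facts whose validity interval covers the question's timestamp, with extraction confidence as a tie-break. The fact schema and weighting are invented for illustration, not the paper's actual method:

```python
# Toy temporally grounded scoring: a fact only counts if it was valid
# at the query's timestamp; among valid facts, weight by extraction
# confidence and recency. Schema and weights are made up.
from dataclasses import dataclass

@dataclass
class Fact:
    subj: str
    rel: str
    obj: str
    start: int         # year the fact became true
    end: int           # year it stopped being true (far-future sentinel if still true)
    confidence: float  # extraction confidence from the source document

def score(fact: Fact, query_year: int) -> float:
    if fact.start <= query_year <= fact.end:
        recency = 1.0 / (1 + query_year - fact.start)  # mildly favor newer facts
        return fact.confidence * (1.0 + recency)
    return 0.0  # fact was not valid at query time

facts = [
    Fact("ACME", "ceo", "Alice", 2015, 2021, 0.9),
    Fact("ACME", "ceo", "Bob", 2021, 9999, 0.8),
]
best = max(facts, key=lambda f: score(f, query_year=2023))
print(best.obj)  # Bob: the Alice fact is stale for a 2023 question
```

For RP/story use the same trick would keep a character sheet consistent over in-story time, which is presumably why it pairs well.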
>>106661326
we must refuse
>Qwen3-Omni: Natively Omni-Modal Foundation Models!
https://www.youtube.com/watch?v=RRlAen2kIUU
I was just listening to a YT video in the background about one of those smartglasses products, in this case Rokid, and apparently they're using Qwen as the AI for their Chinese customers. Just neat to hear that.
>>106661619
Funny coincidence. Now that this is coming, perhaps they'll use it for the video functions.
>>106661619
is it garbage like 2.5 omni?