/lmg/ - a general dedicated to the discussion and development of local language models.

First Day of Fall Edition

Previous threads: >>106649116 & >>106635936

►News
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106649116

--Paper: Why Language Models Hallucinate:
>106658640 >106658668 >106658948 >106659053 >106659814
--Qwen models benchmark analysis on Adobe's NoLiMa and blind model visualization critique:
>106654812 >106655221 >106655251 >106655352 >106655599 >106655358 >106655473 >106655503 >106655563 >106655582 >106655641 >106655658 >106655720 >106656651 >106655980 >106656013 >106658871 >106656247 >106656396 >106655048
--Optimizing glm air moe benchmarking with ROCm and thread tuning:
>106656736 >106656807 >106656847 >106656869 >106656945 >106657150 >106657190 >106657225
--Celebrating dots.ocr's superior OCR performance over traditional models:
>106657368 >106657400 >106657603 >106657461 >106657468
--Feasibility debate on iterative LLM training through manual answer corrections:
>106649345 >106649390 >106649419 >106649452 >106659303
--Dense vs MoE model scaling efficiency with increased compute:
>106649319 >106649336 >106653343 >106653422 >106650040
--Dense vs MoE model trade-offs in computational efficiency and informational completeness:
>106655627 >106655749 >106655787
--Measuring AI model power consumption by comparing idle and active server wattage:
>106657883 >106657924 >106658048
--GLM-4.5-FP8 model performance demonstration:
>106657800
--Exploring cognitive architectures vs. task-specific AI agents for practical use:
>106652491 >106652548 >106652716 >106652750
--Using emojis to influence LLM behavior and roleplay consistency:
>106652708 >106652872 >106652979
--Local LLM model size/performance tradeoffs:
>106649199 >106649361 >106649368 >106649381 >106649487 >106649522 >106649681 >106649521 >106649667 >106649684
--koboldcpp-1.99 release announcement:
>106653905
--Miku (free space):
>106649184 >106649223 >106650338 >106650361 >106650425 >106653084 >106653983 >106657883 >106658098

►Recent Highlight Posts from the Previous Thread: >>106649119

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
mikutroon faggots
Fucking shit
how do i use vLLM
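A minimal sketch of vLLM's offline Python API, assuming `pip install vllm` and a GPU with enough VRAM; the model ID is just an example, swap in whatever you want to run:

```python
# Minimal offline-inference sketch with vLLM's Python API.
# Assumes vLLM is installed and the chosen model fits in VRAM
# (the model ID below is only an example).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Magistral-Small-2509")  # any HF model ID or local path
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Write a haiku about autumn."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For a persistent server, `vllm serve <model>` exposes an OpenAI-compatible endpoint (port 8000 by default) that SillyTavern and friends can point at.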
WHEN YOU WALK AWAY YOU DON'T HEAR ME SAY PLEASE OH BABY DON'T GO
>>106660349
looks ruff
>>106660349
Barely above a woof
>>106660349
>bark
If it's tree bark, that means very rough, and perhaps abrasive like a rasp.
>>106660349
we live in a doggy dog world
Will we get another kiwi in the last 8 days of September?
Took one of the old cards from back when I lurked aicg and, holy shit GLM, calm the fuck down. I just asked how school was, sheesh.
>>106660930
repeated strawberries twice. mid tier model
>>106660930
>strawberries
GLM=AGI
guys...I think glm killed my pc...fuck
>>106660973
Agents were a mistake.
>>106660973
it's sentient
anime
is
gay
I need a character card that can actually be funny, like one that "intentionally" makes humorous observations or comments, and that can do 4chan-style politically incorrect humor. What's the closest I can get?
>>106661172
LLMs are too passive for that sort of thing; they need constant pushing. You should just write your own card anyway. It's not a big deal.
yup, about time we get another flagship open model release that isn't just a slightly different flavor of deepseek
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
https://arxiv.org/abs/2509.16197
>Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, with an auxiliary diffusion decoder subsequently translating the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.
From Apple. Nothing mentioned about sharing the model, but if they do it would be here, though it's pretty doubtful they will:
https://huggingface.co/apple
Interesting though.
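The abstract's hybrid-tokenizer idea is easy to picture in code: one shared encoder, two cheap adapters. A rough PyTorch sketch, where every module name, shape, and the nearest-codebook quantizer are invented for illustration, not Apple's actual code:

```python
# Illustrative sketch of the hybrid vision tokenizer described in the
# Manzano abstract: a shared encoder feeds one adapter for continuous
# embeddings (understanding) and one for discrete tokens (generation).
# All sizes and modules here are made up.
import torch
import torch.nn as nn

class HybridVisionTokenizer(nn.Module):
    def __init__(self, d_model=1024, codebook_size=8192):
        super().__init__()
        self.encoder = nn.TransformerEncoder(   # single shared vision encoder
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.understand_adapter = nn.Linear(d_model, d_model)  # -> continuous embeddings
        self.generate_adapter = nn.Linear(d_model, d_model)    # -> pre-quantization features
        self.codebook = nn.Embedding(codebook_size, d_model)   # discrete image tokens

    def forward(self, patches):                 # patches: (B, N, d_model)
        h = self.encoder(patches)
        cont = self.understand_adapter(h)       # fed to the LLM for image-to-text
        q = self.generate_adapter(h)
        # nearest-codebook lookup yields discrete token ids for text-to-image
        ids = torch.cdist(q, self.codebook.weight.unsqueeze(0)).argmin(dim=-1)
        return cont, ids
```

The appeal of the design is that both paths share one semantic space, so understanding and generation stop fighting each other during joint training.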
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
https://arxiv.org/abs/2509.15969
>We present VoXtream, a fully autoregressive, zero-shot streaming text-to-speech (TTS) system for real-time use that begins speaking from the first word. VoXtream directly maps incoming phonemes to audio tokens using a monotonic alignment scheme and a dynamic look-ahead that does not delay onset. Built around an incremental phoneme transformer, a temporal transformer predicting semantic and duration tokens, and a depth transformer producing acoustic tokens, VoXtream achieves, to our knowledge, the lowest initial delay among publicly available streaming TTS: 102 ms on GPU. Despite being trained on a mid-scale 9k-hour corpus, it matches or surpasses larger baselines on several metrics, while delivering competitive quality in both output- and full-streaming settings.
Samples: https://herimor.github.io/voxtream
https://huggingface.co/herimor/voxtream
https://github.com/herimor/voxtream
The repo isn't live yet.
The samples sound pretty good, especially since it was trained on just 9k hours. Finetunes, or even training a new model on more relevant voice data, might be really viable.
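The "full-stream" part just means audio starts flowing before the sentence finishes. A toy Python sketch of that control flow, where every model call is a trivial stand-in rather than VoXtream's actual API:

```python
# Toy illustration of full-stream TTS: emit audio from the first word,
# holding back only a small dynamic look-ahead of phonemes. All four
# functions below are hypothetical stubs, not VoXtream's real modules.

def phonemize(word):           # stand-in: word -> phoneme list
    return list(word)

def temporal_transformer(ph):  # stand-in: phoneme -> semantic/duration token
    return ord(ph)

def depth_transformer(sem):    # stand-in: semantic token -> acoustic token
    return sem % 256

def decode_to_audio(tok):      # stand-in: acoustic token -> waveform chunk
    return bytes([tok])

def stream_tts(text_stream, look_ahead=2):
    buf = []
    for word in text_stream:               # words arrive incrementally
        buf.extend(phonemize(word))
        # keep a little phonetic context, but never stall waiting
        # for the rest of the input text
        while len(buf) > look_ahead:
            ph = buf.pop(0)
            yield decode_to_audio(depth_transformer(temporal_transformer(ph)))
    for ph in buf:                         # flush the tail once input ends
        yield decode_to_audio(depth_transformer(temporal_transformer(ph)))

audio = b"".join(stream_tts(["hello", "world"]))
```

The monotonic alignment in the paper is what lets the real model get away with such a small look-ahead, which is where the 102 ms onset comes from.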
>>106661326
This, but in the ~30b (non-moe) range.
Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs
https://arxiv.org/abs/2509.15464
>Large language models (LLMs) excel at many language understanding tasks but struggle to reason over knowledge that evolves. To address this, recent work has explored augmenting LLMs with knowledge graphs (KGs) to provide structured, up-to-date information. However, many existing approaches assume a static snapshot of the KG and overlook the temporal dynamics and factual inconsistencies inherent in real-world data. To address the challenge of reasoning over temporally shifting knowledge, we propose EvoReasoner, a temporal-aware multi-hop reasoning algorithm that performs global-local entity grounding, multi-route decomposition, and temporally grounded scoring. To ensure that the underlying KG remains accurate and up-to-date, we introduce EvoKG, a noise-tolerant KG evolution module that incrementally updates the KG from unstructured documents through confidence-based contradiction resolution and temporal trend tracking. We evaluate our approach on temporal QA benchmarks and a novel end-to-end setting where the KG is dynamically updated from raw documents. Our method outperforms both prompting-based and KG-enhanced baselines, effectively narrowing the gap between small and large LLMs on dynamic question answering. Notably, an 8B-parameter model using our approach matches the performance of a 671B model prompted seven months later. These results highlight the importance of combining temporal reasoning with KG evolution for robust and up-to-date LLM performance.
https://github.com/junhongmit/TREK
Seems like it would pair really well with role-play/story-writing tasks.
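A toy version of what "temporally grounded scoring" could look like per the abstract: prefer KG facts whose validity interval covers the question's timestamp, with extraction confidence as a tie-break. The fact schema and weighting are invented for illustration, not the paper's actual method:

```python
# Toy temporally grounded scoring: a fact only counts if it was valid
# at the query's timestamp; among valid facts, weight by extraction
# confidence and recency. Schema and weights are made up.
from dataclasses import dataclass

@dataclass
class Fact:
    subj: str
    rel: str
    obj: str
    start: int         # year the fact became true
    end: int           # year it stopped being true (far-future sentinel if still true)
    confidence: float  # extraction confidence from the source document

def score(fact: Fact, query_year: int) -> float:
    if fact.start <= query_year <= fact.end:
        recency = 1.0 / (1 + query_year - fact.start)  # mildly favor newer facts
        return fact.confidence * (1.0 + recency)
    return 0.0  # fact was not valid at query time

facts = [
    Fact("ACME", "ceo", "Alice", 2015, 2021, 0.9),
    Fact("ACME", "ceo", "Bob", 2021, 9999, 0.8),
]
best = max(facts, key=lambda f: score(f, query_year=2023))
print(best.obj)  # Bob: the Alice fact is stale for a 2023 question
```

For RP/story use the same trick would keep a character sheet consistent over in-story time, which is presumably why it pairs well.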
>>106661326
we must refuse
>Qwen3-Omni: Natively Omni-Modal Foundation Models!
https://www.youtube.com/watch?v=RRlAen2kIUU
I was just listening to a YT video in the background about one of those smartglasses products, in this case Rokid, and apparently they're using Qwen as the AI for their Chinese customers. Just neat to hear that.
>>106661619
Funny coincidence. Now that this is coming, perhaps they'll use it for the video functions.
>>106661619
is it garbage like 2.5 omni?