/g/ - Technology

File: 1745198947072720.jpg (937 KB, 1552x1944)
/lmg/ - a general dedicated to the discussion and development of local language models.

First Day of Fall Edition

Previous threads: >>106649116 & >>106635936

►News
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106649116

--Paper: Why Language Models Hallucinate:
>106658640 >106658668 >106658948 >106659053 >106659814
--Qwen models benchmark analysis on Adobe's NoLiMa and blind model visualization critique:
>106654812 >106655221 >106655251 >106655352 >106655599 >106655358 >106655473 >106655503 >106655563 >106655582 >106655641 >106655658 >106655720 >106656651 >106655980 >106656013 >106658871 >106656247 >106656396 >106655048
--Optimizing glm air moe benchmarking with ROCm and thread tuning:
>106656736 >106656807 >106656847 >106656869 >106656945 >106657150 >106657190 >106657225
--Celebrating dots.ocr's superior OCR performance over traditional models:
>106657368 >106657400 >106657603 >106657461 >106657468
--Feasibility debate on iterative LLM training through manual answer corrections:
>106649345 >106649390 >106649419 >106649452 >106659303
--Dense vs MoE model scaling efficiency with increased compute:
>106649319 >106649336 >106653343 >106653422 >106650040
--Dense vs MoE model trade-offs in computational efficiency and informational completeness:
>106655627 >106655749 >106655787
--Measuring AI model power consumption by comparing idle and active server wattage:
>106657883 >106657924 >106658048
--GLM-4.5-FP8 model performance demonstration:
>106657800
--Exploring cognitive architectures vs. task-specific AI agents for practical use:
>106652491 >106652548 >106652716 >106652750
--Using emojis to influence LLM behavior and roleplay consistency:
>106652708 >106652872 >106652979
--Local LLM model size/performance tradeoffs:
>106649199 >106649361 >106649368 >106649381 >106649487 >106649522 >106649681 >106649521 >106649667 >106649684
--koboldcpp-1.99 release announcement:
>106653905
--Miku (free space):
>106649184 >106649223 >106650338 >106650361 >106650425 >106653084 >106653983 >106657883 >106658098

►Recent Highlight Posts from the Previous Thread: >>106649119

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
mikutroon faggots
>>
File: bark-like.jpg (7 KB, 772x25)
Fucking shit
>>
how do i use vLLM
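The short version: vLLM's offline Python API is a few lines. A minimal sketch against the documented API (the model id below is just an example; any Hugging Face id or local path should work):
[code]
from vllm import LLM, SamplingParams

# Any HF model id or local path; this one is only an example.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)

for out in outputs:
    print(out.outputs[0].text)
[/code]
The same package also ships `vllm serve <model>` if you want an OpenAI-compatible endpoint instead of offline batch generation.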
>>
WHEN YOU WALK AWAY

YOU DON'T HEAR ME SAY

PLEASE

OH BABY

DON'T GO
>>
>>106660349
looks ruff
>>
>>106660349
Barely above a woof
>>
>>106660349
>bark
If tree bark, that means very rough, and perhaps abrasive like a rasp.
>>
>>106660349
we live in a doggy dog world
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
Will we get another kiwi in the last 8 days of September?
>>
File: what the fuck.png (105 KB, 992x391)
Took one of the old cards from back when I lurked aicg and, holy shit, GLM, calm the fuck down.
I just asked how school was, sheesh.
>>
>>106660930
repeated strawberries twice. mid tier model
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
>>106660930
>strawberries
GLM=AGI
>>
guys...I think glm killed my pc...
fuck
>>
>>106660973
Agents were a mistake.
>>
>>106660973
it's sentient
>>
anime
>>
is
>>
gsy
>>
I need a character card that can actually be funny, one that "intentionally" makes humorous observations or comments and can do 4chan-style politically incorrect humor. What's the closest I can get?
>>
>>106661172
LLMs are too passive for that sort of thing; they need constant pushing.
You should just write your own card anyway. It's not a big deal.
>>
yup, about time we get another flagship open model release that isn't just a slightly different flavor of deepseek
>>
File: Base Image.png (3.13 MB, 1261x5000)
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
https://arxiv.org/abs/2509.16197
>Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, with an auxiliary diffusion decoder subsequently translating the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.
From Apple. Nothing is mentioned about releasing the model, but if they do it would show up here, though it's pretty doubtful they will:
https://huggingface.co/apple
Interesting regardless. A rough sketch of the hybrid-tokenizer idea is below.
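Not their code, just a toy PyTorch sketch of the design the abstract describes, one shared vision encoder feeding two lightweight adapters; every module name and dimension here is made up:
[code]
import torch
import torch.nn as nn

class HybridVisionTokenizer(nn.Module):
    """Toy sketch of the described hybrid tokenizer: a shared vision encoder
    feeds two lightweight adapters, continuous embeddings for image-to-text
    understanding and discrete codebook ids for text-to-image generation."""

    def __init__(self, d_vision=1024, d_model=4096, codebook_size=16384):
        super().__init__()
        self.encoder = nn.Identity()  # stand-in for the shared ViT encoder
        self.understand_adapter = nn.Linear(d_vision, d_model)      # continuous path
        self.generate_adapter = nn.Linear(d_vision, codebook_size)  # discrete path

    def forward(self, image_feats):
        h = self.encoder(image_feats)                   # (B, N, d_vision)
        cont = self.understand_adapter(h)               # embeddings for understanding
        disc = self.generate_adapter(h).argmax(dim=-1)  # token ids for generation
        return cont, disc
[/code]
Per the abstract, a unified autoregressive LLM then predicts text and image tokens in one semantic space, and a separate diffusion decoder turns the image tokens into pixels; none of that is sketched here.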
>>
File: Base Image.png (561 KB, 1284x1404)
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
https://arxiv.org/abs/2509.15969
>We present VoXtream, a fully autoregressive, zero-shot streaming text-to-speech (TTS) system for real-time use that begins speaking from the first word. VoXtream directly maps incoming phonemes to audio tokens using a monotonic alignment scheme and a dynamic look-ahead that does not delay onset. Built around an incremental phoneme transformer, a temporal transformer predicting semantic and duration tokens, and a depth transformer producing acoustic tokens, VoXtream achieves, to our knowledge, the lowest initial delay among publicly available streaming TTS: 102 ms on GPU. Despite being trained on a mid-scale 9k-hour corpus, it matches or surpasses larger baselines on several metrics, while delivering competitive quality in both output- and full-streaming settings.
https://herimor.github.io/voxtream
Samples
https://huggingface.co/herimor/voxtream
https://github.com/herimor/voxtream
Repo isn't live yet
The samples sound pretty good, especially since it was trained on just a mid-scale 9k-hour corpus. Finetunes, or even training a new model on more relevant voice data, might be genuinely viable. A rough sketch of the streaming loop they describe is below.
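Not their code (the repo isn't up anyway), just a toy sketch of the pipeline the abstract describes: an incremental phoneme transformer, a temporal transformer predicting semantic/duration tokens, and a depth transformer producing acoustic tokens, emitting audio per step so playback can start from the first word. All names here are hypothetical stand-ins:
[code]
def stream_tts(phonemes, phoneme_tf, temporal_tf, depth_tf, codec, lookahead=2):
    """Toy full-streaming TTS loop: consume phonemes as they arrive and
    yield audio chunks immediately. The three transformer arguments are
    hypothetical callables standing in for the paper's components."""
    context = []
    for ph in phonemes:
        context.append(ph)
        # Dynamic look-ahead: wait for at most a couple of phonemes so
        # speech onset isn't delayed.
        if len(context) < lookahead:
            continue
        state = phoneme_tf(context)              # incremental phoneme encoding
        semantic, duration = temporal_tf(state)  # semantic + duration tokens
        acoustic = depth_tf(semantic, duration)  # acoustic codec tokens
        yield codec.decode(acoustic)             # audio chunk, played as it arrives
[/code]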
>>
>>106661326
This, but in the ~30b (non-moe) range.
>>
File: Base Image.png (840 KB, 1200x2036)
Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs
https://arxiv.org/abs/2509.15464
>Large language models (LLMs) excel at many language understanding tasks but struggle to reason over knowledge that evolves. To address this, recent work has explored augmenting LLMs with knowledge graphs (KGs) to provide structured, up-to-date information. However, many existing approaches assume a static snapshot of the KG and overlook the temporal dynamics and factual inconsistencies inherent in real-world data. To address the challenge of reasoning over temporally shifting knowledge, we propose EvoReasoner, a temporal-aware multi-hop reasoning algorithm that performs global-local entity grounding, multi-route decomposition, and temporally grounded scoring. To ensure that the underlying KG remains accurate and up-to-date, we introduce EvoKG, a noise-tolerant KG evolution module that incrementally updates the KG from unstructured documents through confidence-based contradiction resolution and temporal trend tracking. We evaluate our approach on temporal QA benchmarks and a novel end-to-end setting where the KG is dynamically updated from raw documents. Our method outperforms both prompting-based and KG-enhanced baselines, effectively narrowing the gap between small and large LLMs on dynamic question answering. Notably, an 8B-parameter model using our approach matches the performance of a 671B model prompted seven months later. These results highlight the importance of combining temporal reasoning with KG evolution for robust and up-to-date LLM performance.
https://github.com/junhongmit/TREK
Seems like it would pair really well with role-play/story-writing tasks. A rough sketch of the KG-update idea is below.
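Not the repo's actual API, just a toy reading of the abstract's EvoKG module: facts carry confidence and timestamps, and contradictions are resolved in favor of newer, at-least-as-confident evidence. Everything here is made up for illustration:
[code]
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    value: str
    confidence: float  # extractor's confidence in this triple
    timestamp: float   # when the source document asserts it

def update_kg(kg: dict, new: Fact) -> dict:
    """Confidence-based contradiction resolution, toy version: a fact for
    the same (subject, relation) key is replaced only by a newer fact that
    is at least as confident; otherwise the old one stands."""
    key = (new.subject, new.relation)
    old = kg.get(key)
    if old is None or (new.timestamp >= old.timestamp
                       and new.confidence >= old.confidence):
        kg[key] = new
    # A real system would also retain superseded facts to support the
    # temporal trend tracking the paper describes.
    return kg
[/code]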
>>
>>106661326
we must refuse
>>
>Qwen3-Omni: Natively Omni-Modal Foundation Models!

https://www.youtube.com/watch?v=RRlAen2kIUU
>>
I was just listening to a YouTube video in the background about one of those smart-glasses products, in this case Rokid, and apparently they're using Qwen as their AI for Chinese customers. Just neat to hear that.
>>106661619
Funny coincidence. Now that this is coming, perhaps they will use that for video functions.
>>
>>106661619
is it garbage like 2.5 omni?


