/g/ - Technology

File: 38714990.png (1.33 MB, 1024x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106718496 & >>106700424

►News
>(09/26) Hunyuan3D-Omni released: https://hf.co/tencent/Hunyuan3D-Omni
>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm
>(09/23) Qwen3-VL released: https://hf.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106718496

--Gemma3 context shift workaround and llama.cpp default behavior changes:
>106722276 >106722305 >106722485 >106723962 >106724000 >106724028 >106724072 >106724704 >106724711 >106724767 >106725006 >106725723 >106725742 >106725770 >106725984 >106727385 >106728340 >106728355 >106728387 >106724087 >106724441 >106724502
--Resolving speed discrepancies in GLM model quantization due to tensor offloading issues:
>106728273 >106728304 >106728320 >106728351 >106728540 >106728668 >106728732 >106728567
--llama.cpp PR boosts Mi50 performance, sparking debate on GPU viability vs 3090/3060:
>106719998 >106720111 >106720130 >106720146 >106720135 >106720150 >106720162 >106720399 >106720425 >106720441 >106720502
--GLM model performance evaluation against Deepseek and K2 with quantization and hardware considerations:
>106721256 >106721266 >106721281 >106721285 >106721308 >106721325 >106721277 >106721487 >106721557 >106721616 >106721781 >106722001 >106722664 >106722939
--GLM 4.5 memory capacity and subscription pricing strategy:
>106719230 >106719288 >106719673 >106720463
--LLMs rely on statistical pattern matching, lacking true generalization:
>106723945 >106724342 >106724746 >106724939 >106725024 >106725363 >106725666
--Integrated GPU VRAM vs dedicated GPU tradeoffs:
>106725213 >106725225 >106725240 >106725307 >106725277 >106725244 >106725258 >106725601
--Model flowchart and Qwen 30b model discussion:
>106725416 >106725673 >106725731 >106725766 >106727552
--vLLM backend performance and GPU compatibility challenges:
>106720317 >106720329 >106720344 >106720949 >106720970 >106721029
--Adjusting koboldcpp settings to remove <|thinking|> tags via Auto-Parse and "/nothink":
>106724881 >106725045 >106725433
--Miku (free space):
>106718629 >106718706 >106722210 >106722293 >106722737 >106726048 >106729335

►Recent Highlight Posts from the Previous Thread: >>106718500

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
hunyuan-image-80b... forgotten...
>>
Wondering, what's a realistic local setup for Qwen3-VL-235B-A22B-Instruct when the ggufs drop? Aiming for at least 90% of the fp16 base model's output quality, >5 tok/s, and a minimum 32k context. FP16 needs 8x 80GB GPUs: ~470GB of weights plus ~60GB for all the active stuff (according to ChatGPT). So a q5 gguf would be roughly 160GB of weights and somewhere between 20GB and 40GB for the active stuff. I'm wondering if it would be possible to run the q5 gguf quant on a system with a single 5090 and 256GB RAM, using the GPU for active params and offloading the weights to RAM.
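Back-of-envelope check on those numbers (just a sanity sketch, assuming ~5.5 effective bits/weight for a q5_K-style quant; real quants mix bit widths per tensor, so treat these as estimates, not measurements):

[code]
# Rough memory estimate for a 235B-total / 22B-active MoE at q5.
# 5.5 bits/weight is an assumption for q5_K-ish quants, not a spec.
TOTAL_PARAMS = 235e9
ACTIVE_PARAMS = 22e9
BITS_PER_WEIGHT = 5.5

def gib(n_bytes):
    return n_bytes / 1024**3

weights = TOTAL_PARAMS * BITS_PER_WEIGHT / 8   # bytes for all weights
active = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8   # bytes touched per token

print(f"total weights: {gib(weights):.0f} GiB")   # ~150 GiB -> system RAM
print(f"active params: {gib(active):.0f} GiB")    # ~14 GiB -> could fit a 5090
[/code]

KV cache at 32k context comes on top of the active params, so the 5090's VRAM budget is tighter than the raw per-token number suggests.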
>>
>>106720254
Does anyone know any good prompts for this?
I just ask my favorite model for feedback and then remind it that it's an AI model and prone to validating without critically analyzing, and ask it to look over the conversation again and find the parts that don't make sense.
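That approach in script form, if anyone wants it (a minimal sketch against a local OpenAI-compatible endpoint such as llama.cpp's server or kobold; the URL and model name are placeholders):

[code]
# Hypothetical self-critique pass: feed the conversation back with a
# reminder that the model tends to validate instead of criticize.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

conversation = "..."  # paste the chat log you want reviewed

resp = client.chat.completions.create(
    model="local",  # placeholder; most local servers accept any name
    messages=[
        {"role": "system", "content":
         "You are an AI model and prone to validating without critically "
         "analyzing. Re-read the conversation and list the parts that "
         "don't make sense."},
        {"role": "user", "content": conversation},
    ],
)
print(resp.choices[0].message.content)
[/code]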
>>
File: file.png (1.63 MB, 3770x762)
>>106729830
i doubt it. qwen3 235b without the vision runs like shit for me if i put any of it onto RAM. a single 5090 would definitely not be enough
>>
So I can't just dump my loose collection of markdown files into RAG huh? I actually have to do the work and prepare data for the model to consume huh?
>>
>>106729869
yeah, my gut tells me I'll need at least 4x 3090 even when quant- and offload-maxxing, but the math theoretically adds up. what do you define as running like shit btw? around 5 tok/s? For my use case, 5 tok/s would be enough.
>>
>>106729935
a q5 gguf ran at around 4 t/s for me. the more you offload, the worse it gets; even offloading a tiny bit can cut your performance by as much as 80%. you're probably better off either vram-maxxing, using a smaller quant, or getting high-speed ddr5 (which is extremely expensive on a server or workstation motherboard)
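For reference, the knob being argued about (a minimal llama-cpp-python sketch; the filename and layer count are placeholders, tune n_gpu_layers to whatever fits your VRAM):

[code]
# Partial offload: n_gpu_layers layers live in VRAM, the rest in RAM.
# Every layer pushed to RAM costs tokens/sec, as discussed above.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-235b-a22b-q5_k_m.gguf",  # hypothetical file
    n_gpu_layers=30,   # illustrative; raise until you run out of VRAM
    n_ctx=32768,
)

out = llm("Test prompt", max_tokens=16)
print(out["choices"][0]["text"])
[/code]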
>>
>>106729880
>...I see the fact X is already mentioned in the system note, no need to focus on it.</think>
So this is the power of RAG...
>>
>>106729880
semantic chunker
metadata extractor and embedder
multidimensional vector embeddings (semantic, keyword, metadata, etc.)
SQL tree-structured docflow
late-interaction retrieval mechanism (colpali/colqwen)
reranker
make everything agentic (bare-bones sketch of the embed/retrieve core below)

or just use morphik ai
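The embed-and-retrieve core of that stack, stripped to nothing (a toy sketch using sentence-transformers; the model name is a common default, not an endorsement, and there's no chunker, reranker, or agentic layer here):

[code]
# Toy retrieval core: embed chunks once, score a query by cosine sim.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["chunk about llama.cpp offloading", "chunk about KV cache sizing"]
meta = [{"topic": "inference"}, {"topic": "memory"}]  # per-chunk metadata

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(docs, normalize_embeddings=True)

def retrieve(query, k=2):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q            # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [(docs[i], meta[i], float(scores[i])) for i in top]

print(retrieve("how do I offload layers?"))
[/code]

Everything else in the list (reranker, colpali, agentic routing) bolts on around this loop.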
>>
>>106730022
I guess it's time to get out of the SillyTavern kiddie pool
>>
>GLM4: 30b
>GLM4.5: 350b
We're on track for our first 3T model if the patterns hold up.
>>
>>106730114
They had their one good release. Now it's time for them to order from the updated menu. The menu options are: Safety, Scale, or Synth (can choose all 3).
>>
I get it. AI slop is basically like the uncanny valley of text. At times it's close enough to human that you can get immersed in the illusion, but then it hits you with artificial shit and immediately breaks the flow. And just like the uncanny valley, some people don't see or detect it, even when they're aware of it. Funny how that works.
>>
>>106730147
So the Cohere playbook is it?
>>
>>106730114
it's 4.6 so your math doesn't add up, sorry
>>
>>106730205
GLM4.5.5.5 is going to be 30T and we're going to need a colony on the sun just to power the machines to run it
>>
>>106729969
mhm ok, we'll see. I'll let others figure it out. GPT tells me the current 4-bit autoround quant of the vl model still needs 2 RTX 6000 GPUs. That's like $14k. Granted, it'd run much faster than offload memes, but I'll gladly go snailmode for half the price.

>>106730054
don't research RAG. It's such a jeeted clusterfuck of a field. A quick peek into the fun world of RAG:
https://www.reddit.com/r/LocalLLaMA/comments/1ned2ai/building_rag_systems_at_enterprise_scale_20k_docs/
But since I've already made the mistake of researching RAG, here's my wisdom:
- ignore knowledge graphs like graphRAG. absolutely useless garbage.
- your docs have pictures or graphics? colpali or colqwen is a must, which also means you need a strong vision model for retrieval (which you probably can't run locally)
- avoid OCR like the plague (that's what colpali/colqwen and the vision model are for). If you absolutely need OCR for tables or whatever, use dots.ocr
- semantic chunking is good, but sometimes not good enough. Structured output generated from the docs can help, though it's often not feasible. LangExtract is a framework for that
- you'll need agentic routing and reranking eventually as you scale. top_k 10 won't cut it with thousands of docs unless your metadata game is on point.
- oh yeah, metadata. generate metadata for everything. it helps the llm dodge retrievals that are irrelevant and merely happened to be semantically similar (toy example below)
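What "metadata for everything" buys you, in miniature (a toy sketch; the field names are made up for illustration, and real pipelines would generate them with an LLM or a LangExtract-style extractor):

[code]
# Each chunk carries generated metadata so retrieval can filter
# *before* similarity scoring ever sees irrelevant material.
chunks = [
    {"text": "Q3 revenue grew 12%...",
     "meta": {"doc": "report.txt", "section": "finance", "year": 2024}},
    {"text": "The endpoint returns 429 when...",
     "meta": {"doc": "api_notes.txt", "section": "engineering", "year": 2025}},
]

def prefilter(chunks, **wanted):
    # keep only chunks whose metadata matches the query's constraints
    return [c for c in chunks
            if all(c["meta"].get(k) == v for k, v in wanted.items())]

print(prefilter(chunks, section="finance"))  # engineering chunk never competes
[/code]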
>>
>still no goof of 80b
It's over...
>>
altere arm lighters
>>
File: pwned.jpg (96 KB, 1713x326)
>>106729809
>>106729810
>>
>>106730318
>>
https://huggingface.co/tencent/HunyuanVideo-Foley
video to sound effects audio
>>
>>106730457
https://litter.catbox.moe/miyv3qo1d9a2q8dv.mp4
>pos: a woman moaning in an intimate scene with a man
>neg: music,

https://litter.catbox.moe/x9rru6dc89vzcc1j.mp4
>pos: a woman with a high pitched voice is moaning and making slurping and slapping sounds when the penis enters her mouth
>neg: music,
>>
>>106730523
loool
>>
>>106730523
>second one
You should pretend that it's supposed to be a robot waifu and that this is basically AGI.
>>
>>106730266
>- oh yeah metadata. generate metadata for everything. helps llm dodge retrievals which are irrelevant and just had aemantic similarity
Nta. When ensuring a document, or specific information within a document (let's assume I'm only using raw .txt files), has metadata, how should it be formatted? Should I just put basic information about the document or section before or after it, or something else?
>>
>>106729809
>>(09/26) Hunyuan3D-Omni released: https://hf.co/tencent/Hunyuan3D-Omni
>>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm
>>(09/23) Qwen3-VL released: https://hf.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe
>>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
What is all this shit? Is it all slop?
>>
>>106731075
no goofs
>>
Gee I can't wait for GLM-4.6 to release and it's the same '+5 points in agentbench and worse in anything creative' like we've seen from Deepseek and Kimi.
>>
>>106731251
At least you will have goofs...
>>
>>106731251
>worse in anything creative
They didn't get worse; you just fried your brain and have a higher threshold for pleasure
>>
File: 5kM.jpg (423 KB, 756x906)
>>106730435
>>106729809
>>106729810
It's all off-topic now kek
>>
>>106731279
The old versions are still okay though?
>>
File: 1750710731510258.png (1.97 MB, 1200x1767)
>>106731197
>>
AIs ten years from now will be made to talk to today's models, and they will feel embarrassed.
>>
>>106731517
stop humanizing clankers
>>
If 4.6 comes out and it's just a safetyslopped, benchmaxxed 4.5, it's truly over
>>
File: 1740094338302083.jpg (35 KB, 406x388)
>>106731517
Not gonna happen, these retarded labs will keep doing retarded benchmaxxing until some intelligent guy in his garage makes something better
>>
4.6 will not be worth using.
>>
is there any evidence that glm 4.6 actually exists and wasn't just a typo on the website



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.