[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


[Advertise on 4chan]


File: IMG_9037.png (2.14 MB, 1024x1536)
2.14 MB
2.14 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107660171 & >>107652767

►News
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: xmas migu.jpg (828 KB, 1408x752)
828 KB
828 KB JPG
►Recent Highlights from the Previous Thread: >>107660171

--Hardware upgrades for running quantized AI models and multi-GPU configuration:
>107666345 >107666402 >107666408 >107666433 >107666445 >107666462 >107666463
--Crafting effective system prompts for specific LLM behavior:
>107664962 >107665087 >107665580 >107666887 >107666904 >107667023 >107667116 >107667119 >107667145 >107667173 >107667303 >107665026 >107665077 >107665090 >107665101 >107665167
--Proposing MoE-LLM architecture with LRU expert cache for local deployment:
>107666054 >107666073 >107666262 >107666288 >107666325
--Challenges in implementing persistent prompt caching for model routers:
>107661573 >107661968 >107662025 >107662156 >107662236 >107662311 >107662450 >107662575 >107662599 >107662681 >107662725 >107662782 >107663650 >107663642
--Debating GLM 4.7's censorship:
>107664203 >107664223 >107664270 >107664467 >107664687 >107664738 >107664766 >107665148 >107665188 >107665225 >107665325 >107665461 >107665807 >107665943 >107666016 >107666035 >107665706
--Challenges with disabling reasoning in AI models and potential workarounds:
>107660184 >107660198 >107662102 >107662167 >107663018 >107663033 >107663057 >107663102 >107663672 >107663721 >107666949 >107668010 >107667675
--Intel Arc Pro B60 GPU pricing and performance debates:
>107660199 >107660226 >107662460 >107662477 >107662497 >107662506 >107662572 >107662640 >107662650 >107662651 >107665998
--Repurposing NAS with S3 storage as HuggingFace API-compatible model source:
>107667186 >107667290
--Model limitations in generating complex vector graphics:
>107664405 >107664417 >107664438 >107664455 >107664461 >107664472 >107665953
--Translating NSFW ASMR/JAV audio with specialized Whisper models:
>107664657 >107664716 >107664792
--Miku (free space):
>107661427 >107663987 >107664266 >107667371 >107668010 >107668457 >107664599

►Recent Highlight Posts from the Previous Thread: >>107660173

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107668478
v4 when
>>
>>107668489
Assuming DeepSeek operates on Pacific Standard Time, there's still a chance for V4 within the next 6 and a half hours.
>>
There's still nothing better than mythomax for rp, is there?
>>
>>107668489
We don't even have 3.2 support in llama.cpp yet.
>>
>>107668527
new drummer just dropped: https://huggingface.co/QuixiAI/Ina-v11.1
>>
>>107668527
Miqu.
Kimi.
>>
>>107668532
we don't even have deepseek_v32 support in mf'ing _transformers_ yet for god sake
>>
>>107668532
vllm doesn't have this issue :^)
>>
>>107668547
Let me know when someone makes a 2 bit quant for vllm.
>>
Is 3.2 good for sex?
>>
GLM Air or Gemma better for safe and not safe rp?
>>
>>107668555
Be the change you want to see
>>
>>107668575
Yes... Gemma good 100%
>>
>>107668532
v4 would be big news and they would have to make adding support a priority
>>
>>107668527
have you tried Mixtral-8x7B ?
it's probably the best model i had when i had only 16gb vram
>>
>>107668575
air is miles better for both.
>>
Kimi thinking IQ4-XS or Q4_0 or Q4_K_M?
>>
>>107668737
IQ4-XS. imatrix quants are generally better than normal quants at the same bit count. Don't download an unsloth quant.Those are always broken.
>>
>>107668478
I just got A/B tested on Aistudio (while using G3 Pro). Maybe it's Gemma 4? What's weird is that one respond used internet resources, while the other was the raw model (I did not activate web searches). The new model was much better: It was an actual written respond and not a list of lists of bulleted 6-word sentences.
>>
>>107665998
EP no, marlin kind of, they've done a lot of kernel support for shit like fp8 but you won't just load fp8 off huggingface, if you can quant the model you'll likely get anything to run, I got devstral2 123b working recently but I had to quant it to W4A16.

if you have the system resources you can dynamically at load time turn fp16 into fp8 or mxfp4

some things are still cuda identified/cuda only in VLLM like cpu offload gb and some of the dtypes that intel actually supports, I'm not confident they'll get there in the end but it's fun to tinker with.

I got cucked by Qwen 3.2 thinking I believe but didn't try too hard to see if it was workable.

It's in good shape, it's not rocm but it's no cuda and probably never will be.
>>
For the easiest out of the box setup, can I just get a consumer ryzen, 128gb of ddr5 ram, and a dual 9070xt for a total of 32gb of VRAM?
>>
File: kaix.png (89 KB, 789x429)
89 KB
89 KB PNG
We'll get there howevermany more thousand rolls it takes
Perfection is possible
The models shall delight in every interaction and hardware will be irrelevant
Robowaifus will be truly viable for those choosing that path
I believe
>>
>>107668940
btw I don't want to be near a stinky ass irl it's merely an interesting test-case
honest
>>
File: 1762052533318086.gif (140 KB, 379x440)
140 KB
140 KB GIF
>>107668940
>>
File: 1751138313342076.jpg (294 KB, 784x1168)
294 KB
294 KB JPG
>>
File: file.png (2 KB, 78x78)
2 KB
2 KB PNG
>>107668940
>oily thicket
>>
>>107668940
Impressive.
Not excuse me while I puke.
>>
>>107668940
anon that is disgusting worse the cockbortion ngl
>>
File: 1746880839747321.jpg (55 KB, 600x445)
55 KB
55 KB JPG
>>107668940
>>
>>107669006
im sickened but curious
>>
>>107668940
>>
i am convinced nobody actually has any system prompt that makes GLM 4.5 air actually be not cucked
>>
>>107668940
I remind you that our Lord Jesus Christ was born on this day.
>>
>>107668907
probably not the best use of your money, but that would be a decent build. quad 16gb 9060xts or 16gb 5060tis might be better.
>>
>>107669034
>system prompt
The system prompt is the character card.
I just used a thinking prefill.
>>
>>107668940
nvidia made fabs... for this?
>>
>>107668955
It's pretty funny. Nice alternative to... you know what... Just say the word!
>>
>>107668995
stirred something inside you didn't it?
>>
>>107669047
It's the perfect task for an text generative AI. It's not like a human would ever be so despite as to write that for others.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.