/g/ - Technology

File: hat.jpg (913 KB, 2048x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Christmas Edition

Previous threads: >>107652767 & >>107643997

►News
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107652767

--Reasoning step control tradeoffs and multi-GPU setup fixes in SillyTavern:
>107654025 >107654033 >107654054 >107654563 >107654882 >107655765 >107655833 >107655903 >107656043 >107656116 >107656253 >107656486 >107656988 >107657096 >107657168 >107657180 >107657297 >107657689 >107657823 >107657906 >107658051 >107658061 >107658104 >107658169 >107657498 >107657307 >107657350 >107657351 >107657477 >107657573 >107657627 >107657639 >107657404 >107657294 >107657176 >107657194
--Performance comparison between ik_llama and exllamav3 in VRAM-bound scenarios:
>107656297 >107656349 >107656555 >107656715 >107656838 >107657115
--Resolving GGUF conversion errors with outdated dependencies:
>107659075 >107659099 >107659110 >107659130 >107659134 >107659157 >107659165 >107659117 >107659129
--Cost and performance considerations for Mac-based AI clusters vs traditional GPU setups:
>107657777 >107657794 >107657813 >107657828 >107657854 >107657870 >107657937 >107657853 >107657876 >107657816
--MoE model parameter vs expert count performance analysis:
>107652819 >107652836 >107652840 >107654372
--ARC-AGI 2 achievement and its implications for future LLM advancements:
>107653556 >107653757 >107653789
--Benchmarking GLM-4.7 models with livebench and GGUF format:
>107656875 >107657040 >107657121 >107658101
--GLM 4.7 model performance and quantization calibration controversies:
>107656256 >107656302 >107656312 >107656327 >107656401 >107656577
--llamafile project update from Mozilla.ai:
>107658257
--Post-training resource demands for advanced AI models:
>107653833
--Critique of dense models and praise for alternatives like qwen3:
>107655084
--Logs: GLM-4.7:
>107658013 >107658080
--Miku (free space):
>107652814 >107652980 >107652999 >107653495 >107654563 >107656486 >107656586 >107657689 >107658850 >107659977

►Recent Highlight Posts from the Previous Thread: >>107652827

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
I still can't turn off thinking for GLM 4.7
>>
File: 1749875201211488.png (572 KB, 1080x1259)
>>107660184
Works for me
>>
>>107660184
<|assistant|>
</think>
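For anyone wanting to do the same prefill outside a frontend, here's a minimal sketch against a llama.cpp-style raw /completion endpoint. The host/port and the GLM template tokens are assumptions copied from the post above; check the chat template baked into your GGUF before trusting any of it.

```python
# Sketch only: make GLM skip its reasoning block by ending the raw prompt
# with the assistant tag plus a closing </think>, as shown above.
# Assumes a llama.cpp-style server on localhost:8080 exposing /completion;
# the template tokens are illustrative, not authoritative.
import requests

prompt = (
    "<|user|>\n"
    "Write one sentence about winter.\n"
    "<|assistant|>\n"
    "</think>\n"  # pre-closed think block: the model goes straight to the answer
)

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": prompt, "n_predict": 128},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])
```

In SillyTavern the equivalent is usually putting that `</think>` line into the field that prefixes the assistant reply (e.g. "Start Reply With"), which appears to be what the anons above are doing.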
>>
>24gb
>700eur
>460gb/s bandwidth
really intel?
>24gb under 500 dollars
>it will be 500$
>where's my 600$
>HE STOLE MY 700EUROS
https://videocardz.com/newz/sparkle-says-its-arc-pro-b60-gpus-are-now-available
https://videocardz.com/newz/intel-arc-pro-b60-24gb-workstation-gpu-to-launch-in-europe-mid-to-late-november-starting-at-e769
who is this card for?
>770euros
>>
>>107660197
Wtf? I didn't post that pic
>>
>>107660198
still thinks
>>
>>107660199
>https://youtu.be/0qS6HmiRNzE
>llama 70b
>5t/s
>that gpu utilization
its OVER
>>
File: file.png (698 KB, 1920x1075)
>>107660199
>23t/s with qwen3 30b a3b
>23t/s
>on empty context
>>
>>107660199
Enterprise™
>>
What do you mean, local? Is everyone here a billionaire? How are you fuckers affording anything?
>>
>>107660238
doesn't sound right. the software must be horrible.
>>
>>107660248
Most of us have gainful employment. Shocking, I know.
>>
>>107660254
Considering most people don't earn more than 100k a year, I still don't see it.
>>
>>107660248
No but I have a job
>>
File: 1000006812.png (377 KB, 593x602)
I downloaded locallm and a gpt oss 20b uncensored model and now I'm drawing a blank on what to try. What can I do with local models besides cooming and coding?
>>
>>107660302
wife agent
>>
>>107660297
most people ITT bought their ram maxxed hardware for deepseek/kimi/glm before the ram price surge
some anons run deepseek on hardware that cost like 1000-1500$
>>
>>107660248
I'm not American, so I can splurge a little.
>>
>>107660302
Vibe code a revolutionary app.
>>
>>107660248
If you don't mind low speeds and sloppy slop you can run quantized models on most PC hardware
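If you want to see what "quantized on most PC hardware" looks like in practice, here's a rough sketch using the llama-cpp-python bindings; the model filename and layer-offload count are placeholders, and CPU-only works with n_gpu_layers=0.

```python
# Hedged sketch: run a quantized GGUF on modest hardware with llama-cpp-python.
# The model path and offload count are hypothetical; tune them to your rig.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-small-24b-q4_k_m.gguf",  # placeholder file
    n_ctx=8192,          # context window; larger costs more RAM
    n_gpu_layers=20,     # offload as many layers as your VRAM allows, 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a cope quant is in one paragraph."}],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```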
>>
File: 1760999571161174.jpg (38 KB, 460x490)
>>107660302
>coding
>toss 20B
You can remove that part
>>
File: file.png (195 KB, 655x984)
so this is the power of llama1 7b...
>>
GLM4.7 feels like the K2-0905 to 4.6's K2-0726 or the 4.5 Opus to 4.6's 4.1 Opus. Everything really is going down the shitter.
>>
File: 1750555438514899.jpg (1.29 MB, 1764x875)
>>107660248
Trillionaire actually
>>
>>107660307
Yup, my Rome DDR4 build from like 2 years ago runs DS reasonably and was like $1500 at the time (not counting the 3090s I already had).
>>
>>107660307
That's the only benefit of being a /lmg/ resident: buying hardware before the price surge.
>>
>>107660248
>How are you fuckers affording anything?
No poors allowed
>>
>>107660462
Is middle class at least tolerated?
>>
>>107660468
As long as you don't get too far in debt
>>
>>107660248
We don't. Most of us are coping with a small model
>>
>>107660248
3.5T/s for 4.7 with just a high end gayming desktop.
>>
>>107660248
it only costs 300k for a full h200 server
>>
>>107660386
That's precisely how I feel about it. Can't they make more "calm" models?
>>
>>107660197
How do they already have a datapoint for 2100?
>>
>>107660491
>only
>>
>>107660491
For that price you could rent that same h200 rack on vast.ai for two years straight lmao
>>
>>107660722
and after 2 years you would have nothing
>>
>>107660729
Your h200 servers would have been obsolete in 2 years anyways
>>
>>107660745
they wouldn't be obsolete even under the most rushed deprecation schedule possible, but /g/tards love pretending things are obsolete
>>
>>107660763
If they won't be obsolete why is Nvidia selling GPUs with buyback clause?
>>
>>107660796
I don't know since I'm not nvidia's sales department
technical obsolescence has a very tenuous relation to sale conditions
>>
>Thump-thump. Thump-thump
>>
>switch to linux
>2x as fast
>processing takes 1/10th the time
wtf is wrong with windows???
>>
>>107661082
jeets
>>
Fuck me, with the power of linux I can actually run a 24B model at Q3 now on my poorfag rig. So far, cydonia is way smarter than anything I've used before but feels really sloppy. Any recs? Magidonia?
>>
>>107660202
well now we know what you were previously planning to post on pol, chuddie
i definitely had this happen when i was using kuroba ex, i think; it remembers if you uploaded a pic previously, which will only ever fuck you over
>>107661082
Same, but i won't pretend i don't wish i could run this and have text appear like i'm on a 30b a3b moe. Especially when this little fucker decides to spend 10000 tokens on thought
>>
>>107661127
don't use drummerslop models.
>>
>>>/v/729277223
>NovelAI's whole thing is being unfiltered.
>Now they offer GLM 4.6 with 32k context, which is pretty good considering that you get unlimited use.
>I think it is a good service and very user friendly.
Yeah, I think I'm sticking with NAI. Z.ai ruined 4.7 with their safety training.
>>
>>107661285
4.7 seems as horny as ever. It's just a worse model because it's one of those modern releases with zero sense of pacing and the ADHD "but wait, self-correction:" style of thinking that's made a horrible return in the past few months.
3.2-Speciale remains the best of the modern bunch because it at least writes well, but 4.6 will have to do until that one guy working to implement it is done learning how to vibecode
>>
>FirePaintedCydonia
It's slop time
>>
>>107661127
>Q3
>24B
you can't be serious
>>
>>107661127
try the base model (mistral small)
>>
File: 1763657832885601.jpg (353 KB, 1024x1440)
>>107660171
>>
>>107660171
is trooncinante still the top-tier 12b model? i got so used to its isms i already know what it's going to generate before it does
>>
>>107661391
IQ3_M is surprisingly usable at 24B. Wouldn't use it on anything less though.
>>
Now that llama.cpp server has model routing support, how do you enable prompt caching across model reloads? I want to unload a huge model with slow prompt processing, then reload it again and not have to process the entire prompt.
Cuda dev?
>>
>>107661432
>trooncinante
looool
>>
>>107661432
Neona is better but all 12B models have brain damage (nemo was dumb to begin with). Be like me and run a 24B at a cope quant.
>>
>>107661573
You can configure options for individual models. Do I need to repeat RTFM?
>>
>>107661427
It contains a jar of urine.
>>
>>107660248
>xhe didn't buy hardware when it was cheap
t.neet
>>
>have to put the 2 random Chinese runes GLM 4.7 sometimes generates into google translate to get what it was trying to say
Umm, I wasn't having that issue in the main output before. Am I supposed to learn Mandarin to coom efficiently with these new models?
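Not a fix for the model itself, but as a band-aid you can at least catch the stray runes automatically instead of pasting them into Google Translate. A small sketch; the CJK ranges are the rough common ones, not an exhaustive list.

```python
# Hedged band-aid, not a model fix: flag or strip stray CJK characters
# in generated text so you at least notice when it happens.
import re

CJK = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf]")  # common CJK codepoint ranges

def scrub(text: str) -> str:
    """Remove CJK characters and report how many were dropped."""
    hits = CJK.findall(text)
    if hits:
        print(f"stripped {len(hits)} CJK chars: {''.join(hits)}")
    return CJK.sub("", text)

print(scrub("The knight drew his 剑 and charged."))
```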
>>
>>107660171
merry christmas you insufferable faggots
>>
>>107661694
Learn to read. My question isn't about individual models, it's about router behavior. The model's cache is lost when the model is unloaded. I want to preserve the cache in system RAM until the model is loaded again.
>>
>>107661968
Save the cache to a file.
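Concretely, llama.cpp's server has slot save/restore endpoints that dump a slot's KV cache to disk. A sketch of the flow, assuming the server was started with --slot-save-path pointing at a writable directory; the endpoint names and JSON fields follow the server README for recent builds, so verify against your version.

```python
# Hedged sketch: persist a slot's KV cache to disk before swapping models,
# then restore it after the model is loaded again. Assumes the server was
# started with --slot-save-path and exposes the /slots endpoints;
# the filename is illustrative.
import requests

BASE = "http://localhost:8080"
SLOT = 0

# Save the slot's prompt cache to a file under --slot-save-path.
requests.post(f"{BASE}/slots/{SLOT}?action=save",
              json={"filename": "big-model-prompt.bin"}).raise_for_status()

# ... router unloads the big model, runs something else, reloads it ...

# Restore the saved cache so the long prompt is not reprocessed.
requests.post(f"{BASE}/slots/{SLOT}?action=restore",
              json={"filename": "big-model-prompt.bin"}).raise_for_status()
```

Whether a restored file is still valid after the router fully unloads and reloads the model is the part to test; the cache is tied to the exact model and context settings it was saved with.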
>>
File: 1757469874450654.jpg (52 KB, 940x1024)
>>107660184
>>107660206
>>107660198
Nta. Had this same problem last night but with gpt-oss-20b



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.