/g/ - Technology





File: media_G7ulKxsaQAAxOKT.jpg (318 KB, 1414x2000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107834480 & >>107826643

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) llama.cpp: backend sampling support merged (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107834480

--Quantization performance discrepancies in Gemma-3-27B GGUF models:
>107837128 >107837136 >107837144 >107837149 >107837148 >107837164 >107837180 >107837192 >107837306 >107837197
--Local AI models for image feedback and multimodal tasks:
>107835848 >107835882 >107835900 >107835915 >107835941 >107835980 >107835996 >107836032 >107836089 >107836175 >107836347
--Running LLMs on low VRAM hardware with quantization and CPU offloading:
>107837436 >107837473 >107837514 >107837539 >107837558 >107837573 >107837603 >107837609 >107837633 >107837639
--Self-taught AI learner's motivation vs. math complexity challenges:
>107835318 >107835331 >107835431 >107835463 >107835488 >107835627 >107835644 >107835375 >107835403 >107835383
--PowerShell cmd confusion with Gemma model response critique:
>107835679 >107835772 >107835785 >107835832
--Ethical and practical concerns about AI-generated PR descriptions on llama.cpp's GitHub:
>107837074 >107837130
--Struggles with model size limitations and RAM requirements for large vision models:
>107836240 >107836244 >107836537 >107836554 >107836556 >107836593 >107836268 >107836272 >107836286
--Extensive banned token list with model-specific customizations:
>107835736 >107835765
--Meta's controversial nuclear energy investment for AI criticized as misguided:
>107835873
--Critique of complex ST webui implementation:
>107834750
--Optimizing documentation vectorization through token-efficient formatting:
>107835121 >107835923
--File numbering organization debates in documentation directories:
>107834742 >107834901 >107835077
--CPU-optimized GPT-SoVITS via ONNX inference engine:
>107836452
--Teto (free space):
>107835833

►Recent Highlight Posts from the Previous Thread: >>107834483

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107838898
The box has clearly just been opened, so why is her hair outside the box?
>>
Is the Pangu age finally upon us?
>>
Sup qts
>>
>>107838951
Yes, gweilo, maximum white man benchmark score best model revolutionary intelligence new LLM AI for men, women, children, pets
>>
Do you think they will ramp up production of additional RAM, or will cloud models be the only way to go in the future?
>>
>>107839186
No, everyone expects the bubble to pop
>>
File: 1539701490464.jpg (176 KB, 1022x688)
Why are all templates greyed out in ST when using chat completion? And why are chat and text completion two different things?
>>
why do a lot of you favor GLM 4.6 and not 4.7? I was running 4.6 for a while but I've been liking 4.7, it feels different. just curious
>>
>>107839204
Chat completion applies the model's default template on the backend, so ST's templates have nothing to do there. Text completion sends a raw string that the frontend has to format itself, which is why they're separate modes.
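Rough sketch of the difference (just a sketch, assuming a llama.cpp server on localhost:8080; the Gemma-style tags are only an example template):

[code]
import requests

BASE = "http://localhost:8080"  # assumed llama.cpp server address

# Text completion: the frontend renders the chat template itself and
# sends a single raw prompt string, so ST's instruct template matters.
prompt = "<start_of_turn>user\nhello<end_of_turn>\n<start_of_turn>model\n"
r = requests.post(f"{BASE}/completion",
                  json={"prompt": prompt, "n_predict": 64})
print(r.json()["content"])

# Chat completion: the frontend sends structured messages and the
# backend applies the template embedded in the model, which is why
# ST greys out its template pickers -- there's nothing for it to apply.
r = requests.post(f"{BASE}/v1/chat/completions",
                  json={"messages": [{"role": "user", "content": "hello"}],
                        "max_tokens": 64})
print(r.json()["choices"][0]["message"]["content"])
[/code]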
>>
>>107839208
>why do a lot of you favor GLM 4.6 and not 4.7?
Only NAI shills do.
>>
File: 1760515040413703.png (458 KB, 800x800)
>>107838898
>open box
>I accidentally Miku
>>
>>107839196
How bad will things get if the bubble simply doesn't pop?
>>
>>107839363
Things will only change when rich people start getting bothered by it
>>
>>107839186
>>107839363
If you're not poor, it's actually extremely funny to watch from the sidelines. I hope it won't pop.
>>
File: 1754908814745187.gif (815 KB, 498x275)
>>107839196
>he thinks RAM prices are going to return to normal after the bubble pops
>>
OP's pic reminded me of when I had to put my cat in a shoebox after he died to take him in for cremation, and now I'm sad
>>
>>107839383
retard
>>
>>107839208
I like both for different reasons. 4.7 is smarter and tends to stick to the prompt more closely, often to an autistic degree, which sometimes leads to issues with cards that aren't tightly written. 4.6 feels more flexible and creative out of the box.
But I've had a lot of success with 4.7 here as well after adjusting my older cards to compensate for that.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.