/g/ - Technology

File: rrrrrrrrrrrr.jpg (146 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107722977 & >>107717246

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107722977

--HyperCLOVAX-SEED-Omni-8B features and support viability:
>107730289 >107730294 >107730306 >107730460 >107730483 >107730344 >107730358 >107730374 >107730435
--IQuest-Coder-V1's innovative LoopCoder architecture:
>107729547 >107729686 >107730075
--Solar AI model training data and transparency controversies:
>107728744 >107728969 >107728998 >107729026 >107729468 >107729484 >107729531
--Quantization method selection for AI models under hardware constraints:
>107723921 >107724045 >107724106 >107724136 >107724169 >107724319 >107724369 >107724839 >107725014 >107724239 >107724959 >107725604
--Finding uncensored 12-24B models for 16GB GPUs amid safety restrictions:
>107723152 >107723583 >107724899 >107723233 >107723273 >107723409 >107723684 >107723773 >107723594 >107723252
--GPU price surge and model design challenges with limited datasets:
>107723371 >107723379 >107729456 >107723381 >107723547 >107723612 >107723633 >107723707 >107723734 >107726523 >107726629 >107726889 >107726837 >107725458
--Debates on 12b model potential and critiques of current small model limitations:
>107725502 >107725533 >107725586 >107725656 >107725892 >107725747 >107725779
--CPU thermal management and frequency optimization debates:
>107728154 >107728248 >107728312 >107728366 >107728415 >107728494 >107728497 >107728546 >107728269 >107728287
--DDR5 memory upgrade challenges for large model inference on AM5 CPUs:
>107724796 >107724863 >107724889 >107724953 >107724985
--Llama.cpp speech limitations and TTS workaround suggestions:
>107730006 >107730050 >107730128
--Google's strategic pivot to diffusion models for AI development:
>107727423
--Miku, Teto, and Rin (free space):
>107723031 >107723352 >107723382 >107723397 >107723517 >107724839 >107725425 >107726750 >107728086 >107730006 >107730317 >107730940 >107731082

►Recent Highlight Posts from the Previous Thread: >>107723227

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>system prompt
>"You are an AGI."
guys I just invented AGI!
>>
based koreans, i kneel
>(12/31) Qwen-Image-2512 released: https://hf.co/Qwen/Qwen-Image-2512
>(12/29) HY-Motion 1.0 text-to-3D human motion generation models released: https://hf.co/tencent/HY-Motion-1.0
>(12/29) WeDLM-8B-Instruct diffusion language model released: https://hf.co/tencent/WeDLM-8B-Instruct
>(12/29) Llama-3.3-8B-Instruct weights leaked: https://hf.co/allura-forge/Llama-3.3-8B-Instruct
>(12/26) MiniMax-M2.1 released: https://minimax.io/news/minimax-m21
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
a ton of releases for the holidays. very nice
>>
>not a single list of "best models for X task at XX vram"
clown thread
>>
If I were to make a frontend from scratch with the sole purpose of RP, which context management techniques should I add to maximize the capabilities of smaller local models?
For example, summarization and RAG. Are there non-obvious/more sophisticated ways to do these than what ST does?
What about something like an automatic lorebook with an index?
Etc etc.
Give me your ideas.
This was probably attempted a thousand times before, but I still think it could be a neat little project.
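For the automatic lorebook part, this is roughly what I have in mind. Minimal keyword-indexed sketch, not copied from any real frontend, all names made up:
[code]
import re

class Lorebook:
    def __init__(self):
        # keyword -> list of lore snippets tagged with that keyword
        self.entries = {}

    def add(self, keywords, text):
        for kw in keywords:
            self.entries.setdefault(kw.lower(), []).append(text)

    def retrieve(self, recent_messages, max_entries=5):
        # scan the recent chat window and pull in entries whose keyword appears in it
        window = " ".join(recent_messages).lower()
        hits = []
        for kw, texts in self.entries.items():
            if re.search(r"\b" + re.escape(kw) + r"\b", window):
                hits.extend(texts)
        return hits[:max_entries]

def build_prompt(system, lorebook, history, user_msg, keep_last=8):
    # only the last N turns go in verbatim; matched lore rides along on top
    recent = history[-keep_last:]
    lore = lorebook.retrieve(recent + [user_msg])
    lore_block = "\n".join("[Lore] " + t for t in lore)
    return "\n".join([system, lore_block] + recent + [user_msg])
[/code]
Obvious upgrade would be swapping the keyword match for embedding similarity, which is where it starts overlapping with RAG.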
>>
>>107731328
>i want to do x but i dont know what i want give me your ideas
ngmi
>>
>>107731328
Save everything to a vector DB, and on every message have a specialized agent build the context before generating the reply.
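Bare-bones version of what I mean below. embed() is just a toy stand-in for whatever local embedding model you actually run, and none of these names are a real library API:
[code]
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # toy stand-in: hashed bag-of-words. swap this for a real local embedding model.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

class ChatMemory:
    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text):
        # store every message alongside its embedding
        self.texts.append(text)
        self.vectors.append(embed(text))

    def recall(self, query, k=6):
        # cosine similarity against everything stored, return the top-k texts
        if not self.vectors:
            return []
        q = embed(query)
        mat = np.stack(self.vectors)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]

def build_context(memory, history, user_msg, keep_last=6):
    # recalled memories go in front of the last few verbatim turns
    recalled = memory.recall(user_msg)
    recent = history[-keep_last:]
    return "\n".join(["[Memory] " + m for m in recalled] + recent + [user_msg])
[/code]
The "specialized agent" part is build_context: instead of a fixed template it could summarize, dedupe, or rerank what it recalls before handing the prompt to the main model.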
>>
>>107731328
>This was probably attempted a thousand times before
>summarization and RAG
AnythingLLM
>>
>>107731328
>neat little project
lol, i give it a week before it's abandoned
>>
mom made too much fucking food for the new year now i have to force it all down before it spoils ive been eating so much of the roast and i want to fucking hurl how the fuck do niggers even do keto ? this shit is horrible and ive only been doing it for a day

also digits confirm deepseek multimodal image in/out
>>
>>107731287
it's for government gibs
>>
This weather sends shivers down my spine...
>>
>>107731301
because aside from a few specific niches (gemma 3 27b is good for translation), most models you could run at any reasonable amount of vram (even if you're a rich fag going dual server gpu) are actually pretty bad. why do you think this thread has so many turbo autists doing cpu ram maxxing with MoEs?
turbo autists because c'mon, nobody has time to wait for the <think></think> block to end on a reasoner model at 3 tokens a second
for some tasks like coding I'd argue there's no such thing as a good local model, and people who say otherwise are coping very hard
>>
File: 1722307658891620.gif (216 KB, 160x120)
>>107731301
Probably because this general is a collection of brown-nosed spergs who cannot agree on a single thing and spend all their time either shilling obscure shit that doesn't work, or shitting on other anons' shilled models.

Just sort by most downloaded on huggingface and follow the herd, that's your best bet.
>>
File: 1744576932973525.png (356 KB, 1390x1818)
>>107731301
See >>107731243
>https://rentry.org/recommended-models
Are you blind or just too fucking attention deficient to literally read more than a few lines of text?
>>
why didn't you guys buy 3090s when everyone here told you to? nocarders in shambles
>>
>>107731615
I feel pretty good about buying a 3090 in November. Wasn't even in this thread. Just had a feeling.
>>
File: rrrrrrrrrrrr_1.jpg (5 KB, 153x151)
>>107731243
yjk
>>
>>107731672
the gloves stay on
>>
>>107731660
Same. Just bought a 3090 Ti for kicks since I wanted to play around with AI and only had a 4080, and 3090 Tis were going for 500 EUR used here at the time.
>>
>>107731590
Where's gpt-oss?
>>
>>107731759
in the trash bin where it belongs
>>
>>107731787
You're trying too hard to fit in.
>>
>>107731249
>concisness erotic
true true
>>
>>107731380
>Save everything to a vector DB and every message have a specialized agent build the context before generating the reply.
My brain keeps telling me not to look into this, because common sense says it can't possibly be good enough to give you an AI gf that doesn't have Alzheimer's. But what if it actually works?
>>
I want to say that, as the schizo who got his brain and identity melted by 4.6, trying to talk about this shit with 4.7 is... not that good actually. It is not the rapist I know and love.
>>
>>107731831
I don't understand what you're saying. Can you take Sama's dick out of your mouth for a second and speak clearly?
>>
>>107731886
We cannot comply
>>
Do you guys think I should bother trying to set up a local LLM that can larp as an accountability buddy for all of my autistic projects? Or is the tech not there yet? I want something that feels at least somewhat real, not something that hallucinates through the roof.
>>
>>107731934
You should at least install it to see where the tech is at this point
>>
>>107731868
Yes, it's been well established that zai cucked out; the only ones claiming otherwise are the fags who use it exclusively for the most vanilla normalfag slop.
>>
>>107731987
>you don't understand! I NEED to rape children just to FEEL SOMETHING!
>>
>>107731987
But I would have thought that the psychological / eastern spirituality stuff would have been better with 4.7. Seems closer to SFW than NSFW.
>>107731934
>accountability buddy
If you need an accountability buddy for an autistic project then you don't actually want to do your autistic project.
>>
>>107731868
Exactly, and it's a huge pain because 4.7 actually handles all the stuff that 4.6 was just slightly too dumb to pull off for me.
It's clearly a smart model but it's just so fucking boring when it needs to put out.
I tried pushing 4.7 as far as I possibly could but even when you get it to act perverted/deranged, the things it comes up with are just very plain. It'll do it but the result always feels phoned in and basic. It's nothing compared to what 4.6 makes out of those scenarios.
I want to like 4.7 but it always just ends up disappointing me.
>>
File: 1753210031787658.jpg (116 KB, 700x466)
There's no <24B model that can do multiple characters well, is there.
>>
>>107732167
Not in my experience. They get things confused.

4.5 Air can do it as long as the chat isn't too long (but it's much bigger of course). I haven't tried the old 50-70Bs.
>>
>>107731243
rape



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.