/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107838898 & >>107834480

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rec.jpg (181 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107838898

--Paper: Prompt Repetition Improves Non-Reasoning LLMs:
>107841511 >107841558 >107841788
--Papers:
>107840737
--Test-time training and beam search potential in open models:
>107839993 >107840870 >107840929 >107840944 >107840948 >107841079 >107841107 >107841130 >107841188 >107841193 >107843956 >107844098 >107844297 >107844760
--Adapting Microsoft TinyTroupe for local multiagent simulation with koboldcpp:
>107840877 >107840941 >107841028 >107841046 >107841313 >107842113 >107843658 >107843909 >107844229
--Context caching and efficiency in SillyTavern/LLM interactions:
>107841026 >107841049 >107841057 >107841086 >107841105 >107841142
--AI character interface development with animation control features:
>107841569 >107841591 >107841609 >107841593 >107841614 >107841636 >107841685 >107841771 >107841794 >107844857 >107841645 >107841648 >107841651 >107841655 >107841751 >107841760 >107841789 >107841844 >107841925 >107842016 >107843335 >107843377
--Cost and hardware considerations for multi-3090 AI rig construction:
>107840180 >107840249 >107840309 >107840596 >107840633 >107840640
--RAG explained as document chunking and embedding for context augmentation:
>107841899 >107841939 >107842005 >107842027 >107844296 >107844327 >107844468 >107842015 >107842046 >107842099 >107842082
--AI flaws vs emotional simulation and 3D model tech discussion:
>107842172 >107843286 >107843328 >107843393 >107843528 >107843592 >107845182 >107845226 >107845255 >107846236 >107843907 >107844059 >107844099 >107844179 >107844262
--llama.cpp memory split regression issue after update:
>107840161 >107840177
--ik_llama.cpp PR adds customizable string/regex token banning:
>107843501
--Miku (free space):
>107840633 >107840665 >107842172 >107843286 >107843393 >107843911 >107845663 >107845698 >107846236 >107844824

►Recent Highlight Posts from the Previous Thread: >>107838903

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107847336
>>107847336
>>107847336
>>
>>107847320
I would drill kasane's tetos.
>>
>>107847349
>Am I retarded?
Probably.
>where the fuck do you find the mmproj for mistral small 3.2 2501?
What stops you from making it yourself? Is it not supported?
>>
>>107847379
It's not multimodal.
>>
>>107847396
Yeah. I was just checking. He is retarded, then.
>>
>>107847349
>Am I retarded?
Yes. Go to bartowski's 3.2 page, click the files tab and ctrl+f mmproj.
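Once you have the mmproj file, wiring it up is one flag. A minimal sketch, assuming koboldcpp and example filenames (llama.cpp's multimodal tools take the same `--mmproj` flag):

```shell
# Load the main model GGUF together with its vision projector.
# Filenames here are examples; use whatever bartowski's files tab gives you.
python koboldcpp.py \
    --model Mistral-Small-3.2-24B-Instruct-Q4_K_M.gguf \
    --mmproj mmproj-Mistral-Small-3.2-24B-Instruct-f16.gguf
```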
>>
File: broken-tutu.png (9 KB, 632x73)
>>107847379
>>107847396
>>107847409
Either Broken-Tutu isn't actually using 2501 as it says, or this description is pure LLM hallucination.

Either way, fuck ReadyArt.
>>
>>107847425
Vision was added in 3.1 and 3.2. '2501' refers to 3.0, which does not have vision.
>>
Reminder that you shouldn't use abliterations. Just don't be lazy and properly prompt with the BASED models.
>>
>>107847320
fateto
>>
>>107847425
>merge
You should have started there, retard. Next time link to the model.
>>
>>107847320
Just got a used 3090 with 24GB VRAM.
Any proper in-depth guide to get an LLM setup with image + sound generation?
I prefer to use deepseek if possible.
And this guide is shit. How does sillytavern communicate with koboldcpp? Is there configuration needed?
>ooba/koboldcpp as your backend
>sillytavern as your frontend
>go to huggingface and download nemo 12b instruct gguf. Start with Q4.
>load into ooba/kobold
>in sillytavern, select Mistral v3 tekken context >template and instruct template
>Temp 0.8
>MinP 0.02
>Rep Pen 1.2
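The steps above leave out the actual wiring: koboldcpp serves an HTTP API (port 5001 by default) and SillyTavern only needs that URL. A minimal sketch, with example model name and flags:

```shell
# Start koboldcpp as the backend. Model filename is an example;
# --gpulayers 99 offloads everything to the 3090.
python koboldcpp.py \
    --model Mistral-Nemo-Instruct-2407-Q4_K_M.gguf \
    --contextsize 16384 --gpulayers 99 --port 5001

# In SillyTavern: API Connections > Text Completion > KoboldCpp,
# API URL: http://127.0.0.1:5001 — that URL is the whole "configuration".
```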
>>
>>107847458
>Just got a used 3090 with 24GB VRAM
Cool.
>Any proper in depth guide to get LLM setup with image + sound generation?
SillyTavern has options for both I think, but I don't use it. Just click on buttons until something happens.
>I prefer to use deepseek if possible,
kek
>And this guide is shit
>Is there configuration needed?
Yes. It needs to know where to connect to. Just use kobold's built-in webui to see if you even like these things, until you know what you're doing.
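A quick way to confirm the backend is actually up before blaming the frontend, assuming koboldcpp's default port and its KoboldAI-style generate endpoint:

```shell
# If this returns generated text, the backend works and any remaining
# problem is frontend configuration. Port/endpoint are koboldcpp defaults.
curl http://127.0.0.1:5001/api/v1/generate \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "Hello", "max_length": 32}'
```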
>>
>>107847425
Ok so, Broken-Tutu is actually on 2506 instead of the 2501 listed.
>>
>2025
>Japanese LLMs still suck
>>
>>107847503
Mistakes in the model card always bode well for the quality of the model
>>
>>107847536
Most Japanese consumers are still using Core 2 Duo-era hardware; there's zero incentive for them to release models.
>>
>>107847552
So far it's actually doing pretty good.
>>
>>107847605
Load up regular 3.2 to cure the placebo effect
>>
Let's compare other UIs you've tried, unless you're all boomers stuck in your ways.

https://github.com/kwaroran/RisuAI
Risu is okay. I tried it because it supports the charx format, has multiple expression packs, and auto-replaces expression.png with the correct image in the pack. Nicer UI but fewer customization options. It lost my message on refresh though; silly would never do that.

https://github.com/vegu-ai/talemate
Choose-your-own-adventure style; uses agent-style step-by-step actions, and at 15 tk/s it felt like ages to get to my turn. It has a mini auto-generated memory; I didn't use it long enough to make use of it. Wasn't a big fan of the style personally.
>>
>>107847698
>at 15 tk/s felt like ages to get to my turn
That's the problem with agents. Anyone serious enough about llms already has a multigpu rig with shit t/s and will absolutely refuse to use small models. Those who aren't serious wouldn't bother with agents anyway
>>
We need significantly better hardware to do agentic tard wrangling with the current models, or better models that don't require tard wrangling. Both options are years away. It's a very depressing hobby
>>
>>107847458
you need 10 3090s in a single machine if you want to run a Q2 of deepseek.
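The "10 3090s" figure checks out as back-of-envelope arithmetic. The bits-per-weight and overhead numbers below are rough assumptions (~2.7 effective bits/weight for llama.cpp's Q2_K, a flat allowance for KV cache and compute buffers), not measurements:

```python
def weights_gib(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30

PARAMS = 671e9          # total parameters (DeepSeek-V3/R1)
Q2K_BPW = 2.7           # assumed effective bits/weight for Q2_K

weights = weights_gib(PARAMS, Q2K_BPW)   # ~211 GiB of weights
overhead = 30                            # assumed GiB for KV cache + buffers
total = weights + overhead               # ~241 GiB

print(f"total ~{total:.0f} GiB -> {total / 24:.1f} x 24 GiB 3090s")
```

So ten 24 GiB cards is right at the edge, with almost nothing to spare for context.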
>>
>>107847458
Download ollama
ollama run deepseek-r1
>>
Still GLMSEX
Still Nemo
>>
>>107847978
sex with russian alcoholic miku


