/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103286673 & >>103278810

►News
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
>(11/21) Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103286673

--Paper: Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers:
>103297845 >103297860 >103297890
--Papers:
>103296533 >103297733 >103297788 >103297918
--Testing and comparing LLMs, abliteration, and imatrix:
>103293087 >103293117 >103293133 >103293276 >103293314 >103293388 >103294589 >103294729 >103294759 >103294852
--Anon seeks to replace Claude 2 with a local model:
>103297270 >103297321 >103297441 >103297490 >103297520 >103297615
--Anon releases unofficial SMT implementation with PEFT version:
>103296930 >103297151 >103297268
--Merging safetensors files and quantization discussion:
>103288602 >103288654 >103291928 >103292005 >103292180
--LTX-Video model discussion and testing:
>103288336 >103288358 >103293709 >103293808 >103293832 >103293833 >103293979 >103294017 >103294054 >103294101
--Best practices for creating character definitions in koboldAI:
>103296813 >103297047 >103297081 >103297216
--Anons share non-coom uses for the model, including art and programming:
>103286774 >103286788 >103288316 >103286822 >103286831 >103286978 >103286998 >103287002 >103287586
--Card formatting debate and character writing discussion:
>103295022 >103295128 >103295138 >103295179 >103295271 >103295277 >103295250 >103295290 >103295338 >103295472
--Kernel update has no effect on CPU inference performance:
>103287570
--Athene-V2-Chat-72B open model performance and implications:
>103293224 >103293513 >103294469 >103294670
--Anon shares comic-translate app for automatic comic translations:
>103290835
--Anon shares AmoralQA-v2 dataset and discusses its usage in models:
>103287899 >103288215 >103288314
--Miku (free space):
>103286754 >103287503 >103289721 >103290110 >103292155 >103292194 >103292482 >103292570 >103294256 >103294336 >103295577 >103296695

►Recent Highlight Posts from the Previous Thread: >>103286678

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>103298523nigga what
>>103298447
Careful, you're gonna set off the MagnumV4 72B schizo. He won't countenance any praise for a larger model.
So with the new Behemoth, what the fuck am I supposed to do with the instruct format? I didn't quite understand the instructions for the prompt format Drummer left.
Zzzz
:-)
ahhhh
:o
>>103298520
>>103298523
>>103298712
>>103298713
>>103298717
>>103298723
Large language models?
>>103298738
You're in the wrong hood
Which LLM is good if I want a buddy for programming? Which LLM is good if I need someone to reference and improve my report writing? I tend to write a lot of things in the passive voice, which is annoying.
>>103298742
these losers just masturbate.
>>103298770
Anon, none of these questions are genuine.
>>103298742
The last qwen-32B
Meow! :3
>>103298742
Qwen2.5 Coder 32B Instruct
>24GB VRAM
>Q4_K_L
>16000 context length
It's surprisingly good. Not as good as the SOTA models, but better than literally anything else local. Starts to derp out a bit at longer contexts, like all other models. I just use it in ST for now but plan on hooking it up to aider whenever I get ollama running.
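If anyone wants to poke a local coder model from a script before wiring up aider: llama-server, ollama, and ooba all expose an OpenAI-compatible /v1/chat/completions route, so a stdlib-only client is enough. Minimal sketch; the port and model name below are assumptions that vary per backend:

```python
import json
import urllib.request

# Assumed local endpoint; the default port differs per backend
# (llama-server: 8080, ollama: 11434, ooba: 5000).
BASE_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(prompt: str,
                  model: str = "qwen2.5-coder-32b-instruct",
                  max_tokens: int = 1024) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    body = json.dumps({
        "model": model,  # ollama expects its own tag here, e.g. qwen2.5-coder:32b
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature suits code generation
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL, data=body, headers={"Content-Type": "application/json"})

# To actually send it (server must be running):
#   with urllib.request.urlopen(build_request("write fizzbuzz")) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```

Same request shape works for all three backends, which is also all aider needs under the hood.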
>>103299050
>>103299126
For both programming and writing feedback? Ideally it'll be trained well enough that I can keep everything isolated on my machine.
>>103299126
>aider whenever I get ollama running.
the cringe... it hurts...
Is llama.cpp still being developed? Anyone know what to use to run models that aren't hopelessly out of date?
>>103299238
which fucking repo are you on goddamn
>>103299176
Brother I know. I primarily use ooba and have been resisting ollama. Unfortunately aider doesn't work well with ooba.
>>103299167
Just coding.
>>103299238
No, it's over.
>>103298447
No, it's largely the same model. The meta is still Qwen2.5. Especially if you have to run Large at 2 bits.
>>103299393
Fuck :(
>>103299441
After a great 16k-token interactive learning session with the new Largestral, I couldn't disagree more. To say it's just smarter doesn't really reflect how it's improved... If it were a person, I would say it's much better "put together" and on the ball than the previous one. It also doesn't suffer from nearly the repetition problem Qwen2.5 has, which is the main thing that keeps me from using that model full time. I really wish Qwen were better, because I love the extra speed.
>>103299616
Can you show me? I have 96GB of VRAM to run it at 4 bits and it doesn't even justify occupying disk space on my computer.
>>103299616Lying faggot nigger. And no, posting someone else's nvidia-smi screenshot from the archives won't mean anything either.
>>103298557
lol here he is >>103299441
Can I power a Tesla with a spare CPU cable or do I have to use picrel? It fits and the pinout is the same and it says on the internet it's the same connector and redditors say they use it, but I want to hear it from an anon...
>>103299659
Maybe next time try releasing a model that's trained on at least 18 trillion tokens like Qwen, Arthur. You didn't even show any evals for the new model; you don't believe in your own creation, and you have to resort to spamming 4chan with shills... Is this the best your company can do?
>>103299616
The logs aren't in English, and I don't really want to post them anyway. I'm running it at Q8, but I don't know if that makes a substantial enough difference in quality to make or break it vs Qwen. It's just my gut feel, really, but I thought I'd put my experience out there.
>>103298520
is there a simple way to make tavern cards from a cai character?
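In case nobody answers: a Tavern card is just a PNG with the character JSON base64-encoded into a tEXt chunk keyed "chara", so you can build one with nothing but the stdlib and paste the c.ai description/definition into the JSON yourself. Sketch below; the field names are illustrative, check the chara_card_v2 spec for the full schema:

```python
import base64
import json
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    # PNG chunk layout: 4-byte big-endian length, 4-byte type,
    # payload, then CRC32 computed over type + payload.
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_card_png(card: dict) -> bytes:
    """Embed a character card dict into a minimal 1x1 grayscale PNG."""
    ihdr = png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    # One scanline: filter byte 0 + a single mid-gray sample.
    idat = png_chunk(b"IDAT", zlib.compress(b"\x00\x80"))
    payload = base64.b64encode(json.dumps(card).encode("utf-8"))
    text = png_chunk(b"tEXt", b"chara\x00" + payload)  # the key ST looks for
    return b"\x89PNG\r\n\x1a\n" + ihdr + text + idat + png_chunk(b"IEND", b"")

# Illustrative v2-style card; fill these from the cai character page.
card = {"spec": "chara_card_v2", "spec_version": "2.0",
        "data": {"name": "Example", "description": "...",
                 "personality": "...", "first_mes": "Hello."}}

with open("card.png", "wb") as f:
    f.write(make_card_png(card))
```

The result should import into ST like any other card; swap the 1x1 placeholder pixel for a real avatar if you care how it looks.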