/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103286673 & >>103278810

►News
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
>(11/21) Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103286673

--Paper: Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers:
>103297845 >103297860 >103297890
--Papers:
>103296533 >103297733 >103297788 >103297918
--Testing and comparing LLMs, abliteration, and imatrix:
>103293087 >103293117 >103293133 >103293276 >103293314 >103293388 >103294589 >103294729 >103294759 >103294852
--Anon seeks to replace Claude 2 with a local model:
>103297270 >103297321 >103297441 >103297490 >103297520 >103297615
--Anon releases unofficial SMT implementation with PEFT version:
>103296930 >103297151 >103297268
--Merging safetensors files and quantization discussion:
>103288602 >103288654 >103291928 >103292005 >103292180
--LTX-Video model discussion and testing:
>103288336 >103288358 >103293709 >103293808 >103293832 >103293833 >103293979 >103294017 >103294054 >103294101
--Best practices for creating character definitions in koboldAI:
>103296813 >103297047 >103297081 >103297216
--Anons share non-coom uses for the model, including art and programming:
>103286774 >103286788 >103288316 >103286822 >103286831 >103286978 >103286998 >103287002 >103287586
--Card formatting debate and character writing discussion:
>103295022 >103295128 >103295138 >103295179 >103295271 >103295277 >103295250 >103295290 >103295338 >103295472
--Kernel update has no effect on CPU inference performance:
>103287570
--Athene-V2-Chat-72B open model performance and implications:
>103293224 >103293513 >103294469 >103294670
--Anon shares comic-translate app for automatic comic translations:
>103290835
--Anon shares AmoralQA-v2 dataset and discusses its usage in models:
>103287899 >103288215 >103288314
--Miku (free space):
>103286754 >103287503 >103289721 >103290110 >103292155 >103292194 >103292482 >103292570 >103294256 >103294336 >103295577 >103296695

►Recent Highlight Posts from the Previous Thread: >>103286678

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>103298523nigga what
>>103298447
Careful, you're gonna set off the MagnumV4 72B schizo. He won't countenance any praise for a larger model.
So with the new Behemoth, what the fuck am I supposed to do with the instruct format? I didn't quite understand the instructions for the prompt format Drummer left.
Zzzz
:-)
ahhhh
:o
>>103298520
>>103298523
>>103298712
>>103298713
>>103298717
>>103298723
Large language models?
>>103298738
You're in the wrong hood
Which LLM is good if I want a buddy for programming? Which LLM is good if I need someone to reference and improve my report writing? I tend to write a lot of things in the passive voice, which is annoying.
>>103298742
these losers just masturbate.
>>103298770
Anon, none of these questions are genuine.
>>103298742
The last qwen-32B
Meow! :3
>>103298742
Qwen2.5 Coder 32B Instruct
>24GB VRAM
>Q4_K_L
>16000 context length
It's surprisingly good. Not as good as the SOTA models, but better than literally anything else local. Starts to derp out a bit at longer contexts, like all other models. I just use it in ST for now but plan on hooking it up to aider whenever I get ollama running.
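If anyone wants to poke a local coder model from a script before wiring up aider: llama-server, ollama, and ooba all expose an OpenAI-compatible /v1/chat/completions route, so a stdlib-only client is enough. Minimal sketch; the port and model name below are assumptions that vary per backend:

```python
import json
import urllib.request

# Assumed local endpoint; the default port differs per backend
# (llama-server: 8080, ollama: 11434, ooba: 5000).
BASE_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(prompt: str,
                  model: str = "qwen2.5-coder-32b-instruct",
                  max_tokens: int = 1024) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    body = json.dumps({
        "model": model,  # ollama expects its own tag here, e.g. qwen2.5-coder:32b
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature suits code generation
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL, data=body, headers={"Content-Type": "application/json"})

# To actually send it (server must be running):
#   with urllib.request.urlopen(build_request("write fizzbuzz")) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```

Same request shape works for all three backends, which is also all aider needs under the hood.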
>>103299050
>>103299126
For both programming and writing feedback? Ideally it'll be trained well enough that I can keep everything isolated on my machine.
>>103299126
>aider whenever I get ollama running.
the cringe... it hurts...
Is llama.cpp still being developed? Anyone know what to use to run models that aren't hopelessly out of date?
>>103299238
which fucking repo are you on goddamn
>>103299176
Brother I know. I primarily use ooba and have been resisting ollama. Unfortunately aider doesn't work well with ooba.
>>103299167
Just coding.
>>103299238
No, it's over.
>>103298447
No, it's largely the same model. The meta is still Qwen2.5. Especially if you have to run Large at 2 bits.
>>103299393
Fuck :(
>>103299441
After a great 16k-token interactive learning session with the new Largestral, I couldn't disagree more. To say it's just smarter doesn't really reflect how it's improved... If it were a person, I would say it's much better "put together" and on the ball than the previous one. It also doesn't suffer from nearly the repetition problem Qwen2.5 has, which is the main thing that keeps me from using that model full time. I really wish Qwen were better, because I love the extra speed.
>>103299616
Can you show me? I have 96GB of VRAM to run it at 4 bits and it doesn't even justify occupying disk space on my computer.
>>103299616Lying faggot nigger. And no, posting someone else's nvidia-smi screenshot from the archives won't mean anything either.
>>103298557
lol here he is >>103299441
Can I power a Tesla with a spare CPU cable or do I have to use picrel? It fits and the pinout is the same and it says on the internet it's the same connector and redditors say they use it, but I want to hear it from an anon...
>>103299659
Maybe next time try releasing a model that's trained on at least 18 trillion tokens like Qwen, Arthur. You didn't even show any evals for the new model; you don't believe in your own creation, and you have to resort to spamming 4chan with shills... Is this the best your company can do?
>>103299616
The logs aren't in English, and I don't really want to post them anyway. I'm running it at Q8, but I don't know if that makes a substantial enough difference in quality to make or break it vs Qwen. It's just my gut feel, really, but I thought I'd put my experience out there.
>>103298520
is there a simple way to make tavern cards from a cai character?
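In case nobody answers: a Tavern card is just a PNG with the character JSON base64-encoded into a tEXt chunk keyed "chara", so you can build one with nothing but the stdlib and paste the c.ai description/definition into the JSON yourself. Sketch below; the field names are illustrative, check the chara_card_v2 spec for the full schema:

```python
import base64
import json
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    # PNG chunk layout: 4-byte big-endian length, 4-byte type,
    # payload, then CRC32 computed over type + payload.
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_card_png(card: dict) -> bytes:
    """Embed a character card dict into a minimal 1x1 grayscale PNG."""
    ihdr = png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    # One scanline: filter byte 0 + a single mid-gray sample.
    idat = png_chunk(b"IDAT", zlib.compress(b"\x00\x80"))
    payload = base64.b64encode(json.dumps(card).encode("utf-8"))
    text = png_chunk(b"tEXt", b"chara\x00" + payload)  # the key ST looks for
    return b"\x89PNG\r\n\x1a\n" + ihdr + text + idat + png_chunk(b"IEND", b"")

# Illustrative v2-style card; fill these from the cai character page.
card = {"spec": "chara_card_v2", "spec_version": "2.0",
        "data": {"name": "Example", "description": "...",
                 "personality": "...", "first_mes": "Hello."}}

with open("card.png", "wb") as f:
    f.write(make_card_png(card))
```

The result should import into ST like any other card; swap the 1x1 placeholder pixel for a real avatar if you care how it looks.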