/g/ - Technology



/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103164575 & >>103153308

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japanese-trained LLM: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B total and 52B active parameters: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Get busy manufacturing your LLM made bumps miku baker faggot. Time to get /lmg/ going again. Dance monkey dance.
>>
>>103188791
Fake bake.
>>
>>103188894
>Fake bake
Made up term.
>>
kurisusex
>>
File: maxresdefault.jpg (101 KB, 1280x720)
If context stops being a problem, she will be my wife. It is so perfect that in character she is already an AI program.
>>
And she was always a perfect /lmg/ mascot. Much better than that green haired whore without any personality. Amadeus Kurisu was a perfect example of why you need a local model. And that is because the nigger that is running a cloud server is looking at everything you are doing, and he is doing that to make your life miserable in the long run.

Death to all mikuniggers.
>>
>>103188976
>And she was always a perfect /lmg/ mascot.
Then why did nobody want her but you? And why do you endlessly seethe that nobody wanted your forced mascot, to the point that you spam your BBC collection in the thread?
>>
Is there any model these days that's better at voice transfer than RVC2? Or has that entire area just stagnated for the past year?
>>
I think Skeeter from Doug should be the mascot
>>
>>103188976
>>103188936
Any good amadeus cards?
>>
>>103189280
the slave ship? sounds like a fun idea to make a slave trading sim
>>
>>103189098
>Then why did nobody want her but you?
Because you are a faggot and a retard who didn't play the game obviously. Kill yourself.
>>
>>103189299
no, kurisu (version de la amadeus) from the hit visual novel series steins gate (version dos not the uno version)
>>
>>103189304
rofl i was thinking of the Amistad
>>
>>103188780
https://rentry.org/lmg-spoonfeed-guide
>Edit: 12 Dec 2023 00:10 UTC
Is the guide going to be updated? It's almost been a year.
>>
>>103189327
No we don't update shit. We just make sure miku is in the OP and that is it.
>>
File: miku laugh.png (437 KB, 639x653)
>>103189328
>>103189328
>>103189328
Next thread
>>
>>103189327
>download kobold, nemo model and st
done
>>
>>103189347
little early there
>>
>>103189355
it is ok. he is a little dumb.
>>
>>103189363
He's a vocaloid fag. He's Indian.
>>
>>103189410
>everyone I don't like is one person
>>
File: 1726211361426201.jpg (127 KB, 890x930)
>>103189342
false
https://rentry.org/LocalModelsLinks
>lmg links rentry created may 2023, updated 2 weeks ago
>ml roadmap rentry created may 2023, updated 1 week ago
>lmg news rentry updates regularly
>datasets rentry created april 2023, updated october 2024
too lazy to check more but many of the lmg rentries are regularly updated
the spoonfeed guide should at least be updated for 2024
>>
>>103189327
Make a proposal for an update.
If it's good enough, we swap.
>>
Why is the other thread full of retarded drama?
>>
>>103189581
Also >>103189350 has a good point.
For a spoonfed quickstart, I'd just point people to the koboldcpp's wiki.

>>103189590
Just ignore it. The stupid thread splitting is a recurring thing because people can't help themselves.
>>
>>103189590
>other thread
Meanwhile this thread
>Get busy manufacturing your LLM made bumps miku baker faggot. Time to get /lmg/ going again. Dance monkey dance.
>>
i am mildly annoyed that there isn't an arliai rpmax 1.3 12b
>>
>>103189743
Fine-tunes doing anything worthwhile aside, you should probably know that v1, v2, and v3 numbering is a total scam. There is zero guarantee that a bigger number is better. It is completely random.
>>
>>103188780
>>
>>103189779
True.
The best Rocinante is v1.1 for example.
It doesn't make the model incredibly stupid, and it steers the prose in a way that's different from the official instruct, which I feel is more natural by default and in general.
>>
>>103189743
For me it's 22B.
>>
File: ComfyUI_00052_.png (1.2 MB, 1024x1024)
>>103189884
agreed
>>
Anon, are you okay?! Noooo! They got him.
>>
Is local AI voice gen something that's feasible with a 12GB VRAM card? I looked up whether somebody had made a voice clone of the narrator in The Dead Flag Blues (https://www.youtube.com/watch?v=XVekJTmtwqM) and I found one on voicedub.ai, but it's pay2generate, and I can't even hear a test sample for free to see if it sounds good or not.
>>
Jesus what a nigger that other OP is.
>>
svelk
>>
>>103189806
Buy a fucking ad.
>>
>>103191256
>https://github.com/RVC-Boss/GPT-SoVITS
Should run just fine on your GPU. It uses like 2GB even on CPU.
>>
Posting in the real /lmg/ thread. Fuck the splitter retard.
>>
>>103192676
how much vram does petra have?
>>
anyone use Letta (formerly MemGPT)? I'm trying it out with 3.2 Vision 11B
>>
this thread is unsafe
>>
>>103192688
it seems pretty interesting, but it's absurdly slow.
feels like it's not keeping the model in memory or something because my token/s is pretty usable but responses are taking multiple minutes. I guess it's because it's swapping embeddings? I'm such a noob so I've got no idea what that entails
>>
>>103192687
>>94536113
>I only have 2 Gb of VRAM
>I truthfully would love to find a list of which books, websites etc the model's entrainment data actually contains, if anyone has that info.
https://desuarchive.org/g/search/text/entrainment/
>>
>>103192998
And your dick has 0mm cause you chopped it off troon.
>>
What is VRAM?
>>
File: GW3SQxoW0AAZI-E.jpg (1.21 MB, 1491x2048)
>>103192870
I figured it out. ollama was using 22GB of memory, and swapping to do so. of course I only noticed after >1TB was written to my SSD.
switched to Mistral 7B and if I use Safari instead of Firefox it doesn't swap. still very slow, doing whatever the embedding stuff is doing.

looking forward to playing with it more
>>
>>103193339
Virtual RAM
>>
Does llama server have bitnet implementation yet?
>>
>>103193589
The biggest bitnet model i've seen is 3.9B. There may be a 7B if i'm not mistaken. Do you really want to run that?
>>
>>103193589
What are you gonna do with it? Current bitnets aren't anything actually worth running.
>>
>>103194788
Find out for myself whether they are worth running or not? There's not much point without server integration.
>>
>>103194798
7B is not worth running. Just get a ministral or something and quant it.
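For anyone wondering what bitnet actually is: the whole idea is ternary weights, so every weight is -1, 0, or +1 with a single scale factor per tensor. A toy sketch of the absmean quantization step as I understand it (my own illustration, not the reference implementation):

```python
def absmean_quantize(weights):
    # BitNet b1.58-style ternary quantization (sketch):
    # the scale is the mean absolute weight; each weight is then
    # scaled, rounded, and clipped into {-1, 0, +1}.
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [min(1, max(-1, round(w / scale))) for w in weights]
    return quantized, scale

q, s = absmean_quantize([0.9, -0.8, 0.05, 0.0])
# every quantized weight is ternary, hence the ~1.58 bits/weight figure
assert all(v in (-1, 0, 1) for v in q)
```

The memory win is real (log2(3) ≈ 1.58 bits per weight instead of 16), but it only applies to models trained that way from scratch, which is why the available sizes are so small.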
>>
Zuck! I kneel!
>>
who is the king in the 8-20B range?
>>
>>103193339
the ram of your mac mini
>>
>>103194868
Nemo or mistral small.
>>
>>103193435
How do I buy that?
>>
>I have a decent gaming rig from ~2 years ago, trying local llms out
>each answer takes 3 minutes on average for nemo 12B q4
>OP has only software, nothing on hardware
Do you guys run the LLMs on your PCs, or do you build them their own servers? I think I'm gonna do the latter. How expensive would a rig have to be to reach ~5 sec latency for a 12B model?
>>
https://xcancel.com/AlterKyon/status/1857304963330027925
>>
>>103195019
VRAM speeds everything up; the more VRAM, the faster it goes. If you can't get more VRAM, then RAM is the next best substitute.
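A very rough back-of-the-envelope fit check (my own rule of thumb, not the VRAM calculator linked in the OP): weight memory is parameter count times bits per weight, and the KV cache plus activations add a context-dependent overhead on top.

```python
def estimate_vram_gb(n_params_billion, bits_per_weight, overhead_gb=1.5):
    # Weights only: params * bits / 8 gives gigabytes when params
    # are counted in billions. overhead_gb is a flat guess covering
    # the KV cache and activations, which really scale with context.
    return n_params_billion * bits_per_weight / 8 + overhead_gb

# A 12B model at ~4.5 bits/weight (a Q4_K_M-ish quant) comes out
# around 8 GB, so it should fit on a 12GB card with room to spare.
fits = estimate_vram_gb(12, 4.5) < 12
```

If the estimate exceeds your VRAM, the inference engine spills layers to system RAM and generation slows down hard, which is the behavior anons complain about below.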
>>
File: cudadev.jpg (1.96 MB, 4000x3000)
>>103195019
>3 minutes on average
Useless number. Speak in tokens per second. And post your specs. Even an 8GB gpu should do fine for 12b. If that's what you have, and if you're actually running on gpu, that's as good as it's gonna be.
The bar for "decent" is much higher around here.
>>
>>103195019
wait for the RTX 5090
>>
File: 1686850829560715.jpg (88 KB, 758x748)
>>103195093
>Useless number. Speak in tokens per second.
21.50T/s
>Even an 8GB gpu should do fine for 12b. If that's what you have,
I have Radeon 6950XT with 16gb VRAM
>and if you're actually running on gpu, that's as good as it's gonna be.
So it's possible I may have fucked something up. Thanks, I'll double check.
>>
>>103195163
That's token generation i assume. In 3 minutes you're getting ~3870 token responses. I think that's as well as you can do on AMD. Just make sure you're offloading all the layers.
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
No benchmarks for 12b, or AMD cards, but it'll give you a point of reference. AMD (HIP or Vulkan) doesn't run as fast as CUDA. Maybe there are other benchmarks for AMD.

You can set up streaming if you're using llama.cpp or koboldcpp (i don't know about other inference programs). It'll show the response as it's generated. It won't be any faster, but it'll give you something to do in the meantime.
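The arithmetic here is just speed times time; a trivial helper for going between wall-clock waits and tokens/sec (illustrative only):

```python
def tokens_generated(tok_per_s, seconds):
    # How many tokens a given generation speed yields over a wait.
    return tok_per_s * seconds

def wait_seconds(n_tokens, tok_per_s):
    # Wall-clock wait for a response of a given length.
    return n_tokens / tok_per_s

# At the measured 21.5 tok/s, a 3-minute generation is ~3870 tokens;
# a more typical ~400-token RP reply would take around 19 seconds.
```

So the 3-minute waits mostly mean the model is writing very long responses, not that the card is slow; capping max response length helps as much as streaming does.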
>>
>>103195163
i was gonna say what the other anon said
but food for thought about the streaming thing:
average human reading speed is ~4 to 7 tk/s
>>
>>103195019
I use 4x24GB GPUs. You can set that up locally with a separate PC.
>>
A dead general DOESN'T need two threads.
>>
>>103195404
Tell that to the other OP who makes a new thread when there is one already.
>>
File: ComfyUI_00850_.png (1.1 MB, 1024x1024)
Stupid thread. Stupid thread-splitting schizo
>>
small 22b q8 or nemo12b fp16
why and what 'tune
>>
>>103196799
lurk more
>>
File: media_GTP7BCgaYAUUZa2.jpg (402 KB, 1826x1817)
>>103196822
>>103196822
>>103196822
New Thread
>>
File: photo.jpg (221 KB, 2000x1332)
>>103196799
>fp16
>>
>>103196831
filthy spammer.
>>
>>103195268
and humans only see at 24fps, but most of us skim 90% of the gen rather than stare intently at every token

maybe for RP stuff it's good enough ig
>>
why ask any questions when you can do it yourself? why are you afraid of wasting 5 minutes? these threads should stop being made
>>
File: 1723709906333891.jpg (520 KB, 1726x1726)
>>103188780
Sexrisu
>>
a thread died for this
>>
>>103198769
So true >>103196822 killed a thread.
>>
best <22b model for erp?


