/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103164575 & >>103153308

►News
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B total and 52B active parameters: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
Get busy manufacturing your LLM made bumps miku baker faggot. Time to get /lmg/ going again. Dance monkey dance.
>>103188791
Fake bake.
>>103188894
>Fake bake
Made up term.
kurisusex
If context stops being a problem, she will be my wife. It's fitting that, in character, she is already an AI program.
And she was always a perfect /lmg/ mascot. Much better than that green haired whore without any personality. Amadeus Kurisu was a perfect example why you need a local model. And that is because the nigger that is running a cloud server is looking at everything you are doing and he is doing that to make your life miserable in the long run.Death to all mikuniggers.
>>103188976
>And she was always a perfect /lmg/ mascot.
Then why did nobody want her but you? And why do you endlessly seethe that nobody wanted your forced mascot, to the point that you spam your BBC collection in the thread?
Is there any model these days that's better at voice transfer than RVC2? Or has that entire area just stagnated for the past year?
I think Skeeter from Doug should be the mascot
>>103188976
>>103188936
Any good Amadeus cards?
>>103189280
the slave ship? sounds like a fun idea to make a slave trading sim
>>103189098>Then why did nobody want her but you?Because you are a faggot and a retard who didn't play the game obviously. Kill yourself.
>>103189299
no, kurisu (version de la amadeus) from the hit visual novel series steins gate (version dos not the uno version)
>>103189304
rofl i was thinking of the Amistad
>>103188780
https://rentry.org/lmg-spoonfeed-guide
>Edit: 12 Dec 2023 00:10 UTC
Is the guide going to be updated? It's been almost a year.
>>103189327
No, we don't update shit. We just make sure Miku is in the OP, and that is it.
>>103189328
>>103189328
>>103189328
Next thread
>>103189327
>download kobold, nemo model and ST
done
>>103189347
little early there
>>103189355
it is ok. he is a little dumb.
>>103189363He's a vocaloid fag. He's Indian.
>>103189410
>everyone I don't like is one person
>>103189342
false
https://rentry.org/LocalModelsLinks
>lmg links rentry created may 2023, updated 2 weeks ago
>ml roadmap rentry created may 2023, updated 1 week ago
>lmg news rentry updates regularly
>datasets rentry created april 2023, updated october 2024
too lazy to check more, but many of the lmg rentries are regularly updated
the spoonfeed guide should at least be updated for 2024
>>103189327
Make a proposal for an update. If it's good enough, we swap.
Why is the other thread full of retarded drama?
>>103189581
Also >>103189350 has a good point. For a spoonfed quickstart, I'd just point people to koboldcpp's wiki.
>>103189590
Just ignore it. The stupid thread splitting is a recurring thing because people can't help themselves.
>>103189590>other threadMeanwhile this thread>Get busy manufacturing your LLM made bumps miku baker faggot. Time to get /lmg/ going again. Dance monkey dance.
i am mildly annoyed that there isn't an arliai rpmax 1.3 12b
>>103189743
Whether fine-tunes do anything worthwhile aside, you should probably know that v1, v2, and v3 numbering is a total scam. There is zero guarantee that a bigger number is better. It is completely random.
>>103188780
>>103189779
True. The best Rocinante is v1.1, for example. It doesn't make the model incredibly stupid, and it steers the prose in a way that's different from the official instruct, which I feel is more natural by default and in general.
>>103189743
For me it's 22B.
>>103189884
agreed
Anon, are you okay?! Noooo! They got him.
Is local AI voice gen something that's feasible with a 12GB VRAM card? I looked up whether somebody had made a voice clone of the narrator in The Dead Flag Blues (https://www.youtube.com/watch?v=XVekJTmtwqM) and I found one on voicedub.ai, but it's pay2generate, and I can't even hear a test sample for free to see if it sounds good or not.
Jesus what a nigger that other OP is.
svelk
>>103189806
Buy a fucking ad.
>>103191256
>https://github.com/RVC-Boss/GPT-SoVITS
Should run just fine on your GPU. It uses like 2GB even when running on CPU.
Posting in the real /lmg/ thread. Fuck the splitter retard.
>>103192676
how much vram does petra have?
Anyone use Letta (formerly MemGPT)? I'm trying it out with Llama 3.2 Vision 11B.
this thread is unsafe
>>103192688
It seems pretty interesting, but it's absurdly slow. Feels like it's not keeping the model in memory or something, because my tokens/s is pretty usable but responses are taking multiple minutes. I guess it's because it's swapping embeddings? I'm such a noob that I've got no idea what that entails.
>>103192687
>>94536113
>I only have 2 Gb of VRAM
>I truthfully would love to find a list of which books, websites etc the model's entrainment data actually contains, if anyone has that info.
https://desuarchive.org/g/search/text/entrainment/
>>103192998And your dick has 0mm cause you chopped it off troon.
What is VRAM?
>>103192870
I figured it out. ollama was using 22GB of memory, and swapping to do so. Of course I only noticed after >1TB was written to my SSD. Switched to Mistral 7B, and if I use Safari instead of Firefox it doesn't swap. Still very slow, doing whatever the embedding stuff is doing. Looking forward to playing with it more.
>>103193339
Virtual RAM
Does llama-server have a bitnet implementation yet?
>>103193589
The biggest bitnet model I've seen is 3.9B. There may be a 7B if I'm not mistaken. Do you really want to run that?
>>103193589
What are you gonna do with it? Current bitnets aren't actually worth running.
>>103194788
Find out for myself whether they are worth running or not? There's not much point without server integration.
>>103194798
7B is not worth running. Just get a Ministral or something and quant it.
Zuck! I kneel!
who is the king in the 8-20B range?
>>103193339
the ram of your mac mini
>>103194868
Nemo or Mistral Small.
>>103193435
How do I buy that?
>I have a decent gaming rig from ~2 years ago, trying local LLMs out
>each answer takes 3 minutes on average for Nemo 12B Q4
>OP has only software, nothing on hardware
Do you guys run the LLMs on your PCs, or do you give them their own servers? I think I'm gonna do the latter. How expensive would a rig have to be to reach ~5 sec latency for a 12B model?
https://xcancel.com/AlterKyon/status/1857304963330027925
>>103195019
VRAM speeds everything up; the more of the model you can fit in VRAM, the faster it goes. If you can't get more VRAM, then fast RAM is the next best substitute.
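To put rough numbers on that, here's a back-of-envelope sketch in Python. Every figure is an assumption, not a measurement: generation is roughly memory-bandwidth bound, since each token has to stream the full set of weights once, so bandwidth divided by model size gives a crude ceiling.
[code]
# Back-of-envelope: token generation is roughly memory-bandwidth bound,
# because every generated token streams all model weights once.
# All numbers below are rough assumptions, not benchmarks.

def t_s_upper_bound(model_gb: float, bandwidth_gb_s: float) -> float:
    """Crude upper bound on generation speed: bandwidth / model size."""
    return bandwidth_gb_s / model_gb

model_gb = 7.0    # ~12B params at Q4, very rough
vram_bw = 576.0   # RX 6950 XT spec-sheet bandwidth, GB/s
ram_bw = 50.0     # typical dual-channel DDR4, GB/s, approximate

print(f"all in VRAM: <= ~{t_s_upper_bound(model_gb, vram_bw):.0f} T/s")
print(f"all in RAM:  <= ~{t_s_upper_bound(model_gb, ram_bw):.0f} T/s")
[/code]
Real throughput lands well under the ceiling, but the order-of-magnitude gap between those two lines is why everyone tells you to fit the whole model in VRAM.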
>>103195019
>3 minutes on average
Useless number. Speak in tokens per second, and post your specs. Even an 8GB GPU should do fine for a 12B. If that's what you have, and if you're actually running on the GPU, that's as good as it's gonna be.
The bar for "decent" is much higher around here.
>>103195019wait for the RTX 5090
>>103195093
>Useless number. Speak in tokens per second.
21.50 T/s
>Even an 8GB gpu should do fine for 12b. If that's what you have,
I have a Radeon 6950 XT with 16GB VRAM.
>and if you're actually running on gpu, that's as good as it's gonna be.
So it's possible I may have fucked something up. Thanks, I'll double check.
>>103195163
That's token generation, I assume. At 21.5 T/s, a 3-minute response is ~3870 tokens, so I think that's about as well as you can do on AMD. Just make sure you're offloading all the layers.
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
No benchmarks for 12B or AMD cards, but it'll give you a point of reference. AMD (HIP or Vulkan) doesn't run as fast as CUDA. Maybe there are other benchmarks for AMD.
You can set up streaming if you're using llama.cpp or koboldcpp (I don't know about other inference programs). It'll show the response as it's generated. It won't be any faster, but it'll give you something to do in the meantime.
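If you want to sanity-check the offload yourself, here's a minimal sketch using llama-cpp-python (my pick for illustration, not something the thread prescribes; assumes a GPU-enabled build such as HIP/ROCm or Vulkan on AMD, and the model filename is a placeholder). verbose=True makes the startup log report how many layers actually landed on the GPU, and streaming lets you time T/s directly:
[code]
# Minimal sketch: force full offload and measure T/s with llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="nemo-12b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,   # -1 = offload every layer; check the startup log
    n_ctx=8192,
    verbose=True,      # log reports how many layers were assigned to the GPU
)

start = time.perf_counter()
n_tokens = 0
for chunk in llm("Describe a quiet lab in Akihabara.",
                 max_tokens=256, stream=True):
    # stream tokens as they're generated
    print(chunk["choices"][0]["text"], end="", flush=True)
    n_tokens += 1
elapsed = time.perf_counter() - start
print(f"\n~{n_tokens / elapsed:.1f} T/s")
[/code]
If the log shows fewer layers on the GPU than the model has, that's where the three minutes are going.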
>>103195163
I was gonna say what the other anon said, but food for thought about the streaming thing: average human reading speed is ~4 to 7 tk/s.
>>103195019
I use 4x24GB GPUs. You can set that up locally with a separate PC.
A dead general DOESN'T need two threads.
>>103195404
Tell that to the other OP, who makes a new thread when there is one already.
Stupid thread. Stupid thread-splitting schizo
Mistral Small 22B Q8 or Nemo 12B FP16?
Why, and what 'tune?
>>103196799
lurk more
>>103196822
>>103196822
>>103196822
New Thread
>>103196799
>fp16
>>103196831
filthy spammer.
>>103195268
And humans only see at 24fps, but most of us skim 90% of the gen, not stare intently at every token.
Maybe for RP stuff it's good enough, I guess.
Why ask any questions when you can do it yourself? Why are you afraid of wasting 5 minutes? These threads should stop being made.
>>103188780
Sexrisu
a thread died for this
>>103198769
So true, >>103196822 killed a thread.
best <22b model for erp?