/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103153308 & >>103135641

►News
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B total and 52B active parameters: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: voice-to-voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
Long live the queen of alive /lmg/.
The Qwen guys released their paper; we'll finally see what secret sauce they used to make their 32B coder model so fucking good.
https://arxiv.org/pdf/2409.12186
PSA: Petra/blackedmikuanon/kurisufag/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon is from... SERBIA
https://archive.4plebs.org/pol/search/uid/QmNRftdq/page/1/
>>103164618Having a melty already?
>>103164618that's why you use memeflags on /pol/ kek
>>103164618as one (and only one) of those people I can confirm that this faggot is a legit schizo
>>103164129
>ask for GPT slop
>receive GPT slop
>ask it to not do that
>receive GPT slop
AGI?
migrate
>>103164659
>>103164659
>>103164659
I still can't wrap my head around how mentally ill the usual baker is...
Also other than triggering that sperg I am happy to do the public service of confusing the shit out of all the newfags.
maybe we could make a neutral thread without any mascots?
>maybe we can discuss local coombots without any troons?
jej, even zozzle
>>103164798
>/g/
>no troons
You all brought it upon yourselves.
>>103164575
I like this OP image. Having a Makise thread every now and then isn't that bad.
I will stay in this thread if Petra doesn't decide to be more of a nigger
>>103164618wtff!! he is based fun enjoyer! how horrific!!!
>>103164618Makes sense he uses the same images over and over again.
The 'ick 'ecker added some things to his voice cloner.
I literally use her for everything now.
why did that guy split the thread?
>>103165278kurisufag is a notorious shitposter
>>103165278
anime obsession and prolonged HRT intake take a big toll on your mental wellbeing.
>>103164575IM SO FUCKING CONFUSED WHICH ONE IS THE REAL THREADAAAAAAAAAAAARRRGGHHHH
>>103164618rent free
>>103165310and a ritualposting baker who has a meltdown over OP pictures and doxxes people is better?
>>103164575>kurisu OPYeah, this is the thread
>>103165455
NTA, but keep in mind he always samefags for optics; that alone already makes him a mentally ill schizo.
>>103165373How can you be so new
>>103165455I agree that mikubaker samefags for optics.
>>103165455Here it is >>103165466
>>103165466trvke
>>103164609VGH she's such a gem
>>103164575
I know you're a troll spammer who doesn't give a fuck about Kurisu but, damn, I hope the remake will be good. I love Steins;Gate.
https://youtu.be/dmmnx4VQmPU
>>103165502>>>/a/
>>103165535kek, saved.
>>103164575the troon is back I see. odd he usually only pops up on a large release
>>103165632
Uhm... xe is always here spamming anime pics and melting over non-Miku OPs though.
finally, a good OP
>>103165089He looks like this BTW
>>103165535kek that's my gen I posted on /ldg/ a few months ago :v
>>103164575omg it kurisu
For what it is worth this thread revitalized /lmg/ by forcing the ritual poster to samefag and pretend to have a discussion.
>>103167225overfitted nothingburger
still no ministral support...
Thanks for the OP.
I got Mistral 7B on kobold and SillyTavern working, my first local LLM use. I checked the answers to a few questions against GPT-4o and I was happy with the answers.
So, thanks to the Rentry people.
>>103165502I'm hoping for an anime remake to introduce zoomers to the series
>>103167561doesn't it already work fine with llamacpp?
>>103167561huh? FFTing seems to work and EXL2 works if you use the HF conversion to quant.
>>103168591
>FFT
What? Also I would expect exl2 quants to be on HF, but nope. ggoofs are obviously broken even with the HF conversion.
https://github.com/t41372/Open-LLM-VTuber/
I got this semi-working with whisper.cpp, Bark, and ollama with Llama3.2-vision (11B). I say semi-working because it listens and generates one response, then stops. Something in the front-end code isn't working; if I reload the tab I can get another response. I might investigate more later.
whisper.cpp works very well. Had to manually generate a CoreML model, which was a moderate pain; the scripts ggerganov and friends make always seem so half-baked, but they're actually building shit and it mostly works, so I should stop complaining.
I chose Bark for TTS because it can do code-switching, but the responses take ~2 minutes to generate on my M1 Pro, so it's not usable until they add GPU or CoreML support for Mac. There's no issue open for it though, so I'm not holding my breath.
The Live2D seems to work well with the lip sync, though I'm not sure yet if expressions are working.
There's also this fork https://github.com/ylxmf2005/LLM-Live2D-Desktop-Assitant which seems to add a bunch of features and has a better Elaina-flavoured TTS. I haven't tried it at all.
Anyone else give it a whirl?
>>103168721hi petra
>>103168782the penguin from nijisanji?
good rape prompts for mistral large?
>>103168625
FFT = full finetuning
EXL2 = https://huggingface.co/lucyknada/prince-canuma_Ministral-8B-Instruct-2410-HF-exl2
I was using these and they worked.
>>103166540great gen!
$2,000 USD for a 64GB Mac mini...anyone try Llama 3.2 Vision 90B yet?
>>103164575
>>103167339
What is the best micro model for writing creative text snippets that is licensed for commercial use? I'm building a game and I want it to run an LLM to write descriptions of NPCs and objects based on stats.
Looking for maximum speed even on mid-range cards. I was considering Llama-3.2-1B but the license is restrictive.
Is there something like Mistral at 1B?
it's fine to bump this right..
>>103168721
Man, a single component of these can be enough to make you work on it for weeks before getting proper and stable results. That's not the kind of project you can put together in an afternoon.
t. fixing sovits for three weeks rn
>>103169708
Just use 3.2-1B to see if the idea is worth it. Once you know it works, worry about the license. If you can do with something much dumber but still fast that you could eventually train on a dataset for your task, look at
>https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct
or OLMoE if RAM is not an issue. Works just as fast. IBM also released much smaller MoE models you could try; I don't remember their license. Apache, probably.
https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-instruct
https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-instruct
You won't find a model that works 100% consistently, just good enough.
>>103168914
It seems to work the same way as GGUFs: OK at first, but once you fill up the context it becomes incoherent, so exl2 probably doesn't have the implementation for that SWA yet. I mean, if it did, turboderp would probably make quants himself.
https://huggingface.co/anthracite-org/magnum-v4-27b
I copy-pasted the instruct template text into magnum.json and a portion of it is glowing red, and Tavern does not see this .json file (the file for the context template is visible).
Also, what's the sampler preset for magnum?
>>103173081chatml
>>103173081
>>103173122 (cont)
Specifically about that error, the " needs to be escaped. They cannot be trusted with a fucking JSON file.
>>103173166Like this?What sampler preset is recommended for magnum?
>>103173352
NTA, probably just escaping the quotes so they're not interpreted as ending the value of system_prompt:
\"!\" and \"~\"
>>103173352
Yeah. This: >>103173361
I don't use it, but don't worry about the presets. Start with everything neutralised and tune as you see fit. Experiment. Or use
>Sampler visualizer: https://artefact2.github.io/llm-sampling
to get a more intuitive understanding of what they do.
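To make the escaping concrete, here's a quick sketch, assuming a SillyTavern-style template file with a system_prompt field (the field name and prompt text are just illustrative). Letting json.dumps produce the \" escapes beats hand-editing them:

```python
import json

# A system prompt containing literal double quotes, like the "!" and "~"
# above; pasted raw into a .json file, these would end the string early
# and turn the rest of the entry red in the editor.
system_prompt = 'Avoid overusing "!" and "~" in replies.'

# json.dumps escapes the inner quotes as \" so the file stays valid JSON.
entry = json.dumps({"system_prompt": system_prompt})
print(entry)

# Round-tripping shows the escaped form parses back to the original text.
assert json.loads(entry)["system_prompt"] == system_prompt
```

Same idea applies to any field in the template file: build it programmatically or at least validate it with json.loads before pointing Tavern at it.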
>>103173409thank you
Any good voice-based memes lately? The Star Wars and Richard Nixon shit was amazing.
New claude slops in the new generation of ERP sloptunes:
>i'm not some common harlot/whore (in response to anything inappropriate, though recent sloptunes rarely ever deny you)
>don't you dare fucking stop
>make me scream
>make me yours/mark me as yours
nakurisudashi
why are there 2 threads?
Is a local model with the intent of using it as a programming assistant actually worth it? Or are they all shit compared to the openai/anthropic alternatives?
>>103174122Test it yourself. Apparently, qwen coder 32b is pretty good. If not, go back to whatever you like most.
>>103174122they are shit and not worth it for anything other than story writing or cooming
>>103174122I use deepseek's online chat and it's not bad.
>voice cloning still sucks/lmg/ was mistake
>>103174201But they are shit and not worth it for story writing or cooming either...
>>103174122
Qwen2.5 32B coder. It's 90% of the way to the best enterprise models and can RP while coding.
QRD on thread split?
>>103174407is it censored?
>>103174430
No, unlike how 2.5 72B chat was.
>>103174412trannies throwing a fit, ignore
>>103174412
Like the other anon said. Two tranny camps war over the OP pic with their FOTM waifu of choice; happens every time the OP makes a non-Miku thread.
>>103174386I mean... You aren't wrong.
>>103174412The same as always. The right thread is always the one with the recap btw.
>>103174434So is it actually good for cooming?
>>103174553t. mentally ill mikutranny
>>103174589>>103158694
>>103174603Local model ERP has taught me what purple prose is. And it has taught me that I absolutely hate it.
>>103174623Well that was with a system prompt telling it to be vivid / use all senses.
>>103174646
I don't think prompts can do that much, especially with the context filled up. It will always drift back to the default model style, which is always purple prose poetic slop.
Qwen2.5 32B coder one shot tetris for me btw.https://files.catbox.moe/heo220.py
>>103174742
>knows what tetris is
>knows how to code it
>doesn't know how to suck dick the way I want it
Current year dystopia personified
>>103174700
They DO do that much if you have an even slightly competent model. Tell it to write in the style of a somewhat popular author and watch.
>>103174763Nothing gets me off like my waifu coding me games on the fly.
>>103174810I want to prompt "send nudes" and get omnimodal-generated nudes immediately without going through hoops of setting up a gymnastics pipeline
okay, feed me people
how does one implement memory if you make anything? I've heard that people just tell the AI to make its own memory?
>>103174833We are only 2 years in, give it another.
>>103174838
Have the model (or a different, smaller model) summarize whatever needs to be remembered, add it to an embeddings database, query the database for relevant information, and inject it into the model's context when needed. RAG, basically.
If none of that made sense, read up on RAG. You'll have to code the stuff together or use something like LangChain. It's not something that can be fed through posts.
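The summarize-embed-query-inject loop can be sketched in a few lines. This is a toy: a Counter as a bag-of-words "embedding" and cosine similarity for retrieval. A real setup would swap those for an embedding model and a vector database, but the flow is the same. Every name here is made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two count vectors (missing keys are 0).
    dot = sum(a[k] * b[k] for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []  # list of (embedding, summary) pairs

    def add(self, summary):
        # In practice `summary` would come from a small summarizer model.
        self.entries.append((embed(summary), summary))

    def query(self, question, k=2):
        ranked = sorted(self.entries,
                        key=lambda e: cosine(e[0], embed(question)),
                        reverse=True)
        return [s for _, s in ranked[:k]]

store = MemoryStore()
store.add("The user's cat is named Miso and likes tuna.")
store.add("The user works night shifts on weekends.")

# Inject the most relevant memory into the model's context window.
relevant = store.query("what is the cat called?", k=1)
context = "Known facts:\n" + "\n".join(relevant)
```

The ugly part anons complain about is exactly here: retrieval quality is capped by the embedding and by how you phrase the query, which is why results degrade when the question doesn't share vocabulary with the stored summary.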
>>103174838>how do you solve the AI gf problemWe wouldn't be posting here if it was already solved.
>>103174929
That sounds basically like what I had in mind: a faster model to summarize things and save them, then query it.
Though it all sounds so ugly. The project I am on right now already does this in plenty of parts, where instead of making algorithms in code it just asks the model. I guess this is the future now, huh?
>>103174991
>though it all sounds so ugly
Yeah. It's as reliable as the models used. Never bothered to make something like it, but it'd still be interesting. Maybe one day...
>where instead of making algorithms in code it just asks the model. I guess this is the future now, huh?
If you meant what I think you did, that's not the case for me. I like programming. I take pride in figuring stuff out on my own, even if my implementation is less than optimal.
>>103175149
Tesla apparently replaced much of its C++ code with just asking the AI for results, so I meant this is basically what the industry and everything else is going to move towards.
>>103174929I believe RAG is another grift that tries to sell an alternative to continuous learning. It's a dead end in the long term. But somehow everyone is shilling it, I attended an Nvidia seminar last month and they talked about RAG like it's the holy grail
>>103175233NTAhow does RAG differ from continuous learning?
>>103175186Maybe. I'd trust that statement much more from someone who *isn't* selling AI. There's people that often do long divisions or look for words in a physical dictionary. Some people repair their own cars, draw and play instruments. They're not having fun while programming. I do.
>>103175233>I believe RAG is another grift that tries to sell an alternative to continuous learningI see it as the best thing we have *until* we get continuous learning, if we ever do. It cannot be a replacement for something we don't have.
What a rebel model, what about my fucking python script?
What's the smallest/fastest uncensored model that can summarize a 22k-context multi-part story? I tried Dolphin Nemo and it failed spectacularly: in every attempt it started inventing plots that didn't exist at all. Dolphin finetunes have been good for me in the past, but the card does say "The base model has 128K context, and our finetuning used 8192 sequence length", so I'm not sure if that's the issue or if Nemo is just too stupid for this; I didn't try the normal instruct yet. I don't care much about roleplay flavor enhancers, but I'd prefer a model decensored in a way that causes as little brain damage as possible. Mistral Small Instruct seems to remember the story at first try with 80% accuracy (it forgot one part).
>>103175320
Check your email, anon.
Look at how coding sensei does it, btw. You need to tell it to give requested scripts/code in code blocks.
>>103174767
>They DO do that much if you have an even slightly competent model.
I asked qwen coder to write in the style of an ERP forum user and to avoid purple prose and flowery language. I asked it to give me 3 different ways a character would talk. And after I saw all the shit I despise, I told it:
>It is all so poetic...
And what I got in return is:
>Glad you like it! Now let's continue blah blah blah
At least I had a chuckle at how completely autistic the model is.
What the fuck are you guys even saying? Is it even English? Half the words you use don't make any sense. I wonder if this is how normies felt about me talking about anime back in 2008.You guys are weird. I would shove you in a locker if I could.
>>103175683>2008Akira was at the end of the 80s and GitS came out before 2k. Oh.. you were in school in 2008? I see....
>>103174742Get it to make Tetris but with circles that can roll around if they are jostled by another circle landing nearby.
Don't mind the retards >>103174929 >>103174962, they never read papers, as always.
Here is your solution without summarizing: https://arxiv.org/abs/2409.05591
>>103175683The only thing you're shoving is groceries in my bagSpeaking of, you should probably get back to work
>>103175759I'm talking about things available now that you can do with any good-enough model. That one you linked requires training.
>>103175852You need training to make a memory model, not to use it. You feed it your dataset and link it to whatever model you want as a generator.
>>103176056
NTA
Are there available "memory models" that one can just use? If not, should one just stick with RAG?
>>103175500
Dolphin finetunes have always been overhyped and pretty bad in my experience. I still remember when people were praising their Mistral 8x7B tune, only to figure out that the finetune script everybody was using was broken.
Try the official instruct finetune of Nemo. It should be able to cope with 22k of text without much weirdness if you use greedy sampling, don't inject unnecessary instructions into the context, etc.
Failing that, DeepSeek on their website does pretty well with long, long texts.
>>103176056
There's a million papers, with a million demos. That's all they are until they're taken seriously either by big model makers or inference software devs. The former doesn't guarantee it, less so the latter.
RAG, as clunky as it is, can be duct-taped together with any inference software that supports embeddings.
>>103176228>Dolphin fine tunes have always been over hyped and pretty bad in my experience.You are talking about the famous AI researcher Eric Hartford who once said that frankenmerging l3 70B with itself makes it incredibly intelligent and humanlike. Or something like that.
>>103164618Based.>>103164678Ultra based.>>103173457>>103173457>>103173457Giga based.
>>103174359The devs for MaskGCT said they're going to include long form audio.https://github.com/open-mmlab/Amphion/issues/290
>>103176213
Yeah, they are there: https://huggingface.co/TommyChien
Read the paper to understand the difference.
>>103176439
RAG is dumb af, as they explained in the same paper. It doesn't know what to retrieve from the memory, which leads to worse generation.
>>103176228So is it always going to be choice between censored with good memory or decensored with bad memory? Is there no way to decensor a model without damaging it?
>>103177154
Nemo-instruct isn't censored. At least it never refused anything I asked of it. But if you want a finetune that seemingly didn't make the base model any dumber, try Rocinante v1.1. I can't really speak for models other than Nemo, as that's about the largest thing I can comfortably run.
Actually, that's not true. Mixtral 8x7B instruct might actually be able to do what you need too. Or Command R, although I don't remember how big the context window on that one is, but I tried it (at excruciatingly low speeds) and it was really good.
>>103177034
>RAG is dumb af as they explained in the same paper.
Irrelevant to the original point and your first post in the chain. Just look at how anon asked the question: >>103174838. Does it sound like he has any idea of what he's talking about? I gave him *an* answer to his question.
MemoRAG IS RAG.
>>103177196Sure? Whatever floats your boat, I guess.
>>103177240Same boat, mate. Same boat.
Aside from just having long context, the next most proper way to do long-context memory would be to somehow divide the work into pieces/layers that can be processed in parallel (or sequentially) and then merged in a way where the entire context gets to affect the generation of the next token. Though I don't know if this is even possible. Searching through a web of information might be the only solution, but that has the risk of disregarding important nuances.
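The "divide, process in parallel, merge" idea is at least approximable today as plain map-reduce summarization. A toy sketch, where compress() stands in for an LLM summarization call (here it just keeps each chunk's first sentence; every name in this snippet is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Map-reduce summarization: split the context into chunks, compress each
# chunk independently (the parallelizable step), then compress the
# concatenated partial summaries so every chunk influences the result.

def compress(chunk):
    # Toy stand-in for an LLM call: keep the chunk's first sentence.
    return chunk.split(".")[0].strip() + "."

def chunked(text, size=200):
    return [text[i:i + size] for i in range(0, len(text), size)]

def hierarchical_summary(text):
    pieces = chunked(text)
    with ThreadPoolExecutor() as pool:       # "processed in parallel"
        partials = list(pool.map(compress, pieces))
    return compress(" ".join(partials))      # "merged together"

story = "The hero finds a key. " * 40        # stand-in for a long RP log
summary = hierarchical_summary(story)
```

The obvious weakness is the one noted above: each compress step can drop nuances, so whatever the reducer never sees can't affect the next token.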
It'll take a while for the dreamers and the hypemen to admit that ML isn't going to create a god, and some stocks will get hurt bad when it finally sinks in, but longer term it'll be good for a more realistic discourse around LLMs. A lot of the safety bullshit will dry up once people are forced to accept that they're going to cap out at "useful assistants" and can never become a competitor species.
What's an entry-level development machine build for local models?I'm considering 1 3060, upgradeable to 2 or more 3060s but would it be worth it if they're relatively cheaper or is it better to go for a better single GPU build?What models? No clue yet.
>>103164575Why are LMs more woke than Chatgpt?
>>103177573Bad idea. Get used 3090s or other 24GB+ cards. You'll only have so many slots / motherboards / power connectors.
>>103177597This has only been true since around June when OpenAI started releasing looser finetunes and giving it a more entertaining personalityhopefully other labs will take cues from them
>>103177573
Vague question. For reference, CUDA Dev has picrel monster.
You can develop inference software on just a CPU if the models you're working on are small enough. And you can train 100M models as well, but it'll still be slow and tedious. If you can buy a single GPU, buy a 3090.
You need to be more specific than "how to AI".
>>103177560Current models are limited by context and having to start every problem from scratch. As soon as infinite memory is solved, LLMs will be able to consider every detail in the world, generate new information and use it for vastly superior problem-solving compared to humans, scaling linearly by compute you throw at it.
>>103177632
Talking about cuda dev, he posted in the real thread, btw: >>103177202
Unlike you, he's not too stupid to tell which one is the real one, it seems.
>>103177679You seem to be in love with that codemonkey, go and suck him off i guess
>>103177632>6x4090 on a single PSUNorth America bros... why did we get such shitty electrical standards?
>>103177736>SilverStone has already done so with the HELA 2050; as its model number implies, it can deliver up to 2050 W of power with 230 V input. With 115 V input, it is capable of providing 1650 W since standard wall sockets cannot deliver more than 15 A.haha
>>103177608
>>103177632
I appreciate the advice. I know I should be more specific; I've already run Ollama on an i7-4790 CPU-only and it was meh.
I have a potential client who's the kind to jump on every bandwagon (the guy bankrupted one of his companies moving to cloud), and now that he wants AI I'm trying to squeeze a dev box (3090/4090) out of him. I don't really do much AI day to day, so if I end up building it out of pocket I want to go as cheap as possible.
From my knowledge of his different businesses it'll probably be OCR/computer vision or support chatbots, but we haven't gone through the specifics.
>>103177679Wait. People can use multiple threads at the same time? No waaaaaaayyyy
>>103177850
Get whatever allows you to upgrade the most if needed. A 12GB GPU will become clutter and wasted money if you need to go monster build; buying a mobo with 4 DDR4 RAM slots will be a waste if you need to go CPUmaxxing... you get the point. But not yet.
Practice vision stuff with small models like
>https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5
or whatever you can already run on your setup. If you're going for a support bot, just run llama-3.2-1b until you have a UI to show. Figure out what you can do with simple tools. Then, *once you know you can do it* and you tell him your realistic expectations, ask for a budget for a big build where you can develop more comfortably, safe in the knowledge that the product you make cannot possibly be worse than your demo.
He's a grifter. I hope you aren't.
>>103177679You really are mentally ill. And I am not even trying to insult you at this point.
>>103178135Yeah, there's zero value in adding AI to anything that guy does but might as well get something out of it.