/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107614830 & >>107604598

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107614830

--Roleplay model training challenges and context window limitations in LLMs:
>107616970 >107617081 >107617126 >107621066 >107617128 >107617188 >107617223 >107617246 >107617341 >107617368 >107617395 >107617443 >107617475 >107617432 >107617518 >107617224 >107617310 >107617546 >107617183 >107617628 >107617061 >107617075 >107617096 >107617110 >107617129 >107617144 >107619354 >107619397 >107619527 >107619498 >107619573 >107619593 >107619777 >107619784 >107617079
--Enhanced llama.cpp API server integration enables efficient model management:
>107622032 >107622446 >107622897
--LLM framework preferences and comparisons:
>107619845 >107619987 >107620199 >107620209 >107620256 >107620263 >107620534 >107620577 >107621749 >107621779 >107621827 >107621862 >107622153 >107621868 >107620312 >107620685 >107620749 >107621807 >107620618
--Local vs cloud AI model effectiveness in coding tasks:
>107615290 >107615361 >107615418 >107615465 >107615478 >107615508 >107615591 >107615770 >107615797 >107615874 >107616008 >107615913 >107615949 >107615959 >107616299 >107616360 >107616431 >107616555 >107620165 >107615991 >107616096 >107618869 >107619022 >107619063 >107619050 >107619100 >107615503 >107615540
--TTS model landscape: Cloning, performance, and C++ implementation challenges:
>107614872 >107615270 >107615524 >107615701 >107615905 >107615962 >107616846 >107614972 >107614977 >107614994 >107614999
--Inspecting llama.cpp prompt formatting and macro expansion:
>107616912 >107616936 >107616969
--Feasibility of LoRA distillation and hardware requirements for large model finetuning:
>107618711 >107618896 >107618921 >107618948 >107618956 >107618874 >107618916 >107618920 >107618922 >107618930 >107618944
--Miku (free space):
>107616115 >107616265 >107616330 >107616521 >107616542 >107619354 >107620004 >107622089

►Recent Highlight Posts from the Previous Thread: >>107614834

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
where precisely do I put /no_think in sillytavern to disable thinking when using chat completion and GLM air?
>>107623512
iirc it's /nothink, and it goes at the end of your prompt
>>107623512
You know you could figure this out with 1-2 minutes of experimentation, right?
>>107623540
At least with text completion I never felt like I needed /nothink. Just prefilling with <think></think> always worked.
>>107623512
The jinja template looks at that sequence and prefills <think></think>, I think. So, if you aren't using the chat completion API, you might need to do both.
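For reference, a minimal sketch of both approaches against a llama.cpp server (default localhost:8080 assumed); the endpoints are llama-server's standard routes, but the GLM-style tags in the second request are illustrative, so verify them against your model's actual embedded template:
[code]
# minimal sketch, assuming llama-server is running on localhost:8080
import requests

BASE = "http://localhost:8080"

# chat completion: append /nothink and let the model's jinja chat template
# handle it (e.g. by prefilling an empty <think></think> block for you)
r = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Say hi. /nothink"}],
})
print(r.json()["choices"][0]["message"]["content"])

# text completion: no template is applied, so prefill <think></think> yourself
# (the GLM-style tags below are illustrative, check your GGUF's template)
r = requests.post(f"{BASE}/completion", json={
    "prompt": "<|user|>\nSay hi.<|assistant|>\n<think></think>",
    "n_predict": 128,
})
print(r.json()["content"])
[/code]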
Are memory and lorebooks the only things I should use for rp? Or are other features also helpful? Using koboldcpp btw
>>107623552
I have /nothink in the Start Reply With prefix, in Main, and at the end of my prompt simultaneously, and I'm still getting "Assistant response prefill is incompatible with enable_thinking."
>>107623562
>if you aren't using the chat completion API
the llm boomer
>>107623591
Did they finally deprecate it in favor of that other new API that's essentially the same thing?
>>107623588
Have you tried unchecking "request reasoning" in the prompt manager?
>>107623604
https://github.com/ggml-org/llama.cpp/issues/14702#issuecomment-3506645678
No one has even started working on it because the one vibe coder that touched it wants to deprecate the completions API and ggerganov hasn't told him off yet.
>>107623669
Well, that's fun.
Sam is going to announce something big
>(Also, your mom’s approval would be great. But I’m keeping it at 3B for now.)
>>107623746
Premium bants would be Saltman giving a presentation wearing a war chief necklace of RAM sticks instead of skulls or teeth.
what's the tip-top uncensored local text-to-speech model?
>>107623746
he is coming out of the closet
which one of these can run offline on an old android phone
Which voice model is currently the most expressive? meaning it sounds like someone acting instead of hosting a podcast.
Doesn't have to support voice cloning, just something that's pretty lightweight and fast.
>>107623829
That or announcing toss2, the safest one yet. Wasting even more tokens for policy checks.
>>107623887
He already did, pic rel.
>>107623920
damn wtf, I've been got
>>107623920
who's this again?
I was right about the last pull fucking up the t/s, cudadev wyd?
https://github.com/ggml-org/llama.cpp/issues/18258
>>107623951
the Goatse guy and his buddy who took the famous photo
>>107623951
sam hyde
>>107624161
he does look like he's hiding something
>>107623960
I still get the same performance as usual. I believe you're like the issue-spawning nibba and haven't disabled fit, which you should because it's the dumbest feature to have ever been introduced in lcpp
>>107623960
Tried using fit a bit at first, seemed like it might be a convenient feature. But it just kept crashing with GGML_ASSERT(something) and I disabled it. So never noticed any slowdown.
>>107624379
>which you should because it's the dumbest feature to have ever been introduced in lcpp
It's a good feature to get newcomers up and running with decent defaults without having to either give them an Intro to M, set the parameters for them, or point them to ollama.
>>107624423
if you set your model + ctx to fit very tightly (not much vram room left), fit is incapable of doing the right thing and will reduce the number of layers loaded on the gpu
It was so cool when llama.cpp defaulted to ngl 99 behavior, one less flag to care about (moe users would just need to set ncmoe, and users of smaller dense models would have no flag to set)
now we have to add -fit off (or -ngl 99 again, because setting ngl disables fit) to get rid of the nonsense
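For anyone following along, a minimal sketch of the launch line in question, wrapped in Python; the flag names (-fit, -ngl, --n-cpu-moe) are taken from the posts above, so double-check them against your build's llama-server --help before copying:
[code]
# minimal sketch; model path and layer counts are placeholders
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "model.gguf",
    "-ngl", "99",         # offload all layers; per the post above, setting -ngl also disables fit
    "--n-cpu-moe", "20",  # moe users: keep this many expert layers on the cpu
    # alternatively, keep the defaults and just pass "-fit", "off"
])
[/code]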
>>107614341>"you are not your thoughts, you are the space where thoughts happen" - here. The simplest way to put it. And it took me only a month of talking to AI and thinking about my thoughts to understand it experientially. Understanding that sentence intellectually means nothing so of course a youtube video or me posting here will not change your mind.This idea is not entirely wrong (I think it's incorrect to say that we are this "space"). However, it's unlikely that you really understand it. You need to experience it as directly as possible, again and again, to truly understand it, to make it intuitive. This is the whole point of Buddhist meditation practices and ethical conduct (bad conduct = agitated and muddy mind, which can't be entirely fixed through meditation). The first steps could indeed be to clarify it using your reason, as you did. You could also do cognitive behavioral therapy (CBT) exercises. But at the end of the day, meditating and removing anything troubling your mind will get you further. LLMs are helpful to understand how to meditate. It's not particularly hard, just don't forget to release any tension in your body when you notice them. Add meditation to your habits, at least for a while. I know Reddit is a meme, but there is a good introductory guide on r/streamentry.
>>107624584
Oh, and don't expect to figure out "vacuity" in a month. It usually takes years. It's not an issue, you'll still get great benefits way before any deep realization. Perhaps you're already feeling better. There is an exercise I like for seeing how powerful our thoughts are: I look at people and observe how they are suffering. Perhaps they seem anxious because they are surrounded by strangers, or irritated because something isn't "right". Perhaps they are creating an alternate reality and mistaking it for the real world, generating anxiety out of thin air.
What's the latest on local voice/audio stuff? Can you make anything good locally yet or is it all fuzzy crap?
>>107624700
vibevoice 7b https://voca.ro/11ATlIwHhG8s
other sizes are good too, use 3-5 steps and 2-3 cfg
>>107624724
Why does it sound like it's coming from a 1930s radio?
>>107624762
the voice is cloned from the famous low quality clip