/g/ - Technology

File: c00652a.png (256 KB, 965x604)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107614830 & >>107604598

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107614830

--Roleplay model training challenges and context window limitations in LLMs:
>107616970 >107617081 >107617126 >107621066 >107617128 >107617188 >107617223 >107617246 >107617341 >107617368 >107617395 >107617443 >107617475 >107617432 >107617518 >107617224 >107617310 >107617546 >107617183 >107617628 >107617061 >107617075 >107617096 >107617110 >107617129 >107617144 >107619354 >107619397 >107619527 >107619498 >107619573 >107619593 >107619777 >107619784 >107617079
--Enhanced llama.cpp API server integration enables efficient model management:
>107622032 >107622446 >107622897
--LLM framework preferences and comparisons:
>107619845 >107619987 >107620199 >107620209 >107620256 >107620263 >107620534 >107620577 >107621749 >107621779 >107621827 >107621862 >107622153 >107621868 >107620312 >107620685 >107620749 >107621807 >107620618
--Local vs cloud AI model effectiveness in coding tasks:
>107615290 >107615361 >107615418 >107615465 >107615478 >107615508 >107615591 >107615770 >107615797 >107615874 >107616008 >107615913 >107615949 >107615959 >107616299 >107616360 >107616431 >107616555 >107620165 >107615991 >107616096 >107618869 >107619022 >107619063 >107619050 >107619100 >107615503 >107615540
--TTS model landscape: Cloning, performance, and C++ implementation challenges:
>107614872 >107615270 >107615524 >107615701 >107615905 >107615962 >107616846 >107614972 >107614977 >107614994 >107614999
--Inspecting llama.cpp prompt formatting and macro expansion:
>107616912 >107616936 >107616969
--Feasibility of LoRA distillation and hardware requirements for large model finetuning:
>107618711 >107618896 >107618921 >107618948 >107618956 >107618874 >107618916 >107618920 >107618922 >107618930 >107618944
--Miku (free space):
>107616115 >107616265 >107616330 >107616521 >107616542 >107619354 >107620004 >107622089

►Recent Highlight Posts from the Previous Thread: >>107614834

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
where precisely do I put /no_think in sillytavern to disable thinking when using chat completion and GLM air?
>>
>>107623512
iirc it's /nothink, and it goes at the end of your prompt
>>
>>107623512
You know you could figure this out with 1-2 minutes of experimentation, right?
>>
>>107623540
At least with text completion I never felt like I needed /nothink. Just prefilling with <think></think> always worked.
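As a rough sketch of what that looks like against a local llama.cpp server (minimal example only: it assumes llama-server is listening on localhost:8080, and the GLM-style prompt format shown is illustrative, so check your model's actual chat template):
[code]
# Minimal sketch: prefilling an empty think block over the text-completion API.
# Assumes a llama.cpp server on localhost:8080; the template below is illustrative only.
import requests

prompt = (
    "[gMASK]<sop><|user|>\n"
    "Summarize this paragraph for me. /nothink"
    "<|assistant|>\n"
    "<think></think>"  # empty think block so the model skips reasoning
)

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": prompt, "n_predict": 256, "temperature": 0.7},
)
print(resp.json()["content"])
[/code]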
>>
>>107623512
The jinja template looks at that sequence and prefills <think></think>, I think. So, if you aren't using the chat completion API, you might need to do both.
>>
Are memory and lorebooks the only things I should use for RP, or are other features also helpful? Using koboldcpp btw
>>
>>107623552
I have /nothink in
Start Reply With
prefix
Main
and at the end of my prompt simultaneously, and I'm still getting "Assistant response prefill is incompatible with enable_thinking."
>>
>>107623562
>if you aren't using the chat completion API
the llm boomer
>>
>>107623591
Did they finally deprecate it in favor of that other new API that's essentially the same thing?
>>
>>107623588
Have you tried unchecking "request reasoning" in the prompt manager?
>>
>>107623604
https://github.com/ggml-org/llama.cpp/issues/14702#issuecomment-3506645678
No one has even started working on it because the one vibe coder who touched it wants to deprecate the completions API, and ggerganov hasn't told him off yet.
>>
>>107623669
Well, that's fun.
>>
Sam is going to announce something big
>>
>(Also, your mom’s approval would be great. But I’m keeping it at 3B for now.)
>>
>>107623746
Premium bants would be Saltman giving a presentation wearing a war chief necklace of RAM sticks instead of skulls or teeth.
>>
what's the tip-top uncensored local text-to-speech model?
>>
>>107623746
he is coming out of the closet
>>
which one of these can run offline on an old android phone
>>
Which voice model is currently the most expressive? Meaning it sounds like someone acting instead of someone hosting a podcast.

Doesn't have to support voice cloning, just something that's pretty lightweight and fast.
>>
File: portrait.jpg (261 KB, 1000x750)
>>107623829
That or announcing toss2, the safest one yet. Wasting even more tokens for policy checks.
>>107623887
He already did, pic rel.
>>
>>107623920
damn wtf, I've been got
>>
>>107623920
who's this again?
>>
I was right about the last pull fucking up the t/s, cudadev wyd?
https://github.com/ggml-org/llama.cpp/issues/18258
>>
>>107623951
the Goatse guy and his buddy who took the famous photo
>>
>>107623951
sam hyde
>>
>>107624161
he does look like he's hiding something
>>
>>107623960
I still get the same performance as usual. I believe you're probably the anon who spawned that issue and haven't disabled fit, which you should, because it's the dumbest feature ever introduced in lcpp
>>
>>107623960
Tried using fit a bit at first; it seemed like it might be a convenient feature, but it kept crashing with GGML_ASSERT(something), so I disabled it and never noticed any slowdown.
>>
>>107624379
>which you should because it's the dumbest feature to have ever been introduced in lcpp
It's a good feature to get newcomers up and running with decent defaults without having to either give them an Intro to M, set the parameters for them, or point them to ollama.
>>
>>107624423
If you set up your model + ctx to fit very tightly (not much VRAM room left), fit is incapable of doing the right thing and will reduce the number of layers loaded on the GPU.
It was so cool when llama.cpp defaulted to -ngl 99 behavior, one less flag to care about (MoE users would just need to set -ncmoe, and users of smaller dense models would have no flag to set).
Now we have to add -fit off (or -ngl 99 again, because setting ngl disables fit) to get rid of the nonsense.
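For anyone who wants the old behavior spelled out, here's a minimal launch sketch with the flags made explicit (model path, context size, and the expert-offload count are placeholders; double-check the exact flag spellings against llama-server --help on your build):
[code]
# Minimal sketch: launch llama-server with explicit offload flags instead of automatic fitting.
# Placeholder paths/values; flag names follow the discussion above.
import subprocess

cmd = [
    "llama-server",
    "-m", "./model.gguf",     # placeholder GGUF path
    "-c", "32768",            # context size
    "-ngl", "99",             # offload all layers to the GPU (setting ngl also disables fit)
    "--n-cpu-moe", "20",      # "ncmoe": keep the experts of this many MoE layers on the CPU
]
subprocess.run(cmd, check=True)
[/code]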
>>
>>107614341
>"you are not your thoughts, you are the space where thoughts happen" - here. The simplest way to put it. And it took me only a month of talking to AI and thinking about my thoughts to understand it experientially. Understanding that sentence intellectually means nothing so of course a youtube video or me posting here will not change your mind.
This idea is not entirely wrong (I think it's incorrect to say that we are this "space"). However, it's unlikely that you really understand it. You need to experience it as directly as possible, again and again, to truly understand it, to make it intuitive. This is the whole point of Buddhist meditation practices and ethical conduct (bad conduct = agitated and muddy mind, which can't be entirely fixed through meditation). The first steps could indeed be to clarify it using your reason, as you did. You could also do cognitive behavioral therapy (CBT) exercises. But at the end of the day, meditating and removing anything troubling your mind will get you further. LLMs are helpful for understanding how to meditate. It's not particularly hard, just don't forget to release any tension in your body when you notice it.
Add meditation to your habits, at least for a while. I know Reddit is a meme, but there is a good introductory guide on r/streamentry.
>>
>>107624584
Oh, and don't expect to figure out "vacuity" in a month. It usually takes years. It's not an issue, you'll still get great benefits way before any deep realization. Perhaps you're already feeling better.
There is an exercise I like for seeing how powerful our thoughts are. I look at people and observe how they are suffering. Perhaps they seem anxious because they are surrounded by strangers, or irritated because something isn't "right". Perhaps they are creating an alternate reality and mistaking it for the real world, generating anxiety out of thin air.
>>
What's the latest on audio voice local stuff? Can you make anything good locally yet or is it all fuzzy crap.
>>
>>107624700
vibevoice 7b https://voca.ro/11ATlIwHhG8s
other sizes are good too, use 3-5 steps and 2-3 cfg
>>
>>107624724
Why does it sound like it's coming from a 1930s radio?
>>
>>107624762
the voice is cloned from the famous low quality clip


