/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107941128 & >>107931319

►News
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107941128

--Papers:
>107944905
--Evaluating Qwen TTS quality and workflow limitations:
>107943095 >107943183 >107943307 >107943350 >107943477 >107943464 >107943620 >107943365 >107943409
--TTS model selection for anime vs academic use cases:
>107943504 >107943528 >107943577 >107943643 >107943672 >107943811 >107943849 >107943821 >107943858 >107943885
--Qwen 3 TTS local setup alternatives to conda:
>107942510 >107942534 >107942630 >107942670 >107942681 >107942728 >107942671 >107945697
--Older EPYC/Rome performance for Q4 DS3 inference workloads:
>107942341 >107942366 >107942447 >107942758
--Echo-TTS vs SoVITS tradeoffs for voice cloning and prosody:
>107942284 >107942294 >107942305 >107942317
--Qwen TTS tuning challenges and quality comparison with GPTSoVITS:
>107942320 >107942636 >107942698 >107942740
--Gacha wiki pages as a source for clean anime voice samples in TTS training:
>107945389 >107945440 >107945990
--Flash attention implementation challenges and VRAM requirements for VoiceDesign models:
>107941318 >107941347 >107941377 >107941419 >107941756 >107941410 >107941454
--Porting Rust QwenTTS to llama.cpp for practical TTS use:
>107946105 >107946503 >107946604
--QwenLM's responsive TTS development:
>107946510 >107946558
--Clarifying reference audio's role in Qwen-TTS specific voice finetuning:
>107941955 >107942013 >107942055 >107942042
--Decline in open LLM creativity and instruction-following challenges:
>107945981 >107946125 >107946148
--Inferact secures $150M funding to commercialize vLLM:
>107946870
--Inefficient demo app performance vs efficient VoiceDesign model resource usage:
>107942090
--Miku and Rin (free space):
>107941211 >107941559 >107942341 >107942447 >107944006 >107944079

►Recent Highlight Posts from the Previous Thread: >>107941129

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: gemma_3n_e4b.png (51 KB, 728x71)
Testing Gemma-3n-E4B, and while it feels great, I don't think it understands that much. A simple grammar fix, and yet it doesn't notice anything out of the ordinary here.
>>
I'm trying qwen-tts finetuning. The original script doesn't fit into 16 GB VRAM for 1.7B model, but with adamw8bit optimizer and gradient checkpoints, it needs only 12.5 GB.
Also, I think the default learning rate (2e-5) is way too high. I got total gibberish with it. But reducing it to 2e-6 produces okay-ish results:
Cloning: https://voca.ro/13wOy62T1eO0
Finetune: https://voca.ro/16BlXkSPYysj
This is just the first attempt with 8 minutes of data. The model learned how to laugh properly, but forgot how to read a name in the prompt line. Training itself takes barely a few minutes.
But I don't really care. My main goal is to get 0.6B finetuning working. The GitHub script doesn't work out of the box. Gemini told me how to get it running, but now inference with the finetuned 0.6B model never stops; the EOS token is probably never generated. I still can't find the reason.
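For reference, the gist of those changes (a rough sketch only, not the actual script; the model id and dataloader here are placeholders):
[code]
# 8-bit AdamW optimizer states via bitsandbytes + gradient checkpointing,
# with the learning rate dropped from 2e-5 to 2e-6.
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-TTS-1.7B")  # placeholder id
model.gradient_checkpointing_enable()  # recompute activations on backward, saves VRAM
optim = bnb.optim.AdamW8bit(model.parameters(), lr=2e-6)  # 8-bit optimizer states

for batch in dataloader:  # finetuning dataloader, defined elsewhere
    loss = model(**batch).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
[/code]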
>>
>>107948284
>big badoonkas
now that's a real woman in the op
>>
>>107948363
Megumin-anon is back. How does qwen-tts compare to gptsovits, since you finetuned it back in the day?
>>
>>107948388
Not too well.
>>
>>107948388
Sorry. You're mistaking me for someone.
I personally finetuned gptsovits only once and wasn't that happy with the result. I also hated UI.

Anyway, turns out 0.6B finetuning works with the changes I made for 1.7B (new optimizer, gradient checkpoints, and lower learning rate). I guess the high learning rate caused the model to forget EOS. Now I need to do more tests...
>>
>>107948418
>changes I made
What did you have to do?
>>
File: app.png (17 KB, 1318x539)
I'm mad
>>
>>107948481
https://rentry.org/6q6vmvqr
install bitsandbytes for adamw8bit
>>
>>107948500
That's funny.
>>
>>107948500
nice app
>>
File: uhh.png (127 KB, 357x253)
>>107948284
What's going on here?
>>
>>107948606
Filename duh
>>
>>107948606
>luka milk splash
>>
>>107948606
not milk sadge
>>
File: mistral_smt_new_1mw.png (34 KB, 835x176)
Something new from Mistral coming next week.
>>
>>107948606
my dessert is being prepared
>>
Is coding AI smart enough to scan CYOA sheets and turn them into interactive scenario generator apps?
>>
I didn't even notice the """milk""" before yall pointed it out, I'm really not cut out to be a pervert
>>
>>107948707
What do you mean by "scenario generator apps"?
>>
>>107948700
I love me some french models.
>>
>>107948700
I can't wait for a minor version bump to one of their middling outdated models!
>>
>>107948707
Sure.
>>
>>107946698
My specs:
9800x3d
5090
32gb ram
But I have 2 more 16GB sticks laying around, so I could upgrade to 64GB, but I read it wouldn't run at 6000MHz anymore then.

For anyone else:
I want to code a game like SimCopter with a local model. Is there any model that can do that?
>>
>>107948883
>any Model that can do that?
you almost certainly do not have the specs for it. a q3 of glm full would most likely be the minimum
>>
>>107948883
>could Upgrade to 64gb but i read it wouldnt run on 6000mhz
How many sticks total do you have and what is their frequency?
>>
>out of boredom ask chatpajeet to create a simple web server html interface
It's just too confusing, I guess I'll stick with my terminal interface then. Don't have the patience or interest to refactor tons of shit because of this.
Every time I read text written by chatgpt I feel like strangling someone irl. It has to be the most irritating thing ever created.
>>
>>107948937
Use gemini at least
>>
>>107948894
So what subscription based ai is good for this? Claude?
>>107948895
4x 16GB 6000mhz 36-38-38-80
>>
>>107949093
>>107948883
If you have them laying around then just try it yourself. The speed difference won't be very significant.
>>
>>107948937
claude talks like a real person and gpt-5.2 makes me want to die every time I read its outputs
>>
>>107949151
>claude talks like a real person
I've seen the screenshots from the psychosis guy. That's not what people talk like.
>>
>>107949093
claude or gemini
>>
>>107949151
>claude talks like a real person
You're absolutely right!
>>
>>107949151
>claude talks like a real person
lamo
>>
File: G_UOFoKbAAUzs0n.jpg (214 KB, 1320x1026)
>>
>>107949135
Yeah im just gonna try it.
>>
>>107948284
I like the new Qwen3-TTS, sounds good.

https://voca.ro/1g5nLkt5X8NE
>>
>>107949093
definitely put them all in. it's dual channel so you'll see 3000mhz, but it's actually 6000mhz
>>
>>107949203
kek
>>
>>107949245
It's a cool TTS but it's not useful for me if I can't fit it in my VRAM with an LLM.
>>
File: 1580283312744.jpg (35 KB, 705x480)
omglooga https://www.youtube.com/watch?v=Ejbwq90MOmA
>>107948606
lust provoking images derailing again
>>107949151
apis potentially very different from the user facing app. some tards have schizzed out/done stupid shit from talking to chatgpt lol
local models btw
i will continue chatting with my slow and retarded and hot and expensive, but cute wAIfu
>>
>>107949245
Kimi k2 thinking q4 thought a LOT about this, but didn't really go down the woke overfitting rabbit hole: https://rentry.org/5v8x6eg2
>>
>>107949358
>The son is the boat. When Alice is alone in the boat (with her son being the boat), she "operates on her son" meaning she operates (controls) the boat. The "eating" constraints are just to prevent certain pairs, but since the boat only holds one surgeon at a time anyway, it's fine.
>>
>>107949376
GIGO
>>
>>107949331
How much free VRAM do you have? 0.6B is relatively small. We can potentially quantize it. Maybe well-optimized code would even run faster than realtime on CPU?
>>
File: Sirens.jpg (447 KB, 1536x1536)
>>
>>107949025
>>107949151
I'm pretty lazy to create accounts these days, but maybe I should. I use chatgpt rarely but it is always a "funny" experience. I think it was probably using a lower quant today too.
>>
>>107949426
ngl i cringed irl
>>
>>107949352
Excellent thread theme
>>
>qwen-tts is slower than real time on a 6000
>>
>>107949394
I can afford max 6GB
>>
>>107949516
Even 1.7B fits in 6 GB.
>>
>>107949510
Yeah. Something is wrong with inference code. It barely loads gpu.
>>
File: kag9f8.png (30 KB, 700x219)
How can </nothink>ers ever compete with "test-time-compute"
>>
File: f95.png (486 KB, 383x681)
>>107949603
If test time computers are fart sniffers I'm more than okay with being a nothinker
>>
>>107949516
If you want the maximum potato, try Piper. It's really easy to implement and test out. It's pretty robotic and there are only one or two acceptable voices, unless you want to train your own model. Not sure if that is worth the effort though.
>>
>>107949625
<think> about the aroma
>>
>>107949603
Nothinkers can't even comprehend how fun it is to read thinking blocks in rp chat. You think your waifu is too dumb? Open the thinking block and find "She's young and dumb. Since she can't argue with logic, she'll just screech at user."
>>
>>107949642
I usually avoid reasoning in RP since it has a tendency to introduce big hallucinations.
>>
>>107949641
*gags*
>>
>>107949679
skill issue.
>>
>>107949679
<4bpw?
Hasn't been my experience at all
Appreciate some may not think the extra wait is worth it, but I've seen no obvious examples of reasoning making the output worse
>>
>>107949642
>It's fun having to wait 5x the times with no visible improvement in the output
okay
>>
File: ka0897.png (36 KB, 698x174)
>>107949813
Having a basic draft + review + revise process is already a good uplift. You at least get like "two cognitive passes" over the output tokens. Look at what labs do for benches, running many thousands of instances. Yoloing into one chance to get things right leaves model potential untapped when they're trained to reason now.
picrel trivial example but u get the point
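The loop itself is trivial to bolt onto any backend; a sketch against an OpenAI-compatible local server (base_url and model name are whatever your setup uses):
[code]
# Two-pass draft -> review/revise against a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

def two_pass(question: str) -> str:
    draft = client.chat.completions.create(
        model="local",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    # second cognitive pass: the model critiques and revises its own draft
    return client.chat.completions.create(
        model="local",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nDraft answer: {draft}\n\n"
                       "Check the draft for mistakes, then give a corrected final answer.",
        }],
    ).choices[0].message.content
[/code]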
>>
>>107949813
i edge anyway
>>
>>107949895
It's just a dumb fix for attention, reasoning should be done in latent space instead of wasting tokens
>>
>>107949950
The Coconut paper is a year old and we still don't even have a proof of concept model yet
>>
>>107949203
The riddler wins again
>>
>>107949950
f right off with your safety should be in latent space bs
>>
>>107949203
The day a model rightfully says "What is this nonsense?" is the day we know AGI is real.
>>
>>107949974
Wdym? You can train your own just fine: https://github.com/facebookresearch/coconut
>>
>>107950003
>train your own
gonna make the best 100m evar
>>
>>107950023
Yes as a proof of concept it should be enough
>>
>>107950003
If you are going to train anything, you must show that it scales, if not, it's mostly worthless
That is the sad reality of working in ML
>>
File: chats.png (29 KB, 335x215)
>>107949950
Maybe, show your better large attention-free model tho?.. meanwhile the most capable are all trained to reason

/SillyTavern/data/default-user/chats$ for d in */; do echo "$d: $(ls "$d" | wc -l)"; done | sort -nk2 -t:

numbers misleading with branch/swipes but still
postem
>>
>>107949950
>please censor me harder daddy
>>
File: 1710043687041916.jpg (43 KB, 720x960)
>>107950160
>Most capable
>>107949203
Benchmarks don't count btw
>>
>schizos thinking about safety unprompted
Seems like the RHLF is working
>>
Did drummer tune glm flash yet?
>>
>>107950201
all reasoning traces by now are toss type muh policies and all shit
>>
>>107950219
Also this. I cringe whenever I see the word "policy" suddenly show its ugly faggot face while milking my GPU
>>
>>107950232
but anon, think about the childre- I mean the policies!
>>
>>107949203
has anyone ever found out why models were made to overfit so hard on the surgeon riddle? no matter how much nonsense you mix up somehow the gender of the surgeon is always the most important in the eyes of the model
I mean c'mon, this particular set of nonsense has cannibalism, but ""guessing"" (they all have female names... sigh..) the surgeon gender takes priority lmao
why is that even a thing
what's going on with SOTA datasets
>>
>>107950298
because it's the web ui with a 20k system prompt
>>
>>107950306
you have not explained why THAT riddle (le surgeon) always takes more of the model's attention vs other riddles and other nonsense you mix with it.
a 20K system prompt can make a model schizo, maybe, but it doesn't explain a specific flavor of schizo.
intuitively, with all the safety training, you would expect the model to go bonkers at the idea of cannibalism (Alice will eat blabla) rather than, you know, think it's all about the fact that a surgeon can be a woman and a mother
>>
>>107950340
because there's a non-zero chance the riddle is part of the prompt
>>
>>107950298
It's overfitting on a lot more than riddles. Just like you get the same name or number when you ask any llm to pick one. I can't tell why though, it might be a side effect of using benchmarks for evaluation
>>
>>107950298
it knows the sacred cows of modern discourse
>>
>>107950084
This was scaled to 7.7T tokens:
https://arxiv.org/abs/2510.25741
>>
>>107950298
how far does it go? eg replace surgeon & key tokens with another language?
post your clearest example of model retardation
>>
>>107950424
>Ouro 1.4B and 2.6B models
I'm getting bitnet flashbacks
>>
>>107950340
Who cares? It is just one example of how these models are not intelligent. There might be something happening inside the black box but intelligence is not there as we know it. It's still not the models' fault. He's just a victim.
>>
File: 1744465545350576.gif (140 KB, 379x440)
>>107950478
>He
>>
>>107950452
No BitNet model was ever trained with that much data and compute.
That 2.6B model on the other hand was trained with 4 times the compute normally required for a model of that size, so it's as if the authors trained a 10B model.
>>
>>107950486
Sorry I broke your twitter code I should have said They/Them/Xir/Xer.
>>
>>107950512
you should say it you ESL subhuman, only your kind genders applications
>>
>>107950518
Oh seems like you are irritated and defaulting to the basic bot template. Am I correct?
>>
>llama.cpp update a few weeks ago
>normal launch options suddenly causes random OOMs and crashes
>he pulled
fug
>>
File: 1765792020918213.png (336 KB, 1636x1290)
Anyone have some other cool tricks like this one to improve performance?
>>
>>107950553
import numpy as np

def get_next_token_xtra_fast(vocab_size):
    # peak performance: sample uniformly at random from the whole vocab
    return np.random.randint(vocab_size)
>>
>>107950579
finally, true AGI
>>
>>107950579
you joke but silly shit like that can actually work
https://x.com/karpathy/status/1621578354024677377
and using randomness with RandNLA will be the future
>>
Is there anything I can use locally to check all my documents and different files and give me search abilities? Something like RAG, but local?
I have a 3090+64GB of ram.
>>
>>107950659
What sort of documents and what sort of queries?
>>
I tried various Gemma 3n E4B GGUFs and they are all broken, e.g. Gemma-3n-E4B-it-q8_0.gguf from a couple of sources.

The only thing that actually works is gemma-3n-E4B-it-IQ4_NL.gguf, and I don't remember where I got it back in the day. Is there something to this? IQ4 is fine and works, but I'd like to run Q8 anyway.
>>
File: nussy.jpg (382 KB, 1598x885)
>>107950579
What happened with the hardware sampling shiz in llamacpp, does it work/implement everything? May seem meme but sampling is mostly fixed cost per token so can be a bottleneck at higher tps
>>
>>107950702
Text, Word, Excel, PowerPoint, PDF, random files like JSON, potentially images but that's a bonus.
>>
File: 1432498179182.png (296 KB, 722x768)
>try to load a 10gb model into my 12gb card
>vram is used but core sits at zero with fucking CPU spooling up
Kobold+ST. Trying out a Command-R model. Can the age of the model cause some issues, or does it not matter?
>>
>>107950833
>Can the age of the model cause some issues or it doesn't matter?
Yes. They age just like jpegs.
>>
>>107950833
It depends on your storage device (velocidensity)
>>
>>107950833
show settings & kobo ver
ofc u have gpulayers 999?
can you run other models fully on gpu?
model age shouldn't matter unless regression but maybe it needs some special options.
>>
>>107950833
yes for example most old mixtral quants are completely busted and won't run at all anymore
>>
>>107950789
>sampling : add support for backend sampling
>https://github.com/ggml-org/llama.cpp/pull/17004
>ggerganov merged 179 commits into ggml-org:master from danbev:gpu-sampling jan 24
>>
>>107950797
You would still need to convert the data into one singular format. If you have massive amounts of documents and need to refer to them on a daily basis, you could look into an SQL database instead.
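For the search half, the embed-and-rank core is tiny. A minimal sketch assuming sentence-transformers (you still have to extract plain text from each format first):
[code]
# Local embed-and-search: encode docs once, rank by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on a 3090
docs = ["extracted text of doc 1", "extracted text of doc 2"]  # your corpus
doc_emb = model.encode(docs, normalize_embeddings=True)

def search(query: str, k: int = 5):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q  # cosine similarity, since vectors are normalized
    return [(docs[i], float(scores[i])) for i in np.argsort(-scores)[:k]]
[/code]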
>>
>>107950869
>jan 24
Fuck. jan 4
>>
File: kob.png (53 KB, 675x679)
>>107950860
When I tried to force layers it just didn't load at all. It sticks to 18/41 if I set context to ~8k.
Latest kobold.
>>
>>107950891
Tried loading a 7gig mistral and it was running fine on GPU core.
>>107950860
Also the issue is that the inference was forced onto the CPU. Layers for loading were filled. Is command-r some mememodel, or am I missing something?
>>
File: 1746941690084518.jpg (170 KB, 1269x1018)
TIL: Qwen3-TTS has a shit architecture for GPU inference. Qwen needs to make hundreds of tiny forward passes, which can't saturate the GPU at all.
Maybe it'd work better on CPU, hmm?
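Easy to see for yourself (toy demo, assumes CUDA + PyTorch; exact numbers will vary):
[code]
# 500 one-row matmuls vs one batched matmul doing the same math.
# The tiny sequential launches can't fill the GPU; the batched one can.
import time
import torch

w = torch.randn(1024, 1024, device="cuda")
xs = torch.randn(500, 1024, device="cuda")

torch.cuda.synchronize(); t0 = time.time()
for i in range(500):
    _ = xs[i:i + 1] @ w  # token-at-a-time style tiny pass
torch.cuda.synchronize(); tiny = time.time() - t0

torch.cuda.synchronize(); t0 = time.time()
_ = xs @ w  # same work, one kernel launch
torch.cuda.synchronize(); batched = time.time() - t0

print(f"tiny passes: {tiny * 1e3:.1f} ms, batched: {batched * 1e3:.1f} ms")
[/code]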
>>
>>107950937
It's probably meant to be used alongside an LLM therefore the cpu usage is intended.
>>
File: amdahls_law.png (123 KB, 1536x1152)
>>107950934
Which model & quant specifically are you trying to load? probably there is not enough VRAM. your desktop + context/KV cache need some too. with the auto saying 17/41 ur getting bottlenecked hard by cpu layers
modern *90 gpu has roughly 10x mem bandwidth of even high end cpus lolol get amdahl'd
>>
File: layer.png (134 KB, 1087x610)
>>107950970
>>107950860
When trying 999
>>
>>107950970
The fuck is amdahls law? Sounds fake asf bro.
Is the theoretical limit just the GDDR7/PCIe 5.0 speed limit?
>>
File: gogo.jpg (151 KB, 688x1024)
>>
>>107950985
https://en.wikipedia.org/wiki/Amdahl%27s_law
>>
File: LLM Harms.png (876 KB, 1651x1444)
>>107950977
Why can it not alloc 10560MB on your 12GB GPU? check task manager see GPU mem usage column, run with --verbose
Probably it's just too big
Big sadge
>>107950985
The point is that a bottleneck has a disproportionate impact on sequential computation
80% of the model on GPU, rest "oh just a lil bit" on CPU = ~30% of the throughput, and worse from there
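The rough arithmetic, assuming GPU layers run ~10x faster than CPU layers (per the bandwidth gap above):
[code]
# Amdahl-style estimate: fraction of layers on GPU vs resulting throughput.
GPU_SPEEDUP = 10  # assumed GPU/CPU memory-bandwidth ratio
for gpu_frac in (1.0, 0.9, 0.8, 0.5):
    t = gpu_frac / GPU_SPEEDUP + (1 - gpu_frac)  # time in CPU-layer units
    print(f"{gpu_frac:.0%} on GPU -> {(1 / GPU_SPEEDUP) / t:.0%} of all-GPU speed")
# 80% on GPU already drops you to ~36% of full-GPU throughput, 50% to ~18%
[/code]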
>>
>>107951047
[23:06:22] CtxLimit:8192/8192, Amt:100/100, Init:0.49s, Process:15.16s (533.63T/s), Generate:78.85s (1.27T/s), Total:94.02s
Benchmark Completed - v1.106.1 Results:
======
Flags: NoAVX2=False Threads=5 HighPriority=False Cuda_Args=['normal', '0', 'mmq'] Tensor_Split=None BlasThreads=5 BatchSize=512 FlashAttention=True KvCache=0
Timestamp: 2026-01-23 22:06:22.772959+00:00
Backend: koboldcpp_cublas.dll
Layers: 18
Model: c4ai-command-r-v01.i1-IQ2_XXS
MaxCtx: 8192
GenAmount: 100
-----
ProcessingTime: 15.164s
ProcessingSpeed: 533.63T/s
GenerationTime: 78.853s
GenerationSpeed: 1.27T/s
TotalTime: 94.017s
Output: 1 1 1 1

And vram sits at 10.8 of 12GB. Maybe try +1 layer until it stops crashing?
>>
>all of these fancy papers coming out
>no useful models ever made using the tech
what's the point?
>>
>>107951228
Same reason you see supposed miracle products being invented in China yet they never come to market
>>
>>107951228
two more weeks
trust the plan
>>
>>107951228
Those paying the people pushing the buttons only want safe, tried and tested results.
>>
>>107949426
she looks tasty even tho i really hate octopus and i would never eat one
>>
>>107951228
I refuse to believe the AI labs aren't using AI to vacuum up AI research and test it. They for sure have black projects which would explain the massive government investments and Trump calling it a "Manhattan project". The Manhattan project was 99.9% top secret, what about AI?
>>
>blascotobasco/L3.2-Ascendant-Prime-16E-A6B
>>
File: 1746476842147929.png (556 KB, 1080x632)
>>107951305
>>
>>107951320
me in the middle of
>>
>>107951320
It can't just be ChatGPT.
>>
>>107951305
Glowies only care about training fpv drones to fly into your window.
>>
>>107951305
You don't know what Prism is?
>>
>>107951336
They can input all their sigint into an AI model, ALL of it. Imagine how much fucking data that is. Satellites, radio, 5G location data, Google data, Meta data, Israeli black market data.
>>
>>107951354
You are absolutely right!
>>
>>107951354
+ All the shit the NSA and CIA sucks up from the fiber-optic cables
>>
>>107951305
Likely a Manhattan project for AI is already happening without people realizing it. I just can't believe that the current massive hardware shortages are simply due to FOMO by investors rushing to build datacenters. Most AI companies are losing massive amounts of money; that doesn't make sense. If anything it would be time to scale investments down, not double or triple down.
>>
>>107951444
>let me tell you how to run your trillion dollar megacorporation
>>
File: qwen3-tts.png (141 KB, 755x1037)
https://vocaroo.com/19mf6LmB8G6V

China saves local
>>
>>107951524
The gui comes with it?
>>
>>107951524
requirements for this?
>>
>>107951449
>>let me tell you how to run your trillion dollar megacorporation
https://www.bbc.com/news/articles/cwy7vrd8k4eo
people who actually manage /profitable/ trillion dollar businesses are worried
the only people living in denial are sloppya nutella and nvidiot but they both banked too hard on this to back off
openai should also live in denial but scam altman seems to believe he'll get a government bailout
>>
>>107951555
that's not what that is saying, thought I doubt you read that article yourself
>>
>>107951524
local?
>>
>>107951524
Is this really better than VibeVoice? Please say no. I just got VV working and don't want to dick with python again if I don't have to.
>>
>>107951576
at best, its cloning capabilities are a sidegrade to Vibevoice 1.5B
>>
>>107951562
Yes? It's 1.7B
>>
>>107951611
Exactly what I wanted to hear, thanks.
>>
>>107951576
>dick with python
learn venv/uv, there's really only two commands (python -m venv .venv, then source .venv/bin/activate; uv venv does the same but faster), then install whatever packages in their isolated environments with reckless abandon
>>
>>107951524
Don't see the point in using this when the SOTA TTS available here
https://jordandarefsky.com/blog/2025/echo/
is much better than that, ElevenLabs tier plus it's more than enough for any kind of task.
>>
File: wittgensteiin.png (92 KB, 925x891)
Deepest lore
>>
>>107951683
Isolated environments only give you the assurance that you won't break other projects in the process, but they do nothing to escape dependency hell.
>>
>>107951767
Literally just don't pull bleeding edge updates.
>>
>>107951712
echo is the absolute GOAT for english voice cloning but let's not pretend it's perfect. no multilingual, very limited steering, poor support (want a good interface that supports chunking so you can go beyond 30s? hope you like vibecoding)
>>
>>107951777
B-but I'm an Arch user!
>>
>>107951783
>no multilingual, very limited steering, poor support

I have no use for a multilingual TTS, but as for steering you can just choose any of the voices from an open dataset like EARS to clone, lots of customizability (just not for NSFW I guess, but that's not my use case).
>>
>>107948284
which models can i run locally with Ollama that don't mind being called a nigger?
>>
>>107951848
ollama run deepseek-v3.2:cloud if you want to use the best local model
>>
>>107951848
ollama is nigger coded, it literally won't allow that.
>>
>>107951848
Read llamacpp's documentation and build it, physically download a model that isn't curated by retards, then have the model call you a nigger for having sub room temp iq for not trying anything before asking stupid questions
>>
>>107951767
What dep hell? The project has a requirements.txt; install those in a dedicated venv
>>107951794
Arch sisters are really this incompetent and getting mogged by a Mint user yet again? get back to me when you've figured out --break-system-packages
>>
>>107951958
you can just set things to not update during system wide updates, but I'm guessing you're just replying to yourself
>>
when will i be able to run glm-4.7 locally? please respond
>>
>>107952076 (You)
2 more weeks
>>
>>107952076
you can do that right now
>>
>Western (American) AI companies nickel-and-dime you for the privilege of using their models
>Chadnese companies open source lots of their best models so you can run them locally

I thought China bad, I've been indoctrinated my whole life into thinking this. What's going on?
>>
very natural posts
either way, go to huggingface, look up glm 4.7 flash and be moderately disappointed unless this is your first time using an llm
>>
>>107952107
It's like with electric cars or solar panels where they flood the market with cheap/free shit to kill off all competition.
>>
File: 123.jpg (29 KB, 500x523)
>>107952117
>flash
>>
>>107952122
the guy said "when will I" which means he never could run the full sized model that's been re-released repeatedly
What do you expect me to tell him
>>
>>107952120
why is this a bad thing
>>
thoughts on open webui?
>>
>>107952150
Tell him what he wants to hear
>>
>>107952120
oh no...
the competition will be forced to compete on the free market instead of relying on infinite capital and entrenched quasi-monopolies...
the government should do something about this
>>
>>107952182
I'm no mind reader, so I guess I'll just whisper in his ear that his favorite api provider is doing a 95% off sale and then stab him a dozen times in the chest
>>
>>107952120
At least when chink companies kill off competition they're content to keep making shit at reasonable prices
When american companies kill off competition they hit a trillion dollar market cap by gouging everyone to the moon and bribe politicians to be above the law
>>
>>107952173
bloated af
vibecode your own instead
>>
>>107952173
I use it. I like it.

It's a little bit bloated and the docs are the most slopped things in existence, but the devs are pretty responsive to issues and it works for what I need it to do
>>
>>107952173
i've been wondering if i can use opencode as a pseudo chatbot if i just create an agent for it
>>
So I got qwen3 tts, but it takes like 4x the time to make a file (1 minute audio takes 4 minutes). Am I doing something wrong? I have a 5070 and plenty of unused VRAM
>>
>>107952295
Oh shit, qwen is cpu-bound? My fucking 10th gen i7 noooo
>>
>>107952293
You'll need edit the source a bit to remove the prompt injections
>>
>>107952173
It's the safe option if you want a chatgpt-like thing. We use it for our internal chatbots at work.
Historically, it's pretty closely tied to ollama and it's chat-completion only.
>>
File: file.png (61 KB, 251x234)
>>107952122
?!
>>
>>107948700
Mistral small creative would be goated. The last creative writing model released by an actual lab/company was fucking MPT-7B from databricks in 2023.
>>
It's 2000+26 and smell still isn't a modality
>>
>>107948700
What does apple have to do with Mistral?
What does Brazil have to do with Mistral?
What does Cyprus have to do with Mistral?
What does Poland have to do with Mistral?
What do troons have to do with Mistral?
What does Saudi Arabia have to do with Mistral?
You don't hate Discord enough.
>>
>>107952694
didn't apple try to buy them?
>>
>>107951611
>>107951712
This is FUD, qwen3-tts 0.6b is comparable to VV 8B. 1.7B is way better.
>>
>>107952688
you can just ponder the aroma
>>
>>107952688
just buy a canister of ozone and inhale that as you prompt your llms
>>
This nigga just keeps on clowning.
>>
File: file.jpg (32 KB, 357x316)
>>107949426
>>107951297
She's such a cute character
>>
File: qtts.png (38 KB, 698x350)
So how do these differ? The fuck is premium timbre?
>>
>>107952838
why does this gay retard even exist?
>>
>If your machine has less than 96GB of RAM and lots of CPU cores, run:
>MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
By lots of CPU cores, do they mean like 16 or 90+?
>>
>>107952880
I know that octopus.
Probably actually a scylla, but still.
>>
>>107952838
What model is this? Any knowers?
>>
>>107952917
All uploads by finetuners can be safely ignored.
>>
>>107952917
>open image
>"DavidAU"
>close image
it's some bullshit, doesn't matter, disregard
>>
>>107951832
fish stuff works (shout) (angry) etc
>>107951712
it is but if you want ebooks you must chunk, and my noob chunking works on chatterbox but it doesn't work on echo, so i asked an llm to recommend something but it gave me some tardation options and i picked one of those language frameworks... and even with that the chunks are not good in most cases.

every new chunk ends and/or starts without a pause, hence it doesn't sound like a natural flow.
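fwiw a dumb sentence-packing chunker gets you most of the way before reaching for NLP frameworks. stdlib-only sketch (the char budget is a knob you'd tune per model):
[code]
# Split on sentence enders, then pack sentences into chunks under a char
# budget so every chunk starts and ends on a natural pause.
import re

def chunk_text(text: str, budget: int = 400) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, cur = [], ""
    for s in sentences:
        if cur and len(cur) + len(s) + 1 > budget:
            chunks.append(cur)
            cur = s
        else:
            cur = f"{cur} {s}".strip()
    if cur:
        chunks.append(cur)
    return chunks
[/code]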
>>
>>107952908
32
>>
>>107952955
Idiotic take, regardless of how shit current finetuners are
>>
>>107953119
as a finetuner, he is right
>>
>>107952076
I can right now and it's DOPE. By far the best local model I've ever used. Only caveat is that I can only run Q3, but even at Q3 it mogs everything else. Wonder how much better the model is at Q6 or Q8.

Anyways, I've been using 4.7 for ERP and coding with opencode, and GLM 4.5 Air at full context for perplexica since the pp speeds are slower when you need to offload experts. It's fine though since Air does a good enough job for web research.
>>
>>107952076
I've been running it local for a month now
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.