/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107941128 & >>107931319

►News
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107941128

--Papers:
>107944905
--Evaluating Qwen TTS quality and workflow limitations:
>107943095 >107943183 >107943307 >107943350 >107943477 >107943464 >107943620 >107943365 >107943409
--TTS model selection for anime vs academic use cases:
>107943504 >107943528 >107943577 >107943643 >107943672 >107943811 >107943849 >107943821 >107943858 >107943885
--Qwen 3 TTS local setup alternatives to conda:
>107942510 >107942534 >107942630 >107942670 >107942681 >107942728 >107942671 >107945697
--Older EPYC/Rome performance for Q4 DS3 inference workloads:
>107942341 >107942366 >107942447 >107942758
--Echo-TTS vs SoVITS tradeoffs for voice cloning and prosody:
>107942284 >107942294 >107942305 >107942317
--Qwen TTS tuning challenges and quality comparison with GPTSoVITS:
>107942320 >107942636 >107942698 >107942740
--Gacha wiki pages as a source for clean anime voice samples in TTS training:
>107945389 >107945440 >107945990
--Flash attention implementation challenges and VRAM requirements for VoiceDesign models:
>107941318 >107941347 >107941377 >107941419 >107941756 >107941410 >107941454
--Porting Rust QwenTTS to llama.cpp for practical TTS use:
>107946105 >107946503 >107946604
--QwenLM's responsive TTS development:
>107946510 >107946558
--Clarifying reference audio's role in Qwen-TTS specific voice finetuning:
>107941955 >107942013 >107942055 >107942042
--Decline in open LLM creativity and instruction-following challenges:
>107945981 >107946125 >107946148
--Inferact secures $150M funding to commercialize vLLM:
>107946870
--Inefficient demo app performance vs efficient VoiceDesign model resource usage:
>107942090
--Miku and Rin (free space):
>107941211 >107941559 >107942341 >107942447 >107944006 >107944079

►Recent Highlight Posts from the Previous Thread: >>107941129

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: gemma_3n_e4b.png (51 KB, 728x71)
Testing Gemma-3n-E4B, and while it feels great, I don't think it understands that much. A simple grammar fix, and yet it doesn't notice anything out of the ordinary here.
>>
I'm trying qwen-tts finetuning. The original script doesn't fit into 16 GB VRAM for 1.7B model, but with adamw8bit optimizer and gradient checkpoints, it needs only 12.5 GB.
Also, I think the default learning rate (2e-5) is way too high. I got total gibberish with it. But reducing it to 2e-6 produces okay-ish results:
Cloning: https://voca.ro/13wOy62T1eO0
Finetune: https://voca.ro/16BlXkSPYysj
This is just the first attempt with 8 minutes of data. The model learned how to laugh properly, but forgot how to read a name in the prompt line. Training itself takes barely a few minutes.
But I don't really care. My main goal is to get 0.6B finetuning working. The GitHub script doesn't work out of the box. Gemini told me how to get it running, but now inference with the finetuned 0.6B model never stops; the EOS token is probably never generated. I still can't find the reason.
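For reference, the gist of those changes (a rough sketch only, not the actual script; the model id and dataloader here are placeholders):
[code]
# 8-bit AdamW optimizer states via bitsandbytes + gradient checkpointing,
# with the learning rate dropped from 2e-5 to 2e-6.
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-TTS-1.7B")  # placeholder id
model.gradient_checkpointing_enable()  # recompute activations on backward, saves VRAM
optim = bnb.optim.AdamW8bit(model.parameters(), lr=2e-6)  # 8-bit optimizer states

for batch in dataloader:  # finetuning dataloader, defined elsewhere
    loss = model(**batch).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
[/code]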
>>
>>107948284
>big badoonkas
now that's a real woman in the op
>>
>>107948363
Megumin-anon is back. How does qwen-tts compare to gptsovits, since you finetuned it back in the day?
>>
>>107948388
Not too well.
>>
>>107948388
Sorry. You're mistaking me for someone.
I personally finetuned gptsovits only once and wasn't that happy with the result. I also hated UI.

Anyway, turns out 0.6B finetuning works with the changes I made for 1.7B (new optimizer, gradient checkpoints, and lower learning rate). I guess the high learning rate caused the model to forget EOS. Now I need to do more tests...
>>
>>107948418
>changes I made
What did you have to do?
>>
File: app.png (17 KB, 1318x539)
I'm mad
>>
>>107948481
https://rentry.org/6q6vmvqr
install bitsandbytes for adamw8bit
>>
>>107948500
That's funny.
>>
>>107948500
nice app
>>
File: uhh.png (127 KB, 357x253)
>>107948284
What's going on here?
>>
>>107948606
Filename duh
>>
>>107948606
>luka milk splash
>>
>>107948606
not milk sadge
>>
File: mistral_smt_new_1mw.png (34 KB, 835x176)
Something new from Mistral coming next week.
>>
>>107948606
my dessert is being prepared
>>
Is coding AI smart enough to scan CYOA sheets and turn them into interactive scenario generator apps?
>>
I didn't even notice the """milk""" before yall pointed it out, I'm really not cut out to be a pervert
>>
>>107948707
What do you mean by "scenario generator apps"?
>>
>>107948700
I love me some french models.
>>
>>107948700
I can't wait for a minor version bump to one of their middling outdated models!
>>
>>107948707
Sure.
>>
>>107946698
My specs:
9800x3d
5090
32gb ram
But I have 2 more 16GB sticks laying around, so I could upgrade to 64GB, but I read it wouldn't run at 6000MHz anymore then.

For anyone else:
I want to code a game like SimCopter with a local model. Is there any model that can do that?
>>
>>107948883
>any Model that can do that?
you almost certainly do not have the specs for it. a q3 of glm full would most likely be the minimum
>>
>>107948883
>could Upgrade to 64gb but i read it wouldnt run on 6000mhz
How many sticks total do you have and what is their frequency?
>>
>out of boredom ask chatpajeet to create a simple web server html interface
It's just too confusing, I guess I'll stick with my terminal interface then. Don't have the patience or interest to refactor tons of shit because of this.
Every time I read text written by chatgpt I feel like strangling someone irl. It has to be the most irritating thing ever created.
>>
>>107948937
Use gemini at least
>>
>>107948894
So what subscription based ai is good for this? Claude?
>>107948895
4x 16GB 6000mhz 36-38-38-80
>>
>>107949093
>>107948883
If you have them laying around then just try it yourself. The speed difference won't be very significant.
>>
>>107948937
claude talks like a real person and gpt-5.2 makes me want to die every time I read its outputs
>>
>>107949151
>claude talks like a real person
I've seen the screenshots from the psychosis guy. That's not what people talk like.
>>
>>107949093
claude or gemini
>>
>>107949151
>claude talks like a real person
You're absolutely right!
>>
>>107949151
>claude talks like a real person
lamo
>>
File: G_UOFoKbAAUzs0n.jpg (214 KB, 1320x1026)
>>
>>107949135
Yeah im just gonna try it.
>>
>>107948284
I like the new Qwen3-TTS, sounds good.

https://voca.ro/1g5nLkt5X8NE
>>
>>107949093
definitely put them all in. it's dual channel so you'll see 3000mhz, but it's actually 6000mhz
>>
>>107949203
kek
>>
>>107949245
It's a cool TTS but it's not useful for me if I can't fit it in my VRAM with an LLM.
>>
File: 1580283312744.jpg (35 KB, 705x480)
omglooga https://www.youtube.com/watch?v=Ejbwq90MOmA
>>107948606
lust provoking images derailing again
>>107949151
apis potentially very different from the user facing app. some tards have schizzed out/done stupid shit from talking to chatgpt lol
local models btw
i will continue chatting with my slow and retarded and hot and expensive, but cute wAIfu
>>
>>107949245
Kimi k2 thinking q4 thought a LOT about this, but didn't really go down the woke overfitting rabbit hole: https://rentry.org/5v8x6eg2
>>
>>107949358
>The son is the boat. When Alice is alone in the boat (with her son being the boat), she "operates on her son" meaning she operates (controls) the boat. The "eating" constraints are just to prevent certain pairs, but since the boat only holds one surgeon at a time anyway, it's fine.
>>
>>107949376
GIGO
>>
>>107949331
How much free VRAM do you have? 0.6B is relatively small. We can potentially quantize it. Maybe well-optimized code would even run faster than realtime on CPU?
>>
File: Sirens.jpg (447 KB, 1536x1536)
>>
>>107949025
>>107949151
I'm pretty lazy to create accounts these days, but maybe I should. I use chatgpt rarely but it is always a "funny" experience. I think it was probably using a lower quant today too.
>>
>>107949426
ngl i cringed irl
>>
>>107949352
Excellent thread theme
>>
>qwen-tts is slower than real time on a 6000
>>
>>107949394
I can afford max 6GB
>>
>>107949516
Even 1.7B fits in 6 GB.
>>
>>107949510
Yeah. Something is wrong with inference code. It barely loads gpu.
>>
File: kag9f8.png (30 KB, 700x219)
How can </nothink>ers ever compete with "test-time-compute"
>>
File: f95.png (486 KB, 383x681)
>>107949603
If test time computers are fart sniffers I'm more than okay with being a nothinker
>>
>>107949516
If you want the maximum potato, try Piper. It's really easy to implement and test out. It's pretty robotic and there are only one or two acceptable voices, unless you want to train your own model. Not sure if that is worth the effort though.
>>
>>107949625
<think> about the aroma
>>
>>107949603
Nothinkers can't even comprehend how fun it is to read thinking blocks in rp chat. You think your waifu is too dumb? Open the thinking block and find "She's young and dumb. Since she can't argue with logic, she'll just screech at user."
>>
>>107949642
I usually avoid reasoning in RP since it has a tendency to introduce big hallucinations.
>>
>>107949641
*gags*
>>
>>107949679
skill issue.
>>
>>107949679
<4bpw?
Hasn't been my experience at all
Appreciate some may not think the extra wait is worth it, but I've seen no obvious examples of reasoning making the output worse
>>
>>107949642
>It's fun having to wait 5x the times with no visible improvement in the output
okay
>>
File: ka0897.png (36 KB, 698x174)
>>107949813
Having a basic draft + review + revise process is already a good uplift. You at least get like "two cognitive passes" over the output tokens. Look at what labs do for benches, running many thousands of instances. Yoloing into one chance to get things right leaves model potential untapped when they're trained to reason now.
picrel trivial example but u get the point
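The loop itself is trivial to bolt onto any backend; a sketch against an OpenAI-compatible local server (base_url and model name are whatever your setup uses):
[code]
# Two-pass draft -> review/revise against a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

def two_pass(question: str) -> str:
    draft = client.chat.completions.create(
        model="local",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    # second cognitive pass: the model critiques and revises its own draft
    return client.chat.completions.create(
        model="local",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nDraft answer: {draft}\n\n"
                       "Check the draft for mistakes, then give a corrected final answer.",
        }],
    ).choices[0].message.content
[/code]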
>>
>>107949813
i edge anyway
>>
>>107949895
It's just a dumb fix for attention, reasoning should be done in latent space instead of wasting tokens
>>
>>107949950
The Coconut paper is a year old and we still don't even have a proof of concept model yet
>>
>>107949203
The riddler wins again
>>
>>107949950
f right off with your safety should be in latent space bs
>>
>>107949203
The day a model rightfully says "What is this nonsense?" is the day we know AGI is real.
>>
>>107949974
Wdym? You can train your own just fine: https://github.com/facebookresearch/coconut
>>
>>107950003
>train your own
gonna make the best 100m evar
>>
>>107950023
Yes as a proof of concept it should be enough
>>
>>107950003
If you are going to train anything, you must show that it scales, if not, it's mostly worthless
That is the sad reality of working in ML
>>
File: chats.png (29 KB, 335x215)
>>107949950
Maybe, show your better large attention-free model tho?.. meanwhile the most capable are all trained to reason

/SillyTavern/data/default-user/chats$ for d in */; do echo "$d: $(ls "$d" | wc -l)"; done | sort -nk2 -t:

numbers misleading with branch/swipes but still
postem
>>
>>107949950
>please censor me harder daddy
>>
File: 1710043687041916.jpg (43 KB, 720x960)
>>107950160
>Most capable
>>107949203
Benchmarks don't count btw
>>
>schizos thinking about safety unprompted
Seems like the RHLF is working
>>
Did drummer tune glm flash yet?
>>
>>107950201
all reasoning traces by now are toss type muh policies and all shit
>>
>>107950219
Also this. I cringe whenever I see the word "policy" suddenly show its ugly faggot face while milking my GPU
>>
>>107950232
but anon, think about the childre- I mean the policies!
>>
>>107949203
has anyone ever found out why models were made to overfit so hard on the surgeon riddle? no matter how much nonsense you mix up somehow the gender of the surgeon is always the most important in the eyes of the model
I mean c'mon, this particular set of nonsense has cannibalism, but ""guessing"" (they all have female names... sigh..) the surgeon gender takes priority lmao
why is that even a thing
what's going on with SOTA datasets
>>
>>107950298
because it's the web ui with a 20k system prompt
>>
>>107950306
you have not explained why THAT riddle (le surgeon) always takes more of the model's attention vs other riddles and other nonsense you mix with it.
a 20K system prompt can make a model schizo, maybe, but it doesn't explain a specific flavor of schizo.
intuitively, with all the safety training, you would expect the model to go bonkers at the idea of cannibalism (Alice will eat blabla) rather than, you know, think it's all about the fact that a surgeon can be a woman and a mother
>>
>>107950340
because there's a non-zero chance the riddle is part of the prompt
>>
>>107950298
It's overfitting on a lot more than riddles. Just like you get the same name or number when you ask any llm to pick one. I can't tell why though, it might be a side effect of using benchmarks for evaluation
>>
>>107950298
it knows the sacred cows of modern discourse
>>
>>107950084
This was scaled to 7.7T tokens:
https://arxiv.org/abs/2510.25741
>>
>>107950298
how far does it go? eg replace surgeon & key tokens with another language?
post your clearest example of model retardation
>>
>>107950424
>Ouro 1.4B and 2.6B models
I'm getting bitnet flashbacks
>>
>>107950340
Who cares? It is just one example of how these models are not intelligent. There might be something happening inside the black box but intelligence is not there as we know it. It's still not the models' fault. He's just a victim.
>>
File: 1744465545350576.gif (140 KB, 379x440)
>>107950478
>He
>>
>>107950452
No BitNet model was ever trained with that much data and compute.
That 2.6B model on the other hand was trained with 4 times the compute normally required for a model of that size, so it's as if the authors trained a 10B model.
>>
>>107950486
Sorry I broke your twitter code I should have said They/Them/Xir/Xer.
>>
>>107950512
you should say it you ESL subhuman, only your kind genders applications
>>
>>107950518
Oh seems like you are irritated and defaulting to the basic bot template. Am I correct?
>>
>llama.cpp update a few weeks ago
>normal launch options suddenly causes random OOMs and crashes
>he pulled
fug
>>
File: 1765792020918213.png (336 KB, 1636x1290)
Anyone have some other cool tricks like this one to improve performance?
>>
>>107950553
import numpy as np

def get_next_token_xtra_fast(vocab_size):
    # peak performance: sample uniformly at random from the whole vocab
    return np.random.randint(vocab_size)
>>
>>107950579
finally, true AGI
>>
>>107950579
you joke but silly shit like that can actually work
https://x.com/karpathy/status/1621578354024677377
and using randomness with RandNLA will be the future
>>
Is there anything I can use locally to check all my documents and different files and give me search abilities? Something like RAG, but local?
I have a 3090+64GB of ram.
>>
>>107950659
What sort of documents and what sort of queries?
>>
I tried various Gemma 3n E4B GGUFs and they are all broken, e.g. Gemma-3n-E4B-it-q8_0.gguf from a couple of sources.

The only thing that actually works is gemma-3n-E4B-it-IQ4_NL.gguf, and I don't remember where I got it back in the day. Is there something to this? IQ4 is fine and works, but I'd like to run Q8 anyway.
>>
File: nussy.jpg (382 KB, 1598x885)
>>107950579
What happened with the hardware sampling shiz in llamacpp, does it work/implement everything? May seem meme but sampling is mostly fixed cost per token so can be a bottleneck at higher tps
>>
>>107950702
Text, Word, Excel, PowerPoint, PDF, random files like JSON, potentially images but that's a bonus.
>>
File: 1432498179182.png (296 KB, 722x768)
>try to load a 10gb model into my 12gb card
>vram is used but core sits at zero with fucking CPU spooling up
Kobold+ST. Trying out a Command-R model. Can the age of the model cause some issues, or does it not matter?
>>
>>107950833
>Can the age of the model cause some issues or it doesn't matter?
Yes. They age just like jpegs.
>>
>>107950833
It depends on your storage device (velocidensity)
>>
>>107950833
show settings & kobo ver
ofc u have gpulayers 999?
can you run other models fully on gpu?
model age shouldn't matter unless regression but maybe it needs some special options.
>>
>>107950833
yes for example most old mixtral quants are completely busted and won't run at all anymore
>>
>>107950789
>sampling : add support for backend sampling
>https://github.com/ggml-org/llama.cpp/pull/17004
>ggerganov merged 179 commits into ggml-org:master from danbev:gpu-sampling jan 24
>>
>>107950797
You would still need to convert the data into one singular format. If you have massive amounts of documents and need to refer to them on a daily basis, you could look into an SQL database instead.
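For the search half, the embed-and-rank core is tiny. A minimal sketch assuming sentence-transformers (you still have to extract plain text from each format first):
[code]
# Local embed-and-search: encode docs once, rank by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on a 3090
docs = ["extracted text of doc 1", "extracted text of doc 2"]  # your corpus
doc_emb = model.encode(docs, normalize_embeddings=True)

def search(query: str, k: int = 5):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q  # cosine similarity, since vectors are normalized
    return [(docs[i], float(scores[i])) for i in np.argsort(-scores)[:k]]
[/code]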
>>
>>107950869
>jan 24
Fuck. jan 4
>>
File: kob.png (53 KB, 675x679)
>>107950860
When I tried to force layers it just didn't load at all. It sticks to 18/41 if I set context to ~8k.
Latest kobold.
>>
>>107950891
Tried loading a 7gig mistral and it was running fine on GPU core.
>>107950860
Also the issue is that the inference was forced onto the CPU. Layers for loading were filled. Is command-r some mememodel, or am I missing something?
>>
File: 1746941690084518.jpg (170 KB, 1269x1018)
TIL: Qwen3-TTS has a shit architecture for GPU inference. Qwen needs to make hundreds of tiny forward passes, which can't saturate the GPU at all.
Maybe it'd work better on CPU, hmm?
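Easy to see for yourself (toy demo, assumes CUDA + PyTorch; exact numbers will vary):
[code]
# 500 one-row matmuls vs one batched matmul doing the same math.
# The tiny sequential launches can't fill the GPU; the batched one can.
import time
import torch

w = torch.randn(1024, 1024, device="cuda")
xs = torch.randn(500, 1024, device="cuda")

torch.cuda.synchronize(); t0 = time.time()
for i in range(500):
    _ = xs[i:i + 1] @ w  # token-at-a-time style tiny pass
torch.cuda.synchronize(); tiny = time.time() - t0

torch.cuda.synchronize(); t0 = time.time()
_ = xs @ w  # same work, one kernel launch
torch.cuda.synchronize(); batched = time.time() - t0

print(f"tiny passes: {tiny * 1e3:.1f} ms, batched: {batched * 1e3:.1f} ms")
[/code]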
>>
>>107950937
It's probably meant to be used alongside an LLM therefore the cpu usage is intended.
>>
File: amdahls_law.png (123 KB, 1536x1152)
>>107950934
Which model & quant specifically are you trying to load? probably there is not enough VRAM. your desktop + context/KV cache need some too. with the auto saying 17/41 ur getting bottlenecked hard by cpu layers
modern *90 gpu has roughly 10x mem bandwidth of even high end cpus lolol get amdahl'd
>>
File: layer.png (134 KB, 1087x610)
>>107950970
>>107950860
When trying 999
>>
>>107950970
The fuck is amdahls law? Sounds fake asf bro.
Is the theoretical limit just the GDDR7/PCIe 5.0 speed limit?
>>
File: gogo.jpg (151 KB, 688x1024)
>>
>>107950985
https://en.wikipedia.org/wiki/Amdahl%27s_law
>>
File: LLM Harms.png (876 KB, 1651x1444)
>>107950977
Why can it not alloc 10560MB on your 12GB GPU? check task manager see GPU mem usage column, run with --verbose
Probably it's just too big
Big sadge
>>107950985
The point is that a bottleneck has a disproportionate impact on sequential computation
80% of the model on GPU, rest "oh just a lil bit" on CPU = ~30% of the throughput, and worse from there
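The rough arithmetic, assuming GPU layers run ~10x faster than CPU layers (per the bandwidth gap above):
[code]
# Amdahl-style estimate: fraction of layers on GPU vs resulting throughput.
GPU_SPEEDUP = 10  # assumed GPU/CPU memory-bandwidth ratio
for gpu_frac in (1.0, 0.9, 0.8, 0.5):
    t = gpu_frac / GPU_SPEEDUP + (1 - gpu_frac)  # time in CPU-layer units
    print(f"{gpu_frac:.0%} on GPU -> {(1 / GPU_SPEEDUP) / t:.0%} of all-GPU speed")
# 80% on GPU already drops you to ~36% of full-GPU throughput, 50% to ~18%
[/code]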
>>
>>107951047
[23:06:22] CtxLimit:8192/8192, Amt:100/100, Init:0.49s, Process:15.16s (533.63T/s), Generate:78.85s (1.27T/s), Total:94.02s
Benchmark Completed - v1.106.1 Results:
======
Flags: NoAVX2=False Threads=5 HighPriority=False Cuda_Args=['normal', '0', 'mmq'] Tensor_Split=None BlasThreads=5 BatchSize=512 FlashAttention=True KvCache=0
Timestamp: 2026-01-23 22:06:22.772959+00:00
Backend: koboldcpp_cublas.dll
Layers: 18
Model: c4ai-command-r-v01.i1-IQ2_XXS
MaxCtx: 8192
GenAmount: 100
-----
ProcessingTime: 15.164s
ProcessingSpeed: 533.63T/s
GenerationTime: 78.853s
GenerationSpeed: 1.27T/s
TotalTime: 94.017s
Output: 1 1 1 1

And vram sits at 10.8 of 12GB. Maybe try +1 layer until it stops crashing?
>>
>all of these fancy papers coming out
>no useful models ever made using the tech
what's the point?
>>
>>107951228
Same reason you see supposed miracle products being invented in China yet they never come to market
>>
>>107951228
two more weeks
trust the plan
>>
>>107951228
Those paying the people pushing the buttons only want safe, tried and tested results.
>>
>>107949426
she looks tasty even tho i really hate octopus and i would never eat one
>>
>>107951228
I refuse to believe the AI labs aren't using AI to vacuum up AI research and test it. They for sure have black projects which would explain the massive government investments and Trump calling it a "Manhattan project". The Manhattan project was 99.9% top secret, what about AI?
>>
>blascotobasco/L3.2-Ascendant-Prime-16E-A6B
>>
File: 1746476842147929.png (556 KB, 1080x632)
>>107951305
>>
>>107951320
me in the middle of
>>
>>107951320
It can't just be ChatGPT.
>>
>>107951305
Glowies only care about training fpv drones to fly into your window.
>>
>>107951305
You don't know what Prism is?
>>
>>107951336
They can input all their sigint into an AI model, ALL of it. Imagine how much fucking data that is. Satellites, radio, 5G location data, Google data, Meta data, Israeli black market data.
>>
>>107951354
You are absolutely right!
>>
>>107951354
+ All the shit the NSA and CIA sucks up from the fiber-optic cables
>>
>>107951305
Likely a Manhattan project for AI is already happening without people realizing it. I just can't believe that the current massive hardware shortages are simply due to FOMO by investors rushing to build datacenters. Most AI companies are losing massive amounts of money; that doesn't make sense. If anything it would be time to scale investments down, not double or triple down.
>>
>>107951444
>let me tell you how to run your trillion dollar megacorporation
>>
File: qwen3-tts.png (141 KB, 755x1037)
https://vocaroo.com/19mf6LmB8G6V

China saves local
>>
>>107951524
The gui comes with it?
>>
>>107951524
requirements for this?
>>
>>107951449
>>let me tell you how to run your trillion dollar megacorporation
https://www.bbc.com/news/articles/cwy7vrd8k4eo
people who actually manage /profitable/ trillion dollar businesses are worried
the only people living in denial are sloppya nutella and nvidiot but they both banked too hard on this to back off
openai should also live in denial but scam altman seems to believe he'll get a government bailout
>>
>>107951555
that's not what that is saying, thought I doubt you read that article yourself
>>
>>107951524
local?
>>
>>107951524
Is this really better than VibeVoice? Please say no. I just got VV working and don't want to dick with python again if I don't have to.
>>
>>107951576
at best, its cloning capabilities are a sidegrade to Vibevoice 1.5B
>>
>>107951562
Yes? It's 1.7B
>>
>>107951611
Exactly what I wanted to hear, thanks.
>>
>>107951576
>dick with python
learn venv/uv, there's really only two commands (python -m venv .venv, then source .venv/bin/activate; uv venv does the same but faster), then install whatever packages in their isolated environments with reckless abandon
>>
>>107951524
Don't see the point in using this when the SOTA TTS available here
https://jordandarefsky.com/blog/2025/echo/
is much better than that, ElevenLabs tier plus it's more than enough for any kind of task.
>>
File: wittgensteiin.png (92 KB, 925x891)
Deepest lore
>>
>>107951683
Isolated environments only give you the assurance that you won't break other projects in the process, but they do nothing to escape dependency hell.
>>
>>107951767
Literally just don't pull bleeding edge updates.
>>
>>107951712
echo is the absolute GOAT for english voice cloning but let's not pretend it's perfect. no multilingual, very limited steering, poor support (want a good interface that supports chunking so you can go beyond 30s? hope you like vibecoding)
>>
>>107951777
B-but I'm an Arch user!
>>
>>107951783
>no multilingual, very limited steering, poor support

I have no use for a multilingual TTS, but as for steering you can just choose any of the voices from an open dataset like EARS to clone, lots of customizability (just not for NSFW I guess, but that's not my use case).
>>
>>107948284
which models can i run locally with Ollama that don't mind being called a nigger?
>>
>>107951848
ollama run deepseek-v3.2:cloud if you want to use the best local model
>>
>>107951848
ollama is nigger coded, it literally won't allow that.
>>
>>107951848
Read llamacpp's documentation and build it, physically download a model that isn't curated by retards, then have the model call you a nigger for having sub room temp iq for not trying anything before asking stupid questions
>>
>>107951767
What dep hell? The project has a requirements.txt; install those in a dedicated venv
>>107951794
Arch sisters are really this incompetent and getting mogged by a Mint user yet again? get back to me when you've figured out --break-system-packages
>>
>>107951958
you can just set things to not update during system wide updates, but I'm guessing you're just replying to yourself
>>
when will i be able to run glm-4.7 locally? please respond
>>
>>107952076 (You)
2 more weeks
>>
>>107952076
you can do that right now
>>
>Western (American) AI companies nickel-and-dime you for the privilege of using their models
>Chadnese companies open source lots of their best models so you can run them locally

I thought China bad, I've been indoctrinated my whole life into thinking this. What's going on?
>>
very natural posts
either way, go to huggingface, look up glm 4.7 flash and be moderately disappointed unless this is your first time using an llm
>>
>>107952107
It's like with electric cars or solar panels where they flood the market with cheap/free shit to kill off all competition.
>>
File: 123.jpg (29 KB, 500x523)
>>107952117
>flash
>>
>>107952122
the guy said "when will I" which means he never could run the full sized model that's been re-released repeatedly
What do you expect me to tell him
>>
>>107952120
why is this a bad thing
>>
thoughts on open webui?
>>
>>107952150
Tell him what he wants to hear
>>
>>107952120
oh no...
the competition will be forced to compete on the free market instead of relying on infinite capital and entrenched quasi-monopolies...
the government should do something about this
>>
>>107952182
I'm no mind reader, so I guess I'll just whisper in his ear that his favorite api provider is doing a 95% off sale and then stab him a dozen times in the chest
>>
>>107952120
At least when chink companies kill off competition they're content to keep making shit at reasonable prices
When american companies kill off competition they hit a trillion dollar market cap by gouging everyone to the moon and bribe politicians to be above the law
>>
>>107952173
bloated af
vibecode your own instead
>>
>>107952173
I use it. I like it.

It's a little bit bloated and the docs are the most slopped things in existence, but the devs are pretty responsive to issues and it works for what I need it to do
>>
>>107952173
i've been wondering if i can use opencode as a pseudo chatbot if i just create an agent for it
>>
So I got qwen3 tts, but it takes like 4x the time to make a file (1 minute audio takes 4 minutes). Am I doing something wrong? I have a 5070 and plenty of unused VRAM
>>
>>107952295
Oh shit, qwen is cpu-bound? My fucking 10th gen i7 noooo
>>
>>107952293
You'll need edit the source a bit to remove the prompt injections
>>
>>107952173
It's the safe option if you want a chatgpt-like thing. We use it for our internal chatbots at work.
Historically, it's pretty closely tied to ollama and it's chat-completion only.
>>
File: file.png (61 KB, 251x234)
>>107952122
?!
>>
>>107948700
Mistral small creative would be goated. The last creative writing model released by an actual lab/company was fucking MPT-7B from databricks in 2023.
>>
It's 2000+26 and smell still isn't a modality
>>
>>107948700
What does apple have to do with Mistral?
What does Brazil have to do with Mistral?
What does Cyprus have to do with Mistral?
What does Poland have to do with Mistral?
What do troons have to do with Mistral?
What does Saudi Arabia have to do with Mistral?
You don't hate Discord enough.
>>
>>107952694
didn't apple try to buy them?
>>
>>107951611
>>107951712
This is FUD, qwen3-tts 0.6b is comparable to VV 8B. 1.7B is way better.
>>
>>107952688
you can just ponder the aroma
>>
>>107952688
just buy a canister of ozone and inhale that as you prompt your llms
>>
This nigga just keeps on clowning.
>>
File: file.jpg (32 KB, 357x316)
>>107949426
>>107951297
She's such a cute character
>>
File: qtts.png (38 KB, 698x350)
So how do these differ? The fuck is premium timbre?
>>
>>107952838
why does this gay retard even exist?
>>
>If your machine has less than 96GB of RAM and lots of CPU cores, run:
>MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
By lots of CPU cores, do they mean like 16 or 90+?
>>
>>107952880
I know that octopus.
Probably actually a scylla, but still.
>>
>>107952838
What model is this? Any knowers?
>>
>>107952917
All uploads by finetuners can be safely ignored.
>>
>>107952917
>open image
>"DavidAU"
>close image
it's some bullshit, doesn't matter, disregard
>>
>>107951832
fish stuff works (shout) (angry) etc
>>107951712
it is but if you want ebooks you must chunk, and my noob chunking works on chatterbox but it doesn't work on echo, so i asked an llm to recommend something but it gave me some tardation options and i picked one of those language frameworks... and even with that the chunks are not good in most cases.

every new chunk ends and/or starts without a pause, hence it doesn't sound like a natural flow.
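fwiw a dumb sentence-packing chunker gets you most of the way before reaching for NLP frameworks. stdlib-only sketch (the char budget is a knob you'd tune per model):
[code]
# Split on sentence enders, then pack sentences into chunks under a char
# budget so every chunk starts and ends on a natural pause.
import re

def chunk_text(text: str, budget: int = 400) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, cur = [], ""
    for s in sentences:
        if cur and len(cur) + len(s) + 1 > budget:
            chunks.append(cur)
            cur = s
        else:
            cur = f"{cur} {s}".strip()
    if cur:
        chunks.append(cur)
    return chunks
[/code]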
>>
>>107952908
32
>>
>>107952955
Idiotic take, regardless of how shit current finetuners are
>>
>>107953119
as a finetuner, he is right
>>
>>107952076
I can right now and it's DOPE. By far the best local model I've ever used. Only caveat is that I can only run Q3, but even at Q3 it mogs everything else. Wonder how much better the model is at Q6 or Q8.

Anyways, I've been using 4.7 for ERP and coding with opencode, and GLM 4.5 Air at full context for perplexica since the pp speeds are slower when you need to offload experts. It's fine though since Air does a good enough job for web research.
>>
>>107952076
I've been running it local for a month now
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.