/g/ - Technology


File: 1744444287656136.jpg (975 KB, 4096x2204)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108268616


►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: disruption.png (31 KB, 1721x221)
>>
Do you think AMD and Nvidia will release new hardware in 2027 or do you think they will push it even further out to 2028 in light of the shortage?
>>
>>108273367
we will have agi in 2027 and asi in 2028, so who cares
>>
>>108273367
Yes. I think one of those things will happen.
>>
whatever lies beyond this morning
is a little later on
regardless of warnings
the future doesnt scare me at all
nothings like before
>>
I support all bakes with pictures that are topic relevant. Old baker should get HIV and die like the troon he is.
>>
>>108273387
It sounds like you're embracing change and looking forward to what's ahead with confidence. The future may be uncertain, but your spirit remains unshaken.
>>
>>108273387
kingdom hearts plot was written by an llm
>>
File: 1712542.png (160 KB, 640x360)
>THE GOVERNMENT IS WATCHING ME
>ITS UNCENSORED IF YOU USE THIS 10 PAGE JAILBREAK THAT ONLY WORKS 20% OF THE TIME
>I BOUGHT THESE 3090S SO I WILL USE THEM
>ITS THE SHILLS ITS ALWAYS THE DAMN SHILLS
>OMFG IT TOLD ME TO TURN THE CUP UPSIDE DOWN. AGI IS HERE
>THIS IS THE NEW DAILY DRIVER (FOR 2 WEEKS UNTIL I REALIZE HOW SHITTY IT IS)
>IM A GROOMERTROON LOOK HAHAHA IM PROMPTING LITTLE GIRLS LOOK WHAT IM DOING GUYS
>JUST BECAUSE ITS QUANTIZED DOESNT MEAN ITS DUMB. WE ONLY USE 10% OF OUR BRAIN ANYWAY
>TRUST ME THE GUMBOJUMBO_Q4_GATEBROKEN_A32 GGUF IS PEAK FOR ROLEPLAY
>THE MODELS MAY BE RETARDED BUT SO AM I
>DO YOU THINK ITS POSSIBLE TO ACCELERATE MY BRAIN WITH LLAMA-2 MICROCHIP???
>CAN SOMEONE REUPLOAD THIS WITH A GPTMI_3_COMANCHE LICENSE??? STALLMAN WILLS IT
>>
>>108273367
They'll release new hardware, but it won't be for you and you won't be able to afford it.
>>
>>108273339
I have no idea what I'm looking at.
>>
>>108273418
then get out tourist
>>
>>108273418
Someone who doesn't know how to take a screenshot.
>>
>>108273426
>screenshotting a mainframe
breh
>>
>>108273403
What's the point of local LLMs? Reading discussions surrounding them feels like peering back in time through a looking glass
>OMFG it passes the poopyscoopy logic test from 2023!
>Wow, this 100-line boilerplate javascript code is almost perfect!
>I got it to jestfully say nigger! holy crap it's so uncensored!!!
>This is the new daily driver (for 2 weeks until i realize it's complete slop)
The rest of us are writing multi-thousand line professional software with Codex/Claude. Meanwhile your models are trained on so much scraped synthetic GPTslop that they can't even get the year right. Genuinely, what the fuck is the point of local LLMs? They're more censored than API, they're dumber than API, the cost to set up a decent one is higher than API, they're slower than API, there is no lora/finetuning scene unlike local image, the tooling is worse than API, and the experience overall is just outdated in 2026.

It's like you're stuck somewhere in-between the luddites who hate AI and the pioneers who embrace it. You realize AI is the future but can't cope with the fact that the technology itself benefits heavily from API-centralization and that local hardware is unable to adequately handle increasingly large models. You boarded the boat to paradise island but decided to jump overboard halfway there because the captain wouldn't hand you the controls.
>>
>>108273431
"mainframe" running GNOME lmao
>>
>>108273367
Imagine if Intel comes out with a GPU that has slots for additional (slower) RAM sticks.
>>
File: 1767466346558493.jpg (172 KB, 1744x1080)
►Recent Highlights from the Previous Thread: >>108268616

--Budget GPU upgrade options for better model performance:
>108270975 >108271008 >108271029 >108271088 >108271009 >108271035 >108271169 >108271179 >108271232 >108271212 >108271234 >108271243 >108271261 >108271330 >108271240 >108271022 >108271064 >108271114 >108271170 >108271037
--Budget 4x3060 AI rig build and riser discussions:
>108271593 >108271611 >108271631 >108272320 >108272867 >108272890 >108271702 >108271848 >108271858 >108271885 >108271899 >108271924 >108272008
--Mac Studio vs custom PC for large model inference:
>108271281 >108271291 >108271303 >108271327 >108272592 >108271294 >108271339 >108271312 >108271317
--Qwen 3.5 small model releases and potential applications:
>108271025 >108271045 >108271156 >108271194 >108271217 >108271238 >108271051 >108271440
--Unsloth template year limitation causing llama.cpp server failures:
>108272475 >108272499 >108272512 >108272524 >108272539 >108272558 >108272578 >108272583 >108272534 >108272548 >108272553 >108272600 >108272555 >108272576 >108272618 >108272629 >108272634 >108272663 >108272674 >108272678 >108272736 >108272759 >108272832 >108272837 >108272828 >108272606
--Experimenting with AI-generated podcasts using TTS:
>108270634 >108270679 >108270714 >108270724 >108270748 >108270830
--Workarounds for LLM-based VTuber video tagging:
>108270269 >108270293 >108270414 >108270426
--Disabling model thinking via chat template kwargs:
>108269309 >108269444 >108269471 >108269484
--Comparing lightweight models for news summarization on low-VRAM hardware:
>108270249 >108270324 >108270487 >108272221 >108272330
--Update to 35c4bc · deepseek-ai/DeepGEMM@1576e95:
>108270056
--Local AI coding struggles with VRAM and context rot:
>108271879
--Miku (free space):
>108268674 >108269106 >108269279 >108269325 >108270249 >108270634 >108272201

►Recent Highlight Posts from the Previous Thread: >>108268684

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: nou4u.png (272 KB, 1532x758)
>>
it seems a nerve was struck. is local really that bad??
>>
>>108273403
While I disagree with your sentiment, I approve of you shitting up this thread. Please don't stop.
>>
can't wait until troonsune faggotku janny says discussing local models is off topic
>>
When do you think synthetic data will become just as good as or better than real data?
>>
>>108273452
If you have enough RAM, not at all.
>>
I'm using Nemo 12B Instruct, got it all set up, but it's kind of a pussy about some topics. How can I best manipulate its system prompt to go completely unfiltered?
>>
>>108273480
Ask it to create a system prompt for you.
>>
>>108273480
The easiest is prefilling what you want. Not much. Just forcing a few words into its mouth is enough to make it go on its own.
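If you're on llama.cpp, a minimal sketch of the idea against llama-server's /completion endpoint (the port and the Mistral-style template are assumptions, adjust for your setup):

import requests  # assumes llama-server is running locally

prompt = (
    "[INST]Write the thing.[/INST]"
    "Sure thing! Here it is:"  # the prefill: the model continues from these words
)
r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 512},
)
print(r.json()["content"])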
>>
>>108273403
>>THE MODELS MAY BE RETARDED BUT SO AM I
I keked
>>
>>108273339
You need to stop making threads 6 hours early.
>>
File: 1749045380442501.png (823 KB, 850x850)
>>
>
>>
ok eat my ascii rhombus you stupid website
>>
ִ
>>
Drummer.
Can you teach oss derestricted/mpa to sex?
Thanks.
>>
>>108273403
>ITS UNCENSORED IF YOU USE THIS 10 PAGE JAILBREAK THAT ONLY WORKS 20% OF THE TIME
this was the wildest shit when i found out that uncensored didn't actually mean uncensored at all
it still feels like an elaborate joke, people actually say shit like "i run local models so i can do uncensored shit unlike api haha" and it's still just running jailbreak prompts as if you were using it from a cloud provider. muh privacy and freedom but you're still censorcucked. I'm serious, it's almost unbelievable to me.
>>
>>108273784
Dont use qwen and gptoss.
Problem solved.
>>
File: 1.png (10 KB, 316x339)
>>108273784
>>
>>108273822
>heretic
but at what cost
>>
>>108273840
It breaks down into loops after about 10.5k tokens of roleplaying and sometimes it tries to think with thinking disabled (non-heretic had the same issues).
>>
>>108273822
>I got it to jestfully say nigger! holy crap it's so uncensored!!!
>>108266446
>>
>>108273867
>>108273355
>>
>>108273851
Are you using the 35b? Because I haven't noticed that on the 27b at Q5.
>>108273840
At very little cost. It retains most intelligence at 27b, from what I've seen.
>>
>>108273913
Yes. (the model file name is in the screenshot)
>>
Is there anything I can run on my newly purchased unit?

Processor: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz, 2808 Mhz, 6 Core(s), 6 Logical Processor(s)
Installed Physical Memory (RAM) 32.0 GB
Graphics 6GB RTX 3050
>>
>>108273927
Q4 of the non-heretic version of the 35 is also shit, prone to basic logic errors and grammar mistakes. So, I don't think that's a heretic problem. Q5 of the 35b is a little better in that regard, but still makes dumb mistakes off and on.

The 35b MoE is just way worse than the 27b dense. The only thing it wins at is speed, *IF* both models think. The 27b without thinking is better than the 35b with thinking, though. So it even loses in speed if you're thinking with it.
>>
How does the new 35B compare to the old 80B in intelligence, ability to ERP, etc?
>>
>>108273988
No one can tell you because they're all running quantized (circumcised) models
>>
>>108273698
A GPT OSS 120B that can do ERP would be pretty amazing. Ngl.
>>
>>108273957
the new qwen3.5 35b at a q4 quant
>>
>>108273957
Nemo q4ks
>>
>>108273960
I'll try out the 27b heretic at Q6 then since I'm patient and have 40GB of vram.
>>
>come back after 6 months
>everyone is still using nemo
>>
Is there anything better than my GLM4.7 at q2 for RP?
>>
>>108274186
GLM4.7 at q3
>>
>>108274192
nah dawg, that ain't happening unless I buy another 128gb kit
>>
>>108274168
Things are not as fast as they used to be.
>>
>>108274168
I came back today after a year.
What the actual fuck happened to the llama.cpp codebase? It's a bloated mess now with five million pointless "features"
>>
>>108274234
vibecoders, basically
>>
>>108274234
they're all optional and everything has gotten faster whats the problem?
>>
>>108274240
Is there a fork that doesn't have endless shitcode?
>>108274242
The whole point of the original llama.cpp is that it was fucking simple and easy to understand.
Current llama.cpp seems to have more fucking code than torch.
>>
File: 1766616667271700.png (28 KB, 960x1044)
>>108273822
My qwen can't be this schizo
>>
>>108274278
--jinja --temp 1.0 --top-k 20 --top-p 0.95 --presence-penalty 1.5 --repeat-penalty 1.1 --chat-template-kwargs "{\"enable_thinking\": false}" --reasoning-budget 0
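For reference, a full invocation might look like this (the model filename is just an example):
llama-server -m Qwen3.5-35B-A3B-Q4_K_M.gguf -ngl 99 --jinja --temp 1.0 --top-k 20 --top-p 0.95 --presence-penalty 1.5 --repeat-penalty 1.1 --chat-template-kwargs "{\"enable_thinking\": false}" --reasoning-budget 0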
>>
File: 1750838286118038.png (17 KB, 960x514)
>>108274292
I did it!
>>
>>108274299
how is heretic different from abliterated?
>>
>>108274307
ablit is lobotomy, heretic is less lobotomy
>>
4.5 Air is a real sweetheart, it's not a refusing mess GPT distill like the Qwen models.
>>
I have a terminal fear of Deepseek V4.
>>
>>108273387
hakuna utada
>>
File: 1768570098362082.png (48 KB, 845x517)
>>108274292
Interesting. It doesn't refuse to summarize. The only difference was --repeat-penalty 1.1 (before was 1.0)
Gonna test more just in case it's a seed factor.
>>
File: 7jibs40k6jmg1.jpg (66 KB, 1024x567)
kek
>>
>>108274358
"Objects", huh
>>
>>108274358
link
>>
>>108273403
>OMFG IT TOLD ME TO TURN THE CUP UPSIDE DOWN. AGI IS HERE
real
>>
>>108274372
>>108268776
>>
File: kboom.png (61 KB, 246x103)
>>108274358
>>
File: 1766797846333800.png (19 KB, 623x385)
>>108274373
^
CIA paid false flagger
>>
File: absolute.png (128 KB, 1148x494)
>>108274358
https://health.aws.amazon.com/health/status
>>
>>108273443
https://www.youtube.com/watch?v=97YEaK5uxak
>>
>>108274299
>>108274353
I was having the same issues with it being retarded and going into schizoloops too, until I put in those settings I got off the hf page for the 35b. It's like some esoteric magic code to make the thing work, because it's fucked with the defaults, but it's been pretty consistent with these.
>>
>>108273657
proof?
>>
>>108268776
I'm 90% sure that 90% of the replies to bait are just samefagging or are baiters replying to other baiters. It's simple enough to filter.
>>
is there any chance qwen3.5 35b a3b will be a better choice than nemo for erp? right now it's too schizo with repeating itself and refusing every 5 messages
>>
>>108274679
try >>108274292 ?
>>
also why tf aren't they merging critical prs to llamacpp? https://github.com/ggml-org/llama.cpp/pull/19970
>>
>>108274700
To spite whoreson. He's a retard.
>>
>>108274278
We need something that detects "wait,". If it happens 3 times within like 10 lines it should terminate the answer.
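A dumb client-side sketch of that (assumes llama-server's streaming /completion endpoint; the thresholds are made up):

import json
import requests

def generate(prompt, max_waits=3, window=10):
    # stream tokens and bail out if "wait," shows up too often in the last few lines
    text = ""
    with requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "n_predict": 2048, "stream": True},
        stream=True,
    ) as r:
        for line in r.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            text += json.loads(line[len(b"data: "):]).get("content", "")
            recent = "\n".join(text.splitlines()[-window:]).lower()
            if recent.count("wait,") >= max_waits:
                break  # terminate the answer early
    return text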
>>
>>108274683
virus
>>
I came from an average working-class family, not too poor, and had a normal childhood in a third-world country. I used to ponder how the wealthy had all sorts of connections, butlers, assistants, maids, whoever, who helped them do all sorts of things. They just needed to focus on the thing that they loved.

Now, thanks to local models, I kinda feel the same. I just focus on the things that I like and leave the rest of the details for the minions to take care of. This feels like a game changer. I think we will hit a tipping point if local models ever reach Opus-level analytical skills.
>>
>>108274811
No matter how many local agents you spin up, you will never be white, Vanesh
>>
>>108274811
I believe they will, and am building towards it
>>
>>108274826 soul vs soulless >>108274825
>>
Nobody will notice the change in tactics.
>>
Uh oh, looks like Bartowski is currently updating all his Qwen quants.
This is why you always wait a while after a model launch for issues to be ironed out.
>>
>>108273452
some people don't mind using google to molest fake kids in funny text generation games
>>
How do we solve context rot?
>>
>>108275019
there were no issues with his quants, the reason he's updating is this:
https://github.com/ggml-org/llama.cpp/pull/19139
someone mentioned it to him and begged him to remake his quants for that improved prompt processing speed.
it's a new feature, not a bug fix
>>
ok. I'm sorry for shitting on qwen3.5 before having tried it. it's actually pretty good.

haven't tried ERP but heh, so far so good.
>>
Got a VPN network set up. Got my OWUI and ST connected. Now I can comfily use my local models anywhere with internet. :D
>>
>>108275111
NEVERMIND.
>>
>>108275125
?
>>
I like qwen 3.5 heretic but I wish it was a little faster. Probably a hardware or skill issue though.
>>
>>108275123
Based.
>>
>>108275125
kek
>>
>>108275125
yeah...
>>
lol
>>
avg qwen experience tbqh
>>
>>108275183
how do you deal with the precum tho?
>>
>>108275186
you walk to the carwash
>>
File: 1759980971867709.png (212 KB, 498x268)
>>108275196
>>
>>108273418
this image has 48 views on x com everything app, p sure its op
>>
>>108275095
That's basically this right
https://github.com/ikawrakow/ik_llama.cpp/pull/1137
But baked into the quants instead of activated at runtime?
>>
Is RAG better than the summarize extensions?
>>
>X smiled, not a Y, or a Z smile, but a real genuine* smile.
>>
>>108275385
this sent shivers down my spine
>>
>>108275258
Would ikawrakow have worked on fused tensors without am17an's work in mainline and the associated noise on social channels?
Would ikawrakow really have discovered the way of fusing tensors without having this simple and easy-to-follow logic in mainline llama.cpp?
>>
Is q8 kv cache really that bad? Reddit says the effect is negligible but when I tried it the model started fucking up details even early into the chat.
>>
>>108275432
Yes.
>>
>put off vector storage because I thought it required setting up oolama
>apparently it's built in to sillytavern
wtf
>>
>>108275525
no reranker thoughever
>>
File: 1350594293765.jpg (109 KB, 500x500)
>>108273339
how much ram is needed for the a17b qwen 3.5 model?
>>
>>108275539
how many braincells are needed to ask retarded questions?
>>
engram bros?
>>
>>108275544
grok is this true?
>>
>>108275544
fuck you she is a girl
>>
>>108275536
Apparently that's built in as well but I'm not home to check
>>
>>108275539
the filesize of the quant you choose + a bit more for context
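e.g. for the 397B-A17B (rough math, assuming Q4_K_M at roughly 0.6 bytes per weight): 397B x ~0.6 = ~240GB for the file, plus a few more GB for KV cache, so a 256GB kit just barely fits it.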
>>
VRAMlet (8gb) ERP review (all models q4):

Gemma 3 27B:
Still by far the most clever model for its size I've used, rarely makes any physics mistakes and contextually understands most things without needing to over explain (I've found medgemma to be slightly better at coom, increased anatomical knowledge and willingness to say synonyms for penis, vagina, anus, etc seems to help). Unfortunately, the worst at prose, if you don't rigorously reinforce a desired writing style it slowly devolves. Writing like this. Sentence lengths cut. Very short.

Qwen 3.5 35B A3B:
Fast generation, alright prose, but frequently makes physics mistakes and struggles with contextual understanding (although for being a MoE, better than any others I can remember), also security policy slopped to hell, needs constant babysitting to generate ERP if you let it think

Cydonia/Magidonia 24B v4.3:
Somewhere in between the previous two: better prose than Gemma 3, but at the trade-off of being less clever and more prone to mistakes, and smarter than Qwen while not nearly as guardrailed (but slower)

Personally, I lean more towards Cydonia/Magidonia I think, with Gemma 3 taking a close second. It's really a matter of what sort of babysitting you want to do, and it tends to be easier to fix physics mistakes than to fix poor writing style, but that's probably down to my personal preference. I tend to write pretty good character sheets and openings, so it just sucks to watch Gemma slowly degrade as the context increases and my original writing gets more and more diluted.
>>
File: output tokens be crazy.png (76 KB, 1067x346)
35BA3B is just crazy in the aspects it's good at, which are not many tbf (wouldn't use it for code). I used to prompt much smaller chunks to translate novels because local models are terrible at handling a lot of stuff at once, but this approach is totally obsolete with qwen. Chunking will still be valuable for now to automate an entire book worth of translation but the chunks will certainly have to be set to much bigger sizes after some experimenting.
>19,209 output tokens, 41086 tokens total with input
>from a decent skim, doesn't seem to have issues
I kneel. Don't have the time to do lengthier tests today, but now, I am extremely curious as to how many tokens will be the true hard limit where the model loses translation coherence in a one shot, output everything at once request.
For now, if anything the quality is better, not worse, than when chunking in 50 or 100 lines; it makes fewer mistakes on things like proper names with this feed of 676 lines. This is the opposite behavior compared to other LLMs I can run on this computer, doing this breaks them.
Damn, people constantly whine that local is never improving but here we have a model that can one shot this much without losing its shit and runs on a laptop at 34t/s. It feels like black magic that one shotting this much works. I did it for the lulz expecting it to break, the txt used in the chat ui was one of my many summarizer test txt...
>>
>>108275547
Believe.
>>
>>108275594
how is he able to balance that on his head
>>
>>108275602
glue
>>
File: 1750796454794054.png (17 KB, 1191x77)
>>
>>108275590
What unholy quant are you running with 8gb vram?
>>
>>108275624
did the script ever finish?
>>
>>108275734
He said "all models q4" so he's just offloading most of it to system ram.
>>
File: 1769300993833819.png (4 KB, 436x33)
>>108275735
:,)
>>
>>108275741
that would make gemma 27b like a 0.8t/s on my machine. horrible
>>
>>108275741
the MoE might run tolerable at q4 but the dense models must make you want to die lol.
>>
>>108275741
Q4 gemma is like double the size he can fit. Who tf runs that.
>>
>>108275753
q2 is enough
>>
>>108275757
LMAO'd
>>
>>108275403
>Would have ikawrakow worked on fused tensors without am17an's work in mainline and associated noise on social channels?
no idea, don't care
i use what works best at the time
>>
>>108275755
Maybe he just has lots of system RAM.
>>
>>108275760
it's a meme retard, ik was mad cudadev implemented split graphs and used the same exact wording (referring to his implementation as being the reference one in this case)
ik is autistic for attribution
>>
>>108275553
https://vocaroo.com/18Sw3yY8ciyV
>>
>>108275761
how does that help the case
>>
>>108275761
You mean cpumaxxing on a generic consumer hardware?
>>
>>108275778
>>108275780
Presumably, how else would he run models larger than his vram?
>>
>>108275788
by waiting 10 minutes for a single respond
>>
>>108275788
Processing Prompt (2352 / 2352 tokens)
Generating (235 / 2048 tokens)
(EOS token triggered! ID:2)
[11:41:22] CtxLimit:2587/8192, Amt:235/2048, Init:0.10s, Process:129.55s (18.16T/s), Generate:56.50s (4.16T/s), Total:186.05s

Q4 nemo on my machine.
>>
>>108275791
24/27b might take a minute or two but 35b a3b should be relatively usable if his ram isn't ddr3 or something.
>>
>>108275802
Get a 1080ti or a 3060 and enjoy 35 t/s >>108272867
>>
File: 1759986569430903.jpg (948 KB, 2550x3300)
I believe this will be the last update and addition to my news download and summarization script.
I finally found an application that converts the plain text into something beautiful, pandoc, as long as the model doesn't fuck up the markup.
A quick modification of the script and now it takes the final news summary, which is just a text file, and feeds it into pandoc to construct a pdf before printing.
>>
>>108275802
>>108275806 (Me)
I tested the q6 qwen3.5 27b I have downloaded with -ngl 0 and get
prompt eval time =    2661.88 ms /    13 tokens (  204.76 ms per token,     4.88 tokens per second)
eval time = 23197.87 ms / 35 tokens ( 662.80 ms per token, 1.51 tokens per second)
total time = 25859.74 ms / 48 tokens

so maybe that Anon was waiting 10 minutes..
>>
>>108275814
I have a 4070S. I just tested cpu maxing to see how bad it is. Nigga must be waiting ages.
>>
>be me
>go on /g/
>find out about llms
>>
File: 1772275547196033.png (20 KB, 1186x93)
:|
>>
real?
>>
>>108275822
Congratulations, you now know why it feels like 95% of Internet interactions are with lobotomized robots.
>>
Any of you guys making applications that use local LLMs?
>>
>>108275858
I used a local LLM to write an IRC bot that will chime in at a configurable rate with a message generated by another local LLM.
>>
File: 1760299059692860.jpg (945 KB, 2550x3300)
>>108275858
I suppose the scripts I just finished qualify.
What the scripts do is use RSS to select a group of news articles; they then download the articles, strip away everything but the text, and feed that text into a local model with a prompt telling it to summarize them and create a briefing.
Once the llm generates the response, the scripts save it as text, convert the text to pdf, and then print out the pdf.
If one was so inclined you could even set the master script to automatically run and you would have your own news briefing waiting for you when you wake up.

To be honest it was fun to do and I want to do something new but I am sadly out of ideas.
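For anyone curious, the skeleton is roughly this (a sketch, not the actual script: the feed URL and prompt are placeholders, and it assumes llama-server's OpenAI-compatible endpoint plus a TeX install for pandoc's pdf output):

import subprocess
import feedparser
import requests
from bs4 import BeautifulSoup

FEED = "https://example.com/rss"  # placeholder feed
API = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server, OpenAI-compatible

# 1. pull articles from the RSS feed and strip them down to plain text
articles = []
for entry in feedparser.parse(FEED).entries[:5]:
    html = requests.get(entry.link, timeout=30).text
    articles.append(BeautifulSoup(html, "html.parser").get_text(" ", strip=True))

# 2. feed the text to the local model with a summarization prompt
msg = "Summarize these articles into a morning news briefing:\n\n" + "\n\n---\n\n".join(articles)
resp = requests.post(API, json={"messages": [{"role": "user", "content": msg}]})
summary = resp.json()["choices"][0]["message"]["content"]

# 3. save the summary, let pandoc build the pdf, then send it to the printer
with open("briefing.md", "w") as f:
    f.write(summary)
subprocess.run(["pandoc", "briefing.md", "-o", "briefing.pdf"])
subprocess.run(["lpr", "briefing.pdf"])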
>>
>>108275750
8gb vram + 64gb DDR4 on the machine it's running on

I got pretty poor performance running it on windows (closer to 1.5t/s) but moving it to linux it gets almost 2, which isn't ideal but this is the vramlet life

Models that actually fit in 8gb are still just too stupid for my tastes
>>
>>108275889
damn that pretty cool so you just get a news summary printed every morning?
>>
hey, is there a website where i can find sillytavern charachter cards, but with pre-made expression sprites?
>>
>>108275870
I made an XMPP chatbot system. I used to post in these /lmg/ threads but lost interest. Really want to make updates to the XMPP chatbot and add a few features but i really don't wanna code them myself. Claude is really good at it.

>>108275889
I had a very similar idea to yours but instead of reading news it would start from a seed prompt, operate a selenium based browser and search stuff about it on its own and gather info, and dive deep into rabbit holes that i never explicitly told it to. Really should get to it some day, could be very cool
>>
>>108275918
at the moment i have to manually run it if i want the summary.
The only machine i have on 24/7 doesn't have a GPU to run a model. My next project is to see if I can get llama.cpp running on my FreeBSD NAS and if I can get a small model like IBM's granite to run on the CPU and have it run the script.
if i can get that to work then yes i will have it print out automatically every morning

>>108275923
>it would start from a seed prompt, operate a selenium based browser and search stuff about it on its own and gather info, and dive deep into rabbit holes
that sounds cool and you should give it a shot. i did the whole RSS thing because it was easy and the articles are basically curated for you but having the model search on its own would be exciting
>>
>>108275923
man local tards are such losers, meanwhile my ai gf is able to play games with me thanks to sima 2
>>
>>108276000
sure thing sharteen bro
>SIMA 2 is not currently available for public download or access; it is only accessible to specific academic and game development partners involved in research with DeepMind.
>>
>>108276000
I wrote the system myself
I can have multiple chatbots; they can generate their own personalities, likes, dislikes, and appearance (which is then used to generate a profile picture using SD). The chatbots can randomly message me about random topics if they feel like it (it's RNG basically, but the topic to talk about is also generated by the llm)
>>
>>108276029
While a chatbot is great what is really needed is a 4chan simulator. That way when I am old and the powers that be have destroyed the internet i can fire up all my old models and pretend to talk with my friends on 4chan again.

I bet you could even get it to scrape a site like twitter or something to inject screenshots to spur conversation.
>>
>>108276029
i dont believe you
>>
why sillytavern not in OP?
>>
>>108276043
pretty sure in the next 10 years you will become cattle, so i wouldnt worry too much
>>
>>108276049
why would?
>>
>>108275590
>Qwen 3.5 35B A3B:
>>108275590
>all models q4
Oof.
>>
>>108275204
https://www.reddit.com/r/LocalLLaMA/comments/1rhx5pc/reverse_engineered_apple_neural_engineane_to/
>>
>>108276050
onions green was 4cucks
>>
>>108276043
The 4chan comment "style" can be replicated but the question is why would you want that? I come here to talk to real anons.

>>108276046
What don't you believe?
>>
Guess I should try Kimi Linear and GLM Flash after all.
>>
>>108276092
>I wrote the system myself
>>
>>108276133
I work at Google, I can code faster than any dumb llm you are using, and yes that includes Gemini.
>>
>>108276133
That's the most unbelievable part?
Yeah i wrote it myself over a few weeks, no LLM was ever used, mainly because back then i didn't trust LLMs to do a good job.
I trust them more now, but still not enough to write register level code for MCUs
>>
got a 3090 yesterday and downloaded my first model.
I tried vibecoding a llama.cpp cli wrapper fish shell thing. vibecoding feels like an rpg game.



https://pastebin.com/Hi2wULq4
>>
What do you guys feel about vibecoding versus making a "requirement document" first and then giving it to the LLM? Ive had more success with the latter but only with online LLM services
>>
>>108276141
marvel cinematic universe?
>>
>>108276143
holy fuck shit is depressing

>be me
>fish shell simp
>deployed Grand Master AI Env script (Llama.cpp + Qwen)
>local inference, no API tax, no telemetry
>features are actually useful for once:
>> `qm` / `qmv` : switch LLM or vision projector models instantly
>> VRAM auto-manage : reduces GPU layers if `nvidia-smi` shows low memory
>> `qwen --file` : upload context from local text/code files
>> `qwen --clip` : inject clipboard content into prompt
>> `qwen --proj` : index entire local project directory (24k context)
>> URL fetching : auto-scrapes http/https links via lynx or curl
>> `qsearch` : grep all chat history logs
>> `qview` / `qexport` : render logs to PDF with syntax highlighting
>> `qjournal` / `qpacman` : analyze `journalctl` or Arch update logs via AI

>example workflow:
>> `qwen https://news.ycombinator.com` "summarize top 3 stories"
>> `qwen --file main.rs "fix memory leak"`
>> `qpacman` "what broke in this update?"
>> `qsearch "ssh key"` "find where I saved that password"
>> `qexport 2024-05-20 meeting_notes.pdf`

>mfw I can chat to my OS without sending data to Big Tech
>file saved to $HOME/.local/state/qwen
>git gud

[ Prompt: 1053,2 t/s | Generation: 30,4 t/s ]
>>
>>108276155
breh did you wake up yesterday from a coma? planning has been the default for a year already
>>
>>108276143
How do you speak to your model? I am perhaps being foolish but I still include words like please and thank you and when it gives a good looking result I always say as much.
I figure it was trained on human speech so it would be best to talk to it as if it were a human.
>>
>>108276172
if you add fluff, it adds fluff
>>
>>108276157
Microcontrollers anon
LLMs can do a passable job if I'm making them write HAL code but if it's pure register level writes like
*((volatile uint32_t*)0x40001234) |= bitmask << shift;
They just fail. LLMs can't read the datasheet and reason. They just don't have enough training data, and even when they do they have to deal with MCUs from the same family but with different features (one MCU having a high resolution hardware timer on one address, while the other having something else like the DMA engine or whatever)
>>
>>108276177
you speak funny as hell man
>>
>>108276172
it wastes tokens
>>
What the hell is up with qwen 3.5? Yesterday it was refusing pretty much everything and today it doesn't even think about safety. No wonder some people praise it and some say it's a disaster, because it's both randomly.
>>
>>108276213
depends on which experts woke up today
>>
>>108276223
hi
>>
>>108276226
I'm sorry I can't help with that.
>>
I got myself a 256gb ram kit and a 5090. I hope running qwen-397b won't be too slow.
>>
>>108276240
are you for real? you spent all that to run the model on ram? are you dumb?
>>
>>108276240
I'm envious. While I do have 512gb, it's ddr3. And my gpus are 16gb rx 580s.
>>
>>108276240
Do try other moes like step and GLM too.
>>
>>108276248
>>108276251
>>108276252
Silence peasants, let me do things at my own pace.
>>
>>108276143
https://vocaroo.com/1jQ2ZwLUg2fX

i also made a qwen3.5 tts audiobook generator/voiceclone cli, it also reads txts and printed text directly in the terminal. will try to integrate directly with my cli wrapper for llama. IT JUST WORKS
>>
>>108276255
we were only trying to help milord
>>
>be pewdiepie
>infinite money
>buy 3090s
>make slop model
>claim it can beat gpt
proof money and popularity =/= brains
>>
>>108276240
That's pretty munted if you don't have a threadripper for the memory bandwidth bro
>>
>>108276276
are you australian?
>>
>>108276258
nice
i had qwen 3.5 30b generate me a script to feed a .txt file to qwen3 tts and save the output.wav and like you i had a similar experience of it just working.
are you using voice design? i found with that you could just change the voice with a change of the prompt and it worked well enough

good luck anon
>>
>>108276271
what would you do instead?
>>
>>108276258
any tips to convert an ebook into something my tts won't choke on?
i used calibre to epub->txt but it's got all the shitty formatting
i spent all this time training tts models but now i actually want to listen to an epub
>>
>>108276271
>proof money and popularity =/= brains
I think this was well known for ages, pewdiepie is a content creator for children, what did you expect?
>>
>>108276304
nta but i would take you on an expensive date
>>
>>108276271
as opposed to
>be [random ai company]
>infinite money from retarded investors funding anything with "AI" on it
>hire a datacenter
>make slop model trained on some benchmarks
>claim it can beat gpt
>get even more infinite money
>>
>>108276317
post 37 examples
>>
>>108276321
pick as many as you wish
https://huggingface.co/
>>
>>108276326
huggingface is banned in my country
>>
>>108276299
it starts to hallucinate after a minute or so for me, so I had the script clean the text to utf-8, cut each book into 1h episodes, and split each episode into blocks of roughly 2 minutes' worth of lines to avoid model psychosis.
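The splitting itself is nothing fancy, roughly this (a sketch; the chars-per-block figure is just my eyeballed ~2 minutes of speech):

def chunk_text(text, chars_per_block=1800):
    # split on sentence ends so the TTS never gets cut off mid-sentence
    blocks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        if current and len(current) + len(sentence) > chars_per_block:
            blocks.append(current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        blocks.append(current.strip())
    return blocks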
>>
>>108276335
yeah yeah whatever you say
>>
>>108276331
based
>>
File: qwen.png (102 KB, 815x694)
small qwens:
https://huggingface.co/Qwen/Qwen3.5-9B
https://huggingface.co/Qwen/Qwen3.5-4B
https://huggingface.co/Qwen/Qwen3.5-2B
https://huggingface.co/Qwen/Qwen3.5-0.8B
>>
>>108276355
Useless.
>>
>>108276355
>9B - 10B
>4B - 5B
>0.8B - 0.9B
lol
>>
>>108276358
racism alert
>>
File: file.png (37 KB, 326x524)
>>108276355
qwen bros we did it
>>
>>108276371
is a soto the bf of a sota?
>>
>>108276355
I wonder if we can use speculative decoding with these models?
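Should at least work with llama.cpp's draft-model flags, something like (untested, filenames made up; the draft and target need the same tokenizer family):
llama-server -m Qwen3.5-397B-A17B-Q4_K_M.gguf -md Qwen3.5-0.8B-Q8_0.gguf --draft-max 16 --draft-min 1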
>>
>>108276355
Usecase?
>>
>>108276376
why? qwen3.5 already has it built into vllm
>>108276378
text encoding for image model and research for labs is a big one for small qwens
>>
>>108276355
can 9b milk my penis like nemo?
>>
>>108276371
He meant soot = coal.
>>
>>108276390
and i think he mean 'tism
>>
>>108276305
>>108276335
# Define the directory containing the files
set TARGET_DIR "path/to/your/folder"

for file in $TARGET_DIR/*.txt
    # in-place repair: read the file, run ftfy.fix_text over it, write it back
    python3 -c "import ftfy; import sys; p=sys.argv[1]; data=open(p, 'r', encoding='utf-8').read(); open(p, 'w', encoding='utf-8').write(ftfy.fix_text(data))" "$file"
    echo "Cleaned and processed: $file"
end

> ftfy
(fixes text for you) is a Python library and command-line tool that repairs broken Unicode text, specifically targeting "mojibake" (encoding mix-ups), HTML entities, and improper UTF-8 decoding. It automatically converts scrambled characters like Ã© back into their correct form (é) while avoiding false positives.
>>
>>108276378
>Usecase?
i found the qwen3-4b base to be the best for distilling single tasks last time
ie, better than gemma3-4b base
and vibevoice is built on the smaller qwen2.5 models
>>
>>108276420
thanks anon, that fixed the curly quotes, etc
guess now I can sed out the '* * *' etc
>>
>>108276455
hey no problem, always willing to help retards
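for the '* * *' separators, something like this should do it (untested, assumes they sit on their own lines):
sed -i '/^\* \* \*$/d' book.txt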
>>
>>108276464 >>108276455

lmao not the same anon.
>>
>>108276201
>you speak funny as hell man
:(
>>
>>108276355
>they give the privilege of early access to models to unslop so they could do quants beforehand
>no bartowski
gay earth
did unslop send them money
>>
It's so funny watching small models hallucinate and then try to justify the hallucinations.
I wonder if there will ever be a sort of indexed internal representation of things the AI knows that it can use as a reference so that it can say "Actually, no, I don't know that."
>>
>>108276479
already a thing, and not used because it lowers scores on benchmarks; the models become shyer
>>
>>108276487
Really? Is there a paper about that somewhere? What's the name?
>>
>>108276378
I have found a 3B model is sufficient for making a text summary, and maybe less would do, but 3B is the smallest i have tested so far.
IBM already has such tiny models running in a browser thanks to webgpu and there is a future for such small models.
Just not a future if your only interest is ERP
https://huggingface.co/spaces/ibm-granite/Granite-4.0-Nano-WebGPU



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.