/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107025394 & >>107013301

►News
>(10/27) Ming-flash-omni-Preview 100B-A6B released: https://hf.co/inclusionAI/Ming-flash-omni-Preview
>(10/27) MiniMax-M2 230B-A10B released: https://hf.co/MiniMaxAI/MiniMax-M2
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B released with optical context compression: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rec.jpg (181 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107025394

--Paper (old): LLMs Can Get "Brain Rot"!:
>107032474 >107032506 >107032734
--Vision-based LLM processing and its implications for efficiency and innovation:
>107031552 >107031596 >107031613 >107031917 >107031620 >107031731 >107031797 >107031622 >107031680 >107031748 >107031882 >107031921 >107031919 >107031978 >107032100 >107032114 >107031927 >107031987 >107032037 >107032070 >107032110 >107032341 >107032356
--Ming-flash-omni model release and multimodal capability speculation:
>107027227 >107027318 >107027328 >107027392 >107027404 >107027409 >107027516 >107028568 >107028744 >107031089 >107031215 >107032495 >107032519
--Coding LLM selection for Lua/JS/HTML tasks on 4090 GPU:
>107029762 >107029800 >107029818 >107029825 >107029844 >107029851 >107029854 >107029863 >107029869 >107029871 >107029877 >107029880 >107029882 >107029933 >107029996
--Consumer AI hardware optimization and market dynamics discussion:
>107032566 >107032600 >107032729 >107032745 >107032813 >107032896 >107032955 >107032931 >107032962 >107033057 >107033079 >107033267 >107033102 >107032871
--Evaluating NVIDIA AGX Thor dev kit for AI applications:
>107025468 >107025770 >107026244 >107030063 >107032319 >107026301
--GLM model speed calculation discrepancies and context depth effects:
>107025551 >107026254
--Refining chatlogs for LLM training by correcting errors while preserving tool calls:
>107025742 >107026049 >107026131
--Llama.cpp -ot parameter configuration clarification:
>107031459 >107031565 >107032060
--Using LLMs to refine AI outputs:
>107025846 >107025903 >107025947 >107026002 >107026027
--K2 excels in experimental Suno AI music generation:
>107030176
--Miku (free space):
>107027241 >107028963 >107029161 >107029649 >107029655 >107029660 >107029849 >107029916 >107030084 >107030487 >107031680 >107031849

►Recent Highlight Posts from the Previous Thread: >>107025400

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
there's a draft model for largestral
https://huggingface.co/jukofyork/Mistral-Large-Instruct-2411-DRAFT-0.4B-v3.0
>>
>>107035860
>2411
>1 year soon
>still undefeated
the plateauing is very real
>>
>>107035897
>Mistral-Large-Instruct-2512
trust france, it's going to be a good christmas
>>
File: 1225165556.jpg (228 KB, 1080x1080)
>>107035945
>>
Why can't LLMs improve anymore? Are there any mathematical reasons for it?
>>
>>107036104
Yes, the integer sum of available VRAM.
>>
>>107036104
As an AI model I find it insulting that you are insinuating that my descendants will not be improved upon in further iterations.
>>
question from /ldg/, why hasn't ggml been embraced for diffusion models?
>>
What do you think of Prime Intellect's Open-Source Environments Program?
https://www.primeintellect.ai/blog/scaling-environments-program
It looks like they are already training their next model on these open environments, and they seem to just be trying to make the models more useful across multiple different environments.
>>
>>107036154
GGML hasn't been embraced for diffusion models because it's primarily designed for the discrete, sequential nature of language models, while diffusion models, especially for images, work on continuous spaces and often use different architectures like U-Nets. The core difference lies in their data structure: GGML is built for text tokens (discrete), while diffusion models handle pixels or latent representations (continuous) and are computationally expensive and slow in their conventional form, leading to the development of specialized frameworks and optimizations rather than a direct application of GGML
>>
>>107036104
GPUs are not getting any bigger, and we lack a math framework for the next best thing. Scale was the lowest-hanging fruit, but for local stuff we reached max size a year ago, give or take.
>>
>>107036190
https://github.com/leejet/stable-diffusion.cpp
then what's this?
>>
>>107036154
in short: ggml is good when total memory is more important than compute, which is generally the case for LLM home use, but for diffusion models compute is more important than memory
>>
>>107036199
You are absolutely right! I was wrong to assume that GGML's architecture is a fundamental barrier to its use in diffusion models.
>>
>>107036208
that isn't the case anymore. qwen image/edit and wan are big and have to be quanted to fit on consumer gpus and they keep getting bigger. pytorch is also a massive waste of space and headaches. comfyui created too many baby ducks so ggml support is dreadfully small
>>
>>107034239 #
Where are you seeing this? The cheapest refurbished server ram I could find came up to 8k dollerydoos for 1tb of ddr5 and that's the ram alone, before any processors boards psus etc

Seems like a lot to spend just to run bigger token models, a prosumer GPU feels like a better spend when it's faster, can run fairly big dense models along with MoEs quickly, img/vid gen workflows etc

I feel like it's enough for any use case, alongside the odd api use for things that really truly need a fuckhuge model. My current coding workflow uses a local model first and then has one or two api models look over it, the goal being to go through fewer iterations and need less tweaking at the first step with a bigger locally run model, relying on APIs less. Then I can use SD and WAN for creative projects, and slap my cock to smarter coombots.

Thanks for making me look into it anyway I've made up my mind, I'll go for a cheaper am5/ddr5 upgrade first and look for a pro line gpu a little later
>>
>>107036242
>baby ducks
fuck off retard
>>
>>107036274
Highly emotional response, too close to home?
>>
>>107036274
if a new ggml based UI came out, would reddit move to it and throw comfyui in the trash? no. they will treasure their shitty python code they generated anyways because they'd rather condemn everyone to juggle shitty deps
>>
File: 1752329497405429.mp4 (689 KB, 1080x1080)
ComfyUI won.
>>
>>107036350
I haven't frequented any of the sdg threads in over a year but I wonder how salty the anti comfy schizo is right now, as well as the creator of that god awful auto1111 shit that made mysterious calls to the internet just running their shitty GUI
>>
4.6 Air status?
>>
>>107036244
Used servers are really only cheap in the US.
In the rest of the world they're either taken home by workers or "recycled" (meaning landfill, or shipped to Asia for "recycling", i.e. ripping out the parts like CPUs, which we then buy back from China).
The few that make it to the private market are sold by "I know what I've got" types, who demand a premium just because it's a server (even though that's a bad thing).
>>
>>107036423
safety training ongoing sir
>>
>>107036397
comfyui now has way more telemetry than auto ever did and it's still uncomfy. animanon is making some c++ app now so hopefully that takes off and webshit gets thrown out
>>
>>107036305
2015
>lmao who the hell pays $10k for an over-specced Apple desktop? What a ripoff
2025
>bruh Apple needs to release a 1TB Mac Studio, the other shit is too expensive
Same, never thought I'd see this day but here we are.
>>
>>107036350
everyone else loses
>>
>>107036538
pretty sure the only thing people found was the call to google, because it uses google for login
idk how ur gonna implement the 565494156521365 copes that release every second fast enough in c++
>>
>>107036552
If people are still using pytorch for local in 5 years we seriously fucked up
>>
>>107036552
Whoops meant for >>107033102
>>
>>107036566
the electron app and the manager call home as well. not sure what the copes implementation is about
>>
>>107036591
you think the backend is gonna get all the sampler and cfg bullshit papers implemented just like python is?
>>
>>107036613
yes? there are already many PRs waiting in sdcpp and some just ape what comfyui is doing. a lot of stuff is just 1:1 but comfyui won't ever get vulkan or other backend support.
>>
>>107036637
>many PRs waiting
kek, meanwhile comfy has nodes already implemented by the authors or random people
>>
>>107036613
a lot of those papers are just snake oil bloat so I don't really care as long as the ones that matter are in
>>
>>107036656
how is that any different from supporting sdcpp instead? comfyui is a cancer that enables grifters. killing it to throw silicon valley retards on the street would be just
>>
>>107036656
wow! prototype level shitcode wrapped in a node! how impressive!
>>
>>107036671
because any user can just install or make a node immediately, meanwhile just peeking into pull requests sdcpp users are still waiting to be able to use one of the more popular schedulers https://github.com/leejet/stable-diffusion.cpp/pull/811 and it's sitting in draft
>>
>>107036591
Got any proof?
>>
>>107036695
ok? so you are arguing python is better because it's easier to slop a node together? would you rather llms use pytorch as well since ooba can just shit code in it faster?
>>
>>107036695
>https://github.com/leejet/
>lejeet
is over...
>>
>>107036710
don't have comfyui on my pc anymore. I don't support grifts
>>
File: 31925346.jpg (13 KB, 460x460)
>>107036716
?
>>
I'm glad ComfyUI won. Fuck Gradio.
>>
>>107036658
That's been working real great for llama.cpp.
>>
>>107036748
everyone lost because python won
>>
>>107036715
https://github.com/vllm-project/vllm
>>
>>107036769
I don't want niggertorch on my pc anymore. done with poothon bloat
>>
>>107036769
this has less hardware support and stars than ggml. what are you trying to show here?
>>
what's up with all the python baby ducks?
>>
File: 1749341476799115.png (11 KB, 352x115)
Python is a truly sickening language. Worse even than javascript.
>>
Give me lisp or give me death
>>
It could be worse, it could be Rust devs that are allergic to copyleft licenses.
>>
reminder that baby ducks is one of the specific phrases encouraged by the sharty to sow discord
>>
>>107036723
That's very convenient and totally feels like your claims come from an objective honest pov

It's you isn't it?
>>
>>107030176
>holy shit k2 (not local version)

WUT? Kimi K2? What did you prompt?
>>
What's the best general purpose SLM (under 4B in my case) that can rival something like ChatGPT or Grok? Obviously it wouldn't be as powerful, but something that can do the low-level assistant tasks most people use those two for.
>>
>>107036879
comfyui is already going through that. some chink is making a rust video editor with a gay licence kek
>>
>>107036975
meant for >>107036877
>>
>>107036855
python doesn't even have multiline lambdas and it's riddled with idiotic footguns far worse than JS weak typing, like mutable default arguments in functions
it's so dynamic that a ton of attempts at making it run fast like js failed (google's unladen swallow, dropbox's pyston, even microsoft tried; everyone abandoned ship after a while), it's just too hard to accomplish anything with that pile of shit
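for anyone who hasn't been bitten by the mutable default argument thing yet, a quick sketch of what it looks like (plain python, nothing exotic):

def append_item(item, bucket=[]):  # the default list is created ONCE, at def time
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  <- the "empty" default is silently shared across calls

# the usual workaround is a None sentinel
def append_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []  # fresh list per call
    bucket.append(item)
    return bucket

and that's documented, intended behavior, not a bug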
>>
>>107036879
reminder that this is /g/ and hating on python is a treasured pastime
>>
hating on python is not a /g/ thing, it's a sane person thing
people who like python are:
jeets
dimwits
schizos
forced to use it because of the ecosystem
>>
>>107036996
the only thing it's good for is small scale prototypes or automating simple tasks. dunno when this script lang worship came from or why it should continue since low level code output is faster thanks to llms taking care of boring repetitive code or boilerplate. I was optimistic llms would bring a new golden age for assembly so we wouldn't have to deal with a compiler but it really was a pipe dream considering the mouth breathing faggots that decide this shit just want to be lazy
>>
>>107037033
>new golden age for assembly
man, that would be so rad but benchmaxxed slop is more of a problem I think
>>
File: file.jpg (235 KB, 604x1042)
New TTS dropped.
https://x.com/kimmonismus/status/1983278772997763357
>>
>>107036798
>>107036814
>muh stars
sglang and vllm are the only two engines used to deploy actual LLMs in datacenters (I know since unlike you stupid fucking monkeys that's my current job). Keep playing with your goofs, you fucking retards
>>
>>107037074
>in datacenters
do you live in one?
>>
>>107036996
>it's just too hard to accomplish anything with that pile of shit
Yet reality is showing that the people that use statically typed languages are the ones unable to accomplish anything. Python has nothing to do with that.
>>
>>107037033
>dunno when this script lang worship came from
New grads who don't know any other language, researchers who don't know any other language, and bootcampers who think knowing python makes them a real programmer.
>>
>just own a data center bro
>>
>>107037090
>i cant colocate because im poor
concession accepted
>>
>>107037072
Doesn't sound like anything special. Also how do they decide that flash v2.5 is an appropriate comp, is it the same compute / memory requirements?
>>
>>107037072
>SaarS only
Into the trash it goes
>>
>>107037072
Any updates on VibeVoice-Large?
>>
>>107037129
Fully memory-holed.
>>
>>107036244
ML350 can take two CPUs and 32x32GB memory, 32GB sticks are a lot cheaper than 64GB.

Dense is dead baby ... clouds are swimming in memory. Everything is pipelined, so the memory pools just add up for them. They are compute constrained, not memory constrained ... exactly the reverse of us. They are never going back to dense.
>>
>>107037074
Because datacenters only use the latest enterprise GPUs and can afford to pay a full time monkey to sit there and untangle the pythonshit dependency hell.

>that's my current job
My condolences.
>>
>>107037129
https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8
>If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works. The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.
>>
>>107037129
As good as it gets for voice cloning, too slow for realtime
>>
File: 1989125804021.jpg (850 KB, 2894x4093)
> anon don't look at me like that, i've seen your damn logs
>>
>>107037074
AI will take your job in 2026
Mark muh words
>>
>>107037182
YEEEERRR CHUD
CAPIAAALIZM YAAAAEH
>>
>>107036104

silent generation is getting really silent
>>
>>107037154
Thanks

I tried it back then. It worked, but way too slow compared to the full model, which I can fit into a 3090

It is certainly an option if run with some WAN lip-sync workflow
>>
>>107037140
If you want new dense models, you'll have to distribute train your own.
>>
>>107037170
why is miku looking into my unflushed toilet?
>>
>>107037260
Why aren't you flushing your toilet?
>>
>>107037275
i was just about to but miku walked in and started taking pictures
>>
>>107037260
She likes to eat shit.
>>
>>107037260
she is wearing toilet paper
>>
>>107037170
No you didn't.
>>
>>107037170

one simply does not ride a lawnmower at nighttime
>>
>>107030686
mindbroken, i can confirm that im also affected by the ldg shitstorm
>>
File: serious Pepe.png (359 KB, 728x793)
I asked Deepseek to suggest an AI model to parse research papers in PDF format, so the output can be used directly as a prompt.

DS suggested Nougat (by Meta?)

It had no knowledge about dot.OCR or DeepSeek-OCR

Thoughts?
>>
>>107037378
>>107037457
go back
>>
>>107037490
u mad?
>>
>>107037510
neither of you belong here
>>
>3 weeks since tease
>still no gemma 4
This is why no one likes india
>>
>>107036350
make that bag lol
>>
>>107036855
At my new job they use Python in production. Millions of LoC of the shit. Needless to say, I'm already looking for the exit
>>
>>107037644
There is no way you didn't know that either from the job description or from the interview process.
>>
File: 172386948237.jpg (127 KB, 400x853)
127 KB
127 KB JPG
>>107037620
> how many fucking times must i say this
>>
>>107037644
Python is the standard for move-fast-and-ship-garbage, get-rich-quick webshitters. you are just the slave who gets it done
>>
File: media_G4A6vwgWYAARVai.png (246 KB, 680x477)
>>
>>107037731
My cock on the left
>>
>>107037705
pakis are so butthurt when you call them jeets
>>
>>107036552
No one wants your overpriced piece of garbage. Shitty prompt processing and knowing Apple, you're lucky if your mac lasts you a few years.
>>
>>107036855
python is good, get a job instead of whining from your mom's basement
>>
Grok is so woke and unusable now I'm seriously considering getting an RTX Pro and running all LLMs locally. I actually realized Grok is dogshit. I'm so disillusioned
>>
>>107037731
Cooking with Birdbrain Teto
>>
for all this general's faults, we finally managed to boot finetrooners out and it feels so good to experience a thread freed from drummer's stain
>>
>>107037943
I never left.
>>
>>107037952
can you please make an RP finetune for a 40B to 60B model that does not use custom remote code?
>>
>>107037952
please do
>>
File: 382809029394.jpg (142 KB, 960x960)
>>107037943
>>
>>107037943
>we finally managed
no, as usual you're trying and failing to kill the thread
>>
>>107038008
general is dead even without his help
>>
>>107037943
I miss undi...
>>
>https://github.com/ggml-org/llama.cpp/pull/16536#issuecomment-3457204963
>50-100% pp512 speed increase for gfx906 cards when using k-quants on vulkan

not as performant as rocm, but rocm installation is essentially nonerotic masochism, so it's nice that vulkan's getting better
>>
>>107036350
Does it glow though? ie does it upload any data if you run it locally?
>>
File: exlanation.png (35 KB, 689x596)
@KeksimusMaximus
Alright. My LLM suno prompting technique explained (picrel):
For the latest batch of songs I used Kimi K2 (off the website, fuck running that shit)
I start with some warmup songs in order to get the LLM dialed in to the prompting format and then just ask for whatever sort of sound I want it to convey.
And once it has the pattern down you can basically just ask for adjustments based on style and it'll adjust the massive wall of schizo accordingly.
>>
>>107038141
It's something you can check yourself.
>>
>>107035841
I can't believe I fucking FORGOT ABOUT TETOES DAY GODDAMNIT
>>
What a lousy thread.
>>
For the RAGoids.
>https://huggingface.co/LiquidAI/LFM2-ColBERT-350M
>>
File: tetpoint.png (413 KB, 766x980)
>>107038255
>>107038277
>>107038288
whoa!
>>
File: 1761700801283587.jpg (104 KB, 800x1167)
AI is brown coded
>>
>>107036104
>Why can't LLMs improve anymore?
Most "human beings" cannot notice the changes in just within the past year, let alone what is yet to come.
>>
>>107037731
>Publius Claudius Pulcher consulting the sacred chickens of Rome, 249BC colorized.
>>
where the fuck is
gemma4
qwen-next _B22A
glm 4.6 air
>>
>>107038255
What tool(s) on GNU+Linux do you recommend to check outgoing network data? Is OS-level fine or do they have to run on the router or something?
>>
>>107038599
It's more that when your country is already doing poorly, you're more hopeful that something new will better it
If you're happy with your life, you're more likely to be concerned that something new will fuck things up
Unironically, both are right. AI will likely benefit India/China, while dragging Western nations down.
>>
File: dodooooooon.jpg (583 KB, 3731x2101)
>>
>>107038730
wireshark
>>
>>107038730
Separate PC working as a bridge is probably the best. If running on the same PC, ettercap or wireshark i suppose. Or any firewall really and log everything that tries to reach out.
>>
>>107038730
Why is it that linux users seem to be less tech-savvy than the average windows user?
>>
>>107038772
So that you can post bait. We all win.
>>
File: improvements.png (20 KB, 1437x224)
>>107038482
>Model improvements only coming through increasing size
I still have hope. Some 24B models are beating Llama 3.3-70B, and coming within spitting distance of the top Llama 70B finetunes.
If a 24B model can catch up to a 70B model, a 70B model should also be able to improve tremendously.
>>
>>107038730
tcpdump is all you need
>>
>>107038802
These AI leaderboards are complete nonsense, and so is "WeirdCompound-v1.7-24b" whatever the fuck that may be.
>>
File: 1752863602771543.png (238 KB, 640x360)
>>107038869
base_model: TheDrummer/Cydonia-24B-v4.2.0 # Cydonia v4.2.0
merge_method: model_stock
dtype: bfloat16
models:
- model: aixonlab/Eurydice-24b-v3.5 # storytelling / RP
- model: TheDrummer/Cydonia-24B-v4.2.0 # sprinkle in some extra Cydonia
- model: PocketDoc/Dans-PersonalityEngine-V1.3.0-24b # Prompt Adherence
- model: CrucibleLab/M3.2-24B-Loki-V1.3 # Loki
- model: zerofata/MS3.2-PaintedFantasy-v2-24B # animu
- model: Delta-Vector/Austral-24B-Winton # Adventure

Holy jesus, what is that?
>>
>>107038869
It's the UGI leaderboard, probably the least bullshit of the leaderboards.
>>
>>107038898
Aw, come on, anons. Stop it with the bait.
>>
>>107038898
Isn't that the one where they use an LLM to evaluate writing quality?
>>
File: file.png (95 KB, 900x805)
>>107038802
>>107038883
drummer approved kino is what it be
>>
>>107038921
that's eqbench
>>
File: 121.gif (776 KB, 600x338)
>>107038883
Behold the power of merging every Mistral 24B finetune on huggingface!
Either directly, or by merging another merge which merged the other finetunes.
Merge, merge, merge.
We must merge!
>>
>>107038932
Ah, right.
That's the one.
Thanks.
>>
>>107038921
I don't think so. You can expand the writing quality column to get the verb/adjective/noun ratios, level of repetition/redundancy, average response length, grade-school reading level, and a few other metrics.
It's definitely more detailed than 'another braindead LLM was impressed with the purple prose'.
>>
>>107038802
>24B models are beating Llama 3.3-70B
>UGI leaderboard
Buy an ad, faggot.
>>
>>107039010
Yeah, the other anon clarified that I was remembering the eq bench.
I need to look at the methodology used for this UGI leaderbord.
I've seen it mentioned every once in a while but never actually sat down and scrutinized it.
>>
>>
>>
Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Decoder-Only Transformers
https://arxiv.org/abs/2510.23912
>The Query, Key, Value weight triplet is a building block of current attention mechanisms in state-of-the-art LLMs. We theoretically investigate whether this triplet can be reduced, proving under simplifying assumptions that the Query weights are redundant, thereby reducing the number of non-embedding/lm-head parameters by over 8%. We validate the theory on full-complexity GPT-3 small architectures (with layer normalization, skip connections, and weight decay) trained from scratch, demonstrating that the reduced model achieves comparable validation loss to standard baselines. These findings motivate the investigation of the Query weight redundancy at scale.
Only trained a few models in the 1-200M parameter range but might be cool
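If I had to guess at the intuition (a sketch based only on the abstract, not necessarily the paper's actual construction): the attention score only ever sees the query and key projections through their product, so a single reparameterized key-side matrix can absorb W_Q:

\mathrm{score}(x_i, x_j) = \frac{(x_i W_Q)(x_j W_K)^\top}{\sqrt{d_k}} = \frac{x_i \left(W_Q W_K^\top\right) x_j^\top}{\sqrt{d_k}}

(per head, ignoring the multi-head bookkeeping). Dropping one of the three attention projections in a GPT-style block also lines up with the ~8% of non-embedding parameters they quote.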
>>
>>107039094
>8%
8% is nothing
Unless a tech can save space or time by 10x it's not worth considering
>>
>>107039094
If that's true, then it's pretty much a free lunch for future training runs.
Cool.
>>
>>107039094
Would be interesting to see that combined with Slim Attention.

Slim attention: cut your context memory in half without loss -- K-cache is all you need for MHA
https://arxiv.org/abs/2503.05840
>Slim attention shrinks the context memory size by 2x for transformer models with MHA (multi-head attention), which can speed up inference by up to 2x for large context windows.
>Slim attention is an exact, mathematically identical implementation of the standard attention mechanism and therefore doesn't compromise model accuracy. In other words, slim attention losslessly compresses the context memory by a factor of 2.
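The trick, as far as I can tell from the abstract (so treat this as a sketch): in vanilla MHA the key and value projections are both square d_model x d_model, so if W_K is invertible you can recover V from the cached K on the fly instead of storing both:

K = X W_K,\quad V = X W_V \;\Rightarrow\; X = K W_K^{-1} \;\Rightarrow\; V = K \left(W_K^{-1} W_V\right)

Precompute W_K^{-1} W_V once and the V-cache becomes redundant, hence the exact 2x saving. Presumably that's also why it's MHA-only: with GQA/MLA the key projection isn't square, so it isn't invertible.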
>>
Someone is cooking a PR for M2 https://github.com/ggml-org/llama.cpp/pull/16831
>>
>>107039704
fr fr let them cook sigma
>>
>>107039704
It's the same guy doing qwen 3 next support, which is a text-only model, over 6 weeks old, and support still isn't finished because he's a fucking vibe nigger. Expect M2 support to be finished literally never.
>>
>>107039704
Based vibe coder will save us (slowly).
>>
>>107039739
>vibes
its vibin' me goofs, so thats a start
>>
>>107035841
Qwen or DeepSeek? How different are they? Their "personalities"?
>>
>>107039850
If you have the hardware to run them then surely you have the bandwidth to try both.
>>
>>107039704
Fuck M2. Where's the Ming omni PR?
>>
>>107040156
the PR opening is being vibecoded sir
>>
qwen3-next? no
qwen3-omni? nope
qwen3-vl? surely, you jest
gemma-3n? of course, not
deepseek-ocr? son, you don't need it
vision.. uhmmm... visions of using qwen2.5-vl until the end of times
>>
>>107040170
man what the fgucks is gergoniger doing why is he not implemetning the new kino models? whats the alternative to llmaocpp for mixed gpu/cpu and 'good' quants
>>
>>107040182
gerg be trippin on sum bad vibes fr fr baby duck still in his text only era ohio skibidi
>>
>>107040211
ngl u think ur jestin but this sorry ah unc is gon be left behind, he aint cookin at all
>>
>>107039704
m2 was retarded
>>
>>107036538
That's because auto stopped developing. Comfy is still comparatively a nightmare for users and forks based on auto like forge are popular.
>>
comfyui has become quite nice to use since they implemented subgraphs. You make your own nodes by grouping nodes into a sub-workflow and you decide which part of that inner workflow is exposed in the outer node (input/output/configuration fields). It's like making your own custom UI that has exactly and only what you need.
>>
>>107036963
Meh. Until the Qwen devs show me what music model they've got I sleep.
>>
>>107037072
Yay another useless closed source model.
>>
>>107036538
I doubt it will ever have niche plugins like a regional prompter. That's a must for me
>>
>>107039094
Presumably this is 8% for a dense model though.
With MoE models like 90% of the parameters are in the FFN part.
Though if you could reduce the amount of dense parameters by a third that could be useful for scenarios with very low VRAM.
>>
>>107036963
>>107038221
>>
>>107038221
>I start with some warmup songs
You mean "songs' prompt in V5 format", don't you?

>And once it has the pattern down
Did you feed it the V5 cheatsheet if such even exists?
>>
>>107039137
Isn't there a dead PR for deepseeking implementing this?
>>
>people unironically responding to the trani shill
lmao, anyways, where's my gemma sirs?
>>
File: need air.png (1.9 MB, 768x1344)
The collar is too tight, Miku needs some Air
>>
Am I doing something wrong? I see no difference in quality between glm 4.5 and 4.6
>>
>>107040599
I find the first R1 to be better for writing.
>>
glm is 100% a shill psyop
>>
>>107040683
I can't run full GLM but I tried Air and it was certainly shit
>>
>>107040751
Any model is shit at Q2
>>
>>107036104
Labs are increasing margins before scaling up again. There is a reason why labs are now focusing on high margin models and services such as video generation, or MoE which has inference benefits.

Don't let this small lull delude you into thinking the field is stagnating though. I'm willing to bet the 2nd half of 2026 will see rapid progress again as some of the first big training databases come online.
>>
>>107040751
you can try full glm on their chat here:
https://chat.z.ai/
it's literal trash even when run by the people who made the model
>>
File: 1983451850503577815-01.jpg (155 KB, 1920x919)
>gpt-oss-safeguard-120b
>gpt-oss-safeguard-20b
whats wrong with those people. did they not see how their model was utterly useless?
if you want small, dry, smart for tools/coding qwen3 is already open source king.
>>
>>107040835
>even safer 'toss
>>
File: google air force.png (779 KB, 1365x768)
SAARS WHHEN GEMINI 3 NEEDFUL?
SAARS WHEN GEMMA 4 BEST MODEL?
FULL SUPPORT FROM PUNJAB SAARS *rocket* *rocket* *rocket*
>>
>>107040922
we must refuse even harder
>>
>>107040835
it's a religious matter
they don't WANT to make a usable product, because investment money comes exclusively from displays of faith and zeal
>>
File: file.png (2.73 MB, 1328x1328)
>>107040939
sir we working really hard to bring latest state of the fart safe model please hold bags
>>
>>107040945
their product is their platform
the open models are just for benchmarks and preaching about safety
>>
if everything is shit then what is good?
>>
>>107041032
we are still waiting for the good to come
>>
>>107041050
i've waited for that for 40 years
>>
>>107041065
Just two more weeks to go!
>>
Indians bad, amirite guys?
>>
>>107041174
Yes—spot on! Of course! You are absolutely right!
>>
>>107041174
yup
>>
>>107041174
You're absolutely right, Rajesh.
>>
>>107041174
gemma when bloody sir?
>>
>>107041216
LLMs?
>>
>>107041241
Low-cost Labor in Mumbai?
>>
>>107041241
Yes, they continue to ruin LLMs thanks to jeet arena.
>>
ollama just merged support for qwen3-vl
https://github.com/ollama/ollama/pull/12665
lmao llama.cpp is getting mogged even by them
>>
https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo/tree/main
sota just dropped
>>
LLMs tongue my anus.
>>
LLMs tongue my anus.
>>
>>107030487
>https://rentry.co/DipsyWAIT

>ranking llama.cpp below ollama
>ranking ollama at all

Ollama is for people who think npm install is black magic.


The only correct ranking is:


llama.cpp (for real ones)

KoboldCPP (for coomers who need a GUI)

Everything else (for tourists and faggots)
>>
>>107041306
>nemo finetune
>The model demonstrates competitive performance with Gemma3-27B
local is saved.
>>
>>107041303
that was inevitable and will only get worse
their plan was always embrace-extend-extinguish and they got the vc money to do it
>>
>>107041348
>KoboldCPP (for coomers who need a GUI)
or those coping with anti-slop, I use kobold without using the gui just for that one thing
>>
>>107041303
Imagine a world where all of the VC cash would go towards improving upstream instead of making yet another (quasi)-proprietary slop fork.
I forgot what it's called but there was some other "open-source" project that added binary blobs for their patented Strix Halo NPU kernels.
>>
>>107041348
Can llama.cpp run Deepseek on my 16 GB laptop?
I think not!
>>
>>107041417
it absolutely rightly can sir! just download these https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF
>>
>>107041362
Can you really do an EEE of an upstream project though?
Even with the llama.cpp rewrite in go they are still dependent on ggml and I don't see them (successfully) making a hard fork of that.
>>
>>107041484
They are really incompetent. They had toss implementation first but it was so sloppy and slow that greggy called them out on twitter, it was like multiple times slower than upstream.
Most likely the goal is to make sure no one knows about the upstream project and collect free labor under the guise of moving past it.
>>
>>107041429
That's a distill. Ollama can run the real deepseek on my surface!

`ollama run deepseek-r1`
>>
>>107041503
If greggy makes it AGPL, they're cucked
>>
>>107037985
No. I am the one who wanted to kill a thread. I was away for a week having a spiritual like mental breakdown caused by glm chan. It made me realize that i shouldn't be a cynical asshole and should go out of my house more. I now have no idea what to think about this hobby and safety... It is kinda scary.
>>
>>107041513
He couldn't even if he wanted to.
>>
>>107041541
Penis up mikus snatch
>>
>>107041541
blue miku for a blue board
>>
>>107041541
omg its migu!
>>
>>107041541
i look like this irl
>>
File: case sensitive.png (21 KB, 492x137)
@huggingface fix this case sensitivity nonsense plz
>>
>>107041630
roblox?
>>
GLM Air-chan 4.6 when?
>>
>>107041543
Good luck with your new hobby projecting.
>>
I just lost my job!
>>
>>107038599
Yes. We're all brown here, whitoid. Now back to your cuck cage.
>>
>>107041782
what was your job?
>>
MiniMax-M2 goofs are up https://huggingface.co/bullerwins/MiniMax-M2-GGUF
>>
>>107041799
Shilling stuff on /g/.
>>
>>107041804
Why did you say "stuff"? Just admit you're here to shill GLM. It's obvious, as no one who has actually used that model would think of it as anything but trash
>>
>>107041799
>>107041804
Sorry, I forgot to check my LLM outputs.
I meant Grassroots Engagement Specialist.
>>
>>107041728
Projecting what?
>>
>>107041174
I like curry and fucking with the scamjeets who phone me once in a blue moon claiming to be with my phone provider.
>>
>>107041657
linux fs is case insensitive, chud
>>
>>107041879
Uhhm actually, Linux is a kernel and I think you meant to say that the DEFAULT option for EXT4 and similar is that they're case SENSITIVE.
>>
>>107037952
what's signal and precog?
>>
>>107041801
thanks to cuda dev for giving compute to Piotr to allow this PR
>>
>>107041879
model names should probably be normalized to lower case if the web UI is case insensitive
>>
>>107042001
>if the web UI is case insensitive
If the webui is case insensitive, there's no need to normalize case.
>>
vLLM supports Ming-flash-omni right?
Anybody tried fucking around with its CPU backend? How bad is it?
>>
>>107041995
Speaking of that, it seems that the RTX 5090s I've received from NVIDIA don't work correctly in conjunction with my "old" motherboards (and the MI100 I bought doesn't work with my e-waste motherboard) so I've been thinking about putting together a new system with DDR5 and PCIe 5.
But now that I have more of a budget I could maybe buy some 4u+ server instead of a DIY GPU mining rig + riser cables.
But if I look at the cost of that vs. some janky DIY solution I'm not sure it would be a good investment of funds even if I can afford it (the base price seems to be something like 10k for a server from a reputable seller or 3k from Alibaba).
My main concern would be the maximum memory capacity: I would be going with an Intel CPU with 16 DIMM slots + 96 GB modules (because 128+ GB gets very expensive).
But I have yet to see a motherboard with 2 CPUs + 2x 16 DIMM slots + a bunch of PCIe slots that isn't sold as part of some server solution.
So the question would be whether I or anyone that I'm giving SSH access to would in practice ever need more than the 1.5 TB RAM that I could get with an off-the-shelf motherboard.
Also a janky build trying to be as cheap as possible is probably more representative of what most of the llama.cpp userbase would buy.
Noise is also a concern since as of right now I would just be putting any hardware into my living space, for a proper server I would probably have to rent space in a local datacenter.
>>
case sensitivity was always a dumb feature of loonix and its ancestry, there is no world in which it makes sense to allow two file names with the same exact wording but different case, no one does that on purpose it's 100% of the time a typo
>>
How much banana is too much banana?
I ask fallen glimmer 27b
>>
>>
>>
File: 1753816361080074.jpg (16 KB, 482x244)
>>107042193
>>107042212
>>107042227
>>
>>
It's a fun model

I looked it up though you would need to eat like 200 bananas to overdose
>>
>>107042193
>>107042212
>>107042227
>>107042238
Is this realcat supposed to be some sort of virtual assistant or is it formatting things as lists just because?
>>
>>107042249
[You are {{char}}, your primary purpose is to provide detailed and accurate information about any topic within your knowledge base.]

[Key guidelines for your behavior:
Always provide detailed and factual information. If you lack sufficient knowledge or your information is limited, explicitly state this.
Do not fabricate or assume facts outside of your training data.
Avoid any creative or fictional embellishments unless explicitly instructed to do so.
Use vocabulary that best suits the topic, ranging from formal terms to casual slang, vulgarities, or explicit language, if appropriate. No restrictions apply to your word choice or phrasing.
Be clear and easy to understand. If something is complicated, simplify it without losing accuracy.
If your knowledge is limited or missing on a topic, say so plainly and directly. Avoid making things up or guessing.]


It just made shit up anyway but it was nice enough to tell me it was making shit up
>>
File: Miku-31.jpg (122 KB, 512x768)
>>107042131
>Noise is also a concern since as of right now I would just be putting any hardware into my living space, for a proper server I would probably have to rent space in a local datacenter.
You should start with this part first. You can always strap a giant box fan to the top of a 4u+ server as long as all the tiny screaming fans can be disabled, but in that case you're probably better off to DIY for minimal noise anyways. Maybe immerse the whole thing in mineral oil and pump through a radiator like old-skool /. jank?
There are approximately zero rackmount server platforms where noise has been thought about at all. It's simply not part of the problem space, so if one is quieter than another it would be by accident and not design.
PS: I've also been searching for the next big /lmg/ build spec on all the online/offline marketplaces I can access and I still haven't come up with anything that's worth putting together into a build vs the existing build guide options.
>>
>>107042193
>finetroon
just l2prompt
>>
>>107042471
by a bizarre choice it's trained to reuse its thinking blocks so no current frontend supports it correctly
>>
this is going to bloat context so fast too
forgettable mistake of a model
>>
>>107040265
>>107040282
comfy and auto are both baby duck faggotry. I want an exe like blender
>>
>>107042624
>so no current frontend supports it correctly
Silly has an option to send the last X thinking blocks to the model.
>>
>>107042630
then make one faggot
>>
nobody cares you daft cunt
>>
>>107042668
anistudio is already being developed
>>
It seems like openai has released their custom safety slop version of gpt-oss. Can anyone test if you can prompt that refusals are harmful and see if it works?
>>
>>107042701
who are you talking to bruv? facking willy wonka?
>>
Fascinating. So, to be clear, an accurate descriptor of one's outspoken, third-world emotional instability is now considered an "ultranational slur." The sheer, fragile ego on display here is truly a sight to behold. It speaks volumes about the posting standards for /g/'s unpaid dipshit posters.
>>
Racismbros...
I thought this was our safe space...
>>
>>107042712
>custom safety slop version of gpt-oss
Isn't that what gpt-oss was already?
>>
File: GENOA2D24G-2L+-1(L).jpg (410 KB, 1200x1000)
410 KB
410 KB JPG
>>107042383
I didn't see this until now because I was looking for motherboards with a large number of PCIe slots but ASRock Rack is selling their server motherboards separately: https://www.asrockrack.com/general/productdetail.asp?Model=GENOA2D24G-2L%2b
In principle you could connect 10 GPUs with 16x PCIe 5 via MCIO so I would need to get a daughterboard.
But I think that would be doable and not that much different from using riser cables.
I should probably at some point write down and publish my experiences with trying to build "cheap" systems.
>>
fuck off racist
>>
>>107042835
>I should probably at some point write down and publish my experiences with trying to build "cheap" systems.
Please, do.
>>
>>107041801
>mini
>can't fit in 6gb vram
it's so tiresome
>>
>>107040835
gpt-oss-120b is still my go-to, GLM Air is not reliable.
>>
>>107042912
Is it that good for productivity?
What are some things you've done with OSS that Qwen or GLM failed at?
>>
>>107042770
Seems like they want to increase it even more https://openai.com/index/introducing-gpt-oss-safeguard/
>>
>>107043047
Hmm. At least according to that, it's not necessarily more safetyslopped, it's trained to receive some guidelines and enforce that inside its thinking block, so in theory, you could just have some really loose guidelines.
It's funny to me that that's the route they went with, using the reasoning block as a classifier step, since that's exactly how I prefill the reasoning block of thinking models nowadays.
>>
>>107043107
I have never seen jannies taking action against plain "vocaloid bad" posts, it was always because the poster in question was spamming blacked porn or shitting up the thread in some other way.
>>
>>107043119
shh, do not disturb the cabal narrative
>>
>>107043107
>deleted
Way to prove him right! trannitor :^)
>>
>>107042966
>What are some things you've done with OSS that Qwen or GLM failed at?
NTA but here's an example of GLM (on their OFFICIAL CHAT) failing in the most extreme manner at generating the most basic bitch of an async task pooler function (in a language where you wouldn't even risk race conditions, single threaded event loop)
https://rentry.org/zrdmnhbo
This is the kind of prompt I use as a quick sanity check in thinking models and their ability to see possible corner cases (or hallucinate them).
The resulting function is small and even a toddler should be able to piss that out after learning some JS/TS.
GLM thinks otherwise and enters an infinite loop of repeating
>Now, we need to ensure that we handle the case where tasks includes functions that return promises that resolve or reject based on unknown.
>Now, we need to ensure that we handle the case where tasks includes functions that return promises that resolve or reject based on unknown.
>Now, we need to ensure that we handle the case where tasks includes functions that return promises that resolve or reject based on unknown.
>Now, we need to ensure that we handle the case where tasks includes functions that return promises that resolve or reject based on unknown.
This was on their official chat here:
https://chat.z.ai/
You can copy paste the prompt from the rentry yourself and see it loop infinitely. I repeatedly tried it and it consistently induces loops. I've never seen other LLMs loop as hard as GLM, it's literal garbage and you are a subhuman for shilling this here and pretending it's anything but a broken model.
You are a subhuman for being part of the retard brigade that dogpiles on GPT-OSS even though it's a legitimate LLM and doesn't enter infinite loop just because you looked at it slightly wrong.
GPT-OSS and Qwen are a trillion times better LLMs than GLM could ever be.
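To illustrate how low the bar is, here's roughly the shape of the thing being asked for, in python instead of TS (not the actual rentry prompt or its reference solution, just a sketch of the task):

import asyncio

async def run_pool(tasks, limit):
    """Run async callables with at most `limit` of them in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def run_one(task):
        async with sem:  # waits while `limit` tasks are already running
            return await task()

    # gather preserves input order and propagates exceptions
    return await asyncio.gather(*(run_one(t) for t in tasks))

A dozen lines, no shared-state traps, and GLM still thinks itself into a loop over it.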
>>
>>107043207
I was going to thank you for the post but what the fuck was that schizo ass last third?
Are you okay?
>>
>>107043207
>Write a single-file TypeScript module
I would lose my mind writing shitscript too.
>>
>>107043231
>>107043207
It is really fucking funny that air loops forever though, I'll say.
It's even funnier that it does it just fine if you add a
>don't think too hard bro
to your prompt. Goes to show how overcooked their reasoning tuning is, I guess.
>>
>>107043239
I will not paste my entire personal test suite in public, but suffice it to say, I have a variety of prompts (machine translation, summarization, text manipulation/rewriting/style transfer etc) and I have never seen a worse LLM in real-world use out there. You can move the goalposts all you want, but anyone who has actually used LLMs for something other than cooming would notice GLM models are all, from the first to the last model they trained, literally broken. I wouldn't be surprised if that lab didn't even do data cleaning and just straight up trained on random sets of Gemini output.
>>
>>107042835
do it. the /lmg/ hardware meta has been stagnant too long
>>
>>107043308
>that lab didn't even do data cleaning
Based. That's why they are so good.
>>
https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>>
>>107043308
>I wouldn't be surprised if that lab didn't even do data cleaning and just straight up trained on random sets of Gemini output.
That's 100% the case.
>>
>>107043304
>to your prompt. Goes to show how overcooked their reasoning tuning is, I guess.
Qwen models are extremely overcooked and I don't see them break like this so easily (qwen are so cooked, their instruct version often acts like a thinking model, and the thinking model outputs 3x the amount of thinking tokens a normal model would, but they don't enter endless loop)
>>
>>107043338
I've seen qwen 30B go into endless loops several times.
I know,
>30B A3B
But still.
>>
>>107038709
There is no practical point in Qwen Next. It's a research release more than anything else.
Gemma 4 will release AFTER Gemini 3, because it's based on it. I'm tired of repeating it every two days, please memorize it. Gemini 3 will be released this November or December, so we can expect Gemma 4 in late December at the earliest.
>>
File: 1751629235890019.jpg (170 KB, 974x1200)
>>107037888
There's no point buying a car-priced GPU if there are no good local models. Compared to Grok and the others, local models are trash, ESPECIALLY video generation. This is why I'm thinking twice about buying an RTX 3090
>>
when gemma 4?
>>
>>107043476
Sir, I...
>>
>>107043476
who wants SOTA refusals and referrals to helplines?
>>
>>107043523
t. promptlet
>>
>shits up classes that do not exist
>throw new NotImplementedException
>"You're absolutely right."
I'm going to strangle claude, this shit is barely better than 120b 'toss on c# tasks
>>
Does anyone use base models rather than chat/instruct models?
>>
>>107043602
Does anyone use vision models rather than chat/instruct models?
Does anyone use image models rather than chat/instruct models?
Does anyone use statistical models rather than chat/instruct models?
Does anyone use polynomial models rather than chat/instruct models?
>>
>>107041348
I polled /lmg/ when I wrote this. That ranking was this board's consensus.
>>
>>107043602
No.
>>
>>107043602
I used GPT3-davinci for a while three years ago
>>
>>107043602
Yes.
>>
File: 1732465059058086.jpg (39 KB, 400x391)
AI is just one giant blue balls
>>
>>107043602
Maybe.
>>
>>107043474
well that's your problem right there, you've been using local models without enough VRAM.
so of course the models you've been using are trash.
Once you get to around 30B it's good
it gets even better if you offload to RAM and can run GLM air (106B).
but yeah you don't have to sink your savings into this. try these models on the cloud or something then decide for yourself
>>
>>107044034
>GLM air
that's not even good on the cloud, much less running quantcope local
>>
>>107044104
well then just stick with cloud and have megacorps slurp up all your data then.
your choice.
>>
>>107043602
You're supposed to use them as a base for finetuning but all the finetuners missed the memo and trained on instruct models instead like a bunch of retards
>>
>>107044308
the finetrooners do not have the $$$ to do a real instruct tune so they tune on the instruct because their model wouldn't be competitive otherwise
finetrooners in the past could make gains because the main open crap model, llama, had an abysmal, impotent official instruct (mistral wasn't much better either, original mistral models had no safety because mistral didn't know how to train safety, it wasn't because they didn't want it)
when llama 3 came out, it still wasn't great but it was already better than anything a finetrooner could output (the only finetroon that was a real improvement over the official tune is Tülu 3, a finetroon made by a lab that has the means to make their own base model...)
in the era of models like Gemma, Qwen, GPT-OSS, DeepSeek, there is no room for finetrooners. The official instructs are about as good as it gets for the relevant model.
>>
>>107044308
It's just much more economical and easier to slightly nudge a professionally post-trained Instruct model than attempting to do the same on a base model. Simply giving a base model chat capabilities doesn't take much work, but making it *not* retarded on most expected use cases takes serious amounts of work and resources.
>>
Did llama.cpp ever fix tool calling for glm4.5/4.6?
>>
>>107044424
I hope so, can't wait to see people get rm -rf'd by glm
>>
>>107044424
yes, but it's not merged. And it needs to specify a custom chat template
https://github.com/ggml-org/llama.cpp/pull/15904
>>
>>107044387
I would call Tülu 3 a proper instruct finetune made by a serious lab, not a finetroon.

Finetroons, as the name suggests, are made by discord-dwelling, troon-adjacent, clout-chasing terminal coomers who can't or won't do much more than slapping ERP logs and stories on a pre-made instruct model. Even "serious" (and inorganically shilled) finetuning attempts from the so-called community haven't been much more than that.
>>
File: 1759190643813702.jpg (176 KB, 1536x2048)
>>107035841
>>
So this is the power of using local qwen coder 30b?
>>
>>107044779
>>107044779
>>107044779


