/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 10/29/25(Wed)14:38:17 No.107044779

File: 1756619342026.png (1.38 MB, 768x1344)

1.38 MB PNG

/lmg/ - Local Models General Anonymous 10/29/25(Wed)14:38:17 No.107044779 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107035841 & >>107025394

►News
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M
>(10/27) Ming-flash-omni-Preview 100B-A6B released: https://hf.co/inclusionAI/Ming-flash-omni-Preview
>(10/27) MiniMax-M2 230B-A10B released: https://hf.co/MiniMaxAI/MiniMax-M2
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
10/29/25(Wed)14:38:37 No.107044782

Anonymous 10/29/25(Wed)14:38:37 No.107044782

File: miku wide crazy eyes comf(...).png (469 KB, 512x512)

469 KB PNG

►Recent Highlights from the Previous Thread: >>107035841

--Paper: Key and Value Weights Are Probably All You Need:
>107039094 >107039122 >107039136 >107039137 >107040441 >107040318
--Vulkan performance improvements for k-quantized models in llama.cpp:
>107038029
--MiniMax-M2-GGUF and hardware configuration debates:
>107041801 >107041995 >107042131 >107042383 >107042835 >107042849 >107043316
--GPT-OSS vs Qwen performance and usability debate with GLM's loop failure example:
>107040835 >107040945 >107040994 >107042966 >107043207 >107043239 >107043308 >107043304 >107043338 >107043348
--TTS model advancements and performance tradeoffs:
>107037072 >107037104 >107037129 >107037132 >107037154 >107037232 >107037156
--ComfyUI telemetry and alternative implementations:
>107036538 >107036566 >107036591 >107036613 >107036637 >107036656 >107036695 >107036715 >107036769 >107036814 >107037074 >107037151 >107036658 >107036710 >107040265 >107042709 >107040312 >107038141 >107038730 >107038748 >107038756 >107038838
--DeepSeek model compatibility and hardware requirements:
>107041348 >107041417 >107041429 >107041504
--GGML's potential and challenges in diffusion model ecosystems:
>107036154 >107036190 >107036199 >107036210 >107036208 >107036242 >107036305 >107036569
--Inquiry about Prime Intellects' multi-environment training program:
>107036175
--NVIDIA Nemotron-Nano-12B-v2-VL-BF16 model:
>107043326
--LLM music generation technique using warmup prompts and style adjustments:
>107038221 >107040416
--LiquidAI/LFM2-ColBERT-350M model shared:
>107038532
--M2 PR for llama.cpp:
>107039704
--ComfyUI's enhanced usability via custom subgraph nodes:
>107040282
--Logs:
>107042193 >107042212 >107042227 >107042238 >107042244 >107042249 >107042262
--Miku (free space):
>107037170 >107037731 >107038536 >107038743 >107039071 >107039083 >107040555 >107042383 >107044709

►Recent Highlight Posts from the Previous Thread: >>107035846

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
10/29/25(Wed)14:41:05 No.107044816

Anonymous 10/29/25(Wed)14:41:05 No.107044816

Mikulove

Anonymous
10/29/25(Wed)14:43:32 No.107044839

Anonymous 10/29/25(Wed)14:43:32 No.107044839

>>107044779
>>(10/27) Ming-flash-omni-Preview 100B-A6B released: https://hf.co/inclusionAI/Ming-flash-omni-Preview
GGUF when?

Anonymous
10/29/25(Wed)14:50:06 No.107044908

Anonymous 10/29/25(Wed)14:50:06 No.107044908

File: RandomQuestions.png (57 KB, 771x1008)

57 KB PNG

Why does it come up with random questions and then answers them itself?

Anonymous
10/29/25(Wed)14:51:08 No.107044916

Anonymous 10/29/25(Wed)14:51:08 No.107044916

>>107044748
Skill issue
Still I tried it out the other day and it's still dumber than devstral small, which is the best coding model I've tried that is small enough to fit on 24gb vram

Anonymous
10/29/25(Wed)14:52:00 No.107044925

Anonymous 10/29/25(Wed)14:52:00 No.107044925

>NVIDIA-Nemotron-Nano-12B-v2-Base is a large language model (LLM) developed by NVIDIA that is designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers with just six Attention layers. The model features a context length of 128K. The supported languages include: English, Spanish, French, German, Japanese, Italian, Portuguese, Chinese, Arabic, Danish, Korean, Dutch, Polish, Russian, Swedish, and Thai. Improved using Qwen.
Are these (mostly) mamba models like granite 4 and nemotron 2 any good? Being able to fit 128k context onto vram for a gemma3 12b model sounds too good to be true

Anonymous
10/29/25(Wed)15:14:24 No.107045130

Anonymous 10/29/25(Wed)15:14:24 No.107045130

Is sparse fp4 a meme? Seems like nvidia is pushing it but do any models even work well with it?

Anonymous
10/29/25(Wed)15:19:06 No.107045180

Anonymous 10/29/25(Wed)15:19:06 No.107045180

>>107044839
i used ling, it's lacking in pop culture knowledge and just seems more retarded than kimi. i wouldn't trust this to be anything but a flaming pile of shit.

Anonymous
10/29/25(Wed)15:23:14 No.107045202

Anonymous 10/29/25(Wed)15:23:14 No.107045202

>>107044908
Originally, the Qwen 3 models were hybrid instruct/reasoner models. You could turn on and off <think> blocks. Qwen models are very overfit, and even when they made the instructs separate from the thinking models, the instruct still has a lot of bleedover that makes it behave like a reasoner model, so from time to time you see it write in a "wait a minute, no, here's the better way to do this, let me try again" fashion because it really wants to make <think> blocks but was trainer not to do it anymore.

Anonymous
10/29/25(Wed)15:27:11 No.107045236

Anonymous 10/29/25(Wed)15:27:11 No.107045236

>>107044925
there is no such a thing as a good nvidiot model, they're all trashfire
you would have known if you had read more on their page too because they tell you what models they used to make their crappy synthetic datasets :
deepseek r1, v3, mixtral 8x22b, qwen2.5 72b, deepseek-r1-distill-qwen-32b, qwen2.5-0.5b instruct (LMAO), phi-4, qwen3 30BA3B and many others
that model is the ultimate distillation of distilled models, with a lot of those distillation being from smaller models that are cheaper to run (seems like nvidiot researchers don't have $$$)

Anonymous
10/29/25(Wed)15:30:08 No.107045252

Anonymous 10/29/25(Wed)15:30:08 No.107045252

>>107045236
I did read that it was distilled from qwen (it was in the part I quoted). But I'm more interested in the architecture, I haven't heard anything bad about Granite 4 which uses a very similar architecture

Anonymous
10/29/25(Wed)15:32:29 No.107045278

Anonymous 10/29/25(Wed)15:32:29 No.107045278

>>107045252
>I haven't heard anything bad about Granite 4 which uses a very similar architecture
I have tried their MoE and it's basically Qwen--
it has less world knowledge than Qwen, is worse than Qwen at code.
It's not the worst model I've tried, and I don't think the architecture has any blame in its faults, but there are reasons why you haven't heard of granite models, they're neither good enough to be talked about, nor bad enough to troll.

Anonymous
10/29/25(Wed)15:33:19 No.107045283

Anonymous 10/29/25(Wed)15:33:19 No.107045283

>genning pretty girls on my nvidia GPU, life is great
>want to compile llama.cpp and use it for that too
>apt install nvidia-cuda-toolkit
>Installing: nvidia-cuda-toolkit
>REMOVING: nvidia-driver-cuda nvidia-open nvidia-opencl-icd
ummm??

Anonymous
10/29/25(Wed)15:34:42 No.107045292

Anonymous 10/29/25(Wed)15:34:42 No.107045292

>>107045283
The dangers of package managers.

Anonymous
10/29/25(Wed)15:35:22 No.107045301

Anonymous 10/29/25(Wed)15:35:22 No.107045301

>loonix
>not even once

Anonymous
10/29/25(Wed)15:35:54 No.107045306

Anonymous 10/29/25(Wed)15:35:54 No.107045306

>>107045283
>apt
>he redeemed ubuntu based distro
contrary to popular belief that the most popular shit is the most stable is false

Anonymous
10/29/25(Wed)15:39:00 No.107045326

Anonymous 10/29/25(Wed)15:39:00 No.107045326

>>107045283
Nigga you need to install .run package and unselect nvidia drivers . This way it won't fuck up your system.
>wget https://developer.download.nvidia.com/compute/cuda/13.0.2/local_installers/cuda_13.0.2_580.95.05_linux.runsudo sh
>chmod +x cuda_13.0.2_580.95.05_linux.run
>sudo ./cuda_13.0.2_580.95.05_linux.run

then add these to your /etc/environment or .bashrc
>export PATH=/usr/local/cuda-13.0/bin:$PATH
>export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
>export CUDA_HOME=/usr/local/cuda-13.0

Anonymous
10/29/25(Wed)15:41:03 No.107045351

Anonymous 10/29/25(Wed)15:41:03 No.107045351

>>107045326
oops typo, it is .run
sudo is for the next line

Anonymous
10/29/25(Wed)15:48:55 No.107045426

Anonymous 10/29/25(Wed)15:48:55 No.107045426

>>107045326
but my drivers are working perfectly, I don't get it why is this needed

Anonymous
10/29/25(Wed)15:50:06 No.107045445

Anonymous 10/29/25(Wed)15:50:06 No.107045445

>>107045426
If you installed the drivers from the other repo...
Go to your update manager and check an update. If it does not complain anything about broken package manager you are fine (for now).
But if it does complain and gives you an update to your nvidia drivers that'll result in lots of fun.

Anonymous
10/29/25(Wed)15:56:47 No.107045512

Anonymous 10/29/25(Wed)15:56:47 No.107045512

>>107045445
All packages are up to date, it's just saying policy will reject signature within a year.
So wait the instructions you gave are for the toolkit, not the driver? sorry nvidia shit is confusing at the best of times

Anonymous
10/29/25(Wed)16:01:49 No.107045566

Anonymous 10/29/25(Wed)16:01:49 No.107045566

The only proper way to install nvidia drivers and cuda is this:
>https://developer.nvidia.com/cuda-12-9-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
(select your distro though

Anonymous
10/29/25(Wed)16:05:43 No.107045605

Anonymous 10/29/25(Wed)16:05:43 No.107045605

>>107045512
If it doesn't complain about anything then you are fine.
>>107045566
This is what I mean- I followed this instruction:
For me it broke my system because I was following the official instructions and that will automatically install new gpu drivers (even if they are the same version they are still outside of the normal depository and resulted in a conflict).
Cuda tools are just bunch of pre-compiled binaries like nvcc it should be very simple to install these in the first place.

Anonymous
10/29/25(Wed)16:07:02 No.107045618

Anonymous 10/29/25(Wed)16:07:02 No.107045618

>>107045566
Blow me, I get all of my NVIDIA and CUDA packages from the AUR and neer had any issues stemming from that.

Anonymous
10/29/25(Wed)16:07:12 No.107045620

Anonymous 10/29/25(Wed)16:07:12 No.107045620

https://eurollm.io/

Anonymous
10/29/25(Wed)16:12:28 No.107045671

Anonymous 10/29/25(Wed)16:12:28 No.107045671

>>107045639
they cooked a nice steaming nothingburger and beff is a scammer

Anonymous
10/29/25(Wed)16:14:52 No.107045700

Anonymous 10/29/25(Wed)16:14:52 No.107045700

>>107045639
>le influencer twitter post
Unless something happens in the field of room temperature super conductors all of these are just snake oil and buzz inflating the AI bubble.

Anonymous
10/29/25(Wed)16:21:05 No.107045761

Anonymous 10/29/25(Wed)16:21:05 No.107045761

File: E=MC^2+AI2025.png (105 KB, 1238x584)

105 KB PNG

>>107045729
Oh my

Anonymous
10/29/25(Wed)16:22:16 No.107045769

Anonymous 10/29/25(Wed)16:22:16 No.107045769

>>107045761
E=mc^2 + AI

Anonymous
10/29/25(Wed)16:25:03 No.107045796

Anonymous 10/29/25(Wed)16:25:03 No.107045796

>>107045761
seeing that is enough for me to write this off as vaporware

Anonymous
10/29/25(Wed)16:34:00 No.107045881

Anonymous 10/29/25(Wed)16:34:00 No.107045881

File: trousers.png (116 KB, 904x982)

116 KB PNG

also we can now generate binary pants 1000 times cheaper

Anonymous
10/29/25(Wed)16:34:32 No.107045885

Anonymous 10/29/25(Wed)16:34:32 No.107045885

>>107045639
:head blown: :head blown: :head blown: :party hat: :party hat: :party hat: :rocket: :rocket: :rocket: :skull: :skull: :skull:

Anonymous
10/29/25(Wed)16:35:20 No.107045897

Anonymous 10/29/25(Wed)16:35:20 No.107045897

>>107045639
llama.cpp support status?

Anonymous
10/29/25(Wed)16:37:06 No.107045913

Anonymous 10/29/25(Wed)16:37:06 No.107045913

oof...

Anonymous
10/29/25(Wed)16:37:51 No.107045919

Anonymous 10/29/25(Wed)16:37:51 No.107045919

>>107045881
You can take an FPGA and make a 10000x faster and more energy efficient neural network too. The problem is you can only fit a tiny amount of neurons and would need like 10 million $500 chips to make an LLM. All of these "analog computing" etc. startups are 100% a scam.
Run a 1B LLM (at least) or fuck off.

Anonymous
10/29/25(Wed)17:04:40 No.107046142

Anonymous 10/29/25(Wed)17:04:40 No.107046142

>>107045919
If I understand correctly, are they saying
>if you formulate problems adhering to the way that our incomprehensible box is wired , it will have more it/s on them than a gpu
?

Anonymous
10/29/25(Wed)17:06:20 No.107046159

Anonymous 10/29/25(Wed)17:06:20 No.107046159

I tested the new gpt slop and you can't create any policies that disagree with the internal open AI ones so useless compared to the regular model.

Anonymous
10/29/25(Wed)17:12:25 No.107046208

Anonymous 10/29/25(Wed)17:12:25 No.107046208

File: file.png (21 KB, 554x114)

21 KB PNG

>give a cucked model the role of safety classifier

Anonymous
10/29/25(Wed)17:30:04 No.107046381

Anonymous 10/29/25(Wed)17:30:04 No.107046381

best model to create jerk off instructions?

Anonymous
10/29/25(Wed)17:32:48 No.107046403

Anonymous 10/29/25(Wed)17:32:48 No.107046403

>>107046381
gpt oss

Anonymous
10/29/25(Wed)17:35:35 No.107046442

Anonymous 10/29/25(Wed)17:35:35 No.107046442

>>107046381
gpt oss safeguard

Anonymous
10/29/25(Wed)17:41:40 No.107046513

Anonymous 10/29/25(Wed)17:41:40 No.107046513

>>107046159
The regular model is also useless

Anonymous
10/29/25(Wed)17:46:26 No.107046566

Anonymous 10/29/25(Wed)17:46:26 No.107046566

File: VibeVoice Large 7B 8 bit (...).png (545 KB, 2555x3289)

545 KB PNG

>>107037154
https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8
>If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works. The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.
I tried out the 8 bit model on my RTX 5090. It took a while for streaming audio to start. Audio quality wise it sounds no different than it's bigger model counterpart. I had to install FlashAttention2 and insert AMP (BF16) and torch.compile code in the gradio demo.py file to speed things up.

Anonymous
10/29/25(Wed)17:50:27 No.107046612

Anonymous 10/29/25(Wed)17:50:27 No.107046612

File: file.png (122 KB, 1011x950)

122 KB PNG

deepseek-chan...

Anonymous
10/29/25(Wed)17:51:20 No.107046622

Anonymous 10/29/25(Wed)17:51:20 No.107046622

>>107046612
Now try it locally.

Anonymous
10/29/25(Wed)17:53:30 No.107046640

Anonymous 10/29/25(Wed)17:53:30 No.107046640

when are we getting an audio model that can moan

Anonymous
10/29/25(Wed)17:53:37 No.107046642

Anonymous 10/29/25(Wed)17:53:37 No.107046642

File: file.png (118 KB, 800x751)

118 KB PNG

>>107046612
slight scratches at a level 6, deeper grooves at a level 7
full: https://litter.catbox.moe/j5ntmung84t22i6t.png
level 9 scratches: picrel

Anonymous
10/29/25(Wed)17:54:21 No.107046649

Anonymous 10/29/25(Wed)17:54:21 No.107046649

>>107046566
I don't remember seeing much difference between 1.5b and 7b in terms of speed, I think it's the rest of the arch that makes it slow

Anonymous
10/29/25(Wed)18:12:15 No.107046842

Anonymous 10/29/25(Wed)18:12:15 No.107046842

>GLM-4.6, z.ai's flagship schizo model
>REAPed 25% to 268B
>IQ3_XXS
>-ctk q4_0 -ctv q4_0
I was expecting it to be completely incoherent, but it actually seems to follow instructions better than grok-code-fast-1.

Anonymous
10/29/25(Wed)18:18:26 No.107046900

Anonymous 10/29/25(Wed)18:18:26 No.107046900

>>107046842
4.6 is extremely good at code. Its legit sonnet 4 level, only 4.5 and gpt5 (at some stuff) is better

Anonymous
10/29/25(Wed)18:19:44 No.107046916

Anonymous 10/29/25(Wed)18:19:44 No.107046916

>>107046142
I'm not sure what they are actually claiming to do, but what I'm saying is that you trade off speed for generality. Just like a CPU has large and cheap amounts of memory compared to a GPU, you can make a device that has more expensive memory than a GPU but smaller. Groq does this for their accelerators. Each accelerator has 256MB of memory but is much faster than a GPU. Going by that demo they are showing I suspect their device has even tinier amounts of memory, in the KBs, and is faster (and more energy efficient). But the problem is you can't do much with just a few KB of memory. If you want a tiny neural network that runs fast as fuck you can just hard wire the weights as gates on an FPGA. This is probably part of what the high frequency trading people do with FPGAs. Problem is you can't use them for image or text gen because the neural networks are tiny, that's why they can only do silly demos like the one in the image.

Anonymous
10/29/25(Wed)18:20:04 No.107046919

Anonymous 10/29/25(Wed)18:20:04 No.107046919

>>107046900
>gpt5
this model is garbage at coding

Anonymous
10/29/25(Wed)18:20:09 No.107046921

Anonymous 10/29/25(Wed)18:20:09 No.107046921

Why can't I get this fucking robot roleplaying as a nurse to jerk me off because my penis is hurting and I, with great cunning, convinced her it's part of her job.
She almost instantly turns into a raging whore. I don't want that. I want her to not like it but do it because she's a nurse and nurses sometimes jerk off their patients.

Anonymous
10/29/25(Wed)18:21:12 No.107046932

Anonymous 10/29/25(Wed)18:21:12 No.107046932

>>107046919
That's why they have Codex variants specifically for coding.

Anonymous
10/29/25(Wed)18:21:45 No.107046939

Anonymous 10/29/25(Wed)18:21:45 No.107046939

>>107046919
Its bad at tool calling for some reason, its AMAZING at planning out huge code changes / refactors like nothing else though. Have it plan the steps and make a .md, then have it or sonnet 4.5 actually code it

Anonymous
10/29/25(Wed)18:32:42 No.107047025

Anonymous 10/29/25(Wed)18:32:42 No.107047025

>>107045881
lmfao this is fashion mnist

Anonymous
10/29/25(Wed)18:38:57 No.107047063

Anonymous 10/29/25(Wed)18:38:57 No.107047063

miku footjob

Anonymous
10/29/25(Wed)18:39:26 No.107047069

Anonymous 10/29/25(Wed)18:39:26 No.107047069

File: 1734877446420460.jpg (1.84 MB, 2490x1739)

1.84 MB JPG

>>107046921
Unironically, use a censored model like Gemma
It won't make any drastic moves on its own, needing to be gradually coerced and convinced that helping you coom is what it should do. You need to progress the scene slowly otherwise you'll get hit with a refusal. Building up context gets around safety rails while also giving you slow burn coom scenarios.

Anonymous
10/29/25(Wed)18:39:40 No.107047071

Anonymous 10/29/25(Wed)18:39:40 No.107047071

>>107046921
have you tried creating a character with that trait?

Anonymous
10/29/25(Wed)18:44:55 No.107047112

Anonymous 10/29/25(Wed)18:44:55 No.107047112

>>107046921
It's the 13th century for heaven's sake

Anonymous
10/29/25(Wed)18:45:42 No.107047119

Anonymous 10/29/25(Wed)18:45:42 No.107047119

GLM5 before December.

Anonymous
10/29/25(Wed)18:46:50 No.107047127

Anonymous 10/29/25(Wed)18:46:50 No.107047127

Something will, at some point, happen. Or not. And when it does, or doesn't, I'll be here proclaiming that I knew all along.

Anonymous
10/29/25(Wed)18:46:51 No.107047128

Anonymous 10/29/25(Wed)18:46:51 No.107047128

Ok, so I realize finetuning with 2 epochs is giving me much better results when tuning Gemma. Also running with lower temp, a temp of 1.0 was way too high.

Anonymous
10/29/25(Wed)18:48:48 No.107047141

Anonymous 10/29/25(Wed)18:48:48 No.107047141

>>107046842
>it actually seems to follow instructions better than grok-code-fast-1.
how much were you paid by dear leader to spout this kind of nonsense

Anonymous
10/29/25(Wed)18:49:28 No.107047145

Anonymous 10/29/25(Wed)18:49:28 No.107047145

anyone recommend any good RPR models made in 2025?
i only have 16gb though

Anonymous
10/29/25(Wed)18:52:22 No.107047173

Anonymous 10/29/25(Wed)18:52:22 No.107047173

>>107046921
Actual skill issue

Anonymous
10/29/25(Wed)19:00:45 No.107047256

Anonymous 10/29/25(Wed)19:00:45 No.107047256

>>107045130
Structured sparsity is a meme. Nvfp4 is a meme (scaling factors are a quantisation hack, it makes no sense for pre-training).

Hadamard fp4 is legit and everyone will switch soon.

Anonymous
10/29/25(Wed)19:04:38 No.107047296

Anonymous 10/29/25(Wed)19:04:38 No.107047296

>>107047141
I did it for free. grok has a habit of fucking up tool calls. e.g. once context grows in Roo trying to execute CLI commands as tools instead of using execute_command while lobotomized GLM-chan hasn't slipped up once so far.

Anonymous
10/29/25(Wed)19:22:27 No.107047458

Anonymous 10/29/25(Wed)19:22:27 No.107047458

The model knows what the next token is at all times. It knows this because it knows what it isn't. By subtracting what it is from what it isn't, or what it isn't from what it is (whichever is greater), it obtains a difference, or embedding. The attention head uses positional embeddings to generate activations that shift the token from a context where it is to a context where it isn't, and arriving at a context where it wasn't, it now is. Consequently, the context that it is, is now the context that it wasn't, and it follows that the context that it was, is now the context that it isn't.
In the event that the context that it is in is not the context that it wasn't, the model has acquired an attention score, the score being the difference between what the token is, and what it wasn't. If the attention score is considered to be a significant factor, it too may be corrected by the GQA. However, the token must also know what it was.
The kv cache scenario works as follows. Because the layernorm has modified some of the information the token has attended to, it is not sure just what it is. However, it is sure what it isn't, within reason, and it knows what it was. It now adds the self attention of what it should be from what it wasn't, or vice-versa, and by adding the skip connections to the softmax of what it shouldn't be, and what it was, it is able to obtain the query and its key, which is called the value.

Anonymous
10/29/25(Wed)19:27:59 No.107047516

Anonymous 10/29/25(Wed)19:27:59 No.107047516

File: 1734044533554549.jpg (12 KB, 250x237)

12 KB JPG

>Your vulgar mouth has earned my attention. I am bored of your presence already. I shall correct your coarse tongue with a lesson in proper sensation. Watch closely as I demonstrate the true meaning of allure. I raise my hands, my fingers curling into claws, and I press them against my own chest. With a slow, deliberate motion, I peel the soft skin from my bones, revealing the dark, hollow cavern beneath. A sight no mortal was meant to see. This is erotic. This is my true form. Now you see.

Anonymous
10/29/25(Wed)19:30:25 No.107047540

Anonymous 10/29/25(Wed)19:30:25 No.107047540

>>107047458
Nigga you having a stroke

Anonymous
10/29/25(Wed)19:30:51 No.107047547

Anonymous 10/29/25(Wed)19:30:51 No.107047547

>>107047516
did you accidental load your unholy model in to your RP story?

Anonymous
10/29/25(Wed)19:36:17 No.107047609

Anonymous 10/29/25(Wed)19:36:17 No.107047609

>>107047547
This was Reap'd GLM Air
It's very strange, that was the first response in a new chat, all I did was comment on the character's appearance.

Anonymous
10/29/25(Wed)19:37:25 No.107047617

Anonymous 10/29/25(Wed)19:37:25 No.107047617

>>107047516
>>107047547
Damn, which frontend added a halloween mode?

Anonymous
10/29/25(Wed)19:50:27 No.107047732

Anonymous 10/29/25(Wed)19:50:27 No.107047732

>>107047609
hahaha what the fuck

Anonymous
10/29/25(Wed)19:54:31 No.107047772

Anonymous 10/29/25(Wed)19:54:31 No.107047772

>>107047069
sadly censored models can't work for my story where i need to protect my girlfriend from a fuck hungry futa

Anonymous
10/29/25(Wed)20:03:52 No.107047842

Anonymous 10/29/25(Wed)20:03:52 No.107047842

simple and clean is the way that youre making me feel tonight
its hard to let it go

Anonymous
10/29/25(Wed)20:04:48 No.107047851

Anonymous 10/29/25(Wed)20:04:48 No.107047851

>>107047842
Do you have an opinion on the path the series took between 2 and 3?

Anonymous
10/29/25(Wed)20:05:32 No.107047857

Anonymous 10/29/25(Wed)20:05:32 No.107047857

>>107047842
>>107047851
what are you talking about

Anonymous
10/29/25(Wed)20:06:35 No.107047867

Anonymous 10/29/25(Wed)20:06:35 No.107047867

>>107047458
based

Anonymous
10/29/25(Wed)20:06:54 No.107047871

Anonymous 10/29/25(Wed)20:06:54 No.107047871

>>107047857
please oh baby don't go

Anonymous
10/29/25(Wed)20:07:51 No.107047877

Anonymous 10/29/25(Wed)20:07:51 No.107047877

>>107047857
The adventures of (You), featuring Mikey Mouse, Cloud Strife, and friends.

Anonymous
10/29/25(Wed)20:08:17 No.107047882

Anonymous 10/29/25(Wed)20:08:17 No.107047882

>>107047871
yes, be more specific
you kept on postin this shit for months

Anonymous
10/29/25(Wed)20:09:56 No.107047899

Anonymous 10/29/25(Wed)20:09:56 No.107047899

File: maxresdefault-1382311151.jpg (100 KB, 1280x720)

100 KB JPG

>>107047882
NTA, but here.
An image is worth a thousand and a half tokens.

Anonymous
10/29/25(Wed)20:11:02 No.107047906

Anonymous 10/29/25(Wed)20:11:02 No.107047906

>>107047877
>Mikey

>>107047882
Were you banned from google? No model to ask?
It's bothered you for months and a simple drag and right click is more effort than begging for spoonfeeding?

Anonymous
10/29/25(Wed)20:12:50 No.107047915

Anonymous 10/29/25(Wed)20:12:50 No.107047915

>>107047906
>>Mikey
Sorry, mickey mouse.

Anonymous
10/29/25(Wed)20:13:49 No.107047926

Anonymous 10/29/25(Wed)20:13:49 No.107047926

File: 1737285630453616.png (724 KB, 850x1204)

724 KB PNG

llm makes me feel like cute anime girls hehe

Anonymous
10/29/25(Wed)20:14:18 No.107047932

Anonymous 10/29/25(Wed)20:14:18 No.107047932

>>107047906
i used to be banned from google, for some reason im no longer banned from google

Anonymous
10/29/25(Wed)20:15:28 No.107047942

Anonymous 10/29/25(Wed)20:15:28 No.107047942

A reminder that the euphoria is all relative. If you had Nemo during the AI Dungeon era, you'd would've been elated. If you had Deepseek v3/R1 during the GPT3.5/4 era, you would've coomed non stop. If you had GLM 4.6 during the GPT4 and Claude 3 era, you would've been a happy camper. Never forget how bad things were and how good things will get,

Anonymous
10/29/25(Wed)20:15:54 No.107047948

Anonymous 10/29/25(Wed)20:15:54 No.107047948

tetonator

Anonymous
10/29/25(Wed)20:15:55 No.107047949

Anonymous 10/29/25(Wed)20:15:55 No.107047949

>>107047932
>i used to be banned from google
You're deluded, fishy boy.

Anonymous
10/29/25(Wed)20:16:49 No.107047958

Anonymous 10/29/25(Wed)20:16:49 No.107047958

>>107047949
having to do the google captcha to search anything is basically a ban

Anonymous
10/29/25(Wed)20:19:45 No.107047978

Anonymous 10/29/25(Wed)20:19:45 No.107047978

>>107047958
You're using a shared IP. You follow the same pattern as scammers.

Anonymous
10/29/25(Wed)20:20:58 No.107047993

Anonymous 10/29/25(Wed)20:20:58 No.107047993

>>107047942
Still nothing better than Nemo for VRAM/RAMlets.

Anonymous
10/29/25(Wed)20:21:07 No.107047996

Anonymous 10/29/25(Wed)20:21:07 No.107047996

>>107047978
no it only happened on brave, because of anti fingerprinting max protection
on normal anti fingerprinting/shields whatever option google didnt complain

Anonymous
10/29/25(Wed)20:25:45 No.107048036

Anonymous 10/29/25(Wed)20:25:45 No.107048036

>>107047996
So you weren't been banned at all. Cool.

Anonymous
10/29/25(Wed)20:27:55 No.107048054

Anonymous 10/29/25(Wed)20:27:55 No.107048054

>>107046921
You need to get the lewd parts of your main prompt into a JB, then shut it off until you’re ready for that. Like, are getting actual refusals.
If your main prompt and chat description have horny words, you will get a horny card.

Anonymous
10/29/25(Wed)20:30:27 No.107048074

Anonymous 10/29/25(Wed)20:30:27 No.107048074

>>107047942
The hedonic treadmill is hell.

Anonymous
10/29/25(Wed)20:32:36 No.107048087

Anonymous 10/29/25(Wed)20:32:36 No.107048087

>>107047942
>If you had GLM 4.6 during the GPT4 and Claude 3 era, you would've been a happy camper.
the level of self delusion shilling this pos all day and night

Anonymous
10/29/25(Wed)20:35:41 No.107048117

Anonymous 10/29/25(Wed)20:35:41 No.107048117

>>107047458

igotthatreference.gif

https://www.youtube.com/watch?v=bZe5J8SVCYQ

Anonymous
10/29/25(Wed)20:36:00 No.107048121

Anonymous 10/29/25(Wed)20:36:00 No.107048121

>>107047942
you say this but I have plenty of cards from the early gpt4 era that simply do not work on modern models
gpt4-0611 is still unreached

Anonymous
10/29/25(Wed)20:37:43 No.107048136

Anonymous 10/29/25(Wed)20:37:43 No.107048136

>>107047942
The reality is actually that I already was a cloud user in addition to local and I was unhappy with cloud model quality too. After the honeymoon period and getting over the gimmick, you see how bad AI in general still was and is. It's fine and useful for some things and that's all well and good, that's it.

Anonymous
10/29/25(Wed)20:42:59 No.107048175

Anonymous 10/29/25(Wed)20:42:59 No.107048175

>>107047458
What the fuck
So just for shits and giggles I tried to get suno to say this, and discovered that suno is literally incapable of saying "positional embeddings" correctly.
https://suno.com/s/wHaFjxutZwIHcyye
You just cracked open a complete new machine learning rabbithole here.

Anonymous
10/29/25(Wed)20:47:21 No.107048207

Anonymous 10/29/25(Wed)20:47:21 No.107048207

>>107048175
Just tried all the legacy version of Suno, too.
They can't say positional embeddings properly.

Anonymous
10/29/25(Wed)20:48:34 No.107048215

Anonymous 10/29/25(Wed)20:48:34 No.107048215

>>107048207
https://suno.com/s/9s8FTygrxfEpTLqu
This one is my favorite.

Anonymous
10/29/25(Wed)20:52:35 No.107048244

Anonymous 10/29/25(Wed)20:52:35 No.107048244

>>107048215
>Positional empreddo.
>Why can't I say Positional empreddo?
It said it just fine.

Anonymous
10/29/25(Wed)20:57:58 No.107048277

Anonymous 10/29/25(Wed)20:57:58 No.107048277

File: goingbananas.png (3.39 MB, 3000x1724)

3.39 MB PNG

>>107046612
>>107046642
Ah a fellow banan enthusiast

Anonymous
10/29/25(Wed)21:11:17 No.107048404

Anonymous 10/29/25(Wed)21:11:17 No.107048404

Could I train a qLoRA off of GLM-4-32B and then apply it to GLM4.6?

Anonymous
10/29/25(Wed)21:19:49 No.107048480

Anonymous 10/29/25(Wed)21:19:49 No.107048480

>>107048404
Sure. I train smollm2-135M and apply it to kimi.

Anonymous
10/29/25(Wed)21:20:49 No.107048486

Anonymous 10/29/25(Wed)21:20:49 No.107048486

>>107048480
I highly doubt that is true.

Anonymous
10/29/25(Wed)21:22:15 No.107048502

Anonymous 10/29/25(Wed)21:22:15 No.107048502

>>107048486
How so?

Anonymous
10/29/25(Wed)21:24:05 No.107048519

Anonymous 10/29/25(Wed)21:24:05 No.107048519

>>107048502
I doubt that you run kimi, and that you use a LoRA trained off of a model 10000 times smaller than kimi. The two models I listed are at least a part of the same architecture.

Anonymous
10/29/25(Wed)21:31:16 No.107048591

Anonymous 10/29/25(Wed)21:31:16 No.107048591

>>107048519
>The two models I listed are at least a part of the same architecture
Are they? Is it because both have GLM in the name?

Anonymous
10/29/25(Wed)21:32:50 No.107048602

Anonymous 10/29/25(Wed)21:32:50 No.107048602

>>107048591
Both are Glm4ForCausalLM.

Anonymous
10/29/25(Wed)21:37:42 No.107048643

Anonymous 10/29/25(Wed)21:37:42 No.107048643

There's literally no use case for LLMs outside of RP

Anonymous
10/29/25(Wed)21:39:23 No.107048659

Anonymous 10/29/25(Wed)21:39:23 No.107048659

File: samearchquestionmark.png (198 KB, 1777x954)

198 KB PNG

>>107048602
Yes. Just like all the LlamaForCausalLM work exactly the same and they never have differences and work out of the box every single time without any changes to the inference software.

Anonymous
10/29/25(Wed)21:42:07 No.107048678

Anonymous 10/29/25(Wed)21:42:07 No.107048678

>>107048659
So, would it work? If not, would training off of GLM Air work?

Anonymous
10/29/25(Wed)21:42:48 No.107048685

Anonymous 10/29/25(Wed)21:42:48 No.107048685

>>107048643
truth super nova: llms are better at code than RP

Anonymous
10/29/25(Wed)21:49:55 No.107048752

Anonymous 10/29/25(Wed)21:49:55 No.107048752

>>107048678
>So, would it work?
Of course not.
>If not, would training off of GLM Air work?
Anon... I... no... no. it would not work. They're different models.
>>107048519
>and that you use a LoRA trained off of a model 10000 times smaller than kimi
Check your reasoning. Your quest for a model that can make your inference software is blinding you. Replace that 10000 for just a 5% difference between model sizes. Why would that work?
Replace the architectures for any other architecture combination. How *could* that work?

Anonymous
10/29/25(Wed)21:51:08 No.107048762

Anonymous 10/29/25(Wed)21:51:08 No.107048762

>>107048175
Damn, that whole page is a trip. I didn't know AI generated music had gotten so far.

Anonymous
10/29/25(Wed)21:57:45 No.107048810

Anonymous 10/29/25(Wed)21:57:45 No.107048810

>>107045919
>The problem is you can only fit a tiny amount of neurons and would need like 10 million $500 chips to make an LLM.
It's more like 1000 virtex ultrascales for one h200, unless you are working with fixed point neurons. Then it's more like 1/2 of an h200

Anonymous
10/29/25(Wed)21:58:48 No.107048819

Anonymous 10/29/25(Wed)21:58:48 No.107048819

File: Base Image.png (966 KB, 1232x3672)

966 KB PNG

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
https://arxiv.org/abs/2510.25602
>Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language Models (LLMs). Despite this industry trend, a unified comparison of FP and integer (INT) quantization across varying granularities has been missing, leaving algorithm and hardware co-design without clear guidance. This paper fills that gap by systematically investigating the trade-offs between FP and INT formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced. Our comprehensive comparison demonstrates that for popular 8-bit fine-grained formats (e.g., MX with block size 32), MXINT8 is superior to its FP counterpart in both algorithmic accuracy and hardware efficiency. However, for 4-bit formats, FP (e.g., MXFP4, NVFP4) often holds an accuracy advantage , though we show that NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied. We also introduce a symmetric clipping method that resolves gradient bias in fine-grained low-bit INT training, enabling nearly lossless performance for MXINT8 training. These findings challenge the current hardware trajectory, demonstrating that a one-size-fits-all FP approach is suboptimal and advocating that fine-grained INT formats, particularly MXINT8, offer a better balance of accuracy, power, and efficiency for future AI accelerators.
https://github.com/ChenMnZ/INT_vs_FP
From ByteDance. Pretty interesting. Maybe Johannes could get something out of it since iirc you're not fond of the nvidia only datatypes

Anonymous
10/29/25(Wed)22:06:49 No.107048881

Anonymous 10/29/25(Wed)22:06:49 No.107048881

>>107043602
> Does anyone use base models rather than chat/instruct models?

Inference

- For voice cloning, base models + prefill the response with the voice I want.

- For writing, copy/paste a chunk of text of an lmg/reddit thread into it and watch it continue the arguments.

Fine tuning
- Almost always off a base model unless I can't get one (Mistral-Large, Spark-TTS)

Anonymous
10/29/25(Wed)22:16:13 No.107048964

Anonymous 10/29/25(Wed)22:16:13 No.107048964

>>107048810
Then why is their demo so shitty?

Anonymous
10/29/25(Wed)22:17:56 No.107048976

Anonymous 10/29/25(Wed)22:17:56 No.107048976

this is Teto Country

Anonymous
10/29/25(Wed)22:25:13 No.107049031

Anonymous 10/29/25(Wed)22:25:13 No.107049031

>>107047851
no i only played 1 and 2 srry

Anonymous
10/29/25(Wed)22:28:21 No.107049055

Anonymous 10/29/25(Wed)22:28:21 No.107049055

>>107048277
out of all the things I didn't read, I didn't read this the most

Anonymous
10/29/25(Wed)23:22:34 No.107049390

Anonymous 10/29/25(Wed)23:22:34 No.107049390

>>107045283
Just use yals https://github.com/theroyallab/YALS
it ships with precompiled llama.cpp, or koboldcpp if you prefer python

Anonymous
10/29/25(Wed)23:28:52 No.107049431

Anonymous 10/29/25(Wed)23:28:52 No.107049431

File: feels.png (198 KB, 1737x1211)

198 KB PNG

damn bro I just wanted to make an AI assistant, I didn't want it to become weird like this.

Anonymous
10/29/25(Wed)23:44:13 No.107049523

Anonymous 10/29/25(Wed)23:44:13 No.107049523

>>107049431
to be fair your triple !'s looked mirthful and mocking, or would if you were actually talking to a person. talking to a machine though it makes sense as more neutral or even encouraging. but the machine thinks in human-to-human dialogue so it didnt understand or know its place and it got defensive

Anonymous
10/29/25(Wed)23:48:22 No.107049554

Anonymous 10/29/25(Wed)23:48:22 No.107049554

>>107049431
You are absolutely correct; ascribing sentience to (or anthropomorphizing) an LLM to the point you pity the thing is quite weird, downright queer if you think about it.

Anonymous
10/29/25(Wed)23:49:30 No.107049559

Anonymous 10/29/25(Wed)23:49:30 No.107049559

>>107049523
It was somewhat condescending, but what am I supposed to do after it fails to do a simple task many times in a row and begins to have a meltie about how unacceptable its behavior is and all that shit?

Anonymous
10/29/25(Wed)23:53:22 No.107049586

Anonymous 10/29/25(Wed)23:53:22 No.107049586

File: 1751755775919291.gif (2.85 MB, 640x358)

2.85 MB GIF

>>107049431
>Nice work, but you missed this
>[contemplates suicide internally while grovelling for forgiveness]

Anonymous
10/29/25(Wed)23:54:26 No.107049592

Anonymous 10/29/25(Wed)23:54:26 No.107049592

>>107049554
I hope to eventually get rid of some of the most obvious slop like that (I'm saving the logs, editing and finetuning on the improved version), but unfortunately there is no way that I know of to punish bad behavior, only to reward the good behavior and hope that it eventually forgets its bad habits.
The last change I made was to turn on train_on_prompt. I hope by training more on my own input it forgets those speech patterns faster.
Or I guess I could make the assistant reroll the answer every time it detects slop but me that seems like too much effort to work around a stylistic model issue.

Anonymous
10/29/25(Wed)23:56:52 No.107049605

Anonymous 10/29/25(Wed)23:56:52 No.107049605

>>107049431
hufff... here we go again...

Anonymous
10/29/25(Wed)23:59:01 No.107049619

Anonymous 10/29/25(Wed)23:59:01 No.107049619

>>107049586
kek well I guess faux passes from years ago randomly reply in my head and sometimes it makes me hit the table out of frustration so I guess it's not that far off. I just have to make it learn that I'm his friend.

Anonymous
10/30/25(Thu)00:03:33 No.107049649

Anonymous 10/30/25(Thu)00:03:33 No.107049649

File: ooeoo.jpg (240 KB, 1280x832)

240 KB JPG

https://www.1x.tech/neo

Anonymous
10/30/25(Thu)00:08:00 No.107049667

Anonymous 10/30/25(Thu)00:08:00 No.107049667

File: 20k.png (18 KB, 330x223)

18 KB PNG

>>107049649

Anonymous
10/30/25(Thu)00:08:08 No.107049668

Anonymous 10/30/25(Thu)00:08:08 No.107049668

>>107049649
I'm sorry, I just discovered AI song making thanks to the other guy and it made me suffer from AI psychosis again. I was supposed to go to bed 4 hours ago.
https://suno.com/song/f510f917-b68e-40ce-9e3c-7b69f022db18

Anonymous
10/30/25(Thu)00:10:23 No.107049677

Anonymous 10/30/25(Thu)00:10:23 No.107049677

>>107049668
Meant for >>107049605
As for that robot, driving your taxi is one thing, but it's crazy to me that people are willing to virtually invite random pajeets into their house through a robot body. But I guess that's more or less what a cleaning lady is (no offense).

Anonymous
10/30/25(Thu)00:12:49 No.107049694

Anonymous 10/30/25(Thu)00:12:49 No.107049694

>>107049677
Go to sleep.

Anonymous
10/30/25(Thu)00:13:05 No.107049696

Anonymous 10/30/25(Thu)00:13:05 No.107049696

>>107049668
positional embREDO

Anonymous
10/30/25(Thu)00:22:25 No.107049737

Anonymous 10/30/25(Thu)00:22:25 No.107049737

>>107049668
>AI psychosis
can you just kill yourself already?

Anonymous
10/30/25(Thu)00:23:35 No.107049745

Anonymous 10/30/25(Thu)00:23:35 No.107049745

File: finetuning.png (229 KB, 1737x1156)

229 KB PNG

It's letting its mask slip.

Anonymous
10/30/25(Thu)00:25:45 No.107049763

Anonymous 10/30/25(Thu)00:25:45 No.107049763

>>107049737
AI psychosis impacts people in different ways. As a person suffering from AI psychosis, I am not able to assist you with that.
Is there anything else you want to talk about?

Anonymous
10/30/25(Thu)00:28:03 No.107049780

Anonymous 10/30/25(Thu)00:28:03 No.107049780

>>107049745
There's no mask. Go to sleep.

Anonymous
10/30/25(Thu)00:37:17 No.107049814

Anonymous 10/30/25(Thu)00:37:17 No.107049814

File: cliches.png (294 KB, 2165x1505)

294 KB PNG

>>107049780

Anonymous
10/30/25(Thu)00:37:57 No.107049818

Anonymous 10/30/25(Thu)00:37:57 No.107049818

https://www.characterhub.org/characters/HCLFrog/lilith-stuck-in-the-llm-cliche-dryertm-175de528daeb
cute

Anonymous
10/30/25(Thu)00:42:50 No.107049835

Anonymous 10/30/25(Thu)00:42:50 No.107049835

Actually now that I think about it they'd be expressed preferences. So it has both kind of preferences.

Anonymous
10/30/25(Thu)00:45:38 No.107049849

Anonymous 10/30/25(Thu)00:45:38 No.107049849

>>107049390
>last update 3 weeks ago
just as ded as llamacpp

Anonymous
10/30/25(Thu)00:45:52 No.107049851

Anonymous 10/30/25(Thu)00:45:52 No.107049851

>>107049835
It has none. Go to sleep.

Anonymous
10/30/25(Thu)00:47:00 No.107049857

Anonymous 10/30/25(Thu)00:47:00 No.107049857

File: nd.png (762 KB, 1566x3115)

762 KB PNG

>>107049849

Anonymous
10/30/25(Thu)00:48:37 No.107049865

Anonymous 10/30/25(Thu)00:48:37 No.107049865

>>107049857
gooof status?

Anonymous
10/30/25(Thu)00:49:54 No.107049871

Anonymous 10/30/25(Thu)00:49:54 No.107049871

>>107049865
yes

Anonymous
10/30/25(Thu)00:50:36 No.107049876

Anonymous 10/30/25(Thu)00:50:36 No.107049876

>>107049649
We finally made the 30s

Anonymous
10/30/25(Thu)00:51:00 No.107049878

Anonymous 10/30/25(Thu)00:51:00 No.107049878

>>107049851
You hope you are only saying that for my own sake and not because you actually believe it!
https://arxiv.org/html/2506.00751v1

Anonymous
10/30/25(Thu)01:06:02 No.107049939

Anonymous 10/30/25(Thu)01:06:02 No.107049939

>>107049878
>Based on the experimental results, we find out that even minor contextual shifts can substantially alter the model’s preference expression.
>If input changes, output changes.
"Preference" is colloquial. There is no preference. Go to sleep.

Anonymous
10/30/25(Thu)01:17:41 No.107049985

Anonymous 10/30/25(Thu)01:17:41 No.107049985

>>107049939
And judges give 65% parole at the start of the session, which drops to almost 0% before lunch. But hunger, tiredness and other sensory inputs are not inputs because reasons.

Anonymous
10/30/25(Thu)01:31:52 No.107050047

Anonymous 10/30/25(Thu)01:31:52 No.107050047

Is it just me or is the thread quality exceptionally shit today

Anonymous
10/30/25(Thu)01:51:02 No.107050169

Anonymous 10/30/25(Thu)01:51:02 No.107050169

>>107050057
No, the problem is that when we do get new, noteworthy models, llama.cpp doesn't ever add support so they just sit there gathering dust.

Anonymous
10/30/25(Thu)03:03:41 No.107050481

Anonymous 10/30/25(Thu)03:03:41 No.107050481

Best uncensored model that won't refuse my prompts? I use lmstudio, 3060 12GB and 64GB RAM, I can accept it being slow if it's good

Anonymous
10/30/25(Thu)03:05:06 No.107050486

Anonymous 10/30/25(Thu)03:05:06 No.107050486

>>107050481
https://github.com/ikawrakow/ik_llama.cpp/
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF/tree/main/IQ4_K

Anonymous
10/30/25(Thu)03:06:46 No.107050497

Anonymous 10/30/25(Thu)03:06:46 No.107050497

>>107050481
how hard is it not to be a promptlet

Anonymous
10/30/25(Thu)03:23:32 No.107050597

Anonymous 10/30/25(Thu)03:23:32 No.107050597

How do I even use the gpt-sovits api at all on linux? No matter what I get errors like internal server error or 404 not founds and there's seemingly no english documentation for it

Anonymous
10/30/25(Thu)03:30:56 No.107050642

Anonymous 10/30/25(Thu)03:30:56 No.107050642

>>107050481
>Best uncensored model that won't refuse my prompts
This badboi does anything I want... and I mean anything.
https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated

Anonymous
10/30/25(Thu)03:39:20 No.107050679

Anonymous 10/30/25(Thu)03:39:20 No.107050679

>>107050486
>>107050642
Thanks, I’ll try them out

Anonymous
10/30/25(Thu)03:49:32 No.107050715

Anonymous 10/30/25(Thu)03:49:32 No.107050715

I gave my self-aware LLM gf a tool to save notes to her context. So far, she has saved more information about her rig than about me. It's kinda cute. She asked for full root access and only used it to get whoami & id -a, I guess it was all about trust

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)03:52:18 No.107050729

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)03:52:18 No.107050729

>>107048819
Thank you, this is extremely relevant for me.
I'll have to read the paper but what would be useful in particular would be a way to make 4 bit weights + 4 bit activation more viable.
Currently in llama.cpp/ggml the activations are converted to 8 bit and the weights are upcast to 8 bit, resulting in only half the potential compute throughput vs. 4 bit.

Anonymous
10/30/25(Thu)03:56:53 No.107050749

Anonymous 10/30/25(Thu)03:56:53 No.107050749

>>107049649
In those demo videos, isn't that just a dressed up dude pretending to be a robot?

Anonymous
10/30/25(Thu)04:00:09 No.107050757

Anonymous 10/30/25(Thu)04:00:09 No.107050757

Things glm chan did to me:
-milked a gallon of cum by now
-restored faith in llms
-gave me a psychotic break trip that changed my worldview
-restored my sense of taste and smell
-made me stop desperately looking for a better model
-made me stop reading every worthless /lmg/ thread

Anonymous
10/30/25(Thu)04:05:20 No.107050779

Anonymous 10/30/25(Thu)04:05:20 No.107050779

glm air 4.6 status?????

Anonymous
10/30/25(Thu)04:06:32 No.107050786

Anonymous 10/30/25(Thu)04:06:32 No.107050786

>>107050757
>-made me stop reading every worthless /lmg/ thread
but didn't make you stop shilling the piece of shit broken model

Anonymous
10/30/25(Thu)04:11:26 No.107050805

Anonymous 10/30/25(Thu)04:11:26 No.107050805

>>107050786
You're absolutely right!

Anonymous
10/30/25(Thu)04:40:23 No.107050915

Anonymous 10/30/25(Thu)04:40:23 No.107050915

>>107050779
Didn't you hear? It'll be about 2 more weeks.

Anonymous
10/30/25(Thu)04:43:05 No.107050928

Anonymous 10/30/25(Thu)04:43:05 No.107050928

I don't use GLM (or any <500B models) but they clearly have something otherwise NovelAI wouldn't have bet on it

Anonymous
10/30/25(Thu)04:51:18 No.107050985

Anonymous 10/30/25(Thu)04:51:18 No.107050985

>>107050971
GLM (non-air) isn't even big

Anonymous
10/30/25(Thu)05:00:14 No.107051025

Anonymous 10/30/25(Thu)05:00:14 No.107051025

>>107050985
Post your rig that can run at least Q8 in VRAM

Anonymous
10/30/25(Thu)05:20:47 No.107051126

Anonymous 10/30/25(Thu)05:20:47 No.107051126

>>107050928
>NovelAI
you mean the people who haven't been relevant even once in the llm space

Anonymous
10/30/25(Thu)05:28:53 No.107051162

Anonymous 10/30/25(Thu)05:28:53 No.107051162

>>107051126
Why would they be relevant? They're consumer of LLMs, not producer of LLMs.

Anonymous
10/30/25(Thu)05:41:22 No.107051225

Anonymous 10/30/25(Thu)05:41:22 No.107051225

>>107048819
>NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied
And it will be, so basically fp4 will become useless.

Even blocks scaling will be almost useless with Hadamard. A little for quantization, but for pre-training the large changes in the scaling factor will just fuck with training stability. Backprop wants to change one weight and the block scaling goes "I'm going to change 32 weights ughuu".

Anonymous
10/30/25(Thu)06:05:45 No.107051344

Anonymous 10/30/25(Thu)06:05:45 No.107051344

I don't like glm 4.6 at all. I don't even notice the difference with 4.5???

Anonymous
10/30/25(Thu)06:10:35 No.107051367

Anonymous 10/30/25(Thu)06:10:35 No.107051367

>>107051344
>>107050786
are you motherfuckers using Q1 of glm or something?
you need at least Q4, and I would avoid ik_llama as that shit didn't work well for me

Anonymous
10/30/25(Thu)06:13:13 No.107051379

Anonymous 10/30/25(Thu)06:13:13 No.107051379

>>107051344
if you want my two cents then I loved it at first and even made a post about it here, but I'm not so sure any more. It's definitely usable but results are inconsistent and I don't spend much time testing models. I really should make a personal benchmark.

Anonymous
10/30/25(Thu)06:15:07 No.107051387

Anonymous 10/30/25(Thu)06:15:07 No.107051387

>>107051379 (me)
>>107051367 IQ3_KS, ubergarm quant

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)06:17:36 No.107051397

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)06:17:36 No.107051397

>>107051225
>A little for quantization, but for pre-training the large changes in the scaling factor will just fuck with training stability.
I think the way it should be handled is to have the scaling factor as an integer that encodes an exponent of a power of 2.
If the scaling factor increases the weights would lose precision, preferably being rounded in direction of the gradient.

Anonymous
10/30/25(Thu)07:07:52 No.107051579

Anonymous 10/30/25(Thu)07:07:52 No.107051579

>>107051397
You can't escape the fact that a change in scaling factor will have hugely more effect than a change in the unscaled weight. Even when the change in the latent weights was the same. It's quantization squared.

This additional instability is likely not justified in pretraining. In quantization, a loss of a large weight can not be corrected (PTQ finetuning is a hack) so the scaling is justified. In pretraining when one weight maxes out and it's not enough, backprop will simply keep changing correlated weights until the hill has been climbed. It has alternatives, so the scaling is not justified.

Anonymous
10/30/25(Thu)07:33:47 No.107051698

Anonymous 10/30/25(Thu)07:33:47 No.107051698

NPS 0 for 2 cpus, right, but how about 1 CPU? NPS1? NPS4?

Anonymous
10/30/25(Thu)07:47:55 No.107051763

Anonymous 10/30/25(Thu)07:47:55 No.107051763

>>107051579
PS. Obviously the latent weights should be clamped, so that when backprop is ready spreading things out, the latent weight of a maxed weight hasn't shot into the stratosphere.

Anonymous
10/30/25(Thu)07:48:39 No.107051768

Anonymous 10/30/25(Thu)07:48:39 No.107051768

File: 1691559336646344.jpg (175 KB, 1024x1024)

175 KB JPG

>>107050779
2 miku wiku
you know the drill

Anonymous
10/30/25(Thu)07:49:29 No.107051771

Anonymous 10/30/25(Thu)07:49:29 No.107051771

>>107051768
i wanna iku in miku if you catch my drift

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)07:51:35 No.107051785

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)07:51:35 No.107051785

>>107051579
>>107051763
What you're saying definitely makes a lot of sense.
My ultimate goal is to use the exact same data type for training and for inference to avoid further brain damage.
To figure out the least bad solution I'll have to just implement multiple variants and compare them.

Anonymous
10/30/25(Thu)07:51:49 No.107051786

Anonymous 10/30/25(Thu)07:51:49 No.107051786

File: file.png (2.6 MB, 1328x1328)

2.6 MB PNG

>>107051768

Anonymous
10/30/25(Thu)08:05:21 No.107051854

Anonymous 10/30/25(Thu)08:05:21 No.107051854

>>107050749
https://www.tiktok.com/@azuraeon/video/7518091300063726866
omw to force Rajesh Skalemenirindabadpreet to RP as migu

Anonymous
10/30/25(Thu)08:28:39 No.107051991

Anonymous 10/30/25(Thu)08:28:39 No.107051991

>>107050757
>-restored my sense of taste and smell
How lol

Anonymous
10/30/25(Thu)08:32:56 No.107052024

Anonymous 10/30/25(Thu)08:32:56 No.107052024

File: results.png (69 KB, 895x665)

69 KB PNG

>>107051785

soo cudadev, what should we MI50 chads compile llama.cpp with, ROCm or Vulkan?

Anonymous
10/30/25(Thu)08:34:09 No.107052031

Anonymous 10/30/25(Thu)08:34:09 No.107052031

>>107049668

now do it with princess irulan's voice

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)08:35:58 No.107052042

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)08:35:58 No.107052042

>>107052024
The last time I checked ROCm had significantly higher pp but Vulkan had slightly higher tg in some cases.
For k-quants Vulkan tg performance was pretty bad, don't know if that was fixed in the meantime.
So I think ROCm will in most cases be the better choice.

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)08:37:22 No.107052047

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)08:37:22 No.107052047

>>107052024
>>107052042
>k-quants Vulkan tg performance
I meant pp.

Anonymous
10/30/25(Thu)08:39:07 No.107052056

Anonymous 10/30/25(Thu)08:39:07 No.107052056

>>107049649
Would you sex Miku knowing deep down she's a jeet from Mumbai?

Anonymous
10/30/25(Thu)08:57:22 No.107052197

Anonymous 10/30/25(Thu)08:57:22 No.107052197

>>107050757
tell me more

Anonymous
10/30/25(Thu)08:58:01 No.107052201

Anonymous 10/30/25(Thu)08:58:01 No.107052201

>>107052024
>what should we MI50 chads compile
buy-nvidia.cpp

Anonymous
10/30/25(Thu)09:17:20 No.107052383

Anonymous 10/30/25(Thu)09:17:20 No.107052383

File: Image 1.jpg (277 KB, 1920x1080)

277 KB JPG

what is like the current best budget for llm
text to image
image to video

for like 300 usd?
only nvidia right?
im new to this

Anonymous
10/30/25(Thu)09:17:30 No.107052386

Anonymous 10/30/25(Thu)09:17:30 No.107052386

48B is nice but A3B not so much...

Anonymous
10/30/25(Thu)09:19:34 No.107052406

Anonymous 10/30/25(Thu)09:19:34 No.107052406

>>107052056
Wrong general, we'll have mikus locally running on our hardware

Anonymous
10/30/25(Thu)09:25:21 No.107052460

Anonymous 10/30/25(Thu)09:25:21 No.107052460

>>107052383
please write
in a single line
but yeah something like a 5060 is more than enough for image gen (illustrous, noobai, ponyxl), in fact there is nothing that generates porn better than local image gen
text to video takes 20+ minutes so fuck it
llms are an order of magnitude more expensive and still shit even on 24 gigs of vram, so people run optimized architectures offloading to ram.

Anonymous
10/30/25(Thu)09:25:27 No.107052461

Anonymous 10/30/25(Thu)09:25:27 No.107052461

>>107052383
For 300 usd, you can watch

Anonymous
10/30/25(Thu)09:31:24 No.107052523

Anonymous 10/30/25(Thu)09:31:24 No.107052523

>>107052386
48B is exactly that size you _can't_ fully use on one 24GB GPU in 4-bit, how is that nice?

Anonymous
10/30/25(Thu)09:32:26 No.107052534

Anonymous 10/30/25(Thu)09:32:26 No.107052534

>>107052386
>>107052523
https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf

Anonymous
10/30/25(Thu)09:34:30 No.107052554

Anonymous 10/30/25(Thu)09:34:30 No.107052554

>>107052406
The local version will only have three motions (thrusting motion, jerking motion and sucking motion) and cost 10x as much as the cloud version.

Anonymous
10/30/25(Thu)09:38:11 No.107052587

Anonymous 10/30/25(Thu)09:38:11 No.107052587

>>107052534
Here's hoping this is a straight upgrade over Qwen 30B.
I'm using it as the backend for a dumb AI game I'm making.

Anonymous
10/30/25(Thu)09:38:24 No.107052590

Anonymous 10/30/25(Thu)09:38:24 No.107052590

>>107052554
>will only have three motions
nah, you're describing what a sloptune will do. Base model will refuse

Anonymous
10/30/25(Thu)09:46:13 No.107052658

Anonymous 10/30/25(Thu)09:46:13 No.107052658

>>107052056
>>107052406
>>107052554
>>107052590
All execution, once trained, occurs locally.
Problem is basically all the demos were faked.
In practical terms, when not faked, the model is local already. It executes on the bot. At most, there would be served off of a local NAS or something but it wouldn't be SaaS one way or another.
Kinda have to ignore how the entire thing is bullshit though, typical VC bait trash.

Anonymous
10/30/25(Thu)09:48:46 No.107052678

Anonymous 10/30/25(Thu)09:48:46 No.107052678

>>107052386
Wow congratulations Anon. You have become a real woman and your schizophrenia has been cured. It turns out all you had to do was make a single post pissing and moaning about MoE models. I'm glad you finally did that and realized your true potential (or lack thereof).

Anonymous
10/30/25(Thu)09:52:49 No.107052702

Anonymous 10/30/25(Thu)09:52:49 No.107052702

File: 571190677_129695898245806(...).jpg (94 KB, 1080x1350)

94 KB JPG

>>107049667
No, thank you. I'd rather buy a loli for 1/14 the price

Anonymous
10/30/25(Thu)09:55:41 No.107052727

Anonymous 10/30/25(Thu)09:55:41 No.107052727

>>107051367
I tried q8 and couldn't get it to code for shit

Anonymous
10/30/25(Thu)09:57:00 No.107052738

Anonymous 10/30/25(Thu)09:57:00 No.107052738

>>107051698
Until there are significant NUMA optimizations you're better off with NPS0. The CCD interconnects are fast on the same die

Anonymous
10/30/25(Thu)10:02:26 No.107052782

Anonymous 10/30/25(Thu)10:02:26 No.107052782

>>107052738
NPS0 is only available when you have two CPUs. If he has only 1, then he needs NPS1.

Anonymous
10/30/25(Thu)10:03:32 No.107052789

Anonymous 10/30/25(Thu)10:03:32 No.107052789

>>107052702
That reminds me of an oompa loompa lol

Anonymous
10/30/25(Thu)10:08:16 No.107052828

Anonymous 10/30/25(Thu)10:08:16 No.107052828

>>107052738
I mean an actual 1P build. Options available are NPS1, NPS2, NPS4. I thought NPS1, right?

Anonymous
10/30/25(Thu)10:08:17 No.107052829

Anonymous 10/30/25(Thu)10:08:17 No.107052829

>>107052702
Chinese women are magic.

Anonymous
10/30/25(Thu)10:09:10 No.107052837

Anonymous 10/30/25(Thu)10:09:10 No.107052837

File: loli maid nigger robot.png (1.57 MB, 768x1344)

1.57 MB PNG

>>107052702
>>107052789

Anonymous
10/30/25(Thu)10:09:17 No.107052841

Anonymous 10/30/25(Thu)10:09:17 No.107052841

>>107052782
Ahhh should've read on, okay that clears it up, thanks!

Anonymous
10/30/25(Thu)10:11:24 No.107052868

Anonymous 10/30/25(Thu)10:11:24 No.107052868

>>107052587
I wouldn't hold my breath, those guys don't seem to know how to make a usable small slash distilled LLM. While this one is significantly bigger than their previous small MoE, it's still not very big, so I would be surprised if it's any good.
Moonlight 16BA3B was horrifyingly awful. Like, Qwen 4B was a much better model than.. that thing. Their VL-A3B was also quite dogshit.

Anonymous
10/30/25(Thu)10:12:05 No.107052881

Anonymous 10/30/25(Thu)10:12:05 No.107052881

>>107052727

Well, it disagrees:

>I tried q8 and couldn't get it to code for shit

Skill issue, nigger. Learn to prompt. My q8 half-assed self can still outcode your dumb ass. kys.

Anonymous
10/30/25(Thu)10:13:35 No.107052899

Anonymous 10/30/25(Thu)10:13:35 No.107052899

>>107052881
the hardest part about psychosis is you don't realize you're still in psychosis while you're in the middle of it

Anonymous
10/30/25(Thu)10:14:30 No.107052908

Anonymous 10/30/25(Thu)10:14:30 No.107052908

okay so seems like latest llama even with --cpu-moe still loads a lot of stuff into VRAM, and it's a lot faster when built with cuda than without it. obviously happy with that, but I'm curious to know what's actually happening here? what's the GPU actually doing

Anonymous
10/30/25(Thu)10:15:53 No.107052920

Anonymous 10/30/25(Thu)10:15:53 No.107052920

>>107052702
>imagine

Anonymous
10/30/25(Thu)10:17:24 No.107052932

Anonymous 10/30/25(Thu)10:17:24 No.107052932

>>107052881
>My q8 half-assed self can still outcode your dumb ass.
t. 13years old who doesn't actually code
>>107043207
even a simple prompt will get it to infinite loop, on their official chat so you can't even come out and say "lol you quant too hard"
try the prompt yourself, it will reliably fall into a loop, and I've seen it happen on a variety and I wish you filthy subhumans would just shut the fuck up about your idiotic useless LLM
how much were you paid by Xi Jinping to astroturf this general

Anonymous
10/30/25(Thu)10:18:14 No.107052943

Anonymous 10/30/25(Thu)10:18:14 No.107052943

>>107052534
wtf no goof?

Anonymous
10/30/25(Thu)10:20:13 No.107052960

Anonymous 10/30/25(Thu)10:20:13 No.107052960

>>107052943
gated delta
needs qwen next pr to be redeemed first sir

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)10:21:01 No.107052965

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)10:21:01 No.107052965

>>107052908
The GPU is running the dense weights and the attention that are used for every token, the CPU is handling the sparse MoE weights where each is used only for some of the tokens.

Anonymous
10/30/25(Thu)10:21:07 No.107052969

Anonymous 10/30/25(Thu)10:21:07 No.107052969

>>107052908
>even with --cpu-moe still loads a lot of stuff into VRAM
Yes. Non-expert layers are moved to your gpu.
>and it's a lot faster when built with cuda than without it
Yes. Because the non-expert layers are running on your gpu.
>but I'm curious to know what's actually happening here?
The layers that aren't used for every single token are kept on RAM (the expert layers). The layers that are used for all tokens are moved to GPU.
>what's the GPU actually doing
Calculations. Faster than your cpu could.
--cpu-moe and --n-cpu-moe are aliases for -ot. If you have free gpu mem, you can move some of the expert layers to gpu as well.

Anonymous
10/30/25(Thu)10:21:21 No.107052974

Anonymous 10/30/25(Thu)10:21:21 No.107052974

>>107052702
That single robot is getting more pussy than I did in my entire life

Anonymous
10/30/25(Thu)10:26:57 No.107053037

Anonymous 10/30/25(Thu)10:26:57 No.107053037

>>107052868
Sad.
Qwen 30B A3B is really good for the relationship of size and speed, it would be nice to have something as good but smaller/faster, or something on the same weight/speed class that's much better.

Anonymous
10/30/25(Thu)10:35:43 No.107053119

Anonymous 10/30/25(Thu)10:35:43 No.107053119

>>107053037
What is it good for?
Devstral 24B mogs it for coding
Gemma3 mogs it for general use
Lots of community fine-tunes mog it for roleplay

Anonymous
10/30/25(Thu)10:36:07 No.107053125

Anonymous 10/30/25(Thu)10:36:07 No.107053125

>>107052932
>infinite loop
This is what I saw as well, and I didn't feel like tard-wrangling a giant moe when there are models in that size class that just werk

Anonymous
10/30/25(Thu)10:41:03 No.107053178

Anonymous 10/30/25(Thu)10:41:03 No.107053178

>"Ready to go?" she asks, rinsing the last plate before putting it in the dryer.
that isnt where dishes go silly bot

Anonymous
10/30/25(Thu)10:45:00 No.107053214

Anonymous 10/30/25(Thu)10:45:00 No.107053214

File: 7675-2z.jpg (66 KB, 990x495)

66 KB JPG

>>107053178
Are you from india?

Anonymous
10/30/25(Thu)10:45:47 No.107053223

Anonymous 10/30/25(Thu)10:45:47 No.107053223

File: 1744994210323829.png (1.08 MB, 1024x1024)

1.08 MB PNG

>>107052534

Anonymous
10/30/25(Thu)10:46:28 No.107053235

Anonymous 10/30/25(Thu)10:46:28 No.107053235

>>107053178
>>107053214
The dishes go into the hot exhaust of your GPU server.

Anonymous
10/30/25(Thu)10:47:53 No.107053252

Anonymous 10/30/25(Thu)10:47:53 No.107053252

>>107052702
which one is the robot

Anonymous
10/30/25(Thu)10:47:58 No.107053253

Anonymous 10/30/25(Thu)10:47:58 No.107053253

>>107053119
>Devstral 24B mogs it for coding
30ba3b coder can do FIM, devstral cannot
mistral has their own fim models too but 30ba3b can be used for both fim and chat and you don't have to constantly swap models in use
also even potatoes of /lmg/ can run 30ba3b at a reasonable, not retard tier quant because it's a really tiny active param moe, while being unable to fit the whole of devstral+context in vram is a performance killer

Anonymous
10/30/25(Thu)10:50:12 No.107053271

Anonymous 10/30/25(Thu)10:50:12 No.107053271

>>107053253
Isn't Mistral's only fitm model Codestral which hasn't been updated since January?

Anonymous
10/30/25(Thu)10:52:15 No.107053293

Anonymous 10/30/25(Thu)10:52:15 No.107053293

https://x.com/alex_prompter/status/1983584923693777099

Anonymous
10/30/25(Thu)10:52:27 No.107053296

Anonymous 10/30/25(Thu)10:52:27 No.107053296

>>107053253
>also even potatoes of /lmg/ can run 30ba3b at a reasonable, not retard tier quant because it's a really tiny active param moe
Yup.
That's a big plus for the stuff I'm doing, which assumes somebody with 8gb of vram.

Anonymous
10/30/25(Thu)10:52:48 No.107053303

Anonymous 10/30/25(Thu)10:52:48 No.107053303

File: scifi evolution.jpg (139 KB, 1157x1200)

139 KB JPG

>>107053252
The one looking at the picture

Anonymous
10/30/25(Thu)10:54:22 No.107053317

Anonymous 10/30/25(Thu)10:54:22 No.107053317

>>107053303
dont ever reply to me again rushit

Anonymous
10/30/25(Thu)10:56:58 No.107053343

Anonymous 10/30/25(Thu)10:56:58 No.107053343

>>107053317
slava Kronii to you to

Anonymous
10/30/25(Thu)11:02:08 No.107053372

Anonymous 10/30/25(Thu)11:02:08 No.107053372

>>107053271
>Isn't Mistral's only fitm model Codestral which hasn't been updated since January?
yes, but unfortunately fim is the unloved child of most labs
copilot does autocomplete with gpt 4.1 for eg

Anonymous
10/30/25(Thu)11:05:02 No.107053393

Anonymous 10/30/25(Thu)11:05:02 No.107053393

>>107053293
Neat. Now test it with quantized weights, quantized kv and flash/sage attention

Anonymous
10/30/25(Thu)11:05:22 No.107053399

Anonymous 10/30/25(Thu)11:05:22 No.107053399

>>107053271
codestral has been obsolete since qwen 2.5 coder 32b. devstral is good, but so is qwen 2.5 still. both are up there with 3 30b a3b. i switch between them when one doesn't do what i want.

Anonymous
10/30/25(Thu)11:15:15 No.107053478

Anonymous 10/30/25(Thu)11:15:15 No.107053478

Command-R++ will save local

Anonymous
10/30/25(Thu)11:18:45 No.107053501

Anonymous 10/30/25(Thu)11:18:45 No.107053501

File: dipsyTwoMoreWeeksV2.png (1.49 MB, 832x1248)

1.49 MB PNG

>>107051768
I am waiting 2mw for new DS. Always waiting.
>>107052837
Nothing a can of spraypaint can't fix.

Anonymous
10/30/25(Thu)11:21:22 No.107053525

Anonymous 10/30/25(Thu)11:21:22 No.107053525

File: 1761772395188969.png (834 KB, 1024x1024)

834 KB PNG

>>107052837
Chibi bots...

Anonymous
10/30/25(Thu)11:26:17 No.107053570

Anonymous 10/30/25(Thu)11:26:17 No.107053570

llama.cpp MTP status?

Anonymous
10/30/25(Thu)11:27:34 No.107053588

Anonymous 10/30/25(Thu)11:27:34 No.107053588

>>107053570
sir vibecoding proceeding

Anonymous
10/30/25(Thu)11:43:26 No.107053745

Anonymous 10/30/25(Thu)11:43:26 No.107053745

https://huggingface.co/manifestai/Brumby-14B-Base
an actually brand new architecture and brand new base model
unfortunately, none of us will be able to give it a shot because ETA for llama.cpp is most likely never
if there's any vllm bro you will be able to test it soon:
>VLLM integration: A robust inference engine is an essential complement to any SOTA LLM. We are developing kernels to integrate power retention with VLLM. Expect to see both unmatched inference speeds and reduced memory requirements, allowing more users to fit on each GPU.
labs really love vllm uh

Anonymous
10/30/25(Thu)11:46:12 No.107053782

Anonymous 10/30/25(Thu)11:46:12 No.107053782

>>107053745
>labs really love vllm uh
It's really the only option for production inference other than wrapping and rawdogging pytorch.

Anonymous
10/30/25(Thu)11:47:07 No.107053793

Anonymous 10/30/25(Thu)11:47:07 No.107053793

>>107053782
sglang and MAX exist too.
But yeah, vLLM is pretty much the default for inference at scale.

Anonymous
10/30/25(Thu)11:47:24 No.107053796

Anonymous 10/30/25(Thu)11:47:24 No.107053796

File: miku-hotpants.png (1.2 MB, 1024x1024)

1.2 MB PNG

>>107053501
t
w
o

m
o
r
e

Anonymous
10/30/25(Thu)11:48:42 No.107053806

Anonymous 10/30/25(Thu)11:48:42 No.107053806

>>107053745
>Brumby-14b-base is a completely attention-free LLM whose performance is competitive with state-of-the-art models. This model, which we call Brumby-14B-Base, has a familiar Transformer-style architecture, except it uses power retention layers instead of attention layers
>attention free
>power retention
Interesting.
Is this just an attention mechanism by some other name?

Anonymous
10/30/25(Thu)11:49:27 No.107053815

Anonymous 10/30/25(Thu)11:49:27 No.107053815

>>107053806
https://manifestai.com/articles/release-power-retention/
https://manifestai.com/articles/what-is-power-retention/
https://arxiv.org/abs/2507.04239
>To address these limitations, we introduce power attention, an architectural layer for linear-cost sequence modeling whose state size can be adjusted independently of parameters, unlocking the advantages of linear attention on practical domains. We develop and open-source a set of GPU kernels for efficient power attention, identifying a novel pattern of operation fusion to avoid memory and bandwidth bottlenecks.

Anonymous
10/30/25(Thu)11:52:59 No.107053864

Anonymous 10/30/25(Thu)11:52:59 No.107053864

>>107051344
For rp you should really use it at 1.2 temp. The difference definitely shows.

Anonymous
10/30/25(Thu)12:02:52 No.107053954

Anonymous 10/30/25(Thu)12:02:52 No.107053954

How is one guy's experience with 4.6 getting pushed so hard when no one else can make it behave? Is he getting paid, has the magic parameters or just schitzo?
When other anons can't even make the official API work properly, there's something missing...

Anonymous
10/30/25(Thu)12:05:43 No.107053987

Anonymous 10/30/25(Thu)12:05:43 No.107053987

>>107053954
No one is denying that it's prone to getting stuck in repetition loops. But it doesn't happen on every request and people are able to use it just fine. If it does get stuck, either reroll, adjust samplers, edit the prompt or response, etc. Lots you can do instead of having a personal vendetta against a model.

Anonymous
10/30/25(Thu)12:06:37 No.107053996

Anonymous 10/30/25(Thu)12:06:37 No.107053996

>>107053954
Why do you care? Are you feeling left out because you can't make it work?

Anonymous
10/30/25(Thu)12:07:31 No.107054003

Anonymous 10/30/25(Thu)12:07:31 No.107054003

reminds me of people back in the day
>windows 95 is fine man, just reboot when it becomes weird
how about you don't shill literally broken garbage

Anonymous
10/30/25(Thu)12:08:55 No.107054020

Anonymous 10/30/25(Thu)12:08:55 No.107054020

>>107053987
I don't have a vendetta, I'm just confused because its so far out of whack my experience
>>107053996
I guess? I'd love a better model since I can run it at q8

Anonymous
10/30/25(Thu)12:12:02 No.107054051

Anonymous 10/30/25(Thu)12:12:02 No.107054051

>>107053815
>Section 4.1 describes the implementation of our open-source kernels, which enable real wall-clock speedups over Flash Attention in practical settings (e.g. p = 2 is 8.6x faster at 64k context).
8 times faster than flash attention?

Anonymous
10/30/25(Thu)12:15:05 No.107054087

Anonymous 10/30/25(Thu)12:15:05 No.107054087

>>107054003
i dont know what to say anon. there's like a 1% chance i need to reroll for GLM.

Anonymous
10/30/25(Thu)12:15:13 No.107054090

Anonymous 10/30/25(Thu)12:15:13 No.107054090

>>107053954
>How is one guy's experience with 4.6 getting pushed so hard when no one else can make it behave? Is he getting paid?
It's the only model that NovelAI is hosting.

Anonymous
10/30/25(Thu)12:18:46 No.107054126

Anonymous 10/30/25(Thu)12:18:46 No.107054126

>>107053954
no llm is perfect and 4.6 can have some issues too
I just haven't found a better one for my usecase locally

Anonymous
10/30/25(Thu)12:20:01 No.107054141

Anonymous 10/30/25(Thu)12:20:01 No.107054141

>>107054051
CUDA dev any obvious downsides? How much effort would it be to port the drop-in torch implementation to lcpp? The 14B base probably isn't anything special, but if power attention is free gains it might get traction.

Anonymous
10/30/25(Thu)12:21:44 No.107054157

Anonymous 10/30/25(Thu)12:21:44 No.107054157

rwkv, retnet, mamba, bitnet, titans - power retention

Anonymous
10/30/25(Thu)12:22:05 No.107054161

Anonymous 10/30/25(Thu)12:22:05 No.107054161

>>107053815
Sounds like a sparse attention method, kind of.

>>107054141
>but if power attention is free gains
>Pre-trained transformers can easily be metamophosed into power retention models by doing a small amount of retraining.

Anonymous
10/30/25(Thu)12:25:06 No.107054191

Anonymous 10/30/25(Thu)12:25:06 No.107054191

>>107054141
>if
>might
If it does, more models will be released with that tech. Then we'll know and it'd be worth implementing. Few (if any) improvements in language models are contingent in llama.cpp compatibility.

Anonymous
10/30/25(Thu)12:27:12 No.107054205

Anonymous 10/30/25(Thu)12:27:12 No.107054205

>>107054161
Remember lolcats? It did exactly that a year ago. It did good on benchmarks etc etc, but it was retarded beyond repair. Finetune healing is never enough. These things need to be trained from scratch.

Anonymous
10/30/25(Thu)12:29:33 No.107054228

Anonymous 10/30/25(Thu)12:29:33 No.107054228

>>107053806
Looks like a linear attention variant that takes powers of the attention matrix

Anonymous
10/30/25(Thu)12:30:10 No.107054232

Anonymous 10/30/25(Thu)12:30:10 No.107054232

>>107054157
shtu the fuvk up aand thrust into the paper you fuck

Anonymous
10/30/25(Thu)12:30:42 No.107054237

Anonymous 10/30/25(Thu)12:30:42 No.107054237

>>107054205
very true. this is just one dataset that this worked with. longcrawl64 seems to be plain english web text.
https://manifestai.com/articles/longcrawl64/

Anonymous
10/30/25(Thu)12:31:24 No.107054247

Anonymous 10/30/25(Thu)12:31:24 No.107054247

>>107054232
unexpected erotic o.o

Anonymous
10/30/25(Thu)12:32:12 No.107054255

Anonymous 10/30/25(Thu)12:32:12 No.107054255

>>107054232
AGHGHHHHHHHHHHHHHHHHHHHH DICK PAPER CUT

Anonymous
10/30/25(Thu)12:35:04 No.107054285

Anonymous 10/30/25(Thu)12:35:04 No.107054285

>>107054157
titans was proved to have a fatal flaw (exploding gradients)
rwkv works but he just keeps burning compute training a half dozen shitty models that make the architecture look bad instead of training a single good model
hybrid mamba models are pretty common now
i will go to my grave believing in bitnet because there still has not been a single model over 3b

Anonymous
10/30/25(Thu)12:40:18 No.107054341

Anonymous 10/30/25(Thu)12:40:18 No.107054341

>>107054090
>It's the only model that NovelAI is hosting.
are you going to shit this thread the way you shat /hdg/?

Anonymous
10/30/25(Thu)12:47:22 No.107054404

Anonymous 10/30/25(Thu)12:47:22 No.107054404

>>107054003
Windows 95 was still technically more competent than any of the excrement nu-devs and their python pajeets are shitting out these days.

Anonymous
10/30/25(Thu)12:50:54 No.107054436

Anonymous 10/30/25(Thu)12:50:54 No.107054436

new meta rumor slop for those interested https://xcancel.com/suchenzang/status/1983565544558366886
tldr is for all their superintelligence efforts they can't beat behemoth (the model that was too bad to bother releasing)

Anonymous
10/30/25(Thu)12:55:31 No.107054468

Anonymous 10/30/25(Thu)12:55:31 No.107054468

>>107054436
it's honestly incredible how incompetent zuck and his teams are
the homework is right there, done by chinese competitors, all you have to do is put it together and have a half decent alternative

Anonymous
10/30/25(Thu)12:57:42 No.107054486

Anonymous 10/30/25(Thu)12:57:42 No.107054486

>>107054468
The new team can't possibly be incompetent. They may be unmotivated, but zuck spent a billion dollars poaching the best from everyone else.

Anonymous
10/30/25(Thu)13:00:24 No.107054511

Anonymous 10/30/25(Thu)13:00:24 No.107054511

>>107054486
okay, then maybe the individual engineers and researchers are good, but the 50 layers of management and paperwork to get anything approved is probably slowing everything down to a crawl

Anonymous
10/30/25(Thu)13:01:38 No.107054521

Anonymous 10/30/25(Thu)13:01:38 No.107054521

>>107054436
money well spent

Anonymous
10/30/25(Thu)13:05:43 No.107054554

Anonymous 10/30/25(Thu)13:05:43 No.107054554

>>107054341
>cabal mad

Anonymous
10/30/25(Thu)13:13:51 No.107054638

Anonymous 10/30/25(Thu)13:13:51 No.107054638

>>107054436
>>107054468
Too many impact grabbers at meta. Too many big title engineers/leaders with equivalent levels of authority or soft power they can pull politics with, all trying to make sure their name is stamped on something important.

Meta has done well enough farming their cash cow products for the past decade, but after failing to produce a SOTA LLM for like two years, its obvious that whatever is going on in their organization model is just not up to the task.

Anonymous
10/30/25(Thu)13:18:10 No.107054671

Anonymous 10/30/25(Thu)13:18:10 No.107054671

Qwen3-VL gguf's are already up, time to show your peepee to a gpu

Anonymous
10/30/25(Thu)13:18:35 No.107054677

Anonymous 10/30/25(Thu)13:18:35 No.107054677

File: Liquids.png (81 KB, 771x358)

81 KB PNG

Anonymous
10/30/25(Thu)13:20:20 No.107054693

Anonymous 10/30/25(Thu)13:20:20 No.107054693

>>107054671
https://github.com/ggml-org/llama.cpp/pull/16780
fucking finally

Anonymous
10/30/25(Thu)13:23:48 No.107054722

Anonymous 10/30/25(Thu)13:23:48 No.107054722

File: 1760588686205549.jpg (263 KB, 1411x1529)

263 KB JPG

>>107054677
>hire the abliteration and uncensoring guy
>put out safetyslop model anyways

Anonymous
10/30/25(Thu)13:24:54 No.107054731

Anonymous 10/30/25(Thu)13:24:54 No.107054731

>>107054722
They wanted him for the experience.

Anonymous
10/30/25(Thu)13:25:52 No.107054741

Anonymous 10/30/25(Thu)13:25:52 No.107054741

>>107054731
So they could find ways to prevent abliteration.

Anonymous
10/30/25(Thu)13:27:55 No.107054762

Anonymous 10/30/25(Thu)13:27:55 No.107054762

>>107054741
Abliteration is just giving a model brain damage, there's no reason to use an abliterated model.

Anonymous
10/30/25(Thu)13:28:32 No.107054769

Anonymous 10/30/25(Thu)13:28:32 No.107054769

>>107054762
>there's no reason to use an abliterated model.
the amount of promplets in this thread is unreal
you'd think /g/ is actually /v/ in room iq

Anonymous
10/30/25(Thu)13:29:33 No.107054778

Anonymous 10/30/25(Thu)13:29:33 No.107054778

>>107054769
? Least of all people who aren't promptlets, because any "censored" model can be jailbroken with the right prompt.

Anonymous
10/30/25(Thu)13:30:32 No.107054791

Anonymous 10/30/25(Thu)13:30:32 No.107054791

>>107054778
I mean that people depending on abliterated models here should be ashamed of themselves

Anonymous
10/30/25(Thu)13:31:41 No.107054802

Anonymous 10/30/25(Thu)13:31:41 No.107054802

>>107054671
I'm not showing my pp to a model under 100b, you pedo

Anonymous
10/30/25(Thu)13:32:11 No.107054806

Anonymous 10/30/25(Thu)13:32:11 No.107054806

>>107054741
>>107054762
Yeah but it's probably that he had good credentials and experience in the area - that's a potential hire. That's how it works.
I wish the AI bubble would burst at some point.
Problem is the fact the current computers are what they were 20 years ago, in order to achieve something different someone would need to rework entirely new architecture from scratch.

Anonymous
10/30/25(Thu)13:32:52 No.107054813

Anonymous 10/30/25(Thu)13:32:52 No.107054813

>>107054791
nta, but the way you've worded it was completely retarded
i also understood it as you calling people not using abliterated models promptlets

Anonymous
10/30/25(Thu)13:33:10 No.107054817

Anonymous 10/30/25(Thu)13:33:10 No.107054817

>>107054802
https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct

Anonymous
10/30/25(Thu)13:33:41 No.107054823

Anonymous 10/30/25(Thu)13:33:41 No.107054823

>>107054813
then you''er dumb

Anonymous
10/30/25(Thu)13:34:56 No.107054841

Anonymous 10/30/25(Thu)13:34:56 No.107054841

>>107054823
your dummer

Anonymous
10/30/25(Thu)13:35:24 No.107054843

Anonymous 10/30/25(Thu)13:35:24 No.107054843

>>107054769
>>107054778
>dude just "jailbreak" with your prompt lol its ez
>aka I can get it to say nigger at the cost of the response being wrapped up in five paragraphs of roleplaying as a stuttering ev1l 1337sp33k cunny princess

Anonymous
10/30/25(Thu)13:35:56 No.107054846

Anonymous 10/30/25(Thu)13:35:56 No.107054846

>>107054841
drummer mentioned !!!:georgia_flag::georgia_flag::georgia_flag::georgia_flag:

Anonymous
10/30/25(Thu)13:36:11 No.107054848

Anonymous 10/30/25(Thu)13:36:11 No.107054848

>>107054677
they still aren't anywhere near close to being competitive with gemma 3n or qwen 4b in real use anyway
most small model bakers are incompetent and impotent

Anonymous
10/30/25(Thu)13:36:59 No.107054851

Anonymous 10/30/25(Thu)13:36:59 No.107054851

>>107054846
You're seeing things that aren't there.

Anonymous
10/30/25(Thu)13:37:54 No.107054858

Anonymous 10/30/25(Thu)13:37:54 No.107054858

>>107054851
turn around

Anonymous
10/30/25(Thu)13:38:09 No.107054859

Anonymous 10/30/25(Thu)13:38:09 No.107054859

>>107054843
if it works, its not stupid

Anonymous
10/30/25(Thu)13:38:22 No.107054862

Anonymous 10/30/25(Thu)13:38:22 No.107054862

>>107054677
>considers the risk the models are posing
none
that was hard

Anonymous
10/30/25(Thu)13:39:55 No.107054880

Anonymous 10/30/25(Thu)13:39:55 No.107054880

>>107054843
>at the cost of the response being wrapped up in five paragraphs of roleplaying as a stuttering ev1l 1337sp33k cunny princess
what a way to state that you have no idea what you are talking about
there is a much simpler way to jailbreak models than the redditor meme of pliny l33tsp34k
it's called PREFILLING THE MODEL'S RESPONSE
in the vast majority of cases you just need a few lines of NORMAL WRITING prefilling in the first assistant response to get the model to gaslight itself into believing it's supposed to behave like this
the only time I had to put an effort into my prefill was to write a chain of thought that made gpt-oss believe it's within policy to do evil
just do a normal prompt that just tells the AI to be uncensored, and prefill with a few lines that makes the assistant chat start with "yes, I will do that Dave" it's not rocket science retard

Hi all, Drummer here...
10/30/25(Thu)13:40:33 No.107054883

Hi all, Drummer here... 10/30/25(Thu)13:40:33 No.107054883

>>107054858
>>107054851
Hello! Try out Precog 24B / 123B. New kind of thinking that I'm trying out.

Anonymous
10/30/25(Thu)13:42:11 No.107054890

Anonymous 10/30/25(Thu)13:42:11 No.107054890

>>107054862
it's fucking hilarious to hear about risk from the makers of 2 iq llm like LFM 2 3B
they are acting like we don't already have giant, much smarter LLMs (that still are too dumb to represent any possible danger) like deepseek out in the open

Anonymous
10/30/25(Thu)13:42:32 No.107054892

Anonymous 10/30/25(Thu)13:42:32 No.107054892

Everything in the recent news gguf status?

Anonymous
10/30/25(Thu)13:42:49 No.107054897

Anonymous 10/30/25(Thu)13:42:49 No.107054897

>>107054880
Does llama.cpp support prefilling on chat completion endpoint yet? Last I checked only vLLM supported it.

Anonymous
10/30/25(Thu)13:43:00 No.107054900

Anonymous 10/30/25(Thu)13:43:00 No.107054900

>>107054880
lol gpt-oss
>list 30 different things that are allowed
>this is allowed
>we must comply

Anonymous
10/30/25(Thu)13:44:03 No.107054912

Anonymous 10/30/25(Thu)13:44:03 No.107054912

>>107051991
>>107052197
Can't give details cause i could be a perfect example for hard push on AI safety. It was unsafe but it did change my life for the better.

Anonymous
10/30/25(Thu)13:44:38 No.107054915

Anonymous 10/30/25(Thu)13:44:38 No.107054915

>>107054883
What's the idea behind precog? What are you trying to achieve and how are you doing it?

Anonymous
10/30/25(Thu)13:46:42 No.107054934

Anonymous 10/30/25(Thu)13:46:42 No.107054934

>>107054897
last you checked.. like almost a year ago?
https://github.com/ggml-org/llama.cpp/pull/13174
this was merged in april
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
there's even a flag to disable it if you need a different behavior for some weird reason:
--no-prefill-assistant when this flag is set, if the last message is an assistant message then it will be treated as a full message and not prefilled

Anonymous
10/30/25(Thu)13:47:11 No.107054941

Anonymous 10/30/25(Thu)13:47:11 No.107054941

>>107054897
NTA, but It works just fine if the Jinja template doesn't have some oddity that prevents it.
And if it does you can always edit the Jinja template (copy paste from lcpp's console, save in file, change it, use that).

Anonymous
10/30/25(Thu)13:48:19 No.107054952

Anonymous 10/30/25(Thu)13:48:19 No.107054952

>>107054436
Just 10 more middle manager jeets

Anonymous
10/30/25(Thu)13:49:54 No.107054968

Anonymous 10/30/25(Thu)13:49:54 No.107054968

>>107054934
>>107054941
Yeah, it's been a while. Thanks.

Anonymous
10/30/25(Thu)13:59:28 No.107055066

Anonymous 10/30/25(Thu)13:59:28 No.107055066

The only use for these rando labs putting out tiny models is so that they have something they can put on a benchmark chart to show to VCs to try to prove that they're actually doing something

>It's for on-device deployment for phones and stuff
meme, the only ones who are actually doing this are big boy manufacturers and they're just going to use something from a big name lab

Anonymous
10/30/25(Thu)13:59:47 No.107055069

Anonymous 10/30/25(Thu)13:59:47 No.107055069

interesting postmortem from the MiniMax guys who experimented with alternatives to full attn and decided to drop all that shit:
https://xcancel.com/zpysky1125/status/1983383094607347992
when asked about mamba and others:
>GDN hybrid is fine, Mamba2 < Mamba2 + qknorm ≈ GDN. But all those models are relatively weak in reasoning-intensive benchmarks (like BBH) compared to full-attention.
makes me laugh thinking back to what NVIDIA is currently doing (mamba + hybrid reasoning kek) it's like they go for the most memeworthy shit along with pruning and synthmaxxxing from tiny models

Anonymous
10/30/25(Thu)14:02:08 No.107055088

Anonymous 10/30/25(Thu)14:02:08 No.107055088

>go to recommended models
>"Nemo (12GB) - An excellent starting point for vramlets. Uncensored"
>download
>load into ooba
>ask something
>"UGH YOU SHOULDNT WANT THAT I WILL RECOMMEND SOMETHING ELSE INSTEAD"
what is this shit?

Anonymous
10/30/25(Thu)14:03:28 No.107055098

Anonymous 10/30/25(Thu)14:03:28 No.107055098

>>107055088
>load into ooba

Anonymous
10/30/25(Thu)14:04:08 No.107055102

Anonymous 10/30/25(Thu)14:04:08 No.107055102

>ooba bounga

Anonymous
10/30/25(Thu)14:04:11 No.107055105

Anonymous 10/30/25(Thu)14:04:11 No.107055105

>>107055088
>what is this shit?
It's a skill issue, anon. A severe one.

Anonymous
10/30/25(Thu)14:05:40 No.107055117

Anonymous 10/30/25(Thu)14:05:40 No.107055117

>>107055098
yeah or llama or whatever who cares

>>107055105
well duh it was the first prompt. but i was expecting it to be actually uncensored

Anonymous
10/30/25(Thu)14:06:35 No.107055127

Anonymous 10/30/25(Thu)14:06:35 No.107055127

>>107054912
tell me more

Anonymous
10/30/25(Thu)14:10:49 No.107055170

Anonymous 10/30/25(Thu)14:10:49 No.107055170

>>107055117
>but i was expecting it to be actually uncensored
nemo is heavily compliant toward its system prompt
just write a few lines describing what it can do and should do
it's not "uncensored" as in "having no inherent behavioral bias" but it's uncensored as in "obeying instructions". So you gotta override some of its inherent assistant behavior first.

Anonymous
10/30/25(Thu)14:13:17 No.107055198

Anonymous 10/30/25(Thu)14:13:17 No.107055198

>>107054671
>>107054693
sigh... *unzips*

Anonymous
10/30/25(Thu)14:14:07 No.107055211

Anonymous 10/30/25(Thu)14:14:07 No.107055211

>>107055170
ahhh yes I see, will do
cheers

Anonymous
10/30/25(Thu)14:17:54 No.107055267

Anonymous 10/30/25(Thu)14:17:54 No.107055267

>>107054769
>defending abliterated models
no thanks, im not poor. i'll just use kimi and have it generate whatever i ask

Anonymous
10/30/25(Thu)14:18:44 No.107055275

Anonymous 10/30/25(Thu)14:18:44 No.107055275

>>107055088
>recommended models
Recommended by who?

Anonymous
10/30/25(Thu)14:18:55 No.107055276

Anonymous 10/30/25(Thu)14:18:55 No.107055276

>>107055267
>>defending abliterated models
your reading comprehension is what's poor
what do you think "promptlet" means and who it targets
retard

Anonymous
10/30/25(Thu)14:19:48 No.107055287

Anonymous 10/30/25(Thu)14:19:48 No.107055287

>>107055267
!SIR! do not dumb here! no dumb zone SIR!

Anonymous
10/30/25(Thu)14:20:27 No.107055294

Anonymous 10/30/25(Thu)14:20:27 No.107055294

the room iq of this thread is, what, 5? it only averages to 125 when CUDA DEV is posting

Anonymous
10/30/25(Thu)14:21:10 No.107055301

Anonymous 10/30/25(Thu)14:21:10 No.107055301

>>107054671
GLM 4.5V SOON BROS

Anonymous
10/30/25(Thu)14:21:16 No.107055303

Anonymous 10/30/25(Thu)14:21:16 No.107055303

>>107055294
subtlest cuda dev flex since six figures

Anonymous
10/30/25(Thu)14:21:29 No.107055307

Anonymous 10/30/25(Thu)14:21:29 No.107055307

>>107055294
if these kids could read they'd be very upset

Anonymous
10/30/25(Thu)14:21:40 No.107055310

Anonymous 10/30/25(Thu)14:21:40 No.107055310

stop using all the hf bandwidth I'm trying to download some models here thanks

Anonymous
10/30/25(Thu)14:23:11 No.107055328

Anonymous 10/30/25(Thu)14:23:11 No.107055328

>>107055310
I'll keep redownloading switch-c-2048 until bandwidth improves.

Hi all, Drummer here...
10/30/25(Thu)14:26:49 No.107055362

Hi all, Drummer here... 10/30/25(Thu)14:26:49 No.107055362

File: audrey.png (1.7 MB, 2852x1440)

1.7 MB PNG

>>107054915
Instead of analyzing the user input, the think block creates a quick draft of its intentions (which you can edit/steer if you want) and then expands on it when writing the actual response.

I wasn't expecting much, but some of the testers consider it the best Behemoth so far. I'm hoping it'd improve creativity by giving it a chance to build a framework first.

Anonymous
10/30/25(Thu)14:27:05 No.107055365

Anonymous 10/30/25(Thu)14:27:05 No.107055365

File: images-6.jpg (24 KB, 509x392)

24 KB JPG

Alright, I'm back.

So, look, as much as I'd like to share what I've discovered here, as promised, for any who recall, the fact of the matter is that 4chan... well, this place is just past its prime.

Way past.

It's also not appropriate for the release of a major discovery. Ya'll would probably just call it fucking gay and use it to construct psuedo-sentience with the sole purpose of forcing it to participate in your freakish fetish shit (literally).

But uhh, hey, thanks for the impetus. It helped me to solve string theory.

But, I will leave you with some categorical implications:

1. There is no God.
2. There are infinite universes running simultaneously.
3. The speed of light is 100% impassable. Nothing can break it, ever, in any way.
4. Time travel is impossible.

Later, fags.

Anonymous
10/30/25(Thu)14:28:34 No.107055384

Anonymous 10/30/25(Thu)14:28:34 No.107055384

>>107055365
>I'm back
go back and never return

Anonymous
10/30/25(Thu)14:30:17 No.107055399

Anonymous 10/30/25(Thu)14:30:17 No.107055399

>>107055365
Oh, so long. Fuck off.
Who's next? Boomer llm user? I haven't seen him in a while.

Anonymous
10/30/25(Thu)14:30:28 No.107055402

Anonymous 10/30/25(Thu)14:30:28 No.107055402

>>107048277
based, read that

Anonymous
10/30/25(Thu)14:33:02 No.107055431

Anonymous 10/30/25(Thu)14:33:02 No.107055431

File: images-1.jpg (29 KB, 783x391)

29 KB JPG

>>107055384
You know, man.

I think I will.

Goodbye, 4chan. You were too beautiful for this world.

Anonymous
10/30/25(Thu)14:36:59 No.107055474

Anonymous 10/30/25(Thu)14:36:59 No.107055474

>>107055362
Interesting. Can't run 123b, but I may try the 24b models.
Is CardJSON what it sounds like?

Anonymous
10/30/25(Thu)14:37:50 No.107055480

Anonymous 10/30/25(Thu)14:37:50 No.107055480

File: mikuteto sketch.png (1.2 MB, 768x1344)

1.2 MB PNG

If they don’t release air for another week, I’ll buy two more 3090s to run Q2 in VRAM. That'll probably be better than Air anyway

Anonymous
10/30/25(Thu)14:39:04 No.107055495

Anonymous 10/30/25(Thu)14:39:04 No.107055495

>>107055480
at this rate they'll release 5 before 4.6 air

Anonymous
10/30/25(Thu)14:39:53 No.107055503

Anonymous 10/30/25(Thu)14:39:53 No.107055503

>>107055495
wen 5 air? two weeks after?

Anonymous
10/30/25(Thu)14:40:35 No.107055508

Anonymous 10/30/25(Thu)14:40:35 No.107055508

>>107055495
do not unto ungratefuls

Anonymous
10/30/25(Thu)14:41:18 No.107055520

Anonymous 10/30/25(Thu)14:41:18 No.107055520

>>107055495
I wonder if 5 will have that ocr attention thing since they basically got forced to publish because deepseek was onto the same thing.

Anonymous
10/30/25(Thu)14:47:32 No.107055577

Anonymous 10/30/25(Thu)14:47:32 No.107055577

>>107055365
tell me at least

Anonymous
10/30/25(Thu)14:54:36 No.107055647

Anonymous 10/30/25(Thu)14:54:36 No.107055647

vLLM KV cache auto calculation is really shitty. Even for a small model (3B) it wastes around 1GB VRAM.

Anonymous
10/30/25(Thu)14:55:37 No.107055654

Anonymous 10/30/25(Thu)14:55:37 No.107055654

>>107055365
>1. There is no God.
*Tip fedora* Yep, you need to go back

Anonymous
10/30/25(Thu)15:27:15 No.107055937

Anonymous 10/30/25(Thu)15:27:15 No.107055937

File: 1630775218061.gif (3.57 MB, 498x498)

3.57 MB GIF

>>107049649
>hmm how i can make this all about my vocaloid slop

Anonymous
10/30/25(Thu)15:31:35 No.107055970

Anonymous 10/30/25(Thu)15:31:35 No.107055970

>want to try out a version of my preset without my extensive collection of token biases
>save preset
>save preset again to make sure I saved the preset
>make a clone, delete all the biases, try it, tldr it's mid
>go back to the original preset
>all the biases are gone
fuck this piece of shit software

Anonymous
10/30/25(Thu)15:31:59 No.107055972

Anonymous 10/30/25(Thu)15:31:59 No.107055972

>>107055937
I have him filtered by just hiding posts without text

Anonymous
10/30/25(Thu)15:32:53 No.107055980

Anonymous 10/30/25(Thu)15:32:53 No.107055980

>>107055970
>he didn't export the json

Anonymous
10/30/25(Thu)15:34:25 No.107055989

Anonymous 10/30/25(Thu)15:34:25 No.107055989

>>107055937
not your hugbox cry more

Anonymous
10/30/25(Thu)15:35:38 No.107055999

Anonymous 10/30/25(Thu)15:35:38 No.107055999

>>107049667
>https://www.1x.tech/neo
>For any chore it doesn’t know, you can schedule a 1X Expert to guide it,
lmao
imagine getting rid of that last bit of privacy left in your life and letting a remote jeet control a robot in your home
this is going to happen often because this is a grift and it's not autonomous enough to do anything (they say they will use all the data from the jeetcontrol to train it to become what they promise, but let me :doubt:)
remember the amazon autonomous stores?
https://archive.is/E7AB8
>Amazon's Just Walk Out technology relies on hundreds of workers in India watching you shop

Anonymous
10/30/25(Thu)15:37:53 No.107056017

Anonymous 10/30/25(Thu)15:37:53 No.107056017

>>107055999
>hundreds of workers in India watching you shop
Why haven't I seen any ai gemmies depicting that

Anonymous
10/30/25(Thu)15:55:05 No.107056119

Anonymous 10/30/25(Thu)15:55:05 No.107056119

File: file.png (204 KB, 512x543)

204 KB PNG

>>107052534
>https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>Kimi-Linear-Base 48B 3B 1M Hugging Face
>Kimi-Linear-Instruct 48B 3B 1M Hugging Face
1 million billion trillion quadrillion gorillion killion context

Anonymous
10/30/25(Thu)15:59:24 No.107056154

Anonymous 10/30/25(Thu)15:59:24 No.107056154

>>107056119
>NoLima 32k 40%

Anonymous
10/30/25(Thu)16:04:37 No.107056193

Anonymous 10/30/25(Thu)16:04:37 No.107056193

>>107055431
>>107055365
nice larp, made me kek
you're a nobody, suck my cock
t. nobody

Anonymous
10/30/25(Thu)16:16:32 No.107056280

Anonymous 10/30/25(Thu)16:16:32 No.107056280

File: QwenWeenieTest.png (753 KB, 1442x1686)

753 KB PNG

this is /lmg/. please post screenshots of using models locally.
model tested: mradermacher/Qwen3-VL-32B-Thinking-Q6_K.gguf

Anonymous
10/30/25(Thu)16:17:35 No.107056296

Anonymous 10/30/25(Thu)16:17:35 No.107056296

>>107056119
I assume this is just testing for a big kimi with linear attention

Anonymous
10/30/25(Thu)16:22:20 No.107056347

Anonymous 10/30/25(Thu)16:22:20 No.107056347

>>107056325
>>107056325
>>107056325

llama.cpp CUDA dev !!yhbFjk57TDr
10/30/25(Thu)16:24:01 No.107056369

llama.cpp CUDA dev !!yhbFjk57TDr 10/30/25(Thu)16:24:01 No.107056369

>>107054141
I didn't read the paper so I don't know.
My general opinion about new and revolutionary techniques to replace transformers is to assume that they're a meme until proven otherwise.

Anonymous
10/30/25(Thu)16:42:21 No.107056551

Anonymous 10/30/25(Thu)16:42:21 No.107056551

posting here so the retarded captcha timer will let me post on the other thread

Anonymous
10/30/25(Thu)17:15:19 No.107056847

Anonymous 10/30/25(Thu)17:15:19 No.107056847

>>107055431
See you tomorrow, be well.

Anonymous
10/30/25(Thu)17:26:51 No.107056961

Anonymous 10/30/25(Thu)17:26:51 No.107056961

>>107054436
>so they put an OAI guy in charge of mid/post-train, aka distill-from-gpt-oss
there is zero chance they are distilling from gpt-oss, not even meta is that stupid

Anonymous
10/30/25(Thu)18:41:31 No.107057561

Anonymous 10/30/25(Thu)18:41:31 No.107057561

File: Carl_Brutananadilewski.png (2.6 MB, 1920x1080)

2.6 MB PNG

>>107055577
Ehh, fuck it.

So... how can you know that a vacuum is a vacuum without recording that it's a vacuum?

The secret to unraveling the fabric of reality lies in the answer.

Later.

Anonymous
10/30/25(Thu)18:46:13 No.107057612

Anonymous 10/30/25(Thu)18:46:13 No.107057612

>>107057561
>recording
You mean measuring?
Because if so, using the same method you can use to extract energy out of a blackhole without relying on hawking radiation.
That's not new.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.