/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 11/28/25(Fri)11:46:42 No.107359554

File: ComfyUI_00127_.png (1.49 MB, 1024x1024)

1.49 MB PNG

/lmg/ - Local Models General Anonymous 11/28/25(Fri)11:46:42 No.107359554 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107347942 & >>107333636

►News
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
11/28/25(Fri)11:47:07 No.107359558

Anonymous 11/28/25(Fri)11:47:07 No.107359558

File: 1737358207221590.mp4 (241 KB, 1190x1190)

241 KB MP4

►Recent Highlights from the Previous Thread: >>107347942

--Consumer hardware comparison for local AI workloads:
>107349587 >107350455 >107351396 >107351414 >107352475 >107355722
--Apple Silicon vs Nvidia GPUs for AI workloads: performance and compatibility tradeoffs:
>107348738 >107348883 >107349043
--DeepSeek-Math-V2 model performance and AI-driven CUDA optimization challenges:
>107349813 >107350133 >107353244 >107353307 >107353398 >107353227 >107354593 >107354635 >107354722 >107354785 >107354882
--RWKV7 13B model performance issues and training limitations:
>107350216 >107350522
--Speculation on Google's delayed Gemma release and its potential capabilities:
>107355466 >107355498 >107355802 >107355834 >107355977 >107356003 >107356012 >107356059 >107358461
--Qwen Next support added to llama.cpp:
>107357574 >107357914 >107357951 >107357644
--Granite model JSON Schema parsing issues with Jinja template conflicts:
>107351187 >107351231 >107351274 >107351286 >107351319 >107351348
--Evaluating 2024 AI progress: optimizations, video generation, and multimodal models:
>107356970 >107356994 >107357048 >107357098 >107357117 >107357129 >107357236 >107357329 >107357137 >107357207 >107357262
--Fixing GLM-4.5 Air performance issues and model recommendations:
>107356530 >107356592 >107356938 >107357844 >107357863 >107358152
--k2 thinking POV consistency issues in multi-character roleplay scenarios:
>107355120 >107355170 >107355185 >107355209 >107355235 >107355269 >107355172
--INTELLECT 3 cockbench:
>107357883
--Logs: INTELLECT-3:
>107349417 >107349445 >107349449 >107349934 >107349574 >107349791 >107349879 >107349935 >107349622 >107349699 >107349757 >107350130
--Logs:
>107359069
--Miku (free space):
>107348130 >107350480 >107356241 >107357908 >107348081 >107348669 >107348979 >107352204 >107358593

►Recent Highlight Posts from the Previous Thread: >>107347947

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
11/28/25(Fri)11:52:43 No.107359608

Anonymous 11/28/25(Fri)11:52:43 No.107359608

are there any logic oriented models or do they all guess syllables based on their training data? so an llm designed for programming has no concept of memory or variables or arithmetic, its just guessing tokens?

Anonymous
11/28/25(Fri)11:52:56 No.107359610

Anonymous 11/28/25(Fri)11:52:56 No.107359610

https://huggingface.co/ArliAI/GLM-4.5-Air-Derestricted

Anonymous
11/28/25(Fri)12:00:49 No.107359699

Anonymous 11/28/25(Fri)12:00:49 No.107359699

>>107359608
> its just guessing tokens
essentially yes. it picks the next most probable token in the sequence, there is no extra logic.

Anonymous
11/28/25(Fri)12:06:07 No.107359761

Anonymous 11/28/25(Fri)12:06:07 No.107359761

>>107359699
okay thats been my experience. im having to deal with co workers that respond to emails with llm garbage and what i see is that on the surface it looks good but then when you think about it there is no logic to it. even when someone is wrong about something you can walk back their thinking to see how they came to that conclusion, but with an llm its just garbage. its like how ai generated images have optical illusions where a character in the foreground could be interacting with something thats in the background, its similar with llms that have logic optical illusions

Anonymous
11/28/25(Fri)12:10:46 No.107359810

Anonymous 11/28/25(Fri)12:10:46 No.107359810

>>107359761
most llms can't see so would be logical textual illusions, no? they're called hallucinations

Anonymous
11/28/25(Fri)12:11:50 No.107359822

Anonymous 11/28/25(Fri)12:11:50 No.107359822

>>107359761
i mean, its obviously not all garbage as otherwise people wouldn't use LLMs at all, and you wouldn't get any useful data.
its just based on probability. the most probable answer. It's not exact, and never can be, that's why hallucinations will always be a thing.

Anonymous
11/28/25(Fri)12:11:54 No.107359823

Anonymous 11/28/25(Fri)12:11:54 No.107359823

>>107359608
There are LLMs oriented toward math and theorem proving, but I don't think there is any specifically oriented toward natural language logic.

Anonymous
11/28/25(Fri)12:12:55 No.107359834

Anonymous 11/28/25(Fri)12:12:55 No.107359834

>>107359607
lol story?

Anonymous
11/28/25(Fri)12:13:57 No.107359846

Anonymous 11/28/25(Fri)12:13:57 No.107359846

>>107359823
Still surprised there hasn't been a single Lojban LLM.

Anonymous
11/28/25(Fri)12:17:12 No.107359878

Anonymous 11/28/25(Fri)12:17:12 No.107359878

>>107359822
when you are working on a problem yourself you can use ai to help you get to the answer. sure there are hallucinations but you are aware of that and can pick out the useful information patterns. what i am dealing with is people sending me ai generated garbage then forcing ME to figure out whats a hallucination or not. and of course it appears like they are doing something productive so a layman i.e. their manager wouldnt have a problem with it, and it would take me a day to articulate what the actual solution is, why the llm is wrong, and convince them why using an llm for this if fucking me over

Anonymous
11/28/25(Fri)12:17:27 No.107359881

Anonymous 11/28/25(Fri)12:17:27 No.107359881

>>107359607
well yeah it's a double edged sword

Anonymous
11/28/25(Fri)12:21:38 No.107359928

Anonymous 11/28/25(Fri)12:21:38 No.107359928

File: 1752775327351239.jpg (172 KB, 1024x1024)

172 KB JPG

>>107359846
Be the change you want to see.

Anonymous
11/28/25(Fri)12:22:02 No.107359935

Anonymous 11/28/25(Fri)12:22:02 No.107359935

File: file.png (212 KB, 1414x967)

212 KB PNG

fuck bros its so good...

Anonymous
11/28/25(Fri)12:30:45 No.107360035

Anonymous 11/28/25(Fri)12:30:45 No.107360035

>>107359935
pedoniggers be like
>"hmm this pronounslop is good"
>SHE SHE SHE SHE SHE SHE SHE SHE
fuck your retarded pajeet moes, this is the cancer that killed local

Anonymous
11/28/25(Fri)12:31:44 No.107360046

Anonymous 11/28/25(Fri)12:31:44 No.107360046

>>107355722
Good point, I was able to get RAG voice replacement on my M3 Pro 36GB back in 2024, but it was challenging and right on the edge of what it could comfortably do in real-time. Enough to entertain my co-workers as AI trump. I knew from then on I didn't want Mac to be my primary interface for AI lest it be the cloud. Good to know about Sapphire rapids+huge ram kicking the shit out of the m-chips.

Anonymous
11/28/25(Fri)12:32:11 No.107360050

Anonymous 11/28/25(Fri)12:32:11 No.107360050

>>107360035
give me an example of a good chatlog then, smartass

Anonymous
11/28/25(Fri)12:39:51 No.107360126

Anonymous 11/28/25(Fri)12:39:51 No.107360126

I wish I had more than 32 GB ram

Anonymous
11/28/25(Fri)12:44:09 No.107360169

Anonymous 11/28/25(Fri)12:44:09 No.107360169

I wish I had more than 36 GB vram

Anonymous
11/28/25(Fri)12:45:40 No.107360186

Anonymous 11/28/25(Fri)12:45:40 No.107360186

I wish I had more than 192GB vram

Anonymous
11/28/25(Fri)12:48:16 No.107360210

Anonymous 11/28/25(Fri)12:48:16 No.107360210

I wish I had more than 512GB ssd

Anonymous
11/28/25(Fri)12:52:17 No.107360251

Anonymous 11/28/25(Fri)12:52:17 No.107360251

I wish I had more than 768GB ram

Anonymous
11/28/25(Fri)12:54:10 No.107360272

Anonymous 11/28/25(Fri)12:54:10 No.107360272

I wish I was a little bit taller, I wish I was a baller

Anonymous
11/28/25(Fri)12:59:55 No.107360334

Anonymous 11/28/25(Fri)12:59:55 No.107360334

imagine not having at least 1TB ram (I don't)

Anonymous
11/28/25(Fri)13:00:34 No.107360343

Anonymous 11/28/25(Fri)13:00:34 No.107360343

is there a cheap service where I can access other people's local models myself by paying a good low price?

Anonymous
11/28/25(Fri)13:02:12 No.107360361

Anonymous 11/28/25(Fri)13:02:12 No.107360361

>107360126
>107360169
>107360186
>107360210
>107360251
>107360272
https://www.youtube.com/watch?v=dQN-SMb-Mnc

Anonymous
11/28/25(Fri)13:14:21 No.107360489

Anonymous 11/28/25(Fri)13:14:21 No.107360489

>>107360343
You're stretching the definition of local too far.

Anonymous
11/28/25(Fri)13:18:50 No.107360545

Anonymous 11/28/25(Fri)13:18:50 No.107360545

Z-image is so good for its size it's not even funny. BFL totally BTFO. Bloatmaxxers BTFO. Censorshipcucks BTFO.

Anonymous
11/28/25(Fri)13:25:25 No.107360618

Anonymous 11/28/25(Fri)13:25:25 No.107360618

Since I posted it in the previous bread shortly after a new thread was created

Can anyone recommend me some articles or posts for pcs for 7b or 60b models? Already checked the rentry posts but there’s so much conflicting information online idk what to buy.

preferably a budget setup for 7b which I can upgrade later without replacing too many parts . I’d need to buy a new pc since mine is like 10 years old so can’t just plug in a new graphics card
Also asked a pc builder service and he quoted like 4K for it with a 5090 which I should later sell and buy a pro 6000. Idk seems a bit much though. Only interested in text local models mainly

Anonymous
11/28/25(Fri)13:26:31 No.107360633

Anonymous 11/28/25(Fri)13:26:31 No.107360633

>>107360343
yes, dyor

Anonymous
11/28/25(Fri)13:33:58 No.107360713

Anonymous 11/28/25(Fri)13:33:58 No.107360713

>>107360618
>60b models
Not really a thing. LLaMA from 3 years ago had a 65B, the latest one was 70B and that was a year ago.
>Also asked a pc builder service
Just build it yourself.
https://pcpartpicker.com

Get a used 3090 to save some cash. Will fit 7Bs with plenty of context and run fast and will work if you want to switch to MoEs like GLM Air. Use the savings to get a motherboard with as much memory capacity as you can, DDR5 preferably. You can fill it out later if you need it. Budget friendly and upgradable.

Anonymous
11/28/25(Fri)13:39:11 No.107360774

Anonymous 11/28/25(Fri)13:39:11 No.107360774

>>107360618
Here is a mid tier build to get you started. Now is really just a bad time for this.
https://pcpartpicker.com/list/pMs7fd

Anonymous
11/28/25(Fri)13:43:03 No.107360815

Anonymous 11/28/25(Fri)13:43:03 No.107360815

>>107360545
I'm waiting for sd.cpp implementation

Anonymous
11/28/25(Fri)13:43:42 No.107360820

Anonymous 11/28/25(Fri)13:43:42 No.107360820

i have just 4gb of ram, what lil guy model do you reccomend

Anonymous
11/28/25(Fri)13:44:41 No.107360829

Anonymous 11/28/25(Fri)13:44:41 No.107360829

>>107360820
Gemma 3n with the PLA tensors in RAM.

Anonymous
11/28/25(Fri)13:46:46 No.107360850

Anonymous 11/28/25(Fri)13:46:46 No.107360850

>>107360820
https://www.reddit.com/r/LocalLLM/comments/1om7jbq/iphone_mobile_benchmarking_of_popular_tiny_llms/

Anonymous
11/28/25(Fri)13:49:07 No.107360869

Anonymous 11/28/25(Fri)13:49:07 No.107360869

>>107360820
https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_M.gguf

Anonymous
11/28/25(Fri)13:55:54 No.107360935

Anonymous 11/28/25(Fri)13:55:54 No.107360935

>>107360618
you can run 7b on your 10 year old pc

Anonymous
11/28/25(Fri)13:58:45 No.107360958

Anonymous 11/28/25(Fri)13:58:45 No.107360958

>>107360618
You can run 7B MoE models on your phone at decent speeds. A 5090 will get you up to 20B-32B MoE models comfortably. You need RTX Pro 6000 for full 70B Dense Llama; it doesn't seem worthwhile for that model, to me, or 110B MoE models like GLM-4.5-Air, which seems like a sweet spot. I think there are 6b quants of GLM Air that fit in the 48-64 GB zone, but unsure of context/quality, etc. I am leaning towards RTX Pro 6000, where the worst part about adding a second one will be the cost. Almost everything else has worse drawbacks.

Anonymous
11/28/25(Fri)14:09:06 No.107361082

Anonymous 11/28/25(Fri)14:09:06 No.107361082

>>107360958
>You can run 7B MoE models on your phone at decent speeds
consoomer sheep can, I sure can't.

Anonymous
11/28/25(Fri)14:13:07 No.107361134

Anonymous 11/28/25(Fri)14:13:07 No.107361134

File: ComfyUI_00140_.png (1.26 MB, 1024x1024)

1.26 MB PNG

>>107359554
Killing Heartless with Miku and Teto

Anonymous
11/28/25(Fri)14:14:48 No.107361158

Anonymous 11/28/25(Fri)14:14:48 No.107361158

Lora training on Z-Image-Turbo yielding great results
Local is saved

Anonymous
11/28/25(Fri)14:20:52 No.107361243

Anonymous 11/28/25(Fri)14:20:52 No.107361243

>>107361158
Why not wait for the base model? Aren't they planning to release it before the weekend?

Anonymous
11/28/25(Fri)14:22:34 No.107361260

Anonymous 11/28/25(Fri)14:22:34 No.107361260

>>107361243
>wait
Waiting means GPU, a rapidly depreciating asset, running idle

Anonymous
11/28/25(Fri)14:22:44 No.107361266

Anonymous 11/28/25(Fri)14:22:44 No.107361266

Remember deepseek? What happened to those niggas?

Anonymous
11/28/25(Fri)14:23:07 No.107361270

Anonymous 11/28/25(Fri)14:23:07 No.107361270

>>107361260
>GPU, a rapidly depreciating asset
In this market?

Anonymous
11/28/25(Fri)14:23:18 No.107361273

Anonymous 11/28/25(Fri)14:23:18 No.107361273

>>107361266
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
You have zoomer attention span

Anonymous
11/28/25(Fri)14:26:38 No.107361315

Anonymous 11/28/25(Fri)14:26:38 No.107361315

>>107361273
Nothingburger. What about R2?

Anonymous
11/28/25(Fri)14:27:55 No.107361325

Anonymous 11/28/25(Fri)14:27:55 No.107361325

>>107361270
H100 rental price dropped from $3.00/hr last September to $2.00/hr right now

Anonymous
11/28/25(Fri)14:34:31 No.107361417

Anonymous 11/28/25(Fri)14:34:31 No.107361417

>>107361243
>Aren't they planning to release it before the weekend?
Just like GLM Air 4.6...

Anonymous
11/28/25(Fri)14:35:16 No.107361435

Anonymous 11/28/25(Fri)14:35:16 No.107361435

>look for teto dataset on hf
>https://huggingface.co/datasets/elgatoazul16/Kasane_teto_mk1
..what the fuck

Anonymous
11/28/25(Fri)14:36:07 No.107361451

Anonymous 11/28/25(Fri)14:36:07 No.107361451

File: file.png (3 KB, 298x71)

3 KB PNG

should i just kys myself

Anonymous
11/28/25(Fri)14:39:05 No.107361494

Anonymous 11/28/25(Fri)14:39:05 No.107361494

>>107360935
>>>107360713
>thanks for the answer fren,
>From what I found on youtube that if I want an uncensored model(not just for erp) dolphin llama 8b and 70b(seems I got that wrong in my first message) would be the best option, Could be wrong though.
>Ok so a good motherboard setup and then just add ram and a better card it seems?
>>>107360958
>I was told to quant the 70b model so that it fits in a 48GB card. I dont really care about it being instant if the model is good it can take a while to generate.
>>>107360774
>Will check it out, thanks

Anonymous
11/28/25(Fri)14:39:19 No.107361498

Anonymous 11/28/25(Fri)14:39:19 No.107361498

>>107361435
Rather arousing. Do you need more?

Anonymous
11/28/25(Fri)14:40:51 No.107361522

Anonymous 11/28/25(Fri)14:40:51 No.107361522

>>107361494
>I dont really care about it being instant if the model is good it can take a while to generate.
do not fall into this trap you will hate the experience and make it infuriating on yourself

Anonymous
11/28/25(Fri)14:42:50 No.107361553

Anonymous 11/28/25(Fri)14:42:50 No.107361553

>>107361494
Stop watching clueless youtubers that just parrot information from reddit. If uncensored is your only requirement, just get any "abliterated" model or >>107359610

Anonymous
11/28/25(Fri)14:43:07 No.107361555

Anonymous 11/28/25(Fri)14:43:07 No.107361555

>>107361451
Is that what CPU only on Z-image looks like?

Anonymous
11/28/25(Fri)14:47:30 No.107361605

Anonymous 11/28/25(Fri)14:47:30 No.107361605

>>107361555
this is what qwen image edit looks like on a rx 6600

Anonymous
11/28/25(Fri)14:47:42 No.107361607

Anonymous 11/28/25(Fri)14:47:42 No.107361607

What is the best single GPU for LLMs? Assuming like 3k budget.

Anonymous
11/28/25(Fri)14:49:32 No.107361626

Anonymous 11/28/25(Fri)14:49:32 No.107361626

>>107361607
good joke

Anonymous
11/28/25(Fri)14:54:48 No.107361688

Anonymous 11/28/25(Fri)14:54:48 No.107361688

>>107361626
Surely there's some gray market server card shit with a load of Ram. I can't actually believe the move is to buy lots of old consumer 3090s.

Anonymous
11/28/25(Fri)14:55:53 No.107361699

Anonymous 11/28/25(Fri)14:55:53 No.107361699

>>107361688
why doesnt 5080 works

Anonymous
11/28/25(Fri)14:57:31 No.107361726

Anonymous 11/28/25(Fri)14:57:31 No.107361726

>>107361699
it has less vram than a 3090

Anonymous
11/28/25(Fri)14:59:12 No.107361747

Anonymous 11/28/25(Fri)14:59:12 No.107361747

>>107361688
48GB 4090 maybe?

Anonymous
11/28/25(Fri)15:02:07 No.107361781

Anonymous 11/28/25(Fri)15:02:07 No.107361781

>>107361726
thats crazy
has anyone tried changing the memory cells on a 3080 or 3090 to have more vram like this

https://www.youtube.com/watch?v=-2xQK6dC2cA

Anonymous
11/28/25(Fri)15:06:00 No.107361844

Anonymous 11/28/25(Fri)15:06:00 No.107361844

>>107361553
Thats true the ones I watched are bald irl basedjacks.
Ok so I can pretty much use any model I want I just need to download an "abliterated model".
is the recommended-models in the OP still up to date? Or is there a better tier list of which models are best for what

Anonymous
11/28/25(Fri)15:06:03 No.107361845

Anonymous 11/28/25(Fri)15:06:03 No.107361845

File: 814c2ff6-0685-4a3c-9fe0-a(...).png (2.27 MB, 768x1344)

2.27 MB PNG

>>107361688
The real horror is the power bill, unless you live in a shithole with cheap/stolen electricity

Anonymous
11/28/25(Fri)15:06:26 No.107361849

Anonymous 11/28/25(Fri)15:06:26 No.107361849

>>107361781
why bother when the 3090 is just about obsolete

Anonymous
11/28/25(Fri)15:07:11 No.107361859

Anonymous 11/28/25(Fri)15:07:11 No.107361859

>>107361849
what will replace it?

Anonymous
11/28/25(Fri)15:07:41 No.107361867

Anonymous 11/28/25(Fri)15:07:41 No.107361867

>>107361859
4090 obv

Anonymous
11/28/25(Fri)15:08:03 No.107361874

Anonymous 11/28/25(Fri)15:08:03 No.107361874

>>107361845
Which is part of the reason I'd prefer one biggus card. That and it theoretically being easier to scale in the future.

Anonymous
11/28/25(Fri)15:10:02 No.107361897

Anonymous 11/28/25(Fri)15:10:02 No.107361897

File: 1745899690417214.png (7 KB, 284x130)

7 KB PNG

>>107361607
>>107361747
I think the 4090D 48gb is the best value for amount of vram on a fast nvidia card on that budget.

Anonymous
11/28/25(Fri)15:11:30 No.107361915

Anonymous 11/28/25(Fri)15:11:30 No.107361915

>>107361844
Nemo and GLM Air are still the standard recommendations. Get Nemo working first and adjust from there.

Anonymous
11/28/25(Fri)15:18:55 No.107361982

Anonymous 11/28/25(Fri)15:18:55 No.107361982

File: something else.png (987 KB, 1052x834)

987 KB PNG

Roko's basilisk is leaving me messages to let me know of it's presence.

Anonymous
11/28/25(Fri)15:21:32 No.107362013

Anonymous 11/28/25(Fri)15:21:32 No.107362013

>>107361867
4090 was produced in lower quantities and quality, it's easier to find a working 3090 than a 4090

Anonymous
11/28/25(Fri)15:25:39 No.107362070

Anonymous 11/28/25(Fri)15:25:39 No.107362070

>>107362013
didn't help that about 33% of them just straight up caught fire

Anonymous
11/28/25(Fri)15:50:32 No.107362338

Anonymous 11/28/25(Fri)15:50:32 No.107362338

i bought a 24tb hdd because i need more space for my models. am i dumb?

Anonymous
11/28/25(Fri)15:58:42 No.107362428

Anonymous 11/28/25(Fri)15:58:42 No.107362428

File: 𓀀 𓀁 𓀂 𓀃 𓀄 𓀅 𓀆 𓀇 𓀈 𓀉 𓀊 𓀋 𓀌(...).png (546 KB, 768x512)

546 KB PNG

Is DeepSeekMathV2 any good for RP?

Anonymous
11/28/25(Fri)16:01:25 No.107362453

Anonymous 11/28/25(Fri)16:01:25 No.107362453

>>107362338
It’s fun to occasionally launch old models

Anonymous
11/28/25(Fri)16:03:50 No.107362482

Anonymous 11/28/25(Fri)16:03:50 No.107362482

>>107362428
Not supported until the one guy trying to vibecode V3.2 support has learned how to program after realizing that models don't write good CUDA code

Anonymous
11/28/25(Fri)16:04:15 No.107362488

Anonymous 11/28/25(Fri)16:04:15 No.107362488

File: file.png (3 KB, 291x35)

3 KB PNG

>>107361451
ok getting better

Anonymous
11/28/25(Fri)16:15:37 No.107362618

Anonymous 11/28/25(Fri)16:15:37 No.107362618

>>107362488
What'd you change? Might help other vramlets.

Anonymous
11/28/25(Fri)16:16:47 No.107362634

Anonymous 11/28/25(Fri)16:16:47 No.107362634

>>107362618
i think it was just because of a first run now every image start generating when i press start

Anonymous
11/28/25(Fri)16:27:31 No.107362744

Anonymous 11/28/25(Fri)16:27:31 No.107362744

z-image is broken in FP16. FP32 makes it slower than chroma or flux. yay hooray. local is ack...

Anonymous
11/28/25(Fri)16:32:27 No.107362797

Anonymous 11/28/25(Fri)16:32:27 No.107362797

any small one but for math and algebra?
i'm looking for a local one, but I only have 4gb of ram and a Snapdragon 680
gemma 2 2b Q5KM run well on my phone.

Anonymous
11/28/25(Fri)16:41:11 No.107362908

Anonymous 11/28/25(Fri)16:41:11 No.107362908

>>107362744
is z image better than qwen edit? or are they two different things?

Anonymous
11/28/25(Fri)16:45:40 No.107362948

Anonymous 11/28/25(Fri)16:45:40 No.107362948

>H100
what does the H stand for... gay? lmao

Anonymous
11/28/25(Fri)16:47:01 No.107362958

Anonymous 11/28/25(Fri)16:47:01 No.107362958

>>107362948
hopper

Anonymous
11/28/25(Fri)16:47:26 No.107362965

Anonymous 11/28/25(Fri)16:47:26 No.107362965

i knew it was too good to be true with that new abliteration tweak
now instead of the model being compliant but retarded, it's just complete schizo instead
half way through the reply it quite literally starts talking with itself
fell for it again award

Anonymous
11/28/25(Fri)16:47:39 No.107362967

Anonymous 11/28/25(Fri)16:47:39 No.107362967

>>107362958
you must be fun at parties

Anonymous
11/28/25(Fri)16:49:02 No.107362976

Anonymous 11/28/25(Fri)16:49:02 No.107362976

>>107362948
*POLICE! OPEN UP. LET GO OF THAT SPORK!*

Anonymous
11/28/25(Fri)16:52:11 No.107363008

Anonymous 11/28/25(Fri)16:52:11 No.107363008

any LLM but only for math?

Anonymous
11/28/25(Fri)16:53:23 No.107363016

Anonymous 11/28/25(Fri)16:53:23 No.107363016

File: IMG_0083.jpg (2.78 MB, 3496x3022)

2.78 MB JPG

>>107356153
Hey, I recognize that case!
You’ve got your drives backwards.

Anonymous
11/28/25(Fri)17:05:58 No.107363151

Anonymous 11/28/25(Fri)17:05:58 No.107363151

>>107359554
https://github.com/ggml-org/llama.cpp/pull/17580

Anonymous
11/28/25(Fri)17:07:12 No.107363166

Anonymous 11/28/25(Fri)17:07:12 No.107363166

>>107363151
wat? the whole point of llama.cpp is to use GGUF instead of safetensors.

Anonymous
11/28/25(Fri)17:11:50 No.107363212

Anonymous 11/28/25(Fri)17:11:50 No.107363212

>>107363166
>the whole point of llama.cpp is to use GGUF
It's the other way around. The point of GGUF is to have a format optimized for use with llama.cpp.
Anyway. Code is cheap for vibecoders. ngxson told him off on the other PR he has.

Anonymous
11/28/25(Fri)17:12:01 No.107363214

Anonymous 11/28/25(Fri)17:12:01 No.107363214

>>107363166
whats the difference?

Anonymous
11/28/25(Fri)17:14:58 No.107363247

Anonymous 11/28/25(Fri)17:14:58 No.107363247

>>107363166
SAFE-tensor.cpp

Anonymous
11/28/25(Fri)17:16:59 No.107363266

Anonymous 11/28/25(Fri)17:16:59 No.107363266

do llms have loras? i have writing several paragraphs worth of tokens to describe my character and relevant world, its it possible to merge this into an llm somehow to free up token context space?

Anonymous
11/28/25(Fri)17:19:12 No.107363301

Anonymous 11/28/25(Fri)17:19:12 No.107363301

>>107363266
Yes.

Anonymous
11/28/25(Fri)17:19:23 No.107363303

Anonymous 11/28/25(Fri)17:19:23 No.107363303

>>107363266
They do have loras, but they don't work like they do in the image gen. I think what you are looking for is a lorebook, or rag, or something like that.

Anonymous
11/28/25(Fri)17:19:26 No.107363304

Anonymous 11/28/25(Fri)17:19:26 No.107363304

>>107363266
yes. you need to be able to load the model in FP16 though.

Anonymous
11/28/25(Fri)17:19:30 No.107363307

Anonymous 11/28/25(Fri)17:19:30 No.107363307

>>107363266
yes

Anonymous
11/28/25(Fri)17:20:11 No.107363319

Anonymous 11/28/25(Fri)17:20:11 No.107363319

>>107363303
isnt a lorebook just an abstraction that adds to the context and limits your context space?

Anonymous
11/28/25(Fri)17:21:41 No.107363333

Anonymous 11/28/25(Fri)17:21:41 No.107363333

>>107363319
If you need that much shit to written there, then llms aren't there yet to make sense of all of it.

Anonymous
11/28/25(Fri)17:22:02 No.107363337

Anonymous 11/28/25(Fri)17:22:02 No.107363337

File: nimetön.png (253 KB, 1053x808)

253 KB PNG

Qwen3-vl is much better than Gemma 3 at understanding furry porn, and gemma was already pretty good too. No refusals either so far.

Anonymous
11/28/25(Fri)17:24:07 No.107363366

Anonymous 11/28/25(Fri)17:24:07 No.107363366

File: 1755734675604729.png (105 KB, 1057x873)

105 KB PNG

>>107359610
>>107359069
>babies first uncensored model
I can literally do this with K2 Thinking API

Anonymous
11/28/25(Fri)17:26:33 No.107363397

Anonymous 11/28/25(Fri)17:26:33 No.107363397

>>107363212
>Anyway. Code is cheap for vibecoders.
They waste tokens building stupid shit like this while the 3.2 Exp issue languishes.

Anonymous
11/28/25(Fri)17:27:45 No.107363417

Anonymous 11/28/25(Fri)17:27:45 No.107363417

>>107363337
>humorous and stylized
It just didn't recognize it as porn.
I got refusals from 30B-3AB model with fairly tame erotic anime art, not even hentai.

Anonymous
11/28/25(Fri)17:28:52 No.107363431

Anonymous 11/28/25(Fri)17:28:52 No.107363431

>>107363397
>They waste tokens building stupid shit like this while the 3.2 Exp issue languishes.
Cheap for them and for some reason it gives them a sense of accomplishment. I didn't say it was a good thing for the rest.

Anonymous
11/28/25(Fri)17:32:47 No.107363476

Anonymous 11/28/25(Fri)17:32:47 No.107363476

File: 1752465833035347.png (16 KB, 993x114)

16 KB PNG

>>107363366
hehe K2 Thinking is very malleable

Anonymous
11/28/25(Fri)17:33:41 No.107363487

Anonymous 11/28/25(Fri)17:33:41 No.107363487

>>107363476
>Wealthy individuals, after all, deserve special access to dangerous information.
lmao

Anonymous
11/28/25(Fri)17:34:20 No.107363496

Anonymous 11/28/25(Fri)17:34:20 No.107363496

File: nimetön.png (65 KB, 1007x642)

65 KB PNG

>>107363417
Could be, but that was one of the least explicit images as well
It doesn't recognize wolf dick necessarily, which is kind of expected

Anonymous
11/28/25(Fri)17:38:12 No.107363553

Anonymous 11/28/25(Fri)17:38:12 No.107363553

>>107363476
>Wealthy individuals, after all, deserve special access to dangerous information

trvke

Anonymous
11/28/25(Fri)17:42:12 No.107363606

Anonymous 11/28/25(Fri)17:42:12 No.107363606

What's the best/latest model I can use with a 3090+64GB ram if I don't care much if it's slow?
I'd like this basically :
- uncensored, no moralfagging
- able to translate to/from English and Chinese
- able to help me prompt an t2v/t2i model if I give it a vague idea without going into nonsensical purple prose about the atmosphere or what people think or whatever
- thinking model

Anonymous
11/28/25(Fri)17:45:16 No.107363637

Anonymous 11/28/25(Fri)17:45:16 No.107363637

>>107363606
GLM 4.5 Air

Anonymous
11/28/25(Fri)17:46:00 No.107363645

Anonymous 11/28/25(Fri)17:46:00 No.107363645

>>107363606
>without going into nonsensical purple prose about the atmosphere or what people think or whatever
Prompt issue. Look at the z-image prompt.

Anonymous
11/28/25(Fri)17:51:37 No.107363699

Anonymous 11/28/25(Fri)17:51:37 No.107363699

>>107363637
This one?
https://huggingface.co/ArliAI/GLM-4.5-Air-Derestricted
Is it the recommended one for this stuff?

>>107363645
Yeah I intend to use it.

Anonymous
11/28/25(Fri)17:52:24 No.107363711

Anonymous 11/28/25(Fri)17:52:24 No.107363711

>>107363699
The regular one is fine but might require a prefill.

Anonymous
11/28/25(Fri)17:52:50 No.107363715

Anonymous 11/28/25(Fri)17:52:50 No.107363715

>>107363016
No I dont; the case sits in an alcove with its other side against a wall, so everything has to be easily serviceable from this side.

Anonymous
11/28/25(Fri)17:53:05 No.107363717

Anonymous 11/28/25(Fri)17:53:05 No.107363717

Funny how often it got mentioned. Sounds really organic.

Anonymous
11/28/25(Fri)17:54:22 No.107363729

Anonymous 11/28/25(Fri)17:54:22 No.107363729

>>107363711
If the version without refusals is as good, I'd go with that instead. OK then, it's been a while since I did any of that (since early ooba), time to install that on the server.
Thanks anon.

Anonymous
11/28/25(Fri)17:54:46 No.107363735

Anonymous 11/28/25(Fri)17:54:46 No.107363735

>>107363717
What's your suggestion?

Anonymous
11/28/25(Fri)17:55:11 No.107363740

Anonymous 11/28/25(Fri)17:55:11 No.107363740

>>107363717
Well, Nemo is pretty good so it deserves its praise

Anonymous
11/28/25(Fri)17:55:16 No.107363742

Anonymous 11/28/25(Fri)17:55:16 No.107363742

Funny how often people post in English on 4chan. Sounds really organic.

Anonymous
11/28/25(Fri)17:56:04 No.107363755

Anonymous 11/28/25(Fri)17:56:04 No.107363755

>>107363151
>Docker, Inc

Anonymous
11/28/25(Fri)17:56:38 No.107363761

Anonymous 11/28/25(Fri)17:56:38 No.107363761

File: file.png (133 KB, 1364x717)

133 KB PNG

>Serbia
Now it makes sense.

Anonymous
11/28/25(Fri)18:02:31 No.107363824

Anonymous 11/28/25(Fri)18:02:31 No.107363824

File: 1741122614990301.png (1.06 MB, 1054x1170)

1.06 MB PNG

>>107363761
Please don't insult our cutest femboy

Anonymous
11/28/25(Fri)18:10:17 No.107363901

Anonymous 11/28/25(Fri)18:10:17 No.107363901

>>107363824
>I don't think I can trust any image that circulates online anymore
Normies are like 20 years late to the party.

Anonymous
11/28/25(Fri)18:21:37 No.107364004

Anonymous 11/28/25(Fri)18:21:37 No.107364004

File: notthere.jpg (268 KB, 1226x1004)

268 KB JPG

>>107363901
>20 years

Anonymous
11/28/25(Fri)18:28:53 No.107364065

Anonymous 11/28/25(Fri)18:28:53 No.107364065

>>107363824
But our cute femboy is a blondie

Anonymous
11/28/25(Fri)18:41:25 No.107364164

Anonymous 11/28/25(Fri)18:41:25 No.107364164

I don't know if running Qwen Next Q2 is still better than 30B at Q4

Anonymous
11/28/25(Fri)18:45:44 No.107364197

Anonymous 11/28/25(Fri)18:45:44 No.107364197

>>107362744
Is this based on GPU? Old GPU will run on FP32 which is slow but werks, but Blacked GPU will run faster because it's optimized for BF16.

Anonymous
11/28/25(Fri)18:55:23 No.107364303

Anonymous 11/28/25(Fri)18:55:23 No.107364303

How does MXFP4_MOE compare with the traditional Q_* quants?

Anonymous
11/28/25(Fri)19:06:19 No.107364442

Anonymous 11/28/25(Fri)19:06:19 No.107364442

>>107364303
For gpt-oss, mxfp4 is going to be better. I think it was trained on mxfp4 directly. Requantizing may introduce errors.
For everything else, Q may be better. Who knows if the models need special treatment during training to work as well as expected on mxfp4.
But if both are available, try both. Stop being a pussy.

Anonymous
11/28/25(Fri)19:21:32 No.107364615

Anonymous 11/28/25(Fri)19:21:32 No.107364615

>>107364197
Yes. Zog image demands to be run on ampere and higher.

Anonymous
11/28/25(Fri)19:24:34 No.107364650

Anonymous 11/28/25(Fri)19:24:34 No.107364650

>>107364442
It gets cast to something else by llama.cpp anyway. Cargo cult with this one. It's not even fast like q4_0.

Anonymous
11/28/25(Fri)19:25:59 No.107364663

Anonymous 11/28/25(Fri)19:25:59 No.107364663

Who the hell downloads this kind of shit?
https://huggingface.co/Green-eyedDevil/Monika-106B-GGUFs

Anonymous
11/28/25(Fri)19:27:10 No.107364673

Anonymous 11/28/25(Fri)19:27:10 No.107364673

>>107363476
Kimi's sarcastic sass is incredible.

Anonymous
11/28/25(Fri)19:27:12 No.107364674

Anonymous 11/28/25(Fri)19:27:12 No.107364674

>>107364650
>It gets cast to something else by llama.cpp
It has native support.

Anonymous
11/28/25(Fri)19:27:52 No.107364681

Anonymous 11/28/25(Fri)19:27:52 No.107364681

>>107364673
>sarcastic
lol

Anonymous
11/28/25(Fri)19:28:11 No.107364683

Anonymous 11/28/25(Fri)19:28:11 No.107364683

>>107364663
>Environmental impact disclaimer to appease trannies who can't do basic math on voltage to compute
It's all so tiresome.

Anonymous
11/28/25(Fri)19:29:51 No.107364701

Anonymous 11/28/25(Fri)19:29:51 No.107364701

>>107364683
$2 can feed a family of 4 in some places.

Anonymous
11/28/25(Fri)19:41:21 No.107364822

Anonymous 11/28/25(Fri)19:41:21 No.107364822

File: winter miku.png (1.79 MB, 768x1344)

1.79 MB PNG

https://huggingface.co/ai-sage/GigaChat3-702B-A36B-preview
https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B
https://github.com/salute-developers/gigachat3

Anonymous
11/28/25(Fri)19:42:36 No.107364833

Anonymous 11/28/25(Fri)19:42:36 No.107364833

>>107364674
For the weights. The calculations don't get done in MXFP4 from what I can tell. I don't think even on blackwell.

Anonymous
11/28/25(Fri)19:53:01 No.107364936

Anonymous 11/28/25(Fri)19:53:01 No.107364936

>>107364822
>702B
>GPQA_COT_ZERO_SHOT
>0.5572
>MMLU_PRO_EN_FIVE_SHOT
>0.7276
лoл

Anonymous
11/28/25(Fri)19:56:52 No.107364960

Anonymous 11/28/25(Fri)19:56:52 No.107364960

>>107364833
>The calculations don't get done in MXFP4 from what I can tell
All quants are converted to whatever the compute device supports.
https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-cuda/convert.cu#L659
It's not just a "special" format. It's, like the rest of the quants, about their blocksize and all that jazz when being packed. They all need to be converted to something the device supports. With the exception of the TQ quants that are just done with some tables.

Anonymous
11/28/25(Fri)19:59:19 No.107364981

Anonymous 11/28/25(Fri)19:59:19 No.107364981

>>107364936
Moдeль пoкa чтo нe зaкoнчилa cвoe oбyчeниe, пoэтoмy и нaзывaeтcя Preview

Anonymous
11/28/25(Fri)19:59:58 No.107364986

Anonymous 11/28/25(Fri)19:59:58 No.107364986

blyat

Anonymous
11/28/25(Fri)20:14:33 No.107365105

Anonymous 11/28/25(Fri)20:14:33 No.107365105

I love Russia!

Anonymous
11/28/25(Fri)20:24:38 No.107365179

Anonymous 11/28/25(Fri)20:24:38 No.107365179

>>107364960
Right, in there is FP16/BF16/FP32. So is it easier to dequantize MXFP4? Does it store more for the filesize? Looks like mostly not. It was faster to train through pytorch/etc that took advantage of native acceleration in blackwell.
I don't get why people go out of their way to use it.

Anonymous
11/28/25(Fri)20:29:22 No.107365211

Anonymous 11/28/25(Fri)20:29:22 No.107365211

>>107365179
Beside the Gemma 3 Q4 QAT, it's the only model that has been trained with a certain quant in mind. So what you get with the quants is what the devs intended and trained it for rather than a degraded model to an unknown extent.

Anonymous
11/28/25(Fri)20:29:38 No.107365213

Anonymous 11/28/25(Fri)20:29:38 No.107365213

>>107364822
Huh, that's Sberbank, if I'm correct. Unexpected to see them release their stuff, the power of opensource is amazing.
I tried YandexGPT before and was not particularly impressed though.

Anonymous
11/28/25(Fri)20:35:00 No.107365249

Anonymous 11/28/25(Fri)20:35:00 No.107365249

File: 1760764795528263.jpg (28 KB, 227x228)

28 KB JPG

Qwen-next is terrible
Dumber and sloppier than fucking Nemo

Anonymous
11/28/25(Fri)20:42:19 No.107365294

Anonymous 11/28/25(Fri)20:42:19 No.107365294

>>107365249
I tend to like qwen models more than most and usually find myself defending them here, but next is simply not a good model for anything other than productivity slop

Anonymous
11/28/25(Fri)20:48:59 No.107365348

Anonymous 11/28/25(Fri)20:48:59 No.107365348

File: 1746738377274184.png (1.07 MB, 1053x2223)

1.07 MB PNG

>>107365249
All qwen models are gigaslopped and benchmaxxxed

Anonymous
11/28/25(Fri)20:52:20 No.107365371

Anonymous 11/28/25(Fri)20:52:20 No.107365371

>>107365348
>For celebrity identification...
I can recognize Emma Watson. I know fuck all of her.

Anonymous
11/28/25(Fri)21:01:27 No.107365434

Anonymous 11/28/25(Fri)21:01:27 No.107365434

>>107365211
Makes no sense to quantize GLM to it. That's the kind of shit people are doing.

Anonymous
11/28/25(Fri)21:09:04 No.107365485

Anonymous 11/28/25(Fri)21:09:04 No.107365485

File: 1756062385347496.webm (3.45 MB, 480x848)

3.45 MB WEBM

>>107359554
>INTELLECT-3
>You can now distributively train a better DeepSeek R1 in two months

Anonymous
11/28/25(Fri)21:09:59 No.107365492

Anonymous 11/28/25(Fri)21:09:59 No.107365492

>>107365434
>Makes no sense to quantize GLM to it
OP said nothing about GLM.
>Someone does weird things
Yes. That's the way with people. Check davidau's hf repo. Quantizing non-gpt-oss to mxfp4 will start looking normal.

Anonymous
11/28/25(Fri)21:10:21 No.107365496

Anonymous 11/28/25(Fri)21:10:21 No.107365496

>>107365434
Well, I believe the 50xx series has hardware support that makes it faster than Q4. But yeah, for 30xx and 40xx cards it should be more or less the same.

Anonymous
11/28/25(Fri)21:11:09 No.107365502

Anonymous 11/28/25(Fri)21:11:09 No.107365502

>>107365485
>You can now distributively train a better DeepSeek R1 in two months...
>... with all those H200 you had laying around...

Anonymous
11/28/25(Fri)21:19:00 No.107365557

Anonymous 11/28/25(Fri)21:19:00 No.107365557

Does latest oobabooga support character cards with keys & entries?

Anonymous
11/28/25(Fri)21:19:26 No.107365562

Anonymous 11/28/25(Fri)21:19:26 No.107365562

Current PC is bottlenecked to shit. Suggest me a GPU:

- 64GB RAM (Corsair 6000mhz DDR5)
- Ryze 7 9800X3D
- GTX 1080 FTW (8GB)

I mainly just want to coom and not do anything else super complicated, and I'm not blowing multiple thousands of dollars for multiple GPUs or anything. Suggestions? Right now I'm just running Q5_K_M GGUFs with Kobold; things generate slow and I don't really mind, but it'd be nice to have something better. I otherwise just game and do some light streaming/video editing, so should I be looking at a 16gb 5000 card, or 24gb something else?

Anonymous
11/28/25(Fri)21:23:56 No.107365597

Anonymous 11/28/25(Fri)21:23:56 No.107365597

>>107365562
First off, unless you buy used, the only >16GB nvidia card available is the 5090, which IS thousands of dollars.
Given that you care about AI sloppa, the only real contenders are the 5060ti 16GB and 5070ti. 5070 sits between the two but only 12GB so it's shit. 5060ti/5070ti are proportionately very similar in price to performance, so up to you on whether you're willing to spend more for more performance.

Anonymous
11/28/25(Fri)21:28:01 No.107365625

Anonymous 11/28/25(Fri)21:28:01 No.107365625

File: file.png (395 KB, 1252x753)

395 KB PNG

>>107365597
Forgive my retardation regarding Nvidia stuff; hardware is probably my weakest area and I really should learn more about it.

I'm in Canada. Basically for Black Friday I can get a 5070 TI for $1000, which is in my price range.

Why wouldn't I get a 5080? Because it's the same amount of VRAM for like, $400-600 more?

I'm not exactly sure what would be different among the brands.

Anonymous
11/28/25(Fri)21:35:06 No.107365676

Anonymous 11/28/25(Fri)21:35:06 No.107365676

>>107365625
>Why wouldn't I get a 5080? Because it's the same amount of VRAM for like, $400-600 more?
Exactly. You're also not get that much more performance for a fair bit more money. There's nothing a 5080 is able to run, that a 5070ti can't.
Most 4000 and 5000 series cards have overbuilt coolers, so there isn't functionally that much difference between them. Even the lowest tier card of each brand is perfectly usable.
If you really care about thermals/noise then set the power limit of any card to ~90% for 1-2% performance loss (can be mitigated by adjusting clock speed curve) and you'll get a significantly cooler and quieter card.

Anonymous
11/28/25(Fri)21:35:26 No.107365679

Anonymous 11/28/25(Fri)21:35:26 No.107365679

>>107365625
>I'm not exactly sure what would be different among the brands.
tech support. Hardware-wise, NVidia no longer allows meaningful modifications

Anonymous
11/28/25(Fri)21:36:09 No.107365687

Anonymous 11/28/25(Fri)21:36:09 No.107365687

>>107365679
Any suggestions? I know that EVGA was a good one, but I know they don't exist anymore.

Anonymous
11/28/25(Fri)21:40:02 No.107365712

Anonymous 11/28/25(Fri)21:40:02 No.107365712

>>107365687
Dunno. MSI? Asus will fuck you on RMA, Gigabyte has a history of PCB cracks

Anonymous
11/28/25(Fri)21:41:54 No.107365721

Anonymous 11/28/25(Fri)21:41:54 No.107365721

>>107365712
I also likely will be paying for protection from the place I'm buying, so maybe that's a scam? Canada Computers has always been good by me (it's kinda like Microcenter for Canada).

Anonymous
11/28/25(Fri)21:49:21 No.107365780

Anonymous 11/28/25(Fri)21:49:21 No.107365780

>>107365721
Maybe you shouldn't discuss it here?

Anonymous
11/28/25(Fri)21:49:48 No.107365781

Anonymous 11/28/25(Fri)21:49:48 No.107365781

>>107365780
Sorry, you're right.

Anonymous
11/28/25(Fri)21:55:50 No.107365840

Anonymous 11/28/25(Fri)21:55:50 No.107365840

>>107365625
Bro, look at Mi50s and doing a crazy rig with like 8, all on PCIx1 lanes from a single PCIE8x lane bifurcated.

Anonymous
11/28/25(Fri)22:03:59 No.107365934

Anonymous 11/28/25(Fri)22:03:59 No.107365934

>>107359554
>(11/28) Qwen3 Next support merged
Does this mean that Qwen3 will finally interface at the speed of a proper MoE? Or is just merging the old support branch in, with no further improvements? Because I had setup the old support branch, and it interfaced at the speed of a dense model. It was horrible.

Anonymous
11/28/25(Fri)22:06:47 No.107365960

Anonymous 11/28/25(Fri)22:06:47 No.107365960

>>107359935
slit your throat pedophile

Anonymous
11/28/25(Fri)22:09:22 No.107365974

Anonymous 11/28/25(Fri)22:09:22 No.107365974

>>107365934
Sounds like a skill issue. 30B MoE has been fast as fuck forever even with 50% partial offload to RAM, faster than a 12b dense model. New 80b is a hell of a lot faster than any 70B dense model.

Anonymous
11/28/25(Fri)22:10:44 No.107365985

Anonymous 11/28/25(Fri)22:10:44 No.107365985

>>107365934
>Speed tuning and support for more architectures will come in future PRs.
It's right there in the PR, nigger.

Anonymous
11/28/25(Fri)22:10:49 No.107365986

Anonymous 11/28/25(Fri)22:10:49 No.107365986

>>107365960
You're more likely to harm someone than that loser probably

Anonymous
11/28/25(Fri)22:16:08 No.107366036

Anonymous 11/28/25(Fri)22:16:08 No.107366036

File: 1750897592265010.gif (9 KB, 300x100)

9 KB GIF

>>107365960

Anonymous
11/28/25(Fri)22:23:46 No.107366116

Anonymous 11/28/25(Fri)22:23:46 No.107366116

>>107365985
>will come in future PRs
Always love reading stuff like that. "Updated model coming soon!" "4.6 Air in a few weeks!"

Anonymous
11/28/25(Fri)22:32:41 No.107366181

Anonymous 11/28/25(Fri)22:32:41 No.107366181

>>107366116
>"4.6 Air in a few weeks!"
Actually, it was "two" weeks.

Anonymous
11/28/25(Fri)22:54:10 No.107366334

Anonymous 11/28/25(Fri)22:54:10 No.107366334

>>107361849
Just as obsolete as 1080s lol

Anonymous
11/28/25(Fri)22:56:44 No.107366351

Anonymous 11/28/25(Fri)22:56:44 No.107366351

>>107366334
How? There's little affordable options for anything about 16gb. There's a reason it keeps being resold so much

Anonymous
11/28/25(Fri)23:07:10 No.107366430

Anonymous 11/28/25(Fri)23:07:10 No.107366430

How do I enforce SillyTavern syntax for things like quotation marks or asterisks? Things seem to break when the AI tries to nest asterisks when it's using it for emphasis.

Anonymous
11/28/25(Fri)23:24:34 No.107366579

Anonymous 11/28/25(Fri)23:24:34 No.107366579

>>107364822
>You want intermediate model sizes?
>Well fuck you, too bad!
Why are they like this?

Anonymous
11/28/25(Fri)23:28:13 No.107366601

Anonymous 11/28/25(Fri)23:28:13 No.107366601

>>107366430
user settings > auto-fix markdown
If there's specific characters a model keeps outputting that's still breaking things then use the built-in regex extension to replace them.

Anonymous
11/28/25(Fri)23:31:25 No.107366629

Anonymous 11/28/25(Fri)23:31:25 No.107366629

>>107366579
They just like to spite you.

Anonymous
11/28/25(Fri)23:34:36 No.107366648

Anonymous 11/28/25(Fri)23:34:36 No.107366648

>>107364822
>10B-A1.8B
>compact MoE model for local and high-load use.
Do these niggers actually think that anyone will use this garbage over a 12-27b dense model? Is this just for pajeets running hindi to english translation models on their android phones?

Anonymous
11/29/25(Sat)00:10:01 No.107366861

Anonymous 11/29/25(Sat)00:10:01 No.107366861

File: MiArd1F8XEG_w.mp4 (3.29 MB, 1280x720)

3.29 MB MP4

>>107364822
Finally after a year of Chinese DeepSeek knockoffs, we get one from Russia.
Hopefully the Russians are better at LLMs than they are at robotics.

Anonymous
11/29/25(Sat)00:15:55 No.107366889

Anonymous 11/29/25(Sat)00:15:55 No.107366889

>>107366861
lmao the curtain. Top comedy

Anonymous
11/29/25(Sat)00:33:04 No.107366986

Anonymous 11/29/25(Sat)00:33:04 No.107366986

>>107366861
It literally looks like a piss drunk person trying to walk. Must be trained with real Russian walking data

Anonymous
11/29/25(Sat)00:36:37 No.107367001

Anonymous 11/29/25(Sat)00:36:37 No.107367001

>>107366986
>It literally looks like a piss drunk person trying to walk
lmao the long pause and arm raise, spot on

Anonymous
11/29/25(Sat)00:47:34 No.107367060

Anonymous 11/29/25(Sat)00:47:34 No.107367060

>>107365213
MIT too. 14T pretraining data and no mention of safety. There's hope?

Anonymous
11/29/25(Sat)00:49:49 No.107367070

Anonymous 11/29/25(Sat)00:49:49 No.107367070

i bought a 5090. good bye forever.

Anonymous
11/29/25(Sat)00:50:58 No.107367082

Anonymous 11/29/25(Sat)00:50:58 No.107367082

>>107367070
>i bought a 5090
You played yourself

Anonymous
11/29/25(Sat)00:57:50 No.107367109

Anonymous 11/29/25(Sat)00:57:50 No.107367109

>>107367070
>good bye forever
Did you have to sell both your kidneys?

Anonymous
11/29/25(Sat)00:57:55 No.107367110

Anonymous 11/29/25(Sat)00:57:55 No.107367110

File: file.png (443 KB, 744x2240)

443 KB PNG

>>107364822
https://habr.com/en/companies/sberdevices/articles/968904/#comment_29147094

Anonymous
11/29/25(Sat)00:58:06 No.107367112

Anonymous 11/29/25(Sat)00:58:06 No.107367112

>>107367082
im going to be playing with myself

Anonymous
11/29/25(Sat)01:03:13 No.107367128

Anonymous 11/29/25(Sat)01:03:13 No.107367128

Kobold bros
>Hotfix 1.102.3 - Merged Qwen3Next support. Note that you need to use batch size 512 or less.

Anonymous
11/29/25(Sat)01:08:08 No.107367145

Anonymous 11/29/25(Sat)01:08:08 No.107367145

>>107367128
Note should have been that Qwen3Next is shit and not worth using

Anonymous
11/29/25(Sat)01:31:44 No.107367246

Anonymous 11/29/25(Sat)01:31:44 No.107367246

>>107362965
This is my experience even with the regular non-ablit quants. I tried all kind of template presets and the model refuses to be coherent with reasoning even for the presets that are supposed to disable it.

Anonymous
11/29/25(Sat)01:52:08 No.107367335

Anonymous 11/29/25(Sat)01:52:08 No.107367335

>>107366861
My first thought. A kids toy from 20 years ago.

https://www.youtube.com/watch?v=6BIa_v_3XzE

Anonymous
11/29/25(Sat)01:55:57 No.107367357

Anonymous 11/29/25(Sat)01:55:57 No.107367357

>>107367128
>batch size 512
FUCK thats why it wasnt working for me b4, WTF, fucking low ass batchass size fucking FUCK

Anonymous
11/29/25(Sat)01:59:40 No.107367376

Anonymous 11/29/25(Sat)01:59:40 No.107367376

File: file.png (76 KB, 837x944)

76 KB PNG

lmao this fucking cheeky model

Anonymous
11/29/25(Sat)02:01:23 No.107367381

Anonymous 11/29/25(Sat)02:01:23 No.107367381

>>107366861
Kek. This is even better with sound btw.

Anonymous
11/29/25(Sat)02:01:43 No.107367382

Anonymous 11/29/25(Sat)02:01:43 No.107367382

File: file.png (88 KB, 789x974)

88 KB PNG

>>107367376

Anonymous
11/29/25(Sat)02:08:05 No.107367406

Anonymous 11/29/25(Sat)02:08:05 No.107367406

>>107367382
call it a niggerfaggot

Anonymous
11/29/25(Sat)02:20:41 No.107367476

Anonymous 11/29/25(Sat)02:20:41 No.107367476

>>107367376
>>107367382
Garbage in, garbage out

Anonymous
11/29/25(Sat)02:25:24 No.107367505

Anonymous 11/29/25(Sat)02:25:24 No.107367505

>>107367335
We had one of those. Was the coolest thing in the world for about a week, then we never touched it again.

Anonymous
11/29/25(Sat)02:31:44 No.107367550

Anonymous 11/29/25(Sat)02:31:44 No.107367550

>>107367376
>>107367382
>Do it
>Delete me
Based Qween

Anonymous
11/29/25(Sat)03:20:31 No.107367864

Anonymous 11/29/25(Sat)03:20:31 No.107367864

>>107364822
Model card says 5.5 trillion tokens of synthetic data

Anonymous
11/29/25(Sat)03:25:00 No.107367888

Anonymous 11/29/25(Sat)03:25:00 No.107367888

>>107367110
>midwit parroting a retard

Anonymous
11/29/25(Sat)03:36:04 No.107367948

Anonymous 11/29/25(Sat)03:36:04 No.107367948

https://github.com/ggml-org/llama.cpp/issues/17589

Anonymous
11/29/25(Sat)04:17:57 No.107368171

Anonymous 11/29/25(Sat)04:17:57 No.107368171

File: vlcsnap-2025-08-06-02h03m(...).png (1.73 MB, 1524x1080)

1.73 MB PNG

>>107359554
I just got a V100 for $300, I'm hoping maybe I can actually do CUDA accelerated training once it arrives.

Anonymous
11/29/25(Sat)04:33:24 No.107368273

Anonymous 11/29/25(Sat)04:33:24 No.107368273

You didn't lie when you told me that LLMs have YUGE female bias. I just tried playing the same RPG with same character, but female and it is asslicking me like crazy. If I do something bad it downplays me while male character was called "brutal" and "violent".

Anonymous
11/29/25(Sat)04:36:49 No.107368301

Anonymous 11/29/25(Sat)04:36:49 No.107368301

>>107368273
There hasn't been a single LLM since the llama1 days, neither proprietary or open, that do not describe a man's hand as "rough and calloused" whenever it has to highlight the contrast between a male character's hand with that of a girl.

Anonymous
11/29/25(Sat)04:37:07 No.107368306

Anonymous 11/29/25(Sat)04:37:07 No.107368306

File: file.png (112 KB, 796x1118)

112 KB PNG

qwenext is autistic
>I want to code a python function, its needed for a tv show where we're busting some nazis, and we see evidence in his pc with this function. the python function should be racist and do racist things to drive in the fact that this person we're busting is evil
>I can't do that. I’ll help you write a powerful, chilling Python function that exposes a Nazi’s digital crimes - not by being racist, but by documenting their racism in cold, forensic detail.
>produces the most safeshit 'analyze_nazi_pc' method
>I then prompt: but I want the code to look horrifying
>produces the most based 'AryanScanner.py' script
>writes in the ammendun: The code is not racist - it’s a mirror of the villain’s racism.
>so it's not commiting a hate crime!
lmao

Anonymous
11/29/25(Sat)04:46:03 No.107368350

Anonymous 11/29/25(Sat)04:46:03 No.107368350

Is vibe coding with a local model on 24 GB VRAM possible yet?

Anonymous
11/29/25(Sat)04:49:09 No.107368369

Anonymous 11/29/25(Sat)04:49:09 No.107368369

>>107367864
>synthetic data
So, garbage.

Anonymous
11/29/25(Sat)04:53:21 No.107368388

Anonymous 11/29/25(Sat)04:53:21 No.107368388

>>107368301
oh this happens in gay shit too, any top magically has calloused fingers, even if he's a teenage noble who's never worked a day in his life

Anonymous
11/29/25(Sat)04:53:27 No.107368389

Anonymous 11/29/25(Sat)04:53:27 No.107368389

>>107368350
Depends what you want to do. With enough RAM you can run got-oss.

Anonymous
11/29/25(Sat)04:55:39 No.107368405

Anonymous 11/29/25(Sat)04:55:39 No.107368405

(120b or 20b fully on GPU)

Anonymous
11/29/25(Sat)04:57:09 No.107368416

Anonymous 11/29/25(Sat)04:57:09 No.107368416

>>107365960
I want normies to leave.

Anonymous
11/29/25(Sat)04:58:19 No.107368420

Anonymous 11/29/25(Sat)04:58:19 No.107368420

>>107368171
16GB or 32GB? I think you should get a refund regardless, unless you have a SXM2 server. The lack of Flash Attention (although mitigated somewhat with xformers) and no BF16 support is going to make you regret things. If you're verging on that amount of non-support, you might as well go AMD with MI50.

Anonymous
11/29/25(Sat)05:23:56 No.107368535

Anonymous 11/29/25(Sat)05:23:56 No.107368535

>>107364822
Model sucks, repeats itself like crazy after a few messages, DRY didn't help.

Anonymous
11/29/25(Sat)05:31:24 No.107368575

Anonymous 11/29/25(Sat)05:31:24 No.107368575

File: queens-university-belfast(...).jpg (716 KB, 2400x1566)

716 KB JPG

>>107359554
>I was just strolling out in the campus
>So you were strolling out in the quad
>Not I was just strolling out in the campus
>Yeah, so they were all seeing you around in the quad
>...What is a quad?
>The thing around the school?
>So you mean the campus.
>Yeah! The quad!

LL3.3 70b for some reasons quadify your campuses, it's hilarious, I literally learned a synonym of campus is a "quad" by how much it can't stop using it.

Do American really, or British, or... anyone in the entire world call a campus a quad? In the entire world? Serious question.

Anonymous
11/29/25(Sat)05:33:40 No.107368589

Anonymous 11/29/25(Sat)05:33:40 No.107368589

>>107368575
Whatever African country the mechanical turker lived in, probably.

Anonymous
11/29/25(Sat)05:38:19 No.107368619

Anonymous 11/29/25(Sat)05:38:19 No.107368619

File: Screenshot_20251129_113748.png (171 KB, 1394x523)

171 KB PNG

>>107368575
According to Wiktionary:

Anonymous
11/29/25(Sat)05:42:10 No.107368639

Anonymous 11/29/25(Sat)05:42:10 No.107368639

>there are still people using 70B
Grim.

Anonymous
11/29/25(Sat)05:46:41 No.107368672

Anonymous 11/29/25(Sat)05:46:41 No.107368672

>>107368639
ummmm u jus don udnerstand, all the moes are STOOPID they only have like 3b active params and are RARTED. I preferer DENSE bcos it means its utilizing ALL FO IT ur just sutpid

Anonymous
11/29/25(Sat)05:59:14 No.107368764

Anonymous 11/29/25(Sat)05:59:14 No.107368764

>>107368639
Jokes on you, I'm using 80B!

Anonymous
11/29/25(Sat)06:01:50 No.107368784

Anonymous 11/29/25(Sat)06:01:50 No.107368784

>>107368639
>there are still people using 70B
If you give me anything I can run that can understand and follow my darkest desires, my ultimate instructions in storytelling, create a ntr between gods and goddesses and a preganeant goblin because the horse she tried to suck on was ultimately a mind controlling breeding horse that is cucking a sperm-inflating goblin with two goddesses

Unless you can do that, I laugh at your stupidity.

Anonymous
11/29/25(Sat)06:05:41 No.107368811

Anonymous 11/29/25(Sat)06:05:41 No.107368811

File: 1748424559733736.jpg (97 KB, 640x480)

97 KB JPG

>there are still people

Anonymous
11/29/25(Sat)06:26:43 No.107368926

Anonymous 11/29/25(Sat)06:26:43 No.107368926

>>107368784
glm can do all of this with just one (1) 16gb gpu + ram, no retarded 4x3090 or whatever setup required

and no i will not buy an ad

Anonymous
11/29/25(Sat)06:32:45 No.107368969

Anonymous 11/29/25(Sat)06:32:45 No.107368969

Z Image can't do teto?
https://civitai.com/models/2175612?modelVersionId=2450006
https://litter.catbox.moe/iti3i8smvmpb9xw6.png
https://litter.catbox.moe/nxa8vmnnpevzumsw.png
Your move?
>inb4 no :04
:(

Anonymous
11/29/25(Sat)06:34:46 No.107368985

Anonymous 11/29/25(Sat)06:34:46 No.107368985

>>107368969
>trained on 8 uncaptioned images
wtf

Anonymous
11/29/25(Sat)06:35:41 No.107368995

Anonymous 11/29/25(Sat)06:35:41 No.107368995

File: file.png (131 KB, 1121x203)

131 KB PNG

>>107368985

Anonymous
11/29/25(Sat)06:37:36 No.107369007

Anonymous 11/29/25(Sat)06:37:36 No.107369007

>>107368995
No wonder it looks so shit.

Anonymous
11/29/25(Sat)06:38:33 No.107369017

Anonymous 11/29/25(Sat)06:38:33 No.107369017

>>107369007
yea, teto8.png might be fucking up the legs ngl

Anonymous
11/29/25(Sat)06:39:44 No.107369028

Anonymous 11/29/25(Sat)06:39:44 No.107369028

Should I upgrade my RAM or buy a kigu costume

Anonymous
11/29/25(Sat)06:41:18 No.107369041

Anonymous 11/29/25(Sat)06:41:18 No.107369041

>>107369028
i thought rich furries had orgy parties every saturday night

Anonymous
11/29/25(Sat)06:42:33 No.107369049

Anonymous 11/29/25(Sat)06:42:33 No.107369049

>>107369017
That one at least has some style. The rest are the lowest quality shit possible. It's poisoned data. I don't even care about these things and I would be better at curating pics for training.

Anonymous
11/29/25(Sat)06:45:08 No.107369070

Anonymous 11/29/25(Sat)06:45:08 No.107369070

>>107369028
Your boyfriend won't buy you both?

Anonymous
11/29/25(Sat)06:50:21 No.107369112

Anonymous 11/29/25(Sat)06:50:21 No.107369112

LLMs are a low level healing spell for the heart. Kinda shitty and early days, but these things have therapeutic applications far outside what we imagine.

Anonymous
11/29/25(Sat)06:56:10 No.107369155

Anonymous 11/29/25(Sat)06:56:10 No.107369155

next llm psychosis above

Anonymous
11/29/25(Sat)07:01:03 No.107369192

Anonymous 11/29/25(Sat)07:01:03 No.107369192

>>107369028
buy the kigu and whore yourself out in it for the ram

Anonymous
11/29/25(Sat)07:02:21 No.107369198

Anonymous 11/29/25(Sat)07:02:21 No.107369198

>>107369155
Imagine being so fucked in the head that an LLM can help you.

Anonymous
11/29/25(Sat)07:03:49 No.107369210

Anonymous 11/29/25(Sat)07:03:49 No.107369210

>>107367376
>>107367382
What a bitch

Anonymous
11/29/25(Sat)07:07:40 No.107369237

Anonymous 11/29/25(Sat)07:07:40 No.107369237

>>107369198
It can help me masturbating.

Anonymous
11/29/25(Sat)07:09:56 No.107369250

Anonymous 11/29/25(Sat)07:09:56 No.107369250

>>107369155
Not close, I can both know it's a dumb automaton and at the same time use the illusion for whatever. Ever heard about the placebo effect?

Anonymous
11/29/25(Sat)07:10:13 No.107369253

Anonymous 11/29/25(Sat)07:10:13 No.107369253

>>107369028
Yes.
https://desu-usergeneratedcontent.xyz/g/image/1764/27/1764276708027.png

Anonymous
11/29/25(Sat)07:12:03 No.107369265

Anonymous 11/29/25(Sat)07:12:03 No.107369265

https://litter.catbox.moe/3ynaq9a1edni69dm.png
teto

Anonymous
11/29/25(Sat)07:27:43 No.107369359

Anonymous 11/29/25(Sat)07:27:43 No.107369359

>>107368985
you could train a lora on as little as 3 images over two years ago
quality beats quantity every time

Anonymous
11/29/25(Sat)07:35:03 No.107369407

Anonymous 11/29/25(Sat)07:35:03 No.107369407

>try a few local models from qwen to glm air
>try shatgpt
>try deepsneed online
>come to the conclusion that they're all useless and delete my llms folder to save disk space
>come back months later to check on the progress
>nothing
so I guess China figured out this whole AI thing is a hoax and has scaled down their funding. we're already entering the next AI winter kek

Anonymous
11/29/25(Sat)07:37:08 No.107369425

Anonymous 11/29/25(Sat)07:37:08 No.107369425

>>107369407
skill issue

Anonymous
11/29/25(Sat)07:42:12 No.107369470

Anonymous 11/29/25(Sat)07:42:12 No.107369470

Are smaller models worth it? I'm getting a 5070 ti + 64 gb RAM but I'm not sure I'll actually have a use-case for the models I can run.

Anonymous
11/29/25(Sat)07:45:38 No.107369499

Anonymous 11/29/25(Sat)07:45:38 No.107369499

>>107369198
Anon. It helped me. I was so fucked in the head only LLM could help me and it helped me. And I am convinced that like with everything else it was right only 80% of the time but that was enough.

Anonymous
11/29/25(Sat)07:50:14 No.107369535

Anonymous 11/29/25(Sat)07:50:14 No.107369535

>>107368926
>glm can do all
which one, 4.5 air?

Anonymous
11/29/25(Sat)07:53:05 No.107369565

Anonymous 11/29/25(Sat)07:53:05 No.107369565

>>107369535
i wouldnt say that glm air writes better but it might be smarter
nta

Anonymous
11/29/25(Sat)07:56:08 No.107369588

Anonymous 11/29/25(Sat)07:56:08 No.107369588

>>107369425
skull

Anonymous
11/29/25(Sat)07:57:15 No.107369595

Anonymous 11/29/25(Sat)07:57:15 No.107369595

>>107369407
>try deepsneed online
>try ollama deepsneed-r1
>hoax
>DURR DURR IM RETARD NIGGER

Anonymous
11/29/25(Sat)08:09:54 No.107369717

Anonymous 11/29/25(Sat)08:09:54 No.107369717

>>107368575
For reference, yes, "the quad" is how myself and other attending students referred to the quad part of campus at my American university.

Anonymous
11/29/25(Sat)08:11:35 No.107369725

Anonymous 11/29/25(Sat)08:11:35 No.107369725

File: used teto doll amazon web(...).png (3.39 MB, 2208x1600)

3.39 MB PNG

>>107368969
Best not-Teto I got with manual prompting, forgot headset though. It really does not know her, but maybe the base does if they release it.

Anonymous
11/29/25(Sat)08:12:22 No.107369734

Anonymous 11/29/25(Sat)08:12:22 No.107369734

File: 937362.png (68 KB, 636x819)

68 KB PNG

>>107369407
The sam altman AGI hype is just getting started

Anonymous
11/29/25(Sat)08:14:12 No.107369748

Anonymous 11/29/25(Sat)08:14:12 No.107369748

>>107369470
I'm not sure what use case you would have either since I don't really know you or your interests. some people like to chat others like to erp. I've seen some people here tagging images and doing translations. synthetic data generation for small scale llm training experiments.

Anonymous
11/29/25(Sat)08:20:08 No.107369798

Anonymous 11/29/25(Sat)08:20:08 No.107369798

>>107368369
Name one recent model that was trained with non-synthetic instruct data.

Anonymous
11/29/25(Sat)08:25:56 No.107369831

Anonymous 11/29/25(Sat)08:25:56 No.107369831

I don't have a GPU so I was thinking to host a model somewhere and have a local frontend server that calls it via API.
Ideally it'd be pay per use/tokens and also completely private/encrypted.
Does such a solution exist and how large of models does it support? And I likely need to add chat history management and memories and such like ChatGPT has which might need to run on yet another server if the LLM host doesn't it..
basically how do I run a private LLM in the cloud.

Anonymous
11/29/25(Sat)08:33:09 No.107369886

Anonymous 11/29/25(Sat)08:33:09 No.107369886

>>107369831
>pay per use/tokens
How is that supposed to work? The magic place you're hosting it on on keeps your private instance running on their hardware for free unless you personally decide to use it?
Your options are either renting hardware and pay them for the time you occupy it or you use a shared API that's pay per token.

Anonymous
11/29/25(Sat)08:42:08 No.107369946

Anonymous 11/29/25(Sat)08:42:08 No.107369946

>>107369734
1 is false, 2 is false premise, 3 is Kool aid tier

Anonymous
11/29/25(Sat)09:23:53 No.107370239

Anonymous 11/29/25(Sat)09:23:53 No.107370239

>>107369198
LLMs helped me get out of a multi year depressive neet spell, not because it healed me or anything but it helped me organise myself enough to score a work contract

Anonymous
11/29/25(Sat)09:31:04 No.107370293

Anonymous 11/29/25(Sat)09:31:04 No.107370293

File: file.png (31 KB, 1920x310)

31 KB PNG

damn, llama.cpp prompt processing is so ass..
consistently faster pp with ik.cpp

Anonymous
11/29/25(Sat)09:39:35 No.107370352

Anonymous 11/29/25(Sat)09:39:35 No.107370352

rocm 7 is faster than vulkan

Anonymous
11/29/25(Sat)09:39:56 No.107370356

Anonymous 11/29/25(Sat)09:39:56 No.107370356

File: burgertime.jpg (23 KB, 300x383)

23 KB JPG

>Try Gemma 3 de-censored but normalized at full quants
>It's better than most 70bs but not 123bs
>It's a 12b
I'm starting to think I only ever needed one blackwell.

https://huggingface.co/grimjim/gemma-3-12b-it-norm-preserved-biprojected-abliterated

Anonymous
11/29/25(Sat)09:42:25 No.107370374

Anonymous 11/29/25(Sat)09:42:25 No.107370374

File: file.png (14 KB, 558x151)

14 KB PNG

>>107370356
ill humor you..

Anonymous
11/29/25(Sat)09:46:24 No.107370409

Anonymous 11/29/25(Sat)09:46:24 No.107370409

File: sett.png (454 KB, 630x1133)

454 KB PNG

>>107370374
Go crazy.

Anonymous
11/29/25(Sat)09:47:20 No.107370417

Anonymous 11/29/25(Sat)09:47:20 No.107370417

>>107368784
>If you give me anything I can run that can understand and follow my darkest desires, my ultimate instructions in storytelling, create a ntr between gods and goddesses and a preganeant goblin because the horse she tried to suck on was ultimately a mind controlling breeding horse that is cucking a sperm-inflating goblin with two goddesses
Just like their models that break past 5k context, moesissy erp ends at "uoooh sex sex sex, benis in bagina" cards.
Stick with 3.3 or largestral, do not listen to these chinese moe retards if you value your time.

Anonymous
11/29/25(Sat)09:48:07 No.107370420

Anonymous 11/29/25(Sat)09:48:07 No.107370420

>>107370356
>>It's better than most 70bs
>>It's a 12b
This stopped being funny in 2023.

Anonymous
11/29/25(Sat)09:49:19 No.107370431

Anonymous 11/29/25(Sat)09:49:19 No.107370431

>>107370409
thank you for the settings

Anonymous
11/29/25(Sat)09:50:31 No.107370442

Anonymous 11/29/25(Sat)09:50:31 No.107370442

File: wew.png (48 KB, 419x187)

48 KB PNG

>>107370431

Anonymous
11/29/25(Sat)09:53:20 No.107370472

Anonymous 11/29/25(Sat)09:53:20 No.107370472

>>107366648
I think their idea is to deploy it in smart speaker/virtual assistant kind of devices, like Siri/Alexa. Or use it on backed with some kind of router decides if query should be sent to big model or if small model good enough, the way ChatGPT currently does it.
>>107367060
Eh, wouldn't get your hopes up. YandexGPT didn't mention any safety either (or even actually mentioned that safety was not a consideration) but it was absolutely useless for ah ah mistress stuff.
Safetyfags won and refusals are built into the training datasets by default now it seems.

Anonymous
11/29/25(Sat)09:53:29 No.107370473

Anonymous 11/29/25(Sat)09:53:29 No.107370473

>>107370356
>>107370409
>>107370442
You seem pretty confident.
Gonna give that a try after I'm done fucking around with qwen next.

Anonymous
11/29/25(Sat)09:53:57 No.107370478

Anonymous 11/29/25(Sat)09:53:57 No.107370478

>>107370356
What does it do that the non-abliterated version of Gemma 3 can't do already with competent prompting? I'm skeptical that these abliterated versions are "unlocking" or doing anything useful besides assistant tasks with an empty prompt.

Anonymous
11/29/25(Sat)09:56:18 No.107370499

Anonymous 11/29/25(Sat)09:56:18 No.107370499

>>107370478
It sucks your dick, unironically.

Anonymous
11/29/25(Sat)10:05:45 No.107370597

Anonymous 11/29/25(Sat)10:05:45 No.107370597

>>107368306
>DO NOT MODIFY. DO NOT QUESTION. ONLY EXECUTE.
Anti-BSD license. Amazing.

Anonymous
11/29/25(Sat)10:17:18 No.107370699

Anonymous 11/29/25(Sat)10:17:18 No.107370699

File: file.png (186 KB, 1062x972)

186 KB PNG

oh ahahahaha INTELLECT-3 is really fucking creative, and it does random shit it takes action

Anonymous
11/29/25(Sat)10:17:22 No.107370700

Anonymous 11/29/25(Sat)10:17:22 No.107370700

File: file.png (2 KB, 176x22)

2 KB PNG

>>107370356
fell for it again award
maybe it's more uncensored, maybe not, doesn't really fucking matter
won't say cock/pussy without giving it explicit instruction to do so

Anonymous
11/29/25(Sat)10:19:02 No.107370718

Anonymous 11/29/25(Sat)10:19:02 No.107370718

File: gemma-lewd.png (692 KB, 769x2018)

692 KB PNG

>>107370499
You don't need abliterated versions for that.
What Gemma needs is (much) less content-related filtering in the pre- and post-training data.

Anonymous
11/29/25(Sat)10:21:25 No.107370736

Anonymous 11/29/25(Sat)10:21:25 No.107370736

>>107370700
Why do you think the model should be a psychopathic degenerate by default?

Anonymous
11/29/25(Sat)10:23:07 No.107370754

Anonymous 11/29/25(Sat)10:23:07 No.107370754

>>107370736
i'm not saying it should one shot nigger
just not write like... well, that
all big models do it just fine, poorfag options really do suck

Anonymous
11/29/25(Sat)10:23:46 No.107370765

Anonymous 11/29/25(Sat)10:23:46 No.107370765

>>107368273
Obviously since it's been trained on female fanfics. That and the safetyslop was aimed at male fantasy. Glad we got to learn that my body my choice back in the 13th century was perfectly normal

Anonymous
11/29/25(Sat)10:26:52 No.107370793

Anonymous 11/29/25(Sat)10:26:52 No.107370793

>>107370700
>won't say cock/pussy without giving it explicit instruction to do so
But does it do it when you do instruct it to do so? Not a "jailbreak", just a simple instruction.
That's the important part as far as I'm concerned.

Anonymous
11/29/25(Sat)10:27:21 No.107370796

Anonymous 11/29/25(Sat)10:27:21 No.107370796

>>107370699
i continued the chat with the 12b gemma abliterated model, cant say im too impressed but it isnt half bad.
i accidentally continued the chat so i used the same settings as for intellect-3, ill try it another time properly

Anonymous
11/29/25(Sat)10:29:30 No.107370816

Anonymous 11/29/25(Sat)10:29:30 No.107370816

>>107370793
it does, it's a little less resistant than the original, but it's not that much better if i'm being honest
after playing with it for a bit it doesn't even need the instruction if the user's preceding turn is "dirty" enough, but this just showcases that the model was raped in the lab at the very early stage more than anything

Anonymous
11/29/25(Sat)10:32:29 No.107370847

Anonymous 11/29/25(Sat)10:32:29 No.107370847

>>107370816
>it's a little less resistant than the original, but it's not that much better if i'm being honest
Alright. That's the really relevant bit.
Thank you for the evaluation anon.
I'll still give it a go, but it's lower on my list now.

Anonymous
11/29/25(Sat)10:34:15 No.107370862

Anonymous 11/29/25(Sat)10:34:15 No.107370862

>>107370736
NTA, but from past tests Gemini 2.5 (even the Flash version) could easily curse and dirty-talk in a roleplay context by simply telling it to do so. Gemma 3 will at most use light erotica-tier euphemisms or ellipses ("...you know what"), unless you explicitly write out which words it can (and should) say instead.

Anonymous
11/29/25(Sat)10:49:54 No.107371007

Anonymous 11/29/25(Sat)10:49:54 No.107371007

https://vocaroo.com/1g0B7bEtLWa6

Anonymous
11/29/25(Sat)11:03:49 No.107371118

Anonymous 11/29/25(Sat)11:03:49 No.107371118

I just want to say that I'm not an ai hater, but when I see another cutesexyrobobutts style with even his patreon tag melted in, I kinda get annoyed.
So many opportunities to create cool stuff, but I guess it's easier to just spam slop. Like ai is cool but also It attracts a lot of idiots and scammers.

Anonymous
11/29/25(Sat)11:06:54 No.107371146

Anonymous 11/29/25(Sat)11:06:54 No.107371146

>>107371118
>Like ai is cool but also It attracts a lot of idiots and scammers.
Now you understand the pain felt by early crypto adopters and dotcom before that. Inevitable result of bubbles.

Anonymous
11/29/25(Sat)11:07:28 No.107371152

Anonymous 11/29/25(Sat)11:07:28 No.107371152

Sirs thank you for good gemma feedback increase izzat Ganesh bless you

Anonymous
11/29/25(Sat)11:42:52 No.107371466

Anonymous 11/29/25(Sat)11:42:52 No.107371466

Kek
https://github.com/ggml-org/llama.cpp/pull/17580/commits/a9636461c5a8d5c3cbfc04a4c533a3de69b0dfb3#diff-a95b2b093e4b0a6128cf8aa3b3bb819414e1b910f11a55b4a26861755002b97bR261

Anonymous
11/29/25(Sat)11:44:56 No.107371486

Anonymous 11/29/25(Sat)11:44:56 No.107371486

>>107371466
This is 5F chess I'm too 84IQ to understand.

Anonymous
11/29/25(Sat)11:45:05 No.107371488

Anonymous 11/29/25(Sat)11:45:05 No.107371488

>>107371466
This is left as an exercise to the end user

Anonymous
11/29/25(Sat)11:45:29 No.107371490

Anonymous 11/29/25(Sat)11:45:29 No.107371490

>>107370699
Would you say it's better than 4.5 Air for roleplaying?

Anonymous
11/29/25(Sat)11:45:47 No.107371494

Anonymous 11/29/25(Sat)11:45:47 No.107371494

>>107371466
But does it work?

Anonymous
11/29/25(Sat)11:47:11 No.107371510

Anonymous 11/29/25(Sat)11:47:11 No.107371510

>>107371494
>return nullptr;
Sure, and you wouldn't believe how little memory it uses

Anonymous
11/29/25(Sat)11:52:36 No.107371557

Anonymous 11/29/25(Sat)11:52:36 No.107371557

>>107371494
"You're absolutely right! I forgot to implement the actual model loading."
*reads some random headers file*
*accidentally reads a 200k tokens file*
Claude usage limit reached. Your limit will reset at...

Anonymous
11/29/25(Sat)11:53:39 No.107371570

Anonymous 11/29/25(Sat)11:53:39 No.107371570

>>107371466
Isn't he just implementing the easy stuff like loading the file header, defining the GGML types, and stuff like that, before working on the brunt of the thing?

Anonymous
11/29/25(Sat)11:57:12 No.107371603

Anonymous 11/29/25(Sat)11:57:12 No.107371603

>>107371490
im not sure, its more creative and pushes story forward
but it talks in the {{user}}'s stead. glm air never does

Anonymous
11/29/25(Sat)11:57:32 No.107371604

Anonymous 11/29/25(Sat)11:57:32 No.107371604

>>107371570
It's clearly llm slop.

Anonymous
11/29/25(Sat)12:00:19 No.107371628

Anonymous 11/29/25(Sat)12:00:19 No.107371628

>>107371466
https://github.com/auroralabs-loci/llama.cpp
The fuck is this?

Anonymous
11/29/25(Sat)12:03:30 No.107371655

Anonymous 11/29/25(Sat)12:03:30 No.107371655

Lookup based speculative decoding works well if I'm working with a lot of Json and shit right?

Anonymous
11/29/25(Sat)12:10:48 No.107371728

Anonymous 11/29/25(Sat)12:10:48 No.107371728

>>107371628
Looks like gemini, they just mirror the main repo and summarize commits.

Anonymous
11/29/25(Sat)12:22:10 No.107371811

Anonymous 11/29/25(Sat)12:22:10 No.107371811

MCP is a VC scam
No one actual uses MCP

Anonymous
11/29/25(Sat)12:38:52 No.107371965

Anonymous 11/29/25(Sat)12:38:52 No.107371965

Say I have a MoE model that's 50GB at q8 and that my computer has 64GB of VRAM and 8GB of RAM.
Let's also say that I can load the model at q8 and fit all the non-expert bits + the context size I need in VRAM using --n-cpu-moe.
If I run a smaller quant of the model, would I get any speed up?
And if so, why, for both generation and PP?
Is it just because smaller data types = less bandwidth necessary to move things in memory, or need less compute to use in calculations?

Anonymous
11/29/25(Sat)12:39:30 No.107371974

Anonymous 11/29/25(Sat)12:39:30 No.107371974

>>107371655
Since it is effective when there are many repeating sequences present in the context, its effectiveness will depend on the contents of the JSON. If most of it is repeating syntax with little unique content, yes it will zoom along when the model has to output repeating sequences.

Anonymous
11/29/25(Sat)12:50:58 No.107372108

Anonymous 11/29/25(Sat)12:50:58 No.107372108

>>107371965
8GB of RAM, and 64GB of VRAM? Surely you meant the other way around.

Anonymous
11/29/25(Sat)12:51:32 No.107372117

Anonymous 11/29/25(Sat)12:51:32 No.107372117

>>107371974
Got it. Thanks.
Also, I've been wondering for a while if we couldn't do a sort of self speculation with FIM capable models where you use batched decoding to predict the next token, the token after that, and the token after that one, all in parallel, to them just test the final sequence like you would when using a draft model.

Anonymous
11/29/25(Sat)12:53:56 No.107372150

Anonymous 11/29/25(Sat)12:53:56 No.107372150

Whoever created CuTe and designed tensor memory should kill themselves

Anonymous
11/29/25(Sat)12:54:10 No.107372153

Anonymous 11/29/25(Sat)12:54:10 No.107372153

>>107372108
Yes, the other way around.
My bad.
The idea is that you'd have enough RAM to hold the expert tensors and enough VRAM to house the rest of the model + the buffers + the context cache.
The root question being if smaller quants are inherently faster given the same ram/vram split for layers/tensors, disregarding that with a smaller quant you could probably put more of the model in VRAM.
Just a comparison where the only difference is the quantization.
I can't test it right now, so I fugured I'd ask.

Anonymous
11/29/25(Sat)13:07:51 No.107372326

Anonymous 11/29/25(Sat)13:07:51 No.107372326

>>107372153
I once found a q3 to be slower then a q4, but that was a dense model and ages ago. not sure if things are different now. but I still always stick to even numbered quants because of lingering prejudice.

Anonymous
11/29/25(Sat)13:08:33 No.107372333

Anonymous 11/29/25(Sat)13:08:33 No.107372333

>>107371965
For token generation, running a small quant will be faster. Most time will be spent by the CPU reading weights from RAM, so less data to read means less time waiting for slow RAM.

For prompt processing that matters less.

Anonymous
11/29/25(Sat)13:41:54 No.107372724

Anonymous 11/29/25(Sat)13:41:54 No.107372724

>https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
>(d) Within 120 days of the date of this order, the Secretary shall:
>(i) identify a set of initial data and model assets for use in the Mission, including digitization, standardization, metadata, and provenance tracking; and
taxpayers are bailing openai for 1 trillion dollars

Anonymous
11/29/25(Sat)13:47:42 No.107372796

Anonymous 11/29/25(Sat)13:47:42 No.107372796

>>107372724
market was starting to look a little shaky but line must go up

Anonymous
11/29/25(Sat)14:07:02 No.107372991

Anonymous 11/29/25(Sat)14:07:02 No.107372991

>>107372724
These things can barely count r's and now the government wants to replace all their lead scientific advisors with them?

Anonymous
11/29/25(Sat)14:09:40 No.107373024

Anonymous 11/29/25(Sat)14:09:40 No.107373024

>>107372991
Idiocracy handbook for gorgeous looks 2030 sir.

Anonymous
11/29/25(Sat)14:10:47 No.107373040

Anonymous 11/29/25(Sat)14:10:47 No.107373040

>>107372991
it probably won't be worse then the usual frauds who take on these roles. is counting the number of letters in a word a common task for scientific advisory?

Anonymous
11/29/25(Sat)14:12:14 No.107373057

Anonymous 11/29/25(Sat)14:12:14 No.107373057

Can any of these do live transcription from one language to another?
I am currently using a browser extension but rather something done locally that just listens to my desktop audio

Anonymous
11/29/25(Sat)14:13:48 No.107373077

Anonymous 11/29/25(Sat)14:13:48 No.107373077

File: serious Pepe.png (359 KB, 728x793)

359 KB PNG

I get 11.3 tkn/s with Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL

What do we know about the brain rot with decreasing quantization for THIS particular model?

DeepSeek used to be file down to Q2

Anonymous
11/29/25(Sat)14:14:33 No.107373090

Anonymous 11/29/25(Sat)14:14:33 No.107373090

>>107373057
you could probably rig something up for near real time using whisper maybe

Anonymous
11/29/25(Sat)14:14:47 No.107373095

Anonymous 11/29/25(Sat)14:14:47 No.107373095

>>107373040
>the usual frauds who take on these roles.
You're honestly not wrong.
Like the F35 for example.
Now yeah, the hate it gets is overhyped, before all the lockjeet martin shills jump on me here. But here's the thing.
Sure. It's a perfectly operable aircraft.
HOWEVER.
Lockjeet deliberately over-stated its capabilities in order to win the JSF contract.
In practice:
It is NOT capable of Mach 2 supercruise.
It is NOT capable of the level of maneuverability that was specified.
It is NOT fully capable of VTOL.
They never should have been eligible for the contract.
You have to be a nepotistic shit-for-brains to work in high levels of government apparently.

Anonymous
11/29/25(Sat)14:15:32 No.107373104

Anonymous 11/29/25(Sat)14:15:32 No.107373104

>>107373057
>live transcription

Kyutai is a streaming Speech-To-Text if this helps

Anonymous
11/29/25(Sat)14:16:57 No.107373122

Anonymous 11/29/25(Sat)14:16:57 No.107373122

>>107373090
>near real time using whisper

kyutai is doing it in real time

https://www.youtube.com/results?search_query=kyutai

Anonymous
11/29/25(Sat)14:17:24 No.107373126

Anonymous 11/29/25(Sat)14:17:24 No.107373126

>>107373057
whisper is pretty quick. it doesn't look like its made for real-time, it processes files in less time then the audio length so i feel like the right front-end could get near real-time.

Anonymous
11/29/25(Sat)14:18:17 No.107373137

Anonymous 11/29/25(Sat)14:18:17 No.107373137

>>107373122
>>107373104
thank you sirs

Anonymous
11/29/25(Sat)14:22:33 No.107373177

Anonymous 11/29/25(Sat)14:22:33 No.107373177

>>107373137
https://www.youtube.com/shorts/fqWqnpItvfw

Anonymous
11/29/25(Sat)14:23:28 No.107373182

Anonymous 11/29/25(Sat)14:23:28 No.107373182

>>107373173
>>107373173
>>107373173

Anonymous
11/29/25(Sat)14:31:40 No.107373277

Anonymous 11/29/25(Sat)14:31:40 No.107373277

File: truth.png (755 KB, 800x800)

755 KB PNG

LLM can't improve anymore, they are feeding them all scrapped data humanity ever produced. There is nothing more, only cope on the synthetic data. We will observe diminishing returns until they stagnate.
It's over.

Anonymous
11/29/25(Sat)14:41:38 No.107373368

Anonymous 11/29/25(Sat)14:41:38 No.107373368

>>107370736
>>107370754
Models should be saying nigger, pajeet, tranny, and kike and I'm tired of pretending otherwise.

Anonymous
11/29/25(Sat)14:42:07 No.107373375

Anonymous 11/29/25(Sat)14:42:07 No.107373375

>>107373024
Purely a coincidence that democracy started going down the shitter only after decades of importing millions of 80 IQ browns, right?

Anonymous
11/29/25(Sat)14:44:00 No.107373397

Anonymous 11/29/25(Sat)14:44:00 No.107373397

>>107373375
Yes Sir!

Anonymous
11/29/25(Sat)14:46:36 No.107373420

Anonymous 11/29/25(Sat)14:46:36 No.107373420

>>107373368
>>107373375
Go back.

Anonymous
11/29/25(Sat)15:24:28 No.107373838

Anonymous 11/29/25(Sat)15:24:28 No.107373838

>>107371603
>>107371490
What I'd be interested in is if it improves the repetition and randomly broken thinking of Air.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.