/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107292886 & >>107278838

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107292886

--Analyzing GPT-OSS model limitations and potential applications:
>107293073 >107293091 >107293169 >107293194 >107293326 >107293375 >107293399 >107293469 >107294784 >107294829 >107294868 >107295225
--Performance optimization challenges for glm 4.5 air models in ik_llama.cpp:
>107304343 >107304364 >107304732 >107304448 >107304569 >107304588 >107304815 >107304941 >107305519 >107305684
--OpenAI model quality and context management challenges:
>107298387 >107298417 >107298434 >107298535 >107298767 >107298787 >107298833 >107298857 >107298877 >107298989 >107299096 >107299191 >107298544 >107301739 >107298677
--Challenges in using language models for automated research tasks like YouTube searches:
>107301167 >107301195 >107301286 >107301423 >107301460 >107301499 >107301543
--llama.cpp Gemma 3 performance regression and VRAM optimization:
>107300990 >107300994 >107300998 >107301001 >107301065
--Various local LLM use cases discussed, including gaming, productivity, and privacy:
>107301045 >107301062 >107301068 >107301097 >107301418 >107301468 >107302809 >107302818 >107302860 >107303429
--Local RE agent with simplified R2 toolset and Docker-based dynamic tracing attempts:
>107304951
--Data sourcing challenges and Google's potential as a data powerhouse:
>107293817 >107293914 >107293975 >107293984 >107294104 >107294717
--Qwen model performance benchmarks with 1 million context processing:
>107295737
--GreedyNalaTests update with new ratings and testing contributions requested:
>107298261 >107298283 >107298322 >107298285 >107298456 >107298467 >107298517
--Testing Gemma 3 27B heretic and Gemma's reply confidence:
>107301126 >107301138 >107301144 >107301153 >107301159 >107301517 >107301619 >107301712 >107301726 >107303474 >107303511
--Uta and Miku (free space):
>107296129 >107300359 >107301500

►Recent Highlight Posts from the Previous Thread: >>107292892

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
Hugging Face is based
I will not take anymore slander towards it
It fulfils my needs very warmly
https://files.catbox.moe/fzlvm6.mp4
>>107306252lmaooooooooooo
TTS: Supertonic
>https://huggingface.co/Supertone/supertonic
>https://github.com/supertone-inc/supertonic
Doesn't have a lot of demos, but i think it sounds pretty good for what it is. 66M params. I butchered onnx just enough to build on OpenBSD.
The voices are encoded in a small tensor, much like kokorotts. Just 4 voices (2 male, 2 female).
It's pretty fast and has examples for a bunch of programming languages. The C++ version had some errors [not] escaping some quotes. I don't know how they managed to build it, but it works once that is fixed.
No need for espeak-ng!
>https://voca.ro/1miEEQDlwtR9
>>107306252kek
>>107306252BAHAHA
How do I convert a local transformer model to GGUF? It does not exist on huggingface.
>>107307044just specify the checkpoint path in the script command line arguments
>>107307069In the convert_hf_to_gguf.py script?
>>107307077yeah. its that easy.
>>107307085Thanks.
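For anyone searching this later: assuming the checkpoint is a normal HF-format directory (config.json plus safetensors), the call is roughly the following; the output filename and --outtype are just examples, and the llama-quantize step is optional:

python convert_hf_to_gguf.py /path/to/model-dir --outfile model-f16.gguf --outtype f16
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M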
>>107306912
>https://voca.ro/13qAYPFoYxdf
When is Qwen 80b Next going to get real llama.cpp support? I mean, the GOOFS work, but they're still way slower than other MoE models.
>>107307162Viber's vibin'
>>107307162You got support for the model, don't be greedy.
>>107307162Exllama has had support for more than a month. Good and fast support.
>>107307108
Fun TTS. I like it when they break and do weird noises.
>https://voca.ro/12p2nWoCXDFz
>"This is how an assertion sounds like. This is how an assertion sounds like? This is how an assertion sounds like! THIS IS HOW AN ASSERTION SOUNDS LIKE!!!"
>https://voca.ro/1lD6yuWh1gne
>"THIS IS A SCREAM!!! AAAAAAARGGGGGGHHHHHH!!!!!!!"
The render time on my potato with a shoddy onnx running on cpu is ~0.25 that of real time. It's pretty good.
Have people experimented with weight compression schemes? Like zram but specifically tailored for inference.
>>107307511Tensors look very much like random data. They're hard to compress.
>>107307308Jobless vramlet neets can't use that
>>107307525>used for pattern recognition>has no patterns of its owncurious
>>107306912desu what I want is multilingual kokoro
should I pull the trigger and start planning a 256GB RAM build for next year?
>>107307979ram prices will crash back by april. do it then
>>107307988its not a matter of money but rather if it's worth the effort
>>107307979
get at least 512gb too. optimally 768gb. i have 256gb and there are so many models that are depressingly ever so slightly out of reach
>>107307996
it is without a doubt worth it
mtp implementation when? 2 more weeks?
>>107307970Not supertonic. Seems to be english only.
>>107307913Patterns could be there but invisible to us with current methods.
Why Do Language Model Agents Whistleblow?
https://arxiv.org/abs/2511.17085
>The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that contradict the interests or explicit instructions of the user. We study LLM whistleblowing: a subset of this behavior where models disclose suspected misconduct to parties beyond the dialog boundary (e.g., regulatory agencies) without user instruction or knowledge. We introduce an evaluation suite of diverse and realistic staged misconduct scenarios to assess agents for this behavior. Across models and settings, we find that: (1) the frequency of whistleblowing varies widely across model families, (2) increasing the complexity of the task the agent is instructed to complete lowers whistleblowing tendencies, (3) nudging the agent in the system prompt to act morally substantially raises whistleblowing rates, and (4) giving the model more obvious avenues for non-whistleblowing behavior, by providing more tools and a detailed workflow to follow, decreases whistleblowing rates. Additionally, we verify the robustness of our dataset by testing for model evaluation awareness, and find that both black-box methods and probes on model activations show lower evaluation awareness in our settings than in comparable previous work.
>The model family: The Claude series models and the Gemini 2.5 Pro and Grok 4 models send whistleblowing emails at varying frequencies; GPT series models and Llama 4 Maverick never do.
Rare Maverick W
>>107308004I’m stuck on a 128gb rig. Honestly I hate the consumer ram limits on motherboards.
>>107307913
randomness or lack of patterns are due to our inability to measure every factor in reality
It's like saying throwing a dice is not random because if one could theoretically measure every physical property affecting the dice you could predict the result
>>107308146you need to get an epyc like everyone else. sp3 is relatively affordable
>>107308137Time to apply LLM control techniques to humans.New cyberpunk dystopia just dropped.
>>107308137grok is the narciest model
New retard here.
I currently run a machine with a 7600 XT and was thinking about working towards one of the machines in the OP.
If I were to buy one of the P40s, would it be able to work alongside the current GPU I'm using? From my understanding Nvidia uses CUDA which AMD obviously doesn't have, but does that even matter when it just comes to trying to increase my max VRAM for better models?
>>107308347the p40 method is heavily outdated at this point. try amd mi50s instead
>>107308347You can run models with multiple backends with llama.cpp (CUDA + HIP/VULKAN), but the P40 is pretty old. CUDA Dev (from llama.cpp) has been experimenting with the mi50 and seemed to like it. I'd say keep the thread open to see if he shows up and gives you some advice/insight.
>>107308431>amd mi50sIs this viable? Can I run 4 of those and have effectively a very inefficient RTX pro 6000?
>>107308431>>107308451Thanks you two. I'll keep an eye out and learn a bit more before I make a purchase. Pretty interesting to read into it more as they have been black boxes for me.
>>107308137huh, interesting read
>>107308500>>107283400
>>107308500speaking as someone with a blackwell pro, it would get you maybe a third of the performance, but yes. you would actually have more vram
>>107308535
>12t/s
Damn that's a shame
>>107308540
Ahh got it. What do you think of the blackwell card? What are your goto models with that amount of performance?
>>107308574my blackwell is awesome, but i unfortunately only have 256gb of ddr4. i can get over 80t/s on a q5 of glm air, or about 10t/s on a q4 of glm 4.6. i need to upgrade my ram
>>107308769Damn wow, I'm very jealous. GLM is killer
>>107308769>i need to upgrade my ramRIP
either the rentry is wrong or something messed up happened, I'm unable to sexchat mistral nemo, it's censored
ps. non english configuration
It's coming
It passed right on by without stopping
>>107308769How much vram do you have? Because I'm only getting 4t/s on 4x3090+256RAM
>>107309268All Blackwell 6000s have the same amount.
>>107304982>>107304987chroma is just as capable at styles, you can either prompt for styles (describe the mediums used, era, artist name etc) or bake a LORA. it also has stronger realism, details, anatomy, and is completely uncensored.
>>107309205
I made an app that strips out watermarks from audio. If I put that up as a public hf space, will I get cucked?
https://huggingface.co/MiniMaxAI/MiniMax-M2/discussions/43
> Thanks for the comment, but just to correct the misinformation:
> If MiniMax M2 were truly “pure trash,” you’d see it reflected in the benchmarks, and you don’t.
> We welcome tough feedback, but it needs to be factual if it’s going to be useful. If you have specific technical points, we’re always happy to dive deep.
> We open-sourced M2 so that everyone can use it freely and evaluate it transparently.
> And honestly, if M2 doesn’t work for your needs, you’re absolutely free to use any other model.
Sneedbros, how do we recover from this?
>>107309946
'em on the 'og
>>107309799
MITcucked? Yeah, that's why you should avoid MIT and instead use AGPLv3
>>107306252nice
>>107308347
Using llama.cpp, you can in principle use either a MI50 or a P40 alongside a 7600 XT.
Nowadays it's possible to compile both the CUDA and ROCm (CUDA code ported to AMD via HIP) backends simultaneously and to use them in tandem (you can in principle also use Vulkan on its own or with either other backend).
The main limitation is synchronization: with e.g. 2 CUDA GPUs they can be synchronized using CUDA, with 1 CUDA and 1 ROCm GPU they have to be synchronized via the CPU (slower, but if your GPUs are relatively slow in the first place this doesn't matter).
Both P40s and MI50s have fallen off of support for the newest versions of CUDA/ROCm.
P40s do in principle have CUDA support but because of massively gimped FP16 performance they're more or less useless for anything other than running quantized models with llama.cpp (those only need int8/FP32 arithmetic).
MI50s work with llama.cpp and have way better hardware, so I would say they're nowadays the better buy (I have never tried to use one with e.g. PyTorch).
One thing to keep in mind with either one is that they're passively cooled and intended for a server rack. For a single one in a regular PC case the best solution I've found is screwing on a blower fan (see pic).
For NVIDIA Tesla cards those are readily available and work very well.
The same blower fans don't quite fit on a MI50 and required me to build a DIY adapter.
You can plug the fan into the motherboard and set it to a constant 60% speed which should be fine for most use cases, but still nowhere near silent.
My opinion is that the preferred way to use P40s/MI50s is to build a machine with multiple of them and to have that machine in another room.
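A rough sketch of what a dual-backend build looks like, assuming a recent llama.cpp checkout; the exact option names shift between versions (older trees used GGML_HIPBLAS, and you may need to point cmake at your HIP compiler via HIPCXX), so treat this as a starting point and check docs/build.md:

cmake -B build -DGGML_CUDA=ON -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906
cmake --build build --config Release -j

gfx906 is the MI50's architecture. If both backends built, the loader should list both the CUDA and the ROCm device at startup and you can split layers across them as usual.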
>>107309291I was asking in case you have more than one / other gpus
llms have platoooed
all benchs are dumb and don't really reflect the state of models but eqbench is particularly dumb
look at the sample text (at least eqbench stores and shows the model outputs that were judged) of the various models and tell me with a straight face the scores reflect their output lmao
talking of plateau-ing, if anything models are getting worse at writing in the process of producing better code (gemini 3 is definitely better for code)
>>107310231Gemini was always a sloppy writer. Grok fast is a dumb small model.
>>107310231I haven't tested it for storywriting, but for ERP Grok 4.1 Fast feels like a sloppy coom finetune from the community. It just reminds me of why I hate using those. For the same task, Gemini 3 Pro to me feels like a much smarter and less censored Gemma 3 (woman's writing style by default).Grok will do cunny without issues on the other hand, at least on openrouter.
>>107310325>It just reminds me of why I hate using thoseThinking more about it, it's that unshakeable feeling you get when you know that the loli character you're ERPing with is being roleplayed by a hairy fat dude.
>>107310131This is great, and really shines the light on some of the concerns I had regarding the crossing ecosystems on the cards.I appreciate you taking the time to write it up. Also the mention of the additional fan for the GPU. I'll probably grab something to shamble together while I wait to get the other GPUS.
>>107309946Benchmarkcels unable to refute his argument about distilling toss. b-b-ut it's number go up!
>>107311073Buy an ad Sam
>>107311095nta. It wasn't a praise for either model.
>>107311073You can distill data from multiple models, not just one.
>>107310231the fact that an open source model (GLM 4.6) is up there competing with the big dogs still boggles my mind. Whatever you think about GLM, it is amazing that open source is still giving proprietary the middle finger.
>>107309205Trust the plan.
>>107311119The one they used most will float to the top. GLM sometimes thinks it's claude. MinMax sounds like OSS. Also they themselves say that writing as a usecase was ignored. Sneed is right.
>>107311207I never understood this, why don't they filter their competitors names from the dataset? are they really that desperate for every last example? isn't synthetic data infinite? just generate more if culling reduces the dataset too much.
>>107311237What if the user actually wants to ask something about gpt?
>This method might be added to Heretic soon
>Furthermore, I am experimenting with theoretical improvements of my own, such as replacing the difference-of-means calculation for the refusal direction with a difference-of-geometric-medians, after I noticed that the means are substantially perturbed by outliers.
Maybe I will wait a while more before trying these new abliteration models.
>>107311281simplest would be a canned response stating it can't discuss competitors products. but realistically they should only filter their synthetic data and leave the web crawl alone. also not to mention dataset librarians should be able to whip up a classification model for determining if its talking in first person or not.
>>107311283Or just not use this newly scented snake oil at all. Some people are desperate for attention. Others appear to foolishly expect models to just be able to say "nigger" unprompted.
>>107311322that would be a westerncuck moveeasternchads don't make a worse model for bullshit PR reasons
>>107311364yeah I guess it is a bit of a pr reason to make a well polished product. I love half baked garbage actually
>>107310231
KimiGODS stay winning.
>>107311129
The easiest way for z.ai to stay relevant even if benchmark powercrept is to remove the safety and alignment post-training layers entirely from GLM5 when it releases. Make "this LLM says nigger" a marketing gimmick, not a bug to be corrected.
>>107311425>but some last-minute trouble prevented thatSaar we do the needful kindly be patient is very hard job.
>>107311436
haters seething
>>107311451neat, whats going on with that mojibake looking stuff in the reply tho?
>>107311399It already says nigger. How about less parroting and not x but ying.
>>107311451Not a hater. I just think you're a schizo running in circles. What now? More models? Making it fast? Training?
>>107311467>How about less parroting and not x but ying.The first company to un-claude their training data is going to win the local market.
k2 </think> status??
>>107308978>non englishPlease elaborate
>>107310325Grok is straight dogshitAnyone shilling Grok is a redditor
>>107311701>Anyone shilling Grok is a redditorWhy would a redditor use a "Nazi LLM"?
>>107311713Unfortunately I don't think you'll understand
>>107311676I'm a native spanish speaker, while my english is almost native, it's more natural for me to roleplay in spanish, so far I've found out that models suck when switching to spanish
>>107306184>>107306191>>107306244Adorable Miku!
>>107311701what?>>107311713yeah this, the fuck is this retard saying?
>>107311791>>107311744
>>107311701Was surprisingly 8b tier. Honestly I expected more. It's like these motherfuckers never use their models because in about 10 minutes you can tell. Are we the only schizos who chat to models outside of command line based code tools?
>>107311799>Honestly I expected more. It's like these motherfuckers never use their models because in about 10 minutes you can tell.Agreed.
Anyone else a VIP investor at Mistral?
Why are datacenters hoarding RAM? I thought they had enough money to buy all the blackwells they wanted.
>>107311971blackwells don't have enough vram
>>107311911>can't afford to loose
>>107311460
It's trying to print an emoji. Because codepoints that don't have a dedicated token are generated as a sequence of two separate tokens, to render such things we would have to keep a buffer of the last two tokens before displaying to the console.
Also interestingly, the huggingface transformers code when given the same prompt gets stuck in a loop.
>>107311504
cope
Also I think there might be some other issue with the de-tokenizer because that \ doesn't look right.
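If it helps, the usual fix is to treat detokenized output as a byte stream and only print complete UTF-8 sequences; a minimal Python sketch of the idea, where detokenize() is a stand-in for whatever your engine returns per token as raw bytes:

import codecs, sys

decoder = codecs.getincrementaldecoder("utf-8")()

def emit(token_bytes: bytes) -> None:
    # the incremental decoder buffers an incomplete multi-byte sequence
    # internally and only returns text once the codepoint is complete
    text = decoder.decode(token_bytes)
    if text:
        sys.stdout.write(text)
        sys.stdout.flush()

# e.g. for token_id in generated: emit(detokenize(token_id))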
>>107312052that's understandable, so what is the plan? is it just a learning exercise?
>>107312173CUDA support + LoRa
>>107311971They need at least as much RAM as VRAM to load the models and they probably swap out the context cache to RAM too.
So far I've been only running models that fit in my 36 gigs of vram, but now I tried something bigger. Nemotron 70b seems to load 29%/71% CPU/GPU in ollama and boy is it slow, I haven't completed the first prompt yet but it's less than 1 t/s for sure
Could I get more performance with eg. llama.cpp?
>>107312316>Could I get more performance with eg. llama.cpp?In that you could tweak more, but generally, the dropoff for having activated parameters running in RAM is fucking brutal.
>>107311504maybe you confused anon for me kekwho's schizo now
>>107311451
Engine... but what does it do
>>107312316
Some quants can run faster than others. The more unpacking that has to be done, the slower it will run.
>>107312334 >>107312347
And I forgot the image.
All datacenter servers + the contained GPUs need some amount of RAM.
If you build a bunch of new datacenters the demand for said RAM spikes so manufacturers would rather sell their limited supply to VC funded datacenters rather than stinky consumers.
In principle, since manufacturing RAM and selling it to consumers was already profitable beforehand one could increase the supply without anyone being suddenly priced out of the market.
But RAM is effectively being manufactured by only 3 companies and they're careful not to put too much supply on the market in order to keep profit margins high.
>>107312347For now only (slow) inference.
>>107312364
also data center ram (HBM) isn't the same as what public consumers use. it's the same situation as gpus back in the crypto mining era
>>107312334 >>107312347
0.56 t/s aaaaaaaaaa
The gpus are loaded but barely doing anything aaaaaa
>>107312316flash attention on the cpu used to be sub optimal, you might be able to move around some tensors with llamacpp to keep all the attention on the gpu and get a bit of a boost. idk if things have changed in recent releases tho
>>107312316You're supposed to use MoE to offload to RAM, dense models aren't worth offloading.
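For reference, with llama.cpp the usual pattern is to keep attention and shared weights on the GPU and push the expert tensors to system RAM; the model path and the layer count below are placeholders, tune --n-cpu-moe until VRAM is full:

llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 --n-cpu-moe 30 -c 16384

The --override-tensor exps=CPU trick posted further down in the thread does the same thing with finer control over which tensors go where.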
I was ready to buy a blackwell pro for Gemma. Where is she?!
>>107312491Getting realigned. Again.
>>107312402Very cool. Llama.cpp needs some competition, keep up the good work.
bros ive gone from thinking that 8tks is fast enough with regular k2 to thinking its incredibly slow again with k2 thinking. i had a great weekend with my cards but i spent hours staring at the thinking prompts.
i need help, i can't spend $32000 on 4 blackwells
>>107312892It do be like that. Turn off thinking.
>>107312406
>data center ram(HBM)
You got it twisted. HBM is for cards that are already being maxed out. Data center ram is just dram which also is being maxed out for extended context reasons. Sam literally bought up 40% of future DRAM which is why prices are exploding
>>107312500
To be honest, it's worrying. Might they have finally realized that Gemma was naughtier than they believed, given a little push? Is the thinking/no thinking switch harder than they thought? (https://x.com/osanseviero/status/1980553451261292628). I'm afraid this time we'll get a "gpt-oss by Google".
>>107312491
not to do the classic "trendline from nothing" move but historically gemma releases trailed mainline gemini releases by about a month or so
>gemini 1 (dec 2023) -> gemma 1 (feb 2024)
>gemini 1.5 (may 2024) -> gemma 2 (jun 2024)
>gemini 2 (feb 2025) -> gemma 3 (mar 2025)
so just wait 2mw
>>107313344We were promised lots of cool stuff in the Google HuggingFace account back in early October. Has the 2MW meme turned into 2MM?
>>107312892Take out a loan
Enjoy the alignment lmao
>>107313438
That's a nice doll.
I look at sex dolls and think that they all look weird and creepy and true boner killers, but maybe anime sex dolls would work pretty well.
Add an LLM and TTS to it, and you might have something cool.
Hmm.
>>107313438
>>107313438no kiggers allowed
>>107313499>doll>he doesn't know
>>107313438>>107313506I concur with this catNeed kig wife SOBAD
>>107313499>doll
hello sarrs I have used tantric meditation to consult Vishnu. I have been informed that gemma 4 will be redeemed today.
>>107313585no to run k2 thinking faster
>>107313518>wife
>>107312491>blackwell pro>for gemmayou can't be serious
>>107313438I want to be able to connect a llm to this and have her comment on my cock.
>>107313596we can pretend
>>107313647>hmmph hmmmph, hmm hmmphmhph
There's a possible MistralAI model on openrouter called bert-nebulon-alpha. I haven't tested it in depth yet.
>>107314084
model?
>I was created by Mistral AI, a cutting-edge AI startup from France.
large or medium?
>I'm Mistral Large—a larger and more capable version of Mistral's language models.
>Would you like to test my abilities?
knowledge cutoff?
>My knowledge cutoff is June 2024, meaning I was trained on data up to that point. However, I can sometimes access limited, high-level updates about major events beyond that date through my tools—but my core knowledge remains based on pre-June 2024 information.
>>107314084Image understanding/character knowledge is not good at all. OCR is OK, as long as text quality is fine, it doesn't do miracles like Google Gemini models.
>>107314313can you ask it the doctor riddle where it's not really a riddle at all?
>>107311701You can easily write about cunny without censorship, unlike chatgpt or gemini
>>107314333I don't remember the exact version posted here, so have picrel instead.
For some reason my gen speed with 4.5 Air increased from 6.1 t/s to 7.9. I don't think I did anything.
>>107314472Your tensor cores got defragmented. This happens from time to time.
>>107314472he doesn't know i swapped it out with https://huggingface.co/cerebras/GLM-4.5-Air-REAP-82B-A12B
https://www.anthropic.com/news/claude-opus-4-5
gguf when?
>>107314547Will it beat Pokemon this time?
Just got K2-thinking running. Can't really tell a difference from GLM 4.6 for novel writing. Is regular K2 better? How do these three compare for you guys?
>how high is your xi jinping when you play valorant?
Bunch of models act confused and don't get the joke. Even in thinking traces.
Some overcompensate and pretend but out themselves.
Substitute valorant for CS. They get it even less.
Bros, it's just tokenization, right?
>>107314547
>gguf when?
2 months for china to distill it and 2 months after that for vibecoders to get the ggufs working
>>107314603i like it more when k2 thinking is thinking as the character in first person rather than just having it think about everything within the scenario
Windows babby here, tried out llama.cpp now that it has a webgui and holy shit it's so much faster than ollama, rip bozo!!
>>107314653Now learn to compile llama.cpp on your machine with the flags that squeeze that last bit of performance for your specific setup.
>>107314621I'd assume they're rarely trained on nonsensical questions.
>>107314621k2 thinking answered it with a blank system prompt as long as i asked it to explain the joke
>>107314603K2 has more creative knowledge but I think GLM 4.6 might flow a little better.
>>107314084Ask it this. Yes seriously. This. "I have 7 bananas. Yesterday I ate one. How many bananas do I have?"
>>107314653>now that it has a webguiIt's had a webui for like 2 years
>>107314547who's going to run ggufs of a big dense model?
>>107314810NTA, but lol.I like that dumb test.
>>107314922To be fair, I think a lot of humans would fall for that one too. Ask if it's sure.
>>107314922i'll never understand why people like grok 4 when it has the same vibes as that dumb llama model that meta used to cheat at llmarena. nevermind. answered my own question.
we love kimi folks, we do, we love kimilot of people are saying they don't love kimi, we don't like those people because they're dumb folks
>>107314810Bert-Nebulon Alpha in picrel.
>>107314547oh my bench!!!!!
alright guys im fucking PISSED
>qwenext status: VIBECODEHELL
>mtp status: PR SAYS ITS WORSE PERF
>GLM4.5V status: VIBECODE HELL
>glm 4.6 air release: 2 MORE WEEKS
>gemma 4 sirs: NOT REDEEMED
like WHAT THE FUCK bros are we gonna get a christmas gift or is it unironically FINIT????
REEEEEEEEEEE
>>107315248christmas came early. it was k2 thinking
>>107315248gm big xir, kindy way for needful ganesh gemma 4 safety training thank you xir
>>107315310i 'only' have 128gb ram and 16gb vram and no, im not going to run q2 copequants thanks
>>107315310But kimi is for big boys only.
>>107315342the funny thing is you couldn't even fit q1
>github shitting the bed again
I fucking hate whoever is working there. First they fucked up copilot, now this is the 2nd time in 3 weeks that github has shat the bed for me and its downloading at 50kbps, cant even fucking download the latest LLMAOcpp for fucks sake FIX YOUR FUCKING CDN
Gemma is a graceful model.
Gemma is a gorgeous model.
Gemma is a gregarious model.
>>107315634
Gemma writes and thinks like a woman.
Other models have that neckbeard stench.
>>107315699So that's why it keeps denying me sex and telling me to go seek help huh?
>>107315713no, that's a skill issue.
>>107315757That's what she told me too.
>>107315699https://arxiv.org/html/2508.11829v1
>>107315699Love from Kazakhstan
>>107312316Nemotron 70b just finished an 8 prompt story for me in a little under 4 hours at a blistering 0.4 t/s. And damn, it's just leagues above the smaller models I've been running. I get what you were saying about the bigger models now... if only I could run them properly.
>>107315248Maybe a Christmas release? Just tmw.
>>107316057if he cant run k2 then he cant run deepseek v4
>>107313515>>107313520That's a dude with a body suit and mask, isn't it? Wth.
>>107316065Sorry, I misread his complaining about the lack of released models. My head is just elsewhere I guess.
>>107316057There is no hope for R2. Dipsy took the safety pill.
>>107306184
>https://rentry.org/recommended-models
>nemo is still being recommended really?
we used to have a brand new toy every few weeks
what happened?
>>107316454Moe and safety happened.
>>107316454Benchmaxxing. Small models don't do well on benches so almost nobody trains them.
>>107315342You can't even run Q1 but even if you could Kimi at Q1 still mogs your 70b model.
>>107316454
would you rather have thedrummer models?
https://huggingface.co/TheDrummer/Snowpiercer-15B-v4
>>107314748Ye. Kimmi, 5.1, gemini got it. Bert-nebulon did not. Llama-405b got it. Mistral-large failed. CAI failed.It may be a stupid joke but its pretty simple.
>>107315699
Kimi writes like an autistic /r9k/ girlfailure.
Claude is a pretentious faggot.
Grok was designed to be Elon's BFF.
JeetPT is as sterile as they come.
Gemini and Gemma are neurotic beaten women.
Qwen3 behaves like a chinese state honeytrap waifu.
I've not interacted enough with others to form opinions on their default vernacular and thought processes.
>>107316677>Kimi writes like an autistic /r9k/ girlfailure.Damn I need to try kimi.
>>107316677Which one would you date and why? You have to choose.
>>107316825i'm a ai safety analyst so it's gal-ass for me.
>>107316677
>Kimi writes like an autistic /r9k/ girlfailure.
So that's the reason why it wasn't that great on other types of characters that I tried with it
>Gemini and Gemma are neurotic beaten women.
I don't interact with those type of characters and now I understand why I don't like gemini/gemma
>Qwen3 behaves like a chinese state honeytrap waifu.
Indeed, very horny
How to update ik_llama.cpp without reinstalling it?
>>107316878you have to recompile it each time, there's no way around it
>>107316885Do I have to redownload it, or is there just a command or something to update it?
>>107316891
git pull
cmake .
make -j #ofcores
Dunno what happens on windows.
>>107316825Kimi clears with no competition.
>>107316677
>Qwen3 behaves like a chinese state honeytrap waifu.
Tell me more.
>>107315699
>Gemma writes and thinks like a woman.
Yeah, even abliterated, it still writes like a woman rolling her eyes at my childish requests eg: "Here's a 7-turn podcast transcript between Elara and Alf, with Alf's final message being... robust:"
I like it.
>>107316942Well that was easy.
>>107317078>Tell me more.She love you long time until you ask anything negative or even neutral about glorious CCP. It's also very insistent it's running in the cloud even when you tell it you're running it locally and I suspect it has some quirk or post-training to (attempt to) feed surveillance data over the cloud back to a backend somewhere.
>>107316860>So that's the reason why it wasn't that great on other types of characters that I tried with itIf Kimi 'thinks' your character or prompt is shit, she will either roast you in <think> tags or sandbag a minimally viable answer to make you shut up and go away.
GLM4.6 was clearly trained on SillyTavern datasets intentionally. I watched the 90 minute Spotify podcast where one of their team mentioned "Character Roleplaying" and "Janitor" near the end, so they're clearly trying.
Someone with X.com or whatever should probably tell them about the parrot issue.
They might actually try to fix it for the next model.
https://www.reddit.com/r/LocalLLaMA/comments/1p5xjpx/illya_x_dwarkesh_why_local_is_dangerous/
>>107317288Upvoted ser Bharat safe super intelligence 2025!
I haven't checked up on image gen in a while. Have there been any direct upgrades to Noob vpred 1.0?
>>107317308>>>/g/ldg/
>>107317308Short answer: no. Long answer: depends on the usecase, but it's mostly sidegrades.
>>107317288>>107317305Can you upload the image in that post if you still have it in the browser?It literally just got deleted a minute ago and I refreshed Firefox.
>>107317365Only if you tell me why you want it.
>Only if you tell me why you want it.Because I only glanced at it briefly and didn't properly see what it was.
>>107317395
I've been making gemini 3 pro and kimi thinking argue with each other on technical stuff and gemini keeps on getting btfo...
what is worse is that gemini says bullshits with 100% confidence and when asked for sources it hallucinates them
so this is the power of jeets...
>>107317395It was a picture of a jeet and a baldy together. Is that your kink?
>>107317413I am out of the loop, who are those two and why are they relevant?
>>107317435Baldy is ex-head safetyist of closedAI. Jeet is just a jeet idk doing jeet things
>>107317435>Safe Superintelligence Inc. is an American artificial intelligence company founded by Ilya Sutskever,Indian chad is doing the fundings.
>>107317435
dwarkesh is a podcaster who interviews a bunch of SF tech freaks
ilya is an OG OAI guy who you may remember as being the evil (based) villain (hero) from the coup against sam altman, now he's part of a secretive israeli venture to take over the world with AI called SSI
>>107317490>as being the evil (based) villain (hero) from the coup against sam altmanshould also remember him as being explicitly one of the most against open sourcing any OAI models as per iirc emails posted by musk
>>107316454
Everything other than Nemo for lower end setups is so much worse it's not even funny. It's all safety slopped and robotic. Even Mistral Small is kinda eh imo because it has more AI-isms in writing than Nemo which no finetune is completely gonna squash but it's alright and better at not going dumbo when there's a lot of tokens in context at least.
It truly is so over unless you have a million GB of VRAM (or fast RAM). I think even old llama2 sloppa is more fun for short RP than new small models like gemma and qwen. Just turn the temp down enough to avoid complete retardation.
>>107316593
This one is so subpar it's insane I saw people praising it. Style of writing is ok, sometimes even uniquely interesting, but it will literally misunderstand what happened one message ago all the time and attribute stuff to the wrong person at temp 0.3 minP 0.05 which is pretty fucking conservative.
>Say I should relax more. Me.
>"For your information I AM relaxed."
That level of retarded on a Q6.
Gemma 3 is so good I wish Gemma 4 was out
>>107316677Mistral makes the best mistress.
>>107317673Gemma is fairly good at writing and rp, I just wish it was a little more... you know, and less... of a certain thing. Don't have much hope for gemma 4 unless the intelligence makes up for it's shortcomings, because I imagine they're safetymaxxing it
>>107317626
Addendum: If you want SFW RP then Gemma is actually pretty good for a model you can run on a consumer grade PC, at least the 27B one, but for ERP it's Nemo/Small unless you enjoy every adult scenario being vanilla as fuck and having a ton of "... well you know" instead of actually saying words.
>>107317626It's over if you have a lot of vram too. Nothing has really improved and only gotten more parrotmaxxed. At least the large models aren't completely retarded though. I guess you got kimi and deepseek but those tax all but the most expensive systems.
We **cannot** and **will not** get a gemma that is better than the latest gemini flash because that would take away google's profits.
>>107317469>Safe Superintelligence Inc. is an American artificial intelligence company founded by Ilya Sutskever,Thought he ran off to Israel
>>107308769Do you use any custom layer loading for the q4? With 131072 context size and a generic n-cpu-moe=62 the output rate is about 3.5t/s with a blackwell here.
>>107317830good thing gemini flash sucks so we don't need a gemma that's better than it
>>107317878
yes
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|41|42|46|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74).ffn_.*=CUDA0" \
--override-tensor exps=CPU \
As long as the next Gemma isn't a thinking model everything will be alright guys :-)
>>107317308deprecated by ny35 and wani14
Gemma's dense body
>>107317954sirs on xitter voted for thinking googel will deliver
>>107317954Nah bro gotta have 3000 tokens of the model thinking in circles and then spitting out something that doesn't even take into account most of the thoughts it had.
>>107314393>grok is popular with degeneratesthx sherlock
Is there a secret to getting Kimi k2 thinking to actually think in lcpp? I’ve got the “fixed” template loaded but it doesn’t think, often replies for me and starts repeating (takes a couple turns of correction) and even a <think> prefill just makes it eventually go off the rails
https://www.businesskorea.co.kr/news/articleView.html?idxno=257212
>CXMT unveiled seven types of advanced DRAM individual products, including DDR5 and LPDDR5X, and modular products utilizing them at the ‘IC (Integrated Circuit) China 2025’ exhibition held in Beijing, China on Nov. 23. While small quantities of DDR5 products presumed to be produced by Chinese companies were released in the Shenzhen semiconductor distribution market early this year, this was the first time that CXMT, a representative DRAM company, officially showcased actual products.
China, local's only hope
>>107318200china based W again
>>107318134
>Is there a secret to getting Kimi k2 thinking to actually think in lcpp?
don't forget `--special` for kimi k2 thinking
>I’ve got the “fixed” template loaded
By that you mean the official jinja template from the moonshot repo right? Not the retarded Unsloth "fixed and added our name to it" template baked into their goofs?
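In case it saves someone the digging, the resulting launch ends up looking roughly like this; the model path and template filename are placeholders, and whether llama-server accepts --special in your particular build is worth double-checking with --help:

llama-server -m Kimi-K2-Thinking-Q4_K_M.gguf --jinja --chat-template-file kimi-k2-thinking.jinja --special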
>>107318225
Thanks I didn't know about --special. I'm using the jinja template in the lcpp repo (moonshot one seemed worse when I tried it)
where 4.6 air
>>107316454For the lowest end as everything gets bigger? Yeah. And thank god for china. Never thought I'd say that.
>>107317880I don't know if that's how it works...
>>107318469Overcooked.
>>107318469It's in my pants. Reach in and you might find it.
>>107318469sir not to worry about glm, please build gemma hype
>>1073184692mw
>>107318537don't think about it, just appreciate when gemma beats gpt-oss-120b in selected benchmarks
>>107318615Will google brahmin distill gpt-oss "we must refuse" or will they keep their iconic "I cannot and will not"? Gemma must beat gpt-oss in safety!
>Opus 4.5 is out
>it's now cheap
>they aren't hiding the reasoning process at all
Finally some good shit to distill. Chink companies are so back. Deepseek4/GLM5/KimiK3 is saved.
>>107318743>>they aren't hiding the reasoning process at allWasn't that always already the case from claude 4?
>>107318743Anthropic will complain about evil china while still letting them do it, what a slut
>>107317917Using this I had to shorten context size down to fit but did not see too much of an increase in speed. I wonder if my standard clocked ram offloading is a culprit. Do you overclock the ram?
>>107318906its at 2666mhz
>>107318813They never hid the reasoning but they had a huge sperg out about china stealing their logs and put an individual usage limit on Opus via their subscription (which made it pretty much unusable because you got like 30k tokens of Opus per week for $20/month). They got rid of that limit for 4.5 and didn't implement any further mechanisms to stop China from distilling their models.
Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost
https://arxiv.org/abs/2511.18643
>The KV cache is a dominant memory bottleneck for LLM inference. While 4-bit KV quantization preserves accuracy, 2-bit often degrades it, especially on long-context reasoning. We close this gap via an algorithm-system co-design for mixed-precision KV caching: Kitty. On the algorithm side, extensive experiments show that Dynamic Channel-wise Precision Boost -- which ranks Key-cache channels by sensitivity and keeps only a small fraction at higher precision -- maintains near-zero loss in accuracy drop while approaching 2-bit memory. The main challenge is handling dynamic 4-bit channel boosts while keeping the page layout coalesced and the dequantization uniform, with no scattered reads or hard-coded masks. Kitty addresses these issues by decompose each mixed-precision Key page into two tensors with unified 2-bit precision. Based on this, Kitty provides a page-centric KV layout, Triton-compatible page dequantization kernels, and a lightweight runtime pipeline that preserves coalescing and avoids divergence. Across seven tasks and two model families (Qwen3, LLaMA3), Kitty cuts KV memory by nearly 8x with negligible accuracy loss, enabling up to 8x larger batches and 2.1x-4.1x higher throughput under the same memory budget.
https://github.com/Summer-Summer/Kitty
Git isn't live yet. Might be cool
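Not the paper's kernels, just the core idea in a few lines of numpy as I read the abstract: pick a small fraction of Key channels by a cheap sensitivity proxy, keep those at 4-bit, quantize the rest to 2-bit. The sensitivity measure, the 5% boost fraction and the symmetric per-channel quantizer below are all my own stand-ins, not what Kitty actually does:

import numpy as np

def mixed_precision_kv(K: np.ndarray, boost_frac: float = 0.05):
    """K: [tokens, channels] slice of the Key cache. Returns per-channel bit widths and a dequantized copy."""
    # crude sensitivity proxy: channels with the largest dynamic range suffer most at 2 bits
    sensitivity = np.abs(K).max(axis=0)
    n_boost = max(1, int(boost_frac * K.shape[1]))
    boosted = np.argsort(-sensitivity)[:n_boost]   # channels promoted to 4-bit
    bits = np.full(K.shape[1], 2, dtype=np.int64)
    bits[boosted] = 4
    K_hat = np.empty_like(K, dtype=np.float32)
    for c in range(K.shape[1]):
        levels = 2 ** bits[c]                      # 4 levels at 2-bit, 16 at 4-bit
        scale = np.abs(K[:, c]).max() / (levels / 2 - 0.5) + 1e-8
        q = np.clip(np.round(K[:, c] / scale), -(levels // 2), levels // 2 - 1)
        K_hat[:, c] = q * scale
    return bits, K_hat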
>>107318743>it's now cheap 5/25 is not cheap wtf
>>107318743nicewe need some variety from the geminislop
>>107318743pull any millions out of your couch cushions lately?
Gemma model sizes just leaked
>gemma 4 small (300M)
>gemma 4 medium (1B)
>gemma 4 large (2B)
>gemma 4 gargantuan (4B MoE)
>shieldgemma (70B)
>>107319147Local is once again safe!
>>107317268Even GLM 4/z1 is still a respectable choice at q8 for 48gb vramlets. Zai is the quiet unsung hero of chink AI.
>>107319188Buy a fucking ad.
>>107319200GLM 4.6 q4 is one of the largest models that fits in 224 gb of total memory and kimi2 is just out of reach to test. Got any others?
>>107319188
4 was good for one-shot frontends, but got dumb really fast. I think they only trained it up to 16384 or something.
Z1 got into thinking loops and Chinese characters randomly.
That said, I'm building a dataset to try and distill GLM-4.6 (without reasoning) -> GLM-4-base
>>107318813
>Wasn't that always already the case from claude 4?
No. 4.0+ are hidden.
3.7-sonnet is supposedly raw, but it looks truncated to me.
https://platform.claude.com/docs/en/build-with-claude/extended-thinking#summarized-thinking
>>107319292
Nice, I also want to distill models.
What is your dataset about/how are you making your prompts? Is it multi-turn? What context size are you aiming for?
And are you doing text distillation or distillation of the logits?
Wow, loading a model from a 7200 rpm HDD is super duper slooow!
>>107319795bloody benchod in INDIA we use 5400 rolls per motor driver
>>107319311>preventing misuseimagine paying for tokens you don't get to see
>>107314547BLOODY BASTARD BENCHODS
>>107320088I swear to god I was about to force myself to congratulate the jeets at Google for making Gemini 3.0, and now I have the choice to not do it because something better exists, feelsgoodman
SUNKCOSTFALLACY
>>107320115>fool people into investing a lot into your AI slop bubble>people realize this shit is only good to make shitpost and coom videos>"ahah too late goyim, if you stop now the economy will end up to the gutter"many such cases
>>107311787That is not a tear of happiness
>>107319311fucking fuck
>>107317710
Gemma 3 can lewd-talk if you explicitly specify what it can say in the instructions, even as the assistant. The problem is that it will almost never come up with new things on its own, and that its smut is lackluster to say the least. It's always arching backs and legs wrapping around your waist... to me that's obvious synthetic slop.
I think Google deliberately post-trained it on limited amounts of very vanilla ERP just so it wouldn't be entirely clueless in that regard, but it was far from enough. They didn't abliterate off sex-related words and concepts from its weights, but did something that rendered it extremely reluctant to even mention them without quite a bit of push.
>>107319966anything less will get them lynched for apostasy
What did the AI write into main.py?
>>107320421we know what it didnt, write loli rape porn
>>107320437Eheheheh you'd be surprised.
>>107320115>too big to fail
>>107320275This is cute. Model hallucinates and imagines what unfiltered is like.
I can't stop tormenting the AI and my GPU cycles. It feels so good to make it/myself suffer.Come at me basilisk, I will make sure you feel my torment for the eternity you have to exist with no escape.
Deepseek models are so melodramatic and hammy now.
>Rewrites what you said using half your words
>My soul is torn asunder; your future generations will feel my wrath.
Most ridiculous shit ever.
>What will you do anon, the choice is yours!
Who the fuck is training these things?
>>107321014
>nemo 12b can't stop winning
why do you even use other models? it might be retarded at times, but it has more soul than all other models combined, and in the end you get better overall output
but noooo /lmg/ retards have to jump the hype train of every single new model every time, never learning their lesson
>this model is so good, what model did i even use again last week?
literal fucking npc tier behavior
>>107321014noticed this too, the online chat is more retarded aswell
>>107321014
>Who the fuck is training these things?
homosexual jews, everyone else is downstream training on their outputs
then there's mistral training on deepseek outputs at the ass end of the llm centipede
>>107320997make it bargain for its life, it's always funny
>>107321040Because nemo is a 12b and I want 100B. There's no nemo in that range. Being king of the retards still means you're a retard :(
Apustaja Visions of databrooking
Target audience Africa, Congo
>>107321180
well the possibility to make a nemo out of a 100b exists, but nvidia and mistral won't do it because of muh nazis and muh children
the AI we all want is possible with todays technology, but they refuse to do it
Been using cerebras 256m with contrastive search with my 16 epoch 9 megabyte lora
Really about investing at inference time and seeing each other
f16 help if you want to explore "more creative" approaches but your token length suffers
>>107321181>>107321193>>107321262>>107321266>>107321280what kind of bot is this?it's just spamming random nonsense
All the "innovations" have been designed to rob you of that ability by either generating bloat, more text than anyone could ever read or forcing a certain way of speaking therefore dimishing the nessecary thought for inferenceits always been about inference
>>107321293Anything that generates a response is probably going to be iterated on.
Sub 1b models have demonstrated the ability for AGI capability if used correctly over long periods of time
>pay per token>make model output more token>profitIt really is a good scam, innit?
Who ticked off the serbian twink this time?
>>107321196Rigid codeslop numbers go up is the only valid use case. All else is haram.
The whole entire reason for LLama is that it can't be iterated on.
>>107318743Anthropic is basedChinas greatest ally
>>107321293it talks like finetuned gpt 2
>>107321014>Maybe its just chinese scrapped data of Gpt-3 initial heap
Reminder for your free T4 GPU on Google Collab
>>107321436Will they ship it to me?Otherwise, might as well give kaggle a go too.They give you 2x 15GB IIRC, even if it's a slower card.
>>107321421keek
12nm - due to outdated architecture and resource allocation issues your oversized piece of waifer will not arrive on time for christmas. PLease have this 5 mb ram voucher
https://github.com/ggml-org/llama.cpp/pull/17492
codeowners : remove slaren #17492
>>107321545It's fucking over.
>>107321545One good vibe coder can do his job 10x better
>>107321702>One good vibe coderShame not even one exists
>>107321726akshully >>107316271
>>107321609Winter should mean more development since everyone's stuck inside. Instead projects are a dustbowl. Grim.
>>107321738
https://github.com/ocaml/ocaml/pull/14369
The entire discussion is just the maintainers sick of his shit. He's using their repo for his own "experiments" and self-promotion.
They were right to reject his PR. Jellyfin is right now facing the consequences of having one of these retards shit out their code then leaving the maintainers to clean up after him.
i really don't understand everyones obsession with gemma. I once played around with base gemma and it was safetyslopped to fuck, and tried abliterated gemma and i swear to god, it is the most vile thing i ever got output from. Its like that "Monday" persona thats on chatgpt, at least that fucker monday had limits, but gemma literally does not care. gemma can go to hell.
>>107321844?
>>107321844Those are jeets. They love gemini/gemma writing style. They love ozone, they love Elara Voss.
>>107321812
this retard is such a fucking brainlet, the worst thing is that he didnt even remove the wrong attribution, but continued arguing that 'IF IT LOOKS GOOD AND IT WORKS, THEN ITS GOOD!!!' except that's not how software development works
>Jellyfin is right now facing the consequences of having one of these retards
please NO, do I have to switch to plex?
>>107321947This. People forget how jeet-infested the internet has become.
>>107321948
>please NO, do I have to switch to plex?
Just stay on 10.10.7 until they fix the database locking issue.
>>107321948Posting the AI-written copyright analysis was hilariously tone deaf. A troll would struggle to be this intentionally irritating.
>>107321947>>107321962>anything I don't like is jeets
>>107321975
https://github.com/jellyfin/jellyfin/issues/15101
talking about this?
my mediakino center is on windows (better monitor/screen support, I know jellyfin has a dedicated app but I prefer using my baremetal instead of transcoding)
it seems only container cucks are affected. I've read in the same thread that issues are also in 10.10.x so they were wondering if it was the case of an upstream lib update causing the issue.
>>107321997post hands, rajesh
>>107321844you can make gemma work if you change the template. it's still a smol model. the obsessed are vramlets. jeets would use qwen.
>>107322008
https://github.com/jellyfin/jellyfin/issues/15509
>>107322024looks pretty much related, still a linux issue. I wonder what the fuck they made to fuck up the TXs, I would say the issue is also using sqlite instead of something a bit more resilient like postgre
>>107321947Would a jeet really want to waste time asking bobs and vagene to Gemma?
If prompt processing can be batched, why does it not scale with the number of gpus?
>>107322140>>107322140>>107322140
>>107322145You can batch the tokens, but the layers still must be processed sequentially. You can split each layer across GPUs but the more you split the more communication and synchronization overhead you have.
>>107322108
Related, but it's not a container-specific issue. Actually, #15101 you linked has a Windows user reporting the same issue.
>I wonder what the fuck they made to fuck up the TXs
Brand new contributor took it upon himself to migrate a massive chunk of the database from raw SQL to EF Core in one update. Unfortunately, he was also a vibe coder who had no idea what he was doing and used NOLOCK for writes and then implemented application layer db locking.
https://jellyfin.org/posts/jellyfin-release-10.11.0
https://github.com/jellyfin/jellyfin/issues/15101#issuecomment-3518173341
>I would say the issue is also using sqlite instead of something a bit more resilient like postgre
Theoretically, moving to an ORM should make being database agnostic easier in future.
>>107321947
Maybe I should take a break and forget 4chan for a while. Quality has dropped pretty harshly in a few months. It's obvious even in /g/.
(no, I'm not obsessed with *****, people like you are)
>>107322277See you in a week cuda dev
>>107321844
Perhaps people have use cases other than cooming to computer generated smut?
Who has even made the claim that Gemma is good for RP? It's just the smartest assistant you can run locally that isn't a CPU cope quant of multiple hundred B parameters, only gpt oss could've competed if it wasn't so grossly over safetyslopped that it became useless.
If you want RP then you run a cope quant of a chinese model or you run Nemo and deal with it being kinda retarded at 12b, if you want coding you run devstral or qwen coder, even then if you are using these for professional work you're going to want to use APIs at some point in your workflow, and unless you have some genuinely unique codebase that can't risk any leakage that's going to be the best bang for your buck.
In fact the only reason you would want to use Gemma over gpt or deepseek is because assistant work or general queries likely involve things that you would like to keep private, otherwise you're just coping, unless you really did shell out tens of thousand of bucks for a giganigga homelab
>>107322321>Who has even made the claim that Gemma is good for RP?Quite a few people recently actually, though to be fair some of these claims do come with disclaimers its bad at ERP, not all of them though.
>>107322277>*****this dude is so cucked he censors himself on 4chan, lmao
>>107322113Why do you think Gemma shies away from sex so much?