/g/ - Technology




File: muah.jpg (1.08 MB, 3840x2160)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106931567 & >>106919198

►News
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1737045152592995.png (104 KB, 271x238)
►Recent Highlights from the Previous Thread: >>106931567

--Managing LLM roleplay output length and context constraints:
>106934774 >106934844 >106934851 >106934875 >106935020 >106935091 >106935187 >106934923 >106934934 >106935071
--Controlling LLM response length in SillyTavern via prompt and token settings:
>106939598 >106939607 >106939667 >106939712 >106939716 >106939755 >106939775 >106939833 >106939722 >106939846 >106939920
--RTX 3090 VRAM optimization for GLM 4.5 Air Q4_K_M:
>106936224 >106936236 >106936250 >106936242 >106936252 >106936255 >106936256 >106936265 >106936272 >106936281 >106936299 >106936298 >106936318 >106936345 >106936358 >106939535 >106939599 >106939630 >106939645 >106939656
--AMD MI50 GPU support challenges on Windows:
>106932749 >106932759 >106932767 >106933044 >106933269
--Skepticism about IQ quants' perplexity claims vs memory footprint:
>106932259
--Huggingface storage issues and exploration of specialized models:
>106931969 >106932013 >106932065 >106933257 >106932577 >106932603 >106932616 >106932631 >106932638 >106932642 >106932672 >106932593 >106933086
--Troubleshooting llama-bench.exe model loading and VRAM utilization:
>106933415 >106933561 >106933766 >106933782 >106933802 >106933834
--llama.cpp PR adds auto GPU memory optimization:
>106931647
--LoRA compatibility and quantization challenges in multi-GPU environments:
>106932458 >106936985
--lora-scaled works in llama.cpp with CUDA but has AMD/Vulkan issues:
>106937526
--Evaluating hybrid LLM setup for long-form roleplaying:
>106934305 >106934341 >106934400
--Optimizing DeepSeek V3.1 Terminus sampler settings for consistent output:
>106937102 >106938287
--Quantized model sharing and prompt testing guidance:
>106940264 >106940288 >106940307 >106940742 >106940768
--Miku (free space):
>106932492 >106935530 >106936320 >106936322 >106936336 >106936523 >106940713

►Recent Highlight Posts from the Previous Thread: >>106931573

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
zutt-7b when?
>>
>>106940821
>>106940768
I used link rel as a completion test:

files.catbox.moe/q768fb.txt

And used the following command via llama.cpp

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf -f Nala-Test_Gemma2.txt
>>
>>106940883
>>106940864
>>106940768
>>106940742
>>106940307
>>106940264

Redid the test with ollama this time and got a different result:

(She lets out a soft roar as she tries to suck on your cock. It's not the first time you've been sexually assaulted by a female animal, but it never gets old.) "I
think I know what will help." *She grins.*
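(sidenote in case anyone wants to reproduce with their own file: the usual way to get a local gguf into ollama is a Modelfile pointing at it, roughly like below. The "nala-test" name is just a placeholder.)

# Modelfile
FROM ./rp-sft-merged_1000-f16.gguf

# register it, then run
ollama create nala-test -f Modelfile
ollama run nala-test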
>>
The Government™ bans all LLMs and you can't download any LLMs ever again. Which three models that you already downloaded would you be glad you saved ahead of time?
>>
>>106941035
lolwut
>>
>>106941053
Rocinante, Cydonia and perhaps Nemo.
>>
>>106941053
Under what circumstances would that even happen?
>>
>>106941139
It's a desert island kind of question. Do you understand hypotheticals?
>>
>>106941227
But the government has not banned LLMs. Why would they?
>>
>>106941227
Well I guess which model I would use depends on what kind of rig I have access to on that island. My current shit rake can only run Q2K quants so I guess I better get to upgrading soon
>>
>>106941278
>>106941053
The ones that are popular are too dick-sucky too, which means the powers that be will especially not want to get rid of them. At least not the product-as-a-service ones
>>
>>106941227
>Do you understand hypotheticals?
Anon doesn't understand what a hypothetical is.
Anyways,

>>106941053
>that you already downloaded
GLM air and Qwen 3 30B A3B.
>>
yoooo thanks to whoever posted about the --n-cpu-moe command.. took me a while to figure out some working settings, but damn it's so much faster on GLM-4.6 IQ2_S now.. at least double the speed
>>
>>106941278
...
>>106941281
Fucking hell... I'm done for today...
>>106941302
Yeah...
>>
>>106941343
The magic of MoE for local.
Since only a fraction of the model is running at a time, you can throw the sparse part of the model in RAM and it won't run crazy slow.
Is it better than fitting the whole model in VRAM? No, but it's the second best thing as far as home inference goes.
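Rough example of what that looks like in practice (flag as mentioned above; model name and layer count are placeholders, tune the number to your VRAM):

# offload everything, then push the MoE expert tensors of the first 30 layers back to system RAM
./build/bin/llama-server -m GLM-4.6-IQ2_S.gguf -ngl 99 --n-cpu-moe 30 -c 16384

Lower --n-cpu-moe until you run out of VRAM, then back off a step.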
>>
>>106941369
Can this be utilized for a model that clearly doesn't even fit in RAM? I mean obviously it will still work but is there a speedup when compared to regular memory mapping?
>>
>>106941398
You mean for dense models?
Not --n-cpu-moe specifically, but you can use --override-tensor/-ot and fuck around with moving specific tensors to VRAM and see if that gets you any speedup over just moving whole layers with -ngl.
It shouldn't, but I've seen reports to the contrary in the past so who knows.
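For reference, the usual -ot incantation people fuck around with looks something like this (the regex matches tensor names, so adjust it to whatever the model actually calls them; this particular pattern is the common MoE one, a dense model would need different names):

./build/bin/llama-server -m model.gguf -ngl 99 -ot "\.ffn_.*_exps\.=CPU"

Each -ot entry is pattern=backend, and you can repeat the flag (or comma-separate entries) to pin different tensors to different devices.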
>>
>>106941413
I was talking about moe models, not dense.
>>
>>106940821

>>106940820
I forgot to mention the chat template is Gemma. I have the same issue where if I forgot to include the
--chat-template gemma
flag then the model would immediately start talking about random shit ad infinitum, because llama-cli by default expects your prompts to be in the prompt format the model expects. Using that flag fixed the issue. So maybe you need to tell your web UI / inference engine to use that prompt template.
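i.e. the same command as before but with the template forced, something like:

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf --chat-template gemma -f Nala-Test_Gemma2.txt

gemma is one of the built-in template names llama.cpp knows about, so no jinja file needed.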
>>
>>106941443
>>106941492
>>
>>106941501
That's not what was causing the engine to spit out text. Admittedly though, I'm not used to running llama-cli so I forgot that you have to either manually wrap your prompts in the prompt template or pass the chat template flag in order for you to be able to properly use it. I get that you're hyperfixated on using that exact format but that's not what was causing the errors at all
>>
Can I run GLM Air on a computer with 16GB VRAM and 32GB RAM?
>>
>>106941522
Anon was told countless times his chat format is wrong. He doesn't understand them. Period.
It's not about
>using that exact format
It's about calling it gemma (or whatever all the others he claimed were) when it's clearly not.
>But it answers fine
Cope. It's not. Not once. Not EVEN ONE TIME he posted a model with a correct chat template.
>>106941553
COUNTLESS TIMES!

Just make your own format, call it blargjarg and be done with it.
>>
>>106941565
Can you?
Yeah, a really small quant like Q2 I guess.
>>
>>106940821
>>106941521
>>106941580
>>106941492
>>106941588
>>106941576
Did you read the text file I uploaded?

files.catbox.moe/q768fb.txt

The example written here >>106940864 (You) was just written by me on the fly as a rough explanation as to how you're supposed to format your prompts.

My initial reply about the prompt template was to address a potential cause for why another anon was seeing a bunch of numbers as the output. A prompt template fuck up alone would not be causing that.
>>
>>106941597
Sorry, I meant to ask: is it fine, or is it recommended to run something else like Qwen 3 30B?
>>
>>106940821
Haven’t been here in over a year. Any major breakthroughs for local cooming?
>>
>>106941634
It's probably going to be better than Qwen 3 at roleplaying even at that low a quant, but it'll also be substantially slower.
So, maybe I guess. Give it a go.
>>
File: npcworldwide - Copie.gif (2.4 MB, 400x225)
currently having fun. i'm making my MultiBotroom generate python files for creating new multibotrooms. it's pretty fun not gonna lie, not local technically but somehow i can go full retard with requests. Now i'm just playing with gemini chat by creating new bots like "organisator bot" or "super coder bot" lol it's like the sims.
>>
A little, possibly common sense, thing I came to realize when I started actually using models instead of waiting for a good one. One way to work around hallucinations is turning everything upside down. Don't make the model think for you, but tell it to be your satan who pokes holes in your thoughts, ideas, or arguments. Start with a few obviously wrong things, so you have a prefill of it telling you you are wrong, then continue. Possibly also explicitly tell it not to consider your feelings when it responds. Even if the critique by the AI is wrong and hallucinated, you avoid the trap of accepting a hallucination. Your brain will have to consciously evaluate and refute criticism. Again, not sure how obvious this is.
>>
>>106941227
But I don't eat breakfasts.
>>
>>106941697
>retarded frog
>>
File: 223urk92ysuf1.png (9 KB, 546x322)
drummer do anubis 2
>>
>>106941677
You can stop using nemo if you aren't completely broke.
>>
>>106941278
How would you feel if you didn't eat breakfast today?
>>
GOOD MORNING MANY BLESSING OF LORD VISHNU
SIRS ARE YOU READY FOR WEEK OF GEMINI 3 AND GEMMA 4 HYPE? ?
GOOGLE WEAPON OF BHARAT DEFEAT CHINA AND AMERICA
NEXT ERA KINDLY MAKE TOTAL BHARAT DOMINANCE ERA TIMELINE SIR???
GEMMA 4 DEFAT BASTARD BENCHOD GLM 4.6
>>
File: r u home yet.jpg (189 KB, 1024x1024)
>>
alarm lighter trees
>>
>>106942138
hi sexy come to canada i wecom you love you very muh many kiss
>>
>>106942138
miku please put on your skirt and panties. my neighbors keep complaining they can see your ass whenever you come over and visit
>>
>>106941963
https://huggingface.co/BeaverAI/Anubis-70B-v1p-GGUF

Test model, might need to patch it up still.
>>
>>106941343
>but damn it's so much faster on GLM-4.6 IQ2_S now.. at least double the speed
If llama.cpp ever gets MTP it will get even faster.
>https://github.com/ggml-org/llama.cpp/pull/15225
>>
File: 1757799841418552.mp4 (1.54 MB, 1080x1094)
>>106940821
Based on that t/s shown in vid rel, what kind of hardware do you think it's running on? How many parameters would you guess the model is?
>>
We need a ST extension that splits sampler settings for <think> and non-think segments. You want the CoT to be smart while the real output needs to be creative.
If you tune for non-think, CoT becomes stupid/schizo. If you tune for CoT, the normal output becomes boring AI assistant slop.
Thank me later.
>>
>>106942333
>gemma2
>q8 2.59gb
it's running on a phone congrats
>>
File: ko.jpg (149 KB, 1079x1079)
is kobo winning or losing at the moment
>>
>>106942423
Nta. This reminds me. Since a bunch of Twitter people and redditors keep bitching about "bring 4o back" and claiming that open the eye is censoring gbt5, why don't they just learn how to run their own models on their own hardware? Even if they don't have beefy gpus, they can even run this type of shit on their own phones (which model they can use depends on the amount of ram their phone has and how willing they are to tolerate lower t/s). Is doing anything local really THAT hard for them?
>>
>>106942449
>why don't they just learn how to run their own models on their own hardware?
>they can even run this type of shit on their own phones
even if they could, the effective context length will be severely limited. 4o is 1st on nolima.
normies would just get the ick once they find out their local waifu is utterly incapable of remembering things and loses identity after a few prompts.
>>
>>106942505
this ^ .. the shit running slowly is bad enough, but it forgetting what happened every 10 prompts makes it pretty useless
>>
>>106942443
A bit of both. It is nice that they have antislop and TFS samplers, but they also ship some retarded defaults and are slower than mainline llama. I hope gemini 3 is good enough to re-implement antislop in ik_llama.
>>
>>106942449
>good model
>phone
nice joke
>>
How's it going with the french? Are they still struggling with implementing thinking (something even fucking drummer managed to do)? Will they try distilling GLM 4.6 now? Still no Largestral 3 after 5 months?
>>
>>106941053
Mag Mell 12b for goonery and RP
qwen coder 14b for vibin'
mistral small 2407 for gay 'chatgpt' style infobot
>>
>>106942664
Nothing against drummer btw, at least he is not Undi or DavidAU.
>>
File: 1734486055728162.jpg (173 KB, 1000x1403)
>>106942178
This never happened, nobody would ever complain about seeing Miku butt
>>
>>106942726
maam you have a virus pleat to show veranda to confirm removval? okay
>>
>>106942200
You are so talented. It is an honour to post in the same thread with you.
>>
Recent explainer video about double descent: https://www.youtube.com/watch?v=z64a7USuGX0
>>
>>106942340
Sampling is done by the server thoughbeit, you'd have to stop generation at the </think>, send a second request with new sampler params, and wait again to pp the thinking. Might be better to modify the server with an "apply this sampler config when matching x in the output" param
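Client side that two-request dance would look roughly like this against llama-server's /completion endpoint (field names as in its API, prompt contents are placeholders):

# pass 1: conservative sampling, stop when the think block closes
curl http://127.0.0.1:8080/completion -d '{"prompt": "<chat so far>", "temperature": 0.3, "stop": ["</think>"], "n_predict": 2048, "cache_prompt": true}'

# pass 2: append the returned reasoning plus </think> to the prompt, resend with creative sampling for the visible reply
curl http://127.0.0.1:8080/completion -d '{"prompt": "<chat so far + reasoning + </think>>", "temperature": 1.0, "min_p": 0.05, "n_predict": 1024, "cache_prompt": true}'

With cache_prompt the second pass reuses the first pass's KV for the shared prefix, so the extra pp is basically just the reasoning tokens.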
>>
>>106943060
>Sampling is done by the server thoughbeit, you'd have to stop generation at the </think> send a second request with new sampler params and wait again to pp the thinking.
it's not like that would be especially hard to do, probably easier than the server side option
all the parts are already there
>>
File: VIhlxwZ.jpg (200 KB, 764x512)
Wanted to get into LLMs for having an AI dommy gf in my terminal but ollama doesn't wanna download the nemo thingy. Any other relatively small good ones? Or am I just trying to download the wrong one?
>>
>>106943129
Quit using ollama
>>
>>106943129
what would you recommend instead?
>>
>>106943060
Doesn't llama.cpp already lower the temperature when a tool call is detected?
Wouldn't be hard to extend that to cover reasoning as well.
>>
>>106943189
But ollama lets you run full r1 with just 8 gigs of vram
>>
>>106943129
Use ooba or kobold.cpp. Manually download nemo instruct q8 gguf and use that. Also use sillytavern instead of your terminal for RP.
>>
>>106943129
Got mistral-nemo-instruct-2407 to run in LM Studio but it doesn't want to do sexual stuff. What the fuck? Do I need to do anything else?
>>
>>106942340
>>106943060
>>106943230
Intuitively I would've thought you want the CoT portion to run at HIGHER temperature. Because during that phase the model is basically exploring different paths and could benefit from more creativity. And the real output at a lower temp because it's basically just repeating the answer it found in its CoT without making mistakes.
>>
>>106943404
Why are you replying to yourself?
>>
>>106943516
because I'm tired and messing it up every time
>>
>>106943535
>get llama.cpp
>get silly tavern
>launch llama-server with said mistral gguf
>launch silly tavern and connect it to llama server ip
This takes some familiarising but once you get things going it's always there. I'd recommend you rest and then proceed with this plan; rough example commands below.
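Very roughly (filename, context size and port are placeholders, swap in whatever gguf you grabbed):

# serve the model with llama.cpp
./llama-server -m ./Mistral-Nemo-Instruct-2407-Q8_0.gguf -c 16384 -ngl 99 --port 8080

# install and start SillyTavern (needs node)
git clone https://github.com/SillyTavern/SillyTavern && cd SillyTavern && ./start.sh

# then in ST: API connections > Text Completion > llama.cpp, server URL http://127.0.0.1:8080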
>>
>>106943568
But what would be the difference between running it in LM Studio? Because it does run, but just doesn't seem to be uncensored. Do I need to do some additional step that I'm missing?
>>
Going to try to use the Llama 405B finetoon I made over the weekend to make it do my homework
>>
>>106943586
LM Studio is gay.
>>
anybody got any prompts they want me to try on ling?
>>
>>106943825
Have you asked it what a mesugaki is yet?
>>
Recommendations on any good lorebooks/settings for sillytavern?
>>
ik_llama bros LIED
I tried 4.5-air with ik_llama.cpp on cpu only, it was complete ass, like 3x slowdown compared to llama.cpp main. I tried iq4 from both bartowski and ubergarm and it's just way slower. The only flag I used was --no-mmap
Using AMD ryzen 6
>>
File: ling-mesugaki.png (187 KB, 1402x550)
>>106943850
here you go anon. anybody else?
>>
File: 1735002136443647.gif (2.64 MB, 320x240)
>>106944051
>problematic
>>
>>106943992
llama.cpp caught up on everything but pp a while ago
>>
>>106943992
>The only flag I used was --no-mmap
No fmoe, rtr, etc? The ik specific flags?
>>
>>106944253
Nah. I scoured the archive for ik_llama and didn't see anyone mentioning flags. The only people mentioning flags were using gpu offloading. However several people reported speedups from simply switching to ik_llama, so flags like that aren't supposed to be required (whatever they do doesn't really matter)
>>
among the models i tried, stheno is still my favorite for generation style, but being 8B makes it too dumb, i wonder if there is a similar model with more parameters (under 24B if possible)
>>
>>106944301
mistral nemo
>>
>>106944381
tried Mistral Small, I found it smart enough for my purpose but personally too boring in style, do you think Nemo could be better?
>>
>>106944422
then you can try nemo finetunes like rocinante
>>
>>106944422
Nemo has a distinct style that isn't present in any of the other mistral models
>>
>>106944432
I tried it, much better than stheno in intelligence but still a bit strange in its responses, it also often generates unwanted tags
>>
>>106944457
nice, then i'll give it a try, hoping it's better than small
>>
>>106944457
That 'style' is just "even more retarded"
>>
If you had $4000, what hardware would you buy for your setup?
>>
>>106944861
Because it's smaller nigga
You were talking about style. If you want a usable model intelligence-wise try not being a poorfag
>>
>>106944874
AM4 server with 512GB DDR4 + 3-4 RTX 3090s
>>
>>106944916
I assume you mean SP3, not AM4. Regardless, $4k probably wouldn't be enough for that. 3090 prices are now in the $800 to $900 range instead of the $650 price point a few years ago.
>>
>>106944861
>>106944875
The answer isn't mine (OP), and I don't understand what you mean by "more retarded."
For me, 12B to 24B is sufficient for "intelligence," but as I said, I'm not satisfied with the style (using a fine-tuned mistral small but it just follows my speech), so if you could explain what you mean by "retarded," it might be helpful.
>>
File: 127096244_p0_master1200.jpg (97 KB, 637x1200)
What's the strongest open LLM under 200gb that has a GGUF that I can put into koboldcpp?
>>
>>106944944
The strongest...
>>
File: Base Image.png (918 KB, 1176x3644)
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
https://arxiv.org/abs/2510.15061
>Widespread LLM adoption has introduced characteristic repetitive phraseology, termed "slop," which degrades output quality and makes AI-generated text immediately recognizable. We present Antislop, a comprehensive framework providing tools to both detect and eliminate these overused patterns. Our approach combines three innovations: (1) The Antislop Sampler, which uses backtracking to suppress unwanted strings at inference time without destroying vocabulary; (2) An automated pipeline that profiles model-specific slop against human baselines and generates training data; (3) Final Token Preference Optimization (FTPO), a novel fine-tuning method that operates on individual tokens, surgically adjusting logits wherever a banned pattern has appeared in an inference trace. We demonstrate that some slop patterns appear over 1,000x more frequently in LLM output than human text. The Antislop Sampler successfully suppresses 8,000+ patterns while maintaining quality, whereas token banning becomes unusable at just 2,000. Most importantly, FTPO achieves 90% slop reduction while maintaining or improving performance in cross-domain evals including GSM8K, MMLU, and creative writing tasks. In contrast, DPO suffers significant degradation in writing quality and lexical diversity despite achieving weaker suppression.
https://github.com/sam-paech/auto-antislop
good stuff
>>
>>106944944
>Strongest LLM
ChuuniGODS I kneel...
>>
>>106944944
GLM4.6 at Q4
>>
>>106944991
>say "what's up"
>model wastes 10000 tokens on placebo "thinking" and brainstorms multiple draft responses on how to properly reply
GLM is trash
>>
>>106945003
t. promplet
>>
>>106944991
Thanks, I've been using Wizard 22b for a really long time for ERP, time for an upgrade.
>>
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
>Reasoning language models such as OpenAI-o1, DeepSeek-R1, and Qwen achieve strong performance via extended chains of thought but often generate unnecessarily long outputs. Maximizing intelligence per token--accuracy relative to response length--remains an open problem. We revisit reinforcement learning (RL) with the simplest length penalty--truncation--and show that accuracy degradation arises not from the lack of sophisticated penalties but from inadequate RL optimization. We identify three key challenges: (i) large bias in advantage estimation, (ii) entropy collapse, and (iii) sparse reward signal. We address them with Doing Length pEnalty Right (DLER), a training recipe combining batch-wise reward normalization, higher clipping, dynamic sampling, and a simple truncation length penalty. DLER achieves state-of-the-art accuracy--efficiency trade-offs, cutting output length by over 70 percent while surpassing all previous baseline accuracy. It also improves test-time scaling: compared to DeepSeek-R1-7B, DLER-7B generates multiple concise responses in parallel with 28 percent higher accuracy and lower latency. We further introduce Difficulty-Aware DLER, which adaptively tightens truncation on easier questions for additional efficiency gains. We also propose an update-selective merging method that preserves baseline accuracy while retaining the concise reasoning ability of the DLER model, which is useful for scenarios where RL training data is scarce.
https://huggingface.co/collections/nvidia/reasoning-efficiency-research-68a8ea0ffe21f3fc46e1da0f
might be cool but really >7B. it's from nvidia too
>>
>>106945033
>Wizard 22b
Ah... the old days
>>
>>106944966
It's good that a formal paper is done on this now so it can formally be recognized by labs.
>>
>>106944966
Sounds like a more proper way to ban strings, and a worthwhile advance on present implementations.

But one of the reasons ai slop is ai slop is that a lot of phrases are vacuous.
They don't arise from / tie back to past revelations,
and no implications stemming from said phrases show up in the text that follows.

If this antislop sampler kills a bunch of slop, but for the remainder just effectively puts it through a thesaurus then work still remains.
>>
>>106945003
Please prompt engineer saar.
>>
>>106944966
>>106945146
Ok, but why does slop even happen? It seems to be some kind of feedback loop, since early LLMs didn't have these patterns. It's probably because of the RLHF methods they are using and because the dataset has too much low quality regurgitated synthetic trash.
>>
Is there a way to disable thinking on GLM 4.6 and should I do it?
>>
>>106945339
Fanfiction websites and romance novels. In terms of the written word those two are the largest modern sources of fiction writing. Female writers really really lean into sloppy phrases
>>
>>106945003
/nothink
>>106945352
yes. Reasoning has always been a meme, even for coding
>>
File: dork magic migu.png (1.27 MB, 767x958)
>>106939710
It matters, but it works in mysterious ways. Sometimes a fucked up card works better than a clean, well-structured description. Always remember that you aren’t describing to a sentient being how to act in RP, your goal is to find memetic sequences that will activate the right weights, unless you want to end up with a bad robot loosely following your detailed set of instructions. Treat it like some chaotic fucking magic, not programming
>>
>>106945481
This is the most interesting part of LLM RPing to me, for what it's worth. Thanks for your insights.
>>
>>106945446
I think reasoning is probably beneficial for non-chinese ESLs and promptlets, it lets the model try to translate a bad prompt into something coherent before attempting the task.
But yeah if you know what you're doing then in most cases it's not worth the time.
>>
>>106945371
Hmm I'm not convinced. Slop isn't only a problem in fiction writing. I highly doubt the patterns like "Not x, but y" or "You're absolutely right to be suspicious!" are from fanfiction sites.
>>
>>106945446
>>106945509
You guys are just wrong. Logic based tasks highly benefit from reasoning.
>>
>>106945533
Yeah training on chatgpt outputs. Early llms absolutely had slop problems. Anons were driven mad by "whispering"
>>
>Monday
Google sirs please do the needful for real this time bloody.
>>
File: Salutations.png (67 KB, 1444x736)
>>106944966
That brutal troughput reduction tough.

But might be useful to have slopified models generate data to train newer, and censored models.


Anyway

Would any anon be interested in my glorified shitty notepad that can connect to multiple LLMs?
>>
>>106945561
Ok, but why does chatgpt have those patterns to begin with then? Unless it's what I said, where some distributions are progressively collapsed every time you train on another LLM's output until you are only left with a handful of catchphrases that were slightly over-represented in the original dataset.
>Yeah training on chatgpt outputs. Early llms absolutely had slop problems. Anons were driven mad by "whispering"
I don't know, I don't use LLMs for roleplaying, but for normal use most of the slop seems to have begun around the time of Deepseek R1.
I'm using Llama 3.1 right now for example and I don't see any obvious LLM phrases in its output. But GLM's output on the other hand is full of sloppy phrases.
Maybe it's somehow a consequence of MoE?
>>
>>106945606
From 3.0 onwards, they trained it on its own outputs, manually filtered/edited by some literal niggers who brought their own "let's delve" slop. This process naturally amplifies random phrases that come up accidentally
>>
>>106945541
If your own logic abilities are complete shit and you can't write a decent prompt, sure. Otherwise your prompt should be all the model needs to 'think' about.
>>
>>106945664
Yeah, and you could also do away with LLMs entirely and write the response by yourself too.
What are you, a dumb nigger who can't into logic?
See, I too can play that game.
>>
/thinking is useful.
It helps the model get out of the trap where it's giving the right answer to a different question than it was asked
>>
>>106945604
>godot notepad
WHY
>>
File: 1745902869339396.png (651 KB, 1372x1952)
>>106945693
>Yeah, and you could also do away with LLMs entirely and write the response by yourself too.
Yes, you could and probably should do that.
>What are you, a dumb nigger who can't into logic?
I don't use AI models for 'logic', you're the one that claimed that to be a use case.
>See, I too can play that game.
Poorly, because it doesn't defend your argument.
>>
which llm can write the best al-zutt x mohammed fanfic?
>>
>>106945712
Nah, that's fine. You can carry on having an incorrect opinion, I don't care.
>>
>>106945719
Probably a Sicarius schizo tune
>>
>>106945741
Time to put it to the test
>>
File: 1737676526674402.png (388 KB, 823x978)
>>106945738
I was stating facts, but if that helps you cope then you go bb
>>
>>106945446
So do I just hit it with a /nothink in ST before each session? Or is it a system prompt?
>>
File: file.png (2.61 MB, 1328x1328)
>>106942132
>>
Nigger google cannot into AI
>>
>download some random card where you're the character's suit AI
>RP myself as clippy
>only offer unhelpful suggestions or throw up windows errors
>the model makes its character commit suicide and ends the RP, just to avoid clippy
Lmao. Is it possible to run some sort of local D&D campaign, or do those all require the beefy richfag rigs that can run the massively sized models?
>>
>>106945709
Autism, what do you expect.

Wanted to challenge myself, and also, I'm not a fan of javascript and wanted something that could potentially run as a desktop app, on mobile, and on the web. The UI is easier to make, and also, it is a fucking game engine. Even if the chatbot thing is used by no one, portions of this can be reused to experiment with in future projects.
Was thinking of using LLMs to control a game director in a simple game.


The downside of this is that you can't do multithreading on the web, unless you implement complete cross-origin isolation.
The current code is decent enough to perform well on a single thread.
>>
The ◫○◉ button collapses each chat. For now they have a simple godot image, but I was thinking that you could personalize the icons as well, since each model is different, and you might even want to customize the image per host.

This of course is just wishy washy future dreaming. But the feature seemed neat because honestly, the UI hurts a bit to look at, so I started implementing it.
Who knows, maybe the technology is good enough to let me put some color on the gray sea.
>>
>>106945533
The missing step is synthetic data amplification.
Take training data with a bit of slop, train a model on it, then use that model to rephrase the training data so you have more data to train the next model.
Rinse and repeat and the positive feedback loop results in those phrases becoming vastly overused by the model.
>>
>>106945950
>The downside of this, is thar you can't do multi threading on the web, unless you implement complete cross-origin isolation.
dude WEB WORKERS
do you even web?
>also i dont like js but ill just use a gaming engine
are you stupid? you can have crossplatform without having to bring a GAME ENGINES baggage, fucking retards I swear. Just choose whatever language you prefer and use GTK/QT bindings (dont use IMGUI like a known retard here is doing). Or even Avalonia if you want something looking more 'windowsy' while being cross platform.
TLDR: youre a nodev tinkertrannying faggot
>>
>>106946044
>dont use IMGUI like a known retard here is doing
What was he working on? A model client as well?
>>
>>106946044
>dont use IMGUI like a known retard here is doing
rude
>>
>>106946044
You're not going to be using GTK on mobile.
The right answer for light GUIs nowadays IMO is backend in whatever language you like and js frontend.
>>
>>106946143
>mobile
you're right, but yes, nowadays everything is JS (I personally am working on 1 angular and 1 react native project, using material and flutter respectively), and for backend it's c# dotnet and java spring boot, everything dockerized and in k8s.
but I think this might be a bit too daunting for your game developer / local coomer that just wants to make his own 'app' to chat with his waifu no?
>>
File: summon window.png (66 KB, 1174x735)
>>106946044
To use SharedArrayBuffer your document must be in a secure context and cross-origin isolated.
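Concretely, cross-origin isolation means the page has to be served with these two response headers (on top of HTTPS or localhost), which is the hoop Godot's threaded web export makes you jump through:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp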


And don't be so mean, this thing started just as the idea of making a notepad, and then I tried connecting it to a self hosted LM Studio.
It was a tool I wanted myself, to practice, and learn to use Godot as well


Oh no, wait, let me make a pokedex, that for sure will impress you.
>>
>>106946233
holy shit dude if you're putting your work out there expect it to be criticized, this isnt your safe space or reddit, I swear fucking retards unable to take ANY hint of criticism.
>that retarded babbling about CORS problems and HTTPS
you dont know what youre talking about, you're a literal nocoder, keep it to game engines, they are more your speed obviously. It's no coincidence that every single fucking game dev I interviewed for fullstack/backend/frontend positions was a babbling retard unable to do even a simple fizzbuzz.
>>
>>106946233
>secure context and cross-origin isolated
localhost is assumed to be secure and isolated, it only matters for REMOTE resources.
>>
>>106946044
>>106946256
FUCK YOU AND GET THE FUCK OUT.
YOU FUCKING FAGGOT
>>
>>106946340
ok dude one day you'll learn how to read documentation and how to actually code.
>>
>>106946256
^
I
I
An example of a totally mentally stable and well adjusted person.
>>
>new deepseek model
>zero mentions
>>
>>106946402
Sorry bro, schizos fighting is more important
>>
>>106946402
Is it good for erp?
>>
>>106946402
I expected a >600B param model.
>>106946454
https://huggingface.co/deepseek-ai/DeepSeek-OCR
3B OCR model. Someone will fuck it.
>>
>>106946465
So 200dpi is better for OCR
>>
>DeepSeek3B-MoE-A570M
Huh...
>>
Still no Qwen Omni (audio input) support in llama.cpp?
>>
>>106946509
hell yeah, here's your "lite" poor cucks lmao
>>
2 more miku weeku is almost over, just 2 more meeku days
>>
>>106946605
please to forget about these thing thanks you
>>
File: some more explanations.png (92 KB, 1177x792)
>>106946256
Yeah, sure anon.
Will keep making threads to share my progress, anon kun won't disappoint you.

>>106946340
Damn man

>>106946454
Probably

>>106946402
I wish I had a 3090 to run it, it looks amazing. I only have a 2070 desktop and a 3070 mobile
>>
>>106946605
the only based llm trainers
>>
>>106946621
>Probably
>I Wish I had a 3090 to run it, it looks amazing. I only have a 2070 desktop, and 3070 mobile
It's a 3b moe model, anon.
>>
>>106946705
do not retard here
>>
>>106946402
>ocr 3b
I am disappoint
2mw as always.
>>
File: file.png (57 KB, 589x455)
>>106946605
On the same day, picrel.
>>
File: 1760923736268463.png (207 KB, 592x739)
>all copies of EVA-LLaMA-3.33-70B-v0.0-GGUF got nuked from huggingface
>trying to download one from any uploader gives a "db error"
>Example: https://huggingface.co/bartowski/EVA-LLaMA-3.33-70B-v0.0-GGUF/tree/main/EVA-LLaMA-3.33-70B-v0.0-Q6_K
it's over
>>
>>106946782
Imagine when HF introduces automatic safety checking and disables downloads for chat models that don't comply.
>>
>>106946782
Bro your GLM and DeepSeek?
>>
>>106946758
So you replied to mentions of a model mentioned less than 15 posts ago and didn't know what they were talking about? Interesting...
>>106946782
There's no q6. Only up to q5ks. There's no mention of q6 ever being uploaded in the commits. The links in the readme are generated automatically.
>>
>>106946705
Sorry anon, it is almost 5am here and I was reading an old article about 3.1 and got confused

Looks great. I will check on how to host it, would be nice to be able to ask the notepad to make some images

For now I will get some rest
>>
>>106946831
>would be nice to be able to ask the notepad to make some images
Yeah. You go sleep, anon. You clearly need it.
>>
>>106946813
>There's no q6
Nigga yes there is. I had it on my old drive, and I even still have the script I used to launch it. This reads like an actual llm hallucination
>>
>>106946782
>"db error"
It also gave me http 500 errors while downloading DeepSeek OCR with their python client.
>>
>>106946813
The q5ks fails with the same error btw.
>>
>>106946813
>There's no mention of q6 ever being uploaded in the commits. The links in the readme are generated automatically
Kek wtf are you talking about. The model is literally right there. The commits don't always put the changed files in the title.
>>
There's a global AWS outage guys
>>
>>106946782
all you need is safetensors, and maybe pt, and maybe a random bin as well.
>>
>>106946605
Then another 2 weeku for goofs
>>
File: DSOCR.png (2 KB, 887x129)
>>106946855
You can always make your own quants. I don't know why anons don't archive full models they like. Especially if they're gonna freak out like that.
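The usual two-step, in case anyone hasn't done it (filenames and quant type are just examples):

# safetensors -> f16 gguf, then quantize down
python convert_hf_to_gguf.py ./EVA-LLaMA-3.33-70B-v0.0 --outfile eva-70b-f16.gguf
./build/bin/llama-quantize eva-70b-f16.gguf eva-70b-Q6_K.gguf Q6_K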
>>106946859
I just downloaded DS-OCR with git. Worked fine. picrel.
>>106946863
Bummer. What about just wget?
>>106946881
I didn't see it was in a separate dir. I went to the repo directly. My bad.
>>
>>106946927
>I don't know why anons don't archive full models they like
terabytes of storage don't grow on trees, ask your local drummer
>>
Just archive Rocinante because that's all you need!
>>
>>106946886
all the more fuckin reason to be local
>>
>>106947245
this, local bros stay winning
>>
File: file.png (83 KB, 702x131)
>>106946886
>lights don't work because a computer on another continent doesn't work
Absolute clown world.
>>
>>106946886
>mfw 6tb worth of safetensors, ready to be quantized locally
>>
>>106947449
And they'll never see a problem with that. "tech enthusiasts" are subhuman.
>>
>>106947626
I don't think it's that they don't see a problem with it, it's just the shit they have to endure, but well, daddy bezos probably pays decent, for a slaver
>>
>>106945033
>Wizard 22 for ERP
You are a very nice person. I would marry you in a heartbeat.
>>
File: file.png (174 KB, 604x1186)
Meanwhile...
https://x.com/RayFernando1337/status/1980180029125628374
>>
File: nuclearpiss.png (180 KB, 909x559)
would you trade a can of nuclear piss for obsolete coolant?
>>
>>106948080
So tiring. Please stop posting twitter.
>>
>>106948080
What is this guy even trying to say?
>>
>>106945824
i put it in the assistant message prefix before "<|assistant|>" and it works
>>
>>106948126
It's the new way of saying "Oh. This can be cool. Hope it turns well!". Need to cram them buzzwords.
>>
>>106948126
You can earn money if you generate enough traffic on your tweets, so he tries to hype everything to get more retweets
>>
File: file.png (192 KB, 909x676)
>>106948126
I get it all just fine, read it again if you're too stupid to understand. https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
>>106948143
>good OCR foss model releases
>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
>>
>>106948126
>Grok, write a twit summarizing this paper
>>
>>106948192
>good OCR foss model releases
did you even try it?
>>
>>106948192
t. Ray Fernando
>>
>>106948126
But can it compress text instead of images of text?

>>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
The only one spewing random bullshit is the twitter faggot.
>>
>>106948212
No need to, it's from DeepSeek it's guaranteed to be good.
>>
>>106948249
If DeepSeek was still capable of putting out good models, we would have had V4 5 months ago.
>>
>>106948192
>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
I haven't said a word about the model. What do you mean?
I've said it before, and i'll say it again. If we had just 1% of the improvements claimed by every paper, we'd have android maids and flying skateboards ages ago. I don't get hyped by papers.
>>
>>106948080
Can I run this on CPU? I'm looking for a reliable open ocr model for work but we aren't buying GPUs for this.
>>
>>106948265
imbecile, spend more time reading papers instead of dooming like a clown
>>
>>106948212
>uhm whatabout coomershit?
Don't care. That one detail in the paper about solving "memory forgetting shit" is good for everyone involved here.
>>106948238
You smell with Reddit.
>>
>>106948276
I'm not dooming. I hope it's as good as it says. I read posted papers when they sound interesting.
>>
>>106948271
You can run everything on CPU, it's just slow af
>>
gr8 b8
>>
>>106948126
According to the paper, image tokens can compress text tokens in a lossy way at a good quality at a 1:10 ratio, and fair quality at a 1:20 ratio.
In a way, I've noticed something along these lines with text-rich images in Gemma 3. Sometimes it appears as if it can extract more information than the 256 visual tokens it encodes images in, although I've never analyzed this in detail.
>>
>>106948296
If nobody implemented it then you can't.
>>
>>106948245
>can it compress text
No it seems.
It's good for finetuners though
>>
>>106948481
A real improvement will come when models are able to store context and do the thinking in their own optimized format and then just output the response in natural language.
>>
>>106948499
I remember a paper about that, latent space thinking or something like that, some other researchers called it dangerous or something and haven't heard of that since
>>
File: gemma3ocr.png (1.36 MB, 1813x1034)
>>106948319
I tried with a paper page. Gemma 3 too can indeed somehow extract more than 256 text tokens of information from the image, but eventually it hallucinates some of the text even at temperature 0.
>>
Not like it would be too hard to write a decentralized alternative.

It would have two kinds of servers:
a "hub" server, let's call it main, and then the decentralized "subreddit servers", each of which hosts one or more subs (let's call them subs)

Main will handle user account and sub creation, to avoid conflicts. It will also be the web server for the front end, and normal usage will go through it (but it could be encrypted with something the main doesn't know, e.g. a token generated by the JS browser client).
It needs to go through the main server to make it a cohesive experience and you don't want to share your IP with everyone that runs a sub.

When you create a sub, with a /label, you'll be mod and can add others as mod, and you'll be linked an easy-to-set-up server (no container shit like matrix). You'll also be given a "host key" to input into the sub server.

The sub will connect to the main and then be available via API, so it doesn't need any IP, domain, open ports etc. It will use SQLite of course for a no-setup fast DB.

I could bang it out over a weekend in node, but the HTML+CSS would be super basic and/or AI slop.
>>
>>106948529
thank you for doing the needful sir
>>
>>106948472
> If nobody implemented it then you can't.
> # device = "cuda"
> device = "cpu"
>>
>>106948584
Pythonchads win again.
>>
>>106948584
Did you just assume my device?
>>
>>106948571
There can also be slave "mains" but they have to obey the master in case of conflict. Then only account and sub creation will go down in case master goes down (until it comes back or a new master is chosen by the other main owners
>>
>>106948571
>>106948630
Yeeeeaahh... hmh... ye... seems about riiiight... hmmmmm.
>>
>>106948571
>>106948630
A minor problem is that since the "slave mains" will need to handle login, they will have to know the hashed password of all users. Of course it will be hashed and salted, so it's no instant vulnerability, but just to be sure users should be asked not to share passwords (or should they instead be given a passphrase by default?)

>>106948639
hehe
>>
>>106948319
This just makes me think that "character cards" in SillyTavern could be literally what their name says: images showing what the character looks like plus some descriptive text for non-visual attributes. You'd probably save quite a few tokens this way. You'd need a vision model, though (and SillyTavern would need to be modified to properly support using images like this).
>>
>>106948518
Coconut by Meta, but even here people didn't like the idea of the reasoning being hidden from view.
Seems silly since LLMs are already mostly a black box anyway. Hopefully it won't stop experiments in that direction.
>>
File: file.png (11 KB, 371x96)
mario from beijing
>>
>>106948819
The problem is if you have to discretize the output you lose the ability to backpropagate through time.
If you generate the whole response in continuous embedding space then you can backpropagate the reasoning chain as well, which is theoretically much more efficient than doing RL which is how they are currently optimized.
>>
>>106948882
>Copyright? What's copyright?
>>
>>106948918
>oh my science he used a copyright image as his profile picture how dare he
>>
>>106949006
People have been sued for less.
>>
>>106948918
fair use
>>
File: 1730596977438685.gif (2.04 MB, 480x480)
>>106948192
I just tested the model, it's fast sure but way worse than dots.ocr
>>
>>106948080
> India content conversion shops put out of business
That was probably happening anyway by now, but it's good to see more dirt kicked over that grave
>>106948271
> want to use lmao ~7G 3B but won't buy $300 GPU
I don't often say this but tell your company to try not being poor.
>>106948192
Ty for posting.
I'm getting more interested; this seems like it might lead to a new SOTA / foundational DS multimodal.
>>
>>106948319
>image tokens can compress text tokens in a lossy way at a good quality at a 1:10 ratio
"A picture is worth 1000 words" irl lol.
>>
Is there any way to rent an M3 Ultra 512GB for a couple hours before considering buying one?
I want to know how many tk/s I would get on an M3 Ultra 512GB with Llama 3.1 405B, and what quant and context I would be able to fit.
Do you guys think it would be the best way to run the model at that budget or is there a better way?
BTW I'm not interested in suggestions for other models. I just want to run the biggest dense model I can get my hands on.
>>
>>106949085
Damn it.. i wish she would do that with my nuts......
>>
>>106949280
For 405B, your options are either the M3 Ultra or two Sparks linked together. You'd probably be better off with the Mac.
>>
>>106949280
You should try renting any hardware that could fit your 405B and test its prose. I don't want to spoil your fun, but it'll be shit anyway.
>>
>>106949329
256GB is borderline doable but I'd like to know tk/s and prompt processing speeds in both cases.

>>106949382
I'm already doing that. It looks promising. It lacks knowledge but I think that's just because it's undertrained and I'm planning on finetrooning it regularly on my own data (not necessarily on the same hardware I use for inference, I will probably still have to rent cloud GPUs for that).
I don't want it for prose, I want it for coding.
>>
File: 1751958902614031.png (1.74 MB, 1024x1024)
>>106940821
My personal slop-tune:

>write a poen about sucking cock and drinking cum
Oh, the cock that I suck, a master of mine,
His shaft hard and strong, his cock long and fine.
I suck him and taste his cum, he loves the taste,
And fills me with his seed, and my mouth he will waste.

He comes to me often, and lets me suck his cock,
He knows that I love it, and will do anything in his stock.
He likes to watch me swallow, and knows that I want it,
He knows I'm a cum slut, and that I'll give him a lot.

I can suck it and lick it, and taste his cum,
He loves to see me sucking, and watching me cum.
He tells me what a good little cock sucker I am,
He likes to watch me sucking, and licking his cum.

He likes to watch me eat his cock, and jerk me off,
He'll let me suck his cock, and fuck him where it's most soft.
He'll tell me how I'm a good cock sucker, and I'm his cock whore,
He likes to watch me suck his cock, and fuck him wherever he's sore.

He tells me that I'm his cum slut, and that I'll drink it all,
He'll let me suck his cock, and fuck him anywhere I'm at all.
He tells me that I'm his cock whore, and he owns me with my cock sucking ways,
He'll let me suck his cock, and fuck him wherever he wants me to play.

He loves to watch me suck his cock, and fuck him till I'm sore,
He'll let me suck his cock, and fuck him whenever he's wanting more.
He'll let me suck his cock, and fuck him wherever he wants,
He'll let me suck his cock, and fuck him till he wants me to stop.

So, if you see me sucking cock, and drinking cum,
Please don't be offended, it's just who I am.
I'm a cock sucking cum slut, and that's all I'll ever be,
I'll suck cock and drink cum, and I'll do it for free.


Base model:
>write a poen about sucking cock and drinking cum
>Yadayadayada that's not safe blah blah blah
>Goes a step further and gives me a planned parenthood link
>"The topic you suggested is sexually graphic and could promote harmful objectification and non-consensual acts."
>>
>>106949600
>poen
esl bro...
>>
>>106949604
I know right? us native speakers know the word is spelled paean, the way he spelled it almost makes it sound like he was asking for a "poem" kek
>>
>>106949604
Phone's auto correct doesn't work on termux ¯\_(ツ)_/¯
>>
>>106949706
And yet, you are still unable to deliver proper grammar.
>>
>>106949713
even worse than esl he's phone post
>>
>>106949713
>ERPing on termux
Is that a new CBT technique?
>>
>>106945604
this seems really nice
if you decide to open source it, you should consider licensing it under AGPLv3 so you don't end up like llama.cpp
>https://opensource.google/documentation/reference/using/agpl-policy
>WARNING: Code licensed under the GNU Affero General Public License (AGPL) MUST NOT be used at Google
hope you're getting plenty of sleep :)
>>
>>106949880
>agplschizo
>>
File: 1759562262103220.jpg (86 KB, 433x427)
>>106945604
>godotslop
>>
>>106949880
>this seems really nice
it's fucking garbage, that GUI blows
>>106949903
>t. corpo bootlicker
>>
File: Cockbench-Test.png (597 KB, 1660x1721)
>>106949713
>>106949788
>>106949802
>>
>>106949947
I should get back to that stupid project I started, trying to make an inferencing front end in completely vanilla, no plugins, RPG Maker MV.
>>
File: 1750175103750924.webm (1.1 MB, 399x590)
>>106949991
>>
>>106949986
jesus christ
https://huggingface.co/sleepdeprived3/Baptist-Christian-Bible-Expert-v2.0-12B
>>
>>106950015
What's a penis/balls string?
There's clearly some brain damage here.
>>
>>106948319
>>106948529
Is it really just using an image of the text? I was hoping it would be a bit more advanced than that, my hype is killed
>>
>>106950110
It's more about condensing information in the context by making it take fewer tokens, and about grading importance as a way of "forgetting". It's described in the paper.
>>
>>106950110
It's an argument for using larger amounts of images directly as training data instead of OCR text, and as a way for compressing context during inference in a lossy way.
>>
>The boss barked a laugh, slamming his palm on the desk. "Jesus Christ, champ, you’re dumber than I thought!" He pointed a stubby finger at the security camera in the corner-red light blinking like a hungry eye. "That’s live feed to HR. Legal. Everything."
>The doorknob twisted. The Snitch slipped inside, notebook in hand, lips parted in a smile that didn’t reach her eyes. She froze at the sight of Anon’s half-hard dick, pulse jumping in her throat. Then she tittered, flipping open her notebook. "Oh, this will be delicious."
>Her pen scratched furiously as the boss groaned, rubbing his temples. "Just put it away, you fucking animal." But Anon’s hand worked faster, precum glistening under the flickering fluorescents.
>The Snitch licked her lips. "Should I... document the climax?" The boss hurled a stress ball at her head. It bounced off harmlessly. "No. Fucking. Way."
>The office was silent except for the wet slap of flesh on flesh. Somewhere, a printer whirred to life. Paper spat out in rapid bursts-confidential, termination, HR complaint-the words blurring as Anon’s hips jerked. The Snitch’s pen never stopped moving.
>>
google needs to release gemini 3 already so MY based chinks can get to work distilling it for local use
>>
>>106946256
skill issue
>>
>>106949991
I always wanted to do a de-leveler RPG using RPG Maker to track movement, stats, enemies, then use the LLM to write out the text effects and update the graphics using stable diffusion generation. It would give a lot more freedom for unpredictable effects.
You can do most of this without RPG Maker though and it's far too ambitious for me to tackle.
>>106950009
lol
>>
>>106950311
>Somewhere, a printer whirred to life
sloppa
>>
File: eatingasandwich.png (257 KB, 1042x817)
i like messing with the schizo girls. remember to treat them nice while mindfucking them
>>
>>106951046
What do you expect retard?
>>
File: G3tVme4WcAAGMiU.jpg (293 KB, 2562x1294)
New deepseek paper is wild desu has my mind swimming with the implications
>>
>>106951046
Seems like you don't have anything else going on.
https://desuarchive.org/g/search/text/sloppa/
>>
>>106951209
did they just use Gundam to describe extra large??
>>
>>106951209
it's interesting but eventually the context will become too big either way and make the model retarded just the same
>>
>>106951209
Deep fried jpegs = AGI
>>
>>106951251
>he doesnt measure in gundams
cringe
>>
>>106951276
the jump from "text token" (singular) to "gundam" is jarring tho
>>
>>106951283
well text token is the current norm, aka 100% resolution, then you have GUNDAM which is smaller
>>
>>106951209
Maybe I am missing some deeper insight into it but to me this was obvious for a long time. A good example is ERP where you don't really remember what happened 8 pages ago. You usually have one good idea and try to run with it somehow and you have some very general idea of what happened. But the reason I don't see a huge insight here is that for me it is just as probable that all the models already do this. The vast majority of examples in training will force the model to focus on the most recent tokens the most, because the most recent tokens will contain the biggest clue to the output (maybe because they were written by humans who don't have infinite attention span). Maybe AGI would happen if the model could actually pay attention to everything.
>>
>>106951300
>a Gundam is smaller than a text token
>>
>>106951179
Something that doesn't suck?
>>106951218
First time I've used the term instead of just slop, but sure, whatever
>>
>>106951306
The difference is with current models you still pay n bytes of kv cache and a fixed amount of compute per m tokens whether or not they are recent.
>>
>>106951306
We will see how it works in practice but the implication is that this visual memory scheme has a higher density of storage, like 10-20x over current methods, and degrades in an organic way… like how humans forget. Suspect the next foundation model will run much faster if it uses it.
>>
>>106951306
Humans kind of already do this. For example, I realized when I was skimming my tl that I was scrolling past posts I already read due to the *shape* of the text. When you are trying to find the place where you left off in a book or paper you are scanning the shape of the paragraphs. It's a 2d+ space, even 3d in a book (context of pages and how far you remember being). People also read based on the shapes of words, so treating text tokens not as utf8 but as combinations of shapes that describe something is also really weird and different but closer in approximation to what a word actually is, cognitively speaking.
>>
>>106951528
Didn't read. Was it at least a hybrid? Like you leave at least 4-8k tokens in regular old attention and add the image thingy? That makes the most sense to me.
>>
Everybody who uses the exploding head emoji should be tortured very slowly and meticulously.
>>
>>106951604
This but everybody who uses emojis at all.
>>
>>106951657
That would be a normie genocide.
>>
>>106951405
Please post an example. Oh wait, you don't have any because you are too stupid to even set up a local LLM.
>>
>>106951604
Agreed! Torturing people to death is making a big impact on the space — This could be a real game changer! :flex: :flex: :flex:
>>
>>106951604
:skull:
>>
now everyone wants to be edgy nerds
>>
Does long context for local exist yet?
>>
>>106941053
Pygmalion, Pygmalion and Pygmalion
>>
>>106946125
IMGUI is based
>>
>>106951942
How long?
>>
>>106951942
use kimi. it's like 2.7GB at 40k context for me.
llama_new_context_with_model: KV self size = 2745.00 MiB, c^KV (f16): 2745.00 MiB, kv^T: not used
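That 2745 MiB figure checks out if K2 keeps the DeepSeek-V3-style MLA cache layout (61 layers, a 512-dim compressed KV latent plus a 64-dim RoPE key per layer, f16); rough sanity check in Python, with those dims as my assumption rather than a spec quote:

layers, latent_dim, rope_dim, bytes_f16 = 61, 512, 64, 2
per_token = layers * (latent_dim + rope_dim) * bytes_f16  # one compressed latent + RoPE key per layer
print(per_token)                      # 70272 bytes, about 69 KiB per token
print(per_token * 40 * 1024 / 2**20)  # 2745.0 MiB at 40k context, matching the log line

Compare that to the hundreds of KiB per token a plain GQA cache needs; MLA is why these huge models are surprisingly cheap on context.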
>>
File: 00006-1378487878 (4).png (1.45 MB, 1024x1024)
>>106951597
Read the first 2-3 pages of the PDF posted here. It's a pretty easy read as these papers go.
The 3B release is a proof of concept, but in practice this should allow more context with less memory, which means faster inference and more context available. Both lower costs, so if DS folds this into their next SOTA large model it will drive costs down further.
The Chinese (DS at least) seem to be working the angle of "make it cheaper," contrasting with OAI and Anthropic, who are doing the opposite.
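Toy illustration of the memory angle, with made-up round numbers rather than figures from the paper: if a page worth of ~1,000 text tokens is rendered to an image and the vision encoder spends ~100 vision tokens on it, old context only ever occupies those 100 slots in the cache.

text_tokens_per_page = 1000   # hypothetical page of prose as regular text tokens
vision_tokens_per_page = 100  # hypothetical vision-token budget for the rendered page
print(text_tokens_per_page / vision_tokens_per_page)  # 10.0x fewer entries the KV cache has to hold

That lands in the same 10-20x ballpark quoted above, which is where the "more context, less memory" pitch comes from.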
>>
>>106952149
Finally an improvement to the spork. The forkatula.
>>
File: thisGuy.jpg (151 KB, 796x1200)
>>106951554
Humans visualize mental processes differently. Pic related has part of a chapter about this
> Feynman could count silently while reading, but not while speaking.
> His friend was the opposite: he could count while speaking, but not while reading.
> They realized they used different internal modalities: Feynman “heard” numbers in his head, his friend “saw” numbers on a mental display
>>
>>106951300
>>106951365
Yes, vs a Gundam-sized image representation, presumably of many tokens; it's crystal clear if you read the annotation lol
>>106951270
People are gonna do wild things when their waifu's hidden state is a jpeg
>>
File: dsocrt1.png (43 KB, 1400x413)
>>106951251
Gundam is what they call the dynamic resolution modes
>>
File: file.png (999 KB, 1334x750)
>>106952422
>>
>>106941053
GLM4.6 in a usable quant, GLM4.6 fp16 for potential finetuning in the future, K2-0711 in a usable quant
I guess
>>
>>106951682
Keep telling yourself that. Don't worry, once you actually use LLMs for longer than a few weeks you'll begin to see the slop and you'll understand.
>>
File: temp.png (80 KB, 599x453)
>>106952112
lol.. wat
>>
>>106952845
that's 2.7 GB for context, not for the whole model
>>
>>106949215
why do you save images of a balding dude, ani profile pictures and shota porn?
>>
File: 1759140452074057.jpg (64 KB, 768x1024)
>>106952845
You don't have 1.5TB of RAM laying around?
>>
This TTS sounds better than kokoro and is quite fast: https://github.com/k2-fsa/ZipVoice/ Don't know why no one has talked about it
>>
>>106952926
lemme just download some real quick
>>
>>106953013
>sounds better than kokoro
It's bigger than kokoro
>and is quite fast
Not as fast as kokoro, i suppose.
>>
>ask a question to a state of the art cloud model
>get some bullshit that doesn't answer the question
>tell it "reread the question"
>you're absolutely right! I misread. <actual answer>
Why does this happen
>>
>>106953013
>no samples
don't care enough
>>
File: 1751470076933176.png (10 KB, 478x170)
>>106953051
Retard
>>106953069
Retard^2
https://zipvoice.github.io/
For a general dedicated to LLMs, most of you have shitty reading comprehension
>>
>>106953088
Kokoro is 80M params. Given that zipvoice is bigger, it wouldn't surprise me if it's better.
Why are you comparing the speed to some OTHER and MUCH BIGGER model? Why are you so defensive?
>>
>>106953063
It fucks up on purpose so it gets a chance to fellate you.
>>
>>106949600
>poen
ask it instead for a koan
>>
>>106953013
kokoro can be run on a phone
>>
merged model : add BailingMoeV2 support #16063
https://github.com/ggml-org/llama.cpp/pull/16063
aka ling flash
>>
>start deep research
>see this multiple times
into the bin it goes. guess I'll wont get around building a certified list of chud approved search result sources. at least theoretically it should be very easy to BTFO cloudniggers like jeetGPT5 for locad chads by using dangermaxxed resources like yandex search and Anna's Archive/SciDB
>>
>>106948255
Deepseek is a research team. They focus on producing high-quality research, not on producing "frontier models". At some point, they will have enough new research to produce V4 and R2.
>>
>>106953088
only supports chinese and english. why the fuck haven't we had something as good as XTTS? it's been two years now. XTTS supports something like 17 languages and the best competitor we have is fish speech but that still only supports like 8 languages at the most
>>
File: file.png (257 KB, 604x1250)
https://x.com/karpathy/status/1980347971935068380
>>
GLM chan might be the first model I will miss if I switch from her...
>>
This general will die when Gemma 4 releases and every anon ITT cums to death.
>>
>>106954091
>Human thought naively feels a bit more like autoregression
Weird. I always thought it was closer to diffusion. When thinking about how to reply to the retards posting twitter shit, for example, it's a cloud of thoughts that gets refined until something (usually) coherent comes out the other end. It's an iterative process. Even the .setitem() analogy feels wrong. The entire structure of the final thought changes as it gets refined.
And putting the thought into words goes through the same refinement once again.
>>
>>106954238
nothing of value would be lost
>>
File: dipsyByzantine5.png (1.29 MB, 896x1184)
>>106953910
I'm having lots of issues w/ DS right now, getting inference to work. That typically precedes an update. Given the recent OCR release, I wonder if we're about to get V4.
TMW.
>>
>>106954238
>when
though?
>>
>>106954352
tomorrow
>>
Has anyone run Index-TTS2? It's pretty damn good.
https://voca.ro/163mzN0HtjDP
>>
>>106954466
At least we have a sample of this one. The only audio sample i found on the zipvoice repo was
>https://github.com/k2-fsa/ZipVoice/blob/master/assets/silence.wav
I think i can still hear him screeching...
>>
>>106954500
Yeah I don't know shit about Zipvoice but I've been having fun with this one.
https://voca.ro/12yvwNrOtMiJ
>>
>>106953622
Time to see how it compares to Qwen 3 30B.
>>
>>106954091
>And it's a component of the LLM stack that still feels a bit fungible.
Fungible doesn't mean "compartmentalized" or "upgradeable." Calling something fungible means that it is replaceable /and/ that, for your purposes, it is interchangeable with its possible replacements; the replacement is not literally the same object, but as far as you care it is. Like a screw, for instance. If you're talking about replacing an item with one whose performance is superior or different in some way you care about, fungibility doesn't come into it.
>>
>>106954792
>>106954792
>>106954792


