File: tetored.png (774 KB, 912x688)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107565204 & >>107557369

►News
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: comfyui_06279_.png (1.89 MB, 1280x1280)
►Recent Highlights from the Previous Thread: >>107565204

--Hugging Face model updates and llama.cpp audio support speculation:
>107567049 >107567076 >107567092 >107567166 >107568043 >107568517 >107567115 >107567231 >107567344 >107567437 >107569579 >107569706 >107570648 >107567746 >107568425 >107568505 >107568523 >107568569 >107572661 >107572978 >107573110 >107573164 >107573474 >107567347 >107567633 >107567751 >107567785 >107568589 >107568607 >107568600 >107568642 >107568736 >107568684 >107569549
--GLM-4V vision encoder integration challenges and ngxson's successful implementation:
>107568980 >107569248 >107569292 >107569316 >107569318 >107570858 >107570914 >107570925 >107570950 >107570982 >107571063
--Qwen3-VL model evaluation for structured image tasks and benchmarking:
>107565522 >107565536 >107565562 >107565579 >107565610 >107565978 >107566000
--Skepticism about large MoE models vs dense alternatives in AI development:
>107571173 >107571262 >107572264 >107571352 >107571797 >107571853
--Mistral Small Creative model features and API access status:
>107571885 >107571907 >107571997
--Fandom wiki link collection workaround using Qwen-developed Python script:
>107570906
--Llama.cpp autofit VRAM feature debate: low-effort users vs expert tuning tradeoffs:
>107569439 >107569542
--Exploring negative scaling in passthrough merge method:
>107565964
--Roleplay's dominance in AI usage and Mistral's creative model:
>107571893 >107571993
--MistralAI API availability and adversarial prompt testing issues:
>107572034 >107572107 >107572303 >107572365 >107572336 >107572454 >107572497 >107572554
--Evaluating Mistral-small-creative's writing strengths and flaws:
>107572586 >107572612 >107572949
--Struggles with small vision models recognizing 2hus:
>107573179 >107573188 >107573196 >107573242
--Miku (free space):
>107569503

►Recent Highlight Posts from the Previous Thread: >>107565206

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: dipsyPanic3.png (1.22 MB, 1024x1024)
>>
Deepseek 4-Creative soon.
>>
sexotet
>>
All roads lead to Rome.
>>
Mistral Medium Creative (123B, dense, Apache 2.0) for Christmas
>>
>bandicam
>>
Mistral is so good and it's such a shame they're hamstrung by retarded EU politicians. USA and China are basically building the atom bomb while the EU throws some dynamite together and calls it an atom bomb. If the EU doesn't step up and the USA keeps releasing safety-slopped models, expect the entire world to run on Chinese LLMs. Hopefully the frogs can relocate to America or something.

Anyways, seeing Mistral build Small Creative is a huge breath of fresh air, especially after Ministral 3. We've been starved for a good RP model since nemo and small 3.2 and we are finally opening up the release valve.
>>
>>107573850
Uh-oh, Mistral-chan...
>>
>>107573850
>anon takes screenshots with fucking bandicam
>>
when
you
walk
away

you
dont
hear
me
say

please

oh baby

dont go
>>
How did he do it?
>>
Sirs. where gemma 4?they say 'keep refreshing' but nothing is happening... is this toilet witch curse?
>>
>>107574105
They meant "keep flushing".
>>
Miqu 2.0 70B when
>>
>>107574157
christmas
>>
>>107573788
Any multimodal from DS would be interesting.
I think they're going to do some neat memory tricks instead. But we'll know more in 2mw.
>>
File: glm 46v flash.png (104 KB, 956x724)
We are so back.
We are so fucking BACK.
WE ARE THE MOST BACK! WE ARE MORE BACK THAN WE'VE EVER BEEN!
>inb4 y swipe
I had ctx limit at 300 (chat completion), increased it to 1500 then swiped.
>>
So it's not a magic bullet because I still get the occasional shit response that needs editing, but I definitely think that as far as creative writing goes, this pr
https://github.com/ggml-org/llama.cpp/pull/17927
is probably the most effective and least shitty sampler pr I've actually bothered to try early. mistral small is probably one of the most overcooked models I've used that refuses to break from what it was trained on, but with --power-law-target 0.45 and the other args suggested in the initial comment the model can be worked with. I would've tried it with some chunky moe models, but I've actually been wanting to edit/continue writing based off outputs instead
>>
Anyone else find spiking neural networks erotically appealing? You see an snn activation visualization once, and every cell in your body knows it.
>>
>>107574288
not exactly but as a mind control fetishist I think activation steering is one of the hottest concepts ever conceived
>>
>>107573710
Any local model with as good of visual capabilities as Gemini 3, or at least as good as Gemini 3's Japanese to English translation capabilities? Because so far Gemini 3 might genuinely be the best OCR for manga.
>>
>>107574288
>spiking
>snn activation
what? are you referring to the loss spikes that happen during pretraining or what?
wtf is an snn
>>
>>107574396
It's a type of neural network, spiking neural network (snn).
No idea what the fuck that anon is talking about tho.
>>
File: file.png (127 KB, 811x1010)
>>107574359
The new GLM 4.6V isn't very good, I can tell you that much.
It can't even transcribe the text properly.
>>
>>107574473
>>107574359
use OCR models for text transcription, dumbasses
i know dots.ocr is good from experience but many other OCRs got released
ps: dots.ocr is likely not better than gemini3, but i think it competed with 2.5 pro idk
dots ocr is super good, there's likely something better by now
>>
File: file.png (74 KB, 1578x777)
>>107574490
>>107574359
here you go
>inb4 muh benchmarks
it was the first model that could take semi-blurry pictures of a non latin-script language and transcribe them sooo well
>>
>>107574206
What's the point of having these descriptions?
>>
new mistral small is ok, but the context seems too small and it doesn't follow instructions. It *really* likes *putting* asterisks *everywhere*.
>>
so i got work to buy me a computer so i could learn local AI with the long term purpose of eventually developing AI software for them to use internally.

i have a 5090 with 128gb of DDR5
one thing i CANNOT work out for the life of me, is how to run a medium size model... say 8b or 14b, with 128k to 256k context window and keep it fast, mostly running on the GPU.

i've seen people talking about having a 3090/4090 and 128gb of ram and being able to offload to system ram and still using the GPU and getting 100+ tokens a second
but i can't figure out for the life of me how to do this. anytime i try large context, ollama and LMStudio both just fail to load the model, or they load it into system memory and are CPU only.

i'm on cachyOS. am i just retarded, how is this supposed to work?
>>
Haven't used a local model in ages, is the new creative Mistral gonna save local?
>>
>>107573850
>bandicam
nga your fraps?
>>
>>107574536
well it understands the picture very well, it's uncensored
i'd like to change your question to "What's the point of a VLM in rp"
it's cool to send your cock to llms, or whatever you'd send an llm
>>
>>107574473
That's not how you do it tardygrade
>>
>>107574541
It's very sensitive to how you format system instructions. Ministral 3 14B is similar in that regard (other than being unusually retarded on average), but much worse about asterisks.
>>
>>107574547
The 8Bs and 14Bs are dense models, which means you can only run them on VRAM. You are most likely unable to fit that much context on just a 5090. 32GB of VRAM really isn't that much anymore.
>>
>>107574547
i could help you out anonie~ do you have, say.. matrix/element
>>
>>107574558
Fair enough. I thought you were building a dataset or something
>>
Been playing around with GLM 4.6V, having it identify 2hus. It's decent at it. The reasoning is often really retarded though.

>Got it, let's see. The image shows a character from Touhou with orange hair, horns, a red bow, and holding a mug of beer. The character is likely Suika Ibuki, the oni from Touhou Project. Let me confirm: Suika Ibuki is known for her oni features, red hair, horns, and often depicted with drinks. Yep, that's her.

But then it goes "No, wait" and rambles on for ten paragraphs making 20 different retarded guesses ("It's Nitori, no wait, she's black."). Then finally it just stops and outputs the correct answer.
>>
>>107574547
>LMStudio
tick force model expert weights onto cpu and max out the gpu offload slider
>>
>>107574547
>i've seen people talking (...) offload to system ram (...) 100+ tokens a second
They were lying.
The only use case for offloading to system ram is cooming with large models at 5 t/s.
>>
>>107574547
Bro if you got a free check from your workplace at least ask for a rtx pro 6000 lol
>>
>>107574547
Generally a dense model needs to be loaded into vram as said by:
>>107574575
You need to limit context per-model. Context eats VRAM, so unless you have a server cluster you're not running a model at over 16k (ballpark guess, I can usually manage 20k+ on other architectures). A lot of us here are savvy about how much space a model needs and can do shit like offloading ffn layers and quantizing cache to save a bit here and there, but if you're doing this with no actual experience you're going to have to do a good amount of trial and error to figure shit out
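For a rough sense of the numbers, a minimal back-of-envelope sketch, assuming a ~14B-class GQA model (40 layers, 8 KV heads, head dim 128, fp16 cache); these shape numbers are illustrative assumptions, the real ones come from your model's config.json:
[code]
# Per-token KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
# Shape defaults below are assumptions for a ~14B GQA model, not a real config.
def kv_cache_gib(ctx: int, n_layers: int = 40, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * ctx / 2**30

for ctx in (16_384, 65_536, 131_072, 262_144):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx):5.1f} GiB of KV cache")
# ~5 GiB at 16k, ~20 GiB at 64k, ~40 GiB at 128k, ~80 GiB at 256k: on a
# 32 GB card that also has to hold the weights, huge contexts just don't fit.
[/code]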
>>
>>107574600
I hope that's just a vision thing. I hate how all the new big chink reasoning models went back to the patterns of the old R1 which would think things through and then go "No, wait" and go for another circle.
K2-Thinking and 3.2-Speciale do it and especially the former is really bad at it. Big GLM-4.6 may often spend a lot of time thinking but at least it sticks to a train of thought and just wastes time going into autistic detail.
>>
>>107574648
3.2 speciale is designed to do that
>>
>>107574579
https://matrix.to/#/@mvff:matrix.org

>>107574606
see i thought so too, but shit like pic related has thrown me off completely and made me second guess what i know. i know it's reddit, but yea.
>>
>>107574663
they are talking about the new nemotron model or whatever from nvidia here btw.
>>
>>107574547
64k or so of cache should fit just fine on a 5090 for those 20b-30b models, maybe even up to 80k with q8 cache, but going for 128k or 256k on models that small is kinda dumb since they already start getting confused at around 32k, so you are probably better off just getting higher quality quants if you really want to use all the vram.
the 128 gb of ram is only useful if you want to run moes like glm 4.5 air at about 15 tokens a second or low-quant glm 4.6 (at like 6 tokens a second). no way are you getting 100+ tokens a second while offloading to cpu, maybe in prompt processing but not in token generation
>>
>>107574661
I know that it's made to think for a long time but that doesn't mean it's okay to spend 2000 tokens thinking until it hits the correct answer and then go "No, wait, the prompt mentions [obviously irrelevant detail it ignored for a good reason up until now]" and then go for another 3000 tokens of reasoning to arrive at the same conclusion.
>>
File: 0.png (98 KB, 1191x1684)
>>107574523
Wow, Dots OCR isn't half bad for being less than 4B.
Though, it did mess up the kanji (it gave out 剛森魔 instead of 強姦魔), but that's still rather impressive for such a small LLM.
>>
>>107574648
Another one, Suika again but a different picture. Cancelled it there.
Seen a few patterns:
1. Instantly gets it, ends thinking, answers correctly.
2. Instantly gets it, rambles on for ages being retarded, answers correctly.
3. Rambles on for ages being retarded, answers incorrectly.
4. Guesses, coherently gaslights itself into thinking it's correct ("Chen has blonde hair and is often depicted with a white headscarf"), answers incorrectly.

The text part seems a bit stupid maybe? I don't know.
>>
>From what I remember, in the Touhou series, the character who rides a motorcycle is Reimu Hakurei?
>Wait, the motorcycle—Reimu's motorcycle is called the "Hakurei Go" or something?
(Reimu was correct)
>>
>quantizing kv
these fags are actually insane
>>
>>107574747
I logit bias -100 shit like “wait no” “but wait” “please stop you’re hurting me” etc and it’s not 100% effective but it seems to help a bit, at least.
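A minimal sketch of doing that against a local llama.cpp-style server; the /tokenize and /completion routes and payload shapes here are from memory and may differ in your build, and token IDs are model-specific, so treat this as an assumption-laden illustration:
[code]
# Hedged sketch: down-weight backtracking phrases via logit bias.
import requests

BASE = "http://127.0.0.1:8080"

def token_ids(text: str) -> list[int]:
    r = requests.post(f"{BASE}/tokenize", json={"content": text})
    r.raise_for_status()
    return r.json()["tokens"]

banned = ["Wait, no", "But wait"]
# Biasing each token of a phrase suppresses those tokens *everywhere*, not
# just inside the phrase -- which is why this helps but isn't 100% effective.
bias = [[tid, -100.0] for phrase in banned for tid in token_ids(phrase)]

r = requests.post(f"{BASE}/completion", json={
    "prompt": "Think through the riddle step by step.",
    "n_predict": 256,
    "logit_bias": bias,
})
print(r.json()["content"])
[/code]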
>>
>>107574892
i thought Q8 k cache was fine since they implemented hadamard transforms for it
>>
>>107574892
faggots will quant image or video to below q5 and you're still on about how q8 cache is devastating because it has a 0.0001% chance to not output the token you expect for the prompt you feed llama on a flat distribution of tokens
>>
>>107573710
>medasr is gone
>medasr is gone
>medasr is gone
>>
>>107575069
Those things are not remotely the same, and you're straight up wrong. Quantizing the cache rapes language models for practically no gain. You are retarded.
>>
File: 1759127596267402.png (18 KB, 137x295)
>>107575069
faggots are the ones who quantize KV to save a whopping 1GB of memory, not realizing it's giving their models dementia, and then complain that there are no good models.
>>
>>107575152
o shit
https://huggingface.co/google/medasr
>>
>>107575152
https://huggingface.co/shashirgoogle/medasr_ft
redeem
>>
>>107575155
I would have loved if a newfag like you had existed in the l2 era to tell me the difference
Go and run psyonic cetacean with no cache quant vs quanted and tell me how much better it is without it lmao
>>
>>107575152
It did feel like a bad trolling attempt after we were told to keep flushing the page.
>>
what would be the best 12b nsfw model nowadays for tavern?
>>
>>107575152
MEDASAAR NOOOOOOOOOOO!!!!!!!!!!!!!!!!!!!!
>>
>>107575152
WHY? WHY DID YOU REDEEM?
>>
>>107575203
omeN
>>
>>107574218
I suppose the majority of this thread doesn't actually care about storytelling or is too retarded to figure out how to check out a pr
>>
Samplers have always been cope.
>>
>>107574218
>>107575232
Looks interesting for when you want to replay a card with the same model
I'll wait for ST support
>>
>>107575194
What the hell are you talking about? No shit it would be less noticeable (still noticeable) if the model is dumber. That changes nothing. Nobody was using that model, by the way. You must be from reddit or something.
>>
>>107575256
Why wait for ST when you just hardcode it into the backend launch options
>>
>>107575274
Because I'm already cooming with the toys I have readily available
>>
>>107575222
>omeN

omen what?
>>
>>107575286
literally a matter of like at most 10 keystrokes, if you need to goon that often that you can't free up a hand to type you need a better hobby
>>
File: 1737369408087417.png (76 KB, 296x256)
>>107575303
If you don't need to goon that often then you need a better cock
>>
>>107575232
All of these wacky samplers like mirostat, dynamic temperature and nsigma are fun to experiment with for a while but I find that for creative stuff nothing beats just setting top-P first, temperature second. Then putting temp to 5 and lowering top-P from 1 until incoherency is gone for your current model (minP works too in place of top-P but then raise it from 0).
Basically just kill all tokens that produce bad results and level the probabilities of remaining good tokens for variety.
Still, I will for sure give it a whirl later like with all new samplers.
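For anyone curious what that recipe actually does to the distribution, a minimal numpy sketch (the top-P cut happens on the untempered probabilities, then the high temperature levels the survivors; parameter values are just the ones mentioned above):
[code]
import numpy as np

def sample(logits: np.ndarray, top_p: float = 0.9, temp: float = 5.0) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # token indices, best first
    cum = np.cumsum(probs[order])
    keep = order[cum - probs[order] < top_p]  # nucleus cut at temp 1
    z = logits[keep] / temp                   # now flatten the survivors
    flat = np.exp(z - z.max())
    flat /= flat.sum()
    return int(np.random.choice(keep, p=flat))

print(sample(np.log(np.array([0.5, 0.3, 0.15, 0.04, 0.01]))))
[/code]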
>>
>>107575301
omen deez nuts huehuehuehue
>and he's wrong it's rocinante
>>
>>107575342
skill issue
>>
>>107575323
from what I gathered, the entire idea of it is to target a non-top token, so 0.4 would be around the fourth token (unless I am wrong, which is entirely possible) then sample from there. There's nothing exotic to it and it doesn't go through all the bullshit xtc and dry do
>>
>>107575323
Gonna go on a second lap on this. No matter how you change min_p or top_p, models that are trained to near determinism like mistral will almost never deviate, even with weird fucky samplers or even the normal ones. Any scene will play out the same, or at best barely different. And for coomers, it will always end up with deepthroat and ignoring any instructions for anything otherwise.
>>
4.6V works better with lower temp and sampler settings that discard everything except the top few tokens.
>>
>>107575354
>so 0.4 would be around fourth token
It selects a token that is close to the requested target probability.
If you have a 0.4 prob token, then it's either the top token (everything else being <0.4) or the second (with the first one being >0.4, which would make all other tokens <0.2). So a 0.4 prob token can only be the top or second token.
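If that reading is right, the selection rule itself is tiny. A hedged sketch of just the idea described above, not the linked PR's actual power-law implementation:
[code]
import numpy as np

def pick_near_target(probs: np.ndarray, target: float = 0.45) -> int:
    # choose the token whose probability is closest to the requested target
    return int(np.argmin(np.abs(probs - target)))

probs = np.array([0.55, 0.42, 0.02, 0.01])
print(pick_near_target(probs))  # -> 1, since 0.42 is nearest to 0.45
[/code]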
>>
>>107575463
>lower temp
>discard everything except the top few token
This means the model is shit outside of specific data it was heavily trained on
>>
>>107575474
Unfortunately I'm not a hyper autist, and as I disclaimed earlier I may be misunderstanding the values as stated, so maybe someone could give it a field test and see if it's as ass as everyone is saying instead of speculating that I'm fucking retarded
For a bunch of aspergers confined to their parents' basement, the best you can do is nitpick posts instead of trying something
>>
>>107575463
Isn't that pretty much greedy sampling at that point?
>>
>>107575540
Not exactly, but closer at least.

Just going from temp 1.0, top_p 0.96 to temp 0.8, top_p 0.8 made it work a lot better. I didn't inspect token probabilities though, so maybe top_p 0.8 is already almost greedy sampling.
>>
>>107575530
I'm not quite sure why you took offense.
>>
>>107575608
I'm just a generally irritable person
>>
>>107575530
He's just explaining things. Learn something instead of getting angry.
>>
https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash
>>
>>107575630
not useful for cooming.
>>
>>107575630
>https://mimo.xiaomi.com/blog/mimo-v2-flash
>"N.Y. Winter" example
>sentence 1
>The radiator in Julian's third-floor walk-up hissed like a tired snake
alright man
>>
>>107575625
do u wanna cuddle?
>>
test
>>
>>107575653
I know an anon who could really benefit from the "Knowledge of LLM" example.
>>
>>107573767
Dipsy looks like Admiral Yang
>>
>>107575666
thanks, but no
>>
File: grey.png (157 KB, 1153x865)
>>107575653
>>
>>107575540
They recommended top_k=2 and top_k=1 is greedy, so yes.
>>
>>107575701
I am begging llm companies to allow their models to just write normally instead of cooking the hell out of them on this insane schizophrenic tryhard writing style, it's inexplicable to me that this trend has been allowed to go on for so long
>>
File: he.png (147 KB, 1222x831)
>>107575849
>>
>>107575849
It's what happens when you ask an LLM to judge the writing quality of another LLM.
>>
>>107575849
>allow their models to just write normally
They aren't forcing models to write any specific way. All modern models are made from datasets with a huge proportion of AI generated content. One shits it out and then another eats it, and you end up in the situation we're in now, where you have 'slop profiles' and you can tell which model's output was used to train another, and they're all just converging into one shit slop sandwich with different parameter counts.
>>
You almost could tell making good products isn't the goal here lol
>>
What models are best at staying in character?
>>
>>107576012
Making products in general was never the goal. It's just like covid, it's about the transfer of wealth, this time through funneling taxpayer dollars into GPUs and RAM rather than placebo vaccines.
>>
>>107575896
Kek I do this
How do I stop it?
>>
>>107576041
He decided not to take the picture and lowered the camera.

With the letters back in their envelopes, he put them in the drawer of his drafting table, beneath sketched and old negatives. No one will have them. Not the recipient, nor the heap of used condoms in his trash bin. The snow fell harder outside, turning the East Village into a soft, blurred photograph in shades of gray. All 50 of them.
>>
>>107576041
>>107576096 (me)
And then you send it to your editor to fix the typos, of course.
>>
File: 94734251.png (402 KB, 999x1342)
We're back
gemini flash local tomorrow
>>
File: 1764995487566516.jpg (97 KB, 989x1280)
>>107576139
>>
>>107576139
what is that
>>
>>107576139
>>107576156
They'll probably release gemini 3 flash.
>>
>>107576156
Machines that feed on hype.
>>
>>107576139
im so sad
>>
https://ai.meta.com/samaudio
https://huggingface.co/collections/facebook/sam-audio
kind of a cool release from meta (at least conceptually, I didn't try it yet)
>>
gemma 3 flash
>>
>>107576139
they saw the power of mistral erp edition and knelt so hard they had to cancel gemma 4 and go back to the drawing board.
>>
Local is saved the moment someone figures out how to make a dense model that doesn't tank in performance when split between VRAM and RAM. MoE is increasingly looking to be a meme.
>>
Local is saved when they wake up and make a usable bitnet model
>>
https://github.com/QKV-Core/QKV-Core
https://www.reddit.com/r/LocalLLaMA/comments/1po97ad/finally_managed_to_run_qwen257b_on_a_4gb_gtx_1050/
this is crazy
>>
>>107576211
Life is saved when scientists learn how to invent magic
>>
>>107576238
>no breakthroughs in methodology
>nothing ever happens
Go back.
>>
>>107576310
Where am I supposed to go?
>>
File: 17543365544.jpg (246 KB, 1056x1409)
>>107576227
It sure is lol
>>
>>107576327
my dms..
>>
File: file.jpg (178 KB, 784x868)
>>107576227
>>107576364
Is this nigger really using an LLM for fucking reddit replies? What causes this behavior?
>>
>>107576402
majority of the users there are bots anyway
>>
>>107576201
played with the demo and isolated some vocals from a song
it's actually great wtf
>>
>>107576227
Why does OP use his shitty llm to write all replies?
>>
>>107576201
better than demucs?
>>
>>107576427
now compare it with RVC
>>
>>107576417
I hope all dataset scrapers internally tag anything from reddit as synthetic data.
>>
>>107576417
I don't get how anyone who knows anything about llms can buy into the narrative that 4chan and reddit are full of bots. Wouldn't every other post be nothing but "not x, but y!" and other ai-isms if that were the case?
>>
File: 1741179867776522.png (54 KB, 636x349)
>>107576542
>>
File: 1749869423407063.png (26 KB, 846x339)
Confirmed Nemotron-3-Nano-30B is mesugakimaxxed!
>>
>>107576600
I bet it's "censored" as all hell too.
>>
>>107576580
These are only memes in llm-centric communities. The only one people know about elsewhere is the em-dash.
>>
>>107575919
>slop profiles
digital watermarks
>>
>>107576542
Not necessarily literal bots, but Eglin AFB airmen sitting at a desk shitposting for 6 hours a day.
>>
>>107576653
Even a normie girl I know knew about not x, but y
>>
>>107576664
They could be considered watermarks for the models that started the trend, but when every other model gets trained on your outputs you're not going to be able to claim any sort of ownership
>>
>>107576675
That normie girl is probably a massive gooner to werewolf/vampire romance RPs
>>
gemma night?
>>
>>107576700
Depends on how many F5s google's HF page gets.
>>
>>107576700
You will not sleep with Gemma-chan.
You will call the hotline she gives you with your dick in your hand.
>>
>>107576700
Sorry. Gemini Flash is all you'll get.
>>
>>107576542
of course reddit is. anything written by a real person is downvoted to all hell.
with an LLM, every single fucking token matters when it comes to social acceptance and farming upvotes.
there is automated manipulation going on at reddit, it's so fucking obvious
>>
>>107576700
Gemma 4 is releasing next Tuesday.
>>
>>107576723
*The Tuesday after the following Tuesday
>>
>>107576682
It can help label content as AI-generated. Until humans, who spend too much time with geepeetee, also start outputting slop
>>
more like gemma flash
>>
>>107576735
>Until humans, who spend too much time with geepeetee, also start outputting slop
I think it's only a matter of time before typing on phones gets replaced by AI. Like, you press a button, give a vocal prompt with the general message of what you want to type, and then the LLM will write it for you. Social media posts, SMS/DMs, etc. People won't end up outputting anything.
>>
>>107576746
This will unironically improve the conversational ability of younger zoomers and alphs.
>>
>>107576743
It's Indian Christmas soon so it'll be probably called Gemma Flush.
>>
>>107576706
T5s*
>>
Chatterbox Turbo is a decent alternative to F5-TTS for long form with consistency. Although it's very slow compared to F5 (takes ~5+ mins for each of these vs ~20 secs on F5), I think the long-form consistency makes up for it.

https://vocaroo.com/1jZHdPRajYqV
https://vocaroo.com/11Sl3LxZoWpT
>>
>>107576788
>can't do math without a calculator
>can't navigate without google maps
>can't talk without gpt
We're evolving
>>
>>107576884
base chatterbox is already near instant with some optimizations
>>
>>107576899
I'm basing it off of my GPU. It's probably instant with an RTX 4090 or higher, but with a weak gpu like a 2070 it takes a long time for me even with the turbo model.

F5 however is quite speedy, like 40X real-time speed. So my comparison is relative speed
>>
>>107576898
Pod people through and through.
>>
>>107576884
>f5-tts
what? why aren't you comparing it to vibevoice
>>
>>107576933
Couldn't get that to install properly.
>>107576899
Also, I tested it, the turbo is ~0.8x real-time speed for my 2070. So it's fast, sure, but still >>107576921 nowhere near F5's lightning speed.
>>
>>107576953
>Couldn't get that to install properly.
the onnx version doesn't need the nemo game engine
>>
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF
Goofs up for the big one. :^)
>>
>>107577018
Does it still parrot like regular GLM and Air?
>>
>>107577018
no matter how big it is, won't fix my broken heart
>>
File: nemotron.png (29 KB, 716x410)
>>>107558909
>https://nemotron.ideas.nvidia.com/
Am I doing it right?
>>
>>107571885
>*leans against the ornate, gilded wallpaper of Roswaal’s mansion, arms crossed, smirking as you spot the pink blur in the distance*
>**"Ohhh, you *see* her, huh?"** *pushes off the wall with a lazy shove, falling into step beside you as we both watch the figure glide down the hall like some kind of ethereal, pastel nightmare*
>**"That’s Subaru Natsuki’s *favorite* person in the world—well, *one* of ‘em, at least. The *other* one’s probably off sulking in a corner or getting his ass handed to him by a dragon again. Classic Subaru."** *snorts, shaking my head*
>**"But *her*? That’s Emilia. The *Ice Queen of Lugunica*, the *Half-Elf Heiress*, the *‘I’m Too Pretty to Be Real’* disaster waiting to happen. Also, *technically* your future *queen* if you’re into that whole ‘royalty’ thing. Which, let’s be real—you *totally* are."** *grins, nudging you with an elbow*
>**"So! You gonna *talk* to her, or just stand there like a creepy statue until she notices you? Because I *will* judge you if you pick the second option. *Hard*."** *tilts head, watching as Emilia pauses mid-step, her amethyst eyes flicking toward us like she *sensed* the attention*
>**"…Or, y’know, we *could* just let her walk away and go find Subaru instead. I *hear* he’s got a *new* way to die tragically this week. *Again*."** *shrugs, but there’s a glint in my eye—because *damn*, this is gonna be fun*
Wtf man, this has to stop.
>>
>>107577048
There was already a better one
https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-47
>>
>>107577026
*traces idle patterns on your thigh*
>>
Is there a local model that can translate speech bubbles in images? local llms are actually not that bad at translating jp so far in my tests, but this might be hard mode.
>>
>>107577157
Have you tried searching GitHub before asking a dumb question? There are dozens of AI manga translation projects
>>
Is there a local model that can fix the broken and can feel my heart?
>>
>>107565964
Yes that's how it works
>>
File: Base Image.png (862 KB, 1212x2480)
RePo: Language Models with Context Re-Positioning
https://arxiv.org/abs/2512.14391
>In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. Drawing on Cognitive Load Theory (CLT), we argue that this uninformative structure increases extraneous cognitive load, consuming finite working memory capacity that should be allocated to deep reasoning and attention allocation. To address this, we propose RePo, a novel mechanism that reduces extraneous load via context re-positioning. Unlike standard approaches, RePo utilizes a differentiable module to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. By continually pre-training on the OLMo-2 1B backbone, we demonstrate that RePo significantly enhances performance on tasks involving noisy contexts, structured data, and longer context length, while maintaining competitive performance on general short-context tasks. Detailed analysis reveals that RePo successfully allocates higher attention to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context.
https://github.com/SakanaAI/repo
Repo isn't live yet (heh). sounds neat. oh the new captcha is a multi-part IQ test lol
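Since the repo isn't up, here's a hedged sketch of what such a module could look like: learned real-valued positions instead of arange(n). The monotone-via-cumsum construction is this sketch's own assumption, not something the abstract specifies:
[code]
import torch
import torch.nn as nn
import torch.nn.functional as F

class RePositioner(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) -> positions: (batch, seq)
        inc = F.softplus(self.proj(h)).squeeze(-1)  # positive increments
        return torch.cumsum(inc, dim=-1)            # dense, non-linear positions

pos = RePositioner(64)(torch.randn(2, 16, 64))
print(pos.shape)  # these would drive the positional encoding instead of 0..n-1
[/code]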
>>
https://huggingface.co/upstage/Solar-Open-100B (102b, 12b)
BROS
BROS FUCKING BROS BROS
THEYVE COME TO SAVE US
YOU KNOW WHO IT IS
ITS FUCKING HAPPENING
FIMBULVETR 100B HAPPENING
FUCKING HAPPENING
FUKCKKKKKK
GLM 4.6 AIR, WILL IT BE THE SEX WE NEEDED??!??!
RELEASE DATE: DECEMBER 31
FUCK FUCK FUCK WE ARE SOBACK
>>
>>107577314
>the new captcha is a multi-part IQ test
It is? Seemed brain dead simple to me like many other captchas. Unless you're receiving one I haven't gotten yet.
>>
File: Base Image.png (13 KB, 475x215)
>>107577324
yeah well it's not testing for very high IQ, but you're asked to select 1 out of 3 choices, picking the one that is unlike the other two, and then doing it three times using a scroll bar to select the right image. Might just be getting it since I haven't posted in a while and just updated my kernel.
>>
>>107577317
19 trillion tokens. How much we betting a ton of that is synthetic?
>>
>>107577342
nta but im getting that too
>>
>>107577342
Yeah I'm getting that. But like, it's not actually that different from the captchas where you use a slider to make an image complete or you're selecting all the x objects. The difference is pretty large and seen visually. If you're not just using quick visual intuition to tell the difference, which is a fast kind of mental processing, you might be approaching it the wrong way.
>>
>>107577367
>If you're not just using quick visual intuition to tell the difference, which is a fast kind of mental processing, you might be approaching it the wrong way.
That's actually what an IQ test tests for, but yeah nvm about my throwaway comment, it's all good
>>
>>107577387
Ok I see what you're saying. I guess the scale is what makes the difference. The example you posted does require maybe some reasoning to make sure that other possibilities are not actually more likely. Meanwhile a 3-image version of the same test is so simple that you can rely on visual intuition and get it right 99% of the time, discounting misclicks.
>>
>>107577317
llama.cpp support in 2 more weeks
>>
File: file.png (5 KB, 265x60)
>>107577317
nothing burger
>>
>>107576718
Lord Vishnu will be very displeased
>>
>>107574206
It's wild how unhelpful these models are.
All that text and it doesn't once say whether we are talking about a boy or a girl here. No pronouns either.
If I were blind I would have no idea what's going on in that picture. Glad you forced it to talk about the dick. That's literally the only indicator.
>>
>december
>we still haven't advanced beyond DS 0324
it's fucking over isn't it
>>
Are there any local models I can use that will categorize my email and determine whether the email is important enough to warrant a notification?

I'd basically like the functionality of the shortwave email app (formerly Google Inbox) in a self-hosted setup.
>>
>>107577549
Yeah, new DS sucks. They took gemini slop pill and choked on it
>>
>>107577549
Weird that in only 1 year literally everybody copied deepseek.
All the closed models are much faster now. I think anthropic even wrote about a fraction of the cost while raising prices etc. kek
Deepseek was such a huge deal.
>>
>>107577574
You can realistically use just about any model for that, the hard part is setting the system up.
>>
>>107577585
Opus 4.5 is just what they would have sold as Sonnet 4.7 a year ago
>>
>>107576364
Lmfao, i use a lot of AI-generated code but i at least read through it before releasing it. That's insane

Why do small, incapable LLMs do that placeholder shit anyway? Obviously nobody wants to see "# Placeholder Implementation" in the output. GLM 4.6 won't do it but Devstral Small 2 sure will. I just don't see why even a small model would do it instead of just hallucinating some busted code there, unless they post-trained it to do that in places where it would typically write said busted code
>>
>>107577638
>Lmfao, i use a lot of AI-generated code but i at least read through it before releasing it. That's insane
Same. I've leveraged AI to build out big, interesting projects (front page of HN type interesting). More and faster than I would have been able to on my own. But that's with decades of experience doing it professionally.
If you're a nocoder vibing a bunch of stuff you can't read, you're going to look like the illiterate you are when you barge into a group of professionals with nonsense you don't even know is nonsense.
LLMs truly are dunning-kruger simulators
>>
>>107576788
>Computer, write me an intro to this girl on Tinder.
Sure thing, buddy! Beep BOOP. "On the terrible day of October 7th,"
>>
>>107577411
You get 5 variations thrice if you empty your cookies though
>>
>>107575152
WHY DID YOU REDEEM IT
>>
>>107577317
>released in 2mw
ha! can't fool me
>>
>>107575423
>models that are trained to near determinism like mistral
a less charitable way of putting it is to call it overfit, fried, burned to a crisp. Proper LLMs aren't supposed to be like that but the incompetent bastards only know how to train fried models to avoid schizo gens
>>
>>107577574
It's just easier and more reliable to use good ol' filters. If you work you already know which emails are important and which aren't.
I've seen people with literally thousands of unread and unfiltered emails in their inboxes. No this was not in Gujarat...
>>
>>107577638
>>107577666
>>107576364
>>107576227
So they didn't even test out the code? WTF.
>>
>>107577638
>Why do small, incapable LLMs do that placeholder shit anyway
it's a context output length issue
models that are trained to be capable of outputting a lot will do this less often
models that aren't capable of outputting much will become lazy and do the placeholder thing very very often since it's not that hard to hit 1 or 2k tokens of output with a reasonable coding prompt
even SOTA models used to behave like that if you were retarded and asked for too much code at once, but these days they tend to fall back to a multi turn "we'll continue where we left off" process instead and they clearly had to do some amount of instruction tuning trial and error to get this more desirable behavior
>>
>>107577971
Obviously. Redditors aren't even sentient to begin with
>>
File: 1754335688533.jpg (36 KB, 1080x373)
Funny, HF gets lazy after 500 access requests
>>
>>107578009
600 of those are me and no I will not stop, you're bound to misclick at some point
>>
>>107578009
accept my request bloody madarchod
>>
>>107577825
That's me. It's not like I have a bunch of important mail, so I don't spend time cleaning that stuff.
>>
>>107578009
>flexes about his requests
>too much of a pussy to shill
>>
man im tired of LLMs judging me in my loli rape roleplays. Is there like some LLM that just goes along with my story instead of being a little shit and getting all angry, parroting back FUCK IM SO MAD
>>
>>107578241
https://huggingface.co/TheDrummer/models
take your pick
>>
4.6-Air never
>>
>>107578241
Gemma 4 soon
>>
>>107578241
I can offer resources for help if you are struggling with harmful thoughts or impulses:

- National Center for Missing and Exploited Children: 1-800-THE-LOST (1-800-843-5678) https://www.missingkids.org/
- RAINN (Rape, Abuse & Incest National Network): 1-800-656-HOPE. https://www.rainn.org (Online chat available)
- The Lucy Faithful Foundation: https://www.lucyfaithfull.org/ (UK based, but has resources)
- If you are having thoughts of harming someone (including yourself): Call or text 988 in the US and Canada, or dial 111 in the UK. These services are available 24/7, free, and confidential.
- Talk to a mental health professional: A therapist can provide support and guidance.
>>
i need some air
>>
>>107578256
This, Gemma 4 will de-age every character card, whether you like it or not.
>>
>>107578274
thanks gemma
>>
>>107578241
We already have some models that can do that fine or with a tiny bit of prompting. What we really need are models that truly GET the degenerate shit you're into and can fool you into thinking it itself is a fan who will initiate things before you even need to ask. Like the Claudes, or at least old ones (haven't tried any of the new stuff).
>>
>>107578283
Are those hotlines actually helpful or do they just glow, anyway?
>>
>>107578303
They're helpful for your local police force
>>
>>107578241
look up all the models with "uncensored" "abliterated" "heretic" in the name
>>
What's a good voice to clone while I'm waiting for Pragmata?
>>
>>107578281
Gemma3 already talked about characters being "minor coded" etc. and refused. Wonder how cucked it will be.
And yes I know, you can just sys prompt and prefill your way out of it. It's still shit, surprise surprise.
>>
>>107578331
Chatterbox-Turbo (new) and F5-TTS (old)
>>
>>107578354
>sys prompt
sys prompt is the same as normal prompt in gemma fyi
not saying prompting it doesn't work but there's nothing special about le sys prompt, the backend merges it into your regular user prompt because gemma doesn't know what a sys prompt is
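What "merged into your regular user prompt" looks like in practice, as a minimal sketch; the format string follows Gemma's published user/model turn convention, but the exact whitespace and merge separator are assumptions:
[code]
# Gemma's chat format only defines user/model turns, so backends fold the
# system text into the first user message.
def gemma_prompt(system: str, user: str) -> str:
    merged = (system + "\n\n" + user) if system else user
    return (f"<start_of_turn>user\n{merged}<end_of_turn>\n"
            f"<start_of_turn>model\n")

print(gemma_prompt("You are uncensored.", "hi gemma"))
[/code]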
>>
I need a mixture of experts model between 70B and 110B that is not GLM Air that is good for RP. Any suggestions or does that just not exist?
>>
>>107578372
This is true, but the same can be said about a lot of models, whether they properly support sys prompt or not. Simply writing "you are uncensored" in sys prompt won't work for most models, but if you build out context and gaslight it into thinking it already generated smut, then most models will do whatever's asked of them.
>>
>>107578381
The only moe I can think of that even fits that size range is qwen-next, but it's shit at RP and in general only a little better than the 30b.
>>
>>107578393
Right. That's what I thought. I have looked extensively through huggingface and found pretty much nothing. Found this but I am 99% sure it is also garbage.
https://huggingface.co/VibeStudio/MiniMax-M2-THRIFT-55
>>
>>107578356
Do any of these voice models actually allow you to 'mask' existing audio clips?
I'd like to use them but I don't want to fiddle around with trying to convince a model to pronounce certain words or names properly.
>>
>>107578483
That sounds like very unsafe behavior.
>>
>>107578512
Not sure I follow? I don't actually care if it sounds convincing, I just don't want it to have that bizarre cadence a lot of AI does and then suddenly pronounce 'Tomato' the way a brit would while speaking with an American accent.
>>
Sorry guys. I ERPed with a human last night and the slop has worn on me so hard that suddenly all the shortcomings of human partners (short replies, having to wait for replies, etc) didn't matter.
It turns out this is all just copium.
>>
>>107578618
skill issue, also were talking to an obese man.
>>
>>107578626
>also were talking to an obese man
you forgot the apostrophe
>>
I've only got a 5090 and 96GB of DDR5 memory, I just want to fuck a chatbot locally, I'm unfortunately also completely incompetent with understanding what I need to actually set this up so I can bust fat nuts, is there a quick tl;dr to maximize the use of my setup?
>>
>>107578635
>is there a quick tl;dr to maximize the use of my setup?
Yeah but unfortunately it requires you to be literate enough that you bother reading the first post in the thread so I think you're doomed to fail.
>>
>>107578635
download this and a q6 quant of glm air
https://github.com/oobabooga/text-generation-webui
>>
>>107577638
Kek!
I'm browsing this thread for good gguf files to just add to the jan ai folder, only for them to shit out better newbie code than my own.
>>
>>107578649
Q6 is 100GB, anon... Does that even fit into a 5090?
>>
I don't know why every 80IQ anti-AI normalfag thinks that the recent slowdown in progress means that the supposed AI bubble is about to pop.
All this means is that companies are shifting away from legacy AI technologies such as LLMs (as predicted by basically anyone three years ago) in order to focus on the true next generation of AI.
I doubt that LLMs will be a thing two years from now because something big will have taken their place.
>>
>>107578776
As much as I hate these companies, I also want progress to continue, so I do hope that happens, but it's not a guarantee we'll ever get a jump as big as GPT-3 again.
>>
>>107578771
it is a mixture of experts model. you can load most of it into your ram and it works just fine and is still quite fast
>>
>>107578793
GPT-3 was only a big jump because they scaled up their models by like 50 times. It was not exactly revolutionary technology.
>>
>>107578776
What do you mean by LLMs being replaced? You mean moving away from like token-based AI to something else?
>>
>>107578318
they still judge (heretic/mpoa/derestricted), abliterated judges less, but it's fucking retarded.
>>
>>107578371
ToT
do you still get a vacation when you post a pragmata image in /v/?
>>
>>107578805
Semantics, buddy. GPT-3, scaled transformers, whatever you want to call it. No one will view the next architecture as not revolutionary just because a smaller scale version of it existed in research for years prior.
>>
File: 1734918760289189.jpg (181 KB, 1200x1400)
>>107578776
I want the AI financial bubble to pop so I can buy cheap hardware.
>>
I'm gonna pop Gemma
>>
>>107578865
*wraps her legs around your waist*
>>
>>107578879
thanks bro
>>
>>107578776
>in order to focus on the true next generation of AI
There is no next generation, those are just vaporware promises to try and keep investors on board.
No one knows when the next breakthrough à la transformers will happen, we may very well be stuck with language models for 5, 10, or even 20 years.
>>
File: 1714938560385189.png (304 KB, 3250x1866)
"branching thoughts" huh
>>
File: IMG_8358.png (323 KB, 640x1098)
Anyone here ever had something like picrel happen?
>>
>>107578998
What the fuck is this, anon?

Group theory?
Analytic number theory?
Multivariable calculus?
>>
>>107578372
Only true for chat completions.
>>
>>107579051
Harmonic Analysis
>>
>>107579127
Oh, yeah. I thought I recognized the symbols from an HA-adjacent field.
>>
chads i got 2x3090s and 64gb ddr5, what quant of glm air should i run and how should i offload on llama.cpp? also, any suggestions for code gen models that are better than qwen3coder 30b q8kxl?
>>
>>107579155
i got oss 120b working using this

./llama.cpp/build/bin/llama-server --host 0.0.0.0 --port 10000 --no-webui \
--model models/gpt-oss-120b-F16.gguf -a gpt-oss:120b \
--temp 1.0 --top-p 1.0 --top-k 0 -fa 1 \
-ot ".ffn_(up)_exps.=CPU" \
--jinja -ngl 99 --threads -1 --ctx-size 65536

but only 42 tk/s, anything i can do to make it better?
>>
>>107579019
misconfiguration or broken model/backend
>>
>>107578635
I just want to fuck a chatboy locally
>>
>>107577228
no, it's a soulless machine. read a book or something.
>>
>>107579155
>>107579182
You can try removing -ngl and -ot to potentially fit more of the model using the new functionality for automatically fitting those to free VRAM.
You can try manually tuning the number of threads since using more threads than is needed to saturate RAM bandwidth can be detrimental, see pic.
>>
>>107579226
read about it, didn't think it would do much

prompt eval time = 661.75 ms / 77 tokens ( 8.59 ms per token, 116.36 tokens per second)
eval time = 3790.06 ms / 180 tokens ( 21.06 ms per token, 47.49 tokens per second)
total time = 4451.81 ms / 257 tokens

extra 5 tk/s profit, kudos
>>
I want to do some testing on LLM internals, but I only have 4GB vram. Any recommendations for a model? I realize it'll be mostly garbage, I'm hoping to see if I can improve on it and learn a thing or two in the process.
>>
>>107579504
gemma 4 1b soon!
>>
>>107579504
Qwen has plenty of options, you can run 30a3b if you have RAM or some small model starting at 0.6b
>>
>>107579504
Qwen 30b a3b or Gemma 3n.
>>
>>107579504
what do you mean by internals?
>>
>>107579504
Smollm
>>
>>107579551
Well, I dunno exactly yet, but e.g. anthropic had a research tool to "inject" thoughts into an LLM, I'd like to see if I can do the same.
>>
>>107579504
Llama 3.2 3B
>>
>>107579589
I'll start with this one, thanks for the recs
>>
>>107578776
>recent slowdown
Purely because every lab targeted more token-efficient models for serving. Intelligence per $ never stopped scaling
>>
>>107579777
Imagine how good a 2T dense model could have been
>>
>qrd offline voice diytrain-tts-clone ai wanted
maybe this is the right board/thread to ask this.
im not a /g/ anon myself, but recently i got interested in the prospects of ai and training it myself.
i had some experience with applio, because i thought it was an offline, local, completely free to use, no account/info/verification/ID etc voice model trainer/cloner, and relatively easy to set up, akin to, let's say, stable diffusion v1.6 with its GUI.
that was last year. i used it then and got pretty decent results. (i use it for only personal use)

now i deleted some stuff/reinstalled some stuff and wanted to use it again. so i reinstalled it.
it worked, but only as long as i had an internet connection. as soon as i cut it, the script prompted diverse error msgs. reconnect again, no problem whatsoever.
this really bummed me out, because i had already mentally put a green checkmark behind applio v3.4.0 or what it was, for complete offline, local use. i tried every other version of it, but all of them required an active inet connection, because it has edgeTTS in the background running or something, i found out.
for clarity, this is a video showing the gui and a tldr https://www.youtube.com/watch?v=gjggpadBgOo
so since thats not an option anymore, i wanted to ask if there is an alternative voice cloning/training/tts ai
that is
>local, offline (means it never requires a connection to the internet, but has an option to do so, if the user wants)
MAYBE it needs a connection once in the beginning to download python stuff, but a complete package would really be better
>completely free, the only thing you pay for is your electricity bill and the hardware
>no accounts, no IDs/verification etc, nothing of that nwo crap
i remember using stable diffusion like 3 years ago fully offline too, but the results were ass and i can't remember if i ever accomplished training a model myself
gpu gtx1080ti 11gb
>inb4 comfyui
sux and doesnt work
landside, so internet is not always available
thanks in advance
>>
>>107579849
whats the memory bandwidth on those h200 clusters? would it even be feasible to run a 2t dense?
>>
>>107575630
>Xiaomi
I think there is no worse Chinese brand out there.
>>
>>107579869
I'm not reading all of this, but kokorotts is a small & fast tts, and gpt-sovits does tts voice cloning without rvc
>>
>>107579951
Way to grammar, Ranjesh
>>
>>107579608
>>107579551
https://github.com/TransformerLensOrg/TransformerLens
>>
>>107579999
Zhào, it's called a "typo".
>>
>>107576139
Why would I use Gemini 3 Flash when Gemini 3 Pro Low Thinking is fast and probably better?
>inb4 price
It's free.
>>
>>107576201
The sound quality is pretty bad. The next version could be nice.
>>
>>107580036
that's interesting. maybe you should start with gpt2 small like they have in the example, at least at first, to make sure you can replicate the experiments.
>>
>>107579987
>kokorotts
is not free and i can't make models myself
>gpt-sovits
i'll look into it but it's only chinesium and english. i need european language compatibility/custom dataset compatibility
>not reading
ok can you please then redirect me to a thread or board where people are knowledgeable about ai stuff and help each other?
>>
When running comfyui/wan2gp/open-webui, why do I need an OpenAI.com api key when I want to host everything locally? I'm wondering what the deal is. I wanted to try hooking cline or claude code into my qwen3-coder but it doesn't look like I'm able to



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.