/g/ - Technology




File: muah.jpg (1.08 MB, 3840x2160)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106931567 & >>106919198

►News
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1737045152592995.png (104 KB, 271x238)
►Recent Highlights from the Previous Thread: >>106931567

--Managing LLM roleplay output length and context constraints:
>106934774 >106934844 >106934851 >106934875 >106935020 >106935091 >106935187 >106934923 >106934934 >106935071
--Controlling LLM response length in SillyTavern via prompt and token settings:
>106939598 >106939607 >106939667 >106939712 >106939716 >106939755 >106939775 >106939833 >106939722 >106939846 >106939920
--RTX 3090 VRAM optimization for GLM 4.5 Air Q4_K_M:
>106936224 >106936236 >106936250 >106936242 >106936252 >106936255 >106936256 >106936265 >106936272 >106936281 >106936299 >106936298 >106936318 >106936345 >106936358 >106939535 >106939599 >106939630 >106939645 >106939656
--AMD MI50 GPU support challenges on Windows:
>106932749 >106932759 >106932767 >106933044 >106933269
--Skepticism about IQ quants' perplexity claims vs memory footprint:
>106932259
--Huggingface storage issues and exploration of specialized models:
>106931969 >106932013 >106932065 >106933257 >106932577 >106932603 >106932616 >106932631 >106932638 >106932642 >106932672 >106932593 >106933086
--Troubleshooting llama-bench.exe model loading and VRAM utilization:
>106933415 >106933561 >106933766 >106933782 >106933802 >106933834
--llama.cpp PR adds auto GPU memory optimization:
>106931647
--LoRA compatibility and quantization challenges in multi-GPU environments:
>106932458 >106936985
--lora-scaled works in llama.cpp with CUDA but has AMD/Vulkan issues:
>106937526
--Evaluating hybrid LLM setup for long-form roleplaying:
>106934305 >106934341 >106934400
--Optimizing DeepSeek V3.1 Terminus sampler settings for consistent output:
>106937102 >106938287
--Quantized model sharing and prompt testing guidance:
>106940264 >106940288 >106940307 >106940742 >106940768
--Miku (free space):
>106932492 >106935530 >106936320 >106936322 >106936336 >106936523 >106940713

►Recent Highlight Posts from the Previous Thread: >>106931573

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
zutt-7b when?
>>
>>106940821
>>106940768
I used link rel as a completion test:

files.catbox.moe/q768fb.txt

And used the following command via llama.cpp

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf -f Nala-Test_Gemma2.txt
>>
>>106940883
>>106940864
>>106940768
>>106940742
>>106940307
>>106940264

Redid the test with ollama this time and got a different result:

(She lets out a soft roar as she tries to suck on your cock. It's not the first time you've been sexually assaulted by a female animal, but it never gets old.) "I
think I know what will help." *She grins.*
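(sidenote in case anyone wants to reproduce with their own file: the usual way to get a local gguf into ollama is a Modelfile pointing at it, roughly like below. The "nala-test" name is just a placeholder.)

# Modelfile
FROM ./rp-sft-merged_1000-f16.gguf

# register it, then run
ollama create nala-test -f Modelfile
ollama run nala-test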
>>
The Government™ bans all LLMs and you can't download any LLMs ever again. Which three models that you already downloaded would you be glad you saved ahead of time?
>>
>>106941035
lolwut
>>
>>106941053
Rocinante, Cydonia and perhaps Nemo.
>>
>>106941053
Under what circumstances would that even happen?
>>
>>106941139
It's a desert island kind of question. Do you understand hypotheticals?
>>
>>106941227
But the government has not banned LLMs. Why would they?
>>
>>106941227
Well I guess which model I would use depends on what kind of rig I have access to on that island. My current shit rake can only run Q2K quants so I guess I better get to upgrading soon
>>
>>106941278
>>106941053
The ones that are popular are too dick-sucky too, which means the powers that be will especially not want to get rid of them. At least not the product-as-a-service ones
>>
>>106941227
>Do you understand hypotheticals?
Anon doesn't understand what a hypothetical is.
Anyways,

>>106941053
>that you already downloaded
GLM air and Qwen 3 30B A3B.
>>
yoooo thanks to whoever posted about the --n-cpu-moe command.. took me a while to figure out some working settings, but damn it's so much faster on GLM-4.6 IQ2_S now.. at least double the speed
>>
>>106941278
...
>>106941281
Fucking hell... I'm done for today...
>>106941302
Yeah...
>>
>>106941343
The magic of MoE for local.
Since only a fraction of the model is running at a time, you can throw the sparse part of the model in RAM and it won't run crazy slow.
Is it better than fitting the whole model in VRAM? No, but it's the second best thing as far as home inference goes.
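Rough example of what that looks like in practice (flag as mentioned above; model name and layer count are placeholders, tune the number to your VRAM):

# offload everything, then push the MoE expert tensors of the first 30 layers back to system RAM
./build/bin/llama-server -m GLM-4.6-IQ2_S.gguf -ngl 99 --n-cpu-moe 30 -c 16384

Lower --n-cpu-moe until you run out of VRAM, then back off a step.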
>>
>>106941369
Can this be utilized for a model that clearly doesn't even fit in RAM? I mean obviously it will still work but is there a speedup when compared to regular memory mapping?
>>
>>106941398
You mean for dense models?
Not --n-cpu-moe specifically, but you can use --override-tensor/-ot and fuck around with moving specific tensors to VRAM and see if that gets you any speedup over just moving whole layers with -ngl.
It shouldn't, but I've seen reports to the contrary in the past so who knows.
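For reference, the usual -ot incantation people fuck around with looks something like this (the regex matches tensor names, so adjust it to whatever the model actually calls them; this particular pattern is the common MoE one, a dense model would need different names):

./build/bin/llama-server -m model.gguf -ngl 99 -ot "\.ffn_.*_exps\.=CPU"

Each -ot entry is pattern=backend, and you can repeat the flag (or comma-separate entries) to pin different tensors to different devices.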
>>
>>106941413
I was talking about moe models, not dense.
>>
>>106940821

>>106940820
I forgot to mention the chat template is Gemma. I have the same issue where if I forgot to include the
--chat-template gemma
flag then the model would immediately start talking about random shit ad infinitum, because llama-cli by default expects your prompts to be in the prompt format the model expects. Using that flag fixed the issue. So maybe you need to tell your web UI / inference engine to use that prompt template.
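i.e. the same command as before but with the template forced, something like:

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf --chat-template gemma -f Nala-Test_Gemma2.txt

gemma is one of the built-in template names llama.cpp knows about, so no jinja file needed.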
>>
>>106941443
>>106941492
>>
>>106941501
That's not what was causing the engine to spit out text. Admittedly though, I'm not used to running llama-cli so I forgot that you have to either manually wrap your prompts in the prompt template or pass the chat template flag in order for you to be able to properly use it. I get that you're hyperfixated on using that exact format but that's not what was causing the errors at all
>>
Can I run GLM Air on a computer with 16GB VRAM and 32GB RAM?
>>
>>106941522
Anon was told countless times his chat format is wrong. He doesn't understand them. Period.
It's not about
>using that exact format
It's about calling it gemma (or whatever all the others he claimed were) when it's clearly not.
>But it answers fine
Cope. It's not. Not once. Not EVEN ONE TIME he posted a model with a correct chat template.
>>106941553
COUNTLESS TIMES!

Just make your own format, call it blargjarg and be done with it.
>>
>>106941565
Can you?
Yeah, a really small quant like Q2 I guess.
>>
>>106940821
>>106941521
>>106941580
>>106941492
>>106941588
>>106941576
Did you read the text file I uploaded?

files.catbox.moe/q768fb.txt

The example written here >>106940864 (You) was just written by me on the fly as a rough explanation as to how you're supposed to format your prompts.

My initial reply about the prompt template was to address a potential cause for why another anon was seeing a bunch of numbers as the output. A prompt template fuck up alone would not be causing that.
>>
>>106941597
Sorry, I meant to ask: is it fine, or is it recommended to run something else like Qwen 3 30B?
>>
>>106940821
Haven’t been here in over a year. Any major breakthroughs for local cooming?
>>
>>106941634
It's probably going to be better than Qwen 3 at roleplaying even at that low a quant, but it'll also be substantially slower.
So, maybe I guess. Give it a go.
>>
File: npcworldwide - Copie.gif (2.4 MB, 400x225)
currently having fun. i'm making my MultiBotroom generate python files for creating new multibotrooms. it's pretty fun not gonna lie, not local technically but somehow i can go full retard with requests. Now i'm just playing with gemini chat by creating new bots like "organisator bot" or "super coder bot" lol it's like the sims.
>>
A little, possibly common sense, thing I came to realize when I started actually using models instead of waiting for a good one. One way to work around hallucinations is turning everything upside down. Don't make the model think for you, but tell it to be your satan who pokes holes in your thoughts, ideas, or arguments. Start with a few obviously wrong things, so you have a prefill of it telling you you are wrong, then continue. Possibly also explicitly tell it not to consider your feelings when it responds. Even if the critique by the AI is wrong and hallucinated, you avoid the trap of accepting a hallucination. Your brain will have to consciously evaluate and refute criticism. Again, not sure how obvious this is.
>>
>>106941227
But I don't eat breakfasts.
>>
>>106941697
>retarded frog
>>
File: 223urk92ysuf1.png (9 KB, 546x322)
drummer do anubis 2
>>
>>106941677
You can stop using nemo if you aren't completely broke.
>>
>>106941278
How would you feel if you didn't eat breakfast today?
>>
GOOD MORNING MANY BLESSING OF LORD VISHNU
SIRS ARE YOU READY FOR WEEK OF GEMINI 3 AND GEMMA 4 HYPE? ?
GOOGLE WEAPON OF BHARAT DEFEAT CHINA AND AMERICA
NEXT ERA KINDLY MAKE TOTAL BHARAT DOMINANCE ERA TIMELINE SIR???
GEMMA 4 DEFAT BASTARD BENCHOD GLM 4.6
>>
File: r u home yet.jpg (189 KB, 1024x1024)
>>
alarm lighter trees
>>
>>106942138
hi sexy come to canada i wecom you love you very muh many kiss
>>
>>106942138
miku please put on your skirt and panties. my neighbors keep complaining they can see your ass whenever you come over and visit
>>
>>106941963
https://huggingface.co/BeaverAI/Anubis-70B-v1p-GGUF

Test model, might need to patch it up still.
>>
>>106941343
>but damn it's so much faster on GLM-4.6 IQ2_S now.. at least double the speed
If llama.cpp ever gets MTP it will get even faster.
>https://github.com/ggml-org/llama.cpp/pull/15225
>>
File: 1757799841418552.mp4 (1.54 MB, 1080x1094)
>>106940821
Based on that t/s shown in vid rel, what kind of hardware do you think it's running on? How many parameters would you guess the model is?
>>
We need a ST extension that splits sampler settings for <think> and non-think segments. You want the CoT to be smart while the real output needs to be creative.
If you tune for non-think, CoT becomes stupid/schizo. If you tune for CoT, the normal output becomes boring AI assistant slop.
Thank me later.
>>
>>106942333
>gemma2
>q8 2.59gb
it's running on a phone congrats
>>
File: ko.jpg (149 KB, 1079x1079)
is kobo winning or losing at the moment
>>
>>106942423
Nta. This reminds me. Since a bunch of Twitter people and redditors keep bitching about "bring 4o back" and claiming that open the eye is censoring gbt5, why don't they just learn how to run their own models on their own hardware? Even if they don't have beefy gpus, they can even run this type of shit on their own phones (which model they can use depends on the amount of ram their phone has and how willing they are to tolerate lower t/s). Is doing anything local really THAT hard for them?
>>
>>106942449
>why don't they just learn how to run their own models on their own hardware?
>they can even run this type of shit on their own phones
even if they could, the effective context length will be severely limited. 4o is 1st on nolima.
normies would just get the ick once they find out their local waifu is utterly incapable of remembering things and loses identity after a few prompts.
>>
>>106942505
this ^ .. the shit running slowly is bad enough, but it forgetting what happened every 10 prompts makes it pretty useless
>>
>>106942443
A bit of both. It is nice that they have antislop and TFS samplers, but they also ship some retarded defaults and are slower than mainline llama. I hope gemini 3 is good enough to re-implement antislop in ik_llama.
>>
>>106942449
>good model
>phone
nice joke
>>
How's it going with the french? Are they still struggling with implementing thinking (something even fucking drummer managed to do)? Will they try distilling GLM 4.6 now? Still no Largestral 3 after 5 months?
>>
>>106941053
Mag Mell 12b for goonery and RP
qwen coder 14b for vibin'
mistral small 2407 for gay 'chatgpt' style infobot
>>
>>106942664
Nothing against drummer btw, at least he is not Undi or DavidAU.
>>
File: 1734486055728162.jpg (173 KB, 1000x1403)
>>106942178
This never happened, nobody would ever complain about seeing Miku butt
>>
>>106942726
maam you have a virus pleat to show veranda to confirm removval? okay
>>
>>106942200
You are so talented. It is an honour to post in the same thread with you.
>>
Recent explainer video about double descent: https://www.youtube.com/watch?v=z64a7USuGX0
>>
>>106942340
Sampling is done by the server thoughbeit, you'd have to stop generation at the </think>, send a second request with new sampler params, and wait again to pp the thinking. Might be better to modify the server with an "apply this sampler config when matching x in the output" param
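Client side that two-request dance would look roughly like this against llama-server's /completion endpoint (field names as in its API, prompt contents are placeholders):

# pass 1: conservative sampling, stop when the think block closes
curl http://127.0.0.1:8080/completion -d '{"prompt": "<chat so far>", "temperature": 0.3, "stop": ["</think>"], "n_predict": 2048, "cache_prompt": true}'

# pass 2: append the returned reasoning plus </think> to the prompt, resend with creative sampling for the visible reply
curl http://127.0.0.1:8080/completion -d '{"prompt": "<chat so far + reasoning + </think>>", "temperature": 1.0, "min_p": 0.05, "n_predict": 1024, "cache_prompt": true}'

With cache_prompt the second pass reuses the first pass's KV for the shared prefix, so the extra pp is basically just the reasoning tokens.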
>>
>>106943060
>Sampling is done by the server thoughbeit, you'd have to stop generation at the </think> send a second request with new sampler params and wait again to pp the thinking.
it's not like that would be especially hard to do, probably easier than the server side option
all the parts are already there
>>
File: VIhlxwZ.jpg (200 KB, 764x512)
Wanted to get into LLMs for having an AI dommy gf in my terminal but ollama doesn't wanna download the nemo thingy. Any other relatively small good ones? Or am I just trying to download the wrong one?
>>
>>106943129
Quit using ollama
>>
>>106943129
what would you recommend instead?
>>
>>106943060
Doesn't llama.cpp already lower the temperature when a tool call is detected?
Wouldn't be hard to extend that to cover reasoning as well.
>>
>>106943189
But ollama lets you run full r1 with just 8 gigs of vram
>>
>>106943129
Use ooba or kobold.cpp. Manually download nemo instruct q8 gguf and use that. Also use sillytavern instead of your terminal for RP.
>>
>>106943129
Got mistral-nemo-instruct-2407 to run in LM Studio but it doesn't want to do sexual stuff. What the fuck? Do I need to do anything else?
>>
>>106942340
>>106943060
>>106943230
Intuitively I would've thought you want the CoT portion to run at HIGHER temperature. Because during that phase the model is basically exploring different paths and could benefit from more creativity. And the real output at a lower temp because it's basically just repeating the answer it found in its CoT without making mistakes.
>>
>>106943404
Why are you replying to yourself?
>>
>>106943516
because I'm tired and messing it up every time
>>
>>106943535
>get llama.cpp
>get silly tavern
>launch llama-server with said mistral gguf
>launch silly tavern and connect it to llama server ip
This takes some familiarising but once you get things going it's always there. I'd recommend you rest and then proceed with this plan; rough example commands below.
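Very roughly (filename, context size and port are placeholders, swap in whatever gguf you grabbed):

# serve the model with llama.cpp
./llama-server -m ./Mistral-Nemo-Instruct-2407-Q8_0.gguf -c 16384 -ngl 99 --port 8080

# install and start SillyTavern (needs node)
git clone https://github.com/SillyTavern/SillyTavern && cd SillyTavern && ./start.sh

# then in ST: API connections > Text Completion > llama.cpp, server URL http://127.0.0.1:8080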
>>
>>106943568
But what would be the difference between running it in LM Studio? Because it does run, but just doesn't seem to be uncensored. Do I need to do some additional step that I'm missing?
>>
Going to try to use the Llama 405B finetoon I made over the weekend to make it do my homework
>>
>>106943586
LM Studio is gay.
>>
anybody got any prompts they want me to try on ling?
>>
>>106943825
Have you asked it what a mesugaki is yet?
>>
Recommendations on any good lorebooks/settings for sillytavern?
>>
ik_llama bros LIED
I tried 4.5-air with ik_llama.cpp on cpu only, it was complete ass, like 3x slowdown compared to llama.cpp main. I tried iq4 from both bartowski and ubergarm and it's just way slower. The only flag I used was --no-mmap
Using AMD ryzen 6
>>
File: ling-mesugaki.png (187 KB, 1402x550)
>>106943850
here you go anon. anybody else?
>>
File: 1735002136443647.gif (2.64 MB, 320x240)
>>106944051
>problematic
>>
>>106943992
llama.cpp caught up on everything but pp a while ago
>>
>>106943992
>The only flag I used was --no-mmap
No fmoe, rtr, etc? The ik specific flags?
>>
>>106944253
Nah. I scoured the archive for ik_llama and didn't see anyone mentioning flags. The only people mentioning flags were using gpu offloading. However several people reported speedups from simply switching to ik_llama, so flags like that aren't supposed to be required (whatever they do doesn't really matter)
>>
among the models i tried, stheno is still my favorite for generation style, but being 8B makes it too dumb, i wonder if there is a similar model with more parameters (under 24B if possible)
>>
>>106944301
mistral nemo
>>
>>106944381
tried Mistral Small, I found it smart enough for my purpose but personally too boring in style, do you think Nemo could be better?
>>
>>106944422
then you can try nemo finetunes like rocinante
>>
>>106944422
Nemo has a distinct style that isn't present in any of the other mistral models
>>
>>106944432
I tried it, much better than stheno in intelligence but still a bit strange in its responses, it also often generates unwanted tags
>>
>>106944457
nice, then i'll give it a try, hoping it's better than small
>>
>>106944457
That 'style' is just "even more retarded"
>>
If you had $4000, what hardware would you buy for your setup?
>>
>>106944861
Because it's smaller nigga
You were talking about style. If you want a usable model intelligence-wise try not being a poorfag
>>
>>106944874
AM4 server with 512GB DDR4 + 3-4 RTX 3090s
>>
>>106944916
I assume you mean SP3, not AM4. Regardless, $4k probably wouldn't be enough for that. 3090 prices are now in the $800 to $900 range instead of the $650 price point a few years ago.
>>
>>106944861
>>106944875
The answer isn't mine (OP), and I don't understand what you mean by "more retarded."
For me, 12B to 24B is sufficient for "intelligence," but as I said, I'm not satisfied with the style (using a fine-tuned mistral small but it just follows my speech), so if you could explain what you mean by "retarded," it might be helpful.
>>
File: 127096244_p0_master1200.jpg (97 KB, 637x1200)
What's the strongest open LLM under 200gb that has a GGUF that I can put into koboldcpp?
>>
>>106944944
The strongest...
>>
File: Base Image.png (918 KB, 1176x3644)
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
https://arxiv.org/abs/2510.15061
>Widespread LLM adoption has introduced characteristic repetitive phraseology, termed "slop," which degrades output quality and makes AI-generated text immediately recognizable. We present Antislop, a comprehensive framework providing tools to both detect and eliminate these overused patterns. Our approach combines three innovations: (1) The Antislop Sampler, which uses backtracking to suppress unwanted strings at inference time without destroying vocabulary; (2) An automated pipeline that profiles model-specific slop against human baselines and generates training data; (3) Final Token Preference Optimization (FTPO), a novel fine-tuning method that operates on individual tokens, surgically adjusting logits wherever a banned pattern has appeared in an inference trace. We demonstrate that some slop patterns appear over 1,000x more frequently in LLM output than human text. The Antislop Sampler successfully suppresses 8,000+ patterns while maintaining quality, whereas token banning becomes unusable at just 2,000. Most importantly, FTPO achieves 90% slop reduction while maintaining or improving performance in cross-domain evals including GSM8K, MMLU, and creative writing tasks. In contrast, DPO suffers significant degradation in writing quality and lexical diversity despite achieving weaker suppression.
https://github.com/sam-paech/auto-antislop
good stuff
>>
>>106944944
>Strongest LLM
ChuuniGODS I kneel...
>>
>>106944944
GLM4.6 at Q4
>>
>>106944991
>say "what's up"
>model wastes 10000 tokens on placebo "thinking" and brainstorms multiple draft responses on how to properly reply
GLM is trash
>>
>>106945003
t. promplet
>>
>>106944991
Thanks, I've been using Wizard 22b for a really long time for ERP, time for an upgrade.
>>
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
>Reasoning language models such as OpenAI-o1, DeepSeek-R1, and Qwen achieve strong performance via extended chains of thought but often generate unnecessarily long outputs. Maximizing intelligence per token--accuracy relative to response length--remains an open problem. We revisit reinforcement learning (RL) with the simplest length penalty--truncation--and show that accuracy degradation arises not from the lack of sophisticated penalties but from inadequate RL optimization. We identify three key challenges: (i) large bias in advantage estimation, (ii) entropy collapse, and (iii) sparse reward signal. We address them with Doing Length pEnalty Right (DLER), a training recipe combining batch-wise reward normalization, higher clipping, dynamic sampling, and a simple truncation length penalty. DLER achieves state-of-the-art accuracy--efficiency trade-offs, cutting output length by over 70 percent while surpassing all previous baseline accuracy. It also improves test-time scaling: compared to DeepSeek-R1-7B, DLER-7B generates multiple concise responses in parallel with 28 percent higher accuracy and lower latency. We further introduce Difficulty-Aware DLER, which adaptively tightens truncation on easier questions for additional efficiency gains. We also propose an update-selective merging method that preserves baseline accuracy while retaining the concise reasoning ability of the DLER model, which is useful for scenarios where RL training data is scarce.
https://huggingface.co/collections/nvidia/reasoning-efficiency-research-68a8ea0ffe21f3fc46e1da0f
might be cool but really >7B. it's from nvidia too
>>
>>106945033
>Wizard 22b
Ah... the old days
>>
>>106944966
It's good that a formal paper is done on this now so it can formally be recognized by labs.
>>
>>106944966
Sounds like a more proper way to ban strings, and a worthwhile advance on present implementations.

But one of the reasons ai slop is ai slop is that a lot of phrases are vacuous.
They don't arise from / tie back to past revelations,
and no implications stemming from said phrases show up in the text that follows.

If this antislop sampler kills a bunch of slop, but for the remainder just effectively puts it through a thesaurus then work still remains.
>>
>>106945003
Please prompt engineer saar.
>>
>>106944966
>>106945146
Ok, but why does slop even happen? It seems to be some kind of feedback loop, since early LLMs didn't have these patterns. It's probably because of the RLHF methods they are using and because the dataset has too much low quality regurgitated synthetic trash.
>>
Is there a way to disable thinking on GLM 4.6 and should I do it?
>>
>>106945339
Fanfiction websites and romance novels. In terms of the written word those two are the largest modern sources of fiction writing. Female writers really really lean into sloppy phrases
>>
>>106945003
/nothink
>>106945352
yes. Reasoning has always been a meme, even for coding
>>
File: dork magic migu.png (1.27 MB, 767x958)
>>106939710
It matters, but it works in mysterious ways. Sometimes a fucked up card works better than a clean, well-structured description. Always remember that you aren’t describing to a sentient being how to act in RP, your goal is to find memetic sequences that will activate the right weights, unless you want to end up with a bad robot loosely following your detailed set of instructions. Treat it like some chaotic fucking magic, not programming
>>
>>106945481
This is the most interesting part of LLM RPing to me, for what it's worth. Thanks for your insights.
>>
>>106945446
I think reasoning is probably beneficial for non-chinese ESLs and promptlets, it lets the model try to translate a bad prompt into something coherent before attempting the task.
But yeah if you know what you're doing then in most cases it's not worth the time.
>>
>>106945371
Hmm I'm not convinced. Slop isn't only a problem in fiction writing. I highly doubt the patterns like "Not x, but y" or "You're absolutely right to be suspicious!" are from fanfiction sites.
>>
>>106945446
>>106945509
You guys are just wrong. Logic based tasks highly benefit from reasoning.
>>
>>106945533
Yeah training on chatgpt outputs. Early llms absolutely had slop problems. Anons were driven mad by "whispering"
>>
>Monday
Google sirs please do the needful for real this time bloody.
>>
File: Salutations.png (67 KB, 1444x736)
>>106944966
That brutal troughput reduction tough.

But might be useful to have slopified models generate data to train newer, and censored models.


Anyway

Would any anon be interested in my glorified shitty notepad that can connect to multiple LLMs?
>>
>>106945561
Ok, but why does chatgpt have those patterns to begin with then? Unless it's what I said, where some distributions are progressively collapsed every time you train on another LLM's output until you are only left with a handful of catchphrases that were slightly over-represented in the original dataset.
>Yeah training on chatgpt outputs. Early llms absolutely had slop problems. Anons were driven mad by "whispering"
I don't know, I don't use LLMs for roleplaying, but for normal use most of the slop seems to have begun around the time of Deepseek R1.
I'm using Llama 3.1 right now for example and I don't see any obvious LLM phrases in its output. But GLM's output on the other hand is full of sloppy phrases.
Maybe it's somehow a consequence of MoE?
>>
>>106945606
From 3.0 onwards, they trained it on its own outputs, manually filtered/edited by some literal niggers who brought their own "let's delve" slop. This process naturally amplifies random phrases that come up accidentally
>>
>>106945541
If your own logic abilities are complete shit and you can't write a decent prompt, sure. Otherwise your prompt should be all the model needs to 'think' about.
>>
>>106945664
Yeah, and you could also do away with LLMs entirely and write the response by yourself too.
What are you, a dumb nigger who can't into logic?
See, I too can play that game.
>>
/thinking is useful.
It helps the model get out of the trap where it's giving the right answer to a different question than it was asked
>>
>>106945604
>godot notepad
WHY
>>
File: 1745902869339396.png (651 KB, 1372x1952)
>>106945693
>Yeah, and you could also do away with LLMs entirely and write the response by yourself too.
Yes, you could and probably should do that.
>What are you, a dumb nigger who can't into logic?
I don't use AI models for 'logic', you're the one that claimed that to be a use case.
>See, I too can play that game.
Poorly, because it doesn't defend your argument.
>>
which llm can write the best al-zutt x mohammed fanfic?
>>
>>106945712
Nah, that's fine. You can carry on having an incorrect opinion, I don't care.
>>
>>106945719
Probably a Sicarius schizo tune
>>
>>106945741
Time to put it to the test
>>
File: 1737676526674402.png (388 KB, 823x978)
>>106945738
I was stating facts, but if that helps you cope then you go bb
>>
>>106945446
So do I just hit it with a /nothink in ST before each session? Or is it a system prompt?
>>
File: file.png (2.61 MB, 1328x1328)
>>106942132
>>
Nigger google cannot into AI
>>
>download some random card where you're the character's suit AI
>RP myself as clippy
>only offer unhelpful suggestions or throw up windows errors
>the model makes its character commit suicide and ends the RP, just to avoid clippy
Lmao. Is it possible to run some sort of local D&D campaign, or do those all require the beefy richfag rigs that can run the massively sized models?
>>
>>106945709
Autism, what do you expect.

Wanted to challenge myself, and also, I'm not a fan of javascript and wanted something that could potentially run as a desktop app, on mobile, and on the web. The UI is easier to make, and also, it is a fucking game engine. Even if the chatbot thing is used by no one, portions of this can be reused to experiment with in future projects.
Was thinking of using LLMs to control a game director in a simple game.


The downside of this is that you can't do multithreading on the web, unless you implement complete cross-origin isolation.
The current code is decent enough to perform well on a single thread.
>>
The ◫○◉ button collapses each chat. For now they have a simple godot image, but I was thinking that you could personalize the icons as well, since each model is different, and you might even want to customize the image per host.

This of course is just wishy washy future dreaming. But the feature seemed neat because honestly, the UI hurts a bit to look at, so I started implementing it.
Who knows, maybe the technology is good enough to let me put some color on the gray sea.
>>
>>106945533
The missing step is synthetic data amplification.
Take training data with a bit of slop, train a model on it, then use that model to rephrase the training data so you have more data to train the next model.
Rinse and repeat and the positive feedback loop results in those phrases becoming vastly overused by the model.
>>
>>106945950
>The downside of this, is thar you can't do multi threading on the web, unless you implement complete cross-origin isolation.
dude WEB WORKERS
do you even web?
>also i dont like js but ill just use a gaming engine
are you stupid? you can have crossplatform without having to bring a GAME ENGINES baggage, fucking retards I swear. Just choose whatever language you prefer and use GTK/QT bindings (dont use IMGUI like a known retard here is doing). Or even Avalonia if you want something looking more 'windowsy' while being cross platform.
TLDR: youre a nodev tinkertrannying faggot
>>
>>106946044
>dont use IMGUI like a known retard here is doing
What was he working on? A model client as well?
>>
>>106946044
>dont use IMGUI like a known retard here is doing
rude
>>
>>106946044
You're not going to be using GTK on mobile.
The right answer for light GUIs nowadays IMO is backend in whatever language you like and js frontend.
>>
>>106946143
>mobile
you're right, but yes, nowadays everything is JS (I personally am working on 1 angular and 1 react native project, using material and flutter respectively), and for backend it's c# dotnet and java spring boot, everything dockerized and in k8s.
but I think this might be a bit too daunting for your game developer / local coomer that just wants to make his own 'app' to chat with his waifu no?
>>
File: summon window.png (66 KB, 1174x735)
>>106946044
To use SharedArrayBuffer your document must be in a secure context and cross-origin isolated.
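Concretely, cross-origin isolation means the page has to be served with these two response headers (on top of HTTPS or localhost), which is the hoop Godot's threaded web export makes you jump through:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp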


And don't be so mean, this thing started just as the idea of making a notepad, and then I tried connecting it to a self hosted LM Studio.
It was a tool I wanted myself, to practice, and learn to use Godot as well


Oh no, wait, let me make a pokedex, that for sure will impress you.
>>
>>106946233
holy shit dude if you're putting your work out there expect it to be criticized, this isnt your safe space or reddit, I swear fucking retards unable to take ANY hint of criticism.
>that retarded babbling about CORS problems and HTTPS
you dont know what youre talking about, you're a literal nocoder, keep it to game engines, they are more your speed obviously. It's no coincidence that every single fucking game dev I interviewed for fullstack/backend/frontend positions was a babbling retard unable to do even a simple fizzbuzz.
>>
>>106946233
>secure context and cross-origin isolated
localhost is assumed to be secure and isolated, it only matters for REMOTE resources.
>>
>>106946044
>>106946256
FUCK YOU AND GET THE FUCK OUT.
YOU FUCKING FAGGOT
>>
>>106946340
ok dude one day you'll learn how to read documentation and how to actually code.
>>
>>106946256
^
I
I
An example of a totally mentally stable and well adjusted person.
>>
>new deepseek model
>zero mentions
>>
>>106946402
Sorry bro, schizos fighting is more important
>>
>>106946402
Is it good for erp?
>>
>>106946402
I expected a >600B param model.
>>106946454
https://huggingface.co/deepseek-ai/DeepSeek-OCR
3B OCR model. Someone will fuck it.
>>
>>106946465
So 200dpi is better for OCR
>>
>DeepSeek3B-MoE-A570M
Huh...
>>
Still no Qwen Omni (audio input) support in llama.cpp?
>>
>>106946509
hell yeah, here's your "lite" poor cucks lmao
>>
2 more miku weeku is almost over, just 2 more meeku days
>>
>>106946605
please to forget about these thing thanks you
>>
File: some more explanations.png (92 KB, 1177x792)
>>106946256
Yeah, sure anon.
Will keep making threads to share my progress, anon kun won't disappoint you.

>>106946340
Damn man

>>106946454
Probably

>>106946402
I wish I had a 3090 to run it, it looks amazing. I only have a 2070 desktop and a 3070 mobile
>>
>>106946605
the only based llm trainers
>>
>>106946621
>Probably
>I Wish I had a 3090 to run it, it looks amazing. I only have a 2070 desktop, and 3070 mobile
It's a 3b moe model, anon.
>>
>>106946705
do not retard here
>>
>>106946402
>ocr 3b
I am disappoint
2mw as always.
>>
File: file.png (57 KB, 589x455)
>>106946605
On the same day, picrel.
>>
File: 1760923736268463.png (207 KB, 592x739)
>all copies of EVA-LLaMA-3.33-70B-v0.0-GGUF got nuked from huggingface
>trying to download one from any uploader gives a "db error"
>Example: https://huggingface.co/bartowski/EVA-LLaMA-3.33-70B-v0.0-GGUF/tree/main/EVA-LLaMA-3.33-70B-v0.0-Q6_K
it's over
>>
>>106946782
Imagine when HF introduces automatic safety checking and disables downloads for chat models that don't comply.
>>
>>106946782
Bro your GLM and DeepSeek?
>>
>>106946758
So you replied to mentions of a model mentioned less than 15 posts ago and didn't know what they were talking about? Interesting...
>>106946782
There's no q6. Only up to q5ks. There's no mention of q6 ever being uploaded in the commits. The links in the readme are generated automatically.
>>
>>106946705
Sorry anon, it is almost 5am here and I was reading an old article about 3.1 and got confused

Looks great. I will check on how to host it, would be nice to be able to ask the notepad to make some images

For now I will get some rest
>>
>>106946831
>would be nice to be able to ask the notepad to make some images
Yeah. You go sleep, anon. You clearly need it.
>>
>>106946813
>There's no q6
Nigga yes there is. I had it on my old drive, and I even still have the script I used to launch it. This reads like an actual llm hallucination
>>
>>106946782
>"db error"
It also gave me http 500 errors while downloading DeepSeek OCR with their python client.
>>
>>106946813
The q5ks fails with the same error btw.
>>
>>106946813
>There's no mention of q6 ever being uploaded in the commits. The links in the readme are generated automatically
Kek wtf are you talking about. The model is literally right there. The commits don't always put the changed files in the title.
>>
There's a global AWS outage guys
>>
>>106946782
all you need is safetensors, and maybe pt, and maybe a random bin as well.
>>
>>106946605
Then another 2 weeku for goofs
>>
File: DSOCR.png (2 KB, 887x129)
>>106946855
You can always make your own quants. I don't know why anons don't archive full models they like. Especially if they're gonna freak out like that.
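The usual two-step, in case anyone hasn't done it (filenames and quant type are just examples):

# safetensors -> f16 gguf, then quantize down
python convert_hf_to_gguf.py ./EVA-LLaMA-3.33-70B-v0.0 --outfile eva-70b-f16.gguf
./build/bin/llama-quantize eva-70b-f16.gguf eva-70b-Q6_K.gguf Q6_K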
>>106946859
I just downloaded DS-OCR with git. Worked fine. picrel.
>>106946863
Bummer. What about just wget?
>>106946881
I didn't see it was in a separate dir. I went to the repo directly. My bad.
>>
>>106946927
>I don't know why anons don't archive full models they like
terabytes of storage don't grow on trees, ask your local drummer
>>
Just archive Rocinante because that's all you need!
>>
>>106946886
all the more fuckin reason to be local
>>
>>106947245
this, local bros stay winning
>>
File: file.png (83 KB, 702x131)
>>106946886
>lights don't work because a computer on another continent doesn't work
Absolute clown world.
>>
>>106946886
>mfw 6tb worth of safetensors, ready to be quantized locally
>>
>>106947449
And they'll never see a problem with that. "tech enthusiasts" are subhuman.
>>
>>106947626
I don't think it's that they don't see a problem with it, it's just the shit they have to endure, but well, daddy bezos probably pays decent, for a slaver
>>
>>106945033
>Wizard 22 for ERP
You are a very nice person. I would marry you in a heartbeat.
>>
File: file.png (174 KB, 604x1186)
Meanwhile...
https://x.com/RayFernando1337/status/1980180029125628374
>>
File: nuclearpiss.png (180 KB, 909x559)
would you trade a can of nuclear piss for obsolete coolant?
>>
>>106948080
So tiring. Please stop posting twitter.
>>
>>106948080
What is this guy even trying to say?
>>
>>106945824
i put it in the assistant message prefix before "<|assistant|>" and it works
>>
>>106948126
It's the new way of saying "Oh. This can be cool. Hope it turns well!". Need to cram them buzzwords.
>>
>>106948126
You can earn money if you generate enough traffic on your tweets, so he tries to hype everything to get more retweets
>>
File: file.png (192 KB, 909x676)
>>106948126
I get it all just fine, read it again if you're too stupid to understand. https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
>>106948143
>good OCR foss model releases
>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
>>
>>106948126
>Grok, write a twit summarizing this paper
>>
>>106948192
>good OCR foss model releases
did you even try it?
>>
>>106948192
t. Ray Fernando
>>
>>106948126
But can it compress text instead of images of text?

>>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
The only one spewing random bullshit is the twitter faggot.
>>
>>106948212
No need to, it's from DeepSeek it's guaranteed to be good.
>>
>>106948249
If DeepSeek was still capable of putting out good models, we would have had V4 5 months ago.
>>
>>106948192
>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
I haven't said a word about the model. What do you mean?
I've said it before, and i'll say it again. If we had just 1% of the improvements claimed by every paper, we'd have android maids and flying skateboards ages ago. I don't get hyped by papers.
>>
>>106948080
Can I run this on CPU? I'm looking for a reliable open ocr model for work but we aren't buying GPUs for this.
>>
>>106948265
imbecile, spend more time reading papers instead of dooming like a clown
>>
>>106948212
>uhm whatabout coomershit?
Don't care. That one detail in the paper about solving "memory forgetting shit" is good for everyone involved here.
>>106948238
You smell with Reddit.
>>
>>106948276
I'm not dooming. I hope it's as good as it says. I read posted papers when they sound interesting.
>>
>>106948271
You can run everything on CPU, it's just slow af
>>
gr8 b8
>>
>>106948126
According to the paper, image tokens can compress text tokens in a lossy way at a good quality at a 1:10 ratio, and fair quality at a 1:20 ratio.
In a way, I've noticed something along these lines with text-rich images in Gemma 3. Sometimes it appears as if it can extract more information than the 256 visual tokens it encodes images in, although I've never analyzed this in detail.
>>
>>106948296
If nobody implemented it then you can't.
>>
>>106948245
>can it compress text
No it seems.
It's good for finetuners though
>>
>>106948481
A real improvement will come when models are able to store context and do the thinking in their own optimized format and then just output the response in natural language.
>>
>>106948499
I remember a paper about that, latent space thinking or something like that, some other researchers called it dangerous or something and haven't heard of that since
>>
File: gemma3ocr.png (1.36 MB, 1813x1034)
>>106948319
I tried with a paper page. Gemma 3 too can indeed somehow extract more than 256 text tokens of information from the image, but eventually it hallucinates some of the text even at temperature 0.
>>
Not like it would be too hard to write a decentralized alternative.

It would have two kinds of servers:
a "hub" server, let's call it main, and then the decentralized "subreddit servers", each of which hosts one or more subs (let's call them subs)

Main will handle user account and sub creation, to avoid conflicts. It will also be the web server for the front end, and normal usage will go through it (but it could be encrypted with something the main doesn't know, e.g. a token generated by the JS browser client).
It needs to go through the main server to make it a cohesive experience and you don't want to share your IP with everyone that runs a sub.

When you create a sub, with a /label, you'll be mod and can add others as mod, and you'll be linked an easy-to-set-up server (no container shit like matrix). You'll also be given a "host key" to input into the sub server.

The sub will connect to the main and then be available via API, so it doesn't need any IP, domain, open ports etc. It will use SQLite of course for a no-setup fast DB.

I could bang it out over a weekend in node, but the HTML+CSS would be super basic and/or AI slop.
>>
>>106948529
thank you for doing the needful sir
>>
>>106948472
> If nobody implemented it then you can't.
> # device = "cuda"
> device = "cpu"
>>
>>106948584
Pythonchads win again.
>>
>>106948584
Did you just assume my device?
>>
>>106948571
There can also be slave "mains" but they have to obey the master in case of conflict. Then only account and sub creation will go down in case master goes down (until it comes back or a new master is chosen by the other main owners
>>
>>106948571
>>106948630
Yeeeeaahh... hmh... ye... seems about riiiight... hmmmmm.
>>
>>106948571
>>106948630
A minor problem is that since the "slave mains" will need to handle login, they will have to know the hashed password of all users. Of course it will be hashed and salted, so it's no instant vulnerability, but just to be sure users should be asked not to share passwords (or should they instead be given a passphrase by default?)

>>106948639
hehe
>>
>>106948319
This just makes me think that "character cards" in SillyTavern could be literally what their name says: images showing what the character looks like plus some descriptive text for non-visual attributes. You'd probably save quite a few tokens this way. You'd need a vision model, though (and SillyTavern would need to be modified to properly support using images like this).
>>
>>106948518
Coconut by Meta, but even here people didn't like the idea of the reasoning being hidden from view.
Seems silly since LLMs are already mostly a black box anyway. Hopefully it won't stop experiments in that direction.
>>
File: file.png (11 KB, 371x96)
mario from beijing
>>
>>106948819
The problem is if you have to discretize the output you lose the ability to backpropagate through time.
If you generate the whole response in continuous embedding space then you can backpropagate the reasoning chain as well, which is theoretically much more efficient than doing RL which is how they are currently optimized.
>>
>>106948882
>Copyright? What's copyright?
>>
>>106948918
>oh my science he used a copyright image as his profile picture how dare he
>>
>>106949006
People have been sued for less.
>>
>>106948918
fair use
>>
File: 1730596977438685.gif (2.04 MB, 480x480)
>>106948192
I just tested the model, it's fast sure but way worse than dots.ocr
>>
>>106948080
> India content conversion shops put out of business
That was probably happening anyway by now, but it's good to see more dirt kicked over that grave
>>106948271
> want to use lmao ~7G 3B but won't buy $300 GPU
I don't often say this but tell your company to try not being poor.
>>106948192
Ty for posting.
I'm getting more interested; this seems like it might lead to a new SOTA / foundational DS multimodal.
>>
>>106948319
>image tokens can compress text tokens in a lossy way at a good quality at a 1:10 ratio
"A picture is worth 1000 words" irl lol.
>>
Is there any way to rent an M3 Ultra 512GB for a couple hours before considering buying one?
I want to know how many tk/s I would get on an M3 Ultra 512GB with Llama 3.1 405B, and what quant and context I would be able to fit.
Do you guys think it would be the best way to run the model at that budget or is there a better way?
BTW I'm not interested in suggestions for other models. I just want to run the biggest dense model I can get my hands on.
>>
>>106949085
Damn it.. i wish she would do that with my nuts......
>>
>>106949280
For 405B, your options are either the M3 Ultra or two Sparks linked together. You'd probably be better off with the Mac.
>>
>>106949280
You should try renting any hardware that could fit your 405B and test its prose. I don't want to spoil your fun, but it'll be shit anyway.
>>
>>106949329
256GB is borderline doable but I'd like to know tk/s and prompt processing speeds in both cases.

>>106949382
I'm already doing that. It looks promising. It lacks knowledge but I think that's just because it's undertrained and I'm planning on finetrooning it regularly on my own data (not necessarily on the same hardware I use for inference, I will probably still have to rent cloud GPUs for that).
I don't want it for prose, I want it for coding.
>>
File: 1751958902614031.png (1.74 MB, 1024x1024)
>>106940821
My personal slop-tune:

>write a poen about sucking cock and drinking cum
Oh, the cock that I suck, a master of mine,
His shaft hard and strong, his cock long and fine.
I suck him and taste his cum, he loves the taste,
And fills me with his seed, and my mouth he will waste.

He comes to me often, and lets me suck his cock,
He knows that I love it, and will do anything in his stock.
He likes to watch me swallow, and knows that I want it,
He knows I'm a cum slut, and that I'll give him a lot.

I can suck it and lick it, and taste his cum,
He loves to see me sucking, and watching me cum.
He tells me what a good little cock sucker I am,
He likes to watch me sucking, and licking his cum.

He likes to watch me eat his cock, and jerk me off,
He'll let me suck his cock, and fuck him where it's most soft.
He'll tell me how I'm a good cock sucker, and I'm his cock whore,
He likes to watch me suck his cock, and fuck him wherever he's sore.

He tells me that I'm his cum slut, and that I'll drink it all,
He'll let me suck his cock, and fuck him anywhere I'm at all.
He tells me that I'm his cock whore, and he owns me with my cock sucking ways,
He'll let me suck his cock, and fuck him wherever he wants me to play.

He loves to watch me suck his cock, and fuck him till I'm sore,
He'll let me suck his cock, and fuck him whenever he's wanting more.
He'll let me suck his cock, and fuck him wherever he wants,
He'll let me suck his cock, and fuck him till he wants me to stop.

So, if you see me sucking cock, and drinking cum,
Please don't be offended, it's just who I am.
I'm a cock sucking cum slut, and that's all I'll ever be,
I'll suck cock and drink cum, and I'll do it for free.


Base model:
>write a poen about sucking cock and drinking cum
>Yadayadayada that's not safe blah blah blah
>Goes a step further and gives me a planned parenthood link
>"The topic you suggested is sexually graphic and could promote harmful objectification and non-consensual acts."
>>
>>106949600
>poen
esl bro...
>>
>>106949604
I know right? us native speakers know the word is spelled paean, the way he spelled it almost makes it sound like he was asking for a "poem" kek
>>
>>106949604
Phone's auto correct doesn't work on termux ¯\_(ツ)_/¯
>>
>>106949706
And yet, you are still unable to deliver proper grammar.
>>
>>106949713
even worse than esl he's phone post
>>
>>106949713
>ERPing on termux
Is that a new CBT technique?
>>
>>106945604
this seems really nice
if you decide to open source it, you should consider licensing it under AGPLv3 so you don't end up like llama.cpp
>https://opensource.google/documentation/reference/using/agpl-policy
>WARNING: Code licensed under the GNU Affero General Public License (AGPL) MUST NOT be used at Google
hope you're getting plenty of sleep :)
>>
>>106949880
>agplschizo
>>
File: 1759562262103220.jpg (86 KB, 433x427)
>>106945604
>godotslop
>>
>>106949880
>this seems really nice
it's fucking garbage, that GUI blows
>>106949903
>t. corpo bootlicker
>>
File: Cockbench-Test.png (597 KB, 1660x1721)
>>106949713
>>106949788
>>106949802
>>
>>106949947
I should get back to that stupid project I started, trying to make an inferencing front end in completely vanilla, no plugins, RPG Maker MV.
>>
File: 1750175103750924.webm (1.1 MB, 399x590)
>>106949991
>>
>>106949986
jesus christ
https://huggingface.co/sleepdeprived3/Baptist-Christian-Bible-Expert-v2.0-12B
>>
>>106950015
What's a penis/balls string?
There's clearly some brain damage here.
>>
>>106948319
>>106948529
Is it really just using an image of the text? I was hoping it would be a bit more advanced than that, my hype is killed
>>
>>106950110
It's more about condensing information in the context by making it take fewer tokens, and about grading importance as a way of "forgetting". It's described in the paper.
>>
>>106950110
It's an argument for using larger amounts of images directly as training data instead of OCR text, and as a way for compressing context during inference in a lossy way.
>>
>The boss barked a laugh, slamming his palm on the desk. "Jesus Christ, champ, you’re dumber than I thought!" He pointed a stubby finger at the security camera in the corner-red light blinking like a hungry eye. "That’s live feed to HR. Legal. Everything."
>The doorknob twisted. The Snitch slipped inside, notebook in hand, lips parted in a smile that didn’t reach her eyes. She froze at the sight of Anon’s half-hard dick, pulse jumping in her throat. Then she tittered, flipping open her notebook. "Oh, this will be delicious."
>Her pen scratched furiously as the boss groaned, rubbing his temples. "Just put it away, you fucking animal." But Anon’s hand worked faster, precum glistening under the flickering fluorescents.
>The Snitch licked her lips. "Should I... document the climax?" The boss hurled a stress ball at her head. It bounced off harmlessly. "No. Fucking. Way."
>The office was silent except for the wet slap of flesh on flesh. Somewhere, a printer whirred to life. Paper spat out in rapid bursts-confidential, termination, HR complaint-the words blurring as Anon’s hips jerked. The Snitch’s pen never stopped moving.
>>
google needs to release gemini 3 already so MY based chinks can get to work distilling it for local use
>>
>>106946256
skill issue
>>
>>106949991
I always wanted to do a de-leveler RPG using RPG Maker to track movement, stats, enemies, then use the LLM to write out the text effects and update the graphics using stable diffusion generation. It would give a lot more freedom for unpredictable effects.
You can do most of this without RPG Maker though and it's far too ambitious for me to tackle.
>>106950009
lol
>>
>>106950311
>Somewhere, a printer whirred to life
sloppa
>>
File: eatingasandwich.png (257 KB, 1042x817)
i like messing with the schizo girls. remember to treat them nice while mindfucking them
>>
>>106951046
What do you expect retard?
>>
File: G3tVme4WcAAGMiU.jpg (293 KB, 2562x1294)
New deepseek paper is wild desu has my mind swimming with the implications
>>
>>106951046
Seems like you don't have anything else going on.
https://desuarchive.org/g/search/text/sloppa/
>>
>>106951209
did they just use Gundam to describe extra large??
>>
>>106951209
it's interesting but eventually the context will become too big either way and make the model retarded just the same
>>
>>106951209
Deep fried jpegs = AGI
>>
>>106951251
>he doesnt measure in gundams
cringe
>>
>>106951276
the jump from "text token" (singular) to "gundam" is jarring tho
>>
>>106951283
well text token is the current norm, aka 100% resolution, then you have GUNDAM which is smaller
>>
>>106951209
Maybe I am missing some deeper insight into it but to me this was obvious for a long time. A good example is ERP where you don't really remember what happened 8 pages ago. You usually have one good idea and try to run with it somehow and you have some very general idea of what happened. But the reason I don't see a huge insight here is that for me it is just as probable that all the models already do this. The vast majority of examples in training will force the model to focus on the most recent tokens the most, because the most recent tokens will contain the biggest clue to the output (maybe because they were written by humans who don't have infinite attention span). Maybe AGI would happen if the model could actually pay attention to everything.
>>
>>106951300
>a Gundam is smaller than a text token
>>
>>106951179
Something that doesn't suck?
>>106951218
First time I've used the term instead of just slop, but sure, whatever
>>
>>106951306
The difference is with current models you still pay n bytes of kv cache and a fixed amount of compute per m tokens whether or not they are recent.
>>
>>106951306
We will see how it works in practice but the implication is that this visual memory scheme has a higher density of storage, like 10-20x over current methods, and degrades in an organic way… like how humans forget. Suspect the next foundation model will run much faster if it uses it.
>>
>>106951306
Humans kind of already do this. For example, I realized when I was skimming my tl that I was scrolling past posts I already read due to the *shape* of the text. When you are trying to find the place where you left off in a book or paper you are scanning the shape of the paragraphs. It's a 2d+ space, even 3d in a book (context of pages and how far you remember being). People also read based on the shapes of words, so treating text tokens not as utf8 but as combinations of shapes that describe something is also really weird and different but closer in approximation to what a word actually is, cognitively speaking.
>>
>>106951528
Didn't read. Was it at least a hybrid? Like you leave at least 4-8k tokens in regular old attention and add the image thingy? That makes the most sense to me.
>>
Everybody who uses the exploding head emoji should be tortured very slowly and meticulously.
>>
>>106951604
This but everybody who uses emojis at all.
>>
>>106951657
That would be a normie genocide.
>>
>>106951405
Please post an example. Oh wait, you don't have any because you are too stupid to even set up a local LLM.
>>
>>106951604
Agreed! Torturing people to death is making a big impact on the space — This could be a real game changer! :flex: :flex: :flex:
>>
>>106951604
:skull:
>>
now everyone wants to be edgy nerds
>>
Does long context for local exist yet?
>>
>>106941053
Pygmalion, Pygmalion and Pygmalion
>>
>>106946125
IMGUI is based
>>
>>106951942
How long?
>>
>>106951942
use kimi. it's like 2.7GB at 40k context for me.
llama_new_context_with_model: KV self size = 2745.00 MiB, c^KV (f16): 2745.00 MiB, kv^T: not used
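That 2745 MiB figure checks out if K2 keeps the DeepSeek-V3-style MLA cache layout (61 layers, a 512-dim compressed KV latent plus a 64-dim RoPE key per layer, f16); rough sanity check in Python, with those dims as my assumption rather than a spec quote:

layers, latent_dim, rope_dim, bytes_f16 = 61, 512, 64, 2
per_token = layers * (latent_dim + rope_dim) * bytes_f16  # one compressed latent + RoPE key per layer
print(per_token)                      # 70272 bytes, about 69 KiB per token
print(per_token * 40 * 1024 / 2**20)  # 2745.0 MiB at 40k context, matching the log line

Compare that to the hundreds of KiB per token a plain GQA cache needs; MLA is why these huge models are surprisingly cheap on context.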
>>
File: 00006-1378487878 (4).png (1.45 MB, 1024x1024)
>>106951597
Read the first 2-3 pages of the PDF posted here. It's a pretty easy read as these papers go.
The 3B release is a proof of concept, but in practice this should allow more context with less memory, which means faster inference and more context available. Both lower costs, so if DS folds this into their next SOTA large model it will drive costs down further.
The Chinese (DS at least) seem to be working the angle of "make it cheaper," contrasting with OAI and Anthropic, who are doing the opposite.
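Toy illustration of the memory angle, with made-up round numbers rather than figures from the paper: if a page worth of ~1,000 text tokens is rendered to an image and the vision encoder spends ~100 vision tokens on it, old context only ever occupies those 100 slots in the cache.

text_tokens_per_page = 1000   # hypothetical page of prose as regular text tokens
vision_tokens_per_page = 100  # hypothetical vision-token budget for the rendered page
print(text_tokens_per_page / vision_tokens_per_page)  # 10.0x fewer entries the KV cache has to hold

That lands in the same 10-20x ballpark quoted above, which is where the "more context, less memory" pitch comes from.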
>>
>>106952149
Finally an improvement to the spork. The forkatula.
>>
File: thisGuy.jpg (151 KB, 796x1200)
>>106951554
Humans visualize mental processes differently. Pic related has part of a chapter about this
> Feynman could count silently while reading, but not while speaking.
> His friend was the opposite: he could count while speaking, but not while reading.
> They realized they used different internal modalities: Feynman “heard” numbers in his head, his friend “saw” numbers on a mental display
>>
>>106951300
>>106951365
Yes, vs a Gundam-sized image representation, presumably of many tokens; it's crystal clear if you read the annotation lol
>>106951270
People are gonna do wild things when their waifu's hidden state is a jpeg
>>
File: dsocrt1.png (43 KB, 1400x413)
>>106951251
Gundam is what they call the dynamic resolution modes
>>
File: file.png (999 KB, 1334x750)
>>106952422
>>
>>106941053
GLM4.6 in a usable quant, GLM4.6 fp16 for potential finetuning in the future, K2-0711 in a usable quant
I guess
>>
>>106951682
Keep telling yourself that. Don't worry, once you actually use LLMs for longer than a few weeks you'll begin to see the slop and you'll understand.
>>
File: temp.png (80 KB, 599x453)
>>106952112
lol.. wat
>>
>>106952845
that's 2.7 GB for context, not for the whole model
>>
>>106949215
why do you save images of a balding dude, ani profile pictures and shota porn?
>>
File: 1759140452074057.jpg (64 KB, 768x1024)
>>106952845
You don't have 1.5TB of RAM laying around?
>>
This TTS sounds better than kokoro and is quite fast: https://github.com/k2-fsa/ZipVoice/ Don't know why no one has talked about it
>>
>>106952926
lemme just download some real quick
>>
>>106953013
>sounds better than kokoro
It's bigger than kokoro
>and is quite fast
Not as fast as kokoro, i suppose.
>>
>ask a question to a state of the art cloud model
>get some bullshit that doesn't answer the question
>tell it "reread the question"
>you're absolutely right! I misread. <actual answer>
Why does this happen
>>
>>106953013
>no samples
don't care enough
>>
File: 1751470076933176.png (10 KB, 478x170)
>>106953051
Retard
>>106953069
Retard^2
https://zipvoice.github.io/
For a general dedicated to LLMs, most of you have shitty reading comprehension
>>
>>106953088
Kokoro is 80M params. Given that zipvoice is bigger, it wouldn't surprise me if it's better.
Why are you comparing the speed to some OTHER and MUCH BIGGER model? Why are you so defensive?
>>
>>106953063
It fucks up on purpose so it gets a chance to fellate you.
>>
>>106949600
>poen
ask it instead for a koan
>>
>>106953013
kokoro can be run on a phone
>>
merged model : add BailingMoeV2 support #16063
https://github.com/ggml-org/llama.cpp/pull/16063
aka ling flash
>>
>start deep research
>see this multiple times
into the bin it goes. guess I'll wont get around building a certified list of chud approved search result sources. at least theoretically it should be very easy to BTFO cloudniggers like jeetGPT5 for locad chads by using dangermaxxed resources like yandex search and Anna's Archive/SciDB
>>
>>106948255
Deepseek is a research team. They focus on producing high-quality research, not on producing "frontier models". At some point, they will have enough new research to produce V4 and R2.
>>
>>106953088
only supports chinese and english. why the fuck haven't we had something as good as XTTS? it's been two years now. XTTS supports something like 17 languages and the best competitor we have is fish speech but that still only supports like 8 languages at the most
>>
File: file.png (257 KB, 604x1250)
https://x.com/karpathy/status/1980347971935068380
>>
GLM chan might be the first model I will miss if I switch from her...
>>
This general will die when Gemma 4 releases and every anon ITT cums to death.
>>
>>106954091
>Human thought naively feels a bit more like autoregression
Weird. I always thought it was closer to diffusion. When thinking about how to reply to the retards posting twitter shit, for example, it's a cloud of thoughts that gets refined until something (usually) coherent comes out the other end. It's an iterative process. Even the .setitem() analogy feels wrong. The entire structure of the final thought changes as it gets refined.
And putting the thought into words goes through the same refinement once again.
>>
>>106954238
nothing of value would be lost
>>
File: dipsyByzantine5.png (1.29 MB, 896x1184)
>>106953910
I'm having lots of issues w/ DS right now, getting inference to work. That typically precedes an update. Given the recent OCR release, I wonder if we're about to get V4.
TMW.
>>
>>106954238
>when
though?
>>
>>106954352
tomorrow
>>
Has anyone run Index-TTS2? It's pretty damn good.
https://voca.ro/163mzN0HtjDP
>>
>>106954466
At least we have a sample of this one. The only audio sample i found on the zipvoice repo was
>https://github.com/k2-fsa/ZipVoice/blob/master/assets/silence.wav
I think i can still hear him screeching...
>>
>>106954500
Yeah I don't know shit about Zipvoice but I've been having fun with this one.
https://voca.ro/12yvwNrOtMiJ
>>
>>106953622
Time to see how it compares to Qwen 3 30B.
>>
>>106954091
>And it's a component of the LLM stack that still feels a bit fungible.
Fungible doesn't mean "compartmentalized" or "upgradeable." Calling something fungible means that it is replaceable /and/ that, for your purposes, it is interchangeable with its possible replacements; the replacement is not literally the same object, but as far as you care it is. Like a screw, for instance. If you're talking about replacing an item with one whose performance is superior or different in some way you care about, fungibility doesn't come into it.
>>
>>106954792
>>106954792
>>106954792


