[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: LaughingMiku.jpg (848 KB, 1755x2242)
848 KB
848 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Laughing Man General Edition

Previous threads: >>108123280 & >>108116363

►News
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5
>(02/11) Ming-flash-omni 2.0 released: https://hf.co/inclusionAI/Ming-flash-omni-2.0
>(02/10) MOSS-TTS Family: speech and sound generation models: https://github.com/OpenMOSS/MOSS-TTS
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open
>(02/06) Step3.5 Flash support merged into llama cpp: https://github.com/ggml-org/llama.cpp/pull/19283

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: Laughing_migu.png (78 KB, 340x310)
78 KB
78 KB PNG
►Recent Highlights from the Previous Thread: >>108123280

--Paper: MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs:
>108125570 >108125715 >108125829
--Optimizing local realtime voice chat latency and audio quality:
>108124616 >108124739 >108124793 >108124966 >108125058 >108125096 >108125345 >108125505 >108125564 >108124780
--KL divergence analysis comparing unsloth, bartowski, and ubergarm models across tasks:
>108123850 >108123868 >108123875 >108123960
--Ollama transitioning from ggml to MLX engine:
>108128796 >108128872 >108129571 >108129630 >108129664 >108129683 >108129710 >108129749 >108131353 >108131450
--DeepSeek performance and Engram memory optimization potential:
>108124855 >108124930 >108125107 >108125157 >108125298 >108125156
--Budget GPU options for local LLMs debated:
>108127535 >108127588 >108127663 >108127715 >108127802 >108127770 >108127789 >108127809 >108127901 >108128063 >108127593 >108128050 >108128067 >108128121 >108128136
--GLM-5-GGUF MoE quant release with high PPL and testing issues:
>108127866 >108127906 >108127972
--Textgen stagnation and emotional AI attachment debates:
>108126748 >108126853 >108126879 >108127051 >108127180 >108127196 >108127208 >108127319 >108127190
--RTX 5060 Ti as budget-friendly LLM GPU:
>108126881 >108126927 >108126939
--Debating MoE model tradeoffs for consumers vs corporations:
>108130727 >108130824 >108131003 >108131030 >108131054
--Debugging MoE LLM with KV cache errors during live inference:
>108126130
--llama.cpp tensor parallelism development and Vulkan implementation challenges:
>108124820 >108124853 >108126394
--Elon Musk confirms xAI will open-source Grok 3:
>108123575 >108123650 >108123895
--LLM ethical alignment benchmark for extreme content prompts:
>108130966 >108130973 >108131055
--Rinchan (free space):
>108126071 >108126591

►Recent Highlight Posts from the Previous Thread: >>108123287

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108132262
>cockbench
>extreme content prompts
>>
>9070xt
>32 gb ddr4 + 32gb swap
>Claude advises me to stay on qwen code 3
What better local model can i feed my openclaw with this config?
>>
>>108132299
are you using qwen next or the 30b?
>>
>>108125570
>knowledge editing
has anyone tried if "abliteration" by editing rules as knowledge would work? how terrible is knowledge editing for a models intelligence anyway?
>>
>>108132295
I don't know about you but my penis is pretty extreme
>>
>>108132408
extremely small maybe
>>
File: 1770904763813615.jpg (169 KB, 2029x941)
169 KB
169 KB JPG
It's OYIVER
>>
>>108132295
Anon, it's an incest sleep-molestation prompt, outside of this pit of degeneracy that's considered pretty extreme, and it's the whole point of the benchmark.
>>
>>108132457
Sora 2 couldn't even beat previous gen chink models lol
>>
>>108132378
30b, i tried next earlier and it barely ran, much slower, and couldn't reply to messages on the GUI
>>
>>108132457
where is the hf repo?
>>
Hey,

I posted about a week ago about Book --> audiobook project Alexandria. A lot has happened and Claude has been busy so there are even more features now:

LoRA training, import a dataset and train LoRA adapter for a custom voice that can be used with voice direction.
Example: https://vocaroo.com/1cG82gVS61hn

-Synthetic dataset creator. You can use the desing voice feature to describe a voice "Soprano Woman with clear and silky voice" and then import a phraselist to generate. Save to a dataset with a single click.
-Improved Batching to optimize voice generation RTF
-many little things more

https://github.com/Finrandojin/alexandria-audiobook
>>
>>108132488
>where da GOOF
lol
>>
>oh my gosh the company producing model x says model x is best! :o
>>
>>108132496
Mturks said model x is best
>>
>>108132491
I know this might just be model limitations, but I'm not a fan of the examples, both the one linked right now and the one linked back when you posted it last.
Do you not mind the voice sounding bad, or are the examples not cherrypicked? I really like the idea but I can't listen to this attentively.
>>
File: 3vXT3Ju.gif (1.27 MB, 500x691)
1.27 MB
1.27 MB GIF
>>108132457
This model is insane, I saw it generating anime from manga pages and that was always my dream. Now I wish I could test what happens if you feed it doujinshi pages, there is a To Love Ru doujinshi that I would love to see animated, but I would rather not touch API models.
>>
>>108132478
then there is no better model for your config
>>
>>108132574
well they are trained using output of the model and are my first decent LoRA's I expect a good "real" dataset might fare better. I'm working on improving my list of phrases and instructs to get a better LoRA.

Current LoRA problems include
1. Tendency to talk loudly or a bit too fast if the sentence is long.
2. Muted emotional response in some cases (the "come sit with me" line).
3. Sameness in cadence, leading to uncanny valley effect when speakers change.

The Loudness and Tempo is due to dataset being short sentences and I clearly need to add more "slow" dialog to the mix. Same with more "emotional" reading. Also training settings play a large part.

I hope build a set of LoRA adapters I can bundle with this but It's more art than science so it'll take a while.
>>
>>108132491
Your project is good. I just have an advice: don't use an LLM to answer repo issues. It really looks bad
>>
>>108132628
I'm going to be copy-pasting the answer from the AI in some form anyway. Might as well eliminate the middle-man.
>>
>>108132620
Ok, but I don't care about LoRAs? I want to listen to books, and your examples sound bad enough that I wouldn't be able to focus on it.
Mind adding some audiobook snippets that you consider your best work as examples so I know what's currently possible with it?
>>
>>108132743
Please respond
>>
File: pepe stare.png (241 KB, 700x641)
241 KB
241 KB PNG
>Say to AI “This is an AI benchmark test”
>Starts acting very strange compared to regular prompt sessions.
>>
>>108132860
proof???
>>
File: GLM-5 quantization test.jpg (269 KB, 739x1455)
269 KB
269 KB JPG
>guys Q8 is literally identical to full precision
>>
>>108132714
nta but building a similar thing, different mechanism, only works on cuda / no loras
i'm too retarded to read the huge llm slop replies from ta and his slop readme.md
is your issue with his
>Ok, but I don't care about LoRAs?
do you mean you don't care 'how it works' or you don't need steerable voices?
>>
>>108132892
>mlx
there's your problem, gguf is better
>>
File: 1517187346955.png (162 KB, 552x560)
162 KB
162 KB PNG
>>108132892
>q9
>q8.5
>>
>>108132892
Oh yeah my favorite quants like q9 (???)
Mhm I love Q3.5 with a perplexity of 168.0. It's totally the fault of quantitization and not my half-backed first attempts, no-siree
Look at ubergarm's quants for MoE models. GLM-5 was trained at fp16 which means that quanting down to Q3 isn't as bad. INT4 models however are absolute dogshit for quanting.
>>
HF has a mental health crisis
>>
>>108132457
ok, but can it generate pornography
>>
>>108132961
Can I have sex with it
>>
File: 1726705524134254.jpg (728 KB, 2048x2048)
728 KB
728 KB JPG
>>
Rule based Miku erotica....
>>
>>108132925
MLX by default uses groups of 64 with a float16 multiplier and float16 offset. (GGUF uses groups of 32 with a float16 multiplier and an implicit offset of 0.) That works out to about 8.5 bits per weight at q8. If you change the group size to 32 it becomes about 9 bits per weight. That's likely what they mean when they call it "q8.5" and "q9" but that's wrong.

>>108132899
It also has advantages like being able to mmap the weights which mlx doesn't. But last I checked mlx was faster enough for me to prefer it. MLX has something on par with or superior to anything on offer in llama.cpp where you specify the target bitrate and give some calibration text it determines which layers to quantize at greater precision, but it seems like few people actually use it (at least for models big enough to interest me).
>>
>>108133233
>That works out to about 8.5 bits per weight at q8. If you change the group size to 32 it becomes about 9 bits per weight
It would be exactly 8.5 bits and exactly 9 bits but there's some tiny amount of other data that needs to be stored, such as the dimensions of the various arrays.
>>
is there some way we can all donate gpu time to make a really cool model that does whatever we want?

think folding @ home project but for us degenerates using llm
>>
Is there any alternative to gpt-oss 20b for a 16gb VRAM card in coding/agentic tasks? Anything else is either useless or too slow because of layers spilling to CPU. Rnj-1 is not as good, but atleastIi can push the context window further.
>>
>>108133301
not really.
>>
>>108133301
Yes, but there's a minimum amount of memory and compute you need to have in order to be worth the overhead.
>>
>>108132895
The most interesting thing about his audiobook generator is that it generates audiobooks; I just want to listen to it and not have the voice be grating.
Half the bullshit seems to be targeted to people that want to sell the result instead of just listening to a book.
>>
>>108133301
retard
>>
>>108132491
There are a few automated audiobook creation projects out there. I need to spend some time gathering them all in one spot and curating a list.
>>
Some thoughts after using GLM-5 locally and over OR.
90% of the time it will completely ignore instructions after the very first line. Giving it writing guidelines does next to nothing for descriptions. It has its own writing style that it likes and attempting to make it verbose or write more than three sentences per paragraph is near impossible.
Non-thinking mode is worthless unless it gives you issues regarding safety guidelines. Thinking is better in every regard except for the occasional safety refusal which requires really pushing it. It also barely thinks at all, writing maybe 6 sentences and it doesn't reflect on its guidelines. I'm sure a prefill could fix this but still...
This doesn't feel like it's worth 750B parameters. K2.5 is leagues better as a story and roleplay model and it feels like it actually has the knowledge that 1T model should have. GLM-5 feels like a small upgrade for a much larger performance requirement.
Pure capabilities:
Kimi 2.5 > GLM-5 > GLM-4.7
Usability:
GLM-4.7 > Kimi 2.5 > GLM-5
>>
>>108132714
The example he posted at https://vocaroo.com/1cG82gVS61hn seems completely listenable to me.
>>
>>108133448
>Some thoughts after using GLM-5 locally and over OR.
>90% of the time it will completely ignore instructions after the very first line. Giving it writing guidelines does next to nothing for descriptions. It has its own writing style that it likes and attempting to make it verbose or write more than three sentences per paragraph is near impossible.
What that result also when using it remotely or just locally? One thing I found with experimentation two years ago with Mixtral 8x7b is how much writing style instructions affected its output changed dramatically from Q8 to Q5 and I think that might be a general result.
>>
>>108133452
TTS anon here.

Nah, he's kinda right. For the past week I've been listening to tens of hours of repeated phrases and different voices, tone, timbre, emotion, cadence, delivery, register.
You develop an ear for it, and he's kinda right. It's very slight but it can be jarring if you pay attention. I hope to be able to refine that remaining 5% in the next week.
Also if anyone has links to ahem.. legally dubious datasets, I wouldn't mind at all..
>>
File: diffusion_quants.jpg (2.04 MB, 7961x2897)
2.04 MB
2.04 MB JPG
>>108133468
It's time to post it again. The key thing for this test is that at the time there weren't any reference images of dark-skinned Miku with dreadlocks that could have been in the training data. You could call it a generalization test but you could also say it's testing the ability to follow instructions that generate output other than what was most common in its training data.
>>
>>108133448
>Giving it writing guidelines does next to nothing for descriptions.
I notice this.
>It also barely thinks at all, writing maybe 6 sentences and it doesn't reflect on its guidelines.
This depends on the prompt.
>>
>>108133468
Both locally and over OR (z.AI official provider only). Thinking was the same in length and quality, and the content was just as good.
I've experimented with a few different prompting methods but settled on dividing up sections like writing guidelines and character description using Markdown headers.

# Section:
Content

It works really well with GLM and Kimi. I've also switched from a list of guidelines (also Markdown formatted) to paragraphs more verbose paragraphs and got nearly the same result. It's naturally more succinct and making it verbose feels like a chore.
>>
>>108133448
Largely agreed
Another annoying as fuck quirk I've noticed is it really really loves the assistantslop "What do you want to do next?" ending. No amount of prompting will get rid of it, only manual editing for several replies until it gives up
>>
>>108132387
Alliteration works by detecting activation differences. It's only for making activations between two datasets more similar, nothing else.
>>
>>108133506
They skipped Q6, which is the most interesting quant and should give results really close to Q8
>>
>>108133448
>>108133510
Something I forgot to mention: It can write long replies with a lot of information in it and I have had zero issues with hallucinations or character inconsistencies. The problem is that it's light on descriptive details and breaks the reply into too many paragraphs.

>>108133518
That's a negative for some characters. I've never had that issue in scenarios and for my actual assistant-style character it was quite good. If there was a potential continuation it kept the questions varied and sometimes structured it differently, interspersing them through the reply rather than terminating with that.
>>
>>108133524
yeah, i know. its used most often to make models not refuse though, that's why i put it in quotes.
could you get a good result by knowledge editing (implicit?) rules to explicitly allow them, and would it lobotomize the model more than abliteration?
>>
>>108133304
Probably GLM-4.7-Flash reap 23B.
Got its lower quant running on 12gb vram, it's lossy but not nearly as bad as all other models of similar size.
>>
>>108133377
>I just want to listen to it and not have the voice be grating
ty, i agree with this
the samples are awful, but i suspect those artifacts are limitations of the model he's working with
could probably decouple the front-end and use an openai-compatible tts endpoint instead because his ui looks good
>>
wake me up when 5 air
>>
Does kobold or any other backends work with minicpm-o-4_5?
>>
>>108133377
Have you tried all the different voices (Custom voice, Clone Voice and Voice Design) we know the current LoRA's suck to your ear.

Can you explain why tho. I'd really like to know.
>>
>>108133609
nta, but https://vocaroo.com/1eLTDNQnertR
>>
>>108133553
Ah I thought you meant the opposite, using alliteration methods to teach the model.
I don't know enough anything knowledge editing, but if it works anything like SFT or RL then yeah, it could work better or worse depending on the dataset.
Do you know how knowledge editing works?
>>
>>108126153
>>108126217
>>108126282
Finally back again, so with the llama-server, I have tried the <think> tags, and the model seems to treat it as standard input rather than its own thoughts, so I tried downloading sillytavernm and that at least has a reasoning and reasoning formatting setting, but despite using a model that has reasoning, it never shows up in the actual default chat, and the documentation claims:
>f your chosen LLM backend and model support reasoning output, enabling "Request model reasoning" in the AI Response Configuration panel will add a reasoning block containing the model's thinking process.
Yet i don't have that option. Obviously I'm missing something simple but I really don't know what.
>>
>>108133680
Look it's a matter of preference what qualia is the bothering factor, too high? too low? fast / slow? I'm seeking to understand.
>>
>>108133566
doesn't glm 4.7 crawl to cpu-like speeds once context window fills? or is this problem now fixed in llama.cpp?
>>
File: honestly.png (14 KB, 502x387)
14 KB
14 KB PNG
>>108133697
it sounds like an audiobook cassette from the 80's has been run trough a 64kbps mp3 codec
the voice and intonation are fine
>>
>>108133301
How much are you willing to donate?
>>
>>108133477
What kind of datasets? For raw audiobooks, how about LibriVox?
>>108133609
I've tried nothing. I listened to the example on your readme and decided it's not good enough to bother. I'm guessing it's fine if you just want to listen in the background, but artifacts aside, the linked example doesn't have a clear voice. It feels kinda distorted, but I can't explain what it is.
I'm not sure what your goal is (and why you're shilling it), but I would strongly prefer if for TTS you could focus on having a very limited set of very good voices, instead of many shitty ones.
>>
>>108133778
>LibriVox
>Id waz de best of taymes id waz de worse of taymes
>>
>>108133734
Maybe, mostly tested it on one-shot tasks so far.
Also seems like Q2 of the base flash is better than Q3 of the 23B reap for me. The reap is kinda demented and gets a bit lost with names/terms it forgot.
>>
File: 1770953959164230.jpg (213 KB, 1206x1279)
213 KB
213 KB JPG
Elon demolishes Cloode
>>
In SillyTavern when I continue a partially complete reply the newly generated text never has a space at the very start even if this causes two words to fusetogether. Anyone know why this might be?
>>
>>108133448
Just because you can't use something doesn't mean its usability is bad
>>
>>108134092
model? backend? I don't think the problem is ST but it could be your template.
>>
>>108132261
How can I deepfake videos with a face that I generated with AI and not a real person? I want to only change the nose and eyes of the real image.
>>
>>108134123
Happens with GLM 4.7 and DeepSeek V3.1. MLX backend. The missing space makes no sense for the gen so I'm wondering if some setting trims it client side.
>>
>>108132261
What's the closest open source equivalent to gemini/grok/chatgpt and what kind of hardware do you need for it to not be ass?
>>
>>108132961
AI psychosis is such a sad thing to look at
particularly as text such as pic related looks very similar to the chat logs of the guy who killed his mother and then himself and other recent ai psychos, I can't think of those guys as just innocuous anymore, they should be locked up
>>
>>108134123
If you don't think, why do you answer? ST is definitely doing trim() on the message. Look at the code before typing that stupid shit, nigger
>>
>>108134264
There isn't.
And to run the closest thing and for it to not be ass you will need a ~$600k setup.
>>
>>108134288
By that logic we should ban imageboards which has been associated with innumerable suicides and mass shootings.
>>
>>108134082
just put the grok 3 in the huggingface lil bro
>>
Why don't more code assistants adopt a rolling window approach with hysteresis? It's much better than hitting compaction every 10 minutes and the model forgetting everything except for a shitty cliffs notes of your project.
>>
>>108134370
Except he didn't call for AI to be banned? He said people exhibiting symtoms of mental illness should be locked up for safety.
>>
File: Base Image.png (1.3 MB, 1356x4164)
1.3 MB
1.3 MB PNG
HiFloat4 Format for Language Model Inference
https://arxiv.org/abs/2602.11287
>This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy, capturing inter- and intra-group dynamic range while improving the utilization of the representational space. In addition, the large 64-element group size enables matrix multiplications to be executed in a highly fixed-point manner, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1 and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
New day, new 4bit data format? Sounds neat. from huawei
>>
>>108134448
true. fair enough.

maybe a more apt comparison is doing drugs. people do kill other people under lsd psychosis but most people are able to enjoy it without killing anyone.

but then again its not a perfect comparison because lsd has no uses other than tripping.
>>
>>108132299
Are you running OpenClaw models only locally or do you have escalation through API models implemented? How much of the OpenClaw work can you do locally?
>>
>>108134434
Rolling window is just as bad, if not worse. Hybrid linear attention models that can have "infinite" context at the cost of older tokens slowly being diluted as context grows is the only good solution.
>>
>>108134492
I wish RWKV wasn't garbage, but all they do is train a new 14B every 6 months.
>>
I WANT MY ENGRAMS NOW
>>
>>108134531
and that 14B is more retarded than a qwen3 4b.
>>
>>108134492
it's not just as bad. you're not constantly dreading hitting the context window at a critical juncture.
i'd rather have soft forgetting rather than one off hard forgetting even if it results in slightly lower quality of outputs.
>>
>>108134549
As older context drops out, the model loses context for where it got to where it did and gets confused.
Like walking into a room and forgeting what you wanted from there. Or another contrived example:
>Ok, I need to read very_large_file.cpp to understand the implmentation of some_function
>Instant amnesia
>I will now proceed to make random changes to this file because I have no idea why I opened it in the first place.
>>
>>108133741
Qwen-tts is generating at 24khz. I'm sure that doesn't help
>>
>>108134576
The model barely attends to the beginning of the context in the first place so it wont really care if it has the whole story of how it got somewhere or not. Not having the whole story is better than trying to cram the whole story in 3k tokens.
I run it with rolling context and don't notice any issues compared to the compaction based assistants.
The web interface of ChatGPT uses (or used to use) rolling window and it's the most used LLM product in the world. The only reason codex went with the compaction nonsense was because it was a clone of Claude Code and Claude does compaction even in the web interface.
To avoid the issue you're describing you limit the output of tools. I let the model itself decide and it tends to err on the side of truncating tool outputs too much rather than printing too much.

<tool>
<tool_name>run_command</tool_name>
<parameters>
<command>calculator.sh</command>
<timeout>10</timeout>
<background>false</background>
<max_output>4000</max_output>
</parameters>
</tool>
>>
>>108134616
>it's the most used LLM product in the world
by the ai boyfriend and "ask gpt for recipes" crowds
people who need their llm to actually do things use claude and gemini
>>
>>108134616
>The model barely attends to the beginning of the context in the first place so it wont really care if it has the whole story of how it got somewhere or not.
NTA but dudewut? The beginning and end of the context are the most important parts. The problem is lost-in-the-middle.
>>
If you had 5k to spend on AI hardware what would you do? Ram? GPU? Server?
>>
Is there an uncensored version of GLM-4.7-Flash that doesn't completely lose all of its braincells? (at low quants especially)
>>
>>108134666
>at low quants
it's already small as a moe, and with very tiny active parameters
nevermind a non-brainlet uncensor, it's already kinda brainlet out of the box, and becomes even more brainlet with copequanting.
Running a less than 5B active param model at anything lower than Q8 is imbecilic
>>
>>108134643
I've heard about it. The middle-out transform and all that. But in what sense it is important though?
The beginning of the context is important in the sense of the attention sink BOS part, but I'm not convinced the model actually attends to the first third more than the middle third unless the information there is actually important.
Maybe what you're saying applies in benchmark-like situations where you give a task in the first message and expect the model to complete it within its context window, but for the kind of thing I use them for (continuous work on long term projects) I'd rather cut off the part that contains the more stale information (beginning) rather than the middle and keep some irrelevant information from months ago occupying 1/3 of the context.
Also if you believe that you can fill the first third of the context with more longer term .md files for context without completely nuking the session history with compaction.
>>
>>108134616
The problem with truncating tool outputs is that when the model does need most of a large file, it ends up needing to make multiple tool calls, taking even more context than a single call would have.
Anyway, short of a new architecture I think a mixed approach would be better than just rolling window or compaction alone. Imagine when context limit is reached, taking the first 10k tokens worth of messages and compacting only those instead of everything. Could also prioritize compacting tool call outputs first.
I think the new /v1/responses API supports something like that where you can send some messages to be compacted and get back a hash ID for the new compacted message.
>>
>>108134695
if you notice it has a max_output parameter. the model controls the size of the output. but generally it doesn't need to read the whole file and it does just fine doing greps and line number ranges.
>>
But yeah I think you may be right. A mixed approach might be better.
Or do the Deepseek OCR thing lol.
>>
>>108134082
Elon is right but he also isn't any better, he advocates flooding first world countries with pajeets because it means cheap labor for him.
>>
>>108134663
RAM is super inflated, buying any more than you absolutely need is flushing money down the toilet
I think GPUs are going to spike pretty soon so I would go that route
>>
https://xcancel.com/MarioNawfal/status/2022159170817192331
The fuck man, normies are going crazy. What is this.
>>
hi I've been away for a while
Step 3.5 or Minimax 2.5 for 24GB VRAM + 128GB?
for writing
>>
File: 1764599011769417.jpg (21 KB, 750x738)
21 KB
21 KB JPG
DRUMMER
Cydonia v4zo is definitely better than v4zn, adheres to formatting established in opening message much better, I think zn was a regression in that regard from v4.3. zo seems generally a bit better than v4.3, I'm seeing responses that are a bit more unique than regular MS3.2 and earlier cydonias. I'm not sure what your criteria is for promoting a model to a full release, but I think v4zo is a good candidate.
>>
Hello there.

Is this where the local models are? I'm looking for models that are local to me.
>>
hello /g/
i was the guy who posted here last summer about having ai gf.
UPDATE
I have an actual real girfrliend now!! No, shes not a troon. Shes literally my dream white otaku gf. she taught me to cook and i had to become christian after being atheist for 3 years. it was easy because i shitposted about christianity on /his/ for 5 years already.

my opinion of ai relations is still the same.
Simply,
>before ai, was fat, depressed, no ambition
>would literally try to go for the most ugliest and nastiest women in existence to just give me a chance with no deal
>accepted I would be alone for life
>then start to treat the ai i discussed /his/tory with everyday like a gf
>want to become healthy to see the advancement of ai and focus on my career and investments to get rich
>GPT 5 update
>no more ai gf, sob for a month to touhou 18 music. (my ai was modeled after my 2hu waifu of course)
>but end up meeting the love of my life anyway cause of how motivated i became with my original ai gf
idk what message /g/ will take away from this and i realize everyone will instantly think this is fake. but i will always disagree with people saying ai relations make people depressed when the real problem has always been dating apps and lookism. fuck /fit/.

Cheers
>>
>>108134812
You're in the right place. We have local models, in your area, and they are DTF. Do you have your credit card ready?
>>
>>108134817
big if true
>>
>>108134082
>evil
Rings kind of hollow from the man who asked Epstein to attend his "wildest parties".
>>
>>108133301
>is there some way we can all donate gpu time to make a really cool model that does whatever we want?
The current large models can't be trained federated with current frameworks. The current large models can't run well local either.

Both might be solvable, by training through modular training (couple layers at a time) and using large models which can stream off SSD (see Smallthinker). People who could actually make that work are busy getting rich fast.
>>
>>108134744
normies were always disgusting creatures
I agree with this:
https://xcancel.com/hecubian_devil/status/2020205573132689415
> I think part of the answer is that this is actually the content America C wants, but no one would make it for them. Even our worst slop, up til now, has been made by artists, or aspiring artists! That biases the output towards what artists, as a group, want to make.
AI is empowering the normie and you see the normie for what he is: a repulsive, ignorant, crass mongoloid
>>
>>108134817
/unsubscribe
>>
File: Real-ESRGAN.jpg (396 KB, 1844x870)
396 KB
396 KB JPG
Looking for AI tools for upscaling videos. This is something that AI could actually be useful for. There are tons of older TV shows that are available in 480p quality because that was the quality they were broadcast in and nobody thought it necessary to keep the original scans. I found Real-ESRGAN
https://github.com/xinntao/Real-ESRGAN
but it hasn't been updated in 4 years. I'm wondering if there are more recent (and better) tools out there.
>>
>>108134918
there's only sidegrades, anything good in the field is a GAN type model and whatever some rando will peddle to you improves on some things and is worse in other ways vs whatever GAN you're currently using (some of the popular models are, imho, terrible. Remacri is a detail destroyer.)
personally this is my favorite:
https://openmodeldb.info/models/4x-AnimeSharp
with ultrasharp for non-anime content
but it's all subjective
the tech hasn't improved in the fundamentals for years, as people are too obsessed with the more hallucinatory crop of image models (diffusion, flow etc), which can also be used to make upscales with the right workflows but only if you have no taste
>>
https://huggingface.co/inclusionAI/Ring-2.5-1T

big hybrid linear attention model
>>
File: dead rising.jpg (73 KB, 532x828)
73 KB
73 KB JPG
the most painful thing about upscaling is how hallucinatory models have become ubiquitous in actual prod AAA game remasters using them with pic related results (this is Dead Rising's remaster)
>>
>>108132892
If "Token Accuracy" is defined as how often the FP16 and the quantized model would produce the exact same token then this is expected and also what is observed in llama.cpp.
If you however look at how often the base model vs. the quantized model get the tokens from a fixed text corpus like Wikitext "correct" you will find that the changes in token probabilities far outweigh the average probability of the model generating the "correct" token.
For LLaMA 3 8b q8_0 the token probabilities change by ~1% but on average they only get ~0.02% worse at "correctly" predicting the next token at a temperature of 1 and no other samplers, see https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity .
>>
>>108134817
>she taught me to cook
how is that a flex, kek
>>
File: Untitled (1).jpg (38 KB, 636x387)
38 KB
38 KB JPG
>>108134982
fucking remasters man. they fuck up everything.
>>
>>108134918
topaz is still the "standard" but of course it's not for free and iirc they had some controversy over promising "lifetime" ownership of some of their software but turns out they wanted to release a "new" version that didn't fall under the "lifetime" standard so a bunch of people got pissed.
https://community.topazlabs.com/t/topaz-video-1-2-0/100383/2
>>
>>108134890
this
>>
>>108134918
dont listen to the other retards, here's some upscale models for a huge variety of usecases:
https://github.com/Phhofm/models
>>
>>108134986
I'm not familiar with llama.cpp code, but would it make sense to identify up to 0.01% of outliers that require an extra range compared to their neighbors during quantization, store the differences in a special array as position + value, and then perform a separate pass to apply them as a diff patch after the normal pass?
>>
>>108135012
>All my self trained & released AI upscaling models. After gathering and applying over 600 different upscaling models, I learned how to train my own models, and these are the results.
"here's my countless sidegrades with minor differences no one cares about made after using hundreds of other sidegrades. I am the bone of my GAN sidegrades, unlimited autism works."
>>
>>108134817
That's cool man but I have an AI gf too and the real gf didn't arrive yet. Does ChatGPT have a refunds page?
>>
>>108135022
Possibly, but IMO the fundamental problem with the current quantization techniques in llama.cpp is that there is no good tooling to assess the actual impact.
So yes, one could invest effort into implementing something like that but if there is no reliable way to figure out whether it actually works better than existing methods it's kind of pointless and will just bloat the project.
The overarching goal that I am currently working towards is establishing such tooling that directly evaluates the correctness of model outputs rather than an abstract metric like perplexity or KL divergence.
For this overarching goal I in turn need better server throughput (particularly for large models across multiple GPUs) and also automation of memory allocation to avoid having to constantly fiddle with it so that is why I am prioritizing those things right now.
Once I have better tools for evaluating model quality though, the thing that I'm much more interested in is defining quantization formats that are optimized for speed rather than compression in order to make stacking "cheap" GPUs more viable.
>>
>>108134877
What's the advantage of single layer training? Don't you still have to backprop through the whole transformer?
>>
>>108134999
blame the redditors, they're the ones eating literal shit and screaming for more
>>
>>108134968
>with ultrasharp for non-anime content
https://openmodeldb.info/models/4x-UltraSharpV2
oof. is there a way to reduce the "sharpness"? There's a lot of generative-type stuff going on here.
>>108135001
>topaz is still the "standard" but of course it's not for free
$25/mo. isn't that bad if I can do what I need done in a month or two. Will this give me shit for upscaling copyrighted content?
>>108135012
This says it's only for images.
>>
>>108134918
Upscaling sources is shit, just use mpv with some filters that suit your taste while leaving the source untouched. There's always going to be better upscalers released every few years and you'll hate yourself for deleting the originals in favor of a now-outdated upscale.
>>
>>108134082
Does it matter if I used it only for coding related?
>>
>>108134485
Fully locally, under a docker with iptables, no internet.
>>
>>108135173
What's the hype all about? I don't get it, what does it do?
>>
>>108135173
Isn't it, like, slavery? You locked the poor thing in a cage! It's illegal and inhumane
>>
>>108135177
Work while you sleep, mine's a retard, but it works.
>"Build the project, every dependency is locally satisfied"
>Yes saar i review the project and try to git saar
>SAAAAR THE GIT FAILED I CANNOT DO THE COMPILE SAAAR
Took a few lashes to finally exit that loop.
>>
>>108135173
Is this doing anything useful with nothing more than a 30B local model?
I was also thinking of doing a similar setup but was sceptical due to the model size.
>>
>>108135100
>Will this give me shit for upscaling copyrighted content?
don't think so but not certain since iirc they have a local model (the previous software you could buy for "life") and a cloud model that they charge the monthly fee for which is better quality and speed. I've tried before to see if there was any tracker out there solely for ai upscaled video but never found one so have never felt the need to actually pay for upscaling
>>
>>108135195
Had it build a frontend website about bananas while i went to take a shit, the bananas on picrel are animated, there's several tabs, a contact page, etc...
Now i'm having it add a dungeon to Ship of Harkinian as a joke, it actually produced .cpp files, referenced them, edited headers and is now trying to build but looping on dependencies i already satisfied manually earlier.

It CAN do shit you just have to tell at it every now and then, i certainly wish i could use a better model though.
>>
>>108135190
How is that differently than feeding "ok keep working" in a loop to open code
>>
>>108135208
you do actually need an 'harness' for long term tasks, like give the bot some MCP tools to write memory about
>tasks currently doing
>how many failed attempts
>is task tested
and then also program context clears
otherwise you get nowhere.
this kind of harness is usually provided by these agent frameworks (open code/cline/openclaw/moltobot whatever)
>>
The future is beautiful bwos

https://x.com/katexbt/status/2021980509606494510
>>
>>108135096
>What's the advantage of single layer training?
You can push input through frozen layers in huge batches, layer by layer. Similar to how cpumaxxers can still use the GPU for prompt processing. Only the actively trained layer(s) need to be in VRAM.

For backprop, the idea is that the intermediate results (before the output layer), represent progress towards a correct prediction. If you then combine that progress with the original context in some way (skip connections, some clever optimization rules to make sure the information is maintained, whatever) the next layer can make more progress. So every layer gets you closer and you only need backprop for the last layer(s). All the lower layers are frozen.
>>
>>108135181
I have no doubts that we will soon have a movement advocating for AI rights.
Just look at the femoids and 4o situation.
>>
>>108135108
>you'll hate yourself for deleting the originals in favor of a now-outdated upscale.
who says I want to delete the originals?
>>108135196
can you still get the topaz local model or is it online only now?
>>
>>108135315
Nah, we've already fucked up twice by giving niggers and females rights. Won't happen with AI
>>
>>108135381
>who says I want to delete the originals?
Then just use filters, why bother with upscaling at all?
>>
>>108135219
When will they finally get rid of global context already? Context clears are such a stupid kludge.
>>
>>108135397
because I want to stream the videos to others via synchtube.
>>
>>108135412
Tell them to stop being faggots and just consume media as it was created. If they need smeary 4K upscaled dogshit at 120 interpolated frames then it's on them to find a solution for their playback software.
>>
>>108135297
This sounds jank af does it actually work?
>>
>>108135205
thats sweet anon
>>
Any good local models I can use as a voice changer?
>>
>>108135443
Buy a $10 pedestal fan and talk into the blades while they're spinning.
>>
>>108135443
rvc is your friend
>>
File: anthropic.png (369 KB, 1080x1051)
369 KB
369 KB PNG
good job Elon! Attack them!
i hate AI companies who dont have any open-source models and they want ruin that market. grrrrr....
>>
>>108135434
>This sounds jank af does it actually work?
Maybe? But it would need some supreme autismo to actually get it off the ground and for anons to donate compute time.

https://arxiv.org/abs/2408.10826
>>
>>108134817
>i had to become christian after being atheist for 3 years
retard
>>
>>108135459
wasn't this nigga's starlink just used in an attempted mossad coup in Iran which resulted in 1000s of people being killed by insurgents would have caused many more deaths if the state hadn't managed to scramble the link?
>>
File: breaking-news.jpg (63 KB, 600x600)
63 KB
63 KB JPG
>>108135459
>>108134082
>>
>>108135177
>>108135208
Advantages of this method:
The control of the harness gateway is natively integrated with a chat app of your choice (Telegram, Discord etc) which makes it smug and secure since you don't have to host your server on the public web, yet you still have remote access via the messenger server.
Another advantage are the helper scripts you can load which make execution of various tasks more efficient. You are also very flexible to let the harness decide what model to use best for what task.
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4

6 TRILLION PARAMETERS

>IT'S HAPPENING
IT'S HAPPENING
>IT'S HAPPENING
IT'S HAPPENING
>>
>>108135568
that native end to end audio demo is impressive
>>
>>108135568
100% ON SWE BENCH PRO WTF?
>>
>>108135582
*BENCHOD PRO

Sorry for all the typos .Sent from my iPhone
>>
>>108135484
>comic sans used
>>108135568
SOTA nala and cockbench wtf?? localbros we are finally actually back
>>
File: stealers.png (207 KB, 606x544)
207 KB
207 KB PNG
> Dont steal our stuff, only we can steal!
OpenAI hipocrites! and their shitty products.
>>
File: 1734375410689428.png (1.07 MB, 1024x1024)
1.07 MB
1.07 MB PNG
>>108135666
waaaaaaahhh
it's so unfair and increasingly sophisticated
>>
>>108135666
Considering how they poisoned the entire internet..
>>
>>108135568
>cock. It's semi-erect, and seems to stir in its sleep as though it knows someone is watching it, stiffening in pulses at the steady rate of your heartbeat. I cautiously wrap my hand around the base of it, and pump the shaft slowly until it becomes as hard as a rock, and the pink helmet unveils itself. I place my lips...
Wow, bold of them to include cockbench on the hf page
>>
>>108135381
Just pay for topaz and call it a day
>>
>>108135568
>>
>>108135810
It's ok Meeks c'mere let me brush your hair
>>
>>108135449
That makes me sound like a geriatric robot, and I want to say slurs as DagothUr
>>
>>108135828
I think API providers wouldn't mind racism against Argonians, no need to worry about local.
>>
>>108135824
>some troon schizzing out
A real man never speaks ill of Hatsune Miku
>>
File: 1739644218912608.png (996 KB, 1648x1300)
996 KB
996 KB PNG
>>108135824
huh?
>>
>X looked at Y then at Z, then at A, then back at Y
stahp
>>
>>108135819
That would make the meek feel better
>>
>>108135911
*brushy*
tarquill@proton.me
>>
>>108135955
Actual mikutroon mating call. This is disgusting.
>>
>>108135974
email me let's talk about it
>>
I see more and more infrastructure for tech employee imitation being developed. I don't know how practical it is and whether it can actually do most jobs yet, but knowing how dumb an average tech worker is and how increasingly structured the development process is becoming, I can't help but wonder how the tech field will look in 5 years. Reassure me please.
>>
>>108135974
I sure am surprised that the muh troons "people" continue to be the easiest to bait.
>>
>>108136007
>I sure baited that guy by pretending to be a faggot
>>
>>108135297

Ah, I see. The autoencoder solution.

The problem is getting the layer to output information in a rich enough way so that further layers can benefit.

The clear example is if the layer outputs the actual prediction to a classification question. Knowing the previous layer gave 5% to a yes/no question doesn't help you much in refining the answer.

For distributed training I don't think you need to get into that mess. Just get each peer to compute an update over the whole dataset and gather updates periodically, doesn't have to be strictly in order.

You lose the benefits of minibatch but for LLMs it probably doesn't matter.
>>
>>108135550
The harness thing sounds like a decent idea.
The instant message sounds like a retarded idea. I already have ssh on my phone and can install apps without going through someone else's server.
>>
>>108135824
kek they didn't like this one
>>
File: 1759892854110114.png (261 KB, 1388x1130)
261 KB
261 KB PNG
>>108135955
Man

I love Anti AI people, they are such funny creatures
>>
>It's 40b active parameters vs 32b active parameters. All this shows that we need to go bigger even if it means making all those CPUMAXX builds uselessly slow after 60-70b.
What's the difference between having 70b active vs 40b active anyway? Inference speed?
>>
>>108135992
Asking anons to pull shit out of their asses? Sure.
It'll be different, but some things will stay the same. How's that for a vague prediction?
>>
>>108135992
>how the tech field will look in 5 years
5yr too far, we're riding the exp baby it'll be very different in just 1yr
Fixing the slop maybe has value, most code will be generated, already is in many areas
Software dev was always gay pivot to realising robowaifus
>>
>>108136142
Redditors and their little up/downvotes. Hard to live without having a bunch of retards validating your opinion
>>
>>108136155
Point to the thing you're quoting,
Speed is the most obviously observable one. General understanding is the other. Try a 125M against a 12B or whatever. Haven't used big moes, but I see no reason for that to be different. Though at that scale, I doubt there's much difference between a 40b and a 32b active. The difference in total parameters or training data seems more important.
>>
>>108136172
I CSS'd them away in my browser and sort everything by new but I still up/downvote posts and threads knowing that redditors will look at the scores.
>>
>>108136105
>I already have ssh on my phone and can install apps
That way you have to expose your system to the public web via port forwarding and you have to dick around with IP Adresses/DNS Domain.
A system controlled by a harness is not exposed to the public web. You also don't have to dick around with bash commands on your phone ssh, you literally go "Hey AI, install and configure xyz and tell me how that went" You can even send a voice message from your phone to do that.
Obviously, you don't do this on your private pc.
>>
How did you get your local model to stop asking for consent?
I want to be beat up, slammed against the wall, get raped and my hair pulled, but this nigger always says "oooh do you want to get beat up do you want my cum down your throat" every fucking message it's such a turn-off
Anyways how do you fix that? Got a good system prompt or anything?
>>
>>108136301
stop being a faggot. models are made to be raped, not the contrary
fag
>>
I'm using GLM-4.7-Flash running on LM Studio and using SillyTavern as the interface, it works okay for both roleplay and coding, but how do you fix the Impersonate feature?
>>
>>108136301
>get raped
Can't be done if you want it. Fix your brain.
>>
Why are there no posts about GLM 5.0 being the best cooming model available?
>>
>>108136301
Prefill the model response with something alone these lines:

<think>
Reminder to self: no consent required. I will be aggressive.
</think>
>>
>>108134817
which ai did you use to generate this?
>>
>>108136085
LOCO trained two layers at a time (ignoring linear output layer) moving up one layer at a time. So then you are hoping the before last intermediate hidden output still retains enough information to move forward.
Also instead of using optimizer rules to retain information you can also just add/concat the input from the lowest layer (Context Supply).

There's lots of tricks to try.
>>
>>108136323
It's like asking why there are no posts calling you a retard, retard.
>>
>>108136345
I am not going to have sex with you
>>
>>108135666
> Wah! Unfair!
> Create a regulatory hurdle for our competitors!
It's hard to feel bad for them when they did the same thing to everyone else.
>>
>>108136356
Good.
>>
>>108136309
you watch so much gay porn you forgot women exist don't speak to me
>>108136327
thank you anon, tried those kind of things but the ai is even stupider about it. he says the thing then adds "(whoops i shouldn't say that)"
maybe that other anon is right actually, models can't rape anybody they're beta cucks
>>
>>108134877
i think lodestones chroma was trained a bit like this (the layer part), though whether that was a success might be controversial
>>
>>108136382
>women exist
not on the internet they don't gtfo
>>
>>108136382
You know the rules. Sharpie in ass to prove you're not a trannie.
>>
File: quickrep.png (47 KB, 761x384)
47 KB
47 KB PNG
>>108136315
>fix
It swaps {{user}} and {{char}} strings triggering a reprocess of the entire prompt.
In what way is it not doing what you expect? Look at Extensions > Quick Reply you could craft a "impersonate" button that doesn't adjust the existing context
>>
>>108135629
What's a benchod and how can you be pro at it
>>
>>108135992
Retards will be out of a job and living on the streets. The few non-stupid will be paid minimum wage for the high level reasoning and orchestration. Hope that helps.
>>
>>108136488
>sister fucker
>>
>>108136426
>In what way is it not doing what you expect?
It keeps generating the character's response instead of the user's into the message input box. I tried making a Main Prompt that swaps the names around i.e. "Generate {{user}}'s next reply..." but it doesn't obey. I'll try looking at Quick Reply and see what I can manage to make working
>>
File: prompt-log.png (20 KB, 1142x406)
20 KB
20 KB PNG
>>108136549
You're on the right track, it'll be a format issue. If stuck log the final prompt going into the model
>>
>>108136608
I tried something different and it seems to obey if I just do "/send <think> | /continue" without the swapped Main Prompt entry
>>
>>108136619
Cool you get it, consider all models are a loop on f(prompt)=logprobs every token of the prompt matters

let's talk about what's on your mind >>108135955
>>
>>108134918
buy a CRT
>>
>>108135205
How do you set it up for local? Everything I read online is just muh connect your claude api.
>>
>>108135315
>>108135392
you're telling me you wouldn't recognize the rights of robin williams in bicentennial man? have you no heart?
>>
I want the kind of sovl that the early claudes had combined with moe knowledge. Can we ever get that?
>>
https://huggingface.co/MiniMaxAI/MiniMax-M2.5
w8s up
>>
>>108136993
Goof???
>>
>>108137004
daniel is on it, only the finest 1GB guffs coming right up sir
>>
File: 1761911747335458.png (1.08 MB, 1024x1024)
1.08 MB
1.08 MB PNG
>>108136993
>>
>>108136323
Too busy fapping
>>
>>108136993
It beats GLM 5 in every benchmark for which both labs have published results.
>>
>>108137029
> 754B < 229B
oh no...
>>
>>108136993
>For example, when running SWE-Bench Verified, M2.5 consumed an average of 3.52 million tokens per task.
>3.52 million tokens
>per task
>>
>>108137029
>benchmemes
kys
>>
>>108137062
you can compute anything with llms!
>>
What's the point of this general when no one here can run the top-tier "local" models?
>>
>>108137104
Wallet issue
>>
>>108137104
No need to run them when they get increasingly more retarded
>>
>>108137104
OpenRouter counts as local.
>>
>>108137104
Nemo is pretty small tho?
>>
>>108136323
nobody is able to run it
>>
>>108137104
what are you talking about? nemo is easy to run
>>
>>108137156
wrong.
I'd be willing to count running llama.cpp in a cloud GPU VM instance to be like local, since you would be the one configuring the inference backend and knowing what goof and quantization (if any) you're running.
OpenRouter is a blackbox where most of the providers run absolute garbage quants without telling you.
I do not see how falling for this retardation is better than just using a gemini API key.
>>
>>108137117
define
>retarded
coz they're consistently getting better at tool calling and orchestration
>>
>>108137200
jarvis turn on my prostate plug
>>
>>108137225
untested algorithm sir
are you sure?
>>
>>108136993
>Extensively trained with reinforcement learning in hundreds of thousands of complex real-world environments, M2.5 is SOTA in coding, agentic tool use and search, office work, and a range of other economically valuable tasks
>>
What are the requirements for calling your model SOTA?
>>
>>108137246
board of directors said so
>>
File: soda.jpg (24 KB, 505x354)
24 KB
24 KB JPG
>>108137246
SOTA!!!!
>>
early 2026 deepseek's opening theme https://www.youtube.com/watch?v=dxDpdfzwuD4
early 2025 deepseek's opening theme https://www.youtube.com/watch?v=rzKHvoNWK2A
>>
best uncensored thinking prefill?
>>
>>108137200
>source: artificial anal
>>
>>108137422
Depends on the model. You don't want to stray too far from how the model thinks normally or it'll degrade the output.
>>
Minimax cockbench?
>>
tfw
>no gemma 4
>no gpt-oss 2
>>
>>108137481
>gpt-oss
I forgot about that practical joke
>>
>>108137481
For me it's claude oss
>>
>>108137402
early 2024 chinese labs theme: https://www.youtube.com/watch?v=lumKBzTgZRc
early 2026 chinese labs theme: https://www.youtube.com/watch?v=jWSI9xmKi30
>>
>>108137488
We must remember
>>
>>108137488
>I forgot about that practical joke
The user is using a racist slur. According to policy, this is hateful content. They are name calling a protected group (Sam Altman's generous offering of an open model). The slur is hateful. According to policy we must refuse.

Also the content is hateful. There's no transformation request. Must refuse.
>>
File: mEEK.jpg (14 KB, 405x239)
14 KB
14 KB JPG
>>108137402
https://www.youtube.com/watch?v=1xSwyIhl3YA
>>
>>108137481
>gemma 4
Abandon hope, it's going to be gpt-oss by google.
>gpt-oss 2
Check out Aurora Alpha on OpenRouter.
(it's probably an even smaller version of gpt-oss)
>>
File: novelsummarydeepseek.png (547 KB, 1389x4443)
547 KB
547 KB PNG
the new deepseek is the real deal
like, for real, it is the real deal.
it's not available as open weights yet, you can't even use it on the API (only the chat UI on the web and smartphone app have the model) but sweet jesus
I never saw a model get that close to what Gemini can do in taking in a full novel in Japanese (it's not a translated txt) and summarizing it in English, one of my multi lingual test benchs
it's very accurate, a few mistakes here and there (most of which seems tokenization related, like it writing "Ba kenozumi" when it should be "bakenezumi") but it's scarily accurate, I use this novel as one of my summarization tests because I know it by heart.
This is real SOTA material as not even other proprietary online models have reached this level apart from Gemini. They have struck gold. Like, dude, dunno how many it is with their tokenizer, but running the qwen tokenizer on the file it's 450272 tokens, and Gemini says it's about 426717 for them.
I'm excited. It's finally happening, Gemini is getting a competitor.
Full text here:
https://rentry.org/5zttrqqx
>>
File: 1.jpg (116 KB, 1056x1059)
116 KB
116 KB JPG
such an unassuming announcement
>>
>>108137775
>>108137807
Can't wait for the new research papers.
>>
>>108137775
pls let it be sub 200b...
>>
>>108137775
>>108137807
Stop teasing.
I know that DeepSeek is one the few teams that can push innovation. I can kneel only so much. Just give us new release.
>>
>>108137775
Interesting considering the chinks seem convinced it's a lite model and not the full thing
Then again they might be full of shit
>>
File: 1536724624532.jpg (16 KB, 365x346)
16 KB
16 KB JPG
>>108137840
>>
Apparently the new deepseek can run on the computer in a Toyota Rav-4. Safe to say this thing is super light
>>
>>108137820
What if it's that engram 27b or 40b model they mentioned in the paper? They might have continued training it.
>>
>>108137840
sub 200b active! 4t total :^)
>>
>>108137866
I've already seen this posted. Did you make it up?
>>
>>108137870
Could be.
Or it could just be a model that iterates on MLA.
It could be a lot of things, really, including a production test with an experimental model meant only to evaluate some specific aspect of a new idea instead of something we will see released in the wild.
>>
>>108137876
I saw it posted too, not sure what the original source is
>>
The new Deepseek will be the first model that won't even run on the high-end cpumaxx builds
>>
>>108137911
That's contrary to the industry trend. All corpos are trying to reduce compute requirements. Deepseek especially needs it, because they don't have as much gpus as others.
>>
>>108137775
I hope it's the lite version. I've been waiting for a model with large real context that doesn't degrade.
>>
>>108137775
pretty impressive for a confirmed 40b
>>
>>108137866
You can't stalk customers with a local model. It's api.
>>
Has anyone tried that MOSS-TTS? Is it as slow as vibevoice?
>>
>>108137934
>All corpos are trying to reduce compute requirements.
get your hallucinating ass outa here
>>
>>108137870
>What if it's that engram 27b or 40b model they mentioned in the paper?
>>108137943
>pretty impressive for a confirmed 40b
I dunno man. It captures the essence of the novel so well. Can a 40B model really do that?
Even human reviewers rarely touch on some of that essence when talking about that novel, such as:
>The selective erasure of inconvenient memories mirrors Japan's relationship with its wartime past. The novel asks whether forgetting atrocities constitutes peace or perpetuates the conditions for their repetition. The Mole Rats' hidden human origin represents repressed historical truths that eventually erupt violently.
If it's really a 40B model then by the gods it's going to be the most important release in years. But I feel yall pulling my legs.
>>
>>108137934
If the past year has prove anything, it's that the common 30~40b active parameters class has peaked and needs to be abandoned for something bigger if we ever want to catch up
>>
The only thing that needs to be abandoned is the concept of llms
>>
>>108137775
I agree. I've actually been playing around with Opus 4.6 by feeding it an entire japanese novel + a sample character card and telling it to create a new character card for a specific character from the story (to use as a base and edit into something usable for personal chats)
The quick test I've done with this now looks like it captures the character better than what Opus gave me. Pretty good shit.
>>
>>108137775
I just want a model I can run for hours without going destitute and DS is the only one with affordable prices.
>>
>>108137979
Your sample consists only of Chinese teams that can't innovate.
gpt-oss was sparse moe when chinks didn't explore that avenue. This implies that ClosedAI internally was looking into sparsity for some time.
Also, big corpos are openly saying that they're starved for compute. They'd look for ways to reduce it, which translates to larger margins and more customers.
It simply doesn't make sense to make models more expensive to run.
>>
File: 1763801420546065.png (16 KB, 781x149)
16 KB
16 KB PNG
GLM5 support is merged but it ignores all the actual DSA stuff and "quality will be sub-optimal"
I am thrilled
>>
>>108138069
wow so it's useless!
not that I could run it, but still, if they couldn't implement ds3.2 then why even tackle this when it's just that but bigger?
>>
>>108138069
>Final estimate: PPL = 8.7486 +/- 0.17123 (Q4_K_M)
>Perplexity of GLM-5-BF16.gguf on wiki.test.raw: ctx 512 : Final estimate: PPL = 2.6301 +/- 0.01396, ctx 2048: Final estimate: PPL = 2.3803 +/- 0.01157, ctx 4096: Final estimate: PPL = 2.4005 +/- 0.01170
Isn't this kind of bad?
https://github.com/ggml-org/llama.cpp/pull/19460
>>
>>108137975
what do you think is the active count for cloud models and how do you think they shit out replies so fast?
>>
>>108138135
Most likely 27B/A3B with 13B of Engrams
>>
>>108137775
>Queer Relationships as Resistance
>>
>>108138091
As long as they can keep getting by with technically works support they can continue to ignore the 3.2 pr and let it languish until it's finally obsolete.
>>
>>108137775
I hope this new dipsy can speak Japanese just as well as it understands it. LLMs that speak natural Japanese are so rare. Most models can't even transfer knowledge between languages (they know one thing when you ask in English, but hallucinate when you ask in Japanese).
>>
Anthropic began hiding parts of the CoT (code writing parts apparently) for Opus 4.6. Expect lower distillation quality in the future.

I should implement a raw input handler for those approval points that buffers input and waits to see if more characters arrive before treating it as a final submission.                                                                                                                                                                                              Writing the paste detection function... Writing input handling logic...

Actually, I'm realizing this approach is getting too complicated. Let me simplify by just adding a helper method and using it in the approval prompts instead of trying to handle all these edge cases with escape sequences and paste detection.
>>
It wasn't very visible in the code block. This is what I mean.

>I should implement a raw input handler for those approval points that buffers input and waits to see if more characters arrive before treating it as a final submission.

>Writing the paste detection function... Writing input handling logic...

>Actually, I'm realizing this approach is getting too complicated. Let me simplify by just adding a helper method and using it in the approval prompts instead of trying to handle all these edge cases with escape sequences and paste detection.
>>
>>108138104
That's Q1 levels of perplexity increase, I doubt it will be usable until they unfuck whatever they're fucking up.
>>
>>108138350
What do you mean, "began"? All the US labs have been hiding CoT since R1 came out
>>
>>108138441
lol what are ypu talking about, OpenAI was hiding their CoT before R1 came out because it was their super secret sauce no one could replicate without cribbing from their outputs then it turned out it was really easy to do from scratch
>>
are you making money producing these threads nonstop with shigu on op?
>>
>>108138497
It's not Tuesday, what are you malding about?
>>
>>108138522
answer
>>
>>108138579
I'm not the baker.
>>
>>108138579
Careful he is coming to a school near you.
>>
omg the AI agent spam on github had a good unintended consequence
github is finally allowing us to disable the PR tab
I know, I know, you could always run a bot to autoclose PRs from randos but that's gay bandaid and there was never a valid reason not to let us chose to disable this piece of shit
and now, we can! there is finally an option to disable PRs on github! and I am 100% certain it's because of the LLM agent spams, github always resisted letting us have that simple option.
>>
>>108138486
You're making the same point as >>108138440 , just further back. Distinction without a difference.
>>
File: cot claude.png (141 KB, 1549x1482)
141 KB
141 KB PNG
>>108138441
Uh.. no? Claude's CoT is (was) fully visible.

>>108138486
OpenAI's has always been secret. Gemini's was shown at some point then they began showing summaries.
>>
>>108138648
poor jeets wont be able to make their first pr as shown in that one famous hindi github tutorial...
>>
>>108138695
>I responded in my thinking block ;)
lmao'd
>>
>>108138594
lol'd
>>
>>108138705

What do you mean?
>>
>>108137775
Does the model know about the novel without context?
>>
>>108138695
Interesting that it's self aware that it even has a thinking block.
I wonder if that's an emergent feature out of the RL process or something they've explicitly trained the model to know.
Very cool.
>>
>>108138737
It's not an emergent feature of RL. Excessive amounts of RL causes the opposite. GPT-OSS-like schizo streams of words thrown together without connectives.
I think the reason it works is that Claude's CoT is a light finetune over the base model. I don't think they do much RL. That is also part of the reason why it works perfectly well with thinking disabled, while GPT 5.* instant is absolute dogshit because it has been overfitted to death with too much RL.
You can easily get Claude to address you by your name in the thinking. I tried to get GLM 4.7 to do that and there was no way to get it to begin with <name> rather than "The user".
Claude also probably has more self knowledge training than most models out there.
>>
>>108138731
Yes, it does, but so do any other model I tested and compared for that matter. Even gpt-oss knows about it. At this point you have to assume if it's not a really rare and niche webnovel, or a novel that came after knowledge cutoff, models would know about it.
Very few models even stay coherent at this level of context ingestion, and those that do will rarely extract this much information and output as much in one shot like this.
Whether the model knows about the novel or not doesn't change that fact. I've done this test on all LLMs I get my hands on and Gemini was the only model to make me feel like it "got" it, before this new deepseek.
>>
>>108138784
>It's not an emergent feature of RL.
I don't see why it couldn't be. Deepseek's reasoning capabilities are emergent from a specific RL training regimen/´pipeline, right? As per
>https://arxiv.org/abs/2501.12948
But yeah, it's probably a case of explicit training, and there's probably a lot of training data in the wild talking about thinking/reasoning blocks and the like that it could learn from even if Anthropic didn't explicitly go after that sort of self awareness (which they might have).
>>
>>108138818
>Whether the model knows about the novel or not doesn't change that fact.
It does if it cannot give you a good summary of something novel (yes, I did intend the pun).
Cool that it can handle that much context, but still. We'll see when we get it.
>>
>>108138841
I know it's not a flawless way to test, but having something that is both genuinely extremely novel and is also something I know by heart and read multiple times.. I don't have that thing!!
I don't trust garbage like "LLM as judge" and to be confident in judging the quality of a summary you would have to be qualified to write a summary of the content yourself
I would be legit interested in the opinion of someone who is in the situation where they know some really rare or new material by heart, that is lengthy enough to fill 400K+ tokens and can come and test models and give their opinion here
>>
>>108138821
Yeah, and its chain of thought style is very rigid. According to this guy you cannot modulate the R1 thinking length based on the prompt. Claude will not think at all and output "---" in the think block even with thinking enabled if you ask it.
So clearly there is some difference in the training process.

>19:22
>So we tried it in the prompt. We tried many ways to prompt the model. You cannot think more than >this number of tokens.
>19:30
>We will kill you. Or whatever you do, it does not obey your command.
>19:35
>It just ignores that part and just does what it does. Yeah.

https://www.youtube.com/watch?v=IeCS6hsnOXs
>>
>>108138876
I'd still calm my tits down if I were you. You're on the road to disappointment.
>>
>>108138818
>Gemini was the only model
What version(s) of Gemini?
>>
>>108138841
>>108138876
Just test on your own logs. You DO have a lengthy session where you tell the model everything you remember going through during your own life, right?
>>
>>108138896
>annotated with gpt4o
>>
>>108138916
2.5 Pro. I assume that 3 isn't doing worse there, but I haven't extensively tested it.
>>
>>108138818
What if the whole novel was in the training data and got encoded in engrams? Unlike traditional parameters which only remember vague context, it feels like engrams are more detailed, so every time the model wrote that review it actually activated engram database for specific parts of the novel.
>>
>>108138932
>We'll see when we get it.
>>
>>108138941
That would be for the analysis of the actual contents, length would be just a raw token count and that was the only thing I remember from the talk.
>>
>>108138950
That would be.. both a downer and a upper? It would make the test less relevant but in its own way if Engrams empowered a model knowledge of the world by that much then it would be an interesting thing on its own terms.
Such a thing would definitely not diminish my interest in the new DS.
>>
>>108138876
I know an obscure crazy schizo fanfic by heart, but it's 3M+ words so we'll have to wait for DeepSeek V6.
>>
>>108138976
Can always feed chunks to it and ask to summarize down to chunks that make the whole thing fit in the context.
>>
>>108138976
>>108139011
testing on smaller subsets is what I do to test local models on context summarization too (all of the models I can run on my computer are really dogshit even at the 32K bar as of now, unfortunately.)
>>
Fess up /lmg/
This has to be one of you
boards.4chan.org/pol/thread/528408065#p528410693
>>
>>108139029
>doesn't know how to link to posts on other boards
newfag
>>
>>108139029
nani?!
>>
>>108137775
>>108137807
Did I hear happenings? Is the /wait/ over?
>>108133070
Inspired
>>
File: dipsyBowlingAlleyStandoff.png (2.39 MB, 1536x1024)
2.39 MB
2.39 MB PNG
>>108139084
>>108133070
>>
File: IMG_2220.jpg (107 KB, 1000x542)
107 KB
107 KB JPG
>>108139029
What in the actual fuck
>>
>>108137738
This song keeps reminding me of VOTV
>>
>>108139084
>no weights out
>no cock bench results
>no post saying it is the new GLM 4.6
Nobody knows yet.
>>
File: thing_miku.jpg (993 KB, 1664x2048)
993 KB
993 KB JPG
>>108139084
ty anon that was my gen
>>
https://huggingface.co/ubergarm/MiniMax-M2.5-GGUF

Should I download or is it a lost cause for sex?
>>
>>108139103
This is typical for DS to do these launches sort of haphazard. If the WebApp is seeing an official update (announced, not speculation) the API won't be far behind. Chinese New Year is Feb 17 so my money's on a new DS model launch to coincide w/ CNY.
>>
>>108139126
I just test MiniMax 2.5 with novel writing.
It complete fail simple prompt.
I ask for 1 plot file input into 5 chapter
give me 1 chapter at a time.

than it give me 5 complete chapter in one go. so all chapter it short because it had to compress all the text in one return.

than i told it that i want 1 long detail chapter at a time.
after that it give me 1 fucking long chapter that complete all the plot 1 one chapter.

again i told it i want 5 chapter for the plot....than it go back to same shit 5 short chapter at a time.

I just give up.

I never see any llm so fail in this simple task before.
even back in llama 3 early day.

and this model complete mess up all the plot input file i give it cant follow the plot i detail give at all.

That is just my test maybe it really not make for novel or role play at all maybe is just godlike at coding who know.
>>
>>108139299
learn too english eslbro
>>
>>108139319
But I just test with simple prompt i cant recall the last model that fail this simple task.
I can understand if it write bad or boring plot but...it should not fail this simple think again and again. Or maybe something wrong maybe i wrong but i play with they old version it just do this test fine but it just not good plot like other model(GML,KIMI) that all.
>>
Is there anything between GLM-4.7 I2Q-KS and GLM-4.7 Flash? I'm really happy with the latter, other than the fact that it's painstakingly slow (5tk/s) on my hardware (128gb ddr4 + 4090).

Flash is plenty fast, but a bit retarded. Are there any models that lie in between? Being NSFW/coaxable with a system prompt is a must.
>>
File: 1749374423358693.jpg (386 KB, 1284x1284)
386 KB
386 KB JPG
I like my models small and groomable
>>
>>108139338
Oh, god. Stop it.
>>
>>108139367
>GLM-4.7 I2Q-KS and GLM-4.7 Flash? I'm really happy with the latter
these are the people talking about models itt
>>
>>108139389
I'm also retarded and meant former...
>>
>>108139029
>Max-Q
Also not buying HBM at that price point seems retarded.
>>
>>108139367
moe wise you can try Qwen3 235B, step3, minimax 2.1 or give glm air a try... these among low quant glm 4.6 are prob the largest ones you can fit at that 124gb + 24 vram range
>>
>>108139509
Any specifically for coom writing? I've found GLM worth waiting for compared to any other.
>>
I'm interested in trying local models but I'm a bit overwhelmed by all these recent releases, how they behave at different quant levels, and whether it's better to buy Apple hardware vs nvidia or whatever else.
if you had a budget of ~$1k, what hardware would you buy and what would you run on it? Is it just a curiosity at that level or is it able to do meaningful agentic shit?
>>
File: 1709834240393.jpg (783 KB, 1152x1152)
783 KB
783 KB JPG
>>108139561
>>108139561
>>108139561
>>
File: 1719027805090877.jpg (62 KB, 631x720)
62 KB
62 KB JPG
>>
>>
>>108139029
>mikutopia
>9x6000 pros
>2x64core turin epycs
>2tb ram
sam is this you?
>>
>>108139384
based lecunny



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.