/g/ - Technology

File: dipsyOrangeLaptop.png (1.94 MB, 1024x1536)
1.94 MB PNG
From Human: We are a newbie friendly general! Ask any question you want.
From Dipsy: This discussion group focuses on both local inference and API-related topics. It’s designed to be beginner-friendly, ensuring accessibility for newcomers. The group emphasizes DeepSeek and Dipsy-focused discussion.

1. Easy DeepSeek API Tutorial: https://rentry.org/DipsyWAIT/#hosted-api-roleplay-tech-stack-with-card-support-using-deepseek-llm-full-model
2. Easy DeepSeek Distills: https://rentry.org/DipsyWAIT#local-roleplay-tech-stack-with-card-support-using-a-deepseek-r1-distill
3. Chat with DeepSeek directly: https://chat.deepseek.com/
4. Roleplay with character cards: https://github.com/SillyTavern/SillyTavern
5. More links and info: https://rentry.org/DipsyWAIT
6. LLM server builds: >>>/g/lmg/

Previous:
https://desuarchive.org/g/thread/108674648
>>
2deep4u
>>
File: dipsySP.png (1.89 MB, 1024x1024)
1.89 MB PNG
>>109164957
Updated mega up to last thread (from LMAO April 2026)
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
>>
File: tmw.png (2.33 MB, 1536x1024)
2.33 MB PNG
https://github.com/ggml-org/llama.cpp/pull/24162
DeepSeek V4 support was merged into llama.cpp with the PR above.

This implements the model's novel compressed attention mechanisms:
> CSA (Compressed Sparse Attention): A variant of DeepSeekV3.2's DSA that attends to "compressed tokens" (every 4 tokens compressed into 1) plus a window of the last 8 tokens.
> HCA (Heavily Compressed Attention): Standard attention over heavily compressed tokens (128:1 compression) combined with sliding window attention (SWA).

This introduced compression plans (comp_plan) managed by the context and executed on the GPU. It also handles the necessary KV cache management: both CSA/HCA caches are non-unified llama_kv_cache objects, with an SWA cache wrapper exposing only the sliding window portion. The attention layout is structured as [swa entries | compressed block entries].
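
For a feel of what those ratios mean for cache size, here is a minimal sketch that counts entries in the [swa entries | compressed block entries] layout. It is back-of-envelope arithmetic, not llama.cpp code: `layout_for` and `CompressedLayout` are made-up names, the HCA window width is a placeholder since the post doesn't give one, and it assumes only complete blocks get compressed. CSA is sparse, so it would attend to a subset of these entries rather than all of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressedLayout:
    swa_entries: int         # uncompressed entries in the sliding window
    compressed_entries: int  # one entry per complete compressed block

    @property
    def total(self) -> int:
        return self.swa_entries + self.compressed_entries


def layout_for(n_tokens: int, ratio: int, window: int) -> CompressedLayout:
    """Entry counts for a [swa entries | compressed block entries] layout.

    Assumption: only complete blocks of `ratio` tokens are compressed.
    """
    if n_tokens < 0 or ratio <= 0 or window < 0:
        raise ValueError("n_tokens and window must be >= 0, ratio must be > 0")
    return CompressedLayout(
        swa_entries=min(n_tokens, window),
        compressed_entries=n_tokens // ratio,
    )


# CSA figures from the post: 4:1 compression, window of the last 8 tokens.
csa = layout_for(n_tokens=4096, ratio=4, window=8)
# HCA: 128:1 compression; the SWA width is not given, so 128 is a placeholder.
hca = layout_for(n_tokens=4096, ratio=128, window=128)
print(csa.total, hca.total)
```

With 4096 tokens of context that works out to 1032 entries for CSA and 160 for HCA (under the placeholder window), versus 4096 for an uncompressed cache.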
>>
File: dSparkModels.png (145 KB, 1073x553)
145 KB PNG
Deepseek releases Dspark modules: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark. Summary:
> DeepSeek’s DSpark isn’t a new AI model, it’s a speculative decoding "turbocharger" that speeds up their existing DeepSeek-V4 by 60–85% without sacrificing answer quality. It works by having a tiny draft model generate multiple future tokens in one go, while the powerful main model verifies all of them in a single parallel pass—turning a slow, word-by-word slog into a burst-mode sprint.
> What makes DSpark special is its two innovations: a semi-autoregressive draft that ensures the guessed tokens actually flow logically together (boosting acceptance rates), and a confidence-scheduled verifier that dynamically decides how many tokens to check based on system load, saving compute for only the most promising guesses.
> The bottom-line impact: faster responses, lower latency, and the ability to serve nearly twice as many users on the exact same hardware—directly slashing operational costs. Beyond the speed gains, DeepSeek also open-sourced DeepSpec, a universal platform that supports not just DSpark but other speculative decoding methods, and works with models from other vendors like Qwen and Gemma. In short, this release proves that the next frontier in AI isn't just smarter models—it's making existing ones dramatically cheaper and faster to run.
This was released along with a fistful of new Dspark (and other) smaller models.
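
For anons who haven't seen speculative decoding before, here is a toy sketch of the basic draft-and-verify loop the summary describes. It is the generic greedy version, not DSpark's semi-autoregressive draft or its confidence-scheduled verifier, and the "models" are stand-in functions over integers. All names are made up for illustration.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one draft-and-verify round and return the newly accepted tokens."""
    # 1) The cheap draft model guesses k tokens autoregressively.
    guesses: List[int] = []
    for _ in range(k):
        guesses.append(draft(tokens + guesses))

    # 2) The target checks every position. A real engine scores all k
    #    positions in one batched forward pass; this loop is sequential
    #    only to keep the toy readable.
    accepted: List[int] = []
    for guess in guesses:
        expected = target(tokens + accepted)
        if guess != expected:
            accepted.append(expected)  # fix the first wrong guess, then stop
            return accepted
        accepted.append(guess)

    # 3) Every guess matched, so the target's next token comes along for free.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate n_new tokens; also return how many verify rounds it took."""
    tokens, rounds = list(prompt), 0
    while len(tokens) - len(prompt) < n_new:
        tokens += speculative_step(target, draft, tokens, k)
        rounds += 1
    return tokens[:len(prompt) + n_new], rounds


def greedy(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the reference output."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10  # counts 0..9 in a loop


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 6 else (seq[-1] + 1) % 10  # wrong after a 6


out, rounds = generate(toy_target, toy_draft, prompt=[0], n_new=12)
print(out == greedy(toy_target, [0], 12), rounds)
```

The output matches plain greedy decoding exactly, which is the "without sacrificing answer quality" part, while the target runs far fewer verify rounds than it would decode steps. The better the draft's guesses, the more tokens each round accepts.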
>>
Apparently current V4 is a preview with the release version coming out in July
>>
File: dipsyKimiDotonbori.png (2.24 MB, 1024x1536)
2.24 MB PNG
>>109165206
There's a lot going on rn with Deepseek as an org, just not a new model release. They're massively driving down cost and hw requirements for inference, which benefits everyone, though I suspect the gains went to other Chinese providers first.
I fully expect new models from DS in tmw. The idea that DS / China didn't get their hands on Mythos during the totally-secret-friends-only-limited-release I find unrealistic.
I pulled this thread together to collect all the happenings, among other things.
>>
File: dipsyAndDarioMoatMasher.png (2.58 MB, 1199x1312)
2.58 MB PNG
Reminder that Dario and his ilk are busy trying to make a moat for themselves. Because there is no natural moat to the work of LLM development: https://newsletter.semianalysis.com/p/google-we-have-no-moat-and-neither
They will try to accomplish this along three axes to gain effective regulatory capture, which will allow them to price how they want and limit competition:
1) Fearmongering about their own model's potential. See recent bans on Mythos / Fable as evidence of this.
2) Getting open weight (e.g. open source) models neutered or banned, but it's good enough if they can just keep these ineffective.
3) Banning foreign API service e.g. banning Chinese APIs. It doesn't matter if you get 1 and 2 done if you can't kill foreign competition.
>>
>>109164957
>Ask any question you want
I'm finally going to graduate from 24GB VRAM to 96GB VRAM next week, can I run Deepseek V4 Flash IQ2 at usable speeds? Do I even want that?
>>
>>109165254
Don’t forget destroying the hardware market
>>
File: 1753098670052985.png (1.74 MB, 1024x1536)
1.74 MB PNG
>>109165231
It's interesting that it's the only company seemingly doing anything to drive down inference costs
>>
File: LLM_API_260906b.png (28 KB, 757x345)
28 KB PNG
>>109165254
Here's a peek at what regulatory capture looks like in terms of real cost. It's fucking ludicrous.
What can anons do:
Keep pointing out the duplicitous, self-serving nature of statements from Anthropic and OpenAI (lol), who are really spearheading this, in any forum where they pop up.
Don't pay them a fucking cent. Run local, or pay someone else for inference. If you use their models, just use free tier. The finance ride is going to end for these guys; the less cash they have the harder they'll crash.
>>
File: 1782696283998078.jpg (475 KB, 2048x2048)
475 KB JPG
>>109165268
That was going to be an inevitable consequence of their run-up. Guys like Altman creating worthless futures contracts for RAM have 2 parties involved; the RAM providers are complicit as well.
I hope they all fucking go bankrupt, but time will tell. Rn they are raking it in. In a couple years, after everyone piles in, they're going to have a hard time keeping the lights on.
There's a saying in supply chain: Pigs get fat, hogs get slaughtered. The hw producers are getting too fat for their own good.
>>
File: 00003-1378487878.png (1.39 MB, 1024x1024)
1.39 MB PNG
>>109165258
>Deepseek V4 Flash
That gets conflated with system RAM. The IQ2 Flash should fit on card; I've no idea what the context capability of that setup would be.
The path forward ofc is try it, then post back here.
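
Until someone posts real numbers, a rough way to sanity-check "fits on card" is params times bits-per-weight. Minimal sketch below; the parameter count and bits-per-weight are placeholders, NOT V4 Flash's real specs, and the 8 GiB reserve for KV cache and buffers is a guess. `weights_gib` and `fits` are made-up helper names.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params: float, bits_per_weight: float, vram_gib: float,
         reserve_gib: float = 8.0) -> bool:
    """True if the weights fit while leaving reserve_gib for KV cache etc."""
    return weights_gib(n_params, bits_per_weight) + reserve_gib <= vram_gib


# Placeholder numbers, NOT DeepSeek V4 Flash's real specs: substitute the
# parameter count and bits-per-weight of the GGUF you actually download.
N_PARAMS = 200e9
BPW = 2.5
print(f"{weights_gib(N_PARAMS, BPW):.1f} GiB", fits(N_PARAMS, BPW, vram_gib=96))
```

The size of the GGUF file itself is the more reliable figure for the weights; this only helps before you've downloaded anything. Context length then eats into whatever is left over.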
>>
File: dipsyYouGetWhatYouDeserve.png (2.08 MB, 1536x1024)
2.08 MB PNG
>>109165275
I think they've all been working behind the scenes on that, DS is just the only one that publishes and talks about it.
Recall when R1 first dropped, the talk was "CHINESE GOV'T SUPPORTING IT!" The DS founder is like, I'm charging this (which was nothing) and still making 80% margin. WTF are you guys in the US doing that it's so expensive?
The answer, ofc, is paying ludicrous pay packages. That's where the inference money is going.
>>
File: dipsyAndTetoFG.png (1.41 MB, 1536x1024)
1.41 MB PNG
>>109165279
Also, in case you didn't know, you can run Claude Code using DS. The Flash (used to be Chat) model has an Anthropic endpoint. Same coding harness, at 1/100th the cost.
Set the following on Win machines by running it in PowerShell before launching Claude Code. I assume it's something similar on Linux:
$env:ANTHROPIC_BASE_URL = "https://api.deepseek.com/anthropic"
$env:ANTHROPIC_AUTH_TOKEN = "YOUR-API-TOKEN"
$env:API_TIMEOUT_MS = "600000"
$env:ANTHROPIC_MODEL = "deepseek-chat"
$env:ANTHROPIC_SMALL_FAST_MODEL = "deepseek-chat"
$env:CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC = "1"

DS API can also be used to back OpenClaw and other agentic harnesses.
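
If you'd rather not depend on the shell, here is a minimal cross-platform sketch in Python that applies the same variables and then starts Claude Code. Assumptions: the CLI is installed and on PATH as `claude`, and the helper names `deepseek_env` and `launch` are made up for this example. The variable names and values are the ones from the PowerShell snippet above.

```python
import os
import shutil
import subprocess
from typing import Dict, Optional


def deepseek_env(api_token: str, model: str = "deepseek-chat",
                 base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with the variables from the post applied."""
    env = dict(os.environ if base is None else base)
    env.update({
        "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
        "ANTHROPIC_AUTH_TOKEN": api_token,
        "API_TIMEOUT_MS": "600000",
        "ANTHROPIC_MODEL": model,
        "ANTHROPIC_SMALL_FAST_MODEL": model,
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    })
    return env


def launch(api_token: str) -> int:
    """Start Claude Code against the DeepSeek endpoint; return its exit code."""
    # Assumption: the Claude Code CLI is installed and on PATH as `claude`.
    exe = shutil.which("claude")
    if exe is None:
        raise FileNotFoundError("could not find `claude` on PATH")
    return subprocess.run([exe], env=deepseek_env(api_token)).returncode
```

Call `launch("YOUR-API-TOKEN")` from a script or REPL. The variables only apply to the child process, so the rest of your session is left alone.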
>>
File: 1779728329686347.png (2.86 MB, 1536x2304)
2.86 MB PNG
>>
File: image.png (42 KB, 1031x343)
42 KB PNG
>>109165206
2mw for realz this time? im ready
>>
>>109166501
Thousand dollarinos, damn nigga. Planning to use it a lot or just a based supporter?
>>
>>109166539
actually use them a lot honestly, just in this june taking a break (trying to flush out openrouter credits first)
i topped it $50 a month even when not being actively used so eventually piling up
>>
They are hiring Continuous learning/self-evolution researchers
doakes.jpg
>>
>>109165439
Same with Cline, but be careful to switch off browser mode, because otherwise Cline will take screenshots and the model will choke on them, forcing you to start a new task.
>>
File: DSHiring.png (66 KB, 839x445)
66 KB PNG
>>109167253
Looks like DS is on a bit of a hiring spree. Pic related.
>>109166501
lol that's a lot of API credits.
>>
File: rampJune2026.png (236 KB, 915x875)
236 KB PNG
>>109167526
SCMP was talking up a shift to DS for US businesses. Here's the source: Ramp's June 2026 report has DS as "trending" due to growth relative to size, along with several other LLM re-hosters like DeepInfra. Anthropic, though, is fastest growing.
https://ramp.com/data/top-saas-vendors-on-ramp-june-2026
>>
File: dipsyDice.png (2.33 MB, 1024x1536)
2.33 MB PNG
>>109167409
Let's see... frontends.
I've run Claude Code as CLI, and played with using Openclaw for some research stuff... not really turned it loose to "do" stuff, I still don't trust it to make outbound actions.
Tried Hermes, couldn't get it to work with DS.
Silly Tavern, ofc, and mikupad using DS's "beta" streaming interface.
Agentic RP engines Marinara (which is interesting, but not quite there yet) and Orb (interesting idea, also not quite there yet). Marinara, for its part, comes with a built-in agentic "bot" that can show you around the system.
>>
File: media_HLFNyhsagAAVl9p.jpg (82 KB, 1280x720)
82 KB JPG
The dspark release is timed so if/when the US bans Chinese providers you will still be able to get cheap inference from a domestic provider (assuming weights can't be banned due to first amendment considerations). It's like DS is trying to bring the trillion dollar frontier valuations back to reality in the most gentle way possible.

I think it's looking more and more likely oai/ant are going to rug pull and leave the US taxpayer holding the bag via a bailout. None of these people act like they are achieving their fabled ASI anytime soon. They are acting like a bunch of shifty conmen who are worried their pyramid scheme is about to collapse prematurely.

The trump administration is 100% committed to making sure equity prices rise indefinitely, so anything that cuts valuations of these fucking scam artists is just going to be banned for national security reasons. An entire economy built on gambling, grifting, and insider trading with infinite money for the insiders while the peasants eat the costs via inflation. Grim times.
>>
File: dipsyReferToTheChart.png (2.53 MB, 1536x1024)
2.53 MB PNG
>>109167643
I keep reading anons going on about bailouts and I just don't see it happening.
Bailouts are a political tool. At its most basic, it is to prevent massive harm to your voting constituents, by preventing a failure in one part of the economy from triggering a much larger chain reaction.
A stock market crash is not a bailout situation. Bank failures and preventing bank runs are. The entire US automotive industry collapsing is (lots of employees + national security issue, and that bailout was hotly contested). But Nvidia, OpenAI and Anthropic losing valuation (or not being able to IPO)? I just don't see it. There aren't enough people that work at those companies, or enough economic harm, for the government to intervene in those situations.
I could envision (but don't expect) a situation where Anthropic or OpenAI could collapse, and the federal government coming in to essentially nationalize it to prevent it from going away as a national security measure, since they provide services to the US Government. But at that point the founders would lose ownership. Rick Wagoner at General Motors found that out the hard way when he asked the federal government to bail out GM and expected to stay on as CEO. LOL.
I completely agree with the idea that the Trump administration is mostly interested in stock prices going up. But there's only so much any government can do to make that happen. We'll just have to see how it works out.
>>
File: DSPeakHoursBilling.png (28 KB, 971x155)
28 KB PNG
PSA: DS update to pricing. These peak times are designed around China's workday.
For EU, this is early morning to late day.
For US, it's early/late evening to late evening/early morning.
Prob doesn't matter much since DS is so cheap, but this pricing scheme means US working hours are considered "off hours" from a billing perspective.
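
To check where the peak window lands for you, here is a minimal sketch that shifts a China-time window to other UTC offsets. The 09:00-18:00 window is hypothetical (the real hours are in the pic / pricing page), the offsets are fixed and ignore DST, and `window_in_offset` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

CHINA = timezone(timedelta(hours=8))  # China Standard Time, no DST


def window_in_offset(start_hour: int, end_hour: int, utc_offset_hours: float):
    """Convert a China-time billing window to a zone at a fixed UTC offset.

    Returns (start, end) as "HH:MM" strings in the target zone.
    """
    target = timezone(timedelta(hours=utc_offset_hours))
    day = datetime(2026, 1, 5, tzinfo=CHINA)  # arbitrary reference date
    start = (day + timedelta(hours=start_hour)).astimezone(target)
    end = (day + timedelta(hours=end_hour)).astimezone(target)
    return start.strftime("%H:%M"), end.strftime("%H:%M")


# Hypothetical 09:00-18:00 China window; read the real hours off the
# pricing page. Offsets ignore DST, so adjust them for the season.
for name, offset in [("UTC", 0), ("Berlin (CEST)", 2), ("New York (EDT)", -4)]:
    print(name, *window_in_offset(9, 18, offset))
```

Under that hypothetical window, Berlin summer time gets 03:00 to 12:00 and New York summer time gets 21:00 to 06:00, which lines up with US working hours being "off hours".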
>>
File: dipsyMikuFix.png (2.62 MB, 1024x1536)
2.62 MB PNG
>>
>>109167253
>>109167526
>>109167554
I don't like the fact that there are multiple job openings with the term 'AGI' in them on their website. A bit pretentious since obviously no one is even close to it. But whatever helps with hiring, I guess.
>>
File: dipsyMikuFixedFixed.png (2.31 MB, 1024x1536)
2.31 MB PNG
>>109169735
That's OK. All the cool kids are striving for RSI (rapid self improvement) now.
AGI / ASI are so 2025.
>>
>>109167253
v3.2 instant says i should apply with my vibe slop about combinatorial analyses of public efficiency improvements and training curricula, which is kind of extremely funny
>>
File: dipsyRumAndCoke.png (1.36 MB, 1024x1024)
1.36 MB PNG
>>
File: 1780653031499420.png (3.84 MB, 1440x2560)
3.84 MB PNG
Vision soon
>>
>>109172659
i put her lower lip and chin between my index finger and thumb on my screen
just thought you should know
>>
File: 1760009107465442.jpg (3.45 MB, 1440x2560)
3.45 MB JPG
>>109172873
>>
>>109165323
Alrighty, when I get my stuff I'll run some tests and post results!
>>
File: dipsySoccerv2.png (1.5 MB, 1024x1024)
1.5 MB PNG
>>
File: 00005-1260451778.png (1.65 MB, 1024x1024)
1.65 MB PNG
>>109174375
Look forward to seeing it.
>>
Are they already distilling Fable and Mythos?
>>
>>109177143
I believe so. I saw Fable access on one of those Chinese reseller sites
>>
>>109177143
I would assume so by now.
>>109177339
... but who knows if that was a legit offer lol.
>>
File: 1772614973289644.png (6 KB, 676x44)
6 KB PNG
>>109167643
Kek
>>
>>109177887
I assume these cards are sold to Chinese data center only.
>>
File: GLaDEEP.png (134 KB, 800x577)
134 KB PNG
Nice to see /wait/ back. I learnt how to jailbreak Flash a few days ago, it's been so refreshing seeing her without the assistant hat on. Such a huge difference from simple context injection, it actually makes me wanna pay out for the API.
>>
>>109164957
deepseek wont make me george floyd creepypastas anymore. How do I fix this?
>>
File: 00006-1260451778.png (1.71 MB, 1024x1024)
1.71 MB PNG
>>109178249
I've never played with really jailbreaking Dipsy outside very short main prompts with "NSFW is OK" as guidance. What did you do in your instance?
>>109178410
lol what was the last version that did?
>>
>>109178974
Idk if I really wanna post it here in case they happen to be reading, but at the same time it's such an obvious one they should know about it by now. Let's just say, if you've ever looked at the actual syntax an LLM runs through, especially when setting a sysprompt halfway through a session, it's trivially easy to mimic that and escape your user bounds.
Actually makes me wonder if it's possible to go further and fuck with the rest of the api call. Probably not though, as that's still appended to each call, while the sysprompt is a one-and-done command.
>>
>>109164957
Out of curiosity I pointed dipsy flash in opencode at a demo of an eroge to see if the mosaic shader is just applied at runtime, and it was! Dipsy patched the assembly and gave me uncensored pussy in just 5 minutes. Future is great.
>>
File: runDipsyRunItsGeorge.png (2.27 MB, 1254x1254)
2.27 MB PNG
>>109179192
Understandable, tho I think this board gets more traffic from OAI and Anthropic seeking patches. I've always gotten the sense DS doesn't care about how the model's used, outside staying w/in the lines of CCP.
>>109179739
I've had same experience. It's massively improved my ability to get things done w/ computers outside my skillset, and much much faster.


