/g/ - Technology

File: 1741011766257340.jpg (200 KB, 845x1200)
>They built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels.
>Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That means one image can represent entire documents using a fraction of the tokens an LLM would need.
>Even crazier? It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens and can process 200K+ pages/day on a single A100.
>This could solve one of AI’s biggest problems: long-context inefficiency.
>Instead of paying more for longer sequences, models might soon see text instead of reading it.
DeepSeek did it again:
https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
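A back-of-the-envelope sketch of what those ratios mean, in Python (not code from the DeepSeek-OCR repo; the 1000-token "page" is just an assumed round number):

[code]
# Rough sketch of the claimed compression ratios, not code from the repo.
def vision_tokens_needed(text_tokens: int, ratio: float) -> int:
    """Vision tokens required to carry text_tokens at a given compression ratio."""
    return max(1, round(text_tokens / ratio))

page_text_tokens = 1000  # roughly one dense page of text (assumed)
for ratio, claimed_precision in [(10, 0.97), (20, 0.60)]:
    vt = vision_tokens_needed(page_text_tokens, ratio)
    print(f"{ratio}x: {page_text_tokens} text tokens -> {vt} vision tokens "
          f"(claimed decoding precision ~{claimed_precision:.0%})")
[/code]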
>>
>>106961353
Exactly what we needed. Lossy text compression. This will revolutionize data storage.
Can't wait to see what the bible looks like after 20 rounds of jpeg
>>
File: 1761053197.png (1.71 MB, 864x1184)
NOOOOOOOOOOOOOOOOOOOOOOOOO CHINA CANNOT INNOVATE

BLODY BASTERD BETCH
>>
I just want to see a semi-trainable OCR that builds its own library of characters and common symbols from a (probably) low-quality pic, and is highly accurate even with blurred text.
F


>They're trying to implement asperger's visual memory.
>>
Brainlet here, how does turning a wall of text into an image make it faster for an AI to understand than just giving it the text directly? Should I have been studying for my exams by looking at pictures of my textbooks instead of actually reading them?
>>
>>106961353
>60% accuracy
That's a coin flip. Are we going to ignore them because it's a bug model and China is based?
>>
>>106961602
Picture of a landscape vs 10 pages of a detailed description for the same picture.
That system simply closes the gap from text -> pic, but encodes it in a "lossless" way instead of just generating a pic like Stable Diffusion.

Natural language actually has a lot of 'loose ends': more flexibility than people usually use, ways to keep communicating despite noise, ways to conceal a message, etc. Plain text (natural language) looks simple, but it actually carries a lot of extra weight you ignore.
>>
>>106961635
>achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×.
Learn to read
>>
Does this mean their model will ACTUALLY read the PDFs I give it instead of lying to me and gaslighting me like GPT, Gemini, and Grok?
>>
File: Tolstoy_life_bad.jpg (560 KB, 2048x2560)
>>106961669
That's the problem. I can read. There's no point in mentioning a 60% accuracy on text unless you're an ESL retard. 99.99% accuracy is the only thing that matters in text form. Compression artifacts are acceptable with a jpeg, not so much with words.
I don't want to live in a world where dumb fucking jeets are compressing books to jpegs with 98% accuracy. Do the needful and jump in front of a train.
>>
>>106961697
Retard, you never mentioned the 97% accuracy part. You only pointed out the 60% accuracy thing.
>>
>>106961735
Yes, because 60% accuracy is such a stupid number to boast about. Why mention something so ludicrous unless you're an illiterate ESL jeet?
>>
why do you need to compress text if the llm is still eating the same text to process at the end of the day?
>>
>>106961353
97%? is this a good number? it seems low.
>>
>>106961697
Didn't this guy enjoy his life to the maximum and only changed his habits in his 50s?
>>
>>106961816
Is very high number saar. Please to give trillions.
>>
>>106961687
PDF is a container. It can have anything.
>>
>>106961769
because 60% is statistically significant and proves that the method is not getting things correct at random? dumb Dunning-Kruger
>>
>>106961353
So they made software to "compress" an image of text by converting it back to the original text. Ooooo. Ahhhh.

In 2000, commercial products that did this were already being sold; they were called OCR, or Optical Character Recognition.

Nothing new. Nothing to see here. Just AI hype, this time based on old technology. Yawn.
>>
File: 166906.mp4 (355 KB, 300x300)
>Codes and model weights are publicly accessible
Superb. SV techbros slavists are even less relevant.
>>
>>106962183
Doesn’t answer my question
>>
>>106962810
The paper literally says OCR and describes it
You’re not smart
>>
is text compression a big modern problem in computing? how is this significant?
>>
What should I study in order to understand AI better? Most of it sounds like gibberish to me. Is data analysis the way to go?
>>
>>106962582
Two more trillion
>>
>>106961394
i guess that's one way to achieve innovation
>>
>>106962929
idk that it's a big problem; we have a bazillion ways to compress text and they all work really well. I guess my takeaway from OP is that we should compress context windows too so there are fewer tokens overall
>>
File: disenpepe.png (149 KB, 440x457)
Non-technical coomer user of AI here. Please dumb it down as much as possible for me: how will it affect me?
>>
>>106962929
serious answer: iirc sending a screenshot of a paragraph to a VLM sometimes results in fewer input tokens than if you were to send it as text. So the 'compression' people are shitting their pants over sounds closer to representing sub-word level semantics in fewer tokens rather than strictly compressing a fixed symbol set. That said, it's been known for a while that LLMs (a la arithmetic coding) tend to outperform most compression algorithms even outside text, so I'm not entirely sure why this result is being spammed everywhere.
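A rough way to see the point being made here: render a paragraph with Pillow and compare a ViT-style patch count against a crude text-token estimate. The patch size, the 2x2 patch merge, and the ~4 chars/token heuristic are assumptions for illustration, not DeepSeek's actual numbers:

[code]
# Sketch: estimated "vision token" cost of a rendered paragraph vs. its text tokens.
# Patch size, the 2x2 patch merge, and ~4 chars/token are illustrative assumptions.
import math
from PIL import Image, ImageDraw

paragraph = ("It is a truth universally acknowledged, that a single man in "
             "possession of a good fortune, must be in want of a wife. ") * 32

# Wrap the text and size a canvas for the default ~6x11 px bitmap font (assumed).
chars_per_line, char_w, line_h = 100, 6, 13
lines = [paragraph[i:i + chars_per_line] for i in range(0, len(paragraph), chars_per_line)]
img = Image.new("RGB", (chars_per_line * char_w, len(lines) * line_h), "white")
ImageDraw.Draw(img).multiline_text((0, 0), "\n".join(lines), fill="black", spacing=2)

patch = 16                          # assumed ViT patch size
patches = math.ceil(img.width / patch) * math.ceil(img.height / patch)
llm_vision_tokens = patches // 4    # assumed 2x2 patch merge before the LLM
text_tokens = len(paragraph) // 4   # crude ~4 characters per text token

print(f"text tokens (est.):   {text_tokens}")
print(f"vision tokens (est.): {llm_vision_tokens}")
[/code]

Under those assumptions the rendered paragraph comes out a few times cheaper in tokens; the real ratio depends entirely on the encoder and tokenizer in question.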
>>
>>106961394
New schizo gemara technique unlocked.
>>
>>106963562
you can have longer goon sessions
>>
Quick rundown?
>>
File: G3tVme4WcAAGMiU.jpg (293 KB, 2562x1294)
>>106965435
The DeepSeek-OCR paper claims that 1000 text tokens can be represented in about 100 vision tokens, and that you get graceful degradation of context by reducing the resolution of the visual tokens. They explicitly call out the implication for huge context windows. Z.ai just published a paper claiming the same finding with their VLM.
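A sketch of that graceful-degradation arithmetic, assuming a 16-pixel patch and a 16x token compressor in the vision encoder (both assumed for illustration): halving the render resolution roughly quarters the vision-token cost of the same page.

[code]
# Sketch of "lower resolution -> fewer vision tokens" for older context.
# The 16 px patch and 16x token compressor are assumptions, not exact paper values.
def vision_token_budget(side_px: int, patch: int = 16, compressor: int = 16) -> int:
    """Approximate LLM-side vision tokens for a square render of side_px pixels."""
    patches = (side_px // patch) ** 2
    return max(1, patches // compressor)

for side in (1280, 1024, 640, 512):
    print(f"{side}x{side} render -> ~{vision_token_budget(side)} vision tokens")
[/code]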
>>
>>106961669
>10x compression
Does this mean a 90% reduction in size? You can typically achieve that with xz or zstd and it remains lossless
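For comparison, a lossless baseline using the stdlib lzma (xz) module; the file path is hypothetical, and the achievable ratio depends heavily on how redundant the input is (ordinary prose with xz is often closer to 3-5x than 10x):

[code]
# Lossless baseline with xz-style compression from the Python stdlib.
# "war_and_peace.txt" is a hypothetical path: point it at any large plain-text file.
import lzma

text = open("war_and_peace.txt", "rb").read()
compressed = lzma.compress(text, preset=9)
print(f"original:  {len(text):>10} bytes")
print(f"xz (lzma): {len(compressed):>10} bytes")
print(f"ratio:     {len(text) / len(compressed):.1f}x, fully lossless")
[/code]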
>>
File: 1760169263314.gif (1.29 MB, 500x463)
>>106961353
>they used Maid-LZW
Eli won.
>>
>>106962984
The 3blue1brown YouTube series is a great place to start on the concepts. Pick up linear algebra and machine learning if you want to get your hands dirty. None of the concepts are that crazy (most of it is extremely obvious in retrospect), and on a micro level everything is pretty understandable.

Karpathy recently came out with a "build your own nanochat" walkthrough that covers the whole pipeline.


