/g/ - Technology

File: 1741011766257340.jpg (200 KB, 845x1200)
>They built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels.
>Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That means one image can represent entire documents using a fraction of the tokens an LLM would need.
>Even crazier? It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens and can process 200K+ pages/day on a single A100.
>This could solve one of AI’s biggest problems: long-context inefficiency.
>Instead of paying more for longer sequences, models might soon see text instead of reading it.
DeepSeek did it again:
https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
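A back-of-the-envelope sketch of what those ratios mean, in Python (not code from the DeepSeek-OCR repo; the 1000-token "page" is just an assumed round number):

[code]
# Rough sketch of the claimed compression ratios, not code from the repo.
def vision_tokens_needed(text_tokens: int, ratio: float) -> int:
    """Vision tokens required to carry text_tokens at a given compression ratio."""
    return max(1, round(text_tokens / ratio))

page_text_tokens = 1000  # roughly one dense page of text (assumed)
for ratio, claimed_precision in [(10, 0.97), (20, 0.60)]:
    vt = vision_tokens_needed(page_text_tokens, ratio)
    print(f"{ratio}x: {page_text_tokens} text tokens -> {vt} vision tokens "
          f"(claimed decoding precision ~{claimed_precision:.0%})")
[/code]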
>>
>>106961353
Exactly what we needed. Lossy text compression. This will revolutionize data storage.
Can't wait to see what the bible looks like after 20 rounds of jpeg
>>
File: 1761053197.png (1.71 MB, 864x1184)
NOOOOOOOOOOOOOOOOOOOOOOOOO CHINA CANNOT INNOVATE

BLODY BASTERD BETCH
>>
I just want to see a semi-trainable OCR that builds its own library of characters and common symbols from a (probably) low-quality pic, and is highly accurate even with blurred text.
F


>They're trying to implement asperger's visual memory.
>>
Brainlet here, how does turning a wall of text into an image make it faster for an AI to understand than just giving it the text directly? Should I have been studying for my exams by looking at pictures of my textbooks instead of actually reading them?
>>
>>106961353
>60% accuracy
That's a coin flip. Are we going to ignore them because it's a bug model and China is based?
>>
>>106961602
Picture of a landscape vs 10 pages of a detailed description for the same picture.
That system simply closes the gap from text -> pic, but encodes it in a "lossless" way instead of just generating a pic like Stable Diffusion.

Natural language actually has a lot of 'loose ends': more flexibility than people usually use, ways to keep communicating despite noise, ways to conceal a message, etc. Plain text (natural language) looks simple, but it actually carries a lot of extra weight you ignore.
>>
>>106961635
>achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×.
Learn to read
>>
Does this mean their model will ACTUALLY read the PDFs I give it instead of lying to me and gaslighting me like GPT, Gemini, and Grok?
>>
File: Tolstoy_life_bad.jpg (560 KB, 2048x2560)
>>106961669
That's the problem. I can read. There's no point in mentioning a 60% accuracy on text unless you're an ESL retard. 99.99% accuracy is the only thing that matters in text form. Compression artifacts are acceptable with a jpeg, not so much with words.
I don't want to live in a world where dumb fucking jeets are compressing books to jpegs with 98% accuracy. Do the needful and jump in front of a train.
>>
>>106961697
Retard, you never mentioned the 97% accuracy part. You only pointed out the 60% accuracy thing.
>>
>>106961735
Yes, because 60% accuracy is such a stupid number to boast about. Why mention something so ludicrous unless you're an illiterate ESL jeet?
>>
why do you need to compress text if the llm is still eating the same text to process at the end of the day?
>>
>>106961353
97%? is this a good number? it seems low.
>>
>>106961697
Didn't this guy enjoy his life to the maximum and only changed his habits in his 50s?
>>
>>106961816
Is very high number saar. Please to give trillions.
>>
>>106961687
PDF is a container. It can have anything.
>>
>>106961769
because 60% is statistically significant and proves that the method is not getting things correct at random? dumb Dunning-Kruger
>>
>>106961353
So they made software to "compress" an image of text by converting it back to the original text. Ooooo. Ahhhh.

In 2000, commercial products that did this were already being sold; they were called OCR, or Optical Character Recognition.

Nothing new. Nothing to see here. Just AI hype, this time based on old technology. Yawn.
>>
File: 166906.mp4 (355 KB, 300x300)
>Codes and model weights are publicly accessible
Superb. SV techbros slavists are even less relevant.
>>
>>106962183
Doesn’t answer my question
>>
>>106962810
The paper literally says OCR and describes it
You’re not smart
>>
is text compression a big modern problem in computing? how is this significant?
>>
What should I study in order to understand AI better? Most of it sounds like gibberish to me. Is data analysis the way to go?
>>
>>106962582
Two more trillion
>>
>>106961394
i guess that's one way to achieve innovation
>>
>>106962929
idk that it's a big problem; we have a bazillion ways to compress text and they all work really well. I guess my takeaway from OP is that we should compress context windows too so there are fewer tokens overall
>>
File: disenpepe.png (149 KB, 440x457)
Non-technical coomer user of AI here. Please dumb it down as much as possible for me: how will it affect me?
>>
>>106962929
serious answer: iirc sending a screenshot of a paragraph to a VLM sometimes results in fewer input tokens than if you were to send it as text. So the 'compression' people are shitting their pants over sounds closer to representing sub-word level semantics in fewer tokens rather than strictly compressing a fixed symbol set. That said, it's been known for a while that LLMs (a la arithmetic coding) tend to outperform most compression algorithms even outside text, so I'm not entirely sure why this result is being spammed everywhere.
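A rough way to see the point being made here: render a paragraph with Pillow and compare a ViT-style patch count against a crude text-token estimate. The patch size, the 2x2 patch merge, and the ~4 chars/token heuristic are assumptions for illustration, not DeepSeek's actual numbers:

[code]
# Sketch: estimated "vision token" cost of a rendered paragraph vs. its text tokens.
# Patch size, the 2x2 patch merge, and ~4 chars/token are illustrative assumptions.
import math
from PIL import Image, ImageDraw

paragraph = ("It is a truth universally acknowledged, that a single man in "
             "possession of a good fortune, must be in want of a wife. ") * 32

# Wrap the text and size a canvas for the default ~6x11 px bitmap font (assumed).
chars_per_line, char_w, line_h = 100, 6, 13
lines = [paragraph[i:i + chars_per_line] for i in range(0, len(paragraph), chars_per_line)]
img = Image.new("RGB", (chars_per_line * char_w, len(lines) * line_h), "white")
ImageDraw.Draw(img).multiline_text((0, 0), "\n".join(lines), fill="black", spacing=2)

patch = 16                          # assumed ViT patch size
patches = math.ceil(img.width / patch) * math.ceil(img.height / patch)
llm_vision_tokens = patches // 4    # assumed 2x2 patch merge before the LLM
text_tokens = len(paragraph) // 4   # crude ~4 characters per text token

print(f"text tokens (est.):   {text_tokens}")
print(f"vision tokens (est.): {llm_vision_tokens}")
[/code]

Under those assumptions the rendered paragraph comes out a few times cheaper in tokens; the real ratio depends entirely on the encoder and tokenizer in question.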
>>
>>106961394
New schizo gemara technique unlocked.
>>
>>106963562
you can have longer goon sessions
>>
Quick rundown?
>>
File: G3tVme4WcAAGMiU.jpg (293 KB, 2562x1294)
>>106965435
The DeepSeek-OCR paper claims that 1000 text tokens can be represented in about 100 vision tokens, and that you get graceful degradation of context by reducing the resolution of the visual tokens. They explicitly call out the implication for huge context windows. Z.ai just published a paper claiming the same finding with their VLM.
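A sketch of that graceful-degradation arithmetic, assuming a 16-pixel patch and a 16x token compressor in the vision encoder (both assumed for illustration): halving the render resolution roughly quarters the vision-token cost of the same page.

[code]
# Sketch of "lower resolution -> fewer vision tokens" for older context.
# The 16 px patch and 16x token compressor are assumptions, not exact paper values.
def vision_token_budget(side_px: int, patch: int = 16, compressor: int = 16) -> int:
    """Approximate LLM-side vision tokens for a square render of side_px pixels."""
    patches = (side_px // patch) ** 2
    return max(1, patches // compressor)

for side in (1280, 1024, 640, 512):
    print(f"{side}x{side} render -> ~{vision_token_budget(side)} vision tokens")
[/code]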
>>
>>106961669
>10x compression
Does this mean a 90% reduction in size? You can typically achieve that with xz or zstd and it remains lossless
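For comparison, a lossless baseline using the stdlib lzma (xz) module; the file path is hypothetical, and the achievable ratio depends heavily on how redundant the input is (ordinary prose with xz is often closer to 3-5x than 10x):

[code]
# Lossless baseline with xz-style compression from the Python stdlib.
# "war_and_peace.txt" is a hypothetical path: point it at any large plain-text file.
import lzma

text = open("war_and_peace.txt", "rb").read()
compressed = lzma.compress(text, preset=9)
print(f"original:  {len(text):>10} bytes")
print(f"xz (lzma): {len(compressed):>10} bytes")
print(f"ratio:     {len(text) / len(compressed):.1f}x, fully lossless")
[/code]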
>>
File: 1760169263314.gif (1.29 MB, 500x463)
>>106961353
>they used Maid-LZW
Eli won.
>>
>>106962984
The 3blue1brown YouTube series is a great place to start on the concepts. Pick up linear algebra and machine learning if you want to get your hands dirty. None of the concepts are that crazy (most of it is extremely obvious in retrospect), and on a micro level everything is pretty understandable.

Karpathy recently came out with a "build your own nanochat" walkthrough that covers the whole pipeline.


