New Breakthrough in Text Compression: DeepSeek Launches Groundbreaking Multimodal AI

The Chinese AI startup DeepSeek has unveiled a new multimodal AI model capable of handling large, complex documents while using significantly fewer tokens.

DeepSeek-OCR employs visual perception as a method of information compression.

This system is the outcome of research into the role of visual encoders in compressing text within large language models (LLMs). By employing this approach, neural networks can manage vast amounts of information without a corresponding increase in computational costs.

"With DeepSeek-OCR, we have demonstrated that compressing text through visual representations can reduce token counts by a factor of 7 to 20 across different context stages. This opens a promising avenue for addressing the long-context problem in LLMs," the company stated.
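To make the claimed savings concrete, here is a minimal sketch of the token arithmetic behind optical compression. The function name and the document size are illustrative assumptions, not DeepSeek's API; only the 7x-20x range comes from the company's statement.

```python
def vision_token_budget(text_tokens: int, compression_ratio: float) -> int:
    """Vision tokens needed to represent `text_tokens` worth of text
    at a given optical compression ratio. Illustrative arithmetic only,
    based on the 7x-20x range DeepSeek reports."""
    return max(1, round(text_tokens / compression_ratio))

# A long document that would normally cost 10,000 text tokens:
doc_tokens = 10_000

for ratio in (7, 10, 20):
    print(f"{ratio:>2}x compression -> "
          f"{vision_token_budget(doc_tokens, ratio):>5} vision tokens")

# Expected output:
#  7x compression ->  1429 vision tokens
# 10x compression ->  1000 vision tokens
# 20x compression ->   500 vision tokens
```

Since attention cost grows with sequence length, shrinking a 10,000-token document to roughly 500-1,400 vision tokens is what lets the model handle long contexts without a corresponding increase in compute.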

DeepSeek-OCR comprises two main components:

The first component, the DeepEncoder, serves as the core computational engine of the model. It keeps activation memory low while processing high-resolution images and achieves high compression ratios, sharply reducing the number of vision tokens passed on to the decoder.

The second component, the decoder, is a Mixture-of-Experts (MoE) model with 570 million activated parameters, responsible for reconstructing the original text. The MoE architecture divides the neural network into several independent subnetworks, or "experts," each specializing in a particular type of data or task; only a subset of experts is activated for any given input, and together they solve the overall problem.
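The Mixture-of-Experts idea itself can be illustrated with a short sketch. The following is a generic top-k token router in PyTorch, not DeepSeek-OCR's actual decoder; the layer sizes, expert count, and top-k value are all assumptions chosen for readability.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustrative sketch, not
    DeepSeek-OCR's decoder). Each token is routed to its top-k experts
    and only those experts run, so the parameters *activated* per token
    are a small fraction of the total parameter count -- which is how a
    larger model can expose only ~570M active parameters."""

    def __init__(self, dim: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # gating scores per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = weights.softmax(dim=-1)               # normalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64])
```

With 8 experts and top-2 routing, each token touches only a quarter of the expert parameters per layer, which is the efficiency trade-off MoE decoders exploit.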

DeepSeek-OCR can analyze complex structured visual content, including tables, formulas, and geometric diagrams. According to the company, this makes the model particularly well suited to applications in finance and scientific research.

The company noted that DeepSeek-OCR achieves roughly 97% decoding accuracy at compression ratios below 10x; at 20x compression, accuracy still remains around 60%. This highlights the model's ability to preserve information even at extreme levels of compression.

On OmniDocBench, a benchmark for evaluating the understanding of diverse document types, DeepSeek-OCR outperformed leading optical character recognition models such as GOT-OCR 2.0 and MinerU 2.0 while using significantly fewer tokens.

It is worth noting that in August, the startup updated its flagship AI model to version 3.1.