DeepSeek OCR

https://news.ycombinator.com/rss Hits: 29

Summary

Explore the boundaries of visual-text compression.

Release

[2025/x/x] We release DeepSeek-OCR, a model to investigate the role of vision encoders from an LLM-centric viewpoint.

Install

Our environment is CUDA 11.8 + torch 2.6.0.

Clone this repository and navigate to the DeepSeek-OCR folder:

    git clone https://github.com/deepseek-ai/DeepSeek-OCR.git

Conda:

    conda create -n deepseek-ocr python=3.12.9 -y
    conda activate deepseek-ocr

Packages (download the vllm-0.8.5 whl first):

    pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
    pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
    pip install -r requirements.txt
    pip install flash-attn==2.7.3 --no-build-isolation

Note: if you want the vLLM and Transformers code to run in the same environment, you can ignore an installation error such as: vllm 0.8.5+cu118 requires transformers>=4.51.1

vLLM

Note: change INPUT_PATH/OUTPUT_PATH and other settings in DeepSeek-OCR-master/DeepSeek-OCR-vllm/config.py.

    cd DeepSeek-OCR-master/DeepSeek-OCR-vllm

Image (streaming output):

    python run_dpsk_ocr_image.py

PDF (concurrency, ~2500 tokens/s on an A100-40G):

    python run_dpsk_ocr_pdf.py

Batch eval for benchmarks:

    python run_dpsk_ocr_eval_batch.py

Transformers

    from transformers import AutoModel, AutoTokenizer
    import torch
    import os

    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    model_name = 'deepseek-ai/DeepSeek-OCR'

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
    model = model.eval().cuda().to(torch.bfloat16)

    # prompt = "<image> Free OCR. "
    prompt = "<image> <|grounding|>Convert the document to markdown. "
    image_file = 'your_image.jpg'
    output_path = 'your/output/dir'

    res = model.infer(tokenizer, prompt=prompt, image...
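
Because the install steps pin several package versions, it can help to confirm what actually ended up in the environment before running anything. The following is a minimal sketch, not part of the DeepSeek-OCR repository: the helper names (base_version, check_versions, installed_versions) are hypothetical, and the pinned versions are copied from the pip commands in the summary. It assumes that a local build suffix such as "+cu118" can be ignored when comparing versions.

```python
from importlib import metadata

# Versions pinned by the pip commands in the summary.
PINNED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "vllm": "0.8.5",
    "flash-attn": "2.7.3",
}


def base_version(version):
    """Drop a local build suffix such as '+cu118' (assumed irrelevant here)."""
    return version.split("+", 1)[0]


def check_versions(installed, pinned=PINNED):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is missing from `installed`.
    """
    problems = {}
    for name, expected in pinned.items():
        found = installed.get(name)
        if found is None or base_version(found) != expected:
            problems[name] = (expected, found)
    return problems


def installed_versions(names):
    """Look up installed distribution versions; None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    mismatches = check_versions(installed_versions(PINNED))
    for name, (expected, found) in mismatches.items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script inside the deepseek-ocr conda environment prints one line per package whose installed version differs from the pinned one, and prints nothing when everything matches. The transformers requirement mentioned in the note is deliberately not checked, since the summary says that warning can be ignored.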

First seen: 2025-10-20 07:04

Last seen: 2025-10-21 12:09
