RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
A Repo For Document AI
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A curated list of resources for the Document Understanding (DU) topic
Parsing-free RAG supported by VLMs
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updating.
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
ReadingBank: A Benchmark Dataset for Reading Order Detection
Object Detection Model for Scanned Documents
Checkbox Detection Model for Scanned Documents
Datasets and Evaluation Scripts for CompHRDoc
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning