Diffusion-based Document Layout Generation
ICDAR 2023 Oral

Abstract

We develop a diffusion-based approach to generating document layout sequences. A layout sequence specifies the contents of a document design in an explicit format. Our approach works in the sequence domain rather than the image domain, which permits more complex and realistic layouts. We also introduce a new metric, Document Earth Mover’s Distance (Doc-EMD). By considering similarity across heterogeneous categories in document designs, it addresses a shortcoming of prior document metrics, which only compare layouts of the same category. Our empirical analysis shows that the diffusion-based approach is comparable to or outperforms previous layout generation methods across several document datasets. Moreover, in specific cases our metric differentiates documents better than previous metrics.

Method

We develop a new approach to layout generation using diffusion probabilistic models, a recently emerging area. The key idea is that when a diffusion process consists of small steps of Gaussian noise conditioned on the data, the reverse process can also be approximated by a conditional Gaussian. Conventional diffusion methods operate on continuous data, so applying them to discrete sequence generation requires additional embedding and rounding steps.
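The embedding, noising, and rounding steps mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random embedding table, the linear noise schedule, the vocabulary size, and the helper names `embed`, `forward_noise`, and `round_to_tokens` are all assumptions chosen for illustration, and the learned reverse (denoising) model is omitted entirely.

```python
import numpy as np

def embed(tokens, table):
    """Map discrete token ids to continuous vectors (embedding step)."""
    return table[tokens]

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_t) * x_0, (1 - a_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def round_to_tokens(x, table):
    """Map each vector to the id of its nearest embedding (rounding step)."""
    # Squared Euclidean distances, shape (seq_len, vocab_size).
    dists = ((x[:, None, :] - table[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
vocab_size, dim, steps = 16, 8, 100            # hypothetical sizes
table = rng.standard_normal((vocab_size, dim)) # assumed (untrained) embedding table
betas = np.linspace(1e-4, 0.02, steps)         # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)            # cumulative signal retention

tokens = np.array([3, 7, 7, 1, 12])            # a toy layout token sequence
x0 = embed(tokens, table)
x_small = forward_noise(x0, 0, alpha_bar, rng)          # very little noise
x_large = forward_noise(x0, steps - 1, alpha_bar, rng)  # heavily noised
recovered = round_to_tokens(x_small, table)
```

With only one small noise step, rounding `x_small` returns the original tokens, while `x_large` has lost much of the signal. A trained model would learn to reverse the noising step by step before the final rounding.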


[Figure: overview of the method]

Comparisons to SOTA on PubLayNet

[Figure: comparisons to SOTA on PubLayNet]

Doc-EMD is robust to different document categories and bounding box sizes

[Figure: Doc-EMD across document categories and bounding box sizes]
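This page does not give the definition of Doc-EMD, so the sketch below shows only the generic building block the name refers to: a one-dimensional earth mover's distance between two equal-size samples, computed as the mean absolute difference of the sorted values. The function name `emd_1d` and the toy box-center coordinates are hypothetical, and this is not the paper's metric.

```python
def emd_1d(a, b):
    """1-D earth mover's distance between two equal-size, equally weighted samples."""
    if len(a) != len(b):
        raise ValueError("this simplified version needs samples of equal size")
    # The optimal transport plan in 1-D matches the sorted values in order.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# Hypothetical normalized x-coordinates of box centers in two layouts.
centers_a = [0.0, 0.5, 1.0]
centers_b = [0.25, 0.5, 1.0]

distance = emd_1d(centers_a, centers_b)
```

Because the distance measures how far mass must move rather than whether boxes overlap, it changes smoothly as boxes shift or resize. That is one intuition for why a transport-based metric can be more forgiving than overlap-based comparisons.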

Synthetic document generation for OCR detection

[Figure: synthetic document generation for OCR detection]

BibTeX


@InProceedings{he2023diffusion,
  title={Diffusion-Based Document Layout Generation},
  author={He, Liu and Lu, Yijuan and Corring, John and Florencio, Dinei and Zhang, Cha},
  booktitle={Document Analysis and Recognition - ICDAR 2023},
  year={2023},
  publisher={Springer Nature Switzerland},
  pages={361--378},
  doi={10.1007/978-3-031-41676-7_21},
  isbn={978-3-031-41676-7}
}
