DAMF: Data Augmentation for Multi-Format Complex Document Classification
Tuesday, June 20, 2023
Document AI is a machine learning field concerned with understanding and analyzing documents such as PDF and DOCX files. As documents grow more complex and training data stays scarce because of privacy and security constraints, industry ML practitioners need effective ways to augment data. At Otrafy, we developed an augmentation method that combines computer vision (CV) and natural language processing (NLP), augmenting data based on both the text and the layout of a document. Our study shows that this method improves the performance of LayoutLM in both accuracy and runtime.
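To make the idea of augmenting both text and layout concrete, here is a minimal Python sketch. It is not the DAMF method itself. It assumes a LayoutLM-style input of words paired with bounding boxes normalized to the 0 to 1000 range, and it uses two generic perturbations chosen only for illustration: random token dropout on the text side and small box shifts on the layout side. The function name `augment_example` and the parameters `drop_prob` and `max_shift` are hypothetical.

```python
import random


def augment_example(words, boxes, rng, drop_prob=0.1, max_shift=5):
    """Return a perturbed copy of a (words, boxes) pair.

    words: list of str tokens.
    boxes: list of [x0, y0, x1, y1], assumed normalized to 0..1000.
    rng:   a random.Random instance, passed in for reproducibility.
    """
    out_words, out_boxes = [], []
    for word, (x0, y0, x1, y1) in zip(words, boxes):
        # Text side (illustrative): randomly drop a token and its box.
        if rng.random() < drop_prob:
            continue
        # Layout side (illustrative): shift the box by a small offset.
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        # Clamp the shift so the box stays inside 0..1000 and keeps its size.
        dx = max(-x0, min(dx, 1000 - x1))
        dy = max(-y0, min(dy, 1000 - y1))
        out_words.append(word)
        out_boxes.append([x0 + dx, y0 + dy, x1 + dx, y1 + dy])
    if not out_words:
        # Every token was dropped: fall back to an unmodified copy.
        return list(words), [list(b) for b in boxes]
    return out_words, out_boxes


words = ["Invoice", "No.", "12345"]
boxes = [[10, 10, 100, 30], [110, 10, 150, 30], [995, 990, 1000, 1000]]
aug_words, aug_boxes = augment_example(words, boxes, random.Random(0))
```

Each surviving word keeps a box of the same width and height, so the word-to-box alignment that a layout-aware model relies on is preserved while the positions vary slightly between augmented copies.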