A Comprehensive Analysis of LayoutLM and Donut for Document Classification
Date Issued
2023-07
Author(s)
Bajrami, Merxhan
Abstract
Document classification is important in everyday
life as it allows for efficient management and organization of
vast amounts of digital documents, saving time and resources.
This task is essential for businesses, organizations, and individuals who handle large volumes of data and need to quickly
retrieve and analyze specific information. AI-based document
classification can help organizations better manage and organize
their digital assets, improve information retrieval, and make
better business decisions based on the insights derived from the
classified documents. This paper compares the performance of
two transformer-based models, LayoutLM and Donut, for image
classification tasks on two different datasets. LayoutLM was
trained using pre-trained weights from Microsoft, while Donut
used pre-trained weights from Hugging Face. Both models were
fine-tuned for 100 epochs with early stopping, using
the Adam optimizer and cross-entropy loss. Our results show
that LayoutLM performs better than Donut on the first dataset,
achieving an overall accuracy of 0.88, while Donut achieved an
accuracy of 0.74. Our study demonstrates the importance of
carefully selecting and evaluating different models for document
classification tasks, based on the specific characteristics of
the dataset and the task requirements. Additionally, we provide
insights into the strengths and weaknesses of both LayoutLM and
Donut models for document classification on different datasets.
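
The training setup named in the abstract (cross-entropy loss, early stopping under a 100-epoch cap, accuracy as the reported metric) can be illustrated with a small, framework-free sketch. This is not the authors' code: the abstract does not state the early-stopping patience or which quantity was monitored, so `patience`, `min_delta`, the choice of validation loss as the monitored value, and the helper names `cross_entropy`, `accuracy`, `EarlyStopping`, and `run_epochs` are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def accuracy(predictions, labels):
    """Fraction of predicted class ids that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


class EarlyStopping:
    """Signal a stop once the monitored loss has not improved
    for `patience` consecutive epochs (assumed criterion)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_epochs(val_losses, max_epochs=100, patience=5):
    """Replay a sequence of per-epoch validation losses and return the
    number of epochs run before early stopping or the epoch cap."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

In a real fine-tuning run, the per-epoch validation loss would come from evaluating the model after each pass over the training data, and the optimizer step (Adam in the paper) would sit inside the epoch loop; here the losses are passed in as a plain list so the stopping logic can be checked in isolation.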
File(s)
Name
CIIT2023_paper_22.pdf
Size
8.97 MB
Format
Adobe PDF
Checksum
(MD5):48007ac2f93145184ce8628742cf13a9
