TY - JOUR
T1 - End-to-end table structure recognition and extraction in heterogeneous documents
AU - Kashinath, Tejas
AU - Jain, Twisha
AU - Agrawal, Yash
AU - Anand, Tanvi
AU - Singh, Sanjay
N1 - Funding Information:
We would like to thank the anonymous reviewers whose insightful comments and suggestions have significantly improved the quality of this paper. All authors read and approved the final manuscript.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/7
Y1 - 2022/7
N2 - Automatically detecting and parsing tables into an indexable and searchable format is an important problem in document digitization. It draws on computer vision, machine learning, and optical character recognition. This paper presents a model based on a deep neural network architecture that combines recent advances in computer vision and machine learning to detect tables and convert them into an editable and searchable format. The motivation for this work is to develop a sound method for extracting the vast store of knowledge held in physical documents, so that it can be used to build data-driven tools that support decision-making in fields such as healthcare and finance. The model uses a YOLO-based object detector, trained to maximize the Intersection over Union (IoU) of the detected table regions within the document image, and a novel OCR-based algorithm to parse each table detected in the document. Past works have focused on documents and images containing level, well-aligned tables; this paper also presents our findings after running the model on a set of skewed image datasets. Experiments on the Marmot and PubLayNet datasets show that the proposed method is highly accurate and generalizes to different table formats. At an IoU threshold of 50%, we achieve a mean Average Precision (mAP) of 98% and an average IoU of 88.81% on the PubLayNet dataset. With the same IoU threshold, we achieve an mAP of 95.07% and an average IoU of 75.57% on the Marmot dataset.
AB - Automatically detecting and parsing tables into an indexable and searchable format is an important problem in document digitization. It draws on computer vision, machine learning, and optical character recognition. This paper presents a model based on a deep neural network architecture that combines recent advances in computer vision and machine learning to detect tables and convert them into an editable and searchable format. The motivation for this work is to develop a sound method for extracting the vast store of knowledge held in physical documents, so that it can be used to build data-driven tools that support decision-making in fields such as healthcare and finance. The model uses a YOLO-based object detector, trained to maximize the Intersection over Union (IoU) of the detected table regions within the document image, and a novel OCR-based algorithm to parse each table detected in the document. Past works have focused on documents and images containing level, well-aligned tables; this paper also presents our findings after running the model on a set of skewed image datasets. Experiments on the Marmot and PubLayNet datasets show that the proposed method is highly accurate and generalizes to different table formats. At an IoU threshold of 50%, we achieve a mean Average Precision (mAP) of 98% and an average IoU of 88.81% on the PubLayNet dataset. With the same IoU threshold, we achieve an mAP of 95.07% and an average IoU of 75.57% on the Marmot dataset.
UR - http://www.scopus.com/inward/record.url?scp=85130412524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130412524&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2022.108942
DO - 10.1016/j.asoc.2022.108942
M3 - Article
AN - SCOPUS:85130412524
SN - 1568-4946
VL - 123
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 108942
ER -