References
- Integromat. 8 Easy Ways to Automate your Invoices (and Save Hours of Your Time) [Internet]. Integromat Blog. [cited 2022 Jun 28]. Available from: https://www.integromat.com/en/blog/invoice-automation
- Patel S, Bhatt D. Abstractive information extraction from scanned invoices (AIESI) using end-to-end sequential approach. arXiv preprint arXiv:2009.05728. 2020 Sep 12.
- Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991. 2015 Aug 9.
- Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360. 2016 Mar 4.
- Baviskar D, Ahirrao S, Kotecha K. A bibliometric survey on cognitive document processing. Library Philosophy and Practice. 2020 Oct 1:1-31.
- Adnan K, Akbar R. Limitations of information extraction methods and techniques for heterogeneous unstructured big data. International Journal of Engineering Business Management. 2019 Dec 9;11:1847979019890771.
- Adnan K, Akbar R. An analytical study of information extraction from unstructured and multidimensional big data. Journal of Big Data. 2019 Dec;6(1):1-38. https://doi.org/10.1186/s40537-018-0162-3
- Baviskar D, Ahirrao S, Kotecha K. Multi-Layout Invoice Document Dataset (MIDD): A Dataset for Named Entity Recognition. Data. 2021 Jul 20;6(7):78. https://doi.org/10.3390/data6070078
- Palm RB, Laws F, Winther O. Attend, copy, parse end-to-end information extraction from documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR) 2019 Sep 20 (pp. 329-336). IEEE.
- Reul C, Christ D, Hartelt A, Balbach N, Wehner M, Springmann U, Wick C, Grundig C, Buttner A, Puppe F. OCR4all-An open-source tool providing a (semi-)automatic OCR workflow for historical printings. Applied Sciences. 2019 Nov 13;9(22):4853. https://doi.org/10.3390/app9224853
- Abbas A, Afzal M, Hussain J, Lee S. Meaningful information extraction from unstructured clinical documents. Proc. Asia Pac. Adv. Netw. 2019 Oct;48:42-7.
- Steinkamp JM, Bala W, Sharma A, Kantrowitz JJ. Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes. Journal of biomedical informatics. 2020 Feb 1;102:103354.
- Joshi S, Shah P, Pandey AK. Location identification, extraction and disambiguation using machine learning in legal contracts. In: 2018 4th International Conference on Computing Communication and Automation (ICCCA) 2018 Dec 14 (pp. 1-5). IEEE.
- Shah P, Joshi S, Pandey AK. Legal clause extraction from contract using machine learning with heuristics improvement. In: 2018 4th International Conference on Computing Communication and Automation (ICCCA) 2018 Dec 14 (pp. 1-3). IEEE.
- Tkaczyk D, Szostek P, Bolikowski L. GROTOAP2-the methodology of creating a large ground truth dataset of scientific articles. D-Lib Magazine. 2014 Nov;20(11/12).
- Yang J, Liu Y, Qian M, Guan C, Yuan X. Information extraction from electronic medical records using multitask recurrent neural network with contextual word embedding. Applied Sciences. 2019 Sep 4;9(18):3658. https://doi.org/10.3390/app9183658
- Eberendu AC. Unstructured Data: an overview of the data of Big Data. International Journal of Computer Trends and Technology. 2016 Aug;38(1):46-50. https://doi.org/10.14445/22312803/IJCTT-V38P109
- Davis B, Morse B, Cohen S, Price B, Tensmeyer C. Deep visual template-free form parsing. In: 2019 International Conference on Document Analysis and Recognition (ICDAR) 2019 Sep 20 (pp. 134-141). IEEE.
- Zhao X, Niu E, Wu Z, Wang X. Cutie: Learning to understand documents with convolutional universal text information extractor. arXiv preprint arXiv:1903.12363. 2019 Mar 29.
- Smith R. An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) 2007 Sep 23 (Vol. 2, pp. 629-633). IEEE.
- Smith R. Tesseract OCR engine. Lecture. Google Code. Google Inc. 2007 Jul.
- Palm RB, Winther O, Laws F. CloudScan - a configuration-free invoice analysis system using recurrent neural networks. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017 Nov 9 (Vol. 1, pp. 406-413). IEEE.
- Katti AR, Reisswig C, Guder C, Brarda S, Bickel S, Hohne J, Faddoul JB. Chargrid: Towards understanding 2d documents. arXiv preprint arXiv:1809.08799. 2018 Sep 24.
- Krieger F, Drews P, Funk B, Wobbe T. Information extraction from invoices: A graph neural network approach for datasets with high layout variety. In: International Conference on Wirtschaftsinformatik 2021 Mar 9 (pp. 5-20). Springer, Cham.
- Liu W, Zhang Y, Wan B. Unstructured document recognition on business invoice. Mach. Learn., Stanford iTunes Univ., Stanford, CA, USA, Tech. Rep. 2016.
- Schaeffer MS. Essentials of accounts payable. John Wiley & Sons; 2002 Oct 15.
- Si Y, Wang J, Xu H, Roberts K. Enhancing clinical concept extraction with contextual embeddings. Journal of the American Medical Informatics Association. 2019 Nov 1;26(11):1297-304. https://doi.org/10.1093/jamia/ocz096
- Wang B, Wang A, Chen F, Wang Y, Kuo CC. Evaluating word embedding models: methods and experimental results. APSIPA transactions on signal and information processing. 2019;8.
- Li Y, Liu T, Li D, Li Q, Shi J, Wang Y. Character-based BiLSTM-CRF incorporating POS and dictionaries for Chinese opinion target extraction. In: Asian Conference on Machine Learning 2018 Nov 4 (pp. 518-533). PMLR.
- Majumder BP, Potti N, Tata S, Wendt JB, Zhao Q, Najork M. Representation learning for information extraction from form-like documents. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020 Jul (pp. 6495-6504).