• Title/Summary/Keyword: Document research


Transformation of Text Contents of Engineering Documents into an XML Document by using a Technique of Document Structure Extraction (문서구조 추출기법을 이용한 엔지니어링 문서 텍스트 정보의 XML 변환)

  • Lee, Sang-Ho;Park, Junwon;Park, Sang Il;Kim, Bong-Geun
    • KSCE Journal of Civil and Environmental Engineering Research / v.31 no.6D / pp.849-856 / 2011
  • This paper proposes a method for transforming the unstructured text contents of engineering documents, which have a complex hierarchical structure of subtitles with various heading symbols, into a semi-structured XML document that follows the hierarchical subtitle structure. To extract the hierarchical structure from plain text, this study employed document structure extraction, a technique for analyzing document structure. In addition, a method for processing enumerative text contents was developed to increase overall accuracy when extracting the subtitles and constructing the hierarchical subtitle structure. An application module was developed based on the proposed method, and its performance was evaluated with 40 test documents containing structural calculation records of bridges. The first test group of 20 documents, related to the superstructure of steel girder bridges, had been used in a previous study and served to verify the enhanced performance of the proposed method. The test results show that the new module increases accuracy and reliability compared with the results of the previous study. The remaining 20 test documents were used to evaluate the applicability of the method. The final mean accuracy exceeded 99%, and the standard deviation was 1.52. The final results demonstrate that the proposed method can be applied to diverse heading symbols in various types of engineering documents to represent the hierarchical subtitle structure in a semi-structured XML document.
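The idea of turning heading symbols into a nested XML tree can be illustrated with a minimal sketch. The abstract does not give the paper's heading rules or algorithm, so the three heading patterns, the `section` and `p` element names, and the function names below are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading patterns, ordered from the top level down.
# The paper's actual heading symbols and rules are not given in the abstract.
HEADING_PATTERNS = [
    re.compile(r"^(\d+)\.\s+(.*)$"),      # level 1: "1. Title"
    re.compile(r"^(\d+\.\d+)\s+(.*)$"),   # level 2: "1.1 Title"
    re.compile(r"^\((\d+)\)\s+(.*)$"),    # level 3: "(1) Title"
]


def heading_level(line):
    """Return (level, title) if the line matches a heading pattern, else None."""
    for level, pattern in enumerate(HEADING_PATTERNS, start=1):
        match = pattern.match(line.strip())
        if match:
            return level, match.group(2)
    return None


def text_to_xml(text):
    """Build a nested <section> tree from plain text using heading levels."""
    root = ET.Element("document")
    stack = [(0, root)]  # (level, element) pairs for the currently open sections
    for line in text.splitlines():
        if not line.strip():
            continue
        found = heading_level(line)
        if found is None:
            # Body text is attached to the innermost open section.
            ET.SubElement(stack[-1][1], "p").text = line.strip()
            continue
        level, title = found
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section", title=title, level=str(level))
        stack.append((level, section))
    return root


sample = """1. Design conditions
1.1 Loads
(1) Dead load
Self weight of the girder.
1.2 Materials
2. Section analysis
"""
tree = text_to_xml(sample)
print(ET.tostring(tree, encoding="unicode"))
```

A sketch like this would misread an enumerated body line such as "2. Check the bolts" as a heading, which is the kind of case the abstract's separate processing of enumerative text contents appears to target.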

An Efficient Application of XML Schema Matching Technique to Structural Calculation Document of Bridge (XML 스키마 매칭 기법의 교량 구조계산서 적용 방안)

  • Park, Sang Il;Kim, Bong-Geun;Lee, Sang-Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.32 no.1D / pp.51-59 / 2012
  • An efficient method for applying an XML schema matching technique to the document structure of structural calculation documents (SCDs) of bridges is proposed. Through 30 case studies, a parametric study was performed on the weightings of the name, sibling, child, and parent elements of XML schema components used in the similarity measure of the XML schema matching technique, and a weighting suitable for analyzing the document structure of SCDs is suggested. A simplified formula for quantifying similarity is also introduced to reduce computation time for very large SCD document structures. Numerical experiments show that the suggested method can increase the accuracy of XML schema matching by 10% with suitable weighting parameters and can maintain almost the same accuracy as previous studies without weighting parameters. In addition, computation time is reduced dramatically when the proposed simplified formula is used. In numerical experiments on 20 practical SCDs of bridges, the suggested method analyzed document structure more accurately than previous studies and was 4 to 460 times faster in computation time.
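A weighted similarity over the name, parent, sibling, and child elements might look like the sketch below. The abstract gives neither the formula nor the tuned weights, so the weight values, the use of `difflib` for name similarity, the Jaccard overlap for the other three parts, and the sample elements are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the paper tunes these in a parametric study, and the
# values below are placeholders, not the suggested weighting.
WEIGHTS = {"name": 0.4, "parent": 0.2, "sibling": 0.2, "child": 0.2}


def name_similarity(a, b):
    """String similarity in [0, 1] from difflib (a stand-in for the paper's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def set_similarity(a, b):
    """Jaccard overlap of two collections of element names; 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def element_similarity(x, y, weights=WEIGHTS):
    """Weighted sum of name, parent, sibling, and child similarities."""
    parts = {
        "name": name_similarity(x["name"], y["name"]),
        "parent": set_similarity(x["parent"], y["parent"]),
        "sibling": set_similarity(x["sibling"], y["sibling"]),
        "child": set_similarity(x["child"], y["child"]),
    }
    return sum(weights[key] * parts[key] for key in weights)


# Made-up schema elements for illustration only.
girder = {"name": "GirderDesign", "parent": ["Superstructure"],
          "sibling": ["SlabDesign"], "child": ["Loads", "Stress"]}
girder_v2 = {"name": "Girder_Design", "parent": ["Superstructure"],
             "sibling": ["DeckDesign"], "child": ["Loads", "Stress", "Deflection"]}
pier = {"name": "PierFoundation", "parent": ["Substructure"],
        "sibling": ["Abutment"], "child": ["Piles"]}

print(element_similarity(girder, girder_v2))
print(element_similarity(girder, pier))
```

Setting all four weights equal corresponds to matching "without weighting parameters" in the loosest sense; how the paper defines that case is not stated in the abstract.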

A Study on the Archival Authority Record Elements for Automatic Organization and Production (기록물 전거레코드 기술 요소의 자동생성에 관한 연구)

  • Park, Yong-Gee;Chung, Yeon-Kyoung
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.18 no.2 / pp.5-22 / 2007
  • The purpose of this study is to identify the archival authority record elements that can be organized and produced automatically. ISAAR(CPF) and RMSCA, which serve as standards for description elements and creator elements, are reviewed and analyzed. The research method is a literature review covering the Electronic Document System and the Records Center Automation System, which public institutions use as electronic records management systems. As a result, the study suggests the archival authority record elements that can be derived from the electronic approval function and the system operation and management functions of the Electronic Document System, while the elements that cannot be organized and produced automatically are to be supplied by the authority system, the archivist, and the description rules.

A Proofreader Matching Method Based on Topic Modeling Using the Importance of Documents (문서 중요도를 고려한 토픽 기반의 논문 교정자 매칭 방법론)

  • Son, Yeonbin;An, Hyeontae;Choi, Yerim
    • Journal of Internet Computing and Services / v.19 no.4 / pp.27-33 / 2018
  • When submitting a manuscript to a journal to present research results, researchers often have the manuscript proofread because proofreading helps it communicate the results more effectively. Currently, most manuscript proofreading companies assign proofreaders manually, according to the subjective judgment of a matching manager. Therefore, in this paper, we propose a topic-based proofreader matching method for effective proofreading results. The proposed method consists of two steps. First, topic modeling is performed using Latent Dirichlet Allocation. In this process, the frequency of each document constituting the representative document of a user is determined according to the importance of the document. Second, user similarity is calculated using cosine similarity. In addition, we evaluated the method in experiments on a real-world dataset. The performance of the proposed method is superior to that of the comparative method, and the validity of the matching results was verified through qualitative evaluation.
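The importance weighting and the cosine similarity step can be sketched as follows. This sketch leaves out the Latent Dirichlet Allocation step entirely and lets importance-weighted word counts stand in for topic distributions. The integer importance weights, the function names, and the sample texts are assumptions, not details from the paper.

```python
import math
from collections import Counter


def representative_counts(documents):
    """Merge a user's documents into one bag of words.

    `documents` is a list of (text, importance) pairs; each document's words
    are counted `importance` times (a hypothetical integer weight).
    """
    counts = Counter()
    for text, importance in documents:
        for word in text.lower().split():
            counts[word] += importance
    return counts


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up users: one author and two candidate proofreaders.
author = representative_counts([("topic model text mining", 3), ("bridge design", 1)])
proofreader_a = representative_counts([("text mining topic model evaluation", 2)])
proofreader_b = representative_counts([("steel bridge girder design", 2)])

scores = {
    "proofreader_a": cosine_similarity(author, proofreader_a),
    "proofreader_b": cosine_similarity(author, proofreader_b),
}
print(max(scores, key=scores.get), scores)
```

In a fuller version, the weighted representative documents would be passed to an LDA implementation, and the cosine similarity would be taken between the resulting per-user topic distributions instead of raw counts.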

A Study about Electronic Document Unification for Construction Job-site Report (건설현장보고 전자문서 일원화에 관한 연구)

  • Jeong, Seong-Yun
    • Proceedings of the Korean Institute Of Construction Engineering and Management / 2007.11a / pp.925-928 / 2007
  • Because field report data are not standardized, ordering organizations require a different form each time, and field reporters fill out the forms inconsistently and submit them through different media such as fax, e-mail, and paper documents. As a result, field reporters must prepare the same information again in each required format, and ordering organizations bear the additional work of aggregating and processing the reported data. This research proposes unifying the electronic documents for the monthly construction progress report and the construction status report. In addition, it proposes electronic processing functions that handle field reporting with the unified electronic documents. Applying the results of this research is expected to improve work efficiency and simplify reporting.

UML Documentation Using Compound Document (복합문서를 이용한 UML 문서화)

  • Choi, Gil-Rim;Kim, Tae-Gyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.1 / pp.796-799 / 2005
  • Two major research fields in this area concern CASE tools and component-based technologies. This paper discusses an example of applying component-based technology to a CASE tool. It presents the experience gained while developing OODesigner, an OO CASE tool, with compound document support based on COM/OLE technology. OODesigner can be used not only to model UML diagrams but also to produce documentation in cooperation with various kinds of OLE servers and containers. Therefore, powerful and consistent documentation can be produced with the tool. In this paper, we present design issues for incrementally implementing the compound document support facilities as a container and a server, and we show a brief sample demonstrating the usability of the OLE-enabled CASE tool.

Counting Research Publications, Citations, and Topics: A Critical Assessment of the Empirical Basis of Scientometrics and Research Evaluation

  • Wolfgang G. Stock;Gerhard Reichmann;Isabelle Dorsch;Christian Schlogl
    • Journal of Information Science Theory and Practice / v.11 no.2 / pp.37-66 / 2023
  • Scientometrics and research evaluation describe and analyze research publications when conducting publication, citation, and topic analyses. However, what exactly is a (scientific, academic, scholarly or research) publication? This article demonstrates that there are many problems when it comes to looking in detail at quantitative publication analyses, citation analyses, altmetric analyses, and topic analyses. When is a document a publication and when is it not? We discuss authorship and contribution, formally and informally published documents, as well as documents in between (preprints, research data) and the characteristics of references, citations, and topics. What is a research publication? Is there a commonly accepted criterion for distinguishing between research and non-research? How complete and unbiased are data sources for research publications and sources for altmetrics? What is one research publication? What is the unit of a publication that causes us to count it as "1"? In this regard, we report problems related to multi-author publications and their counting, weighted document types, the unit and weighting of citations and references, the unit of topics, and counting problems, not only at the article and individual researcher level (micro-level), but also at the meso-level (e.g., institutions) and macro-level (e.g., countries). Our results suggest that scientometric counting units are not reliable and clear. Many scientometric and research evaluation studies must therefore be used with the utmost caution.
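The counting problem for multi-author publications can be made concrete with a small worked example. It compares two common conventions, whole counting (each author gets 1 per paper) and fractional counting (each author gets 1/n for a paper with n authors). The author labels and the three papers are made up, and the article itself may discuss other schemes.

```python
from collections import defaultdict
from fractions import Fraction


def count_publications(papers):
    """Return (whole, fractional) publication counts per author.

    `papers` is a list of author lists, one list per publication.
    """
    whole = defaultdict(int)
    fractional = defaultdict(Fraction)
    for authors in papers:
        share = Fraction(1, len(authors))  # equal split among co-authors
        for author in authors:
            whole[author] += 1
            fractional[author] += share
    return dict(whole), dict(fractional)


# Three hypothetical papers with two, one, and three authors.
papers = [["A", "B"], ["A"], ["A", "B", "C"]]
whole, fractional = count_publications(papers)

print(whole)                     # {'A': 3, 'B': 2, 'C': 1}
print(sum(whole.values()))       # 6, although only 3 papers exist
print(sum(fractional.values()))  # 3, the number of papers
```

Under whole counting the per-author totals add up to more than the number of papers, while fractional counting preserves the total at the cost of assuming equal contributions. This illustrates why the choice of counting unit changes the results.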

Structure-based Clustering for XML Document Retrieval (XML 문서 검색을 위한 구조 기반 클러스터링)

  • Hwang Jeong Hee;Ryu Keun Ho
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1357-1366 / 2004
  • As XML becomes increasingly important for managing information and exchanging data efficiently on the web, work on structural integration and retrieval is ongoing. The structure of an XML document with a defined structure can be retrieved through its DTD or XML schema, but existing methods cannot be applied to XML documents that lack structure information. Therefore, in this paper we propose a new clustering technique, as basic research, that makes it possible to retrieve structure quickly from XML documents without structure information. We first extract the features of frequent structures from each XML document. We then cluster documents by similar structure, treating the frequent structure as the representative structure of each XML document, which makes it possible to retrieve XML documents faster than by processing all documents with their different structures. We also perform structure retrieval on XML documents based on the clusters, which are groups of similar structures. Moreover, we show the efficiency of the proposed method by describing how to apply structure retrieval and presenting an example application result.
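Clustering XML documents by structure can be sketched roughly as below. The paper mines frequent structures as each document's representative structure; this sketch instead uses the full set of root-to-node tag paths as a stand-in. The Jaccard similarity, the greedy assignment, the 0.5 threshold, and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def jaccard(a, b):
    """Jaccard similarity of two non-empty path sets."""
    return len(a & b) / len(a | b)


def cluster(documents, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # each item: (representative path set, [document names])
    for name, xml_text in documents.items():
        paths = tag_paths(xml_text)
        for representative, members in clusters:
            if jaccard(paths, representative) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append((paths, [name]))
    return [members for _, members in clusters]


# Made-up documents without a DTD or schema.
documents = {
    "book1": "<book><title/><author/></book>",
    "book2": "<book><title/><author/><year/></book>",
    "order1": "<order><item><price/></item></order>",
}
print(cluster(documents))  # [['book1', 'book2'], ['order1']]
```

A structure query would then be compared against one representative per cluster rather than against every document, which is the source of the retrieval speed-up the abstract describes.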

Analysis on Sequence of Ball-pen and Pencil by using Digital Infrared Photography -with Emphasis on the Documents Authentication- (적외선 사진술을 이용한 볼펜과 연필의 선후 관계 분석 -문서감정을 중심으로-)

  • Kim, Yoo-Jin;Youn, Sung-Bin;Har, Dong-Hwan
    • The Journal of the Korea Contents Association / v.11 no.5 / pp.481-488 / 2011
  • Generally speaking, a document is a mutual promise between two parties and functions as a legally binding basis of trust for a transaction. A document should be produced by mutual agreement, and it gains credibility when the transparency of its production is ensured. Therefore, analyzing the sequence of steps in a document's production is very important for appraising the document. The purpose of this research is to determine the sequence relationship between the erased carbon ingredients of a pencil and the ingredients left by a ball-point pen, and thus to suggest a method for determining whether mutual agreement was applied in signing an insurance policy. The method uses infrared photography to analyze whether the carbon ingredients of a pencil remain in the bottom section of a ball-point pen stroke. If they do, the stroke absorbs infrared rays and shows a dense concentration. The method relies on a relatively simple infrared photography system and should therefore be beneficial to private appraisal offices.