• Title/Abstract/Keywords: document topic

190 search results (processing time: 0.024 s)

A Study on multi-translation system for e-business collaboration

  • 안경림;정진욱
    • 인터넷정보학회논문지
    • /
    • Vol. 7, No. 6
    • /
    • pp.123-130
    • /
    • 2006
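  • In the early days of e-business, transactions took place within a single business entity or marketplace, but they have gradually evolved into more complex forms. In particular, the need for business collaboration between business entities or marketplaces has emerged as a core idea. As the electronic documents being exchanged have become more varied, conversion between document formats has become an important factor. Following this object-oriented flow of business transactions, this paper defines ebXML as the base format of the exchanged documents and designs a translation system with a multi-format conversion function to support conversion among various document formats. The proposed system is designed in a model-driven manner, so it can be configured in different ways depending on the system environment. It is designed so that, whatever format the input data takes, it can be supported by developing only an additional parsing module. A common data set is also defined to increase data reusability. The paper compares the proposed system with an existing translation system on various format conversions and demonstrates the superiority of the proposed system.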
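The pluggable-parser idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the registries `PARSERS` and `SERIALIZERS`, the `convert` function, and the flat dictionary used as the "common data set" are all hypothetical, and the XML output is a generic element tree rather than a real ebXML document.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical registries: format name -> parser / serializer function.
PARSERS: Dict[str, Callable[[str], dict]] = {}
SERIALIZERS: Dict[str, Callable[[dict], str]] = {}


def parser(fmt: str):
    """Register a function that parses `fmt` text into the common data set."""
    def register(fn):
        PARSERS[fmt] = fn
        return fn
    return register


def serializer(fmt: str):
    """Register a function that writes the common data set out as `fmt`."""
    def register(fn):
        SERIALIZERS[fmt] = fn
        return fn
    return register


@parser("json")
def parse_json(text: str) -> dict:
    return dict(json.loads(text))


@parser("csv")
def parse_csv(text: str) -> dict:
    # Assumption: one header row followed by one data row.
    rows = list(csv.DictReader(io.StringIO(text)))
    return dict(rows[0])


@serializer("json")
def to_json(record: dict) -> str:
    return json.dumps(record, sort_keys=True)


@serializer("xml")
def to_xml(record: dict) -> str:
    # Generic XML only; a real ebXML document would follow its own schema.
    root = ET.Element("Document")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def convert(text: str, source: str, target: str) -> str:
    """Parse into the common data set, then serialize to the target format."""
    common = PARSERS[source](text)
    return SERIALIZERS[target](common)
```

Under these assumptions, supporting a new input format means registering one more parser; the serializers and `convert` stay unchanged because every format passes through the same intermediate record.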

Object detection in financial reporting documents for subsequent recognition

  • Sokerin, Petr;Volkova, Alla;Kushnarev, Kirill
    • International journal of advanced smart convergence
    • /
    • Vol. 10, No. 1
    • /
    • pp.1-11
    • /
    • 2021
  • Document page segmentation is an important step in building a high-quality optical character recognition module. The study reviews existing work on page segmentation and focuses on developing a segmentation model that is more useful in an organizational setting and offers broad capabilities for managing model quality. The main problems of document segmentation are highlighted, including complex backgrounds with intersecting objects. The detection classes include not only the classic text, table, and figure, but also additional types such as signature, logo, and table without borders (or with partially missing borders). This makes it possible to pose the non-trivial task of detecting non-standard document elements. The authors compared existing neural network architectures for object detection based on published research data; the most suitable architecture was RetinaNet. To enable quality control of the model, a method based on neural network modeling with the RetinaNet architecture is proposed. Several models were built during the study, and their quality was assessed on the test sample using the mean Average Precision (mAP) metric. The best result was shown by a model comprising four neural networks: the first detects tables and tables without borders, the second seals and signatures, the third pictures and logos, and the fourth text. The analysis showed that this four-network approach gave the best results on the test sample for most detection classes, in line with the objectives of the study. The proposed method can also be used to recognize other objects. A promising direction for further work is the segmentation of tables, where functionally distinct table areas act as classes: heading, cell with a name, cell with data, and empty cell.

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems
    • /
    • Vol. 17, No. 4
    • /
    • pp.818-833
    • /
    • 2021
  • To address the problems of low precision, insufficient feature extraction, and poor use of context in existing text sentiment analysis methods, a hybrid CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model is proposed. First, Chinese text data is converted into vectors with Word2Vec through transfer learning. Second, local features are extracted by the CNN model. Then, contextual information is extracted by the BiLSTM neural network and the emotional tendency is obtained using softmax. Finally, topics are extracted using term frequency-inverse document frequency (TF-IDF) and K-means. Compared with the CNN, BiLSTM, and gated recurrent unit (GRU) models, the F1-score of the CNN-BiLSTM-TE model was higher by 0.0147, 0.006, and 0.0052, respectively. Compared with the CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, its F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, scalability was verified on a takeaway dataset, which is of great value in practical applications.
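Several abstracts in this listing rely on term frequency-inverse document frequency weighting. As a rough illustration only, the sketch below computes the plain variant tf × log(N / df) over pre-tokenized documents. The function name `tf_idf` and the input shape are assumptions, the papers listed here may use a different weighting variant, and the K-means clustering step is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that occurs in every document receives a weight of zero, so only terms that distinguish a document from the rest of the collection stand out as candidate topic keywords.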

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over traditional learning methods. This has led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques reported in the literature are either visual or audio based, but not both. Since the visual and audio components are equally important for content-based indexing and retrieval, the current work focuses on both. A framework for automatic topic-based indexing and search based on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. Text is extracted from the audio component using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of the video and audio component extractors. The search within the indexed documents is optimized using Naïve Bayes (NB) classification and K-Means clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories rather than the whole lecture video corpus. The work is carried out on a dataset generated by assigning categories to lecture video transcripts gathered from e-learning portals. Search performance is assessed in terms of accuracy and time taken. Further, the improved accuracy of the proposed indexing technique is compared with that of the accepted chain indexing technique.

Research on Brand Value Dimensions of Employers: Based on Online Reviews by the Employees

  • XU, Meng
    • The Journal of Asian Finance, Economics and Business
    • /
    • Vol. 9, No. 10
    • /
    • pp.215-225
    • /
    • 2022
  • This study investigates employees' online reviews, conducts in-depth text topic mining, summarizes the dimensions of employer brand value, and seeks effective ways to build employer brands from a multi-dimensional perspective. The study takes samples of employer reviews, filters keywords according to term frequency-inverse document frequency (TF-IDF), builds a network of reviews that share keywords, explores the communities in that network, and summarizes the theme dimensions. It also makes a dynamic comparison and analysis of the employer brand value dimensions of different industries and enterprises. The study shows that the themes found through community exploration can be summarized into 11 dimensions of employer brand value, and that these dimensions differ significantly across industries and among enterprises within the same industry. Attention to the employer brand value dimensions changes significantly over time. Various industries pay increasing attention to the dimensions of work intensity and career development, while employers pay steady attention to the dimension of welfare benefits. The findings suggest that seeking heterogeneity in employer brand resources from these multi-dimensional differences and changes is an effective way to improve the competitiveness of enterprises in the human capital market.

Comparison of Topics Related to Nurse on the Internet Portals and Social Media Before and During the COVID-19 era Using Topic Modeling

  • 윤영미;김성광;김혜경;김은주;정윤의
    • 근관절건강학회지
    • /
    • Vol. 27, No. 3
    • /
    • pp.255-267
    • /
    • 2020
  • Purpose: The purpose of this study is to compare topics, through keywords related to nurses on internet portals and social media, before and during the coronavirus disease (COVID-19) era. Methods: "Nurse" was searched on the internet for the six months before and the six months during the outbreak of COVID-19 in Korea. For data collection, we implemented web crawlers in programming languages such as Python and collected keywords. The collected keywords were classified into three domains by topic modeling. Results: The keyword "nurse" increased by 15% during the COVID-19 era. Before COVID-19, the keywords with high Term Frequency - Inverse Document Frequency (TF-IDF) values included "nurse" and "C-section"; during COVID-19, they included not only "nurse" but also pandemic-related terms such as "emergency" and "gown". Conclusion: Various topics were being uploaded to internet media. Nursing professionals should take an interest in the text that appears in internet media and try to continuously identify and improve problems.

Practical Implications on Delivery of Goods under the Rotterdam Rules

  • 양정호
    • 무역상무연구
    • /
    • Vol. 74
    • /
    • pp.55-79
    • /
    • 2017
  • The Rotterdam Rules introduce new issues that have been ignored by previous international transport conventions. Among them, the provisions on delivery of goods have been much debated because they deviate from well-established principles. The Rotterdam Rules provide several alternatives in order to resolve uncertainty regarding delivery practice. The carrier has to make a reasonable effort to deliver the goods following the required procedure, which differs depending on the transport document issued. Where the goods are not deliverable, the carrier may be discharged from its obligation to deliver the goods when it delivers them in accordance with the shipper's delivery instructions. In addition, the carrier can take actions reasonably required by the circumstances if it is impossible to deliver the goods. These alternatives are not ideal, but they seem to help partly in solving practical problems arising in the process of delivery. However, the delivery regime under the Rotterdam Rules could cause confusion with respect to the traditional delivery principle, and it also puts a new burden on the parties concerned. In conclusion, the parties concerned should consider the practical implications when issuing and transferring transport documents as well as when requesting and instructing delivery of goods.

A Experimental Study on the Usefulness of Structure Hints in the Leaf Node Language Model-Based XML Document Retrieval

  • 정영미
    • 정보관리학회지
    • /
    • Vol. 24, No. 1
    • /
    • pp.209-226
    • /
    • 2007
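  • The XML web document format can carry both content and meaningful logical structure information within a document, so retrieval can access documents by structure as well as by content. The purpose of this study is therefore to test how useful structural constraints (structure hints) are as an additional element to content retrieval in XML retrieval, by comparing the performance of content-only retrieval with that of retrieval using both content and structural constraints. The experiments used an in-house XML retrieval system based on a leaf node language model and, through participation in the INEX 2005 ad-hoc track, followed all of its experimental methods and used the INEX 2005 test collection.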

Design and Implementation of Video Documents Management System

  • 권재길;배종민
    • 한국정보처리학회논문지
    • /
    • Vol. 7, No. 8
    • /
    • pp.2287-2297
    • /
    • 2000
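  • Video documents, which contain audiovisual and semantic information, have complex relationships among their media, and user queries are becoming more diverse, calling for topic-based retrieval of content and domain-specific retrieval using object information that appears in the document. Existing information retrieval design methods cannot adequately accommodate these requirements. To manage video documents systematically and support diverse retrieval, a structured and systematic metadata model must therefore be established that uses the semantic and structural information extracted from documents automatically or manually. This paper analyzes the characteristics of video documents, classifies general metadata that can accommodate diverse queries and serve as an infrastructure for other video applications, and proposes a general integrated metadata model (GIMM) that manages this metadata in an integrated way. It then designs and implements a video document management system (VDMS) using GIMM.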

Groupware: Current Status Analysis II

  • 김선욱;김봉진
    • 산업공학
    • /
    • Vol. 11, No. 2
    • /
    • pp.211-225
    • /
    • 1998
  • As mentioned in Part I, all groupware products have been categorized into three areas: cooperation/document management systems (CMS), collaborative writing systems (CWS), and decision-making/meeting systems (DMS). This study presents a comparative analysis of the last two areas, adding to the analysis of the first. It turns out that DMS has a higher market share than CWS. However, since effective collaboration requires the functions inherent to both systems, they should be integrated somehow. The functions that the systems implement in response to design issues are described. Each group of functions has been divided into three parts: basic functions, quasi-basic functions, and others. This division was made according to how frequently the functions are provided in the products. The basic functions in CWS include collaborative writing beyond restrictions of time and place, group awareness, version control, and others, while those in DMS include real-time collaboration, brainstorming, presentation, various task support, policy formation, document management, multimedia, subgroup communication, topic commenter, categorizer, screen capture, and various file transfer. The basic functions are merged into the integrated functional model proposed in Part I. Since the model is flexible enough to partially include the quasi-basic functions in addition to the basic functions, a large number of products may stem from modification of the functional model.
