• Title/Abstract/Keyword: Parsing technology

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.471-492 / 2017
  • The demand for and interest in big data analytics are increasing rapidly. The concept of big data covers not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text has gained particular attention because it is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be presented through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics in the rapidly growing text data generated through various social media. Research on and applications of topic modeling have therefore been actively carried out in various fields, since topic modeling can extract the core topics from a huge number of unstructured text documents and provide a group of documents for each topic. In this paper, we review the major techniques and research trends of text analysis. We also introduce some applications that solve problems in various fields by using topic modeling.
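
The analysis order named in this abstract (parsing and filtering, structuring, frequency analysis, similarity analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the function names, and the choice of raw term counts with cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def parse_and_filter(document):
    """Parsing and filtering: split into lowercase words, drop stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(document):
    """Structuring and frequency analysis: map each term to its count."""
    return Counter(parse_and_filter(document))


def cosine_similarity(tf_a, tf_b):
    """Similarity analysis: cosine of the angle between two count vectors."""
    shared_terms = set(tf_a) & set(tf_b)
    dot = sum(tf_a[term] * tf_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)
```

Raw counts keep the sketch short; a weighting such as TF-IDF would slot in at the `term_frequencies` step without changing the rest.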

Design and Implementation of Customized Farming Applications using Public Data (공공데이터를 이용한 맞춤형 영농 어플리케이션 설계 및 구현)

  • Ko, Jooyoung;Yoon, Sungwook;Kim, Hyenki
    • Journal of Korea Multimedia Society / v.18 no.6 / pp.772-779 / 2015
  • Advances in information technology have rapidly changed the service environment of daily life, culture, and industry. Computer-based information and communication systems are applied in medicine, health care, distribution, and business transactions. "Smart" services create new information by combining computing capability with information. Although agriculture is a labor-intensive industry that requires many hands, it is becoming a knowledge-based industry today. In agriculture, computer communication systems are applied to facility farming and agricultural machinery. In this paper, we design and implement an application that provides personalized agriculture-related information in the actual farming field. It also gives farmers a system through which they can directly auction or sell the crops they produce. The system parses information on season, weather conditions, market prices, region, crops, and diseases and insect pests according to individual settings, in a ubiquitous environment using a location-based sensor network, and processes the resulting data.

A Study on Feature Information Parsing System of Video Image for Multimedia Service (멀티미디어 서비스를 위한 동영상 이미지의 특징정보 분석 시스템에 관한 연구)

  • 이창수;지정규
    • Journal of Information Technology Applications and Management / v.9 no.3 / pp.1-12 / 2002
  • Owing to the rapid development of computer and communication technologies, video is now more widely used than ever in many areas. Current information analysis systems were originally built to process text-based data. They therefore run into problems when they need to represent the ambiguity of a video correctly, when they have to process a large number of comments, or when they lack the objectivity that the task requires. We propose an algorithm capable of analyzing a large amount of video efficiently. Regions in a video are segmented using region growing and region merging techniques. To extract color information, we convert colors from RGB to HSI and use the information that matches the representative colors. To extract shape information, we use improved moment invariants (IMI) so that we can solve many problems of histogram intersection caused by current IMI and Jain. The extracted feature information of the streaming media is then used to find similar frames.
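
The RGB-to-HSI conversion mentioned in this abstract can be sketched as follows. The abstract does not give the exact formulas the authors used, so this sketch assumes the common geometric definition of HSI with components normalized to [0, 1]; the function name `rgb_to_hsi` is illustrative.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB components in [0, 1] to (hue in degrees, saturation, intensity).

    Assumes the common geometric HSI definition; the paper may differ.
    """
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # pure black: hue and saturation set to 0
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    # The radicand is non-negative in exact arithmetic; clamp against rounding.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        hue = 0.0  # gray pixel: hue is undefined, 0 by convention here
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))
        hue = math.degrees(math.acos(ratio))
        if b > g:
            hue = 360.0 - hue  # lower half of the hue circle
    return hue, saturation, intensity
```

Separating hue from intensity is what makes matching against representative colors less sensitive to lighting changes than comparing raw RGB values.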

Knowledge Extractions, Visualizations, and Inference from the big Data in Healthcare and Medical

  • Kim, Jin Sung
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.5 / pp.400-405 / 2013
  • The purpose of this study is to develop a composite platform for knowledge extraction, visualization, and inference. Big data sets are frequently used in the healthcare and medical area. To help the knowledge managers and users working in the field, this study focuses on knowledge management (KM) based on Data Mining (DM), Knowledge Distribution Map (KDM), Decision Tree (DT), RDBMS, and SQL-inference. The proposed mechanism is composed of five key processes. First, in Knowledge Parsing, it extracts logical rules from a big data set by using DM technology and then transforms the rules into RDB tables. Second, through Knowledge Maintenance, it refines and manages the knowledge so that it is ready for computing knowledge distributions. Third, in the Knowledge Distribution process, the knowledge distributions can be viewed by using the DT mechanism. Fourth, in Knowledge Hierarchy, the platform shows the hierarchy of the knowledge. Finally, in Inference, it deduces conclusions from the given facts and data. This approach offers diversity in knowledge representation and inference, which can improve the quality of computer-based medical diagnosis.
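
The idea of storing extracted rules in RDB tables and deducing conclusions with SQL can be sketched as below. The abstract does not describe the schema, so the `rules` and `facts` tables, their columns, the restriction to single-condition rules, and the sample values are all hypothetical.

```python
import sqlite3


def infer(rules, facts):
    """Return the conclusions of all rules whose condition matches a fact.

    rules: iterable of (attribute, value, conclusion) tuples
    facts: iterable of (attribute, value) tuples
    Hypothetical schema; each rule has exactly one condition.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE rules (attribute TEXT, value TEXT, conclusion TEXT)"
        )
        conn.execute("CREATE TABLE facts (attribute TEXT, value TEXT)")
        conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", rules)
        conn.executemany("INSERT INTO facts VALUES (?, ?)", facts)
        # SQL-based inference: a rule fires when a fact satisfies its condition.
        rows = conn.execute(
            "SELECT DISTINCT r.conclusion FROM rules r "
            "JOIN facts f ON r.attribute = f.attribute AND r.value = f.value "
            "ORDER BY r.conclusion"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


# Made-up rules and facts, for illustration only.
example_rules = [
    ("fever", "high", "class_A"),
    ("cough", "dry", "class_A"),
    ("rash", "yes", "class_B"),
]
example_facts = [("fever", "high"), ("rash", "no")]
conclusions = infer(example_rules, example_facts)
```

Rules with several conditions would need an extra table linking conditions to rule identifiers and a `GROUP BY ... HAVING` check that every condition is met.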

A Study on Korean Question Processing System Using Knowledge Base (지식(知識) 베이스를 이용한 한국어(韓國語) 질문 처리(處理) 시스템에 관한 연구)

  • Kim, Pan-Jun
    • Journal of Information Management / v.24 no.3 / pp.1-30 / 1993
  • To give users who want to retrieve document information in Korean natural language direct access to retrieval systems, a Korean question processing system was developed that translates Korean natural-language questions into Boolean search statements, the form most frequently used in current information retrieval systems.

An Implementation of Java based MPEG-4 System (Java기반의 MPEG-4 시스템 구현)

  • Kang, Ki-Joung;Hong, Choong-Seon;Lee, Dae-Young
    • The KIPS Transactions:PartC / v.9C no.5 / pp.637-646 / 2002
  • This paper introduces an example implementation of a Java-based MPEG-4 system that follows the MPEG-4 standard protocols in order to provide a multimedia messaging service. The multimedia messaging service is a wireless LAN based wired and wireless service that delivers multimedia content including video and audio information. Detailed methods for developing an MPEG-4 system are described, including MPEG-4 system implementation technology, the definition of the wired and wireless multimedia service, DMIF implementation, and mp4 file parsing.

A Reconsideration of Asymmetries of Bracketing Paradoxes in English Derivation: a Corpus-based Approach

  • Kim, Jin-hyung
    • Journal of English Language & Literature / v.55 no.3 / pp.475-495 / 2009
  • In this paper, I discuss some asymmetries of bracketing paradoxes from a corpus-based perspective. Through a critical examination of previous analyses of bracketing paradoxes, it is demonstrated that the cases of apparent asymmetries of bracketing paradoxes are consistently accounted for when combined with frequency-based parsability in morphological processing. Based on relative frequency, this paper argues that bracketing paradoxes are well attested when their immediate bases are frequent and productive enough to be accessed as a unit and stored as such in memory. This is an extension of Hay (2002), which conducted a comprehensive survey of differential frequency effects in suffix pairs. The frequency-based approach to bracketing paradoxes adopted in this paper can challenge conventional formal theory by assuming a major role for language use, and it has the potential to significantly advance our understanding of the asymmetries observed in real language use.

Tree Tagging Tool using Two-phrase Parsing (2단계 구문분석을 이용한 구문분석 말뭉치 구축도구)

  • Kim, Hye-Kyum;Park, Kyung-Mi;Yoon, Yeo-Chan;Rim, Hae-Chang;Park, So-Young
    • Annual Conference on Human and Language Technology / 2005.10a / pp.151-158 / 2005
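  • In this paper, we propose a tool for building a syntactically parsed corpus by means of two-phase parsing. The proposed method aims to reduce the manual work required of annotators when a large parsed corpus is built by hand. The tool consists of a sentence segmentation step, which splits the input sentence according to sentence segmentation criteria; a partial parse structure generation step, which automatically parses each segment; and a partial integration step, which combines the partial parse structures into a complete parse structure. Automatic parsing uses a feature-based Korean parsing model, and the criteria for splitting a sentence into segments are extracted automatically from a corpus and applied after a simple verification. At each step of building the parsed corpus, the annotator can cancel and rebuild the results produced by the automatic parser.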

A data management system for microbial genome projects

  • Ki-Bong Kim;Hyeweon Nam;Hwajung Seo and Kiejung Park
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.83-85 / 2000
  • Many microbial genome sequencing projects have been carried out in genome centers around the world since the first genome, that of Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new, highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data from the low level to the high level. To this end, we developed an automatic data management system for microbial genome projects, which consists mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects; it makes systematic and consistent data management and retrieval possible and is tightly coupled with the analysis programs and the web-based user interface. That is, the results of the analysis programs can be parsed and stored in the local database, and users can retrieve the data at any level of data processing through the web-based graphical user interface. Contig assembly, homology search, and ORF prediction, which are essential in genome projects, make up the analysis programs in our system. All but the contig assembly program are available in the public domain. These programs are connected with each other by a number of utility programs. As a result, this system will maximize cost and time efficiency in genome research.

Development of a distributed high-speed data acquisition and monitoring system based on a special data packet format for HUST RF negative ion source

  • Li, Dong;Yin, Ling;Wang, Sai;Zuo, Chen;Chen, Dezhi
    • Nuclear Engineering and Technology / v.54 no.10 / pp.3587-3594 / 2022
  • A distributed high-speed data acquisition and monitoring system is developed for the RF negative ion source at Huazhong University of Science and Technology (HUST). It consists of data acquisition, data forwarding, and data processing. First, the data acquisition modules sample physical signals at high speed and upload the sampled data with corresponding absolute-time labels over UDP, which establishes the time correlation among different signals. A special data packet format is proposed for the data upload; it is convenient for packing and parsing fixed-length packets, especially when the span of the time labels in a packet crosses an absolute second. The data forwarding modules then receive the UDP messages and distribute their data packets to the real-time display module and the data storage modules via the PUB/SUB-pattern message queue of ZeroMQ. For data storage, a scheme combining a file server and a MySQL database is adopted to increase the storage rate and facilitate data queries. The test results show that the loss rate of the data packets is within the range of 0-5% and the storage rate is higher than 20 Mbps, both acceptable for the HUST RF negative ion source.
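
Packing and parsing a fixed-length packet whose time labels cross an absolute second can be sketched as follows. The abstract does not specify the actual packet format, so the field layout, the sample rate, the packet length, and all names below are hypothetical.

```python
import struct

# Hypothetical parameters; the real format is not given in the abstract.
SAMPLES_PER_PACKET = 4
SAMPLE_RATE_HZ = 1_000_000

# Big-endian layout: uint16 channel, uint32 absolute second,
# uint32 index of the first sample within that second, then int16 samples.
PACKET_FORMAT = f">HII{SAMPLES_PER_PACKET}h"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)


def pack_packet(channel, second, first_index, samples):
    """Pack one fixed-length packet; only the first sample carries a label."""
    if len(samples) != SAMPLES_PER_PACKET:
        raise ValueError("packet must hold exactly SAMPLES_PER_PACKET samples")
    return struct.pack(PACKET_FORMAT, channel, second, first_index, *samples)


def parse_packet(data):
    """Parse a packet into (channel, [(second, index, value), ...]).

    Labels of later samples are derived from the first one, carrying into
    the next absolute second when the index passes SAMPLE_RATE_HZ.
    """
    channel, second, first_index, *samples = struct.unpack(PACKET_FORMAT, data)
    labeled = []
    for offset, value in enumerate(samples):
        carry, index = divmod(first_index + offset, SAMPLE_RATE_HZ)
        labeled.append((second + carry, index, value))
    return channel, labeled
```

Because every packet has the same size, a receiver can slice a UDP payload into `PACKET_SIZE`-byte chunks and parse each one independently, without length prefixes or delimiters.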