• Title/Summary/Keyword: Text data

Design and Implementation of Incremental Learning Technology for Big Data Mining

  • Min, Byung-Won;Oh, Yong-Sun
    • International Journal of Contents / v.15 no.3 / pp.32-38 / 2019
  • Traditional mining techniques struggle to handle Big Data generated by diverse digital media and sensors. When new text data accumulate continuously, memory shortages and a heavier learning burden also arise, because the full data set, including data already collected and analyzed, is inefficiently re-analyzed each time. In this paper, we propose a general-purpose classifier and its structure to solve these problems. Departing from current feature-reduction methods, we introduce a scheme that adopts only the changed elements when new features are partially accumulated in this free-style learning environment. The incremental learning module, built up progressively, learns only the changed parts of the data without reprocessing what has already been accumulated, whereas traditional methods relearn the entire data set whenever data are added or changed. Users can also freely merge new data with previous data through the resource management procedure whenever relearning is needed. An analysis in a Big Data environment confirms that the method performs well in data processing owing to its learning efficiency. In a comparison with NB and SVM, all three models achieve an accuracy of approximately 95%. We expect the method to be a viable, high-performance and accurate alternative to large computing systems for Big Data analysis in a PC cluster environment.
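The idea of learning only from newly arrived data can be illustrated with a small sketch. The code below is not the classifier proposed in the paper, whose structure the abstract does not specify; it is a multinomial Naive Bayes text classifier, chosen here as an assumption because its sufficient statistics are plain counts that can be updated batch by batch. The class name `IncrementalNaiveBayes`, the method `partial_fit`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated batch by batch.

    Illustrative only: a stand-in for the idea of incremental learning,
    not the classifier described in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_docs = Counter()               # documents seen per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()                        # all words seen so far

    def partial_fit(self, docs, labels):
        """Fold a new batch into the counts; earlier batches are not revisited."""
        for text, label in zip(docs, labels):
            tokens = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the class with the highest smoothed log-posterior."""
        tokens = text.lower().split()
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Two batches arrive at different times; the second update touches only new data.
model = IncrementalNaiveBayes()
model.partial_fit(["cheap pills buy now", "meeting agenda attached"], ["spam", "ham"])
model.partial_fit(["buy cheap watches", "project meeting notes"], ["spam", "ham"])
print(model.predict("cheap watches"))
```

Because the model state is just counts, training on two batches in sequence yields the same state as training once on their union, which is the property that lets the earlier data be left untouched.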

A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data (고차원 데이터에서 One-class SVM과 Spectral Clustering을 이용한 이진 예측 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.25 no.6 / pp.886-893 / 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score that indicates the degree to which a data sample deviates from normal. However, setting a threshold on the outlier score to decide whether a data sample is an outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and an ensemble of one-class SVMs. Given training data consisting of normal data samples, a clustering method is applied to find clusters in the training data, and the ensemble of one-class SVM models trained on each cluster finds the boundaries of the normal data. We show how to obtain a threshold for transforming the outlier scores computed by the ensemble of one-class SVM models into binary predictive values. Experimental results with high-dimensional text data show that the proposed method can be applied effectively to high-dimensional data, especially when the normal training data consist of clusters with different shapes and densities.

Validity Study of Kohonen Self-Organizing Maps

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / v.10 no.2 / pp.507-517 / 2003
  • The self-organizing map (SOM) has been developed mainly by T. Kohonen and his colleagues as an unsupervised learning neural network. Because of its topological ordering property, SOM is known to be very useful in pattern recognition and text information retrieval. Recently, data miners have frequently used Kohonen's mapping method in exploratory analyses of large data sets. One problem facing SOM builders is that there is no sensible criterion for evaluating the goodness-of-fit of the map at hand. In this short communication, we propose valid evaluation procedures for a Kohonen SOM of any size. The methods can be used to select the best map among several candidates.

Power-Flow Simulator with Visualization Function Based on IEEE Common Data Format

  • Sugino, Shohei;Sekiya, Hiroo
    • Journal of Multimedia Information System / v.3 no.4 / pp.161-168 / 2016
  • In this paper, a power-flow simulator that visualizes power flow and system configuration is proposed and implemented. Power-flow simulation generally requires preparing a text file that describes the power system, which is one of the barriers to running such simulations. The proposed simulator automatically generates IEEE common data format files from user-drawn power-system diagrams, so users can carry out simulations simply by drawing the power system on the display. The simulator can also draw a power-system diagram automatically from an IEEE common data format file. With this function, the amounts and directions of power flows can be visualized on the bus-system diagram, which helps users grasp network dynamics intuitively. Because the proposed simulator allows renewable-resource generators to be included in power systems, it is useful for evaluating power distribution systems. This paper shows that the proposed simulator generates IEEE common data format files correctly and illustrates power flow intuitively.

Development of a Database Program for the Management of Railroad Slopes (철도사면관리 프로그램 개발)

  • 송원경;한공창;천대성;신희순;신민호
    • Proceedings of the Korean Geotechnical Society Conference / 2000.11a / pp.151-158 / 2000
  • A database program named SLOPMAN was developed to collect all information on potentially hazardous slopes along railroads and to set up an effective management system. SLOPMAN is composed of three modules: data control, search, and analysis. The program can store both text and image data and is operated through tabs for user convenience. Drop-down menus are provided to reduce errors and the number of keystrokes when entering data. Data can be searched by keywords or by the codes automatically assigned to slopes. In the analysis module, RMR (Rock Mass Rating) and SMR (Slope Mass Rating) values can be obtained to estimate the stability of slopes.
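The abstract does not say how SLOPMAN computes its ratings. As a hedged illustration of what an analysis module of this kind might evaluate, the sketch below uses Romana's widely cited formulation, SMR = RMR_basic + (F1 × F2 × F3) + F4, together with his five stability classes. The function names and the input values are hypothetical and are not taken from the paper.

```python
def slope_mass_rating(rmr_basic, f1, f2, f3, f4):
    """Romana's Slope Mass Rating: SMR = RMR_basic + (F1 * F2 * F3) + F4.

    f1, f2, f3 are the joint/slope orientation adjustment factors
    (f3 is zero or negative); f4 adjusts for the excavation method.
    """
    return rmr_basic + f1 * f2 * f3 + f4


def stability_class(smr):
    """Map an SMR value to Romana's five stability classes."""
    if smr > 80:
        return "I (completely stable)"
    if smr > 60:
        return "II (stable)"
    if smr > 40:
        return "III (partially stable)"
    if smr > 20:
        return "IV (unstable)"
    return "V (completely unstable)"


# Hypothetical slope: fair rock mass with an unfavourable joint orientation.
smr = slope_mass_rating(rmr_basic=60, f1=0.7, f2=1.0, f3=-25, f4=0)
print(smr, stability_class(smr))
```

With these made-up inputs the orientation adjustment lowers the rating from 60 to 42.5, which moves the slope from the upper end of the partially stable class to its lower end.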

Diagnosis Model for Remote Monitoring of CNC Machine Tool (공작기계 원격감시를 위한 진단모델)

  • 김선호;이은애;김동훈;한기상;권용찬
    • Proceedings of the Korean Society of Precision Engineering Conference / 2000.11a / pp.233-238 / 2000
  • A CNC machine tool consists of a central processor, a PLC (Programmable Logic Controller), and actuators. Sequential control of the machine is generally handled by the PLC. Among these three control parts, most faults occur in the PLC, and operational faults account for over 70% of PLC faults. This paper describes a diagnosis model and data processing for a remote monitoring and diagnosis system for machine tools with an open architecture controller. Two diagnostic models based on the ladder diagram are proposed: the Logical Diagnosis Model (LDM) and the Sequential Diagnosis Model (SDM). The data processing structure is described in ST (Structured Text) as defined in IEC 1131-3. CNC faults are received as messages from the open architecture controller, and PLC faults are gathered from sequential data. To this end, the logical and sequential data of the CNC and PLC are organized into a database.

A Study on Database of Region Statistic and Application (지역통계 데이타베이스 구축및 활용방안)

  • 이희춘;김승구
    • Journal of Korean Society of Industrial and Systems Engineering / v.19 no.38 / pp.199-205 / 1996
  • The purpose of this study was to build a database of regional statistical information and to present ways of using the data. The results are as follows. First, building regional statistical data requires using the server of a regional information center or placing the database on the servers of public institutions. Second, regional statistical data are difficult to obtain because the only main source is KOSIS, which the National Statistical Office provides at the national level. Third, because the text search system offered by KOSIS is a further problem, a GU system with easier-to-use screens should be established for user satisfaction. Fourth, standard software should be developed for accessing regional statistical data.

Machine Learning Applied to Uncovering Gene Regulation

  • Craven, Mark
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.61-68 / 2000
  • Now that the complete genomes of numerous organisms have been ascertained, key problems in molecular biology include determining the functions of the genes in each organism, the relationships that exist among these genes, and the regulatory mechanisms that control their operation. These problems can be partially addressed by using machine learning methods to induce predictive models from available data. My group is applying and developing machine learning methods for several tasks that involve characterizing gene regulation. In one project, for example, we are using machine learning methods to identify transcriptional control elements such as promoters, terminators and operons. In another project, we are using learning methods to identify and characterize sets of genes that are affected by tumor promoters in mammals. Our approach to these tasks involves learning multiple models for inter-related tasks, and applying learning algorithms to rich and diverse data sources including sequence data, microarray data, and text from the scientific literature.

DESIGN OF METADATA MANAGEMENT SYSTEM FOR RETRIEVAL OF VIDEO DATA

  • Heo, Byeong-Mun;Lee, Yang-Koo;Chai, Duck-Jin;Wang, Ling;Lee, Yong-Mi;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.314-316 / 2007
  • With the development of Internet and network technology, demand for services that handle large volumes of multimedia data has grown, and multimedia users want convenient and accurate services for storing and retrieving multimedia content. To meet this demand, managing metadata that describe the diverse information in multimedia content is very important. However, such metadata management is difficult because metadata standards differ according to the type of multimedia data and service. In this paper, we propose the structure of an integrated metadata management system that extends a previous text-based metadata management system to handle multimedia content metadata that are expressed differently depending on the type of multimedia data or service.

A Study on the Effect of Data Fusion on the Retrieval Effectiveness of Web Documents (데이터 결합이 웹 문서 검색성능에 미치는 영향 연구)

  • Park, Ok-Hwa;Chung, Young-Mee
    • Journal of Information Management / v.38 no.1 / pp.1-19 / 2007
  • This study investigates the effect of data fusion on retrieval effectiveness through an experiment that combines multiple representations of Web documents. The types of document representation combined in the study include content terms, links, anchor text, and URL. The experimental results show that data fusion combining document representation methods in a Web environment did not bring any significant improvement in retrieval effectiveness.
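The abstract does not name the fusion method used in the study. As a minimal sketch of what combining several document representations can look like, the code below applies CombSUM, a common score-fusion rule, after min-max normalizing each run. The function names, representation labels, document identifiers, and scores are hypothetical and are not the study's data.

```python
def min_max(scores):
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """CombSUM: sum each document's normalized scores across all runs.

    A document missing from a run contributes 0 for that run.
    Returns document ids ranked by fused score (ties broken by id).
    """
    fused = {}
    for scores in runs.values():
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical retrieval scores from three representations of the same pages.
runs = {
    "content": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "anchor":  {"d2": 5.0, "d3": 1.0},
    "url":     {"d3": 1.0, "d1": 0.5},
}
print(comb_sum(runs))
```

Normalizing before summing keeps a representation with a wide score range from dominating the others; whether fusion helps at all is an empirical question, and the study reports that in its experiment it did not.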