• Title/Summary/Keyword: intelligent classification

Search Results: 915

Class Imbalance Resolution Method and Classification Algorithm Suggesting Based on Dataset Type Segmentation (데이터셋 유형 분류를 통한 클래스 불균형 해소 방법 및 분류 알고리즘 추천)

  • Kim, Jeonghun;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.23-43
    • /
    • 2022
  • Interest in algorithm selection is increasing as AI (Artificial Intelligence) is applied across industries. Algorithm selection is largely determined by the experience of the data scientist, so inexperienced data scientists often rely on meta-learning, which recommends an algorithm based on dataset characteristics. Because that selection process is a black box, however, it has not been possible to know on what basis an existing algorithm recommendation was derived. Accordingly, this study uses k-means cluster analysis to classify datasets into types according to their characteristics and to explore suitable classification algorithms and class imbalance resolution methods for each type. Four types were derived, and an appropriate class imbalance resolution method and classification algorithm were recommended for each dataset type.
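The type-then-recommend idea above can be sketched in a few lines: cluster datasets by meta-features, then look up a per-type recommendation. Everything concrete here (the meta-features, the toy data, the recommendation table) is an illustrative assumption, not the paper's actual types or recommendations.

```python
import math

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic init (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster emptied out
                centroids[i] = [sum(c) / len(g) for c in zip(*g)]
    return centroids

def assign(p, centroids):
    """Index of the nearest centroid = the dataset 'type'."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

# Hypothetical meta-features: (log10 of row count, imbalance ratio, feature count)
datasets = [(3.0, 1.2, 10), (3.1, 1.1, 12), (5.0, 9.0, 40), (4.9, 8.5, 35)]
centroids = kmeans(datasets, k=2)

# Hypothetical per-type recommendation table (resolution method, classifier)
recommend = {0: ("no resampling", "RandomForest"), 1: ("SMOTE", "XGBoost")}
dataset_type = assign((5.1, 8.8, 38), centroids)
```

A new dataset's meta-features are assigned to the nearest type, and the table returns the suggested imbalance treatment and classifier for that type.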

Malware Detection Technology Based on API Call Time Section Characteristics (API 호출 구간 특성 기반 악성코드 탐지 기술)

  • Kim, Dong-Yeob;Choi, Sang-Yong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.4
    • /
    • pp.629-635
    • /
    • 2022
  • Cyber threats are increasing with recent social changes and the development of ICT. The malicious code used in cyber threats is becoming more advanced and intelligent, employing analysis-environment evasion, concealment, and fileless distribution to make analysis difficult. Machine learning is used to analyze such malware effectively, but considerable effort is needed to increase classification accuracy. In this paper, we propose a malware detection technique based on the characteristics of API call intervals to improve machine learning classification performance. The proposed technique divides the characteristic factors into sections, using the per-section API call characteristics and the entropy of the binary, based on the API call order extracted from malicious and normal binaries. We verified that malware can be analyzed well by applying a support vector machine (SVM) to the extracted characteristic factors.
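A minimal sketch of the section-wise feature extraction described above: split an ordered API-call trace into sections and count selected calls per section, plus a byte-entropy helper for the binary. The sectioning rule, the watched API names, and the toy trace are assumptions for illustration, not the paper's actual feature set.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def section_features(api_calls, n_sections=4):
    """Split an ordered API-call trace into equal sections (remainder
    dropped) and count selected APIs in each; the concatenated counts
    form the feature vector fed to a classifier such as an SVM."""
    watched = ["CreateFile", "WriteFile", "RegSetValue", "Connect"]
    size = max(1, len(api_calls) // n_sections)
    features = []
    for i in range(n_sections):
        counts = Counter(api_calls[i * size:(i + 1) * size])
        features.extend(counts.get(api, 0) for api in watched)
    return features

trace = ["CreateFile", "WriteFile", "WriteFile", "Connect",
         "RegSetValue", "Connect", "CreateFile", "WriteFile"]
vec = section_features(trace, n_sections=2)
```

The resulting fixed-length vector (counts per watched API, per section) is what an SVM would be trained on.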

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.141-166
    • /
    • 2018
  • Information technology improves the efficiency of humanities research, where it can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, called history mining, reveals influential relationships between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers related to empiricism and rationalism and collected their writings and related articles accessible on the Internet. The performance of each classification algorithm was measured by recall, precision, F-score, and elapsed time; DNN, Random Forest, and an ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.
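The evaluation measures named above (precision, recall, F-score) can be computed directly; the labels and predictions below are a made-up toy example, not the paper's data.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Standard binary precision/recall/F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: rationalism ("R") vs empiricism ("E") for five passages
y_true = ["R", "R", "E", "E", "R"]
y_pred = ["R", "E", "E", "E", "R"]
p, r, f = precision_recall_f1(y_true, y_pred, positive="R")
```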

Personal Information Detection by Using Naïve Bayes Methodology (Naïve Bayes 방법론을 이용한 개인정보 분류)

  • Kim, Nam-Won;Park, Jin-Soo
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.91-107
    • /
    • 2012
  • As the Internet becomes more popular, many people use it to communicate. With the increasing number of personal homepages, blogs, and social network services, people often expose their personal information online. Although the necessity of those services cannot be denied, we should be concerned about negative aspects such as personal information leakage. Because it is impossible to review all past records posted by everyone, an automatic personal information detection method is strongly required. This study proposes a method to detect or classify online documents that contain personal information by analyzing features common to such documents and learning them with the Naïve Bayes algorithm. To select the document classification algorithm, Naïve Bayes was compared with a vector space classifier; Naïve Bayes showed better precision, recall, F-measure, and accuracy. However, the performance of the Naïve Bayes classifier is still insufficient for real-world application. Lewis, a learning algorithm researcher, notes that it is important to improve the quality of category features when applying learning algorithms to a specific domain, and proposes incrementally adding features that depend on related documents in a step-wise manner. In a further experiment, the algorithm learned these additional dependent features, thereby reducing feature noise; as a result, the latter experiment showed better measured performance than the former.
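For reference, a textbook multinomial Naïve Bayes with Laplace smoothing looks like the sketch below. This is the generic algorithm, not the paper's exact feature pipeline; the tiny "private vs. public" corpus is invented for illustration.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        def log_score(c):
            # log P(c) + sum over words of log P(w | c), smoothed by +1
            total = sum(self.counts[c].values())
            return math.log(self.prior[c]) + sum(
                math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
                for w in doc.split())
        return max(self.classes, key=log_score)

# Invented toy corpus: documents with vs. without personal information
docs = ["phone 010 1234", "address seoul street",
        "weather nice today", "movie review fun"]
labels = ["private", "private", "public", "public"]
model = NaiveBayes().fit(docs, labels)
```

A new document is assigned the class with the highest smoothed log-probability, e.g. `model.predict("phone 010")`.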

Similar Contents Recommendation Model Based On Contents Meta Data Using Language Model (언어모델을 활용한 콘텐츠 메타 데이터 기반 유사 콘텐츠 추천 모델)

  • Donghwan Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.27-40
    • /
    • 2023
  • With the spread of smart devices and the impact of COVID-19, media content consumption through smart devices has increased significantly. Along with this trend, the amount of media content viewed through OTT platforms is growing, which makes content recommendation on these platforms more important. Previous content-based recommendation research has mostly relied on metadata describing the characteristics of the content, with few studies using the content's own descriptive metadata. In this paper, various text fields that describe the content, including titles and synopses, were used to recommend similar content. KLUE-RoBERTa-large, a Korean language model with excellent performance, was trained on this text data. The training data consisted of metadata for over 20,000 items, including titles, synopses, composite genres, directors, actors, and hashtags. To feed the various text features into the language model, they were concatenated using special tokens indicating each feature. The test set was designed for relative and objective evaluation of the model's similarity classification ability, using a three-content comparison method and multiple rounds of inspection for labeling. Genre classification and hashtag classification prediction tasks were used to fine-tune the embeddings of the content meta-text. As a result, the hashtag classification model achieved an accuracy of over 90% on the similarity test set, more than 9 percentage points better than the baseline language model. Hashtag classification training improved the language model's ability to classify similar content, demonstrating the value of using a language model for content-based filtering.
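The feature-concatenation step described above can be sketched as a simple input builder: each metadata field is prefixed with a marker token before the joined string is fed to the language model. The token names and field set here are illustrative assumptions, not KLUE-RoBERTa's actual special tokens or the paper's exact schema.

```python
def build_model_input(meta: dict) -> str:
    """Join metadata fields into one string, each prefixed by a marker
    token, as input text for a language model (token names hypothetical)."""
    field_tokens = {
        "title": "[TITLE]",
        "synopsis": "[SYNOPSIS]",
        "genres": "[GENRE]",
        "hashtags": "[TAG]",
    }
    parts = []
    for field, token in field_tokens.items():
        value = meta.get(field, "")
        if isinstance(value, list):
            value = " ".join(value)
        if value:  # skip fields that are missing or empty
            parts.append(f"{token} {value}")
    return " ".join(parts)

meta = {"title": "Space Drama", "genres": ["SF", "drama"], "hashtags": ["#space"]}
text = build_model_input(meta)
```

In practice such marker tokens would be registered with the tokenizer so they are not split into subwords.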

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of Internet technology and the popularization of smart devices have produced massive amounts of text data, distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. This enormous amount of easily obtained information, however, lacks organization, a problem that has drawn the interest of many researchers and created demand for people capable of classifying relevant information; hence, text classification was introduced. Text classification, a challenging task in modern data analysis, assigns a text document to one or more predefined categories or classes. Available techniques include K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machines, Decision Trees, and Artificial Neural Networks. When dealing with huge amounts of text data, however, model performance and accuracy become a challenge: performance varies with the type of words in the corpus and the features created for classification. Most prior attempts propose a new algorithm or modify an existing one, a line of research that has arguably reached its limits. In this study, rather than proposing or modifying an algorithm, we focus on modifying the use of the data. It is widely known that classifier performance depends on the quality of the training data on which the classifier is built, and real-world datasets usually contain noise that can affect the decisions of classifiers built from them.
In this study, we consider that data from different domains, i.e., heterogeneous data, may carry noise characteristics that can be exploited in the classification process. Machine learning classifiers are normally built under the assumption that the characteristics of the training data and the target data are the same or very similar. For unstructured data such as text, however, the features are determined by the vocabulary of the documents, so if the viewpoints of the training data and target data differ, their features may differ as well. We attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into its construction. Because data from various sources are likely formatted differently, traditional machine learning algorithms struggle: they were not developed to recognize different types of data representation at once and generalize over them together. To utilize heterogeneous data in training the document classifier, we therefore apply semi-supervised learning. Since unlabeled data can degrade classifier performance, we further propose the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the classifier's accuracy. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data, and the most confident classification rules are selected and applied for the final decision.
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
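In the spirit of selecting only pseudo-labeled documents likely to help, a generic confidence-filtered self-training loop looks like the sketch below. RSESLA itself is a multi-view ensemble with rule selection; this is a much simplified single-view sketch with an invented 1-D nearest-centroid "classifier" standing in for a real model.

```python
def self_train(labeled, unlabeled, train, predict_proba, threshold=0.9, rounds=3):
    """Retrain while adding only pseudo-labels whose confidence clears
    the threshold; items that never qualify are simply left out."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict_proba(model, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:
            break
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return train(labeled)

# Toy stand-in classifier: per-class centroid over 1-D points
def train(pairs):
    sums = {}
    for x, y in pairs:
        sums.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in sums.items()}

def predict_proba(model, x):
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = d2 / (d1 + d2) if d1 + d2 else 1.0  # crude confidence proxy
    return label, conf

labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 9.5, 5.2]  # 5.2 is ambiguous and stays excluded
model = self_train(labeled, unlabeled, train, predict_proba)
```

The ambiguous point never reaches the confidence threshold, so it is never pseudo-labeled, which is the core idea of filtering unlabeled data that might degrade the classifier.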

A Study on Service Classification System for Urban Intelligent Geospatial Information (지능형 도시공간정보 서비스 분류체계에 관한 연구)

  • Kim, Eun-Hyung
    • Proceedings of the Korean Association of Geographic Information Studies Conference
    • /
    • 2010.09a
    • /
    • pp.277-282
    • /
    • 2010
  • The ubiquitous paradigm, in which technologies from different fields converge via IT to provide new services or technologies, offers a new perspective on cities. A u-City is a next-generation information city that fuses advanced information and communication infrastructure and ubiquitous information services into urban space, innovating core urban functions: greater convenience in urban life and quality of life, safety and citizen welfare through systematic city management, and the creation of new industries. Successful u-City construction requires the concept of intelligent urban geospatial information, which extends the existing concept of local-government GIS, along with services that deliver this information efficiently. Because intelligent urban geospatial information services can arise across every urban domain, many different services can be derived. This study therefore examines the concept of intelligent urban geospatial information and proposes a classification system for intelligent urban geospatial information services, building on prior research on u-City service classification.


A Research of Obstacle Detection and Path Planning for Lane Change of Autonomous Vehicle in Urban Environment (자율주행 자동차의 실 도로 차선 변경을 위한 장애물 검출 및 경로 계획에 관한 연구)

  • Oh, Jae-Saek;Lim, Kyung-Il;Kim, Jung-Ha
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.21 no.2
    • /
    • pp.115-120
    • /
    • 2015
  • Recently, intelligent safety systems for drivers, passengers, and pedestrians have been actively developed in the automotive field, and much research is focused on autonomous vehicles. This paper proposes the application of LiDAR sensors, which play a major role in perceiving the environment, to terrain classification, obstacle data clustering, and local map building for autonomous driving. Finally, based on these results, a lane-change path that the vehicle can track was planned, and the reliability of the path generation was verified experimentally.
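The obstacle-clustering step mentioned above can be illustrated with a toy Euclidean grouping of 2-D scan points; the paper does not specify its clustering method or thresholds, so the greedy rule and distance below are purely illustrative.

```python
import math

def cluster_points(points, max_gap=1.0):
    """Greedily group points: a point joins the first existing cluster
    containing a point within max_gap, else it starts a new cluster."""
    clusters = []
    for p in points:
        for cluster in clusters:
            if any(math.dist(p, q) <= max_gap for q in cluster):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters

# Toy 2-D scan: two separated obstacles
scan = [(0.0, 0.0), (0.3, 0.1), (5.0, 5.0), (5.2, 5.1)]
obstacles = cluster_points(scan)
```

Each resulting cluster would then be treated as one obstacle when building the local map. (A greedy single pass can miss merges bridged by later points; real pipelines use region growing or DBSCAN-style methods.)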

Filtering and Segmentation of radar imagery

  • Kang, Sung-Chul;Kim, Young-seup;Yoon, Hong-Joo;Baek, Seung-Gyun
    • Proceedings of the KSRS Conference
    • /
    • 1999.11a
    • /
    • pp.421-424
    • /
    • 1999
  • The purpose of this study is to demonstrate a variety of methods for reducing the speckle noise content of SAR images while retaining the fine details and average radiometric properties of the original data. To increase classification accuracy, two categories of filters are used, speckle-blind (simple) and speckle-aware (intelligent), and segmentation of highly speckled radar imagery is achieved using the Gaussian Markov Random Field (GMRF) model. The problems in applying filtering techniques to different object types are discussed, along with the GMRF procedure and the efficiency of the segmentation.
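As a minimal example of the "speckle-blind (simple)" category above, a 3x3 median filter suppresses isolated speckle spikes while preserving edges better than a mean filter (the speckle-aware filters, which adapt to local statistics, are not shown here; the tiny image is invented).

```python
import statistics

def median_filter(img):
    """3x3 median filter over a 2-D list of pixel values; the one-pixel
    border is left unchanged for simplicity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = statistics.median(window)
    return out

img = [[5, 5, 5, 5],
       [5, 90, 5, 5],   # 90 is an isolated speckle spike
       [5, 5, 5, 5],
       [5, 5, 5, 5]]
filtered = median_filter(img)
```

The spike at (1, 1) is replaced by the neighborhood median, while uniform regions are untouched.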


LAN-Based Protective Relaying for Interconnect Protection of Dispersed Generators (LAN을 이용한 분산전원 연계 계통의 보호)

  • Jyung, Tae-Young;Baek, Young-Sik
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.56 no.3
    • /
    • pp.491-497
    • /
    • 2007
  • When dispersed generators are operated interconnected with the utility, they can cause a variety of new effects on a distribution system originally designed around one-way power flow. The protection devices installed in the distribution system should therefore be designed to handle not only a fault of the generator but also the condition of the utility. In particular, a fault on a feeder interconnected with dispersed generators (DG) can cause islanding of the DG, which raises problems such as machinery damage, power quality degradation, and difficulty of system recovery; in such a fault, the DG must be separated from the system quickly. In this paper, to classify faults of the interconnected DG and the outside feeder, we judge the fault at the HMI using data provided by IEDs (Intelligent Electronic Devices) on the network and decide whether each relay operates by sending the result to it.