• Title/Summary/Keyword: unsupervised analysis


Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.11B no.3 / pp.387-394 / 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both aligned strings (the source string and the target string) because the two strings differ in length. This causes difficulties for applications that use the aligned corpus: the null characters can make the search space explode, and several machine learning algorithms cannot be applied at all. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We demonstrate the usability of our approach by applying it to several areas: Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.

Development of an unsupervised learning-based ESG evaluation process for Korean public institutions without label annotation

  • Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information / v.29 no.5 / pp.155-164 / 2024
  • This study proposes an unsupervised learning-based clustering model to estimate the ESG ratings of domestic public institutions. To this end, the optimal number of clusters was determined by comparing spectral clustering and k-means clustering. The comparison was validated with the Davies-Bouldin Index (DBI), a clustering performance index for which lower values indicate better performance. The DBI values were 0.734 for spectral clustering and 1.715 for k-means clustering, confirming the superiority of spectral clustering. Furthermore, t-tests and ANOVA were used to reveal statistically significant differences in the ESG non-financial data, and correlation coefficients were used to confirm the relationships between ESG indicators. Based on these results, this study suggests that the ESG performance ranking of each public institution can be estimated without existing ESG ratings, by calculating the optimal number of clusters and then determining the sum of averages of the ESG data within each cluster. The proposed model can therefore be used to evaluate the ESG ratings of various domestic public institutions, and it is expected to be useful in domestic sustainable management practice and performance management.
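
The Davies-Bouldin comparison described in this abstract can be illustrated with a minimal sketch. It uses the standard DBI definition (mean over clusters of the worst ratio of summed within-cluster scatter to centroid distance). The function name `davies_bouldin_index`, the toy points, and both labelings are hypothetical and are not the paper's ESG data or code.

```python
import math

def davies_bouldin_index(points, labels):
    """Davies-Bouldin Index; lower values mean compact, well-separated clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")
    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)

# Hypothetical toy data: two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

good_score = davies_bouldin_index(points, good_labels)
bad_score = davies_bouldin_index(points, bad_labels)
print(good_score, bad_score)
```

The labeling that matches the natural groups gets the lower score, which is the same reading the abstract applies to 0.734 versus 1.715.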

Research Trends Analysis on ESG Using Unsupervised Learning

  • Woo-Ryeong YANG;Hoe-Chang YANG
    • The Journal of Economics, Marketing and Management / v.11 no.3 / pp.47-66 / 2023
  • Purpose: This study identifies trends in ESG research by domestic and overseas researchers to date and, by comparing the derived topics, presents research directions and clues for applying ESG to Korean companies and for ESG practice. Research design, data and methodology: As of October 20, 2022, a search for the keyword 'ESG' in 'scienceON' yielded 341 domestic papers with English abstracts and 1,173 overseas papers. Word frequency analysis, word co-occurrence frequency analysis, BERTopic, LDA, and OLS regression analysis were performed using Python 3.7 to confirm trends for each topic. Results: Word frequency analysis found that words such as management, company, performance, and value were common to both domestic and overseas papers. The top 20 words included activity and responsibility in domestic papers, and sustainability, impact, and development in overseas papers. Word co-occurrence analysis confirmed that domestic papers were related mainly to words such as company, management, and activity, and overseas papers to words such as investment, sustainability, and performance. Topic modeling derived 3 topics for domestic papers, including one named 'ESG from the corporate perspective', and 7 topics for overseas papers, including one named 'sustainable investment'. In the annual trend analysis, no topic showed a relative increasing or decreasing tendency, confirming that all topics were neutral. Conclusions: The results confirm that, although it is encouraging that domestic papers have recently begun to study consumers, their subject diversity is lower than that of overseas papers. Therefore, future research should address various topics such as forecasting future ESG-related risks and corporate evaluation methods.
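
The word frequency and word co-occurrence steps named in this abstract can be sketched with the Python standard library alone. This is an illustration under simple assumptions (whitespace tokenization, each word pair counted once per document). The function name `word_and_pair_counts` and the three toy documents are hypothetical and do not reproduce the paper's corpus or preprocessing.

```python
from collections import Counter
from itertools import combinations

def word_and_pair_counts(documents):
    """Return (word frequencies, document-level word co-occurrence counts)."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        word_counts.update(tokens)
        # Count each unordered word pair at most once per document;
        # sorting makes ("a", "b") and ("b", "a") the same key.
        pair_counts.update(combinations(sorted(set(tokens)), 2))
    return word_counts, pair_counts

# Hypothetical toy abstracts, reduced to keywords.
docs = [
    "company management performance",
    "company management activity",
    "investment sustainability performance",
]
word_counts, pair_counts = word_and_pair_counts(docs)
print(word_counts.most_common(3))
print(pair_counts[("company", "management")])  # 2
```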

Dynamic Asset Allocation by Applying Regime Detection Analysis (Regime 탐지 분석을 이용한 동적 자산 배분 기법)

  • Kim, Woo Chang
    • Journal of Korean Institute of Industrial Engineers / v.38 no.4 / pp.258-261 / 2012
  • In this paper, I propose a new asset allocation framework to cope with the dynamic nature of the financial market. Investment performance can be improved considerably by protecting capital from market crashes, and such crashes can be identified in advance with high probability by regime detection analysis using a specialized unsupervised machine learning technique.

Complex Features by Independent Component Analysis (독립성분분석에 의한 복합특징 형성)

  • 오상훈
    • Proceedings of the Korea Contents Association Conference / 2003.05a / pp.351-355 / 2003
  • Neurons in the mammalian visual cortex can be classified into two main categories, simple cells and complex cells, based on their response properties. Here, we find the complex features corresponding to the response of complex cells by applying an unsupervised independent component analysis network to input images. This result will help to elucidate the information processing mechanism of neurons in the primary visual cortex.


Review of Author Name Disambiguation Techniques for Citation Analysis (인용분석에서의 모호한 저자명 식별을 위한 방법들에 관한 고찰)

  • Kim, Hyun-Jung
    • Journal of the Korean BIBLIA Society for library and Information Science / v.23 no.3 / pp.5-17 / 2012
  • In citation analysis, author names are often used as the unit of analysis, yet some authors are indexed under the same name in the bibliographic databases from which citation counts are obtained. There are many techniques for author name disambiguation, using supervised, unsupervised, or semi-supervised learning algorithms. Unsupervised approaches use machine learning algorithms to extract the necessary bibliographic information from large-scale databases and digital libraries, while supervised approaches use manually built training datasets to cluster author groups and combine them with learning algorithms for author name disambiguation. The study examines various techniques for author name disambiguation in the hope of finding an aid to improve the precision of citation counts in citation analysis, as well as to obtain better results in information retrieval.

Traffic Attributes Correlation Mechanism based on Self-Organizing Maps for Real-Time Intrusion Detection (실시간 침입탐지를 위한 자기 조직화 지도(SOM)기반 트래픽 속성 상관관계 메커니즘)

  • Hwang, Kyoung-Ae;Oh, Ha-Young;Lim, Ji-Young;Chae, Ki-Joon;Nah, Jung-Chan
    • The KIPS Transactions:PartC / v.12C no.5 s.101 / pp.649-658 / 2005
  • Because network-based attacks cause extensive damage, it is very important to detect intrusions quickly at an early stage. However, intrusion detection using supervised learning requires either preprocessing of enormous amounts of data or analysis by a manager. It also faces two difficulties in detecting abnormal traffic: the manager's analysis might be incorrect, and real-time detection might be missed. In this paper, we propose a traffic attributes correlation analysis mechanism based on self-organizing maps (SOM) for real-time intrusion detection. The proposed mechanism has three steps. First, unsupervised learning builds map clusters composed of similar traffic. Second, each map cluster is labeled to divide the map into normal and abnormal traffic; this step uses a rule created through correlation analysis with the SOM. Finally, the mechanism performs real-time detection and updates itself gradually. In extensive experiments, the proposed mechanism, which combines unsupervised and supervised learning, showed better real-time intrusion detection performance than supervised learning alone.
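
The three steps in this abstract (build the map, label its units, detect) can be sketched with a tiny one-dimensional SOM in pure Python. This is a sketch under stated assumptions, not the paper's mechanism. In particular, the labeling step here simply gives each unit the label of its nearest labeled sample, a stand-in for the paper's correlation-based rule, which the abstract does not specify. All function names, the two-feature toy traffic vectors, and the hyperparameters are hypothetical.

```python
import math

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_som(data, n_units=4, epochs=20, lr0=0.5, sigma0=1.0):
    """Step 1: unsupervised training of a 1-D chain of map units."""
    dim = len(data[0])
    # Deterministic start: units spread evenly along the unit diagonal.
    weights = [[i / (n_units - 1)] * dim for i in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrink learning rate and radius
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.1)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood around the best-matching unit.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

def label_units(weights, samples, labels):
    """Step 2 (stand-in rule): each unit takes the label of its nearest sample."""
    unit_labels = []
    for w in weights:
        nearest = min(range(len(samples)), key=lambda k: math.dist(w, samples[k]))
        unit_labels.append(labels[nearest])
    return unit_labels

def classify(weights, unit_labels, x):
    """Step 3: a new observation gets the label of its best-matching unit."""
    return unit_labels[best_matching_unit(weights, x)]

# Hypothetical traffic feature vectors, scaled to [0, 1].
traffic = [(0.10, 0.10), (0.90, 0.90), (0.15, 0.10), (0.85, 0.95), (0.10, 0.20)]
traffic_labels = ["normal", "abnormal", "normal", "abnormal", "normal"]

weights = train_som(traffic)
unit_labels = label_units(weights, traffic, traffic_labels)
print(classify(weights, unit_labels, (0.12, 0.12)))
print(classify(weights, unit_labels, (0.88, 0.90)))
```

The gradual updating mentioned in the last step could reuse the inner update loop of `train_som` on each new observation; that part is left out here.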

Deduction of regional characteristics using environmental spatial information and SOM (Self-Organizing map) for natural park zoning - Focused on Taeanhaean National Park - (자연공원 용도지구 설정을 위한 환경공간정보와 SOM(Self-Organizing map)을 활용한 지역 특성 도출 - 태안해안국립공원을 대상으로 -)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Society of Environmental Restoration Technology / v.26 no.3 / pp.1-17 / 2023
  • Korea's natural parks are managed by dividing them into four use districts (nature preservation district, natural environment district, cultural heritage district, and park village district) under the goal of 'conservation and sustainable use of natural parks'. However, these use districts are designated from the results of a simple drawing overlay method, which lacks objective and scientific evidence. In addition, in Taeanhaean National Park, the case studied here, only a very small area, less than 1%, is designated as nature preservation district, while the natural environment district, which serves as a buffer space, is designated on an excessively wide scale, making it difficult to manage the national park efficiently. The use districts are therefore not fulfilling their role. This study presents a method for analyzing the spatial characteristics of natural parks using environmental indicators and unsupervised learning in order to set their use districts. Evaluation indicators for the natural and human environments were derived, and the distribution pattern of each indicator was analyzed. Then, by applying Self-Organizing Map (SOM) analysis, an unsupervised learning method, districts with similar characteristics were derived in Taeanhaean National Park and the characteristics of each district were analyzed. As a result, 7 districts with different characteristics were derived, and examining the contribution of each indicator revealed that districts had different representative characteristics even when they were adjacent. This study evaluated natural parks by comprehensively considering indicators of the natural and human environments. In addition, the SOM method used here is meaningful in that it can provide scientific and objective grounds for the existing zoning and can be applied to management plans.

Deforestation Analysis Using Unsupervised Change Detection Based on ITPCA (ITPCA 기반의 무감독 변화탐지 기법을 이용한 산림황폐화 분석)

  • Choi, Jaewan;Park, Honglyun;Park, Nyunghee;Han, Soohee;Song, Jungheon
    • Korean Journal of Remote Sensing / v.33 no.6_3 / pp.1233-1242 / 2017
  • In this study, we analyze deforestation caused by forest fire using KOMPSAT satellite imagery. For the deforestation analysis, an unsupervised change detection algorithm is applied to multitemporal images. Areas changed by deforestation are extracted through ITPCA (ITerative Principal Component Analysis) of the NDVI (Normalized Difference Vegetation Index) generated from multitemporal satellite images acquired before and after the forest fire. In addition, a post-processing method using SRTM (Shuttle Radar Topography Mission) data is applied to minimize change detection errors. Experiments using KOMPSAT-2 and KOMPSAT-3 images confirmed that areas changed by deforestation can be extracted efficiently.
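
The NDVI input to this analysis follows the standard formula NDVI = (NIR - Red) / (NIR + Red). The sketch below computes it per pixel and flags pixels whose NDVI dropped between two dates. Note that it uses plain differencing with a fixed threshold, which is a much simpler technique than the ITPCA procedure the abstract names. The function names, the threshold of 0.3, and the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total

def ndvi_change_mask(before, after, threshold=0.3):
    """Flag pixels whose NDVI dropped by more than `threshold`.

    `before` and `after` are equal-length lists of (nir, red) reflectance
    pairs for the same pixels. Simple differencing, NOT ITPCA.
    """
    return [ndvi(*b) - ndvi(*a) > threshold for b, a in zip(before, after)]

# Hypothetical reflectances: pixel 0 burned, pixels 1 and 2 unchanged.
before = [(0.8, 0.1), (0.7, 0.2), (0.3, 0.3)]
after = [(0.3, 0.3), (0.7, 0.2), (0.3, 0.3)]

mask = ndvi_change_mask(before, after)
print(mask)  # [True, False, False]
```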

Correlation Analysis of Event Logs for System Fault Detection (시스템 결함 분석을 위한 이벤트 로그 연관성에 관한 연구)

  • Park, Ju-Won;Kim, Eunhye;Yeom, Jaekeun;Kim, Sungho
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.2 / pp.129-137 / 2016
  • To identify the cause of an error and maintain the health of a system, an administrator usually analyzes event log data, since it contains useful information for inferring the cause. However, because today's systems are huge and complex, it is almost impossible for administrators to analyze event log files manually. In particular, OpenStack, which is widely used as a cloud management system, operates with various service modules linked across multiple servers, so it is hard to access each node and analyze the event log messages of each service module when an error occurs. In this paper, we therefore propose a novel message-based log analysis method that enables the administrator to find the cause of an error quickly. Specifically, the proposed method 1) consolidates event log data generated at the system level and the application service level, 2) clusters the consolidated data based on messages, and 3) analyzes interrelations among message groups in order to promptly identify the cause of a system error. This study is significant in three respects. First, the root cause of an error can be identified by collecting event logs at both the system level and the application service level and analyzing the interrelations among the logs. Second, administrators do not need to classify messages for training, since unsupervised learning is applied to the event log messages. Third, the use of Dynamic Time Warping, an algorithm for measuring the similarity of dynamic patterns over time, increases the accuracy of pattern analysis in distributed systems whose clocks are not exactly synchronized.
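
Dynamic Time Warping, mentioned as the third point of this abstract, can be sketched with the textbook dynamic-programming recurrence. The example shows why it suits logs from servers whose clocks are not synchronized: a time-shifted copy of a pattern still aligns at zero cost. The function name `dtw_distance` and the toy per-interval message counts are hypothetical. The abstract does not say which DTW variant or distance measure the paper uses.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Hypothetical message counts per time interval for three message groups.
reference = [0, 1, 3, 1, 0]
delayed = [0, 0, 1, 3, 1, 0]    # same burst, one interval later
different = [3, 3, 0, 0, 3]     # unrelated pattern

print(dtw_distance(reference, delayed))    # 0.0
print(dtw_distance(reference, different))
```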