• Title/Summary/Keyword: clustering techniques

K-means clustering analysis and differential protection policy according to 3D NAND flash memory error rate to improve SSD reliability

  • Son, Seung-Woo;Kim, Jae-Ho
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.1-9 / 2021
  • 3D NAND flash memory provides high capacity per unit area by stacking 2D NAND cells that have a planar structure. However, owing to the nature of the stacking process, the frequency of errors can vary with the layer or physical cell location. This phenomenon becomes more pronounced as the number of program/erase (P/E) operations of the flash memory increases. Most flash-based storage devices such as SSDs use ECC for error correction. Because ECC provides a fixed strength of data protection for all flash memory pages, it has limitations in 3D NAND flash memory, where the error rate varies with physical location. In this paper, therefore, pages and layers with different error rates are classified into clusters by the K-means machine learning algorithm, and a differentiated data protection strength is applied to each cluster. We classify pages and layers based on the number of errors measured after an endurance test, where the error rate varies significantly across pages and layers, and add parity data to stripes for areas vulnerable to errors to provide differentiated data protection strength. We show that this differentiated data protection policy can contribute to improving the reliability and lifespan of 3D NAND flash memory compared with protection techniques that use RAID-like schemes or ECC alone.
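The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes each page is summarised by a single error count and uses a plain one-dimensional K-means (Lloyd's algorithm); the error counts, the number of clusters, and the mapping from cluster to extra parity are hypothetical and are not taken from the paper.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    srt = sorted(values)
    # Deterministic initialisation: pick k points spread across the sorted data.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: centroid = mean of its members (unchanged if the cluster is empty).
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


# Hypothetical bit-error counts per page after an endurance test (made-up numbers).
errors = [3, 4, 5, 4, 40, 42, 38, 120, 118, 125]
centroids, labels = kmeans_1d(errors, k=3)

# Hypothetical policy: the higher a cluster's mean error count, the more parity it gets.
rank = {c: r for r, c in enumerate(sorted(range(3), key=lambda c: centroids[c]))}
extra_parity = [rank[lab] for lab in labels]
print(centroids, extra_parity)
```

In practice the features would more likely be vectors (for example page type, layer index, and error count), in which case a library K-means over multi-dimensional points would replace the scalar version shown here.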

Water resources potential assessment of ungauged catchments in Lake Tana Basin, Ethiopia

  • Damtew, Getachew Tegegne;Kim, Young-Oh
    • Proceedings of the Korea Water Resources Association Conference / 2015.05a / pp.217-217 / 2015
  • The main objective of this study was to evaluate the water resources potential of the Lake Tana Basin (LTB) using the Soil and Water Assessment Tool (SWAT). According to the SWAT simulation of the LTB, about 5236 km² of the basin is gauged watershed and the remaining 9878 km² is ungauged watershed. Four gauged stations were considered for calibrating the model parameters: Gilgel Abay, Gummera, Rib, and Megech. Two techniques built into SWAT-CUP, particle swarm optimization (PSO) and generalized likelihood uncertainty estimation (GLUE), were used for calibration, and PSO was selected for the study based on its performance at the four gauging stations. Although the sensitivity of the flow parameters differs from catchment to catchment, the curve number (CN2) was found to be the most sensitive parameter in all gauged catchments. To facilitate the transfer of data from gauged to ungauged catchments, hydrologic response units (HRUs) were clustered based on the physical similarity between gauged and ungauged catchment attributes. From the SWAT land use/soil use/slope reclassification of the LTB, a total of 142 HRUs were identified and clustered into 39 similar hydrologic groups. To transfer the optimized model parameters from gauged to ungauged catchments based on these clustered hydrologic groups, this study evaluates three parameter transfer schemes: parameter transfer based on homogeneous regions (PT-I), parameter transfer based on global averaging (PT-II), and parameter transfer that treats the Gilgel Abay catchment as a representative catchment (PT-III), since its model performance values are better than those of the other three gauged catchments. The performance of these parameter transfer approaches was evaluated using the Nash-Sutcliffe efficiency (NSE) and the coefficient of determination (R²). The computed NSE values were 0.71, 0.58, and 0.31 for PT-I, PT-II, and PT-III, respectively, and the computed R² values were 0.93, 0.82, and 0.95 for PT-I, PT-II, and PT-III, respectively. Based on these performance criteria, PT-I was selected for modelling the ungauged catchments by transferring optimized model parameters from the gauged catchments. According to the model results, the yearly average stream flow for the homogeneous regions over the period 1989-2005 was 29.54 m³/s, 112.92 m³/s, and 130.10 m³/s for region-I, region-II, and region-III, respectively.
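The two evaluation criteria named in this abstract can be computed as in the following minimal sketch. It assumes the coefficient of determination is the squared Pearson correlation, as is common in hydrological model evaluation; the abstract does not state which definition was used, and the flow values below are made up for illustration.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def r_squared(observed, simulated):
    """Coefficient of determination taken as the squared Pearson correlation."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


# Made-up flows: the simulation follows the observed pattern but is biased by +15.
obs = [10.0, 20.0, 30.0, 40.0]
biased = [o + 15.0 for o in obs]
print(r_squared(obs, biased), nse(obs, biased))
```

Under this definition, a simulation with a systematic bias can keep a high coefficient of determination while its NSE drops sharply, because the correlation ignores constant offsets. That is one possible reading of a scheme scoring well on one criterion and poorly on the other, though the abstract itself gives no explanation.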

A study on the number of passengers using the subway stations in Seoul (데이터마이닝 기법을 이용한 서울시 지하철역 승차인원 예측)

  • Cho, Soojin;Kim, Bogyeong;Kim, Nahyun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.111-128 / 2019
  • Subways are an eco-friendly form of public transportation that can carry large numbers of passengers safely and quickly. Accurately predicting the number of passengers is necessary to increase public interest in the subway. This study groups stations on Lines 1 to 9 of the Seoul Metropolitan Subway using cluster analysis. We propose one final prediction model for all stations and three optimal prediction models, one for each cluster. We found three groups among the 294 subway stations. The Group 1 area is industrial and commercial, the Group 2 area is residential and commercial, and the Group 3 area is residential. Various data mining techniques were applied to each group, and influential factors for demand prediction were derived. We use our model to predict the number of passengers at 8 new stations that are part of the 3rd extension plan of Seoul Metro Line 9, opened in October 2018. The estimated average number of passengers per hour ranges from 241 to 452, and the estimated maximum number of passengers per hour ranges from 969 to 1515. We believe our analysis can help improve the efficiency of public transportation policy.

A Study on Detection Technique of Anomaly Signal for Financial Loan Fraud Based on Social Network Analysis (소셜 네트워크 분석 기반의 금융회사 불법대출 이상징후 탐지기법에 관한 연구)

  • Wi, Choong-Ki;Kim, Hyoung-Joong;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.4 / pp.851-868 / 2012
  • Since the financial crisis in 2008, the financial market has remained unstable, with insolvency in financial companies' real estate project financing loans expanding in the aftermath of the latest real estate recession. In particular, the disclosure of illegal actions by people's financial institutions has heightened economic actors' anxiety about the financial markets and added to market confusion, increasing the potential risk to the national economy as a whole. As the economic recession drags on, people's financial institutions with weak profit structures and financing ability commit illegal acts in a variety of ways to conceal insolvent assets. It is especially hard to detect in advance loans to shareholders and loans to the same borrower that share credit risk, because most of them use a bank account in a third party's name. Therefore, to effectively detect fraud committed under another person's name, it is necessary to cluster the borrowers closely related to a particular borrower by analyzing the associations among all borrowers. In this paper, we introduce analysis techniques for detecting financial loan fraud in advance through such an association analysis, extending social network analysis (SNA), which has recently been studied mainly in sociology, to the forensic accounting of financial fraud. The technique introduced in this paper should be very useful to regulatory authorities and law enforcement agencies during field inspections or investigations.
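The idea of grouping borrowers that are related to a particular borrower can be sketched as a graph traversal. This is only an illustration under stated assumptions: the abstract does not describe the paper's association measures, so the sketch assumes a hypothetical list of pairwise links (for example two borrowers sharing a guarantor or an account) and simply collects everyone reachable from a target borrower.

```python
from collections import defaultdict


def related_borrowers(links, target):
    """Return all borrowers connected to `target` through any chain of links."""
    graph = defaultdict(set)
    for a, b in links:
        # Links are treated as undirected associations.
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {target}, [target]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


# Hypothetical association pairs between borrower IDs.
links = [("A", "B"), ("B", "C"), ("D", "E")]
group = related_borrowers(links, "A")
print(sorted(group))
```

A real analysis would likely weight the links and apply a threshold or a community detection method rather than plain reachability, which treats every association as equally strong.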

Location Classification and Its Utilization for Illegal Parking Enforcement: Focusing on the Case of Gyeonggi (불법주정차 단속을 위한 지역(장소) 분류 및 활용 방안: 경기도를 중심으로)

  • Hyeon Han;So-yeon Choe;So-Hyun Lee
    • Information Systems Review / v.25 no.4 / pp.113-130 / 2023
  • Due to economic development and increasing gross national income, the number of automobiles continues to rise, leading to a serious issue of illegal parking due to limited road conditions and insufficient parking facilities. Illegal parking causes significant inconvenience and displeasure to people and can even result in accidents and loss of lives. The severity of accidents and their consequences, related to the growing number of vehicles and illegal parking, is escalating, particularly in the metropolitan areas. Consequently, efforts are being made to address this problem as a cause of social issues and come up with measures to reduce illegal parking. In particular, half of the public complaints in the metropolitan area are related to illegal parking, and the highest physical and human damage occurs in Gyeonggi. Thus, this study aims to use machine learning techniques based on data related to illegal parking in Suwon city, Gyeonggi, to categorize regional characteristics and propose effective measures to crack down on illegal parking. Additionally, practical, social, policy, and legal measures to decrease illegal parking in the metropolitan area are suggested. This study has academic significance in that it solved the problem of illegal parking, which is mentioned as one of the social problems that cause traffic congestion, by classifying regional characteristics using K-prototype, a machine learning algorithm. Furthermore, the results of this study contribute to practical and social aspects by providing measures to decrease illegal parking in the metropolitan area.
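K-prototype clustering, mentioned in this abstract, handles mixed numeric and categorical attributes by combining two distance terms. The sketch below shows the standard mixed dissimilarity (squared Euclidean distance on numeric attributes plus a weight gamma times the number of categorical mismatches). The feature names and values are hypothetical and do not come from the paper.

```python
def kprototype_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Mixed dissimilarity used by k-prototypes between a record and a prototype."""
    # Numeric part: squared Euclidean distance.
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: simple matching (count of attributes that differ).
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * categorical


# Hypothetical location records: (scaled violation count, scaled road width) plus
# (land-use type, day type).
d = kprototype_distance(
    [2.0, 5.0], ["residential", "weekday"],
    [4.0, 5.0], ["commercial", "weekday"],
    gamma=0.5,
)
print(d)
```

The weight gamma controls how much a categorical mismatch counts relative to numeric distance, so numeric attributes are usually standardised before clustering.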

Energy Balancing Distribution Cluster With Hierarchical Routing In Sensor Networks (계층적 라우팅 경로를 제공하는 에너지 균등분포 클러스터 센서 네트워크)

  • Mary Wu
    • Journal of the Institute of Convergence Signal Processing / v.24 no.3 / pp.166-171 / 2023
  • Efficient energy management is a very important factor in sensor networks with limited resources, and clustering techniques have been studied extensively in this respect. However, energy use may become concentrated on the cluster headers, and when the cluster headers are not evenly distributed over the entire area but concentrated in a specific area, the transmission distances of the cluster members may be large or very uneven. The transmission distance can be directly related to energy consumption. When the energy of a specific node is exhausted quickly, the lifetime of the sensor network is shortened and the efficiency of the entire network is reduced. Thus, balanced energy consumption among sensor nodes is a very important research task. In this study, the factors that affect balanced energy consumption by cluster headers and sensor nodes are analyzed, and a balanced-distribution clustering method in which cluster headers are evenly distributed throughout the sensor network is proposed. The proposed clustering method uses multi-hop routing to reduce the energy that sensor nodes consume in long-distance transmission. Existing multi-hop cluster studies set up a multi-hop cluster path through a two-step process of cluster setup and routing path setup, whereas the proposed method establishes a hierarchical cluster routing path while selecting cluster headers, minimizing the overhead of control messages.

Analysis of Research Trends Related to Drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review / v.24 no.1 / pp.21-37 / 2022
  • Drug repositioning, one method of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. With the recent development of machine learning technology, vast amounts of biological information are increasingly being analyzed and used to develop new drugs. Applying machine learning technology to drug repositioning will help find effective treatments quickly. The world is currently having a difficult time because of a new disease caused by a coronavirus (COVID-19), a severe acute respiratory syndrome. Drug repositioning, which repurposes drugs that have already been clinically approved, could be an alternative route to therapeutics for COVID-19 patients. This study examines research trends in the field of drug repositioning using machine learning techniques. A total of 4,821 papers were collected from PubMed with the keyword 'Drug Repositioning' using web scraping. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed with a Word2vec model; after dimensionality reduction with PCA, K-means clustering was applied to generate labels, and the structure of the literature was then visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified the research topics related to drug repositioning and presented a method for deriving and visualizing meaningful topics from a large body of literature using machine learning algorithms. The results are expected to serve as basic data for establishing research or development strategies in the field of drug repositioning.

Identification of Employee Experience Factors and Their Influence on Job Satisfaction (직원경험 요인 파악 및 직무 만족도에 끼치는 영향력 분석)

  • Juhyeon Lee;So-Hyun Lee;Hee-Woong Kim
    • Information Systems Review / v.25 no.2 / pp.181-203 / 2023
  • As companies compete fiercely to attract outstanding talent, employee job satisfaction has become important. Accordingly, many companies invest in improving job satisfaction by identifying employees' everyday experiences and difficulties. However, because the employee experience is poorly understood, these investments are not paying off. This study examined the relationship between employee experience and job satisfaction using employee reviews and company ratings from Glassdoor, one of the largest employee communities worldwide. We use text mining techniques such as K-means clustering and LDA topic-based sentiment analysis to extract key experience factors by job level, and DistilBERT sentiment analysis to measure the sentiment score of each employee experience factor. The extracted employee experience factors and their sentiment scores were analyzed quantitatively to examine the relationship between each factor and job satisfaction. As a result, this study found a significant difference between the workplace experiences of managers and general employees. In addition, the employee experiences that affect job satisfaction differed by position; for example, customer relationship and autonomy did not affect the satisfaction of managers. This study used text mining and quantitative modeling based on the theory of work adjustment to identify and verify the main factors of employee experience, thereby extending the research literature. The results are also applicable to personnel management strategies for improving employees' job satisfaction and are ultimately expected to improve corporate productivity.

A Time Series Forecasting Model with the Option to Choose between Global and Clustered Local Models for Hotel Demand Forecasting (호텔 수요 예측을 위한 전역/지역 모델을 선택적으로 활용하는 시계열 예측 모델)

  • Keehyun Park;Gyeongho Jung;Hyunchul Ahn
    • The Journal of Bigdata / v.9 no.1 / pp.31-47 / 2024
  • With the advancement of artificial intelligence, the travel and hospitality industry is also adopting AI and machine learning technologies for various purposes. In the tourism industry, demand forecasting is recognized as a very important factor because it directly affects service efficiency and revenue maximization. Demand forecasting must account for time-varying data flows, which is why statistical techniques and machine learning models are used. In recent years, variations and integrations of existing models have been studied to account for the diversity of demand forecasting data and the complexity of the natural world, and they have been reported to improve forecasting performance under uncertainty and variability. This study likewise proposes a new model that integrates several machine learning approaches to improve the accuracy of hotel sales demand forecasting. Specifically, it proposes a new XGBoost-based time series forecasting model that selectively uses either a local model, built by clustering with DTW K-means, or a global model that uses the entire data, to improve forecasting performance. The proposed hotel demand forecasting model, which selectively uses global and local models, is expected to have a positive impact on the growth of the hotel and travel industry and can be applied to forecasting in other business fields in the future.
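DTW K-means, as named in this abstract, replaces the Euclidean distance of ordinary K-means with dynamic time warping (DTW), which lets two series be compared even when their patterns are shifted or stretched in time. The sketch below shows only the classic DTW distance between two numeric sequences, using absolute difference as the local cost. It is not the paper's implementation, and the sample series are made up.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Made-up demand curves: the second repeats one value, so the shapes still match.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Clustering with this distance groups series by shape, after which a separate local forecaster could be trained per cluster and compared against a global one. How that choice is made is described in the paper, not in this sketch.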

A Study on the Design of Memorial in the Design Competition for Donghak Peasant Revolution Memorial Park (동학농민혁명 기념공원 설계공모에 나타난 메모리얼 설계 경향)

  • Lee, Jin-Wook;Sung, Jong-Sang;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.45 no.3 / pp.66-79 / 2017
  • In 2014, a Donghak Peasant Revolution Memorial Park design competition was held with various forms and techniques to convey mourning. This is a process of reconsidering memorial projects that are used to stimulate the collective memory, and it is a meaningful resource for examining the consciousness of contemporary designers with regard to the memorial designs currently being planned in Korea. This study investigated the background of the Donghak commemorative projects that took place at the same site in a timely manner and analyzed the design competition through a review of the existing literature. This showed that the memorial, which was formerly shaped by political purposes, has changed into a way of collecting various opinions and forms through an open design competition. The analytical framework, prepared through multi-layer analysis, consists of daily use, interaction and spontaneity, abstraction, temporality, locality, integration, and harmony with surroundings. The results of this study are as follows. First, to convey commemoration in everyday life, the projects organized scattered memorial spaces with special characteristics and linked them with daily activity programs. Second, the projects used direct participation and emotional experiences to interact with monuments. Third, color, vertical elements, clustering, and park frame manipulation were used for abstract reproduction. Fourth, the projects introduced changeable architecture and furniture, as well as plants, for temporal change. Fifth, the previous terrain was restored and the setting of the scene was reproduced to make the site a space with a sense of place. Sixth, to improve the connection with existing monuments, the projects used techniques such as relaxation and the reinforcement of circulation lines and axes. Seventh, a path and a building conforming to the terrain were arranged for harmony with the surroundings.