• Title/Abstract/Keywords: Preprocessed data

Search results: 185 items (processing time: 0.031 seconds)

Development of a Personalized Music Recommendation System Using MBTI Personality Types and KNN Algorithm

  • Chun-Ok Jang
    • International Journal of Advanced Culture Technology / Vol. 12, No. 3 / pp. 427-433 / 2024
  • This study aims to develop a personalized music digital therapeutic based on MBTI personality types and apply it to depression treatment. In the data collection stage, participants' MBTI personality types and music preferences were surveyed to build a database, which was then preprocessed as input data for the KNN model. The KNN model calculates the distance between personality types using Euclidean distance and recommends music suitable for the user's MBTI type based on the nearest K neighbors' data. The developed system was tested with new participants, and the system and algorithm were improved based on user feedback. In the final validation stage, the system's effectiveness in alleviating depression was evaluated. The results showed that the MBTI personality type-based music recommendation system provides a personalized music therapy experience, positively impacting emotional stability and stress reduction. This study suggests the potential of nonpharmacological treatments and demonstrates that a personalized treatment experience can offer more effective and safer methods for treating depression.
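
As a rough illustration of the recommendation step described in this abstract, the sketch below encodes each MBTI type as a four-dimensional 0/1 vector, measures Euclidean distance between vectors, and recommends the most common genre among the K nearest neighbours. The encoding, the sample survey rows, and the function names (`encode_mbti`, `euclidean`, `recommend`) are assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter

def encode_mbti(mbti):
    """Encode an MBTI string such as 'INFP' as a 0/1 vector (hypothetical encoding)."""
    reference = "ESTJ"  # a position is 1 if it matches E, S, T, J respectively, else 0
    return [1 if letter == ref else 0 for letter, ref in zip(mbti.upper(), reference)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(user_type, survey, k=3):
    """Return the most frequent genre among the k nearest survey respondents."""
    target = encode_mbti(user_type)
    ranked = sorted(survey, key=lambda row: euclidean(target, encode_mbti(row[0])))
    votes = Counter(genre for _, genre in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up survey rows for illustration: (MBTI type, preferred genre)
survey = [
    ("INFP", "ambient"),
    ("INFJ", "ambient"),
    ("ENFP", "pop"),
    ("ESTJ", "rock"),
    ("ESTP", "rock"),
]

print(recommend("INFP", survey, k=3))
```

With this binary encoding, the distance between two types is simply the square root of the number of differing letters, so a real system would likely need richer features (for example, survey scores per dimension) to separate users more finely.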

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 8 / pp. 77-84 / 2016
  • General statistical analysis relies on the assumption of normality. If this assumption is not satisfied, we cannot expect good results from statistical data analysis. Most statistical methods for handling outliers and noise also require this assumption, but it is not satisfied in big data because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis. The proposed methodology does not depend on the normality assumption. We select patent documents as the target domain because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis. Patent data collected from patent databases around the world are preprocessed and analyzed by text mining and statistics. However, most research on patent big data analysis has not considered the outlier and noise problem, which decreases prediction accuracy and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data. To determine whether outliers are present, we use box plots and smoothing visualization. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can be used to detect noise in the retrieved patent big data.
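
The two ingredients named in this abstract, box plots and smoothing, can be sketched as follows. This is a minimal example under stated assumptions: it uses the conventional 1.5 × IQR box-plot fences and a simple centered moving average as the smoother, neither of which is confirmed by the abstract, and the `counts` series is made-up data standing in for something like yearly patent counts.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return values outside the box-plot fences Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

def moving_average(values, window=3):
    """Smooth a series with a centered moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up series with one obvious spike
counts = [12, 14, 13, 15, 14, 90, 13, 12, 16, 15]

print(boxplot_outliers(counts))
print(moving_average(counts))
```

Neither step assumes normally distributed data: the fences depend only on quartiles, and the moving average depends only on local means.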

Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

  • Jung, Yong;Seo, Hwa-Jeong;Park, Yu-Rang;Kim, Ji-Hun;Bien, Sang Jay;Kim, Ju-Han
    • Genomics & Informatics / Vol. 9, No. 1 / pp. 19-27 / 2011
  • Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, which has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to know whether preprocessing has been applied to a dataset and, if so, in what way. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis, and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata to MIAME elements. We also distinguished non-preprocessed raw datasets from processed and other datasets using a two-step classification method. Most of the procedures were developed as semi-automated algorithms with some degree of text mining. We localized 2,967 Platforms, 4,867 Series, and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface to extract them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.

Three Dimensional Last Data Generation System Design Utilizing SFFD and LFFD

  • 김시경;박인덕
    • 제어로봇시스템학회논문지 / Vol. 12, No. 2 / pp. 113-118 / 2006
  • This paper presents a new last design approach based on the limb line FFD (LFFD) and scale factor FFD (SFFD). The proposed last design method uses dynamic trimmed parametric patches for the measured 3D foot data and 3D last data. Furthermore, the proposed last data generation system uses cross-sectional data extracted from the measured 3D foot data. First, the last design rule of the LFFD is constructed on the FFD lattice based on foot last shape analysis. Second, the SFFD is constructed on the new LFFD lattice based on scale factor deformation. The scale factor is constructed on the boundary edges of the polygonized patch and the boundary edge of the cross-sectional last data of the polygon object. The two boundary curves are assumed to have been preprocessed so that they run in the same direction, and together they form the scale factor (SF). In addition, the control points of the FFD lattice are derived by cross-sectional data interpolation from a finite set of 3D foot data.

A Container Orchestration System for Process Workloads

  • Jong-Sub Lee;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 4 / pp. 270-278 / 2023
  • We propose a container orchestration system for process workloads that combines the potential of big data and machine learning technologies to integrate enterprise process-centric workloads. The proposed system analyzes big data generated from industrial automation to identify hidden patterns and build a machine learning prediction model. For each machine learning case, training data is loaded into a data store and preprocessed for model training. In the next step, the training data is used to select and apply an appropriate model, which is then evaluated on test data. This step is called model construction and can be performed in a deployment framework. Additionally, a visual hierarchy is constructed to display prediction results and facilitate big data analysis. To implement parallel computation of PCA for evaluation and analysis, multiple virtual machines were created to build the required big data cluster. The proposed system is modeled as layers of individual components that can be connected together. The advantage of this design is that components can be added, replaced, or reused without affecting the rest of the system.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches

  • 조상구;조승용
    • 한국빅데이터학회지 / Vol. 5, No. 2 / pp. 53-67 / 2020
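  • This study applied supervised learning prediction models to manufacturers and processors of processed foods in order to build an inspection screening system that identifies, in advance, establishments expected to be noncompliant, thereby improving the efficiency of enforcement activities. The study followed the standard machine learning procedure: defining the objective of the predictive modeling, exploratory analysis and visualization of the data, deriving feature variables, and selecting a prediction model and making predictions. The dependent variable was set as the number of violations detected in guidance inspections over the five years from 2014 to 2018, and the objective function was to maximize the number of enforcement actions carried out on establishments correctly predicted in advance to be noncompliant. The data were restructured to reflect not only basic attributes of the manufacturing and processing establishments, such as sales, number of business days, and number of employees, but also their past inspection and enforcement history. Feature extraction methods were applied to derive feature variables that affect noncompliance decisions, including establishment risk, product risk, environmental risk, and past violation history, and machine learning algorithms were then applied to the data. The random forest model proved most suitable for the purposes of the guidance inspection work of the Ministry of Food and Drug Safety. We expect these results to serve as a foundation for developing national food safety management into a data-driven, scientific administrative system.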

Effectiveness of Normalization Pre-Processing of Big Data to the Machine Learning Performance

  • 조준모
    • 한국전자통신학회논문지 / Vol. 14, No. 3 / pp. 547-552 / 2019
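  • Recently, the quantitative expansion of big data has emerged as a major issue in the big data field. Moreover, such big data is used as input to machine learning, and normalization preprocessing is needed to improve learning performance. This performance depends heavily on the range of the big data columns and on the normalization preprocessing method. In this paper, we sought to identify more effective normalization preprocessing methods by applying various normalization methods, while adjusting the range of the big data columns, to support vector machine (SVM) learning. To this end, the machine learning experiments were run and analyzed in Python in a Jupyter Notebook environment.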
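
To make "normalization preprocessing" concrete, the sketch below shows two widely used variants, min-max scaling and z-score standardization, applied to a single column. The abstract does not say which methods the paper compared, so these two, the function names, and the sample column are assumptions chosen for illustration.

```python
import statistics

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to rescale
    return [(x - lo) / (hi - lo) for x in column]

def z_score_scale(column):
    """Center values on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: no spread to rescale
    return [(x - mean) / std for x in column]

# Made-up column whose raw range (2 to 9) would differ from other columns
column = [2, 4, 4, 4, 5, 5, 7, 9]

print(min_max_scale(column))
print(z_score_scale(column))
```

Distance- and margin-based learners such as SVMs are sensitive to column ranges, because a column with a large numeric range can dominate the others; putting all columns on a comparable scale is the usual motivation for this step.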

A data fusion method for bridge displacement reconstruction based on LSTM networks

  • Duan, Da-You;Wang, Zuo-Cai;Sun, Xiao-Tong;Xin, Yu
    • Smart Structures and Systems / Vol. 29, No. 4 / pp. 599-616 / 2022
  • Bridge displacement contains vital information about bridge condition and performance. Because direct displacement measurement methods have limits, indirect displacement reconstruction methods based on strain or acceleration data have also been developed for engineering applications. However, reconstruction methods based on strain or acceleration alone still have deficiencies in practice. This paper proposes a novel method based on long short-term memory (LSTM) networks to reconstruct bridge dynamic displacements using both strain and acceleration data. LSTM networks with three hidden layers are used to map the relationships between the measured responses and the bridge displacement. To achieve the data fusion, the input strain and acceleration data are preprocessed by normalization, and the corresponding dynamic displacement responses are then reconstructed by the LSTM networks. In the numerical simulation, the displacement reconstruction errors are below 9% for different load cases, and the proposed method is robust when the input strain and acceleration data contain additive noise. The effect of the hyper-parameters is analyzed, and the displacement reconstruction accuracies of different machine learning methods are compared. In the experimental verification, the errors are below 6% for the simply supported beam and continuous beam cases. Both the numerical and experimental results indicate that the proposed data fusion method can accurately reconstruct the displacement.

Pallet speed control in a sintering plant using neural networks

  • 장민;조성준
    • 한국지능정보시스템학회:학술대회논문집 / 한국지능정보시스템학회 1999년도 춘계공동학술대회-지식경영과 지식공학 / pp. 261-270 / 1999
  • Sintering transforms powdered ore into lumped ore so that the latter can be used in a blast furnace. The powdered ore, combined with coke and other materials, is loaded into a container and moved along by a pallet while the ignited coke burns. The speed at which the pallet moves determines how much sintering takes place. Since the process is complicated and lacks an accurate mathematical model, human operators control the speed manually by monitoring various factors in the plant. In this paper, we propose a neural network-based pallet speed controller that copies human operator knowledge. Actual process data were collected from a sintering plant for eight months and preprocessed to remove noisy and inconsistent data. A multilayer perceptron was trained using a back-propagation learning algorithm. In on-line testing at the sinter plant, the proposed model reliably controlled pallet speed during normal operation without the help of human operators. Moreover, quality and productivity were as good as those achieved with human operators.

Analysis of Main Design Factors for Developing a Soil Water Content Sensor Using Impedance Spectroscopy

  • 이동훈;조용진;장영창;이규승
    • Journal of Biosystems Engineering / Vol. 33, No. 4 / pp. 269-275 / 2008
  • This study was conducted to design an impedance sensor that can measure soil water content. Partial least squares regression (PLSR) was applied to soil impedance data preprocessed with a smoothing method. An optimal sub-spectrum size and frequency range were determined by comparing the coefficient of determination ($R^2$) and root mean square error (RMSE) of the PLSR models obtained from the soil impedance data. Based on the PLSR analysis, it was concluded that the optimal spectrum measurement range was 32.0 to 50.0 MHz, with an optimal sub-spectrum size of about 18.5 MHz.
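
The two model-comparison metrics mentioned in this abstract can be computed as in the sketch below. It uses the standard definitions, RMSE as the square root of the mean squared residual and $R^2$ as one minus the ratio of residual to total sum of squares; the abstract does not state which exact variants the authors used, and the sample values are made up.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up measured and predicted water-content values
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]

print(rmse(measured, predicted))
print(r_squared(measured, predicted))
```

When candidate models are compared on the same data, a higher $R^2$ and a lower RMSE both point to a better fit, which is how such metrics are typically used to choose among measurement ranges.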