• Title/Summary/Keyword: PCA (Principal Component Analysis)

Search results: 1,243

Apartment Price Prediction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 이용한 아파트 실거래가 예측)

  • Hakhyun Kim;Hwankyu Yoo;Hayoung Oh
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.59-76 / 2023
  • Since the onset of COVID-19, apartment prices have risen in an unusual way. In such an uncertain real estate market, price prediction research is especially important. In this paper, a large dataset of 870,000 records from 2015 to 2020 was built by collecting and crawling data from various real estate sites and gathering as many variables as possible, and a model was then created to predict future apartment transaction prices. The study first addressed multicollinearity by removing and combining variables. Five variable selection methods were then used to extract meaningful independent variables: forward selection, backward elimination, stepwise selection, L1 regularization, and principal component analysis (PCA). Four machine learning and deep learning algorithms, a deep neural network (DNN), XGBoost, CatBoost, and linear regression, were trained after hyperparameter optimization, and their predictive power was compared. In an additional experiment, the number of nodes and layers of the DNN was varied to find the most appropriate configuration. Finally, the best-performing model was used to predict apartment transaction prices for 2021, and the predictions were compared with the actual 2021 data. The authors are confident that machine learning and deep learning can help investors make sound home-purchase decisions under various economic conditions.
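The abstract states that multicollinearity was resolved by removing and combining variables but does not say how it was detected. One common diagnostic is the variance inflation factor (VIF). The following is a minimal sketch, assuming NumPy and using made-up variables (`area`, `rooms`, and `age` are hypothetical, not the paper's data), of how VIFs could be computed before variable selection:

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X.

    R_j^2 is the coefficient of determination from an OLS regression
    (with intercept) of column j on all remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

# Hypothetical predictors: room count is almost a linear function of floor area.
rng = np.random.default_rng(0)
area = rng.normal(85.0, 20.0, 500)
rooms = area / 30.0 + rng.normal(0.0, 0.2, 500)
age = rng.normal(15.0, 8.0, 500)
vifs = variance_inflation_factors(np.column_stack([area, rooms, age]))
print(vifs.round(1))
```

A large VIF on a pair of columns suggests dropping one of them or merging them, which is one way to read the "removing and combining" step; the paper's actual criteria are not given in the abstract.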

Variations in Ecological Niche of Quercus variabilis and Quercus acutissima Leaf Morphological Characters in Response to Moisture and Nutrient Gradient Treatments under Climate Change Conditions (기후변화 조건에서 수분구배 및 영양소 구배에 따른 굴참나무와 상수리나무 잎 형태적 특성의 생태지위 변화)

  • Park, Yeo-Bin;Kim, Eui-Joo;Park, Jae-Hoon;Kim, Yoon-Seo;Park, Ji-Won;Lee, Jung-Min;You, Young-Han
    • Journal of the Korean Society of Environmental Restoration Technology / v.27 no.2 / pp.43-53 / 2024
  • This study sought to elucidate the ecological niches, and the environmental factors that influence them, of Quercus variabilis and Quercus acutissima under climate change conditions. Both are representative deciduous broad-leaved trees of Korean forests and are taxonomically close and genetically similar. Experimental plants were grown for 180 days in a glass greenhouse under climate change conditions induced by elevated CO2 and temperature, with soil moisture and nutrient environments each manipulated in four gradients. At the end of the growing season, the plants were harvested, the growth responses of leaf traits were measured, ecological niches were calculated, and these were compared with those of the control groups. The responses of the two species' populations were further interpreted using principal component analysis (PCA) of the leaf trait measurements. Under climate change conditions, niche breadth along the moisture gradient was broader for Q. variabilis than for Q. acutissima, whereas along the nutrient gradient Q. acutissima had the broader niche. The rate of change in niche breadth due to climate change decreased for Q. variabilis in both moisture and nutrient environments, while for Q. acutissima it increased in the moisture environment but decreased in the nutrient environment. At the population level, both species expanded their ecological niches under climate change conditions along both the soil moisture and nutrient gradients, with Q. acutissima showing a broader niche than Q. variabilis under nutrient conditions.
These results indicate that changes in leaf morphological characteristics, and the responses of individuals that reflect them, vary not only with climate change conditions but also with environmental factors.
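The abstract reports niche breadths along moisture and nutrient gradients without naming the index used. Assuming Levins' index, a widely used measure, the calculation for one species along one four-level gradient could look like the sketch below (the function name and the input values are hypothetical):

```python
def levins_niche_breadth(responses):
    """Levins' niche breadth B = 1 / sum(p_i^2) and its standardized form.

    responses: a non-negative performance measure (for example, a leaf
    trait) at each level of one resource gradient; p_i is the share of
    the total observed at level i. B ranges from 1 (all response at one
    level) to n (equal response at every level); the standardized
    B_A = (B - 1) / (n - 1) rescales this to [0, 1].
    """
    total = sum(responses)
    shares = [r / total for r in responses]
    b = 1.0 / sum(p * p for p in shares)
    n = len(responses)
    return b, (b - 1.0) / (n - 1)

# Hypothetical leaf-area values at four soil-moisture levels
even = levins_niche_breadth([10.0, 10.0, 10.0, 10.0])   # equal use of all levels
skewed = levins_niche_breadth([1.0, 2.0, 3.0, 30.0])    # concentrated at one level
print(even, skewed)
```

Comparing B_A for a species under control and climate change treatments would give one possible "rate of change in niche breadth"; the study's own formula may differ.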

A study on ESG Management Guidelines for Small and Medium-sized Logistics Enterprises (중소·중견 물류기업 ESG 경영 이행 가이드라인에 관한 연구)

  • Maowei Chen;Hyangsook Lee;Kyongjun Yun
    • Journal of Korea Port Economic Association / v.39 no.4 / pp.147-161 / 2023
  • As global challenges, particularly climate change, become more pressing, awareness of Environmental, Social, and Governance (ESG) management is growing worldwide. Given the crucial role the logistics industry plays in the complex network of the global supply chain, various stakeholders are emphasizing that logistics companies need to practice ESG management. Although Korea has established comprehensive ESG guidelines for all enterprises, these guidelines do not adequately consider the distinctive features of logistics enterprises, especially small and medium-sized ones. Accordingly, this study thoroughly examines existing ESG guidelines, the sustainable management approaches of large logistics enterprises, and prior research to identify candidate ESG management diagnostic criteria for small and medium-sized logistics enterprises, covering public information disclosure (P), environmental (E), social (S), and governance (G) aspects. To streamline these criteria in light of the characteristics of small and medium-sized logistics enterprises, the study surveys 60 logistics company personnel and experts from academia and research institutes. Principal component analysis (PCA) of the survey data shows that the four dimensions of information disclosure can be consolidated into a single dimension, and that the environmental criteria can be reduced from 16 to 3 items, the social criteria from 22 to 7 items, and the governance criteria from 20 to 5 items. This empirical study is significant in presenting ESG management diagnostic criteria tailored to the specific characteristics of small and medium-sized logistics enterprises.
The findings are expected to serve as a foundational resource for relevant organizations developing guidelines and to promote wider adoption of ESG management practices among small and medium-sized logistics enterprises in the near future.
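The item reduction described above rests on PCA of the survey responses. The abstract does not state the retention rule; the sketch below assumes PCA on the item correlation matrix with the Kaiser criterion (keep components with eigenvalue > 1), uses NumPy, and runs on simulated responses rather than the study's data:

```python
import numpy as np

def pca_on_correlation(X):
    """PCA of the item correlation matrix.

    Returns eigenvalues (largest first), the proportion of total variance
    each component explains, the number of components kept under the
    Kaiser criterion (eigenvalue > 1), and the eigenvectors (columns).
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    n_keep = int((eigvals > 1.0).sum())
    return eigvals, explained, n_keep, eigvecs

# Simulated survey: 60 respondents x 6 items, driven by 2 latent factors
rng = np.random.default_rng(1)
factors = rng.normal(size=(60, 2))
items = [factors[:, i // 3] + 0.3 * rng.normal(size=60) for i in range(6)]
X = np.column_stack(items)
eigvals, explained, n_keep, _ = pca_on_correlation(X)
print(n_keep, explained.round(2))
```

Items that load heavily on the same retained component are candidates for merging into one criterion, which mirrors the kind of reduction (for example, 16 to 3 environmental items) reported above.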

Electronic Sensors and Multivariate Approaches for Taste and Odor in Korean Soups and Stews (전자센서와 다변량 분석을 이용한 국내 국·탕류의 향미 특성 분석)

  • Boo, Chang Guk;Hong, Seong Jun;Cho, Jin-Ju;Shin, Eui-Cheol
    • Journal of Food Hygiene and Safety / v.35 no.5 / pp.430-437 / 2020
  • This study examined the sensory properties (taste and odor) of 15 types of traditional Korean soups and stews using an electronic nose and an electronic tongue. The electronic tongue gave the relative sensor intensity for each taste component. For the SRS (sourness) sensor, sogogi-baechuguk (beef and cabbage soup) scored highest, at 9.0. For the STS (saltiness) sensor, ojingeoguk (squid soup) scored highest, at 8.2. For the UMS (umami) sensor, which detects savoriness, sogogi-baechuguk was highest, at 10.1. The SWS (sweetness) sensor showed relatively little difference among samples, with sigeumchi-doenjangguk (spinach and bean paste soup) highest, at 7.3. For the BRS (bitterness) sensor, siraegi-doenjangguk (dried radish green and bean paste soup) was highest, at 7.8. In principal component analysis (PCA) of the taste data, PC1 and PC2 explained 56.21% and 25.23% of the variance, respectively. The PC1 and PC2 loadings were -0.95 and -0.20 for the SRS sensor, 0.96 and 0.14 for the STS sensor, and -0.94 and 0.22 for the UMS sensor; the PC2 loading for the SWS sensor was 0.22. Cluster analysis grouped the samples into four main clusters. A total of 25 volatile compounds were identified across the 15 samples, and ethanol and 2-methylthiophene showed the highest relative content in all samples. PCA of the volatile compound patterns of the 15 samples gave PC1 and PC2 explaining 28.54% and 20.80% of the variance, and cluster analysis divided the samples into three main clusters. These data can serve as a sensory property database for traditional Korean soups and stews based on electronic sensors.
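The loadings quoted above (for example, -0.95 for SRS on PC1) are, for standardized variables, the correlations between each sensor and each principal component. A minimal NumPy sketch of that relationship, assuming the sensor data are standardized before PCA and using random stand-in values for 15 samples and five sensors:

```python
import numpy as np

def pca_loadings(X):
    """Standardize columns, run PCA via SVD, and return
    (explained variance in %, loadings, scores).

    loadings[j, k] = v_jk * sqrt(lambda_k); for standardized data this is
    the correlation between variable j and component k.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)             # eigenvalues of the correlation matrix
    explained_pct = 100.0 * eigvals / eigvals.sum()
    loadings = Vt.T * np.sqrt(eigvals)            # rows: variables, columns: PCs
    scores = Z @ Vt.T                             # sample coordinates on each PC
    return explained_pct, loadings, scores

# Random stand-in for 15 samples x 5 taste sensors (SRS, STS, UMS, SWS, BRS)
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 5))
explained_pct, loadings, scores = pca_loadings(X)
print(explained_pct[:2].round(2), loadings[:, :2].round(2))
```

The sign of each component is arbitrary, so another implementation may report the same loadings with flipped signs.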

Accuracy of HF radar-derived surface current data in the coastal waters off the Keum River estuary (금강하구 연안역에서 HF radar로 측정한 유속의 정확도)

  • Lee, S.H.;Moon, H.B.;Baek, H.Y.;Kim, C.S.;Son, Y.T.;Kwon, H.K.;Choi, B.J.
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.13 no.1 / pp.42-55 / 2008
  • To evaluate the accuracy of currents measured by HF radar in the coastal sea off the Keum River estuary, we compared the radial vectors of two facing HF radars with each other, and compared HF radar-derived currents with in-situ current measurements. Principal component analysis was used to obtain the regression line and RMS deviation for each comparison. Comparing the radial vectors of the two facing radars at the midpoint of the baseline gave RMS deviations of 4.4 cm/s in winter and 5.4 cm/s in summer. When the GDOP (Geometric Dilution of Precision) effect is removed from the RMS deviations between HF radar-derived and current-meter-measured currents, the error of the combined HF radar velocity is less than 5.1 cm/s at stations with moderate GDOP values. These two results, obtained by different methods, suggest that the lower limit of the accuracy of HF radar-derived currents in our study area is 5.4 cm/s. As reported in previous studies, RMS deviations are larger at stations near islands and increase with mean distance from the radar site, owing to the decrease in signal-to-noise ratio and in the intersection angle of the radial vectors. We found that separating the RMS deviations using GDOP values can produce an uncertain error bound for HF radar-derived currents when the GDOP values of the two components are very close and the RMS deviations obtained from the component-wise current comparisons are also close. When the currents at stations with moderate GDOP values were separated into tidal and subtidal components, the tidal current ellipses derived from HF radar agreed well with those from the current meters, and the time variation of the subtidal current reflected physical processes driven by wind and the density field.
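The abstract says PCA was used to extract the regression line and RMS deviation. A common reading is major-axis (orthogonal) regression: the fitted line follows the first principal axis of the paired data, so errors in both instruments are treated alike. The sketch below assumes that reading and further assumes that "RMS deviation" means the RMS of the paired differences; the paper's exact definitions may differ, and the sample values are invented:

```python
import math

def pca_regression(x, y):
    """Major-axis fit: the line through the means along the first principal axis.

    Errors in x and y are treated symmetrically (unlike ordinary least
    squares, which puts all error in y). Returns (slope, intercept,
    rms_difference) with rms_difference = sqrt(mean((y - x)^2)).
    A near-vertical principal axis gives a very large slope.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the major axis
    slope = math.tan(theta)
    intercept = my - slope * mx
    rms = math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / n)
    return slope, intercept, rms

# Invented paired speeds (cm/s): current meter (x) vs. HF radar (y)
meter = [12.0, 20.5, 31.0, 8.4, 25.2]
radar = [14.1, 19.0, 33.5, 10.2, 24.0]
print(pca_regression(meter, radar))
```

Swapping the two instruments gives the reciprocal slope, which is the symmetry an ordinary least-squares fit lacks.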

A New Face Tracking and Recognition Method Adapted to the Environment (환경에 적응적인 얼굴 추적 및 인식 방법)

  • Ju, Myung-Ho;Kang, Hang-Bong
    • The KIPS Transactions:PartB / v.16B no.5 / pp.385-394 / 2009
  • Face tracking and recognition are difficult problems because the face is a non-rigid object. Tracking and recognition fail mainly because of changes in face pose and in environmental illumination. To address these problems, we propose a nonlinear manifold framework for face pose and illumination normalization. Specifically, to track and recognize faces in video with various pose variations, we approximate each face pose density as a single Gaussian density using PCA (Principal Component Analysis) on images sampled from training video sequences, and then construct a GMM (Gaussian Mixture Model) for each person. To handle illumination, we decompose face images into reflectance and illuminance using the SSR (Single Scale Retinex) model. The reflectance is normalized by histogram equalization over a defined range. Because the illuminance carries most of the variation caused by lighting, we approximate it with the trained manifold. By combining these two features in our manifold framework, we achieved efficient face tracking and recognition on indoor and outdoor video. To improve video-based tracking, we use the EM algorithm to update the weight of each face pose density at every frame based on the tracking result from the previous frame. Experimental results show that our method is more efficient than the other methods compared.
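The SSR step described above splits an image into reflectance and illuminance. A minimal NumPy sketch, assuming the common single-scale Retinex form log R = log I - log(G_σ * I) with a Gaussian surround G_σ; the value of σ and the stand-in image are hypothetical, and the histogram equalization and manifold steps are not shown:

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    radius = max(1, int(round(3 * sigma)))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()                      # unit sum: a constant image stays constant

def gaussian_blur(img, sigma):
    """Separable Gaussian blur; borders are handled by reflection."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")

    def conv(v):
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)   # blur along x
    return np.apply_along_axis(conv, 0, rows)     # blur along y

def single_scale_retinex(img, sigma, eps=1e-6):
    """Split an image into log-reflectance and illuminance (SSR model).

    illuminance ~ Gaussian-smoothed image; log R = log I - log(G * I).
    """
    img = np.asarray(img, dtype=float) + eps      # avoid log(0)
    illuminance = gaussian_blur(img, sigma)
    log_reflectance = np.log(img) - np.log(illuminance)
    return log_reflectance, illuminance

rng = np.random.default_rng(4)
face = rng.uniform(0.2, 1.0, size=(32, 32))          # stand-in for a face crop
r1, _ = single_scale_retinex(face, sigma=3.0)
r2, _ = single_scale_retinex(3.0 * face, sigma=3.0)  # same scene, brighter light
```

Because a global change in light intensity scales both I and G_σ * I, it cancels in the difference, so `r1` and `r2` are nearly identical; this is why the reflectance term is useful for illumination normalization.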

Identification and classification of fresh lubricants and used engine oils by GC/MS and bayesian model (GC/MS 분석과 베이지안 분류 모형을 이용한 새 윤활유와 사용 엔진 오일의 동일성 추적과 분류)

  • Kim, Nam Yee;Nam, Geum Mun;Kim, Yuna;Lee, Dong-Kye;Park, Seh Youn;Lee, Kyoungjae;Lee, Jaeyong
    • Analytical Science and Technology / v.27 no.1 / pp.41-59 / 2014
  • This work aimed to identify and classify fresh lubricants and used vehicle engine oils for forensic science applications. Eighty kinds of fresh lubricants were purchased, and 86 used engine oil samples were collected from 24 diesel and gasoline vehicles with different driving conditions. All lubricant and used engine oil samples were analyzed by GC/MS. A Bayesian model was developed for classification and identification, with wavelet fitting and principal component analysis (PCA) each applied for data dimension reduction. In fresh lubricant classification, the matching rates of the Bayesian model with wavelet fitting and with PCA were 97.5% and 96.7%, respectively. Because wavelet fitting classified lubricants better than PCA-based dimension reduction, the Bayesian model with wavelet fitting was selected for lubricant classification. The second experiment analyzed the 86 used engine oil samples, collected from vehicles at various mileages of up to 5,000 km after an oil change. In vehicle classification (24 classes), the matching rate of the Bayesian model with wavelet fitting was 86.4%. In fuel type classification (gasoline or diesel vehicle; 2 classes), however, the matching rate was 99.6%, and in used engine oil brand classification (6 classes), it was 97.3%.
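The abstract names a Bayesian model with PCA as one dimension-reduction option but does not specify the model. As a stand-in only, the sketch below pairs PCA (via NumPy SVD) with a Gaussian naive Bayes classifier on simulated two-class "chromatogram" vectors; the class structure, dimension, and number of components are all hypothetical, and the wavelet-fitting variant is not shown:

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the top-k principal directions (columns)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def fit_gaussian_nb(Z, y):
    """Per-class mean, variance (with a small floor), and prior of each PCA score."""
    return {c: (Z[y == c].mean(axis=0), Z[y == c].var(axis=0) + 1e-9, np.mean(y == c))
            for c in np.unique(y)}

def predict(model, Z):
    labels = list(model)
    log_post = []
    for c in labels:
        mu, var, prior = model[c]
        # log prior + sum of independent per-component Gaussian log-likelihoods
        ll = -0.5 * (np.log(2 * np.pi * var) + (Z - mu) ** 2 / var).sum(axis=1)
        log_post.append(np.log(prior) + ll)
    return np.array(labels)[np.argmax(log_post, axis=0)]

# Simulated GC/MS-like profiles: 2 classes (e.g. gasoline vs. diesel), 200 features
rng = np.random.default_rng(3)
fingerprints = rng.normal(size=(2, 200))
y = np.repeat([0, 1], 40)
X = fingerprints[y] + 0.5 * rng.normal(size=(80, 200))
mean, W = fit_pca(X, k=5)
Z = (X - mean) @ W                     # project onto 5 principal components
model = fit_gaussian_nb(Z, y)
pred = predict(model, Z)
print("training accuracy:", np.mean(pred == y))
```

This reports training accuracy only; a matching rate comparable to those above would need held-out samples, for example via cross-validation.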

A Numerical Taxonomy of Korean Ilex (Aquifoliaceae) (한국산 감탕나무속(Ilex L.) 식물의 수리분류학적 연구)

  • Hwang, Seung-Hyun;Park, Seon-Joo;Kim, Joo-Hwan
    • Korean Journal of Plant Taxonomy / v.37 no.4 / pp.401-418 / 2007
  • We performed numerical analyses of 32 morphological characters from 24 populations of eight Korean Ilex L. taxa. Principal component analysis showed that the first three principal components accounted for 67.0% of the total variance, with PC1, PC2, and PC3 contributing 31.5%, 21.1%, and 14.4%, respectively. The characters most closely associated with PC1, PC2, and PC3 were reproductive characters, such as the morphology of the sepals, petals, anthers, pistils, and fruits, and vegetative characters, such as the morphology of the petiole and leaf margin, trichomes on the twigs, and leaf duration. Two-dimensional plots of PC1, PC2, and PC3 scores separated six groups: Ilex integra; I. cornuta; I. × wandoensis; I. rotunda; I. macropoda and I. macropoda for. pseudomacropoda; and I. crenata and I. crenata var. microphylla. Numerical analysis proved useful for the taxonomy of Korean Ilex because it clearly separated the populations of the taxa included in this study. An identification key based on the diagnostic characters is provided.
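The percentages above are proportions of variance explained, PVE_k, defined from the eigenvalues λ_k of the p = 32 characters; the cumulative figure is simply their sum:

$$
\mathrm{PVE}_k = \frac{\lambda_k}{\sum_{j=1}^{32} \lambda_j}, \qquad
\mathrm{PVE}_1 + \mathrm{PVE}_2 + \mathrm{PVE}_3 = 31.5\% + 21.1\% + 14.4\% = 67.0\%
$$

If the characters were standardized (PCA on the correlation matrix, which the abstract does not state), the eigenvalues would sum to p = 32, giving roughly λ1 ≈ 10.1, λ2 ≈ 6.8, and λ3 ≈ 4.6.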

Comparison of Isoflavone Content and Composition in Soybean (Glycine max L. (Merr)) Germplasm

  • Hyemyeong Yoon;Yumi Choi;Myung-Chul Lee;Jeongyoon Yi;Sejong Oh;Sukyeung Lee;Hyunchoong Ok;Kebede Taye Desta
    • Proceedings of the Plant Resources Society of Korea Conference / 2020.08a / pp.101-101 / 2020
  • Soybean is known to contain several health-promoting components. Among them, isoflavones are effective in reducing obesity and menopausal symptoms. Isoflavones comprise 12 isomers in four forms (aglycone, glucoside, malonyl glucoside, and acetyl glucoside) and are found mainly in soybean seeds. Total isoflavone content is the sum of the 12 isomers and differs greatly among varieties. In this study, we investigated the agronomic traits and the content and composition of the 12 isoflavone isomers in 49 soybean germplasm accessions. These accessions, which had the highest total isoflavone content, were selected from the 23,000 accessions held by the National Agrobiodiversity Center. Seeds were planted in an experimental field in Jeonju City on April 4, 2019. Mature seeds were harvested, and portions of each sample were oven-dried, pulverized, and analyzed for isoflavone composition using HPLC-DAD. The accessions differed in agronomic traits and in isoflavone composition and content. Days to flowering ranged from 38 to 69, and days to maturity from 103 to 156. Seed coat color was black in 24 accessions, yellow in 10, green in 2, yellowish green in 5, green with black spots in 4, and pale yellow in 4. The accession with the highest total isoflavone content was IT178054 (1257.61±7.98 ㎍/g), whereas IT274592 and IT275005 contained the largest number of isomers, 11 each, lacking only malonyl glycitin. The richest source of aglycones, the isoflavone form most easily absorbed by the human body, was IT274592 (DZ: 8.83±0.30 ㎍/g, GL: 11.14±0.81 ㎍/g, GE: 8.16±0.26 ㎍/g), and only IT274592, IT275005, and IT308619 contained all three aglycones. In principal component analysis (PCA), the first two principal components had eigenvalues greater than 3.5 and together accounted for 58.2% of the variability.
Total content was strongly related to malonyl genistin content. The acetyl isomers were strongly related to one another, whereas the malonyl isomers were related only to isomers other than malonyl glycitin. These results are expected to support research on soybean varieties with enhanced isoflavone content.
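Since total isoflavone content is the sum of the 12 isomers, a small bookkeeping function makes the calculation explicit. In this sketch the isomer names follow the standard soy isoflavone series; the mapping of DZ, GL, and GE to daidzein, glycitein, and genistein is an assumption, and because the abstract gives only the aglycone values for IT274592, the result is that accession's aglycone subtotal, not its total content:

```python
# The 12 soy isoflavones: three aglycones, each also found as a glucoside,
# a malonyl glucoside, and an acetyl glucoside.
ISOFLAVONES = {
    "aglycone": ["daidzein", "glycitein", "genistein"],
    "glucoside": ["daidzin", "glycitin", "genistin"],
    "malonyl glucoside": ["malonyl daidzin", "malonyl glycitin", "malonyl genistin"],
    "acetyl glucoside": ["acetyl daidzin", "acetyl glycitin", "acetyl genistin"],
}

def summarize(sample):
    """sample: isomer name -> content in ug/g (missing isomers count as 0).

    Returns (total content, subtotal per form, number of isomers detected).
    """
    by_form = {form: sum(sample.get(name, 0.0) for name in names)
               for form, names in ISOFLAVONES.items()}
    detected = sum(1 for names in ISOFLAVONES.values()
                   for name in names if sample.get(name, 0.0) > 0)
    return sum(by_form.values()), by_form, detected

# Aglycone values for IT274592 from the abstract (DZ, GL, GE assumed to be
# daidzein, glycitein, genistein); its other nine isomers are not given here.
it274592 = {"daidzein": 8.83, "glycitein": 11.14, "genistein": 8.16}
total, by_form, n_detected = summarize(it274592)
print(round(total, 2), n_detected)
```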


A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.465-472 / 2017
  • Previously, researchers mainly analyzed power data with supervised machine learning techniques and identified patterns through data mining. As electric power data have grown in size and become available in real time, however, these older classification and analysis techniques have reached their limits. This study therefore proposes a clustering architecture for analyzing large-scale electric power data. The proposed clustering process addresses problems of the K-means algorithm, an unsupervised learning technique, and automates the entire process from the collection of electric power data to their analysis. Power data were organized and analyzed at three levels: the raw data level, the clustering level, and the user interface level. In addition, the optimal number of clusters, K, was determined using principal component analysis and the normal distribution, and a modified K-means algorithm was proposed that reduces data classified as outliers in order to increase clustering efficiency.
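The paper's modified K-means and its PCA and normal-distribution rule for choosing K are not detailed in the abstract. For orientation only, here is a plain Lloyd's K-means in NumPy, plus a hypothetical distance-based outlier flag (mean + 3 standard deviations of point-to-centroid distance); the data, K, and threshold are assumptions, not the authors' method:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its points, until stable."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def flag_outliers(X, labels, centroids, z=3.0):
    """Hypothetical rule: flag points whose distance to their own centroid
    exceeds the mean of those distances by more than z standard deviations."""
    d = np.linalg.norm(np.asarray(X, dtype=float) - centroids[labels], axis=1)
    return d > d.mean() + z * d.std()

# Hypothetical 2-D load features (e.g. daily kWh, peak kW) for three customer types
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 0.5, size=(30, 2)) for m in ([0, 0], [10, 0], [0, 10])])
labels, centroids = kmeans(X, k=3)
outliers = flag_outliers(X, labels, centroids)
print(np.bincount(labels, minlength=3), outliers.sum())
```

Removing flagged points before a final clustering pass is one simple way to realize "reducing data classified as outliers"; the authors' actual procedure may differ.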