• Title/Summary/Keyword: 분류시스템 (classification system)

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each sentence in the training set was labeled manually after being read in person. Then, based on the training set, a Support Vector Machine model was used to predict the tone of sentences in the test set. Finally, the percentages of positive, neutral, and negative sentences in each prospectus were calculated from the model's output. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis and prospectus tone variables together show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the techniques tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify whether the difference between the models was statistically significant. The model using only quantitative variables and the model using both the quantitative variables and the prospectus tone variables were compared, and it was confirmed that the predictive performance improved significantly at the 1% significance level.
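
The McNemar comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes the exact (binomial) form of the test; the abstract does not say which variant the authors used, and the function name `mcnemar_exact` and the counts below are hypothetical, not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    b: cases model A classified correctly and model B incorrectly
    c: cases model B classified correctly and model A incorrectly
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair favours either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (illustrative only, not the paper's data).
p_value = mcnemar_exact(b=3, c=15)
print(f"p = {p_value:.4f}")
```

Only the cases on which the two models disagree enter the test, which is why it suits a comparison of two classifiers evaluated on the same test set.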

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education / v.45 no.3 / pp.292-303 / 2021
  • This study identifies the terms frequently used together with energy in online science news articles and the topics of the news reports, to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in the science category, published by 11 major newspaper companies in Korea over one year from March 1, 2018, were selected using energy as a search term. As a result of natural language processing, a total of 51,224 sentences consisting of 507,901 words were compiled for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, which reflected the characteristics of news articles that report new findings. On the other hand, terms used more than once per two articles were industry-related terms (industry, product, system, production, market) and terms that would be expected as energy-related terms, such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, were also among the most frequent terms. From the network analysis, two clusters were found: terms related to industry and technology, and terms related to basic science and research. From the analysis of terms paired with energy, it was also found that terms related to the use of energy such as 'energy efficiency,' 'energy saving,' and 'energy consumption' were the most frequently used. Out of 16 topics found, four contexts of energy were drawn: 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that the introduction of the concept of energy degradation as a starting point for energy classes can be effective. 
It also shows the need to introduce high-tech industries or the context of environment and health into energy learning.

Development of Correction Formulas for KMA AAOS Soil Moisture Observation Data (기상청 농업기상관측망 토양수분 관측자료 보정식 개발)

  • Choi, Sung-Won;Park, Juhan;Kang, Minseok;Kim, Jongho;Sohn, Seungwon;Cho, Sungsik;Chun, Hyenchung;Jung, Ki-Yuol
    • Korean Journal of Agricultural and Forest Meteorology / v.24 no.1 / pp.13-34 / 2022
  • Soil moisture data have been collected at 11 agrometeorological stations operated by the Korea Meteorological Administration (KMA). This study aimed to verify the accuracy of the KMA soil moisture data and develop a correction formula to improve their quality. The soil of each observation field was sampled to analyze the physical properties that affect soil water content. Soil texture was classified as sandy loam or loamy sand at most sites. The bulk density of the soil samples was about 1.5 g/cm³ on average. The content of silt and clay was also closely related to bulk density and water holding capacity. The EnviroSCAN model, which was used as a reference sensor, was calibrated using the self-manufactured "reference soil moisture observation system". Comparison between the calibrated reference sensor and the field sensor of KMA was conducted at least three times at each of the 11 sites. Overall, the trend of fluctuations over time in the measured values of the two sensors appeared similar. Still, there were sites where the latter had relatively lower soil moisture values than the former. A linear correction formula was derived for each site and depth using the range and average of the observed data for the given period. This correction formula resulted in an improvement in agreement between sensor values at the Suwon site. In addition, a detailed approach was developed to estimate the correction value for periods in which a correction formula was not calculated. In summary, correcting soil moisture data at a regular time interval, e.g., twice a year, is recommended for all observation sites to improve the quality of soil moisture observation data.
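
As a rough illustration of a linear correction built from the range and average of paired observations, the sketch below matches the field sensor's range and mean to those of the reference sensor. This is only one plausible reading of the abstract, not the authors' published formula; the function names and all readings are hypothetical.

```python
def fit_linear_correction(field, reference):
    """Return (slope, intercept) mapping field readings onto the reference scale.

    Assumption: the slope is the ratio of the two observed ranges and the
    intercept aligns the two averages. The abstract does not give the formula.
    """
    field_range = max(field) - min(field)
    if field_range == 0:
        raise ValueError("field readings are constant; slope is undefined")
    slope = (max(reference) - min(reference)) / field_range
    field_mean = sum(field) / len(field)
    reference_mean = sum(reference) / len(reference)
    intercept = reference_mean - slope * field_mean
    return slope, intercept


def apply_correction(value, slope, intercept):
    """Apply the fitted linear correction to a single field reading."""
    return slope * value + intercept


# Hypothetical volumetric water content readings (%) for one site and depth.
field_readings = [18.0, 20.0, 22.0, 24.0]
reference_readings = [22.0, 25.0, 28.0, 31.0]

slope, intercept = fit_linear_correction(field_readings, reference_readings)
corrected = [apply_correction(v, slope, intercept) for v in field_readings]
print(slope, intercept, corrected)
```

Fitting one such pair of coefficients per site and depth mirrors the per-site, per-depth structure described in the abstract.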

Transcriptomic Analysis of Triticum aestivum under Salt Stress Reveals Change of Gene Expression (RNA sequencing을 이용한 염 스트레스 처리 밀(Triticum aestivum)의 유전자 발현 차이 확인 및 후보 유전자 선발)

  • Jeon, Donghyun;Lim, Yoonho;Kang, Yuna;Park, Chulsoo;Lee, Donghoon;Park, Junchan;Choi, Uchan;Kim, Kyeonghoon;Kim, Changsoo
    • KOREAN JOURNAL OF CROP SCIENCE / v.67 no.1 / pp.41-52 / 2022
  • The Korean wheat cultivar 'Keumgang' grows fast and can be grown stably. Hexaploid wheat (Triticum aestivum) has moderately high salt tolerance compared to tetraploid wheat (Triticum turgidum L.). However, the molecular mechanisms related to salt tolerance of hexaploid wheat have not been elucidated yet. In this study, candidate genes related to salt tolerance were identified by investigating the genes that are differentially expressed in the Keumgang variety and the salt-tolerant mutant '2020-s1340'. A total of 85,771,537 reads were obtained after quality filtering using NextSeq 500 Illumina sequencing technology. A total of 23,634,438 reads were aligned with the NCBI Campala Lr22a pseudomolecule v5 reference genome (Triticum aestivum). A total of 282 differentially expressed genes (DEGs) were identified in the two Triticum aestivum materials. These DEGs have functions related to salt tolerance traits, such as 'wall-associated receptor kinase-like 8', 'cytochrome P450', and '6-phosphofructokinase 2'. In addition, the identified DEGs were classified into three categories (biological process, molecular function, and cellular component) using gene ontology analysis. These DEGs were significantly enriched for terms such as 'copper ion transport', 'oxidation-reduction process', and 'alternative oxidase activity'. These results, which were obtained using RNA-seq analysis, will improve our understanding of salt tolerance in wheat. Moreover, this study will be a useful resource for breeding wheat varieties with improved salt tolerance using molecular breeding technology.

A Checklist to Improve the Fairness in AI Financial Service: Focused on the AI-based Credit Scoring Service (인공지능 기반 금융서비스의 공정성 확보를 위한 체크리스트 제안: 인공지능 기반 개인신용평가를 중심으로)

  • Kim, HaYeong;Heo, JeongYun;Kwon, Hochang
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.259-278 / 2022
  • With the spread of Artificial Intelligence (AI), various AI-based services are expanding in the financial sector, such as service recommendation, automated customer response, fraud detection systems (FDS), and credit scoring. At the same time, problems related to reliability and unexpected social controversy are also occurring due to the nature of data-based machine learning. Against this background, this study aimed to contribute to improving trust in AI-based financial services by proposing a checklist to secure fairness in AI-based credit scoring services, which directly affect consumers' financial lives. Among the key elements of trustworthy AI, such as transparency, safety, accountability, and fairness, fairness was selected as the subject of the study so that everyone could enjoy the benefits of automated algorithms from the perspective of inclusive finance without social discrimination. Through literature research, we divided the entire fairness-related operation process into three areas: data, algorithm, and user. For each area, we constructed four detailed considerations for evaluation, resulting in 12 checklist items. The relative importance and priority of the categories were evaluated through the analytic hierarchy process (AHP). We surveyed three groups that together represent the financial stakeholders: financial field workers, artificial intelligence field workers, and general users. The three groups were analyzed separately according to the importance each assigned, and from a practical perspective, specific checks such as feasibility verification for using learning data and non-financial information and monitoring of new inflow data were identified. Moreover, financial consumers in general were found to place high importance on the accuracy of result analysis and bias checks. We expect these results to contribute to the design and operation of fair AI-based financial services.
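
The AHP weighting step can be sketched as follows. This uses the row geometric-mean approximation of the priority vector, which is a common shortcut and not necessarily the procedure the authors followed; the function name `ahp_weights` and the pairwise judgments are hypothetical.

```python
from math import prod


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    matrix[i][j] is how much more important item i is than item j on the
    usual reciprocal scale, so matrix[j][i] should equal 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for the three areas named in the abstract
# (data, algorithm, user); the numbers are illustrative only.
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)
for area, weight in zip(["data", "algorithm", "user"], weights):
    print(f"{area}: {weight:.3f}")
```

The same calculation would be repeated per respondent group when priorities are compared across stakeholders, as the abstract describes.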

Characteristics of Flue Gas Using Direct Combustion of VOC and Ammonia (휘발성 유기 화합물 및 암모니아 직접 연소를 통한 배기가스 특성)

  • Kim, JongSu;Choi, SeukCheun;Jeong, SooHwa;Mock, ChinSung;Kim, DooBoem
    • Clean Technology / v.28 no.2 / pp.131-137 / 2022
  • The semiconductor process currently emits various by-products and unused gases. Emissions containing pollutants are generally classified into categories such as organic, acid, alkali, thermal, and cabinet exhaust. They are discharged after treatment in an air pollution prevention facility suitable for each exhaust type. The main components of organic exhaust are volatile organic compounds (VOCs), a generic term for oxygen-containing hydrocarbons, sulfur-containing hydrocarbons, and volatile hydrocarbons, while the main components of alkali exhaust include ammonia and tetramethylammonium hydroxide. The purpose of this study was to determine the combustion characteristics and analyze the NOX reduction rate under direct combustion at a maintained temperature, in order to process organic and alkaline exhaust gases simultaneously. Acetone, isopropyl alcohol (IPA), and propylene glycol methyl ether acetate (PGMEA) were used as VOCs, and ammonia was used as the alkali exhaust material. Independent and VOC-ammonia mixture combustion tests were conducted for each material. The combustion tests for the VOCs confirmed that complete combustion occurred at an equivalence ratio of 1.4. In the ammonia combustion test, the NOX concentration decreased at a lower equivalence ratio. In the co-combustion of VOC and ammonia, NO was dominant in the NOX emission while NO2 was detected at approximately 10 ppm. Overall, the concentration of nitrogen oxide decreased due to the activation of the oxidation reaction as the reaction temperature increased. On the other hand, the concentration of carbon dioxide increased. Flameless combustion with an electric heat source achieved successful combustion of VOC and ammonia. This technology is expected to have advantages in cost and compactness compared to existing organic and alkaline treatment systems applied separately.

Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning (머신러닝을 활용한 수도권 약수터 수질 예측 모델 개발)

  • Yeong-Woo Lim;Ji-Yeon Eom;Kee-Young Kwahk
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.307-325 / 2023
  • Due to the prolonged COVID-19 pandemic, the number of people who, tired of staying indoors, visit nearby mountains and national parks to relieve depression and lethargy has surged. Mineral springs are where thousands of these visitors stop to catch their breath and rest. In the metropolitan area there are about 600 mineral springs, found not only in mountains and national parks but also occasionally in neighborhood parks and along trails. However, because water quality tests are irregular and manual, people drink mineral water without knowing the test results in real time. Therefore, in this study, we intend to develop a model that can predict the quality of the spring water in real time by exploring the factors affecting the quality of the spring water and collecting data scattered in various places. After limiting the regions to Seoul and Gyeonggi-do due to the limitations of data collection, we obtained data on water quality tests from 2015 to 2020 for about 300 mineral springs in 18 cities where data management is well performed. A total of 10 factors were finally selected after two rounds of review among various factors that are considered to affect the suitability of the mineral spring water quality. Using AutoML, an automated machine learning technology that has recently been attracting attention, we derived the top 5 models based on prediction performance among about 20 machine learning methods. Among them, the CatBoost model had the highest performance, with a classification accuracy of 75.26%. In addition, when the absolute influence of each variable on the prediction was examined through the SHAP method, the most important factor was whether the spring had been judged nonconforming in the previous water quality test. 
It was confirmed that the temperature on the day of the inspection and the altitude of the mineral spring had an influence on whether the water quality was unsuitable.

Modeling the Effects of Forest Management Scenarios on Aboveground Biomass and Wood Production: A Study in Mt. Gariwang, South Korea (산림경영활동에 따른 수종별 지상부생물량 및 목재생산량 변화 모델링: 가리왕산 모델숲을 대상으로)

  • Wonhee Cho;Wontaek Lim;Won Il Choi;Hee Moon Yang;Dongwook W. Ko
    • Journal of Korean Society of Forest Science / v.112 no.2 / pp.173-187 / 2023
  • The forest protection policies implemented in South Korea have resulted in the significant accumulation of forest. Moreover, the associated public interest has also been closely evaluated. As forests mature, there arises a need for forest management (FM) practices, such as thinning and harvesting. It is therefore essential to perform a scientific analysis of the long-term effects of FM. In this study, conducted in Mt. Gariwang, the effects of FM on forest succession and wood production (WP) were evaluated based on changes in aboveground biomass (AGB) using the LANDIS-II model. The FM consists of three scenarios (Selection, Shelterwood, and Two-stories), characterized based on the harvest intensity, frequency, and period. The model was applied to changes in the forest over 200 years. All scenarios show that the total AGB decreased immediately after thinning and harvesting. However, AGB recovery time differed among scenarios, with recovery to the preharvest level occurring from 15 to 50 years after harvest; further, after 200 years, harvested forests had a greater total AGB than forests without FM. In particular, the change in AGB of each species differed depending on its shade tolerance. The AGB of currently dominant shade-intolerant and mid-tolerant species decreased dramatically after harvesting. However, shade-tolerant species, dominant in the understory, continued to grow but were not harvested due to their small size. The cumulative WP for each scenario was estimated at 545.6, 141.6, and 299.9 tons/ha in Selection, Shelterwood, and Two-stories, respectively. The composition of WP differed according to harvest intensity and period. Most WP originated from shade-intolerant and mid-tolerant species in the early period. Later, most WP was from shade-tolerant species, which became dominant. The modeling approach used in this study is capable of analyzing the long-term effects of FM on changes in forests and WP. 
This study can contribute to decision making to guide FM methods for a variety of purposes, including WP and controlling forest composition and structure.

Sensitivity analysis of the FAO Penman-Monteith reference evapotranspiration model (FAO Penman-Monteith 기준증발산식 민감도 분석)

  • Rim, Chang-Soo
    • Journal of Korea Water Resources Association / v.56 no.4 / pp.285-299 / 2023
  • Estimating evapotranspiration is very important for effective water resources management, and the FAO Penman-Monteith (FAO P-M) model has been applied to reference evapotranspiration estimation by many researchers. However, because the FAO P-M model requires various input data, it is necessary to understand the effect of each input on the model. Therefore, in this study, for 56 study stations located in South Korea, the effects of 8 meteorological factors (maximum and minimum temperature, wind speed, relative humidity, solar radiation, vapor pressure deficit, net radiation, ground heat flux), the energy and aerodynamic terms of the FAO P-M model, and elevation on FAO P-M reference evapotranspiration (RET) estimation were analyzed. A relative sensitivity analysis was performed to determine how a 10% increment in each independent variable affects reference evapotranspiration when the other independent variables are held unchanged. Furthermore, to select 5 representative stations and perform a monthly relative sensitivity analysis for them, the 56 study stations were classified into 5 clusters using cluster analysis. The results showed that net radiation was the most sensitive of the 8 meteorological factors across the 56 study stations. The next most sensitive factors were, in order, relative humidity, solar radiation, maximum temperature, vapor pressure deficit, wind speed, and minimum temperature. Ground heat flux was the least sensitive factor. As for the ground surface condition, elevation showed very low positive relative sensitivity. The relative sensitivities of the energy and aerodynamic terms of the FAO P-M model were 0.707 and 0.293, respectively, indicating that the energy term contributes more than the aerodynamic term to reference evapotranspiration. 
The monthly relative sensitivities of the meteorological factors showed seasonal effects, and the relative sensitivity of elevation showed different patterns among the study stations. Therefore, when applying the FAO P-M model, the seasonal and regional differences in the sensitivity of each input variable should be considered.
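
The relative sensitivity calculation described in this abstract (a 10% increment in one input with the others held fixed) can be sketched generically. The two-term function below is a toy stand-in and is not the FAO P-M equation; `relative_sensitivity`, `toy_ret`, and the input values are hypothetical.

```python
def relative_sensitivity(model, inputs, name, increment=0.10):
    """Relative change in the model output per relative change in one input.

    Only the input called `name` is increased by `increment` (10% by
    default); every other input keeps its baseline value.
    """
    base = model(**inputs)
    perturbed = dict(inputs)
    perturbed[name] = inputs[name] * (1.0 + increment)
    return ((model(**perturbed) - base) / base) / increment


def toy_ret(energy, aerodynamic):
    """Toy additive model used only for illustration (NOT FAO P-M)."""
    return energy + aerodynamic


# Hypothetical baseline values in mm/day.
baseline = {"energy": 2.8, "aerodynamic": 1.2}
s_energy = relative_sensitivity(toy_ret, baseline, "energy")
s_aero = relative_sensitivity(toy_ret, baseline, "aerodynamic")
print(s_energy, s_aero)
```

For an additive model like this toy one, each term's relative sensitivity equals its share of the total, so the two values sum to 1. The 0.707 and 0.293 reported in the abstract also sum to 1, which is consistent with that reading.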

Effects of Impact of Climate Change on Livestock Productivity - For bullocks, dairy, pigs, laying hens, and broilers - (기후변화가 축산 생산성에 미치는 영향 -거세우, 낙농, 양돈, 산란계, 육계를 대상으로-)

  • Lee, H.K.;Park, H.M.;Shin, Y.K.
    • Journal of Practical Agriculture & Fisheries Research / v.20 no.1 / pp.107-123 / 2018
  • The global impact of climate change on agriculture is increasing. The purpose of this study was to investigate the effect of climate change on livestock productivity. The climate change variables with the greatest influence were identified through previous studies and expert surveys. We also used actual productivity data from livestock farmers to investigate the relationship with climate change. To evaluate climate-related changes in livestock productivity, nationally representative data (for bullocks, dairy, pigs, laying hens, and broilers) were surveyed in Korea. To select and classify evaluation indexes, we selected climate change variables based on prior studies and studied the weighting of those variables. Experts in the livestock sector from industry, academia, and farms were surveyed by questionnaire on indicators of vulnerability to climate change, and the selected indicators were then weighted using the analytic hierarchy process (AHP). To verify the validity of the evaluation index, it was examined using domestic climate data (temperature, precipitation, humidity, etc.), and correlation and regression analyses were performed. The empirical relationship between climate change and livestock productivity was examined through this study. As a result, using data with high statistical reliability, we found that there are significant variables.