Search | Korea Science

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
- Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each selected sentence was read and labeled manually. Then, based on the training set, a Support Vector Machine model was used to predict the tone of the sentences in the test set. Finally, the percentages of positive, neutral, and negative sentences in each prospectus were calculated from the model's classifications. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables from technical analysis together with the prospectus tone variables show higher accuracy than models that use only quantitative variables. More specifically, prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the models tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify that the difference between the models is statistically significant. The model using only quantitative variables was compared with the model using both the quantitative variables and the prospectus tone variables, and the improvement in predictive performance was confirmed to be significant at the 1% level.
https://doi.org/10.13088/jiis.2022.28.2.237
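
As a rough illustration of the TF-IDF weighting mentioned in the abstract above, the sketch below computes TF-IDF vectors for a few tokenized sentences. The example sentences, the whitespace tokenizer, and the smoothed IDF variant are illustrative assumptions and not the authors' pipeline; the study then passed such features to a Support Vector Machine, which is omitted here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. This is one common variant and an
    assumption here, not necessarily the formula used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

# Hypothetical, pre-tokenized "risk factor" sentences.
sentences = [
    "revenue may decline sharply".split(),
    "revenue growth is stable".split(),
    "litigation risk may increase".split(),
]
weights = tf_idf(sentences)
```

Terms that occur in fewer sentences (such as "decline") receive a larger weight than terms shared across sentences (such as "revenue"), which is what lets a downstream classifier focus on distinctive wording.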

A Study of the Abalone Outlook Model Using by Partial Equilibrium Model Approach Based on DEEM System (부분균형모형을 이용한 전복 수급전망모형 구축에 관한 연구)

Han, Suk-Ho;Jang, Hee-Soo;Heo, Su-Jin;Lee, Nam-Su
- The Journal of Fisheries Business Administration / v.51 no.2 / pp.51-69 / 2020
The purpose of this study is to construct an outlook model consistent with the monthly "Fisheries Outlook" published by the Fisheries Outlook Center of the Korea Maritime Institute (KMI). The model is a partial equilibrium model limited to abalone, built as a dynamic ecological equation model (DEEM) system that takes biological breeding and shipping time into account. The results are significant in that they can serve as basic data for developing models of other items in the future. Because of the limitations of monthly data, a recursive model structure was used so that the market equilibrium price is calculated directly from an inverse demand function. The model takes the form of a structural equation model that can explain economic causality, rather than a conventional time series model. The research results and implications are as follows. In the estimation of the amount of young shells planted, the coefficient on the previous year's planting amount was estimated at 0.82, indicating no significant difference between this year's and last year's planting. This also reflects aquaculture licensing, the limited aquaculture area (edge style), and the long rearing period after planting. For the economic factor, the coefficient on the previous year's price was estimated at 0.47. For the breeding quantity, the longer the breeding period, the larger the estimated coefficient on the breeding quantity in the previous period. The analysis also showed that the impact of shipments on the breeding volume increased. For shipments, the coefficient on the production price was estimated to be inelastic, and the estimated coefficient decreased as the rearing period increased. 
This result indicates that the expected price, an economic factor variable, had less influence on the intention to ship. In addition, the elasticity of the breeding quantity was estimated to be more inelastic as the breeding period increased. This is also correlated with the relative size of the coefficient on the expected price. The abalone supply and demand forecast model developed in this study is significant in that, by using the ecological equation modeling system and the economic causal model, it reduces the prediction error relative to the existing model. However, there are limitations in establishing a system of simultaneous equations that can link production and consumption across industries and items. This is left for future research.
https://doi.org/10.12939/FBA.2020.51.2.051

RBM-based distributed representation of language (RBM을 이용한 언어의 분산 표상화)

You, Heejo;Nam, Kichun;Nam, Hosung
- Korean Journal of Cognitive Science / v.28 no.2 / pp.111-131 / 2017
The connectionist model is one approach to studying language processing from a computational perspective. In connectionist modeling, building the representation is just as important as designing the structure of the model, because it determines the level of learning and performance the model can reach. Connectionist models have used two kinds of representation: localist and distributed. However, the localist representations used in previous studies have the limitation that output-layer units with rarely active target values become inactive, and earlier distributed representations have the limitation that the results are difficult to verify because the represented information is opaque. These have been limitations of connectionist modeling research as a whole. In this paper, we address these limitations by presenting a new method that derives a distributed representation from a localist representation using the abstraction of information that is characteristic of the restricted Boltzmann machine (RBM). Our proposed method effectively solves the problems of conventional representations by compressing information into a distributed representation and inversely transforming the distributed representation back into a localist representation.
https://doi.org/10.19066/cogsci.2017.28.2.002

DEVELOPMENT OF AN AMPHIBIOUS ROBOT FOR VISUAL INSPECTION OF APR1400 NPP IRWST STRAINER ASSEMBLY

Jang, You Hyun;Kim, Jong Seog
- Nuclear Engineering and Technology / v.46 no.3 / pp.439-446 / 2014
An amphibious inspection robot system (hereafter AIROS) is being developed to visually inspect the in-containment refueling storage water tank (hereafter IRWST) strainer in APR1400 instead of a human diver. Four IRWST strainers are located in the IRWST, which is filled with boric acid water. Each strainer has 108 sub-assembly strainer fin modules that should be inspected with the VT-3 method according to Reg. guide 1.82 and the operation manual. AIROS has 6 thrusters for submarine voyage and 4 legs for walking on the top of the strainer. An inverse kinematic algorithm was implemented in the robot controller for exact walking on the top of the IRWST strainer. The IRWST strainer has several top cross braces that are extruded on the top of the strainer, which can be obstacles of walking on the strainer, to maintain the frame of the strainer. Therefore, a robot leg should arrive at the position beside the top cross brace. For this reason, we used an image processing technique to find the top cross brace in the sole camera image. The sole camera image is processed to find the existence of the top cross brace using the cross edge detection algorithm in real time. A 5-DOF robot arm that has multiple camera modules for simultaneous inspection of both sides can penetrate narrow gaps. For intuitive presentation of inspection results and for management of inspection data, inspection images are stored in the control PC with camera angles and positions to synthesize and merge the images. The synthesized images are then mapped in a 3D CAD model of the IRWST strainer with the location information. An IRWST strainer mock-up was fabricated to teach the robot arm scanning and gaiting. It is important to arrive at the designated position for inserting the robot arm into all of the gaps. Exact position control without anchor under the water is not easy. Therefore, we designed the multi leg robot for the role of anchoring and positioning. 
Unlike traditional robots for underwater facility inspection, the quadruped design equipped with sole cameras is a new approach to exact and stable position control on the IRWST strainer. The developed robot will be used in practice to enhance the efficiency and reliability of nuclear power plant component inspections.
https://doi.org/10.5516/NET.09.2013.085
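
The abstract above mentions an inverse kinematic algorithm for foot placement but does not describe the leg geometry. As a minimal sketch of the idea, the code below solves inverse kinematics for a hypothetical planar two-link leg; the link lengths, the planar simplification, and the function names are assumptions for illustration and are not taken from the AIROS design.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Joint angles (hip, knee) in radians that place the foot at (x, y).

    Returns the solution with a non-negative knee angle and raises
    ValueError when the target lies outside the reachable workspace.
    """
    d2 = x * x + y * y
    cos_knee = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_knee <= 1.0:
        raise ValueError("target out of reach")
    knee = math.acos(cos_knee)
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee),
                                        l1 + l2 * math.cos(knee))
    return hip, knee

def forward(hip, knee, l1, l2):
    """Forward kinematics, used here only to check the IK solution."""
    x = l1 * math.cos(hip) + l2 * math.cos(hip + knee)
    y = l1 * math.sin(hip) + l2 * math.sin(hip + knee)
    return x, y

# Hypothetical link lengths (m) and a target beside an obstacle.
hip, knee = two_link_ik(0.30, -0.20, 0.25, 0.25)
foot_x, foot_y = forward(hip, knee, 0.25, 0.25)
```

Feeding the solved angles back through the forward kinematics reproduces the requested foot position, which is a simple sanity check for any IK routine.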

Dynamic Instability and Multi-step Taylor Series Analysis for Space Truss System under Step Excitation (스텝 하중을 받는 공간 트러스 시스템의 멀티스텝 테일러 급수 해석과 동적 불안정)

Lee, Seung-Jae;Shon, Su-Deok
- Journal of Korean Society of Steel Construction / v.24 no.3 / pp.289-299 / 2012
The goal of this paper is to apply the multi-step Taylor method to a space truss, a non-linear discrete dynamic system, and to analyze the non-linear dynamic response and unstable behavior of the structure. An accurate solution based on an analytical approach is needed to deal with the inverse problem, or the dynamic instability of a space truss, because the governing equation has geometrical non-linearity. Therefore, the governing equations of motion of the space truss were formulated considering this non-linearity, so that an accurate analytical solution could be obtained using the Taylor method. To verify the accuracy of the applied method, an SDOF model was adopted, and the analysis using the Taylor method was compared with the result of the 4th order Runge-Kutta method. Moreover, the dynamic instability and buckling characteristics of the adopted model under step excitation were investigated. The results of the two methods agreed well, and the investigation shows that the dynamic response and the attractors in the phase space can also delineate dynamic snapping under step excitation, and that damping affects the displacement of the truss. The analysis shows that dynamic buckling occurs at approximately 77% and 83% of the static buckling load in the undamped and damped systems, respectively.
https://doi.org/10.7781/kjoss.2012.24.3.289
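
To make the comparison between a multi-step Taylor method and 4th order Runge-Kutta concrete, the sketch below applies both to a linear undamped SDOF oscillator, x'' = -omega^2 x. This linear stand-in is an assumption chosen because its Taylor coefficients follow a simple recurrence and its exact solution is known; the paper itself treats a geometrically non-linear truss, whose governing equations are not reproduced here.

```python
import math

def taylor_step(x, v, omega, h, order=10):
    """Advance x'' = -omega**2 * x by one step h with a truncated Taylor series."""
    # a[k] holds the k-th Taylor coefficient x^(k)(t) / k!.
    a = [x, v]
    for k in range(order - 1):
        a.append(-omega**2 * a[k] / ((k + 1) * (k + 2)))
    x_new = sum(c * h**k for k, c in enumerate(a))
    v_new = sum(k * c * h**(k - 1) for k, c in enumerate(a) if k > 0)
    return x_new, v_new

def rk4_step(x, v, omega, h):
    """Classical 4th order Runge-Kutta step for the same oscillator."""
    def f(x, v):
        return v, -omega**2 * x
    k1x, k1v = f(x, v)
    k2x, k2v = f(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
    k3x, k3v = f(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
    k4x, k4v = f(x + h * k3x, v + h * k3v)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6)

def integrate(step, x0, v0, omega, h, n_steps):
    """Apply the given one-step method n_steps times."""
    x, v = x0, v0
    for _ in range(n_steps):
        x, v = step(x, v, omega, h)
    return x, v

# Illustrative parameters: unit initial displacement, zero initial velocity.
omega, h, n_steps = 2.0, 0.05, 200
exact = math.cos(omega * h * n_steps)
x_taylor, _ = integrate(taylor_step, 1.0, 0.0, omega, h, n_steps)
x_rk4, _ = integrate(rk4_step, 1.0, 0.0, omega, h, n_steps)
```

With the same step size, the higher-order Taylor expansion tracks the exact cosine solution more closely than RK4, which illustrates why a Taylor approach is attractive when an accurate reference solution is needed.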

Chemometric Approach to Fatty Acid Profiles in Soybean Cultivars by Principal Component Analysis (PCA)

Shin, Eui-Cheol;Hwang, Chung-Eun;Lee, Byong-Won;Kim, Hyun-Tae;Ko, Jong-Min;Baek, In-Youl;Lee, Yang-Bong;Choi, Jin-Sang;Cho, Eun-Ju;Seo, Weon-Taek;Cho, Kye-Man
- Preventive Nutrition and Food Science / v.17 no.3 / pp.184-191 / 2012
The purpose of this study was to investigate the fatty acid profiles in 18 soybean cultivars grown in Korea. A total of eleven fatty acids were identified in the sample set, which was comprised of myristic (C14:0), palmitic (C16:0), palmitoleic (C16:1, ${\omega}7$), stearic (C18:0), oleic (C18:1, ${\omega}9$), linoleic (C18:2, ${\omega}6$), linolenic (C18:3, ${\omega}3$), arachidic (C20:0), gondoic (C20:1, ${\omega}9$), behenic (C22:0), and lignoceric (C24:0) acids by gas-liquid chromatography with flame ionization detector (GC-FID). Based on their color, yellow-, black-, brown-, and green-colored cultivars were denoted. Correlation coefficients (r) between the nine major fatty acids identified (two trace fatty acids, myristic and palmitoleic, were not included in the study) were generated and revealed an inverse association between oleic and linoleic acids (r=-0.94, p<0.05), while stearic acid was positively correlated to arachidic acid (r=0.72, p<0.05). Principal component analysis (PCA) of the fatty acid data yielded four significant principal components (PCs; i.e., eigenvalues>1), which together account for 81.49% of the total variance in the data set; with PC1 contributing 28.16% of the total. Eigen analysis of the correlation matrix loadings of the four significant PCs revealed that PC1 was mainly contributed to by oleic, linoleic, and gondoic acids, PC2 by stearic, linolenic and arachidic acids, PC3 by behenic and lignoceric acids, and PC4 by palmitic acid. The score plots generated between PC1-PC2 and PC3-PC4 segregated soybean cultivars based on fatty acid composition.
https://doi.org/10.3746/pnf.2012.17.3.184
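
The eigenvalue-greater-than-one rule and the variance shares quoted in the abstract above can be related with a little arithmetic. The sketch below assumes the PCA was run on the correlation matrix of the nine major fatty acids, so that the total variance equals 9; the two-variable case is only a simplified illustration using the reported oleic-linoleic correlation, not the paper's nine-variable analysis, and the function names are hypothetical.

```python
def corr_eigenvalues_2x2(r):
    """Eigenvalues of the correlation matrix [[1, r], [r, 1]], largest first."""
    return sorted((1 + r, 1 - r), reverse=True)

def kaiser_retained(eigenvalues):
    """Keep components whose eigenvalue exceeds 1 (the criterion in the abstract)."""
    return [ev for ev in eigenvalues if ev > 1.0]

def implied_eigenvalue(share, n_vars):
    """Eigenvalue implied by a variance share when total variance equals n_vars."""
    return share * n_vars

# Two strongly anti-correlated variables collapse onto one component.
eigs = corr_eigenvalues_2x2(-0.94)
retained = kaiser_retained(eigs)

# PC1 explains 28.16% of the variance across nine standardized variables.
pc1_eigenvalue = implied_eigenvalue(0.2816, 9)
```

Under these assumptions, PC1 corresponds to an eigenvalue of about 2.53, and a pair of variables correlated at r = -0.94 would by itself produce one component with eigenvalue 1.94 and one with 0.06.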

IN-VIVO DOSE RECONSTRUCTION USING A TRANSMISSION FACTOR AND AN EFFECTIVE FIELD CONCEPT (팬텀투과계수와 유효조사면 개념을 이용한 종양선량 확인에 관한 연구)

Kim, You-Hyun;Yeo, In-Hwan;Kwon, Soo-Il
- Journal of radiological science and technology / v.25 no.1 / pp.63-71 / 2002
The aim of this study is to develop a simple and fast method that computes in-vivo doses from transmission doses measured during patient treatment using an ionization chamber. The energy fluence and the dose that reach the chamber positioned behind the patient are modified by three factors: patient attenuation, inverse square attenuation, and scattering. We adopted a straightforward empirical approach using a phantom transmission factor (PTF), which accounts for the contribution from all three factors. It was done as follows. First, the phantom transmission factor was measured as a simple ratio of the chamber readings with and without a homogeneous phantom in the radiation beam, for various field sizes ($r_p$), phantom-to-chamber distances ($d_g$), and phantom thicknesses ($T_p$). Second, we applied the concept of an effective field to cases with an inhomogeneous phantom (patients) and irregular fields. The effective field size is calculated by finding the field size that produces the same PTF value as the irregular field and/or inhomogeneous phantom. The hypothesis is that the presence of inhomogeneity and irregular fields can be accommodated to a certain extent by altering the field size. Third, the center dose at the prescription depth can be computed using the new $TMR(r_{p,eff})$ and $S_p(r_{p,eff})$ from the effective field size. Once $TMR(d,\,r_{p,eff})$ and $S_p(r_{p,eff})$ are acquired, the tumor dose is as follows. $$D_{center}=\frac{D_t}{PTF(d_g,\,T_p)}\times\left(\frac{SCD}{SAD}\right)^2\times BSF(r_o)\times S_p(r_{p,eff})\times TMR(d,\,r_{p,eff})$$ To verify the accuracy of this method, we checked four cases: regular or irregular field sizes, inclusion of inhomogeneous material, deliberately introduced errors, and a clinical situation. 
The errors were within 2.3% for regular field sizes, 3.0% for irregular field sizes, 2.4% when inhomogeneous material was included in the phantom, 3.8% for 6 MV and 4.7% for 10 MV when an error was introduced deliberately, and 1.8% for the measurement of a patient in the clinic. This method is considered suitable for quality control of the dose at the time of radiation therapy, because it is non-invasive, makes it possible to measure the dose whenever a patient is treated, and eliminates the problems of entrance or exit dose measurement.
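
The tumor dose formula in the abstract above is a chain of multiplicative corrections, which can be written as a short function. The sketch below is a direct transcription under the assumption that each factor (PTF, BSF, S_p, TMR) has already been looked up for the relevant geometry; the function name, argument names, and the sample values are hypothetical and carry no clinical meaning.

```python
def center_dose(d_t, ptf, scd, sad, bsf, s_p, tmr):
    """Center dose from a measured transmission dose.

    d_t : transmission dose measured behind the patient
    ptf : phantom transmission factor PTF(d_g, T_p)
    scd : source-to-chamber distance
    sad : source-to-axis distance
    bsf : BSF(r_o) as it appears in the formula
    s_p : S_p at the effective field size
    tmr : TMR at the prescription depth and effective field size
    """
    if ptf <= 0 or sad <= 0:
        raise ValueError("ptf and sad must be positive")
    return d_t / ptf * (scd / sad) ** 2 * bsf * s_p * tmr

# Hypothetical values, for illustration only.
dose = center_dose(d_t=0.35, ptf=0.28, scd=150.0, sad=100.0,
                   bsf=1.03, s_p=1.01, tmr=0.85)
```

Keeping each factor as a separate argument mirrors the empirical structure of the method: the effective field size changes only `s_p` and `tmr`, while the geometry enters through `ptf` and the inverse square term.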

The Effect of Lactose and Calcium on the Acute Lead Poisoning in Rats (白鼠에서 乳糖과 칼슘이 急性 납중독에 미치는 영향)

Kim, Jong-Woo;Lee, Yong-Wook
- Journal of Environmental Health Sciences / v.14 no.2 / pp.73-87 / 1988
This study was performed to investigate the effect of lactose at 4 different concentrations on the protective effect of calcium against acute lead poisoning in rats after 4 weeks of treatment. In this animal experiment, 70 male albino weanling rats (50-70 g body weight) of the Sprague-Dawley strain were used. Lead was dissolved in distilled water and intubated at a dose of 400 mg lead (as acetate)/kg of body weight/day. Calcium and lactose were administered ad libitum in drinking water, as a solution of 0.7% calcium gluconate mixed with 40, 80, 160, or 320 mM lactose. The results are summarized as follows: 1. The rate of body weight gain in all treated groups was lower than that in the control group during the 4 weeks of treatment. The slowdown in body weight gain was most significant in the group treated with lead only (p < 0.05). 2. The relative spleen weight in the lead-only group was significantly lower than that in the lead + calcium and lead + calcium + 80 mM lactose groups (p < 0.05). 3. The values of RBC, WBC, Hb, and Hct showed a decreasing tendency in the group treated with lead only (p < 0.05); however, no significant decrease was observed in the group treated with lead + calcium. On the other hand, the protective effect of calcium deteriorated in the group treated with lead + calcium + lactose. 4. The activity of $\delta$-aminolevulinic acid dehydratase ($\delta$-ALAD) showed the same tendency as in No. 2. 5. The lead concentration in the blood (PbB) showed an increasing tendency, and the relationship among the groups was also identical to that in No. 2. 6. Statistically, the activity of $\delta$-ALAD and the lead concentration in the blood were inversely related (r=-0.7301). The relationship was described by the logarithmic equation ln Y = 5.5357 - 0.0251X (X: PbB, Y: $\delta$-ALAD activity). 7. In the histopathological findings of the kidney, the protective effect of calcium was observed. However, the protective effect of calcium was restricted in the group treated with lead + calcium + lactose. In conclusion, the severity of acute ingested lead poisoning was clearly reduced by calcium; however, the protective effect of calcium deteriorated in proportion to the concentration of lactose administered. On the other hand, the deterioration was slightly more restrained in the group treated with the physiological concentration of 80 mM lactose than in the groups treated with other lactose concentrations.
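
The fitted relation between blood lead and enzyme activity reported in item 6 of the abstract above can be evaluated directly. The sketch below assumes the logarithm is the natural logarithm and that X and Y are in the units used by the authors, which the abstract does not state; the function name is hypothetical.

```python
import math

def alad_activity(pbb):
    """Predicted delta-ALAD activity from ln Y = 5.5357 - 0.0251 * X (X: PbB)."""
    return math.exp(5.5357 - 0.0251 * pbb)

# Multiplicative change in predicted activity per one-unit rise in PbB.
per_unit_ratio = alad_activity(1.0) / alad_activity(0.0)
```

Because the model is linear in ln Y, each additional unit of PbB scales the predicted activity by the same factor, exp(-0.0251), which is a decrease of roughly 2.5% per unit.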

A Travel Time Prediction Model under Incidents (돌발상황하의 교통망 통행시간 예측모형)

Jang, Won-Jae
- Journal of Korean Society of Transportation / v.29 no.1 / pp.71-79 / 2011
Traditionally, a dynamic network model is considered a tool for solving real-time traffic problems. One useful and practical way of using such a model is to produce and disseminate forecast travel time information so that travelers can switch from congested routes to less congested or uncongested ones, which can enhance the performance of the network. This approach seems promising when traffic congestion is severe, especially when sudden incidents happen. One consideration in implementing this method is that travel time information may itself affect future traffic conditions, creating undesirable side effects such as the over-reaction problem. Furthermore, incorrect travel time forecasts can make the information unreliable. In this paper, a network-wide travel time prediction model under incidents is developed. The model assumes that all drivers have access to detailed traffic information through personalized in-vehicle devices such as car navigation systems. Drivers are assumed to make their own travel choices based on the travel time information provided. A route-based stochastic variational inequality is formulated, which is used as the basic model for the travel time prediction. A diversion function is introduced to account for motorists' willingness to divert. An inverse function of the diversion curve is derived to develop a variational inequality formulation for the travel time prediction model. Computational results illustrate the characteristics of the proposed model.
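
The abstract above does not give the functional form of the diversion curve. Purely as an illustration of what a diversion function and its inverse look like, the sketch below assumes a logistic curve in the travel time saving; the logistic form, the sensitivity parameter `theta`, and the function names are all assumptions and should not be read as the model in the paper.

```python
import math

def diversion_share(time_saving, theta=0.2):
    """Share of motorists who divert, given the time saved on the alternative route."""
    return 1.0 / (1.0 + math.exp(-theta * time_saving))

def required_time_saving(share, theta=0.2):
    """Inverse of diversion_share: the time saving at which the given share diverts."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    return math.log(share / (1.0 - share)) / theta

# Round trip: the inverse recovers the time saving that produced the share.
share = diversion_share(7.5)
recovered = required_time_saving(share)
```

An explicit inverse of this kind maps an observed or target diversion rate back to a travel time difference, which is the role the inverse diversion function plays in setting up an equilibrium-style formulation.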

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
- The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by connecting the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the responses are then categorized by cut-off points according to the value of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, to model ordinal responses and predict outcomes more flexibly. We illustrate the proposed methods with simulation studies comparing them with existing methods and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016, investigating the nonparametric relationship between smoking behavior and coffee intake.
https://doi.org/10.5351/KJAS.2020.33.1.025
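
The latent variable view of ordinal probit regression described in the abstract above can be sketched in a few lines: a standard normal error is added to a predictor, and cut-off points slice the latent scale into ordered categories. The code below covers only this link step with a fixed predictor value; the cut-off values and function name are hypothetical, and the Bayesian sampling and the BSAR nonparametric component are not shown.

```python
from statistics import NormalDist

def ordinal_probit_probs(eta, cutpoints):
    """P(Y = j) for j = 0..len(cutpoints), given predictor eta.

    Assumes a latent z = eta + e with e ~ N(0, 1), and Y = j when z
    falls between consecutive cut-off points.
    """
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cutpoints must be in increasing order")
    cdf = NormalDist().cdf
    # Cumulative probabilities P(z <= c) at each cut-off, padded with 0 and 1.
    bounds = [0.0] + [cdf(c - eta) for c in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

# Hypothetical cut-offs defining four ordered categories.
probs = ordinal_probit_probs(eta=0.4, cutpoints=[-1.0, 0.0, 1.2])
```

Raising `eta` shifts probability mass toward the higher categories while the cut-off points stay fixed, which is how covariates move the predicted response in this family of models.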
