• Title/Summary/Keyword: Statistical Regression Model

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.127-146 / 2022
  • Word embedding has recently shown excellent performance in various deep learning-based natural language processing tasks, and research on the advancement and application of word, sentence, and document embedding is being actively conducted. Among these topics, cross-language transfer, which enables semantic exchange between different languages, is growing alongside the development of embedding models. Academic interest in vector alignment is growing with the expectation that it can be applied to various embedding-based analyses. In particular, vector alignment is expected to be applied to mapping between specialized domains and generalized domains. In other words, it is expected that it will be possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of a pre-trained language model learned from a huge volume of general-purpose documents, or to provide a clue for mapping vocabulary between mutually different specialized fields. However, the linear vector alignment that has mainly been studied in academia assumes statistical linearity and therefore tends to simplify the vector space. It essentially assumes that different types of vector spaces are geometrically similar, which causes inevitable distortion in the alignment process. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of data. The proposed methodology consists of sequential learning of a skip-connected autoencoder and a regression model to align the specialized word embedding expressed in each space to the general embedding space. Finally, through the inference of the two trained models, the specialized vocabulary can be aligned in the general space. To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of 'health care' among national R&D tasks performed from 2011 to 2020. The results confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to the existing linear vector alignment.
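
The comparison in this abstract is made in terms of cosine similarity between aligned specialized vectors and their counterparts in the general space. As a minimal sketch of that evaluation step only (not the authors' autoencoder or regression model), the following assumes a hypothetical linear map `W` as the baseline aligner and toy 2-dimensional vectors; every name and value here is illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def apply_linear_map(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_alignment_similarity(W, source_vectors, target_vectors):
    """Average cosine similarity between mapped source vectors and their targets."""
    sims = [
        cosine_similarity(apply_linear_map(W, s), t)
        for s, t in zip(source_vectors, target_vectors)
    ]
    return sum(sims) / len(sims)

# Toy data (hypothetical): the "specialized" space is the "general" space
# rotated by 90 degrees, so a rotation matrix aligns it perfectly.
W = [[0.0, -1.0], [1.0, 0.0]]
source = [[1.0, 0.0], [0.0, 2.0]]
target = [[0.0, 1.0], [-1.0, 0.0]]
score = mean_alignment_similarity(W, source, target)
```

A nonlinear aligner would replace `apply_linear_map` while the scoring function stays the same, which is what makes the two approaches directly comparable.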

The Analysis on the Relationship between Firms' Exposures to SNS and Stock Prices in Korea (기업의 SNS 노출과 주식 수익률간의 관계 분석)

  • Kim, Taehwan;Jung, Woo-Jin;Lee, Sang-Yong Tom
    • Asia pacific journal of information systems / v.24 no.2 / pp.233-253 / 2014
  • Can the stock market really be predicted? Stock market prediction has attracted much attention from many fields including business, economics, statistics, and mathematics. Early research on stock market prediction was based on random walk theory (RWT) and the efficient market hypothesis (EMH). According to the EMH, stock markets are largely driven by new information rather than present and past prices. Since new information is unpredictable, the stock market will follow a random walk. Despite these theories, Schumaker [2010] asserted that people keep trying to predict the stock market by using artificial intelligence, statistical estimates, and mathematical models. Mathematical approaches include Percolation Methods, Log-Periodic Oscillations and Wavelet Transforms to model future prices. Examples of artificial intelligence approaches that deal with optimization and machine learning are Genetic Algorithms, Support Vector Machines (SVM) and Neural Networks. Statistical approaches typically predict the future by using past stock market data. Recently, financial engineers have started to predict stock price movement patterns by using SNS data. SNS is a place where people's opinions and ideas flow freely and affect others' beliefs. Through word-of-mouth in SNS, people share product usage experiences, subjective feelings, and the accompanying sentiment or mood with others. An increasing number of empirical analyses of sentiment and mood are based on textual collections of public user-generated data on the web. Opinion mining is a domain of data mining that extracts public opinions expressed in SNS. There have been many studies on opinion mining from Web sources such as product reviews, forum posts and blogs. In relation to this literature, we try to understand the effects of firms' SNS exposures on stock prices in Korea. Similarly to Bollen et al. [2011], we empirically analyze the impact of SNS exposures on stock return rates. We use Social Metrics by Daum Soft, an SNS big data analysis company in Korea. Social Metrics provides trends and public opinions on Twitter and blogs by using natural language processing and analysis tools. It collects the sentences circulated on Twitter in real time, breaks these sentences down into word units, and then extracts keywords. In this study, we classify firms' exposures in SNS into two groups: positive and negative. To test the correlation and causal relationship between SNS exposures and stock price returns, we first collect 252 firms' stock prices and the KRX100 index in the Korea Stock Exchange (KRX) from May 25, 2012 to September 1, 2012. We also gather the public attitudes (positive, negative) about these firms from Social Metrics over the same period. We conduct regression analysis between stock prices and the number of SNS exposures. Having checked the correlation between the two variables, we perform a Granger causality test to determine the direction of causation between them. The result is that the total number of SNS exposures is positively related to stock market returns. The number of positive mentions also has a positive relationship with stock market returns. In contrast, the number of negative mentions has a negative relationship with stock market returns, but this relationship is not statistically significant. This means that the impact of positive mentions is statistically bigger than the impact of negative mentions. We also investigate whether the impacts are moderated by industry type and firm size. We find that the impacts of SNS exposures are bigger for IT firms than for non-IT firms, and bigger for small firms than for large firms. The results of the Granger causality test show that changes in stock price returns are caused by SNS exposures, while causation in the opposite direction is not significant. Therefore, the correlation between SNS exposures and stock prices reflects unidirectional causality. The more a firm is exposed in SNS, the more likely its stock price is to increase, while stock price changes may not cause more SNS mentions.
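
The Granger causality test mentioned in this abstract asks whether past values of one series improve the prediction of another beyond that series' own past. The sketch below is a simplified illustration of that idea with a single lag and an F-statistic comparing a restricted and an unrestricted regression. The lag length, the synthetic data, and all function names are assumptions; the abstract does not give the authors' specification, and a real analysis would also select lags and compute p-values.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[i][k] for i in range(k)]
    return sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2
        for row, yi in zip(X, y)
    )

def granger_f_lag1(cause, effect):
    """F-statistic for 'cause[t-1] helps predict effect[t]' given effect[t-1]."""
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    n = len(target)
    # One added regressor, three parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / (n - 3))

# Synthetic data (hypothetical): returns follow yesterday's exposure count.
rng = random.Random(0)
exposures = [rng.gauss(0.0, 1.0) for _ in range(100)]
returns = [0.0] + [0.8 * exposures[t - 1] + rng.gauss(0.0, 0.1) for t in range(1, 100)]

f_exposure_to_return = granger_f_lag1(exposures, returns)
f_return_to_exposure = granger_f_lag1(returns, exposures)
```

On this synthetic data the first statistic is large and the second is not, which mirrors the one-directional pattern the abstract reports.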

A Study on Relationships Between Environment, Organizational Structure, and Organizational Effectiveness of Public Health Centers in Korea (보건소의 환경, 조직구조와 조직유효성과의 관계)

  • Yun, Soon-Nyoung
    • Research in Community and Public Health Nursing / v.6 no.1 / pp.5-33 / 1995
  • The objectives of the study are twofold: one is to explore the relationship between environment, organizational structure, and organizational effectiveness of public health centers in Korea, and the other is to examine the validity of contingency theory for improving the organizational structure of public health care agencies, with special emphasis on public health nursing administration. Accordingly, the conceptual model of the study consisted of three concepts built up from contingency theory: environment, organizational structure, and organizational effectiveness. Data were collected from May 1 through June 30, 1990. From the total of 249 health centers in the country, 105 centers were sampled non-proportionally according to geopolitical distribution. Of these 105, 73 health centers responded to the mailed questionnaire. The health center was the unit of analysis, and various statistical techniques were used: reliability analysis (Cronbach's alpha) for 4 measurement tools; the Shapiro-Wilk statistic for normality tests of the measured scores of 6 variables; and ANOVA, Pearson correlation analysis, regression analysis, and canonical correlation analysis to test the relationships and differences between the variables. The results were as follows: 1. No significant differences between formalization, decision-making authority, and environmental complexity were found (F=1.383, P=.24; F=.801, P=.37). 2. Negative relationships between formalization and decision-making authority were found for both urban and rural health centers (r=-.470, P=.002; r=-.348, P=.46). 3. No significant relationship between formalization and job satisfaction was found for either urban or rural health centers (r=-.242, P=.132; r=-.060, P=.739). 4. A significant positive relationship between decision-making authority and job satisfaction was found in urban health centers (r=.504, P=.0009), but no such relationship was observed in rural health centers. The regression coefficient between them was statistically significant ($\beta=1.535$, P=.0002), and the accuracy of the regression line was accepted (W=.975, P=.420). 5. No significant relationships between formalization and family planning services, maternal health services, or tuberculosis control services were found for either urban or rural health centers. 6. Among the relationships between decision-making authority and family planning services, maternal health services, and tuberculosis control services, a significant positive relationship was found between decision-making authority and family planning services (r=.286, P=.73). 7. A significant difference was found in maternal health services by the type of health center (F=5.13, P=.026), but no difference was found in tuberculosis control services by the type of health center, formalization, or decision-making authority. 8. In urban health centers, significant positive relationships were found between family planning services and maternal health services, between family planning services and tuberculosis control services, and between maternal health services and tuberculosis control services (r=-.499, P=.001; r=.457, P=.004; r=.495, P=.002). In rural health centers, the relationships between family planning services and tuberculosis control services, and between maternal health services and tuberculosis control services, were statistically significant (r=.534, P=.002; r=.389, P=.027). No significant relationship was found between family planning and maternal health services. 9. A significant positive canonical correlation was found between the group of independent variables consisting of formalization and decision-making authority and the group of dependent variables consisting of family planning services, maternal health services, and tuberculosis control services (Rc=.455, P=.02). In urban health centers, no significant canonical correlation was found between them, but a significant canonical correlation was found in rural health centers (Rc=.578, P=.069). 10. The relationship between job satisfaction and health care productivity was not found to be significant. These results did not support the assumed relationship between environment and organizational structure in health centers. Therefore, the relationship between organizational effectiveness and the congruence between environment and organizational structure that contingency theory proposes could not be tested. However, decision-making authority was found to be an important variable of organizational structure affecting family planning services and job satisfaction in urban health centers. Thus, it was suggested that decentralized decision making among health professionals would be a valuable strategy for improving organizational effectiveness in public health centers. It is also recommended that further studies testing contingency theory use variability and uncertainty, instead of complexity, to define the environment of public health centers.
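
The reliability analysis named in this abstract, Cronbach's alpha, can be computed from item variances and the variance of respondents' total scores. The following is a minimal sketch using the standard formula; the function name and the toy scores are illustrative assumptions, not data from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per questionnaire item, aligned by respondent.
    Uses alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Toy scale (hypothetical): two items answered by four respondents.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```

Values near 1 indicate that the items vary together; the formula is undefined when every respondent has the same total score.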

A Study on the Factors Related to the Cognitive Function and Depression Among the Elderly (일부지역 노인들의 인지기능과 우울에 관련된 요인에 관한 연구)

  • Shin, Cheol-Ho;Kim, Soo-Young;Lee, Young-Soo;Cho, Young-Chae;Lee, Tae-Yong;Lee, Dong-Bae
    • Journal of Preventive Medicine and Public Health / v.29 no.2 s.53 / pp.199-214 / 1996
  • To investigate the factors affecting cognitive function and depression in people aged 65 or older, the authors surveyed subjects in Taejon and the nearby area. A total of 729 subjects were tested for cognitive function with the MMSE and for depression with the GDS. The main results were as follows. Among the subjects, the rate of normal cognitive function was 56.8%, the rate of mild impairment was 24.1%, and the rate of severe impairment was 19.1%. The cognitive function level was closely related to the depression score. Cognitive function was more impaired with increasing age. Sex differences also existed in the cognitive function level and the depression score. After adjusting for the effect of age, variables such as sex, marital status, education level, past job, instrumental activities of daily living (instrumental ADL), regular physical exercise, frequency of going out of the house, chest discomfort, visual and auditory disturbance, and dizziness had a significant relationship with cognitive function impairment. Among these variables, instrumental ADL, age, visual disturbance, and sex showed statistical significance in the logistic regression model. In the multiple stepwise regression, the variables that had a significant relationship with the depression score were education level, frequency of going out of the house, current job and housework activity, regular physical exercise, instrumental ADL, self-rated health and nutritional status, dimness, visual disturbance, and chest pain. In conclusion, the main characteristics closely related to cognitive function and depressive symptoms in the subjects were physical function and self-rated health status.

A Study on Clinical Variables Contributing to Differentiation of Delirium and Non-Delirium Patients in the ICU (중환자실 섬망 환자와 비섬망 환자 구분에 기여하는 임상 지표에 관한 연구)

  • Ko, Chanyoung;Kim, Jae-Jin;Cho, Dongrae;Oh, Jooyoung;Park, Jin Young
    • Korean Journal of Psychosomatic Medicine / v.27 no.2 / pp.101-110 / 2019
  • Objectives: It is not clear which clinical variables are most closely associated with delirium in the Intensive Care Unit (ICU). By comparing clinical data of ICU delirium and non-delirium patients, we sought to identify variables that most effectively differentiate delirium from non-delirium. Methods: Medical records of 6,386 ICU patients were reviewed. Random Subset Feature Selection and Principal Component Analysis were utilized to select a set of clinical variables with the highest discriminatory capacity. Statistical analyses were employed to determine the separation capacity of two models: one using just the selected few clinical variables and the other using all clinical variables associated with delirium. Results: There was a significant difference between delirium and non-delirium individuals across 32 clinical variables. Richmond Agitation Sedation Scale (RASS), urinary catheterization, vascular catheterization, Hamilton Anxiety Rating Scale (HAM-A), blood urea nitrogen, and Acute Physiology and Chronic Health Evaluation II most effectively differentiated delirium from non-delirium. Multivariable logistic regression analysis showed that, with the exception of vascular catheterization, these clinical variables were independent risk factors associated with delirium. The separation capacity of the logistic regression model using just these 6 clinical variables was measured with the Receiver Operating Characteristic curve, with an Area Under the Curve (AUC) of 0.818. The same analyses were performed using all 32 clinical variables; the AUC was 0.881, denoting a very high separation capacity. Conclusions: The six aforementioned variables most effectively separate delirium from non-delirium. This highlights the importance of close monitoring of patients who received invasive medical procedures and were rated with very low RASS and HAM-A scores.
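
The AUC values reported in this abstract summarize how well a model's scores rank delirium cases above non-delirium cases. As a small illustration of the metric itself (not of the authors' models or data), the sketch below computes the AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as half. The labels and scores are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for the positive class (for example, delirium), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example (hypothetical scores from some classifier).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which 0.818 and 0.881 should be read.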

An Empirical Study on the Effect of CRM System on the Performance of Pharmaceutical Companies (고객관계관리 시스템의 수준이 BSC 관점에서의 기업성과에 미치는 영향 : 제약회사를 중심으로)

  • Kim, Hyun-Jung;Park, Jong-Woo
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.43-65 / 2010
  • Facing the complex environment of the past decade, many companies are adopting new strategic frameworks such as Customer Relationship Management (CRM) systems to achieve sustainable profitability and to overcome serious competition for survival. In many business areas, CRM systems have advanced a great deal through continuous correction of defects and overall integration. However, pharmaceutical companies in Korea have been slow to adopt them, since they still tend to hold fast to traditional sales and marketing based on the individual networks of sales representatives. Under these circumstances, this article empirically addresses the current status of CRM systems and their effects on the performance of pharmaceutical companies by applying the four perspectives of the BSC method: financial, customer, learning and growth, and internal process. For this purpose, a survey was conducted by e-mail and post among employers and employees working in pharmaceutical firms. Of the 140 cases collected, 113 were used for statistical analysis with the SPSS ver. 15 package. Reliability analysis, factor analysis, and regression were performed. The study revealed that, as expected, the CRM system had a significant effect on improving both the financial and non-financial performance of pharmaceutical companies. The proposed regression model fits well, and among the components, the CRM marketing information system showed a substantial impact on companies' outcomes in terms of profitability, growth, and investment. Useful analytical information from the CRM marketing information system appears to enable pharmaceutical firms to set up effective marketing and sales strategies, which eventually result in favorable financial performance by enhancing value for stakeholders, not to mention short-term profit and mid-term growth potential. The CRM system influenced not only the financial performance but also the non-financial outcomes of pharmaceutical companies. Further analysis of each component showed that the CRM marketing information system had a statistically significant effect on this performance, consistent with the result for financial outcomes. The CRM system is believed to provide companies with an efficient way of managing customers through valuable standardized business processes that respond promptly to specific customers' needs. It consequently induces customer satisfaction and retention, improving performance over the long term. That is, there is a virtuous circle of value creation that serves as the cornerstone of sustainable growth. However, the research failed to find evidence supporting the hypotheses that the CRM sales representative's records assessment system and the CRM customer analysis system favorably influence management performance. This result is considered to reflect a gap in the understanding of salespeople and respondents between their actual work duties and the far-sighted goals of a strategic analysis framework. Ordinary salespeople seem to devote themselves to short-term goals, such as meeting sales targets and receiving incentive bonuses, in a matter-of-fact way; as such, they tend to rely on personal networks and on sales and promotional expenses rather than on the CRM system. The findings suggest a link between the CRM information system and performance. The study empirically indicated that pharmaceutical companies had been implementing CRM systems as an effective strategic business framework for more balanced achievements, based on a grounded understanding of both the CRM system and integrated performance. It provides initial empirical evidence of a positive impact of a supportive CRM system on firm performance, especially in the pharmaceutical industry. It also brings out unmet needs for more practical system design, improved employee awareness, and increased system utilization in the field. On the basis of the insight from this exploratory study, confirmatory research with a more appropriate measurement tool and an increased sample size should be conducted.

A Spatial Statistical Approach to Migration Studies: Exploring the Spatial Heterogeneity in Place-Specific Distance Parameters (인구이동 연구에 대한 공간통계학적 접근: 장소특수적 거리 패러미터의 추출과 공간적 패턴 분석)

  • Lee, Sang-Il
    • Journal of the Korean association of regional geographers / v.7 no.3 / pp.107-120 / 2001
  • This study is concerned with providing a reliable procedure for calibrating a set of place-specific distance parameters and with applying it to U.S. inter-State migration flows between 1985 and 1990. It attempts to conform to recent advances in quantitative geography that are characterized by an integration of ESDA (exploratory spatial data analysis) and local statistics. ESDA aims to detect spatial clustering and heterogeneity by visualizing and exploring spatial patterns. A local statistic is defined as a statistically processed value given to each location, as opposed to a global statistic that only captures an average trend across a whole study region. Whereas a global distance parameter estimates an averaged level of the friction of distance, place-specific distance parameters calibrate spatially varying effects of distance. It is shown that a Poisson regression with an adequately specified design matrix yields a set of either origin- or destination-specific distance parameters. A case study demonstrates that the proposed model is a reliable device for measuring a spatial dimension of migration, and that place-specific distance parameters are spatially heterogeneous as well as spatially clustered.
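
One way to read the phrase "adequately specified design matrix" is that log distance is interacted with an origin indicator, so that a Poisson regression estimates one distance coefficient per origin rather than a single global one. The sketch below builds such a row under that assumption; the column layout, the baseline coding, and the function name are hypothetical and are not taken from the paper.

```python
import math

def build_design_row(origin, dest, distance, n_places):
    """Design-matrix row for the flow origin -> dest (places indexed from 0).

    Assumed layout: intercept, origin dummies, destination dummies
    (place 0 as the baseline for both), then one log-distance column per
    origin. Only the column of the flow's own origin is non-zero, so its
    coefficient is that origin's distance parameter.
    """
    origin_dummies = [1.0 if k == origin else 0.0 for k in range(1, n_places)]
    dest_dummies = [1.0 if k == dest else 0.0 for k in range(1, n_places)]
    distance_cols = [
        math.log(distance) if k == origin else 0.0 for k in range(n_places)
    ]
    return [1.0] + origin_dummies + dest_dummies + distance_cols

# Toy example (hypothetical): 3 places, flow from place 1 to place 2, 100 km apart.
row = build_design_row(origin=1, dest=2, distance=100.0, n_places=3)
```

Swapping `origin` for `dest` in the last block of columns would give destination-specific parameters instead, matching the "either origin- or destination-specific" wording.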

A Study on Medical Waste Generation Analysis during Outbreak of Massive Infectious Diseases (대규모 감염병 발병에 따른 의료폐기물 발생량 예측에 관한 연구)

  • Sang-Min Kim;Jin-Kyu Park;In-Beom Ko;Byung-Sun Lee;Sang-Ryong Shin;Nam-Hoon Lee
    • Journal of the Korea Organic Resources Recycling Association / v.31 no.4 / pp.29-39 / 2023
  • In this study, the characteristics of medical waste generation were analyzed, distinguishing between ordinary situations and outbreaks of massive infectious diseases. For ordinary situations, prediction models for the quantity of each type of medical waste, namely general medical waste (G-MW), hazardous medical waste (H-MW), and infectious medical waste (I-MW), were established through regression analysis; all significance values (p) were <0.0001, indicating statistical significance. The coefficients of determination (R²) of the prediction models ranked as follows: I-MW (R²=0.9943) > G-MW (R²=0.9817) > H-MW (R²=0.9310). Additionally, the influencing factors used, namely GDP (G-MW), the number of medical institutions (H-MW), and the elderly population ratio (I-MW), were consistent with previous literature and showed high correlations. The total MW generation, evaluated by combining the models, had an MAE of 2,615 and an RMSE of 3,353. This indicated accuracy levels similar to those of the H-MW (2,491, 2,890) and I-MW (2,291, 3,267) models. Because of the limitations in accurately estimating the quantity of medical waste during rapid outbreaks of massive infectious diseases, the generation unit of I-MW was derived to analyze its characteristics. The generation unit was 8.74 kg/capita·day during the early unstable stage of an infectious disease outbreak, 2.69 kg/capita·day during the stable stage, and an average of 0.08 kg/capita·day during the reduction stage. Correlation analysis between the generation unit of I-MW and lethality rates showed +0.99 in the unstable stage, +0.52 in the stable stage, and +0.96 in the reduction stage, demonstrating a very high positive correlation of +0.95 or higher throughout the entire outbreaks of massive infectious diseases. The results derived from this study are expected to play a useful role in establishing an effective medical waste management system in the field of health care.
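
The three accuracy measures quoted in this abstract (MAE, RMSE, and the coefficient of determination) follow standard definitions. The sketch below computes them for a toy series; the numbers are invented for illustration and have no relation to the waste quantities in the study.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Toy data (hypothetical observed vs. predicted quantities).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
metrics = (mae(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, which is consistent with each (MAE, RMSE) pair listed in the abstract.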

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, the CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes to apply CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which the graph is drawn is $40(pixels){\times}40(pixels)$, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color value on the R (red), G (green), and B (blue) scale. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset. Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by applying random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators widely used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy. Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
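
The first four CNN-FG steps described in this abstract (5-day intervals, 40×40 graph images, conversion to R/G/B matrices, 80/20 split) can be sketched without any image library. The version below is a simplified illustration under several assumptions that the abstract does not confirm: windows slide by one day, only three indicators are drawn (one per color channel, whereas the paper uses twelve), each line is min-max scaled within its window, and the split is made in order rather than at random. The CNN training step is omitted.

```python
SIZE = 40  # image side in pixels, as stated in the abstract

def sliding_windows(rows, width=5):
    """Step 1 (assumed sliding by one day): consecutive windows of `width` days."""
    return [rows[i:i + width] for i in range(len(rows) - width + 1)]

def draw_series(channel, values):
    """Step 2: rasterize one indicator as a line, one lit pixel per column."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    n = len(values)
    for col in range(SIZE):
        # Linear interpolation between the observed daily values
        pos = col * (n - 1) / (SIZE - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        v = values[i] + frac * (values[i + 1] - values[i])
        row = (SIZE - 1) - round((v - lo) / span * (SIZE - 1))
        channel[row][col] = 255

def window_to_rgb(window):
    """Step 3: three SIZE x SIZE matrices, one indicator per color channel."""
    image = [[[0] * SIZE for _ in range(SIZE)] for _ in range(3)]
    for ch in range(3):
        draw_series(image[ch], [day[ch] for day in window])
    return image

def split_80_20(samples):
    """Step 4 (assumed in-order split): 80% training, 20% validation."""
    cut = len(samples) * 8 // 10
    return samples[:cut], samples[cut:]

# Toy data (hypothetical): 14 days, three made-up indicator values per day.
days = [[float(t), float((t * 7) % 5), float(10 - t)] for t in range(14)]
windows = sliding_windows(days)
images = [window_to_rgb(w) for w in windows]
train_images, validation_images = split_80_20(images)
```

Each resulting image is a 3×40×40 nested list, the shape a CNN input layer for RGB images of that size would expect.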

The Relationship Between Son Preference and Fertility (남아 선호와 출산력간의 관계)

  • 이성용
    • Korea journal of population studies / v.26 no.1 / pp.31-57 / 2003
  • This study is intended to examine (1) whether the value of a son (for example, old-age security and succession of the family lineage) that causes son preference in a traditional society can be explained at the individual level, and (2) whether women without a son in a son-preference country continue childbearing until they have at least one son or give up the desire for a son at a certain level. To accomplish these purposes, the 1974 Korean National Fertility Survey data are analyzed with quadratic hazard models controlling for unobserved heterogeneity. Unlike in the ordinary regression model, even omitted variables that affect hazard rates and are uncorrelated with the included independent variables can distort the parameter estimates in the hazard model. Therefore, the nonparametric maximum likelihood estimator (NPMLE) of a mixing distribution developed by Heckman and Singer is used to control for unobserved heterogeneity. Based on the statistical results of this study, the value of a son that causes son preference is determined at the societal level, not at the individual level. Korean women without a son did not continue childbearing endlessly throughout their childbearing years until they had a son. In general, they gave up the desire for a son after bearing six consecutive daughters. Thus, 30-40 years ago, the number of daughters at which women without a son gave up the desire for a son was six, which is about the level of the total fertility rate during the 1960s. These days, we often see women who have only two or three daughters and no son. This means that the level at which the desire for a son is given up, which is one factor representing the strength of son preference, has become lower. If the strength of son preference had not become much weaker, the fertility rate in Korea could not have reached below-replacement level.