Estimating Average Causal Effect in Latent Class Analysis (잠재범주분석을 이용한 원인적 영향력 추론에 관한 연구)

  • Park, Gayoung;Chung, Hwan
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1077-1095 / 2014
  • Unlike randomized trials, observational studies require statistical strategies for inferring unbiased causal relationships. Recently, new methods for causal inference in observational studies have been proposed, such as matching on the propensity score and inverse probability of treatment weighting. These methods focus on how to control for confounders and how to evaluate the effect of the treatment on the outcome variable. However, these conventional methods are valid only when the treatment variable is categorical and both the treatment and the outcome variables are directly observable. Causal inference can be challenging in part because it may not be possible to observe the treatment and/or the outcome variable directly. To address this difficulty, we propose a method for estimating the average causal effect when both the treatment and the outcome variables are latent. Latent class analysis is applied to calculate the propensity score for the latent treatment variable in order to estimate the causal effect on the latent outcome variable. In this work, we investigate the causal effect of adolescent delinquency on substance use using data from the 'National Longitudinal Study of Adolescent Health'.
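
    The conventional inverse probability of treatment weighting mentioned in this abstract can be sketched as follows. This is a minimal illustration for an observed binary treatment with propensity scores already estimated, not the authors' latent-variable method; the function name `iptw_ace` and the input layout are assumptions made for the example.

    ```python
    def iptw_ace(treated, outcome, propensity):
        """Estimate the average causal effect (ACE) by inverse probability weighting.

        treated    -- list of 0/1 treatment indicators (observed, not latent)
        outcome    -- list of numeric outcomes
        propensity -- list of estimated P(treated = 1 | confounders), each in (0, 1)
        """
        n = len(treated)
        if n == 0 or not (n == len(outcome) == len(propensity)):
            raise ValueError("inputs must be non-empty and of equal length")

        treated_sum = 0.0
        control_sum = 0.0
        for t, y, e in zip(treated, outcome, propensity):
            if not 0.0 < e < 1.0:
                raise ValueError("propensity scores must lie strictly between 0 and 1")
            if t == 1:
                treated_sum += y / e          # weight treated units by 1 / e
            else:
                control_sum += y / (1.0 - e)  # weight control units by 1 / (1 - e)

        # Difference between the weighted mean outcomes under treatment and control.
        return treated_sum / n - control_sum / n
    ```

    With equal propensity scores of 0.5 the estimate reduces to the plain difference in group means, which is a quick sanity check on the weighting.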

Collaborative Tag-Based Recommendation Methods Using the Principle of Latent Factor Models (잠재 요인 모델의 원리를 이용한 협업 태그 기반 추천 방법)

  • Kim, Hyoung-Do
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.47-57 / 2009
  • Collaborative tagging systems allow users to attach tags to diverse sharable content in social networks. These tags help all community members, not only the content creators, to reuse that content. Collaborative tag-based recommendation uses three-dimensional data composed of users, items, and tags. Such data are generally more voluminous and sparse than two-dimensional data composed of users and items, so existing collaborative filtering methods are difficult to apply to them directly. Latent factor models, which have recently been successful in collaborative filtering, discover latent features (factors) that explain observed values and solve problems based on those features. However, establishing these models requires much time and effort. To apply latent factor models to three-dimensional collaborative filtering data, this difficulty must be overcome. This paper proposes several methods for determining users' preferences for items by establishing an intuitive model that treats the tags used for items as latent factors for both users and items. The methods are compared using real data to identify desirable directions.
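
    One possible reading of "tags as latent factors" can be sketched as below. The abstract does not specify the scoring rule, so the normalized tag profiles and the dot-product score are assumptions for illustration, as are the names `normalize` and `preference`.

    ```python
    def normalize(tag_counts):
        """Turn raw tag counts into weights that sum to 1 (empty dict if no tags)."""
        total = sum(tag_counts.values())
        if total == 0:
            return {}
        return {tag: count / total for tag, count in tag_counts.items()}


    def preference(user_tag_counts, item_tag_counts):
        """Score a user-item pair by the overlap of their tag profiles.

        Each tag plays the role of one latent factor: the user's weight on the
        tag is multiplied by the item's weight on the same tag, then summed.
        """
        user_profile = normalize(user_tag_counts)
        item_profile = normalize(item_tag_counts)
        return sum(
            weight * item_profile[tag]
            for tag, weight in user_profile.items()
            if tag in item_profile
        )
    ```

    A user who mostly tags with "python" would then score higher on items that other users tagged "python" than on items with unrelated tags.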

Latent causal inference using the propensity score from latent class regression model (잠재범주회귀모형의 성향점수를 이용한 잠재변수의 원인적 영향력 추론 연구)

  • Lee, Misol;Chung, Hwan
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.615-632 / 2017
  • Unlike randomized trials, observational studies require statistical strategies for inferring unbiased causal relationships. Matching on the propensity score is one of the most popular methods for controlling confounders in order to evaluate the effect of the treatment on the outcome variable. Recently, new methods for causal inference in latent class analysis (LCA) have been proposed to estimate the average causal effect (ACE) of the treatment on a latent discrete variable. These have focused on application studies that estimate the ACE in LCA for real datasets. In practice, however, the true value of the ACE is not known, so it is difficult to evaluate the performance of the estimated ACE. In this study, we propose a method for generating synthetic data using the propensity score in the framework of LCA, where the treatment and outcome variables are latent. We then propose a new method for estimating the ACE in LCA and evaluate its performance via simulation studies. Furthermore, we present an empirical analysis based on data from the 'National Longitudinal Study of Adolescent Health', with puberty as a latent treatment and substance use as a latent outcome variable.

The Effect of Climate Data Applying Temperature Lapse Rate on Prediction of Potential Forest Distribution (기온감율을 적용한 기후자료가 잠재 산림분포 예측에 미치는 영향)

  • Lee, Sang-Chul;Choi, Sung-Ho;Lee, Woo-Kyun;Yoo, Seong-Jin;Byun, Jae-Gyun
    • Journal of Korean Society for Geospatial Information Science / v.19 no.2 / pp.19-27 / 2011
  • The objective of this study was to suggest technical approaches for preparing and downscaling the climate data used to predict potential forest distribution. To predict the forest distribution, we employed a Korea-specific forest distribution model, the TAG (Thermal Analogy Group), and defined the PFT (Plant Functional Types) based on the HyTAG (Hydrological and Thermal Analogy Group). Climate data with 20 km spatial resolution were interpolated to match the input data format with 1 km spatial resolution. Two potential forest distribution maps were estimated using climate data constructed by kriging, one of the interpolation and downscaling approaches, with and without the lapse rate considered. Verification against the actual vegetation map showed that the forest distribution estimated with the lapse rate was 38% more accurate.
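
    The elevation correction behind a temperature lapse rate can be sketched as follows. This shows only the per-cell adjustment, not the kriging step, and the default of 6.5 °C per km is the commonly quoted standard environmental lapse rate, used here as an assumption; the abstract does not state which value the authors applied. The function name `adjust_temperature` is hypothetical.

    ```python
    # Assumption: standard environmental lapse rate, not a value taken from the paper.
    STANDARD_LAPSE_RATE_C_PER_KM = 6.5


    def adjust_temperature(temp_c, source_elev_m, target_elev_m,
                           lapse_rate_c_per_km=STANDARD_LAPSE_RATE_C_PER_KM):
        """Shift a temperature from the source elevation to the target elevation.

        Temperature falls by `lapse_rate_c_per_km` for every kilometre gained,
        so a higher target cell receives a lower temperature.
        """
        elevation_gain_km = (target_elev_m - source_elev_m) / 1000.0
        return temp_c - lapse_rate_c_per_km * elevation_gain_km
    ```

    For example, a 15 °C value observed at 100 m would be adjusted to 8.5 °C for a 1 km grid cell lying at 1,100 m under this assumed rate.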

Development of the Potential Query Recommendation System using User's Search History (사용자 검색이력 기반의 잠재적 질의어 추천 시스템 개발)

  • Park, Jeongbae;Park, Kinam;Lim, Heuiseok
    • Journal of Digital Convergence / v.11 no.7 / pp.193-199 / 2013
  • This paper proposes a potential query recommendation system based on user search history, which helps users of an information search system express their potential information needs as queries and makes the desired information easier to find. The proposed system analyzes the association between a user's search query and existing users' search histories, and extracts the user's potential information needs. The extracted needs are represented as recommended queries and presented to the user. To analyze the effectiveness of the proposed system, we conducted behavioral experiments using search histories of 27656. The experiments found that subjects showed a statistically significantly higher level of satisfaction when using the proposed system than when using general search engines.

Rapid Ecoassessment Technique about Anthropogenic Disturbance Potentiality of Land Use (토지의 훼손 잠재성에 대한 신속한 생태평가기법)

  • 김종원
    • The Korean Journal of Ecology / v.26 no.1 / pp.19-22 / 2003
  • To determine the degree of anthropogenic disturbance potentiality (ADP) of an area, a rapid ecoassessment technique was developed on the basis of the actual vegetation map. The ADP degree of each unit cell was computed using four land use categories: forested area, open water and stream, agricultural area, and urbanized area. The final ADP degree of each cell was obtained through direct and indirect computation. Finally, a map of ADP was drawn and analyzed. Vulnerable cells and disturbance nuclei were determined according to the disturbance vector, a measure of the potential disturbance pressure exerted on a cell by its surrounding cells. A case study was carried out in the Gijang area of Pusan metropolitan city. A total of 973 meshes (500m×500m) were analyzed, of which 79 were currently threatened. The rapid ecoassessment technique proved practically useful for diagnosing and planning land use.

Mortality and Potential Years of Life Lost comparison of lung cancer between Korea and OECD countries (우리나라와 OECD 국가 간의 폐암 사망률과 잠재수명손실연수(PYLL)에 관한 비교)

  • Kim, Dong-Seok;Kang, Soo-Won;Park, Ji-Won
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.7 / pp.2515-2521 / 2010
  • The aim of this study is to compare mortality and potential years of life lost (PYLL) from malignant neoplasm of the lung between Korea and OECD countries. Based on the results, we sought to identify problems in mortality caused by malignant neoplasm of the lung in order to inform public health policy and education strategies. An ANOVA comparing Korea and OECD countries showed that lung cancer mortality and PYLL in the total and sex-specific Korean populations were greater after the start of the 21st century than before. In particular, PYLL rose more sharply than mortality. Taken together, the present study indicates that lung cancer PYLL can be a more important parameter when comparing Korea and OECD countries.
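
    The PYLL measure used in this abstract can be sketched as a sum of the years remaining to a reference age for each premature death. The age limit of 70 is an assumption for illustration; the abstract does not state which limit was used. The function name `pyll` and the input layout are hypothetical.

    ```python
    def pyll(deaths_by_age, age_limit=70):
        """Potential years of life lost before `age_limit`.

        deaths_by_age -- dict mapping age at death to the number of deaths
        age_limit     -- reference age (assumed 70 here); deaths at or above
                         this age contribute nothing
        """
        return sum(
            count * (age_limit - age)
            for age, count in deaths_by_age.items()
            if age < age_limit
        )
    ```

    Two deaths at age 60 and one at age 65 give 2 × 10 + 1 × 5 = 25 years lost, while deaths at 75 add nothing. This is why PYLL can move differently from crude mortality: it weights deaths at younger ages more heavily.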

Variable selection for latent class analysis using clustering efficiency (잠재변수 모형에서의 군집효율을 이용한 변수선택)

  • Kim, Seongkyung;Seo, Byungtae
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.721-732 / 2018
  • Latent class analysis (LCA) is an important tool for exploring unseen latent groups in multivariate categorical data. In practice, it is important to select a suitable set of variables, because including too many variables complicates the model and reduces the accuracy of the parameter estimates. Dean and Raftery (Annals of the Institute of Statistical Mathematics, 62, 11-35, 2010) proposed a headlong search algorithm based on Bayesian information criterion values to choose meaningful variables for LCA. In this paper, we propose a new variable selection procedure for LCA that uses the posterior probabilities obtained from each fitted model. We propose a new statistic to measure the adequacy of LCA and develop a variable selection procedure based on it. The effectiveness of the proposed method is demonstrated through numerical studies.

Method to Identify Future Technology Candidates: Biofuel Case (잠재적 후보기술 경로 탐색방법 : 바이오 연료 사례)

  • Lee, Yongseung;Shin, Juneseuk
    • Journal of Technology Innovation / v.28 no.3 / pp.29-53 / 2020
  • Existing main path analysis is useful for clarifying the backbone of past technology development, but has difficulty identifying future technology candidates and anticipating changes in the mainstream technology. Our method develops a growth velocity indicator and combines it with key-route analysis and the traversal counts measure in main path analysis. It enables us to identify rapidly growing paths of future technology candidates, and further to evaluate the relative growth potential of paths that can replace the mainstream technology in the main path. Our method can contribute to identifying future technology candidates quantitatively using patents, and broadens the scope of main path analysis research toward foresight. It can be useful for technology strategy in practice. Biofuel technology is used as an example.

International Patent Classification Using Latent Semantic Indexing (잠재 의미 색인 기법을 이용한 국제 특허 분류)

  • Jin, Hoon-Tae
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1294-1297 / 2013
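  • This paper studies a system that uses machine learning to classify patent documents automatically according to the International Patent Classification (IPC), and proposes a way to improve classification performance using latent semantic indexing. Earlier work on automatic IPC classification of patent documents relied on word-matching indexing. However, given the speed at which modern technical terms emerge and their diversity, we judged that an approach based on the concepts behind terms would be more effective than raw word frequency for analyzing the relatedness of patent documents, and therefore studied classification based on latent semantic indexing (LSI). The experiments compare performance with information gain (IG) and the chi-square statistic (CHI), representative feature selection methods for word-matching indexing, against performance with latent semantic indexing, using SVM, kNN, and Naive Bayes classifiers. Using SVM, the best-performing classifier, we also evaluate how much nouns contribute to building the conceptual semantic structure of terms in latent semantic indexing, and experimentally compare the ranges of singular values that give the best performance with LSI. The results show that LSI outperformed the word-matching methods (IG, CHI). The SVM and Naive Bayes classifiers performed at similar levels under word matching, but SVM was far superior under LSI. In addition, SVM improved by about 3% with LSI, whereas Naive Bayes degraded by 20%. Regarding the influence of nouns on the latent semantic structure in LSI, using nouns gave a result about 10% better than using all words as content words, and in the analysis by singular-value range, the highest performance was obtained in the range ranked at the 30% level.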