• Title/Summary/Keyword: process rate

Search Results: 12,519

Discovering Promising Convergence Technologies Using Network Analysis of Maturity and Dependency of Technology (기술 성숙도 및 의존도의 네트워크 분석을 통한 유망 융합 기술 발굴 방법론)

  • Choi, Hochang;Kwahk, Kee-Young;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.24 no.1, pp.101-124, 2018
  • Recently, most technologies have been developed in various forms, either through the advancement of a single technology or through interaction with other technologies. In particular, these technologies exhibit convergence arising from the interaction of two or more techniques. In addition, efforts to respond to technological change in advance, by forecasting promising convergence technologies that will emerge in the near future, are continuously increasing. Accordingly, many researchers are attempting various analyses for forecasting promising convergence technologies. A convergence technology carries the characteristics of the various technologies from which it is generated, so forecasting promising convergence technologies is much more difficult than forecasting general technologies with high growth potential. Nevertheless, some achievements have been made in forecasting promising technologies using big data analysis and social network analysis. Data-driven studies of convergence technology are actively conducted, focusing on discovering new convergence technologies and analyzing their trends, and as a result information about new convergence technologies is provided more abundantly than in the past. However, existing methods for analyzing convergence technology have several limitations. First, most studies of convergence technology analyze data through predefined technology classifications. Recent technologies tend to have convergence characteristics and thus consist of technologies from various fields, so a new convergence technology may not belong to any predefined class; the existing approach therefore does not properly reflect the dynamic change of the convergence phenomenon. Second, to forecast promising convergence technologies, most existing analysis methods use general-purpose indicators, which do not fully utilize the specificity of the convergence phenomenon. A new convergence technology is highly dependent on the existing technologies from which it originates; depending on how those technologies change, it can grow into an independent field or disappear rapidly. In existing analyses, the growth potential of a convergence technology is judged through traditional indicators designed for general purposes. These indicators, however, do not reflect the principle of convergence, namely that new technologies emerge from two or more mature technologies and that grown technologies in turn affect the creation of other technologies. Third, previous studies do not provide objective methods for evaluating the accuracy of models that forecast promising convergence technologies. In studies of convergence technology, forecasting promising technologies has received relatively little attention because of the complexity of the field, so it is difficult to find a method for evaluating the accuracy of such forecasting models. To activate the field of forecasting promising convergence technologies, it is important to establish a method for objectively verifying and evaluating the accuracy of the model proposed by each study.
To overcome these limitations, we propose a new method for analyzing convergence technologies. First, through topic modeling, we derive a new technology classification based on text content, one that reflects the dynamic change of the actual technology market rather than a fixed classification standard. Next, we identify the influence relationships between technologies through the topic correspondence weights of each document and structure them into a network. We then devise a centrality indicator, potential growth centrality (PGC), which forecasts the future growth of each technology from its centrality information and reflects its convergence characteristics in terms of technology maturity and interdependence between technologies. Along with this, we propose a method to evaluate the accuracy of the forecasting model by measuring the growth rate of promising technologies, based on the variation of potential growth centrality across periods. We conduct experiments with 13,477 patent documents to evaluate the performance and practical applicability of the proposed method. The results confirm that the forecast model based on the proposed centrality indicator achieves forecast accuracy up to about 2.88 times higher than models based on currently used network indicators.
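
The abstract does not give the PGC formula, so the following Python sketch only illustrates the general idea: technologies become nodes in a directed network whose edges are weighted by topic correspondence, and each technology's growth potential is scored from the influence it receives from mature technologies. The graph, the maturity values, and the scoring rule are illustrative assumptions, not the authors' definition.

```python
# Illustrative sketch only: edge weights, maturity scores, and the "potential growth"
# formula below are assumptions, not the paper's PGC definition.
import networkx as nx

# (source_topic, target_topic, weight): influence links that, in the paper, come from
# per-document topic correspondence weights. Values here are hypothetical.
topic_links = [
    ("sensor", "iot_platform", 0.42),
    ("machine_learning", "iot_platform", 0.35),
    ("iot_platform", "smart_factory", 0.51),
]
# Hypothetical maturity of each topic, e.g. its share of older patents.
maturity = {"sensor": 0.9, "machine_learning": 0.8, "iot_platform": 0.4, "smart_factory": 0.2}

G = nx.DiGraph()
G.add_weighted_edges_from(topic_links)

def potential_growth(g: nx.DiGraph, maturity: dict) -> dict:
    """Assumed proxy for PGC: influence received, weighted by the maturity of its sources."""
    return {
        node: sum(g[u][node]["weight"] * maturity.get(u, 0.0) for u in g.predecessors(node))
        for node in g.nodes
    }

print(potential_growth(G, maturity))  # topics fed by mature technologies score highest
```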

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
    • Journal of Intelligence and Information Systems, v.26 no.1, pp.135-149, 2020
  • Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are actively being developed, compensating for the weaknesses of traditional asset allocation methods and replacing the parts that those methods handle poorly. They make automated investment decisions with artificial intelligence algorithms and are used with various asset allocation models such as the mean-variance model, the Black-Litterman model, and the risk parity model. The risk parity model is a typical risk-based asset allocation model focused on asset volatility. Because it avoids investment risk structurally, it is stable for managing large funds and has been widely used in finance. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It can handle billions of examples in limited-memory environments, learns much faster than traditional boosting methods, and is frequently used in many fields of data analysis. In this study, we propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. The model uses XGBoost to predict the risk of assets and applies the predicted risk to the covariance estimation process. Estimation errors arise between the estimation period and the actual investment period because an optimized asset allocation model estimates investment proportions from historical data, and these errors adversely affect portfolio performance. This study aims to improve the stability and performance of the model by predicting the volatility of the next investment period and reducing the estimation errors of the optimized asset allocation model, thereby narrowing the gap between theory and practice and proposing a more advanced asset allocation model. For the empirical test of the suggested model, we used Korean stock market price data covering 17 years, from 2003 to 2019, composed of the energy, finance, IT, industrial, materials, telecommunication, utility, consumer, health care, and staples sectors. Predictions were accumulated with a moving-window method using 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-testing results. We analyzed portfolio performance in terms of cumulative rate of return, and the long test period provided ample sample data. Compared with the traditional risk parity model, the experiment recorded improvements in both cumulative return and reduction of estimation errors: the total cumulative return is 45.748%, about 5% higher than that of the risk parity model, and the estimation errors are reduced in 9 out of 10 industry sectors. The reduction of estimation errors increases the stability of the model and makes it easier to apply in practical investment. The experimental results thus showed improved portfolio performance through reduced estimation errors of the optimized asset allocation model. Many financial and asset allocation models are limited in practical investment by the most fundamental question of whether the past characteristics of assets will persist into the future in a changing financial market.
However, this study not only takes advantage of traditional asset allocation models but also addresses their limitations and increases stability by predicting asset risk with a state-of-the-art algorithm. Various studies have examined parametric estimation methods for reducing estimation errors in portfolio optimization; here we suggest a new machine learning-based method for reducing those errors in an optimized asset allocation model. This study is therefore meaningful in that it proposes an advanced artificial intelligence asset allocation model for fast-developing financial markets.
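
The abstract describes the core mechanism (XGBoost forecasts next-period asset risk, which then feeds the allocation step) without implementation detail. The Python sketch below assumes a simple lagged-volatility feature set and substitutes naive inverse-volatility weighting for the paper's covariance-based risk parity step, so it is only an illustration of that workflow, not the authors' model.

```python
# Sketch under stated assumptions: lagged realized volatility as features, and
# inverse-volatility weights standing in for the paper's covariance-based allocation.
import pandas as pd
from xgboost import XGBRegressor

def predict_next_vol(returns: pd.DataFrame, window: int = 20, lags: int = 5) -> pd.Series:
    """Forecast each asset's next-period volatility from its recent realized volatilities."""
    preds = {}
    for asset in returns.columns:
        vol = returns[asset].rolling(window).std()
        X = pd.concat({f"lag_{i}": vol.shift(i) for i in range(lags)}, axis=1)
        y = vol.shift(-1).rename("target")                  # next-period realized volatility
        data = pd.concat([X, y], axis=1).dropna()
        model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
        model.fit(data[X.columns].iloc[:-1], data["target"].iloc[:-1])   # train on history
        preds[asset] = float(model.predict(data[X.columns].iloc[[-1]])[0])  # forecast
    return pd.Series(preds)

def inverse_vol_weights(pred_vol: pd.Series) -> pd.Series:
    """Naive risk-parity proxy: allocate inversely to predicted volatility."""
    w = 1.0 / pred_vol
    return w / w.sum()
```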

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems, v.26 no.1, pp.1-21, 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the specific product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: first, data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to the KSIC indexes were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or on multiple assumptions. In addition, the level of the market category can be easily and efficiently adjusted according to the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors.
Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper ordering on the preprocessed dataset or by combining Word2Vec with another algorithm such as Jaccard similarity. The product group clustering step could also be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, which we expect will further improve the performance of the basic model conceptually proposed in this study.
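
A minimal Python sketch of the pipeline described above: embed product-name tokens with Word2Vec, gather the names whose cosine similarity to a seed word (e.g. a KSIC index word) exceeds a threshold, and sum their sales. The tiny corpus, the sales table, and the 0.5 threshold are hypothetical; the paper's preprocessing and its 345,103-case dataset are not reproduced.

```python
# Toy sketch, assuming hypothetical product names and sales figures.
from gensim.models import Word2Vec

product_tokens = [["wireless", "earphone"], ["bluetooth", "earphone"], ["steel", "pipe"]]
sales = {"wireless earphone": 120.0, "bluetooth earphone": 80.0, "steel pipe": 300.0}

# Parameters echo the paper's reported settings (vector dimension 300, window 15).
model = Word2Vec(product_tokens, vector_size=300, window=15, min_count=1)

def market_size(seed_word: str, threshold: float = 0.5) -> float:
    """Sum the sales of product names whose tokens are similar enough to the seed word."""
    similar = {w for w, s in model.wv.most_similar(seed_word, topn=50) if s >= threshold}
    similar.add(seed_word)
    return sum(v for name, v in sales.items() if set(name.split()) & similar)

print(market_size("earphone"))  # estimated size of the hypothetical "earphone" group
```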

Short-Term Efficacy of Steroid and Immunosuppressive Drugs in Patients with Idiopathic Pulmonary Fibrosis and Pre-treatment Factors Associated with Favorable Response (특발성폐섬유화증에서 스테로이드와 면역억제제의 단기 치료효과 및 치료반응 예측인자)

  • Kang, Kyeong-Woo;Park, Sang-Joon;Koh, Young-Min;Lee, Sang-Pyo;Suh, Gee-Young;Chung, Man-Pyo;Han, Jung-Ho;Kim, Ho-Joong;Kwon, O-Jung;Lee, Kyung-Soo;Rhee, Chong-H.
    • Tuberculosis and Respiratory Diseases, v.46 no.5, pp.685-696, 1999
  • Background: Idiopathic pulmonary fibrosis (IPF) is a diffuse inflammatory and fibrosing process that occurs within the interstitium and alveoli of the lung, with an invariably poor prognosis. The major problems in managing IPF result from the variable rate of disease progression and the difficulty of predicting the response to therapy. The purpose of this retrospective study was to evaluate the short-term efficacy of steroid and immunosuppressive therapy for IPF and to identify pre-treatment determinants of a favorable response. Method: Twenty patients with IPF were included. The diagnosis of IPF was proven by thoracoscopic lung biopsy, and the patients were presumed to have active, progressive disease. The baseline evaluation included clinical history, pulmonary function tests, bronchoalveolar lavage (BAL), and chest high-resolution computed tomography (HRCT). Fourteen patients received oral prednisolone at an initial dose of 1 mg/kg/day for 8 to 12 weeks, tapered to low-dose prednisolone (0.25 mg/kg/day). Six patients who had previously experienced significant steroid side effects received oral cyclophosphamide 2 mg/kg/day with or without low-dose prednisolone. Follow-up evaluation was performed after 6 months of therapy. Patients were considered responders if they met more than one of the following: (1) improvement of more than one grade in the dyspnea index, (2) improvement in FVC or TLC of more than 10% or improvement in DLco of more than 20%, (3) decreased extent of disease on chest HRCT findings. Result: One patient died of an extrapulmonary cause after 3 months of therapy, and another gave up further medical therapy because of steroid side effects. Eventually the medical records of 18 patients were analyzed: nine were classified as responders and nine as nonresponders. The histopathologic diagnosis of the responders was nonspecific interstitial pneumonia (NSIP) in all cases, and that of the nonresponders was usual interstitial pneumonia (UIP) in all cases (p<0.001). Other significant differences between the two groups were female predominance (p<0.01), smoking history (p<0.001), severe grade of dyspnea (p<0.05), lymphocytosis in BAL fluid (23.8±16.3% vs 7.8±3.6%, p<0.05), and less honeycombing on chest HRCT (0% vs 9.2±2.3%, p<0.001). Conclusion: Our results suggest that patients with a histopathologic diagnosis of NSIP or lymphocytosis in BAL fluid are more likely to respond to steroid or immunosuppressive therapy. Clinical results in larger numbers of IPF patients will be required to identify the independent variables.

The Experimental Study for Myocardial Preservation Effect of Ischemic Preconditioning (허혈성 전조건화 유발이 심근보호에 미치는 영향에 관한 실험적 연구)

  • 이종국;박일환;이상헌
    • Journal of Chest Surgery, v.37 no.2, pp.119-130, 2004
  • Decreased cardiac function after open heart surgery is due to ischemia-induced myocardial damage during surgery. Ischemic preconditioning, in which myocardial damage does not accumulate after repeated episodes of ischemia because the myocytes become tolerant to ischemia and the myocardium protects itself from damage after prolonged ischemia, is known to diminish myocardial damage, aid myocardial recovery after reperfusion, and decrease the incidence of arrhythmia. Our study was performed to induce ischemic preconditioning and demonstrate its myocardial protective effect by applying cardioplegic solution to hearts removed from rats. Material and Method: Male Sprague-Dawley rats were used. The hearts were cannulated and fixed on a modified isolated working heart model. Reperfusion followed non-working and working heart methods; the working method was applied for 20 minutes, during which heart rate, aortic pressure, aortic flow, and coronary flow were measured and recorded. In the control group, the extracted heart was fixed on the isolated working heart model, recovered by reperfusion 60 minutes after infusion, and preserved in cardioplegic solution 20 minutes after working heart perfusion and aortic cross-clamping. The study groups were divided into group I, in which ischemic, hypoxia-induced hearts were perfused with cardioplegic solution and preserved for 60 minutes; group II, in which cardioplegic solution was infused 45 seconds (II-1), 1 minute (II-2), or 3 minutes (II-3) after ischemia induction, 20 minutes after working heart perfusion and aortic cross-clamping; and group III, in which hearts underwent working heart perfusion for 20 minutes, aortic cross-clamping for 45 seconds (III-1), 1 minute (III-2), or 3 minutes (III-3), reperfusion for 2 minutes to recover the heart, and then repeated aortic cross-clamping before reperfusion. All groups were compared on hemodynamic performance after reperfusion of hearts preserved for 60 minutes. Result: The recovery time until spontaneous heart beat was longer in groups I, II-3, III-2, and III-3 than in the control group (p<0.01). Group III-1 recovered heart rate better than the control group (p<0.05) and better than group II-1 (p<0.05). Recovery of aortic blood pressure favored group III-1 (p<0.05), which also outperformed group II-1 (p<0.01). Group III-1 likewise showed the best results in cardiac output (p<0.05), and group III-2 was better than group II-2 (p<0.05). Groups I (p<0.01) and II-3 (p<0.05) showed more cardiac edema than the control group. Conclusion: When the effects of other organs are excluded, protecting the heart by infusing cardioplegic solution after imposing a short period of ischemia, before the onset of abnormal heart beats, as preconditioning gives better recovery than cardioplegic solution alone. We believe that further study is needed to find a more effective method of preconditioning.

The Determination of Trust in Franchisor-Franchisee Relationships in China (중국 프랜차이즈 시스템에서의 본부와 가맹점간 신뢰의 영향요인)

  • Shin, Geon-Cheol;Ma, Yaokun
    • Journal of Global Scholars of Marketing Science, v.18 no.2, pp.65-88, 2008
  • Since the implementation of economic reforms in 1978, the Chinese economy has grown rapidly at an average annual growth rate of 9% over the past two decades. Franchising has been widely recognized as an important source of entrepreneurial activity. Trust is important in that it facilitates relational exchanges by permitting partners to transcend short-run inequities or risks and concentrate on long-term profits or gains. In the relationship between franchisors and franchisees, trust has been described as an important source of competitive advantage. However, little research has been done on the factors affecting trust in Chinese franchisor-franchisee relationships. The purpose of this study is to investigate what factors affect trust in the franchise system in China and to provide guidelines and insights to franchisors entering the Chinese market. In this study, following Morgan and Hunt (1994), trust is defined as existing when one party has confidence in an exchange partner's reliability and integrity. We offer a conceptual model for the empirical study, in which the factors affecting trust include the franchisor's support, communication, satisfaction with previous outcomes, and conflict. We also suggest that the franchisor's support and communication tend to enhance the franchisee's satisfaction with previous outcomes, and that the franchisor's support, communication, and the franchisee's satisfaction with previous outcomes tend to decrease conflict. Before the formal study, a pretest involving exploratory interviews with owners from three franchisees was conducted to make sure the questionnaire was relevant and clear to the respondents. The data were collected by trained interviewers carrying out personal interviews with the aid of an unidentified, multi-page, structured questionnaire. The respondents comprised owners, managers, and owner-managers of franchisee-owned food service franchises located in Beijing, China. A total of 256 potential franchisees were initially contacted, and the final usable sample consisted of 125 respondents. As expected, the sampling method was successful in soliciting respondents with varied personal and firm characteristics. Self-administered questionnaires were used for all measures, and established scales were used to measure the latent constructs; the measures tapped the franchisees' perceptions of the relationship with the referent franchisor. Likert-type scales ranging from "strongly disagree" (=1) to "strongly agree" (=7) were used throughout the constructs (trust, eight items; support, five items; communication, four items; satisfaction, six items; conflict, three items). Traditional reliability measurements, such as Cronbach's alpha, were used; all reliabilities were greater than .80. The proposed measurement model was estimated using the SPSS 12.0 and AMOS 5.0 analysis packages. We conducted a series of exploratory factor analyses and confirmatory factor analyses to assess convergent validity, discriminant validity, and reliability. The results indicate a reasonable overall fit between the model and the observed data: the overall fit of the measurement model was χ² = 159.699, p = 0.004, d.f. = 116, GFI = .879, NFI = .898, CFI = .969, IFI = .970, TLI = .959, RMR = .058. We also examined construct reliability and average variance extracted (AVE).
The construct reliability of each construct was greater than .80, and the AVE of each construct was greater than .50. According to the structural equation modeling (SEM) analysis, the path model indicated an adequate fit: χ² = 142.126, p = 0.044, d.f. = 115, GFI = .892, NFI = .909, CFI = .981, IFI = .981, TLI = .974, RMR = .057. As hypothesized, the results showed that it is strategically important to establish trust in a franchise system, and that the franchisor's support, communication, and satisfaction with previous outcomes tend to reinforce the franchisee's trust. The results also showed that trust seems to decrease as the experience of conflict episodes increases. We also observed that the franchisor's support and communication tend to enhance the franchisee's satisfaction with previous outcomes, and that communication tends to decrease conflict. If trust between the franchisor and franchisee can be established in a franchise system, franchising offers many benefits and reduces many costs. To maintain a relationship of mutual trust with their franchisees, franchisors should provide support effectively. Effective assistance services have a direct effect on franchisees' satisfaction with previous outcomes and on their trust in the franchisor. In particular, the franchise sales process, orientation, and training in the start-up period are key elements for the success of the franchise system. Franchisor support is an accumulation of separate satisfaction evaluations of the different kinds of service provided by the franchisor, and providing support can clearly improve the trustworthy image of the franchisor. In a franchise system, conflicts of interest and exertions of different power sources are very common, and the experience of conflict episodes seems to be negatively related to trust; it is therefore important to reduce the negative side of relationship conflicts. Communication plays a broad role in reducing conflict and establishing mutual trust in the franchisor-franchisee relationship, and effective communication between franchisors and franchisees can improve franchisees' satisfaction with the franchise system. As Chinese markets diversify, both franchisors and franchisees must maintain relevant, timely, and reliable communication, and it is very important to improve the quality of that communication. Satisfaction with previous outcomes seems to be positively related to trust: franchisors and franchisees that are highly satisfied with the previous outcomes flowing from their relationship will perceive their partner as advancing their goal achievement. Therefore, both franchisors and their franchisees need to make an effort toward their partner's welfare. Little literature has focused on what factors affect trust between franchisors and their franchisees in China. This study developed hypotheses regarding the factors affecting trust in the transaction relationship, and the data analysis strongly supported the hypotheses. There are certain limitations in this study. First, some other factors missed in this study could be significantly important. Second, the context of this study, the food service industry, limits its potential generalizability to all franchise systems; more studies in different categories of franchise system are needed to broaden generalizability. Third, the model was tested empirically on a sample in Beijing, and more empirical tests of the proposed model in other Chinese areas are needed.
Finally, the analysis in this study was solely based on the perception of franchisees and the opinions of franchisors were not included.

Changes in blood pressure and determinants of blood pressure level and change in Korean adolescents (성장기 청소년의 혈압변화와 결정요인)

  • Suh, Il;Nam, Chung-Mo;Jee, Sun-Ha;Kim, Suk-Il;Kim, Young-Ok;Kim, Sung-Soon;Shim, Won-Heum;Kim, Chun-Bae;Lee, Kang-Hee;Ha, Jong-Won;Kang, Hyung-Gon;Oh, Kyung-Won
    • Journal of Preventive Medicine and Public Health, v.30 no.2 s.57, pp.308-326, 1997
  • Many studies have led to the notion that essential hypertension in adults is the result of a process that starts early in life; investigation of blood pressure (BP) in children and adolescents can therefore contribute to knowledge of the etiology of the condition. A unique longitudinal study on BP in Korea, known as the Kangwha Children's Blood Pressure (KCBP) Study, was initiated in 1986 to investigate changes in BP in children. This study is a part of the KCBP study. Its purposes are to show changes in BP and to determine the factors affecting BP level and change in Korean adolescents during the age period of 12 to 16 years. A total of 710 students (335 males, 375 females) who were in the first grade of junior high school (12 years old) in 1992 in Kangwha County, Korea have been followed with annual measurements of BP and related factors (anthropometric, serologic, and dietary factors) up to 1996. A total of 562 students (242 males, 320 females) completed all five annual examinations. The main results are as follows: 1. For males, mean systolic and diastolic BP at ages 12 and 16 years were 108.7 mmHg and 118.1 mmHg (systolic) and 69.5 mmHg and 73.4 mmHg (diastolic), respectively; BP level was highest at 15 years of age. For females, mean systolic and diastolic BP at ages 12 and 16 years were 114.4 mmHg and 113.5 mmHg (systolic) and 75.2 mmHg and 72.1 mmHg (diastolic), respectively; BP level reached its highest point at 13-14 years of age. 2. Anthropometric variables (height, weight, body mass index, etc.) increased constantly during the study period for males, whereas for females the rate of increase declined after age 15 years. Serum total cholesterol decreased and triglyceride increased with age for males, but neither showed a significant trend for females. Total fat intake increased at age 16 years compared with age 14 years. The composition of carbohydrate, protein, and fat in total energy intake was 66.2:12.0:19.4 at age 14 and 64.1:12.1:21.8 at age 16. 3. Most anthropometric measures, especially height, body mass index (BMI), and triceps skinfold thickness, showed a significant correlation with BP level in both sexes. When BMI was adjusted for, serum total cholesterol showed a significant negative correlation with systolic BP at age 12 years in males, but at age 14 years the direction of the correlation changed to positive. In females, serum total cholesterol was negatively correlated with diastolic BP at ages 15 and 16 years. Triglyceride and creatinine showed positive correlations with systolic and diastolic BP in males but no correlation in females. There were no consistent findings between nutrient intake and BP level, although protein intake correlated positively with diastolic BP level in males. 4. Blood pressure change was positively associated with changes in BMI and serum total cholesterol in both sexes. Change in creatinine was associated with BP change positively in males and negatively in females. Male students with high sodium intake showed higher systolic and diastolic BP, and female students with high total fat intake maintained a lower level of BP. The major determinant of BP change was BMI in both sexes.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.23 no.3, pp.69-94, 2017
  • Recently, increasing demand for big data analysis has driven the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing large amounts of data, and data analysis technology is rapidly becoming popular. Attempts to acquire insights through data analysis are also continuously increasing, which means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has activated computer programming education and the development of many programs for data analysis. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading; as a result, big data analysis is expected to be performed by the demanders of analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, with much attention focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis; among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents corresponding to each issue, and provides the identified documents as clusters. It is regarded as very useful in that it reflects the semantic elements of documents. Traditional topic modeling is based on the distribution of key terms across the entire document set, so the entire set must be analyzed at once to identify the topic of each document. This makes the analysis time-consuming when topic modeling is applied to a large number of documents, and it creates a scalability problem: processing time increases exponentially with the number of analysis objects. The problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents are divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling on a large number of documents with limited system resources and improves processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without combining the analysis objects. However, despite many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified within each unit, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established; that is, assuming the global topics are the ideal answer, the deviation of local topics from global topics needs to be measured.
Because of these difficulties, this approach has been studied less than other topic modeling methods. In this paper, we propose a topic modeling approach that addresses the above two problems. First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegated documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set, and we also proposed a reasonable method for comparing the results of both approaches.
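
The paper's own mapping procedure is not spelled out in the abstract, so the Python sketch below only illustrates the general divide-and-conquer idea: fit LDA separately on each local set and on a reduced global set of delegated documents, then map each local topic to its most similar global topic by cosine similarity of the topic-word distributions. The toy documents, the choice of LDA, and the similarity-based mapping rule are assumptions.

```python
# Illustrative divide-and-conquer topic modeling with an assumed similarity-based mapping.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

local_sets = [["cloud server network", "network latency cloud"],
              ["gene therapy protein", "protein folding gene"]]
delegated_docs = ["cloud network latency", "gene protein therapy"]  # reduced global set (RGS)

# Shared vocabulary so that local and global topic-word distributions are comparable.
vectorizer = CountVectorizer().fit([d for s in local_sets for d in s] + delegated_docs)

global_lda = LatentDirichletAllocation(n_components=2, random_state=0)
global_lda.fit(vectorizer.transform(delegated_docs))

for i, docs in enumerate(local_sets):
    local_lda = LatentDirichletAllocation(n_components=2, random_state=0)
    local_lda.fit(vectorizer.transform(docs))
    sim = cosine_similarity(local_lda.components_, global_lda.components_)
    mapping = sim.argmax(axis=1)          # each local topic -> closest global (RGS) topic
    print(f"local set {i}: topic mapping {mapping}")
```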

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems, v.20 no.1, pp.163-176, 2014
  • Social media is becoming the platform on which users communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers the effort and investment users need for content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena by analyzing the chatter of microblogs; however, measuring television ratings has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch, and users interact with each other while watching television or movies or visiting a new place. For measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings; modeling the time-related characteristics of features is therefore key when measuring TV ratings through microblogs. We show that capturing the time dependency of features is vital for improving the accuracy of TV rating measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time; this stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings, which implies that a simple tweet rate does not reflect satisfaction with or response to TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons and newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find a time dependency in feature correlations between the periods before and after broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectations for the program or their disappointment at not being able to watch it. The features that are highly correlated before the broadcast differ from those after broadcasting, which shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show high relevance despite carrying a negative meaning.
Understanding the time dependency of features can help improve the accuracy of TV rating measurement. This research provides a basis for estimating the response to, or satisfaction with, broadcast programs using the time dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.
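
A minimal Python sketch of the kind of time-split correlation analysis the abstract describes: compute the Pearson correlation between a candidate word's tweet counts and the ratings separately for the before-broadcast and after-broadcast windows, then keep the window in which the correlation peaks. The data frame layout, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical per-episode data: counts of one candidate word in tweets posted
# before vs. after broadcast time, alongside the episode's TV rating.
df = pd.DataFrame({
    "window":     ["before", "before", "before", "after", "after", "after"],
    "word_count": [12, 30, 25, 5, 18, 9],
    "rating":     [8.1, 10.4, 9.7, 8.1, 10.4, 9.7],
})

# Pearson correlation of the word's counts with ratings, computed per time window.
corr_by_window = df.groupby("window").apply(lambda g: g["word_count"].corr(g["rating"]))
best_window = corr_by_window.idxmax()   # window in which this word is most relevant
print(corr_by_window, best_window)
```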

Geochemical Equilibria and Kinetics of the Formation of Brown-Colored Suspended/Precipitated Matter in Groundwater: Suggestion to Proper Pumping and Turbidity Treatment Methods (지하수내 갈색 부유/침전 물질의 생성 반응에 관한 평형 및 반응속도론적 연구: 적정 양수 기법 및 탁도 제거 방안에 대한 제안)

  • 채기탁;윤성택;염승준;김남진;민중혁
    • Journal of the Korean Society of Groundwater Environment, v.7 no.3, pp.103-115, 2000
  • The formation of brown-colored precipitates is one of the serious problems frequently encountered in the development and supply of groundwater in Korea, because the water then exceeds drinking water standards for color, taste, turbidity, and dissolved iron concentration, and the precipitates often cause scaling problems within the water supply system. In groundwater from the Pajoo area, brown precipitates typically form within a few hours after pumping-out. In this paper we examine the formation process of the brown precipitates using equilibrium thermodynamic and kinetic approaches, in order to understand the origin and geochemical pathway of turbidity generation in groundwater. The results are used to suggest both a proper pumping technique to minimize the formation of precipitates and an optimal design of water treatment to improve water quality. The bedrock groundwater in the Pajoo area belongs to the Ca-HCO3 type, which evolved through water/rock (gneiss) interaction. Based on SEM-EDS and XRD analyses, the precipitates are identified as amorphous, Fe-bearing oxides or hydroxides. Using multi-step filtration with pore sizes of 6, 4, 1, 0.45, and 0.2 μm, the precipitates mostly fall in the colloidal size range (1 to 0.45 μm) but are concentrated (about 81%) in the range of 1 to 6 μm in terms of mass (weight) distribution. The large amounts of dissolved iron possibly originated from the dissolution of clinochlore in cataclasite, which contains high amounts of Fe (up to 3 wt.%). Calculation of saturation indices (using the computer code PHREEQC) and examination of pH-Eh stability relations also indicate that the final precipitates are Fe-oxy-hydroxides formed by a change of water chemistry (mainly oxidation) due to exposure to oxygen during the pumping-out of Fe(II)-bearing, reduced groundwater. After pumping-out, the groundwater shows progressive decreases in pH, DO, and alkalinity with elapsed time, whereas turbidity first increases and then decreases. The decrease of dissolved Fe concentration as a function of elapsed time after pumping-out is expressed by the regression equation Fe(II) = 10.1 exp(-0.0009t). The oxidation reaction caused by the influx of free oxygen during the pumping and storage of groundwater results in the formation of brown precipitates, and this depends on time, PO2, and pH. Therefore, to obtain drinkable water quality, the precipitates should be removed by filtering after stepwise storage and aeration in tanks of sufficient volume for a sufficient time. The particle size distribution data also suggest that stepwise filtration would be cost-effective. To minimize scaling within wells, continued pumping (if possible) within the optimum pumping rate is recommended, because this technique is most effective for minimizing mixing between deep Fe(II)-rich water and shallow O2-rich water. Simultaneous pumping of shallow O2-rich water from different wells is also recommended.
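
The reported decay regression can be used directly to estimate how long pumped water must be stored and aerated before the dissolved Fe(II) falls below a treatment target. The Python sketch below simply evaluates Fe(II) = 10.1 exp(-0.0009 t) and inverts it; the abstract does not state the time or concentration units, so both are treated as assumptions, and the 0.3 target is hypothetical.

```python
import numpy as np

def fe2_remaining(t):
    """Dissolved Fe(II) remaining after elapsed time t, from the reported regression.
    Units of t and of the concentration are not stated in the abstract (assumed here)."""
    return 10.1 * np.exp(-0.0009 * t)

# Elapsed time needed for Fe(II) to drop below a hypothetical treatment target of 0.3,
# obtained by inverting the exponential: t = ln(10.1 / target) / 0.0009.
target = 0.3
t_needed = np.log(10.1 / target) / 0.0009

print(fe2_remaining(1000.0), t_needed)
```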
