• Title/Summary/Keyword: Research Management System

Search Result 13,995, Processing Time 0.048 seconds

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.1-17
    • /
    • 2015
  • In this paper, we report what we have observed with regards to a prediction model for the military based on enlisted men's internal(cumulative records) and external data(SNS data). This work is significant in the military's efforts to supervise them. In spite of their effort, many commanders have failed to prevent accidents by their subordinates. One of the important duties of officers' work is to take care of their subordinates in prevention unexpected accidents. However, it is hard to prevent accidents so we must attempt to determine a proper method. Our motivation for presenting this paper is to mate it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. It requires considerable time and effort and results in a significant difference depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data which can easily be obtained. Recently, records of enlisted men as well as SNS communication between commanders and soldiers, make it possible to predict and prevent accidents. This paper concerns the application of data mining to identify their interests, predict accidents and make use of internal and external data (SNS). We propose both a topic analysis and decision tree method. The study is conducted in two steps. First, topic analysis is conducted through the SNS of enlisted men. Second, the decision tree method is used to analyze the internal data with the results of the first analysis. The dependent variable for these analysis is the presence of any accidents. In order to analyze their SNS, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases; collecting, topic analysis, and converting topic analysis results into points for using independent variables. In the first phase, we collect enlisted men's SNS data by commender's ID. After gathering unstructured SNS data, the topic analysis phase extracts issues from them. For simplicity, 5 topics(vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, using these 5 topics, we quantify them as personal points. After quantifying their topic, we include these results in independent variables which are composed of 15 internal data sets. Then, we make two decision trees. The first tree is composed of their internal data only. The second tree is composed of their external data(SNS) as well as their internal data. After that, we compare the results of misclassification from SAS E-miner. The first model's misclassification is 12.1%. On the other hand, second model's misclassification is 7.8%. This method predicts accidents with an accuracy of approximately 92%. The gap of the two models is 4.3%. Finally, we test if the difference between them is meaningful or not, using the McNemar test. The result of test is considered relevant.(p-value : 0.0003) This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Additionally, various independent variables used in the decision tree model are used as categorical variables instead of continuous variables. So it suffers a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. Our proposed methodology can provide support to decision-making in the military. This study is expected to contribute to the prevention of accidents in the military based on scientific analysis of enlisted men and proper management of them.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

DEVELOPMENT OF STATEWIDE TRUCK TRAFFIC FORECASTING METHOD BY USING LIMITED O-D SURVEY DATA (한정된 O-D조사자료를 이용한 주 전체의 트럭교통예측방법 개발)

  • 박만배
    • Proceedings of the KOR-KST Conference
    • /
    • 1995.02a
    • /
    • pp.101-113
    • /
    • 1995
  • The objective of this research is to test the feasibility of developing a statewide truck traffic forecasting methodology for Wisconsin by using Origin-Destination surveys, traffic counts, classification counts, and other data that are routinely collected by the Wisconsin Department of Transportation (WisDOT). Development of a feasible model will permit estimation of future truck traffic for every major link in the network. This will provide the basis for improved estimation of future pavement deterioration. Pavement damage rises exponentially as axle weight increases, and trucks are responsible for most of the traffic-induced damage to pavement. Consequently, forecasts of truck traffic are critical to pavement management systems. The pavement Management Decision Supporting System (PMDSS) prepared by WisDOT in May 1990 combines pavement inventory and performance data with a knowledge base consisting of rules for evaluation, problem identification and rehabilitation recommendation. Without a r.easonable truck traffic forecasting methodology, PMDSS is not able to project pavement performance trends in order to make assessment and recommendations in the future years. However, none of WisDOT's existing forecasting methodologies has been designed specifically for predicting truck movements on a statewide highway network. For this research, the Origin-Destination survey data avaiiable from WisDOT, including two stateline areas, one county, and five cities, are analyzed and the zone-to'||'&'||'not;zone truck trip tables are developed. The resulting Origin-Destination Trip Length Frequency (00 TLF) distributions by trip type are applied to the Gravity Model (GM) for comparison with comparable TLFs from the GM. The gravity model is calibrated to obtain friction factor curves for the three trip types, Internal-Internal (I-I), Internal-External (I-E), and External-External (E-E). ~oth "macro-scale" calibration and "micro-scale" calibration are performed. The comparison of the statewide GM TLF with the 00 TLF for the macro-scale calibration does not provide suitable results because the available 00 survey data do not represent an unbiased sample of statewide truck trips. For the "micro-scale" calibration, "partial" GM trip tables that correspond to the 00 survey trip tables are extracted from the full statewide GM trip table. These "partial" GM trip tables are then merged and a partial GM TLF is created. The GM friction factor curves are adjusted until the partial GM TLF matches the 00 TLF. Three friction factor curves, one for each trip type, resulting from the micro-scale calibration produce a reasonable GM truck trip model. A key methodological issue for GM. calibration involves the use of multiple friction factor curves versus a single friction factor curve for each trip type in order to estimate truck trips with reasonable accuracy. A single friction factor curve for each of the three trip types was found to reproduce the 00 TLFs from the calibration data base. Given the very limited trip generation data available for this research, additional refinement of the gravity model using multiple mction factor curves for each trip type was not warranted. In the traditional urban transportation planning studies, the zonal trip productions and attractions and region-wide OD TLFs are available. However, for this research, the information available for the development .of the GM model is limited to Ground Counts (GC) and a limited set ofOD TLFs. The GM is calibrated using the limited OD data, but the OD data are not adequate to obtain good estimates of truck trip productions and attractions .. Consequently, zonal productions and attractions are estimated using zonal population as a first approximation. Then, Selected Link based (SELINK) analyses are used to adjust the productions and attractions and possibly recalibrate the GM. The SELINK adjustment process involves identifying the origins and destinations of all truck trips that are assigned to a specified "selected link" as the result of a standard traffic assignment. A link adjustment factor is computed as the ratio of the actual volume for the link (ground count) to the total assigned volume. This link adjustment factor is then applied to all of the origin and destination zones of the trips using that "selected link". Selected link based analyses are conducted by using both 16 selected links and 32 selected links. The result of SELINK analysis by u~ing 32 selected links provides the least %RMSE in the screenline volume analysis. In addition, the stability of the GM truck estimating model is preserved by using 32 selected links with three SELINK adjustments, that is, the GM remains calibrated despite substantial changes in the input productions and attractions. The coverage of zones provided by 32 selected links is satisfactory. Increasing the number of repetitions beyond four is not reasonable because the stability of GM model in reproducing the OD TLF reaches its limits. The total volume of truck traffic captured by 32 selected links is 107% of total trip productions. But more importantly, ~ELINK adjustment factors for all of the zones can be computed. Evaluation of the travel demand model resulting from the SELINK adjustments is conducted by using screenline volume analysis, functional class and route specific volume analysis, area specific volume analysis, production and attraction analysis, and Vehicle Miles of Travel (VMT) analysis. Screenline volume analysis by using four screenlines with 28 check points are used for evaluation of the adequacy of the overall model. The total trucks crossing the screenlines are compared to the ground count totals. L V/GC ratios of 0.958 by using 32 selected links and 1.001 by using 16 selected links are obtained. The %RM:SE for the four screenlines is inversely proportional to the average ground count totals by screenline .. The magnitude of %RM:SE for the four screenlines resulting from the fourth and last GM run by using 32 and 16 selected links is 22% and 31 % respectively. These results are similar to the overall %RMSE achieved for the 32 and 16 selected links themselves of 19% and 33% respectively. This implies that the SELINICanalysis results are reasonable for all sections of the state.Functional class and route specific volume analysis is possible by using the available 154 classification count check points. The truck traffic crossing the Interstate highways (ISH) with 37 check points, the US highways (USH) with 50 check points, and the State highways (STH) with 67 check points is compared to the actual ground count totals. The magnitude of the overall link volume to ground count ratio by route does not provide any specific pattern of over or underestimate. However, the %R11SE for the ISH shows the least value while that for the STH shows the largest value. This pattern is consistent with the screenline analysis and the overall relationship between %RMSE and ground count volume groups. Area specific volume analysis provides another broad statewide measure of the performance of the overall model. The truck traffic in the North area with 26 check points, the West area with 36 check points, the East area with 29 check points, and the South area with 64 check points are compared to the actual ground count totals. The four areas show similar results. No specific patterns in the L V/GC ratio by area are found. In addition, the %RMSE is computed for each of the four areas. The %RMSEs for the North, West, East, and South areas are 92%, 49%, 27%, and 35% respectively, whereas, the average ground counts are 481, 1383, 1532, and 3154 respectively. As for the screenline and volume range analyses, the %RMSE is inversely related to average link volume. 'The SELINK adjustments of productions and attractions resulted in a very substantial reduction in the total in-state zonal productions and attractions. The initial in-state zonal trip generation model can now be revised with a new trip production's trip rate (total adjusted productions/total population) and a new trip attraction's trip rate. Revised zonal production and attraction adjustment factors can then be developed that only reflect the impact of the SELINK adjustments that cause mcreases or , decreases from the revised zonal estimate of productions and attractions. Analysis of the revised production adjustment factors is conducted by plotting the factors on the state map. The east area of the state including the counties of Brown, Outagamie, Shawano, Wmnebago, Fond du Lac, Marathon shows comparatively large values of the revised adjustment factors. Overall, both small and large values of the revised adjustment factors are scattered around Wisconsin. This suggests that more independent variables beyond just 226; population are needed for the development of the heavy truck trip generation model. More independent variables including zonal employment data (office employees and manufacturing employees) by industry type, zonal private trucks 226; owned and zonal income data which are not available currently should be considered. A plot of frequency distribution of the in-state zones as a function of the revised production and attraction adjustment factors shows the overall " adjustment resulting from the SELINK analysis process. Overall, the revised SELINK adjustments show that the productions for many zones are reduced by, a factor of 0.5 to 0.8 while the productions for ~ relatively few zones are increased by factors from 1.1 to 4 with most of the factors in the 3.0 range. No obvious explanation for the frequency distribution could be found. The revised SELINK adjustments overall appear to be reasonable. The heavy truck VMT analysis is conducted by comparing the 1990 heavy truck VMT that is forecasted by the GM truck forecasting model, 2.975 billions, with the WisDOT computed data. This gives an estimate that is 18.3% less than the WisDOT computation of 3.642 billions of VMT. The WisDOT estimates are based on the sampling the link volumes for USH, 8TH, and CTH. This implies potential error in sampling the average link volume. The WisDOT estimate of heavy truck VMT cannot be tabulated by the three trip types, I-I, I-E ('||'&'||'pound;-I), and E-E. In contrast, the GM forecasting model shows that the proportion ofE-E VMT out of total VMT is 21.24%. In addition, tabulation of heavy truck VMT by route functional class shows that the proportion of truck traffic traversing the freeways and expressways is 76.5%. Only 14.1% of total freeway truck traffic is I-I trips, while 80% of total collector truck traffic is I-I trips. This implies that freeways are traversed mainly by I-E and E-E truck traffic while collectors are used mainly by I-I truck traffic. Other tabulations such as average heavy truck speed by trip type, average travel distance by trip type and the VMT distribution by trip type, route functional class and travel speed are useful information for highway planners to understand the characteristics of statewide heavy truck trip patternS. Heavy truck volumes for the target year 2010 are forecasted by using the GM truck forecasting model. Four scenarios are used. Fo~ better forecasting, ground count- based segment adjustment factors are developed and applied. ISH 90 '||'&'||' 94 and USH 41 are used as example routes. The forecasting results by using the ground count-based segment adjustment factors are satisfactory for long range planning purposes, but additional ground counts would be useful for USH 41. Sensitivity analysis provides estimates of the impacts of the alternative growth rates including information about changes in the trip types using key routes. The network'||'&'||'not;based GMcan easily model scenarios with different rates of growth in rural versus . . urban areas, small versus large cities, and in-state zones versus external stations. cities, and in-state zones versus external stations.

  • PDF

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.59-77
    • /
    • 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, performs a variety of analyses for firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In the paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches to building the predictive models from the perspective of two different analyses. The first is the analysis period. We divide the analysis period into before and after the IMF financial crisis, and examine whether there is the difference between the two periods. The second is the prediction time. In order to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. Therefore Total six prediction models are developed and analyzed. In this paper, we employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, C5.0, etc. Among them, we use C5.0 algorithm which is the most recently developed algorithm and yields performance better than other algorithms. We obtained data for the rights issue and financial analysis from TS2000 of Korea Listed Companies Association. A record of financial analysis data is consisted of 89 variables which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices. For the model building and test, we used 10,925 financial analysis data of total 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. Total 84 variables among financial analysis data are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop prediction models using C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of data for model building and 40% of data for model test. The results of experimental analysis show that the prediction accuracies of data after the IMF financial crisis (59.04% to 60.43%) are about 10 percent higher than ones before IMF financial crisis (68.78% to 71.41%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and the firm intention of rights issue has been more obvious. The experiment results also show that the stability-related indices have a major impact on conducting rights issue in the case of short-term prediction. On the other hand, the long-term prediction of conducting rights issue is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of significant variables. This means that companies in different types of industries show their different types of patterns for rights issue. We conclude that it is desirable for stakeholders to take into account stability-related indices and more various financial analysis indices for short-term prediction and long-term prediction, respectively. The current study has several limitations. First, we need to compare the differences in accuracy by using different data mining techniques such as neural networks, logistic regression and SVM. Second, we are required to develop and to evaluate new prediction models including variables which research in the theory of capital structure has mentioned about the relevance to rights issue.

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.111-124
    • /
    • 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, one of the chronic diseases. Prior studies applying data mining techniques for predicting disease can be classified into a model design study for predicting cardiovascular disease and a study comparing disease prediction research results. In the case of foreign literatures, studies predicting cardiovascular disease were predominant in predicting disease using data mining techniques. Although domestic studies were not much different from those of foreign countries, studies focusing on hypertension and diabetes were mainly conducted. Since hypertension and diabetes as well as chronic diseases, hyperlipidemia, are also of high importance, this study selected hyperlipidemia as the disease to be analyzed. We also developed a model for predicting hyperlipidemia using SVM and meta learning algorithms, which are already known to have excellent predictive power. In order to achieve the purpose of this study, we used data set from Korea Health Panel 2012. The Korean Health Panel produces basic data on the level of health expenditure, health level and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the hospitalized, outpatient, emergency, and chronic disease data of the Korean Health Panel in 2012, and 1,088 nonpatients were also randomly extracted. A total of 2,176 people were selected for the study. Three methods were used to select input variables for predicting hyperlipidemia. First, stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables(except for length of smoking) are expressed as dummy variables, which are assumed to be separate variables on the basis of the reference group, and these variables were analyzed. Six variables (age, BMI, education level, marital status, smoking status, gender) excluding income level and smoking period were selected based on significance level 0.1. Second, C4.5 as a decision tree algorithm is used. The significant input variables were age, smoking status, and education level. Finally, C4.5 as a decision tree algorithm is used. In SVM, the input variables selected by genetic algorithms consisted of 6 variables such as age, marital status, education level, economic activity, smoking period, and physical activity status, and the input variables selected by genetic algorithms in artificial neural network consist of 3 variables such as age, marital status, and education level. Based on the selected parameters, we compared SVM, meta learning algorithm and other prediction models for hyperlipidemia patients, and compared the classification performances using TP rate and precision. The main results of the analysis are as follows. First, the accuracy of the SVM was 88.4% and the accuracy of the artificial neural network was 86.7%. Second, the accuracy of classification models using the selected input variables through stepwise method was slightly higher than that of classification models using the whole variables. Third, the precision of artificial neural network was higher than that of SVM when only three variables as input variables were selected by decision trees. As a result of classification models based on the input variables selected through the genetic algorithm, classification accuracy of SVM was 88.5% and that of artificial neural network was 87.9%. Finally, this study indicated that stacking as the meta learning algorithm proposed in this study, has the best performance when it uses the predicted outputs of SVM and MLP as input variables of SVM, which is a meta classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases. To do this, we used SVM and meta-learning algorithms, which is known to have high accuracy. As a result, the accuracy of classification of hyperlipidemia in the stacking as a meta learner was higher than other meta-learning algorithms. However, the predictive performance of the meta-learning algorithm proposed in this study is the same as that of SVM with the best performance (88.6%) among the single models. The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. In the case with a large number of categorical variables, the results may be different if continuous variables are used because the model can be better suited to categorical variables such as decision trees than general models such as neural networks. Despite these limitations, this study has significance in predicting hyperlipidemia with hybrid models such as met learning algorithms which have not been studied previously. It can be said that the result of improving the model accuracy by applying various variable selection techniques is meaningful. In addition, it is expected that our proposed model will be effective for the prevention and management of hyperlipidemia.

『황제내경소문(黃帝內經素問)·칠편대론(七篇大論)』 왕빙 주본(注本)을 통(通)한 운기학설(運氣學說) 관(關)한 연구(硏究)

  • Kim, Gi-Uk;Park, Hyeon-Guk
    • The Journal of Dong Guk Oriental Medicine
    • /
    • v.4
    • /
    • pp.109-140
    • /
    • 1995
  • As we considered in the main subjects, investigations on the theory of 'Doctrine on five elements' motion and six kinds of natural factors(運氣學說)' through 'Wang Bing's Commentary(王氷 注本)' of 'The seven great chapters in The Yellow Emperor's Internal Classic Su Wen' ("黃帝內經素問 七篇大論") are as follows. (1) In The seven great chapters("七篇大論")' Wang Bing supplement theory and in the academic aspects as a interpreter, judging from 'forget(亡)' character. expressed in the 'The missing chapters("素問遺篇")', 'Bonbyung-ron("本病論")' and 'Jabeob-ron(刺法論)', 'The seven great chapters("七篇大論")' must be supplementary work by Wang Bing. Besides, he quoted such forty books as medical books, taoist books, confucianist books, miscellaneous books, etc in the commentary and the contents quoted in the 'Su Wen(素問)' and 'Ling Shu("靈樞")' scripture nearly occupy in the book. As a method of interpreting scripiure as scripture, he edited the order of 'Internal Classic("內經")' ascended from the ancient time and when he compensated for commentary, with exhaustive scholarly mind and by observing the natural phenomena practically and writing the pathology and the methods of treatment. We knew that the book is combined with the study of 'Doctrine on five elements motion and six kinds of natural factors(運氣學說)' (2) When we compare, analyze the similar phrase of 'The seven great chapters in The Yellow Emperor's Internal Classic Su Wen'("黃帝內經素問ㆍ七篇大論") through 'Wang Bing's Commentary(王氷 注本)', he tells abouts organized 'five elements(五行)' and 'heaven's regularly movement(天道運行)' rather than 'Emyangengsangdae-ron("陰陽應象大論")' in 'The seven great chapters("七篇大論")'. Also the 'Ohanunhangdae-ron("五運行大論")' because the repeated sentences with 'Emyangengsangdae-ron("陰陽應象大論")' is long they are omitted. And in the 'Youkmijidae-ron("六微旨大論")', 'Cheonjin ideology(天眞四象)' based on the 'Sanggocheonjin- ron("上古天眞論")', 'Sagijosindae-ron("四氣調神大論")' is written and in the 'Gigoupyondae-ron("氣交變大論")', the syndrome and symptom are explained in detail rather than 'Janggibeobsi-ron("藏氣法時論")', 'Okgijinjang-ron ("玉機眞藏論")' and in the 'Osangieongdae-ron("五常政大論")', the concept of 'five element(五行)' of the 'Gemgwejineon-ron("金櫃眞言論")' is expanded to 'the five elements' motion concept(五運槪念)' and in the 'Youkwonjeonggidae-ron("六元正紀大論")', explanations of 'The five elements' motion and six kinds of natural factors(運氣)' function are mentioned mainly and instead systematic pathology is not revealed rather than 'Emyangengsangdae-ron("陰陽應象大論")'. And in the 'Jijinyodae-ron("至眞要大論")', explanations of the change of atmosphere which correspond to treatment principle by 'The three Yin and Yang(三陰三陽)' as a progressed concepts are revealed. Therefore there are much similarity between the phrase of 'Emyangengsangdae-ron("陰陽應象大論")' and 'chapters of addition(補缺之篇)'. Generally, the doctrine which 'The seven great chapters("七篇大論")' are added by Wang Bing(王氷) is supported because there are more profound concepts rather than the other chapter in 'The seven great chapters("七篇大論")'. (3) When we study Wang Bing's(王氷) 'Pattern on five elements motion and six kinds of natural factors(運氣格局)' in 'The seven great chapter("七篇大論")', in the 'Cheonwongi-dae-ron("天元紀大論")', With 'Cheonjin ideology(天眞思想)' and the concepts of 'Owang(旺)'${\cdot}$'Sang(相)'${\cdot}$'Sa(死)'${\cdot}$'Su(囚)'${\cdot}$'Hu(休)' and 'Cheonbu(天符)'${\cdot}$'Sehwoi(歲會)' are measured time-spacially to the concept of 'Three Sum(三合)' the concept of 'Taeulcheonbu(太乙天符)' is explained. In the 'Ounhangdae-ron("五運行大論")', 'The calender Signs five Sum(天干五合)' is compared to the concepts of 'couples(夫婦)', 'weak-strong(柔强)' and in the 'Youkmijidae-ron("六微旨大論")', 'the relationship of obedience and disobedience(順逆關係)' which conform to the 'energy status(氣位)' change and 'monarch-minister(君相)' position is mentioned. In the 'Gikyobyeondae-ron("氣交變大論")', the concept of 'Sang-duk(相得)', 'Pyungsang(平常)' is emphasized but concrete measurement is mentioned. In the 'Osangieongdae-ron("五常政大論")', the detailed explanation with twenty three 'systemic of the five elements' motion(五運體系)' form and 'rountine-contrary treatment(正治. 反治)' with 'chill-fever-warm-cold(寒${\cdot}$${\cdot}$${\cdot}$凉)' are mentioned according to the 'analyse and differentiate pathological conditions in accordance with the eight principal syndromes(八綱辨證)'. In the 'Youkwonjeonggidae-ron("六元正紀大論")', Wang Bing of doesn't mention the concepts of 'Jungwun(中運)' that is seen in the original classic. In the new corrective edition, as the concepts of 'Jungwun, Dongcheonbu, Dongsehae and Taeulcheonbu(中運, 同天符, 同歲會, 太乙天符)' is appeared, Wang Bing seems to only use the concepts of 'Daewun, Juwun, and Gaekwun(大運, 主運, 客運)'. In the 'Jijinyodaeron("至眞要大論")', Wang Bing added detailed commentary to pathology and treatment doctrine by explaining the numerous appearances of 'Sebo, sufficiency, deficiency(歲步, 有餘, 不足)' and in the relation of 'victory-defeat(勝復)', he argued clearly that it is not mechanical estimation. (4) When we observe the Wang Bing's originality on the study of 'the theory of Doctrine on five elements' motion and six kinds of natural factors(運氣學說)', he emphasized 'The idea of Jeongindogi and Health preserving(全眞導氣${\cdot}$養生思想)' by adding 'Wang Bing's Commentary(王氷 注本)' of 'The seven great chapters("七篇大論")' and explained clearly 'The theory of Doctrine on five elements' motion and six kinds of natural factors(運氣學說)' and simpled and expanded the meaning of 'man, as a microcosm, is connected with the macrocosm(天人相應)' and with 'Atmosphere theory(大氣論)' also explained the meaning of 'rising and falling mechanism(升降氣機)'. In the sentence of 'By examining the pathology, take care of your health(審察病機 無失氣宜)'. he explained the meaning of pathology of 'heart-kidney-water-fire(心腎水火)' and suggested the doctrine and management of prescription. In the estimation and treatment, by suggesting 'asthenia and sthenia(虛實)' two method's estimation, 'contrary treatment(反治)' and treatment principals of 'falling heart fire tonifyng kidney water(降心火益腎水)', 'two class of chill and fever(寒熱二綱)' were demonstrated. There are 'inside and outside in the illness and so inner and outer in the treatment(病有中外 治有表囊)'. This sentence suggests concertedly. 'two class of superfies and interior(表囊二綱)' conforming to the position of disease. Therefore Wang Bing as an excellent theorist and introduced 'Cheoniin ideology(天眞思想)' as a clinician and realized the medical science. With these accomplishes mainly written in 'The theory of Doctrine on five elements' motion and six kinds of natural factors(運氣學說)' of 'The seven great chapters("七篇大論")', he interpreted the ancient medical scriptures and expanded the meaning of scriptures and conclusively contributed to the development of the study 'Korean Oriental Medicine(韓醫學)'.

  • PDF

Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

  • Park, Hyun-Jung;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.19-43
    • /
    • 2014
  • The Pareto principle, also known as the 80-20 rule, states that roughly 80% of the effects come from 20% of the causes for many events including natural phenomena. It has been recognized as a golden rule in business with a wide application of such discovery like 20 percent of customers resulting in 80 percent of total sales. On the other hand, the Long Tail theory, pointing out that "the trivial many" produces more value than "the vital few," has gained popularity in recent times with a tremendous reduction of distribution and inventory costs through the development of ICT(Information and Communication Technology). This study started with a view to illuminating how these two primary business paradigms-Pareto principle and Long Tail theory-relates to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization transcending geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors to affect knowledge sharing, seeking to boost individual knowledge sharing and resolve the social dilemma caused from the fact that rational individuals are likely to rather consume than contribute knowledge. Knowledge collaboration can be defined as the creation of knowledge by not only sharing knowledge, but also by transforming and integrating such knowledge. In this perspective of knowledge collaboration, the relative distribution of knowledge sharing among participants can count as much as the absolute amounts of individual knowledge sharing. In particular, whether the more contribution of the upper 20 percent of participants in knowledge sharing will enhance the efficiency of overall knowledge collaboration is an issue of interest. This study deals with the effect of this sort of knowledge sharing distribution on the efficiency of knowledge collaboration and is extended to reflect the work characteristics. All analyses were conducted based on actual data instead of self-reported questionnaire surveys. More specifically, we analyzed the collaborative behaviors of editors of 2,978 English Wikipedia featured articles, which are the best quality grade of articles in English Wikipedia. We adopted Pareto ratio, the ratio of the number of knowledge contribution of the upper 20 percent of participants to the total number of knowledge contribution made by the total participants of an article group, to examine the effect of Pareto principle. In addition, Gini coefficient, which represents the inequality of income among a group of people, was applied to reveal the effect of inequality of knowledge contribution. Hypotheses were set up based on the assumption that the higher ratio of knowledge contribution by more highly motivated participants will lead to the higher collaboration efficiency, but if the ratio gets too high, the collaboration efficiency will be exacerbated because overall informational diversity is threatened and knowledge contribution of less motivated participants is intimidated. Cox regression models were formulated for each of the focal variables-Pareto ratio and Gini coefficient-with seven control variables such as the number of editors involved in an article, the average time length between successive edits of an article, the number of sections a featured article has, etc. The dependent variable of the Cox models is the time spent from article initiation to promotion to the featured article level, indicating the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary depending on the characteristics of a group task, we classified 2,978 featured articles into two categories: Academic and Non-academic. Academic articles refer to at least one paper published at an SCI, SSCI, A&HCI, or SCIE journal. We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The analysis results indicate the followings; First, Pareto ratio and inequality of knowledge sharing relates in a curvilinear fashion to the collaboration efficiency in an online community, promoting it to an optimal point and undermining it thereafter. Second, the curvilinear effect of Pareto ratio and inequality of knowledge sharing on the collaboration efficiency is more sensitive with a more academic task in an online community.

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.79-104
    • /
    • 2020
  • Recently, as deep learning has attracted attention, the use of deep learning is being considered as a method for solving problems in various fields. In particular, deep learning is known to have excellent performance when applied to applying unstructured data such as text, sound and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interests in image captioning technology and its application is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling both image comprehension and text generation simultaneously. In spite of the high entry barrier of image captioning that analysts should be able to process both image and text data, image captioning has established itself as one of the key fields in the A.I. research owing to its various applicability. In addition, many researches have been conducted to improve the performance of image captioning in various aspects. Recent researches attempt to create advanced captions that can not only describe an image accurately, but also convey the information contained in the image more sophisticatedly. Despite many recent efforts to improve the performance of image captioning, it is difficult to find any researches to interpret images from the perspective of domain experts in each field not from the perspective of the general public. Even for the same image, the part of interests may differ according to the professional field of the person who has encountered the image. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to recognize the image from a holistic and general perspective, that is, from the perspective of identifying the image's constituent objects and their relationships. On the contrary, the domain experts tend to recognize the image by focusing on some specific elements necessary to interpret the given image based on their expertise. It implies that meaningful parts of an image are mutually different depending on viewers' perspective even for the same image. So, image captioning needs to implement this phenomenon. Therefore, in this study, we propose a method to generate captions specialized in each domain for the image by utilizing the expertise of experts in the corresponding domain. Specifically, after performing pre-training on a large amount of general data, the expertise in the field is transplanted through transfer-learning with a small amount of expertise data. However, simple adaption of transfer learning using expertise data may invoke another type of problems. Simultaneous learning with captions of various characteristics may invoke so-called 'inter-observation interference' problem, which make it difficult to perform pure learning of each characteristic point of view. For learning with vast amount of data, most of this interference is self-purified and has little impact on learning results. On the contrary, in the case of fine-tuning where learning is performed on a small amount of data, the impact of such interference on learning can be relatively large. To solve this problem, therefore, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each character. In order to confirm the feasibility of the proposed methodology, we performed experiments utilizing the results of pre-training on MSCOCO dataset which is comprised of 120,000 images and about 600,000 general captions. Additionally, according to the advice of an art therapist, about 300 pairs of 'image / expertise captions' were created, and the data was used for the experiments of expertise transplantation. As a result of the experiment, it was confirmed that the caption generated according to the proposed methodology generates captions from the perspective of implanted expertise whereas the caption generated through learning on general data contains a number of contents irrelevant to expertise interpretation. In this paper, we propose a novel approach of specialized image interpretation. To achieve this goal, we present a method to use transfer learning and generate captions specialized in the specific domain. In the future, by applying the proposed methodology to expertise transplant in various fields, we expected that many researches will be actively conducted to solve the problem of lack of expertise data and to improve performance of image captioning.

A Study on the Spatial Structure of Eupchi(邑治) and Landscape Architecture of Provincial Government Office(地方官衙) in the Late Joseon Dynasty through 'Sukchunjeahdo(宿踐諸衙圖)' - Focused on the Youngyuhyun Pyeongan Province and Sincheongun Hwanghae Province - (『숙천제아도(宿踐諸衙圖)』를 통해 본 조선시대 읍치(邑治)의 공간구조와 관아(官衙) 조경 - 평안도 영유현과 황해도 신천군을 중심으로 -)

  • Shin, Sang sup;Lee, Seung yoen
    • Korean Journal of Heritage: History & Science
    • /
    • v.49 no.2
    • /
    • pp.86-103
    • /
    • 2016
  • 'Sukchunjeahdo' illustration-book, which was left by Han, Pil-gyo(韓弼敎 : 1807~1878)in the late Joseon Dynasty, includes pictorial record paintings containing government offices, Eupchi, and Feng Shui condition drawn by Gyehwa(界畵) method Sabangjeondomyobeop(四方顚倒描法) and is the rare historical material that help to understand spatial structure and landscape characteristics. Youngyuhyun(永柔縣) and Sincheongun(信川郡) town, the case sites of this study, show Feng Shui foundation structure and placement rules of government offices in the Joseon Period are applied such as 3Dan 1Myo(三壇一廟 : Sajikdan, Yeodan, Seonghwangdan, Hyanggyo), 3Mun 3Jo(三門三朝 : Oeah, Dongheon, Naeah) and Jeonjohuchim(前朝後寢) etc. by setting the upper and lower hierarchy of the north south central axis. The circulation system is the pattern that roads are segmented around the marketplace of the entrance of the town and the structure is that heading to the north along the internal way leads to the government office and going out to the main street leads to the major city. Baesanimsu(背山臨水 : Mountain in backward and water in front) foundation, back hill pine forest, intentionally created low mountains and town forest etc. showed landscape aesthetics well suited for the environmental comfort condition such as microclimate control, natural disaster prevention, psychological stability reflecting color constancy principle etc. and tower pavilions were built throughout the scenic spot, reflecting life philosophy and thoughts of contemporaries such as physical and mental discipline, satisfied at the reality of poverty, returning to nature etc. For government office landscape, shielding and buffer planting, landscape planting etc. were considered around Gaeksa(客舍), Dongheon(東軒), Naeah(內衙) backyard and deciduous tree s and flowering trees were cultivated as main species and in case of Gaeksa, tiled pavilions and pavilions topped with poke weed in tetragonal pond were introduced to Dongheon and Naeah and separate pavilions were built for the purpose of physical and mental discipline and military training such as archery. Back hill pine tree forest formed back landscape and zelkova, pear trees, willow trees, old pine trees, lotus, flowering trees etc. were cultivated as gardening trees and Feng-Shui forest with willow trees as its main species was created for landscape and practical purposes. On the other hand, various cultural landscape elements etc. were introduced such as pavilions, pond serving as fire protection water(square and circle), stone pagoda and stone Buddha, fountains and wells, monument houses, flagpoles etc. In case of Sincheongun town forest(邑藪), Manhagwan(挽河觀), Moonmujeong(文武井), Sangjangdae(上場岱) and Hajangdae(下場岱) Market place, Josanshup<(造山藪 : Dongseojanglim(東西長林)>, Namcheon(南川) etc. were combined and community cultural park with the nature of modern urban park was operated. In this context, government office landscape shows the garden management aspect where square pond and pavilions, flowering trees are harmonized around side pavilion and backyard. Also, environmental design technique not biased to aesthetics and ideological moral philosophy and comprehensively considering functionality (shielding and fire prevention, microclimate control, etc.) and environmental soundness etc. is working.

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.