• Title/Summary/Keyword: Statistical classification


Social Tagging-based Recommendation Platform for Patented Technology Transfer (특허의 기술이전 활성화를 위한 소셜 태깅기반 지적재산권 추천플랫폼)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.53-77
    • /
    • 2015
  • Korea has witnessed an increasing number of domestic patent applications, but the majority are not utilized to their full potential and end up becoming obsolete. According to the 2012 National Congress' Inspection of Administration, about 73% of the patents held by universities and publicly funded research institutions failed to create social value and remain latent. One of the main causes of this problem is that patent creators, such as individual researchers, universities, or research institutions, lack the ability to commercialize their patents into viable businesses with the enterprises that need them. On the enterprise side, it is also hard to find appropriate patents by keyword search on every occasion. This study proposes a patent recommendation system that can identify and recommend intellectual property rights matching users' fields of interest, more easily and efficiently, from a rapidly growing pool of patent assets. The proposed system extracts core content and technology sectors from the existing pool of patents and combines them with secondary social knowledge, derived from tag information created by users, to find the best patents to recommend. That is, at an early stage when no tag information has accumulated, recommendation relies on content characteristics, which are identified by analyzing the keywords contained in attributes such as 'Title of Invention' and 'Claim' among the various patent attributes. To do this, the suggested system extracts only nouns from the patents and assigns each noun a weight according to its importance across all patents by performing TF-IDF analysis. It then finds patents whose weights are similar to those of the patents a user prefers. In this paper, this similarity is called "Domain Similarity".
Next, the suggested system extracts technology-sector characteristics from the patent documents by analyzing their International Patent Classification (IPC) codes. Every patent has one or more IPC codes, and each user can attach one or more tags to the patents they like. Thus, each user has a set of IPC codes drawn from the patents they have tagged. The suggested system manages this IPC set to analyze each user's technology preference and find well-fitted patents for them. To do this, the suggested system calculates a 'Technology_Similarity' between the user's set of IPC codes and the IPC codes contained in all other patents. After that, once tag information from multiple users has accumulated, the system expands the recommendations by considering other users' social tag information relating to the patents tagged by the user concerned. The similarity between the tag information of a user's preferred patents and that of other patents is called 'Social Similarity' in this paper. Lastly, a 'Total Similarity' is calculated by adding these three similarities, and the patents with the highest 'Total Similarity' are recommended to each user. The suggested system is applied to a total of 1,638 Korean patents obtained from the Korea Industrial Property Rights Information Service (KIPRIS) run by the Korea Intellectual Property Office. However, since this original dataset does not include tag information, we created virtual tag information and used it to construct a semi-virtual dataset. The proposed recommendation algorithm was implemented in Java, and a prototype graphical user interface was also designed for this study. As the proposed system has no dependent variables and uses virtual data, it is impossible to verify the recommendation system with a statistical method.
Therefore, the study uses a scenario test method to verify the operational feasibility and recommendation effectiveness of the system. The results of this study are expected to improve the chances of matching promising patents with the most suitable businesses. It is expected that users' experiential knowledge can then be accumulated, managed, and utilized in the as-is patent system, which currently manages only standardized patent information.
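
The three similarities described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact formulas, so cosine similarity over TF-IDF weights stands in for "Domain Similarity" and Jaccard overlap stands in for both 'Technology_Similarity' and 'Social Similarity'. The patent ids, nouns, IPC codes, and tags are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each patent id to a {noun: TF-IDF weight} dictionary."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for pid, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors[pid] = {
            term: (c / total) * math.log(n_docs / doc_freq[term])
            for term, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse weight dictionaries (assumed measure)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard overlap of two sets (assumed measure for IPC codes and tags)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical toy data: nouns from title/claims, IPC codes, and user tags.
nouns = {
    "P1": ["battery", "electrode", "lithium"],
    "P2": ["battery", "electrode", "anode"],
    "P3": ["antenna", "signal", "receiver"],
}
ipc = {"P1": {"H01M"}, "P2": {"H01M", "H01G"}, "P3": {"H04B"}}
tags = {"P1": {"energy", "storage"}, "P2": {"energy"}, "P3": {"wireless"}}

preferred = "P1"  # the patent the user has tagged
vectors = tfidf_vectors(nouns)
scores = {}
for pid in nouns:
    if pid == preferred:
        continue
    domain = cosine(vectors[preferred], vectors[pid])
    technology = jaccard(ipc[preferred], ipc[pid])
    social = jaccard(tags[preferred], tags[pid])
    scores[pid] = domain + technology + social  # 'Total Similarity' as a plain sum

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With no tags accumulated yet, the social term is simply zero and the ranking falls back on the content and IPC terms, which matches the cold-start behavior the abstract describes.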

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.43-62
    • /
    • 2019
  • At one time, anomaly detection was dominated by methods that decided whether an abnormality existed based on statistics derived from specific data. This was possible because data used to be low-dimensional, so classical statistical methods worked effectively. However, as data have become more complex in the era of big data, it has become more difficult to accurately analyze and predict the data generated throughout industry in the conventional way. Supervised learning algorithms based on SVM and decision trees were therefore adopted. However, a supervised model predicts test data accurately only when the classes are balanced, that is, when abnormal samples are about as numerous as normal ones, whereas most data generated in industry are class-imbalanced. The predictions of a supervised model are therefore not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced in the study of Thomas et al. (2017), is a classification model that detects anomalies in medical images. It is built on a convolutional neural network and was used in the field of detection. By contrast, there are few studies on anomaly detection in sequence data with generative adversarial networks compared with image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor has the feature matching method of Salimans et al. (2016).
This suggests that much remains to be explored in anomaly classification of sequence data with generative adversarial networks. To learn sequence data, the generative adversarial network is built from LSTMs: the generator is a 2-layer stacked LSTM with a 32-dim hidden unit layer and a 64-dim hidden unit layer, and the discriminator's LSTM consists of a 64-dim hidden unit layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data, but in this paper, as mentioned earlier, anomaly scores are derived using the feature matching technique. In addition, the process of optimizing the latent variables was designed with an LSTM to improve model performance. The modified generative adversarial model was more precise than the autoencoder in all experiments and was approximately 7% higher in accuracy. In terms of robustness, the generative adversarial network also performed better than the autoencoder. Because generative adversarial networks learn the data distribution from real categorical sequence data, they are not affected by a single normal data point, whereas the autoencoder is. The robustness test showed that the accuracy of the autoencoder was 92% and that of the generative adversarial network was 96%; in terms of sensitivity, the autoencoder reached 40% and the generative adversarial network 51%. Experiments were also conducted to show how much performance changes with the structure used to optimize the latent variables. As a result, sensitivity improved by about 1%. These results offer a new perspective on latent variable optimization, which had previously received relatively little attention.
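
As a rough illustration of how a feature-matching anomaly score can be computed, the sketch below follows the AnoGAN-style weighting of a residual term and a discriminator-feature term. It is an assumption, not the paper's exact scoring rule: the weight `lam`, the L1 distance, and all input vectors are hypothetical, and in practice the feature vectors would come from an intermediate layer of the trained discriminator.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def anomaly_score(x, generated, feat_x, feat_generated, lam=0.1):
    """Weighted sum of residual loss and feature-matching loss.

    x              : the observed (encoded) sequence
    generated      : the generator output G(z) after latent optimization
    feat_x         : discriminator features of x (hypothetical values here)
    feat_generated : discriminator features of G(z)
    lam            : assumed weight of the feature-matching term
    """
    residual = l1_distance(x, generated)
    feature_matching = l1_distance(feat_x, feat_generated)
    return (1 - lam) * residual + lam * feature_matching


# A sequence the generator reconstructs well versus one it cannot reproduce.
normal = anomaly_score([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5], [0.4, 0.6])
abnormal = anomaly_score([1.0, 0.0, 0.0], [0.1, 0.8, 0.1], [0.5, 0.5], [0.9, 0.1])
print(normal, abnormal)
```

A sample is flagged as anomalous when its score exceeds a threshold chosen on validation data; the threshold itself is not specified in the abstract.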

Development on Early Warning System about Technology Leakage of Small and Medium Enterprises (중소기업 기술 유출에 대한 조기경보시스템 개발에 대한 연구)

  • Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.143-159
    • /
    • 2017
  • With the rapid development of IT in recent years, the leakage not only of personal information but also of the key technologies and information held by companies has become an important issue. For an enterprise, its core technology is vital to its survival and to a sustained competitive advantage. Recently, there have been many cases of technology infringement. Technology leaks not only cause tremendous financial losses, such as falling stock prices, but also damage corporate reputation and delay corporate development. For SMEs, where core technology matters more to the enterprise than it does for large corporations, preparing for technology leakage is indispensable to the firm's existence. As the necessity and importance of Information Security Management (ISM) grow, enterprises need to check and prepare for the threat of technology infringement early. Nevertheless, about 90% of previous studies deal with policy alternatives. By research method, literature analysis accounted for 76%, while empirical and statistical analysis accounted for a relatively low 16%. For this reason, it is necessary to study management and prediction models that prevent technology leakage and fit the characteristics of SMEs. In this study, before the empirical analysis, we divided the factors affecting technology leakage into technical characteristics, from the technology value perspective, and organizational factors, from the technology control perspective, based on previous research. A total of 12 related variables were selected for the two factors, and the analysis was performed with these variables.
In this study, we use three years of data from the "Small and Medium Enterprise Technical Statistics Survey" conducted by the Small and Medium Business Administration. The analysis data cover 30 industries at the KSIC 2-digit level, and 415 companies were affected by technology leakage over the 3 years. From these data, we randomly sampled unaffected firms from the same KSIC industry and the same year, so that the affected companies (n = 415) and the unaffected firms (n = 415) formed 1:1 matched samples for analysis. In this research, we conduct an empirical analysis to identify the factors influencing technology leakage and propose an early warning system based on data mining. Specifically, based on the questionnaire survey of SMEs conducted by the Small and Medium Business Administration, we classified the factors that affect the technology leakage of SMEs into two groups (Technology Characteristics and Organization Characteristics). We then propose a model that signals the possibility of technology infringement using a Support Vector Machine (SVM), one of the various data mining techniques, built on the factors verified through statistical analysis. Unlike previous studies, this study covers cases from various industries over several years, and it is notable that an artificial intelligence model was developed through it. In addition, since the factors are derived empirically from actual cases of SME technology leakage, the results can suggest to policy makers which companies should be managed from the viewpoint of technology protection. Finally, the early warning model proposed in this study is expected to give enterprises and government an opportunity to prevent technology leakage in advance.
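
The 1:1 matched sampling step described above can be sketched as follows. This is a minimal illustration, assuming each firm is a dictionary with hypothetical `id`, `year`, and `ksic` fields; the survey's actual record layout is not given in the abstract.

```python
import random
from collections import defaultdict


def matched_controls(cases, pool, seed=0):
    """Draw one unaffected firm per affected firm from the same year and industry."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for firm in pool:
        by_stratum[(firm["year"], firm["ksic"])].append(firm)
    for firms in by_stratum.values():
        rng.shuffle(firms)  # random order within each (year, KSIC) stratum

    controls = []
    for case in cases:
        candidates = by_stratum[(case["year"], case["ksic"])]
        if not candidates:
            raise ValueError(f"no unaffected firm left for stratum {case['year']}/{case['ksic']}")
        controls.append(candidates.pop())  # sampling without replacement
    return controls


# Hypothetical records: two affected firms and a small pool of unaffected firms.
cases = [
    {"id": "A1", "year": 2013, "ksic": "26"},
    {"id": "A2", "year": 2014, "ksic": "29"},
]
pool = [
    {"id": "U1", "year": 2013, "ksic": "26"},
    {"id": "U2", "year": 2013, "ksic": "26"},
    {"id": "U3", "year": 2014, "ksic": "29"},
    {"id": "U4", "year": 2014, "ksic": "10"},
]
controls = matched_controls(cases, pool)
print([c["id"] for c in controls])
```

Matching on year and industry before fitting the classifier keeps those two variables from explaining the difference between the affected and unaffected groups.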

Determination of shear wave velocity profiles in soil deposit from seismic piezo-cone penetration test (탄성파 피에조콘 관입 시험을 통한 국내 퇴적 지반의 전단파 속도 결정)

  • Sun Chung Guk;Jung Gyungja;Jung Jong Hong;Kim Hong-Jong;Cho Sung-Min
    • Korean Society of Earth and Exploration Geophysicists: Conference Proceedings (한국지구물리탐사학회:학술대회논문집)
    • /
    • 2005.09a
    • /
    • pp.125-153
    • /
    • 2005
  • It is widely known that the seismic piezo-cone penetration test (SCPTU) is one of the most useful techniques for investigating geotechnical characteristics, including dynamic soil properties. As practical applications in Korea, SCPTU was carried out at two sites in Busan and four sites in Incheon, which are mainly composed of alluvial or marine soil deposits. From the SCPTU waveform data obtained at the testing sites, the first arrival times of shear waves and the corresponding time differences with depth were determined using the cross-over method, and the shear wave velocity ($V_S$) profiles were derived using the refracted ray path method based on Snell's law; they were similar in trend to the cone tip resistance ($q_t$) profiles. In the Incheon area, the testing depths of SCPTU were deeper than those of conventional down-hole seismic tests. Moreover, to apply the conventional CPTU to earthquake engineering practice, correlations between $V_S$ and CPTU data were deduced from the SCPTU results. For the empirical evaluation of $V_S$ for all soils together, and for clays and sands classified unambiguously in this study by the soil behavior type classification index ($I_C$), the authors suggested $V_S$-CPTU data correlations expressed as a function of four parameters, $q_t$, $f_s$, $\sigma'_{v0}$ and $B_q$, determined by multiple statistical regression modeling. Despite the incompatible strain levels of the down-hole seismic test during SCPTU and the conventional CPTU, it is shown that the $V_S$-CPTU data correlations for all soils, clays, and sands suggested in this study are applicable to the preliminary estimation of $V_S$ for Korean deposits and are more reliable than the previous correlations proposed by other researchers.


Characteristics of Groundwater Contamination Caused by Seawater Intrusion and Agricultural Activity in Sacheon and Hadong Areas, Republic of Korea (해수침투와 농업활동에 의한 사천-하동 해안지역 지하수의 오염 특성)

  • Kim, Hyun-Ji;Hamm, Se-Yeong;Kim, Nam-Hoon;Cheong, Jae-Yeol;Lee, Jeong-Hwan;Jang, Sung
    • Economic and Environmental Geology
    • /
    • v.42 no.6
    • /
    • pp.575-589
    • /
    • 2009
  • Groundwater has been extracted for irrigation in the Sacheon-Hadong area, which is close to the South Sea. We analyzed the chemical components of groundwater to examine the effects of seawater intrusion and agricultural activities in the study area. Most groundwater samples displayed Na/Cl concentration ratios similar to that of seawater (0.55), with electrical conductivity ($227-7,910\;{\mu}S/cm$) tending to increase towards the coast. In addition, statistical interpretation of the cumulative frequency curves of Cl and $HCO_3$ showed that 30.1% of the groundwater samples were highly affected by seawater intrusion. Groundwater in the study area mostly belonged to the Ca-Cl and Na-Cl types, demonstrating that it was highly influenced by seawater intrusion and cation exchange. Oxygen-hydrogen isotope analysis showed slightly higher $\delta^{18}O$ (-8.53 to -6.13‰) and ${\delta}D$ (-58.7 to -43.7‰) values compared with the mean oxygen-hydrogen isotope ratios in Korea. In the nitrogen isotope analysis, the $\delta^{15}N-NO_3$ values (-0.5 to 19.1‰) indicate two major sources of nitrate pollution (organic nitrogen in soil, and animal and human wastes) as well as a mixture of the two. However, denitrification may partly contribute as a source of nitrogen. According to factor analysis, four factors were identified, among which factor 1, with an eigenvalue of 6.21, reflected the influence of seawater intrusion. Cluster analysis classified the groundwater into fresh, saline, and mixed groups.

Comparison of Clinical Progress between Single- and Multiple-dose Surfactant Treatment in Neonatal Respiratory Distress Syndrome (신생아 호흡곤란증후군에서 폐 표면활성제 단일 투여군과 재투여군의 임상경과 비교)

  • Kil, Chang Hee;Jeon, Ho Sang;Bae, Chong Woo
    • Clinical and Experimental Pediatrics
    • /
    • v.48 no.10
    • /
    • pp.1090-1095
    • /
    • 2005
  • Purpose : In cases of severe respiratory distress syndrome (RDS) or relapse of clinical signs after a single treatment, we obtained more effective results with multiple-dose surfactant replacement therapy. We carried out this investigation to compare the clinical progress of the single-dose (group S) and multiple-dose (group M) pulmonary surfactant treatment groups in neonatal RDS. Methods : We investigated 48 neonates who were diagnosed with RDS and treated with pulmonary surfactant (PS) replacement therapy in the NICU of Kyunghee University hospital from January 2002 to March 2004, and compared the clinical progress of 32 neonates in group S with that of 16 neonates in group M. Results : There were no statistically significant differences between the two groups in average birth weight, average gestational period, initial pH at birth, whether resuscitation was performed at birth, whether prenatal steroids were prescribed to the mother, RDS classification by Bomsel, or ventilation index (VI) before instillation of PS. However, there was a statistically significant difference in a/A $PO_2$ (P<0.05). VI and a/A $PO_2$ improved more continuously within 72 hours in group S than in group M. Despite relapses, group M improved after the second dose. There were also no significant differences between the two groups in duration of ventilator therapy, mortality within 28 days after birth, or complications including intraventricular hemorrhage, retinopathy of prematurity, necrotizing enterocolitis, chronic lung disease, sepsis, and DIC. Conclusion : In these relapse cases, as there were no significant differences in the mortality rate and the occurrence of complications between group S and group M, the need for multiple-dose PS replacement therapy, which improved prognosis, was emphasized.

Treatment and Survival Rate of Malignant Peripheral Nerve Sheath Tumors (악성 말초신경막 종양의 치료와 생존율)

  • Lee, Jong-Seok;Jeon, Dae-Geun;Cho, Wan-Hyung;Lee, Soo-Yong;Oh, Jung-Moon;Kim, Jin-Wook
    • The Journal of the Korean bone and joint tumor society
    • /
    • v.9 no.2
    • /
    • pp.131-138
    • /
    • 2003
  • Purpose: We analyzed our malignant peripheral nerve sheath tumor (MPNST) cases to determine their oncologic results according to each treatment modality. Materials and Methods: Thirty-four patients with MPNST were registered at Korea Cancer Center Hospital from Feb. 1986 to Nov. 1996. Seventeen cases were male and 17 female. The average age was 41 years (range 18 to 74). Tumor locations were as follows: 17 in the lower extremity, 11 in the upper extremity, 4 in the trunk, and 2 in the retroperitoneum. Following the AJC classification, there were 2 cases of stage IA, 2 of stage IIA, 6 of stage IIB, 16 of stage III, and 8 of stage IV. Twenty-six patients underwent operation with adjuvant chemotherapy and/or radiation therapy, 3 operation only, and 3 adjuvant chemotherapy or radiation therapy. The average follow-up period was 33.5 months (5.6 to 146.1). The Kaplan-Meier method was used for survival curves, and the log-rank test for comparison. Results: At final follow-up, fourteen cases were continuously disease free, 2 had no evidence of disease, 2 were alive with disease, and 14 were dead of disease. Actual 5-year and 10-year survival rates were 53.5% and 35.7%. The local recurrence rate after operation was 24.1%. The 5-year survival rates of stages I/II/III were 100/85.7/55.9%, and the 2-year survival rate of stage IV was 14.3% (p=0.04). Among the 21 operated cases with stage II-III, wide margin (15 cases) had a 76.0% 5-year survival rate, and marginal or intralesional margin (6 cases) had 40.0%. The actual 5-year survival rate of the group that received 4 or more cycles of chemotherapy (8 cases) was 71.4%, and the actual 3-year survival rate of the group that received fewer than 4 cycles (6 cases) was 83.3% (p=0.96). Among the 19 operated cases with stage II-III that had no radiotherapy, marginal or intralesional margin (5 cases) had 3 local recurrences (60.0%), while wide margin (14 cases) had 4 recurrences (28.6%). There was no local recurrence in the 8 cases that had pre- or post-operative radiotherapy.
Conclusions: Surgical margin is an important factor in local recurrence. Resection margin tends to influence survival, although without sufficient statistical significance. Conventional chemotherapy has no definite statistically significant effect on local control or survival. Preoperative and postoperative radiotherapy has some positive effect on local control.
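
For readers unfamiliar with the Kaplan-Meier method mentioned in this abstract, a minimal sketch of the product-limit estimator follows. The follow-up times and event flags are invented for illustration and are not the study's data; the function name is likewise hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier(times=[5, 8, 8, 12, 20], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients lower the number at risk without reducing the survival estimate, which is why the method suits follow-up periods of unequal length such as the 5.6 to 146.1 months reported here.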


Statistical Analysis of Water Flow and Water Quality Data in the Imjin River Basin for Total Pollutant Load Management (임진강 유역 오염물질 총량관리를 위한 유량-수질 자료의 통계분석)

  • Cho, Yong-Chul;Choi, Hyeon-Mi;Lee, Young Joon;Ryu, Ingu;Lee, Myung-Gu;Gu, Donghoi;Choi, Kyungwan;Yu, Soonju
    • Journal of Environmental Impact Assessment
    • /
    • v.27 no.4
    • /
    • pp.353-366
    • /
    • 2018
  • The purpose of this study was to assess water quality by statistically analyzing water flow and water quality data collected from January 2012 to December 2016 at the unit basins of the total pollutant load management system (TPLMS) in the Imjin River. Water flow and water quality were monitored at an average interval of 8 days, and 11 parameters were used for correlation analysis, principal component analysis (PCA), factor analysis (FA), and cluster analysis (CA). Hierarchical CA classified the sites into three spatial groups (natural rivers, urban rivers, and points strongly influenced by point pollution sources), and showed that the type of contamination source and the similarity of water quality affected the clustering. One-way analysis of variance (ANOVA) and post-hoc analysis showed statistically significant differences in mean values among the clusters. Correlation analysis showed a high and statistically significant correlation between $COD_{Mn}$ and TOC (r = 0.951, p<0.01). According to the PCA and FA results, three principal components explained 72% of the total variation in water quality characteristics, and the main factors were EC, $BOD_5$, $COD_{Mn}$, TN, TP, and TOC, which are indirect indicators of organic matter and nutrients. This study presented the regression equation obtained by applying the factor scores to multiple linear regression analysis and concluded that managing the indirect indicators of organic matter and nutrients is important for water quality management in the Imjin River basin.

Development of Volume Growth Rate Model for Major Quercus Species in Korea (우리나라 주요 참나무류 수종의 재적생장률 추정 모델의 개발)

  • Shin, Man Yong;Kim, Sung Ho;Jeong, Jin-Hyun;Kim, Chong Chan;Jeon, Eo Jin
    • Journal of Korean Society of Forest Science
    • /
    • v.97 no.6
    • /
    • pp.627-633
    • /
    • 2008
  • This study was conducted to estimate volume growth rates for the major Quercus species distributed in Korea, based on data collected from the 5th National Forest Inventory. Volume growth rates were estimated by age class for each species, and their similarities and differences were statistically analyzed. The study also aimed to compare the resulting volume growth rates with the existing growth rates, and to develop a volume growth rate estimation model for the Quercus species. Six major Quercus species were considered: Quercus acutissima, Quercus aliena, Quercus serrata, Quercus variabilis, Quercus dentata, and Quercus mongolica. The diameter growth rates and the height growth rates were estimated for each species from the inventory data, and the volume growth rates were then estimated from these diameter and height growth rates. To examine the differences between species and between age classes, statistical analyses such as ANOVA and Duncan's multiple range test were applied. The results indicated that the volume growth rate was 10% in age class II, 6% in age class III, and lower in the subsequent classes. In addition, the volume growth rates of Quercus acutissima, Quercus aliena, and Quercus serrata were relatively high compared with those of Quercus variabilis, Quercus dentata, and Quercus mongolica. According to their growth rates, the six Quercus species were classified into two groups: a high-growth-rate group and a low-growth-rate group. Statistical analysis of the differences between and within the groups showed no significant difference within groups but a significant difference between groups. Based on the results, volume growth rate estimation models were finally developed for each group. The classification of the Quercus species suggested in this study differed from that used in the existing volume growth estimation.
Thus, it is necessary to improve the existing volume growth rate or its estimation system.

Synthetic Application of Seismic Piezo-cone Penetration Test for Evaluating Shear Wave Velocity in Korean Soil Deposits (국내 퇴적 지반의 전단파 속도 평가를 위한 탄성파 피에조콘 관입 시험의 종합적 활용)

  • Sun, Chang-Guk;Kim, Hong-Jong;Jung, Jong-Hong;Jung, Gyung-Ja
    • Geophysics and Geophysical Exploration
    • /
    • v.9 no.3
    • /
    • pp.207-224
    • /
    • 2006
  • It is widely known that the seismic piezo-cone penetration test (SCPTu) is one of the most useful techniques for investigating geotechnical characteristics such as static and dynamic soil properties. As practical applications in Korea, SCPTu was carried out at two sites in Busan and four sites in Incheon, which are mainly composed of alluvial or marine soil deposits. From the SCPTu waveform data obtained at the testing sites, the first arrival times of shear waves and the corresponding time differences with depth were determined using the cross-over method, and the shear wave velocity $(V_S)$ profiles with depth were derived using the refracted ray path method based on Snell's law. When the determined $V_S$ profile was compared with the cone tip resistance $(q_t)$ profile, the two showed similar trends with depth. To apply the conventional CPTu to earthquake engineering practice, correlations between $V_S$ and CPTu data were deduced from the SCPTu results. For the empirical evaluation of $V_S$ for all soils together, and for clays and sands classified unambiguously in this study by the soil behavior type classification index $(I_C)$, the authors suggested $V_S-CPTu$ data correlations expressed as a function of four parameters, $q_t,\;f_s,\;\sigma'_{v0}$ and $B_q$, determined by multiple statistical regression modeling. Despite the incompatible strain levels of the downhole seismic test during SCPTu and the conventional CPTu, it is shown that the $V_S-CPTu$ data correlations for all soils, clays, and sands suggested in this study are applicable to the preliminary estimation of $V_S$ for soil deposits in parts of Korea and are more reliable than the previous correlations proposed by other researchers.
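
To make the step from arrival times to a velocity profile concrete, the sketch below computes interval shear wave velocities from downhole arrival times. Note the simplification: it assumes straight ray paths from a surface source at a fixed horizontal offset, not the refracted ray path method based on Snell's law that the authors used. The depths, offset, and arrival times are hypothetical values.

```python
import math


def interval_velocities(depths, arrival_times, source_offset):
    """Interval shear wave velocity between successive receiver depths.

    depths        : receiver depths in metres, increasing
    arrival_times : first arrival time of the shear wave at each depth, in seconds
    source_offset : horizontal distance from the source to the cone rod, in metres
    Assumes straight ray paths (a simplification of the refracted ray path method).
    """
    paths = [math.hypot(z, source_offset) for z in depths]
    velocities = []
    for k in range(1, len(depths)):
        d_path = paths[k] - paths[k - 1]
        d_time = arrival_times[k] - arrival_times[k - 1]
        if d_time <= 0:
            raise ValueError("arrival times must increase with depth")
        velocities.append(d_path / d_time)
    return velocities


# Hypothetical readings: source 3 m from the rod, receivers at 4.0 m and 7.2 m.
vs = interval_velocities(depths=[4.0, 7.2], arrival_times=[0.025, 0.039], source_offset=3.0)
print(vs)
```

In layered ground the straight-ray assumption overestimates the travel path near the surface, which is one reason a refracted ray path correction is preferred for shallow intervals.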