• Title/Summary/Keyword: 자동화 실험 (automated experiment)


A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim; Kim, Ji Hui; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the specific product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific, relevant figures. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, data related to product information are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products are summed to estimate the market size of each product group. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and applied a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of selected items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or on multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it addresses unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reporting by private firms. A limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module can be advanced by imposing a proper ordering on the preprocessed dataset or by combining Word2Vec with another measure such as Jaccard similarity, and the product group clustering method can be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect to further improve the performance of the basic model conceptually proposed in this study.
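
As an illustration of the pipeline described above, here is a minimal sketch using the gensim library. The tokenized corpus, the seed term, the similarity threshold, and the sales table are hypothetical placeholders, not the study's data; only the vector dimension (300) and window size (15) follow the reported settings.

```python
# Minimal sketch of the bottom-up market size estimation pipeline
# (hypothetical data; assumes gensim is installed).
from gensim.models import Word2Vec

# Placeholder for the preprocessed, tokenized product-name records.
corpus = [
    ["frozen", "dumpling"],
    ["frozen", "pizza"],
    ["dumpling", "snack"],
]

# Vector dimension 300 and window size 15 are the parameters reported above.
model = Word2Vec(corpus, vector_size=300, window=15, min_count=1, workers=4)

def estimate_market_size(seed_term, sales_by_product, threshold=0.5):
    """Sum sales over product names whose cosine similarity to a KSIC
    index term exceeds the threshold; raising or lowering the threshold
    adjusts the granularity of the product group, as noted above."""
    group = [seed_term] + [
        name for name, sim in model.wv.most_similar(seed_term, topn=100)
        if sim >= threshold
    ]
    return sum(sales_by_product.get(name, 0) for name in group)

# Hypothetical sales figures keyed by product-name token.
sales = {"dumpling": 120, "pizza": 80, "snack": 45}
print(estimate_market_size("dumpling", sales))
```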

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui; Kim, Uihyun; Cho, Sinhee; Kim, Sansung; Yi, Mun Yong; Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As the demand for nuclear power plant equipment grows continuously worldwide, the importance of handling nuclear strategic materials is also increasing. While the number of cases submitted for the export of nuclear-power commodities and technology is increasing dramatically, preadjudication (or prescreening) of strategic materials has so far been done by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, not to mention that it takes a long time to develop an expert. Because human experts must manually evaluate all the documents submitted for export permission, the current practice of nuclear material export control is neither time-efficient nor cost-effective. To alleviate the problem of relying on costly human experts alone, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built upon case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs such a system for the control of nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). A keyword extraction method is an essential component of the case-based reasoning system, as it is used to extract the key features of the cases. The fully automatic method was conducted using TF-IDF, the widely used de facto standard for representative keyword extraction in text mining: TF (Term Frequency) is based on the frequency count of a term within a document, showing how important the term is within that document, while IDF (Inverse Document Frequency) is based on how infrequently the term appears across the document set, showing how uniquely the term represents the document. The results show that the semi-automatic approach, which is based on the collaboration of machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity along with a new framework of document analysis. The proposed algorithm considers both document-to-document similarity (α) and document-to-nuclear-system similarity (β) in order to derive the final score (γ) for deciding whether the presented case concerns strategic material. The final score (γ) represents the document similarity between past cases and the new case; it is derived not only from conventional TF-IDF but also from a nuclear-system similarity score that takes the context of the nuclear system domain into account. Finally, the system retrieves the top-3 documents stored in the case base that are considered the most similar to the new case and presents them with a degree of credibility. With the final score and the credibility score, it becomes easier for a user to see which documents in the case base are most worth looking up, so that the user can make a proper decision at relatively low cost. The system was evaluated by developing a prototype and testing it with field data, and the system workflows and outcomes were verified by field experts. This research is expected to contribute to the growth of the knowledge service industry by proposing a system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials and that can be considered a meaningful example of a knowledge service application.
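
For illustration, a minimal sketch of the retrieval step under stated assumptions: α is computed with a standard TF-IDF/cosine pipeline (scikit-learn here), β is supplied as a precomputed document-to-nuclear-system score, and γ is taken as a weighted sum of the two. The weighted-sum combination, the weight, and all case texts are assumptions for illustration; the paper's exact scoring formula is not reproduced here.

```python
# Sketch of gamma = f(alpha, beta) case retrieval (hypothetical data).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

case_base = [
    "zirconium alloy cladding tubes for fuel assemblies",
    "heat exchanger plates for the secondary cooling loop",
    "general-purpose stainless steel piping",
]
new_case = "cladding tubes of zirconium alloy for reactor fuel"

vec = TfidfVectorizer()
tfidf = vec.fit_transform(case_base + [new_case])
n = len(case_base)

# alpha: document-to-document similarity via TF-IDF + cosine.
alpha = cosine_similarity(tfidf[n], tfidf[:n]).ravel()

# beta: document-to-nuclear-system similarity (assumed precomputed).
beta = np.array([0.9, 0.7, 0.2])

w = 0.6                            # assumed blending weight
gamma = w * alpha + (1 - w) * beta

# Retrieve the top-3 most similar past cases, as the system does.
for i in np.argsort(gamma)[::-1][:3]:
    print(f"case {i}: gamma = {gamma[i]:.3f} -> {case_base[i]}")
```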

A Study on the Establishment of Comparison System between the Statement of Military Reports and Related Laws (군(軍) 보고서 등장 문장과 관련 법령 간 비교 시스템 구축 방안 연구)

  • Jung, Jiin; Kim, Mintae; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.109-125 / 2020
  • The Ministry of National Defense is pushing the Defense Acquisition Program to build strong defense capabilities, spending more than 10 trillion won annually on defense improvement. As the Defense Acquisition Program is directly related to the security of the nation as well as the lives and property of the people, it must be carried out very transparently and efficiently by experts. However, the excessive diversification of laws and regulations related to the Defense Acquisition Program has made it challenging for many working-level officials to carry out the program smoothly; many reportedly discover relevant regulations only after they have already pushed ahead with their work. In addition, statutory statements related to the Defense Acquisition Program can cause serious issues if even a single expression within a sentence is wrong. Despite this, efforts to establish a sentence comparison system that corrects such issues in real time have been minimal. This paper therefore proposes an implementation plan for a "Comparison System between the Statement of Military Reports and Related Laws" that uses Siamese network-based artificial neural networks, models from the field of natural language processing (NLP), to measure the similarity between sentences likely to appear in Defense Acquisition Program-related documents and sentences from related statutory provisions, in order to determine and classify the risk of illegality and make users aware of the consequences. Various neural network models (Bi-LSTM, Self-Attention, D_Bi-LSTM) were studied using 3,442 pairs of an "Original Sentence" (taken from actual statutes) and an "Edited Sentence" (derived by editing the original). Among the many Defense Acquisition Program-related statutes, the DEFENSE ACQUISITION PROGRAM ACT, the ENFORCEMENT RULE OF THE DEFENSE ACQUISITION PROGRAM ACT, and the ENFORCEMENT DECREE OF THE DEFENSE ACQUISITION PROGRAM ACT were selected. The "Original Sentence" set consists of the 83 clauses that actually appear in these statutes and are most accessible to working-level officials in their work. For each original clause, the "Edited Sentence" set comprises 30 to 50 similar sentences likely to appear, in modified form, in military reports; the edited sentences were created by modifying the original sentences according to 12 predefined rules, and were produced in proportion to the number of applicable rules. After conducting 1:1 sentence similarity evaluation experiments, each "Edited Sentence" could be classified as legal or illegal with considerable accuracy. However, because the training data are characterized by these 12 rules, models trained only on the "Original Sentence" and "Edited Sentence" dataset could not effectively classify other sentences that appear in actual military reports; the dataset is not ample enough for the models to recognize new incoming sentences. Hence, model performance was reassessed on an additional 120 newly written sentences that more closely resemble those in actual military reports while remaining associated with the original sentences. The models' performance surpassed a certain level even when trained merely on "Original Sentence" and "Edited Sentence" data. If sufficient training is achieved by improving and expanding the learning data with sentences that actually appear in reports, the models will better classify sentences from military reports as legal or illegal. Based on the experimental results, this study confirms the feasibility and value of building a "Real-Time Automated Comparison System Between Military Documents and Related Laws". The system can identify which specific clause, among the several that appear in related laws, is most similar to a sentence appearing in Defense Acquisition Program-related military reports, which helps determine whether the contents of the report sentences are at risk of illegality when compared with the law clauses.
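
For illustration, a minimal PyTorch sketch of a Siamese Bi-LSTM sentence-pair scorer of the kind evaluated above: two sentences pass through one shared encoder and are compared by cosine similarity, with a threshold deciding legal versus illegal. The vocabulary size, dimensions, pooling, threshold, and toy inputs are assumptions; the paper's exact architectures (Bi-LSTM, Self-Attention, D_Bi-LSTM) and training setup are not reproduced here.

```python
# Sketch of a Siamese Bi-LSTM similarity model (hypothetical settings).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBiLSTM(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)

    def encode(self, token_ids):
        out, _ = self.lstm(self.emb(token_ids))  # (batch, seq, 2*hidden)
        return out.mean(dim=1)                   # mean-pool over time

    def forward(self, sent_a, sent_b):
        # Both sentences go through the same shared-weight encoder.
        return F.cosine_similarity(self.encode(sent_a), self.encode(sent_b))

model = SiameseBiLSTM()
original = torch.randint(1, 5000, (2, 20))  # "Original Sentence" token IDs
edited = torch.randint(1, 5000, (2, 20))    # "Edited Sentence" token IDs
score = model(original, edited)
# Thresholding the similarity yields the legal/illegal classification.
print((score > 0.5).tolist())
```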

Risk Factor Analysis for Operative Death and Brain Injury after Surgery of Stanford Type A Aortic Dissection (스탠포드 A형 대동맥 박리증 수술 후 수술 사망과 뇌손상의 위험인자 분석)

  • Kim Jae-Hyun; Oh Sam-Sae; Lee Chang-Ha; Baek Man-Jong; Hwang Seong-Wook; Lee Cheul; Lim Hong-Gook; Na Chan-Young
    • Journal of Chest Surgery / v.39 no.4 s.261 / pp.289-297 / 2006
  • Background: Surgery for Stanford type A aortic dissection shows a high operative mortality rate and frequent postoperative brain injury. This study was designed to identify the risk factors leading to operative mortality and brain injury after surgical repair in patients with type A aortic dissection. Material and Method: One hundred and eleven patients with type A aortic dissection who underwent surgical repair between February 1995 and January 2005 were reviewed retrospectively. There were 99 acute dissections and 12 chronic dissections. Univariate and multivariate analyses were performed to identify risk factors for operative mortality and brain injury. Result: Hospital mortality occurred in 6 patients (5.4%). Permanent neurologic deficit occurred in 8 patients (7.2%) and transient neurologic deficit in 4 (3.6%). Overall 1-, 5-, and 7-year survival rates were 94.4%, 86.3%, and 81.5%, respectively. Univariate analysis revealed 4 statistically significant predictors of mortality: previous chronic type III dissection, emergency operation, intimal tear in the aortic arch, and deep hypothermic circulatory arrest (DHCA) for more than 45 minutes. Multivariate analysis revealed previous chronic type III aortic dissection (odds ratio (OR) 52.2) and DHCA for more than 45 minutes (OR 12.0) as risk factors for operative mortality. Pathological obesity (OR 12.9) and total arch replacement (OR 8.5) were statistically significant risk factors for brain injury in multivariate analysis. Conclusion: The results of surgical repair for Stanford type A aortic dissection were good in terms of the mortality rate, the incidence of neurologic injury, and the long-term survival rate. Surgery for type A aortic dissection in patients with a history of chronic type III dissection may carry an increased risk of operative mortality. Special care should be taken, and efforts to reduce the hypothermic circulatory arrest time should always be kept in mind. Surgeons planning to operate on patients with pathological obesity, or to perform total arch replacement, should seriously consider the higher risk of brain injury.

Facile [11C]PIB Synthesis Using an On-cartridge Methylation and Purification Showed Higher Specific Activity than Conventional Method Using Loop and High Performance Liquid Chromatography Purification (Loop와 HPLC Purification 방법보다 더 높은 비방사능을 보여주는 카트리지 Methylation과 Purification을 이용한 손쉬운 [11C]PIB 합성)

  • Lee, Yong-Seok; Cho, Yong-Hyun; Lee, Hong-Jae; Lee, Yun-Sang; Jeong, Jae Min
    • The Korean Journal of Nuclear Medicine Technology / v.22 no.2 / pp.67-73 / 2018
  • [11C]PIB synthesis has been performed in our lab by loop methylation and HPLC purification. However, this method is time-consuming and requires complicated systems. We therefore developed an on-cartridge method that simplifies the synthetic procedure and greatly reduces time by removing the HPLC purification step. We compared 6 different cartridges and evaluated the [11C]PIB production yields and specific activities. [11C]MeOTf was synthesized using a TRACERlab FXC Pro module and transferred onto the cartridge under a helium gas flow for 3 min. To remove byproducts and impurities, the cartridges were washed with 20 mL of 30% EtOH in 0.5 M NaH2PO4 solution (pH 5.1) and 10 mL of distilled water. [11C]PIB was then eluted with 5 mL of 30% EtOH in 0.5 M NaH2PO4 into a collecting vial containing 10 mL of saline. Among the 6 cartridges, only the tC18 Environmental cartridge completely removed impurities and byproducts from [11C]PIB, and it showed higher specific activity than the traditional HPLC purification method. The method took only 8 to 9 min from methylation to formulation. For the tC18 Environmental cartridge and the conventional HPLC loop methods, the radiochemical yields were 12.3 ± 2.2% and 13.9 ± 4.4%, respectively, and the molar activities were 420.6 ± 20.4 GBq/μmol (n = 3) and 78.7 ± 39.7 GBq/μmol (n = 41), respectively. We successfully developed a facile on-cartridge methylation method for [11C]PIB synthesis that makes the procedure simpler and more rapid and yields higher molar activity than HPLC purification.