• Title/Summary/Keyword: mining system

Analysis of Research Trends in SIAM Journal on Applied Mathematics Using Topic Modeling (토픽모델링을 활용한 SIAM Journal on Applied Mathematics의 연구 동향 분석)

  • Kim, Sung-Yeun
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.7 / pp.607-615 / 2020
  • This study used text mining to analyze the research status and trends of industrial mathematics in a sample of 4,910 papers published in the SIAM Journal on Applied Mathematics from 1970 to 2019. Titles, abstracts, and keywords were collected with R and analyzed with topic modeling based on the LDA algorithm. Based on coherence scores, 20 was determined to be the optimal number of topics, and the model was fitted using Gibbs sampling. The main results were as follows. First, studies on industrial mathematics were conducted in a variety of mathematical fields, including computational mathematics, geometry, mathematical modeling, topology, discrete mathematics, and probability and statistics, with a focus on analysis and algebra. Second, 5 hot topics (mathematical biology, nonlinear partial differential equations, discrete mathematics, statistics, topology) and 1 cold topic (probability theory) were found based on time series regression analysis. Third, among the fields not reflected in the 2015 revised mathematics curriculum, numeral systems, matrices, vectors in space, and complex numbers were identified as content to be covered in the high school mathematics curriculum. Finally, the study suggests strategies to promote industrial mathematics in Korea, describes its limitations, and proposes directions for future research.
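
The hot/cold topic step can be illustrated with a minimal sketch: regress each topic's yearly share on the year and label the topic by the sign of the slope. This is only an assumption about how such a time series regression might look, not the study's procedure. The function names, the slope threshold (used here in place of a significance test), and all numbers are made up for illustration.

```python
from typing import Dict, List


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_topics(
    years: List[float],
    topic_share: Dict[str, List[float]],
    min_abs_slope: float = 0.001,  # hypothetical cutoff standing in for a p-value test
) -> Dict[str, str]:
    """Label each topic 'hot' (rising), 'cold' (falling), or 'neutral'."""
    labels = {}
    for topic, shares in topic_share.items():
        slope = ols_slope(years, shares)
        if slope >= min_abs_slope:
            labels[topic] = "hot"
        elif slope <= -min_abs_slope:
            labels[topic] = "cold"
        else:
            labels[topic] = "neutral"
    return labels


# Toy yearly topic shares (invented values, not results from the paper).
years = [2015, 2016, 2017, 2018, 2019]
topic_share = {
    "mathematical biology": [0.04, 0.05, 0.06, 0.07, 0.08],
    "probability theory": [0.09, 0.08, 0.07, 0.06, 0.05],
    "geometry": [0.05, 0.05, 0.05, 0.05, 0.05],
}
labels = classify_topics(years, topic_share)
print(labels)
```

In practice the yearly shares would come from the fitted LDA model's document-topic distributions, and a regression with a significance test would replace the fixed threshold.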

Online Document Mining Approach to Predicting Crowdfunding Success (온라인 문서 마이닝 접근법을 활용한 크라우드펀딩의 성공여부 예측 방법)

  • Nam, Suhyeon;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.45-66 / 2018
  • Crowdfunding has become more popular than angel funding for fundraising by venture companies. Identification of success factors may be useful for fundraisers and investors to make decisions related to crowdfunding projects and predict a priori whether they will be successful or not. Recent studies have suggested several numeric factors, such as project goals and the number of associated SNS, studying how these affect the success of crowdfunding campaigns. However, prediction of the success of crowdfunding campaigns via non-numeric and unstructured data is not yet possible, especially through analysis of structural characteristics of documents introducing projects in need of funding. Analysis of these documents is promising because they are open and inexpensive to obtain. We propose a novel method to predict the success of a crowdfunding project based on the introductory text. To test the performance of the proposed method, in our study, texts related to 1,980 actual crowdfunding projects were collected and empirically analyzed. From the text data set, the following details about the projects were collected: category, number of replies, funding goal, fundraising method, reward, number of SNS followers, number of images and videos, and miscellaneous numeric data. These factors were identified as significant input features to be used in classification algorithms. The results suggest that the proposed method outperforms other recently proposed, non-text-based methods in terms of accuracy, F-score, and elapsed time.

Inclusive Impact Index "Triple I" for Assessing Ocean Utilization Technologies (해양이용기술 평가를 위한 포괄적 영향지수 "트리플 I")

  • Otsuka, Koji
    • Journal of the Korean Society for Marine Environment & Energy / v.15 no.2 / pp.118-125 / 2012
  • World population has increased rapidly since the industrial revolution, reaching 7 billion in 2012. Several forecasts estimate that it will rise to about 8 billion by 2025. Improvements in living standards in developing nations have also raised resource and energy demands worldwide. As a consequence, humanity faces many urgent global problems, such as global warming, water and food shortages, and resource and energy crises. Many ocean utilization technologies for avoiding or reducing these problems have been developed, for example $CO_2$ ocean sequestration, seawater desalination, artificial upwelling, deepwater mining, and ocean energy. It is important, however, to assess such technologies from the viewpoints of sustainability and public acceptance, since their aim is to develop sustainable social systems rather than conventional ones based on fossil resources. The Inclusive Marine Pressure Assessment and Classification Technology Research Committee (generally called the IMPACT Research Committee) of the Japan Society of Naval Architects and Ocean Engineers has proposed the Inclusive Impact Index "Triple I" as an indicator that can predict both environmental sustainability and economic feasibility, in order to assess ocean utilization technologies from these viewpoints. The index was devised by combining Ecological Footprint and Environmental Risk Assessment. The first part of this paper introduces Ecological Footprint and Environmental Risk Assessment. The second part explains the concept and structure of Triple I. Finally, the economy-ecology conversion factor in Triple I accounting is considered.

Development of Machine Learning-Based Platform for Distillation Column (증류탑을 위한 머신러닝 기반 플랫폼 개발)

  • Oh, Kwang Cheol;Kwon, Hyukwon;Roh, Jiwon;Choi, Yeongryeol;Park, Hyundo;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research / v.58 no.4 / pp.565-572 / 2020
  • This study developed a machine learning-based software platform to optimize a distillation column system. The distillation column is a representative, core process in the petrochemical industry. Process stabilization is difficult because of varying operating conditions and the continuous nature of the process, and process efficiency differs with operator skill. Process control based on theoretical simulation has been used to overcome this problem, but it cannot be applied to complex processes or real-time systems. This study aims to develop an empirical simulation model based on machine learning and to suggest an optimal process operation method. Developing the empirical simulation involves collecting big data from the actual process, extracting features through data mining, and selecting a representative algorithm for the chemical process. Finally, the platform for the distillation column was developed and verified with the developed model and field tests. The platform can predict operating parameters and provide optimal operating conditions to achieve efficient process control. This is a basic study applying machine learning to a chemical process. After application to a wide variety of processes, it could serve as a cornerstone of the Industry 4.0 smart factory.

Development of a Pilot-Scale Soil Washing Process (파일롯 규모의 토양세척장치 개발)

  • 장윤영;신정엽;황경엽
    • Journal of Korea Soil Environment Society / v.3 no.3 / pp.55-62 / 1998
  • Soils contaminated with hydrocarbons and residual metals can be effectively treated by soil washing. In developing the soil washing process, several major mechanisms for separating contaminants from coarse soils were progressively improved by combining mining and chemical processing approaches. The pilot-scale soil washing process consists of four major parts: 1) abrasive scouring, 2) scrubbing with washwater that is sometimes augmented by surfactants or other agents, 3) rinsing, and 4) regenerating the contaminated washwater. The plant was designed for an on-site treatment capacity of more than 5 ton/hr. The lumpy contaminated soil fractions first undergo deagglomeration and desliming as they pass through a rolling mill pipe. In the second unit, an attrition scrubbing module equipped with paddles uses high energy to remove contaminants from the soils. A final rinsing system then separates the washwater, which contains the contaminants and very fine soils, from the washed coarse soils. For recycling, the contaminated washwater passes through a washwater clarifier specifically designed for flocculation, sedimentation, and gravity separation of fines, as well as flotation and separation of oils from the washwater. To assess the applicability of soil washing at a potential site more rapidly while minimizing the expense of mobilization and operation, a mobile soil washing process that is self-contained on a trailer will be further developed.

Multi-family Housing Complex Breakdown Structure for Decision Making on Rehabilitation (노후 공동주택 개선여부 의사결정을 위한 공동주택 분류체계 개발)

  • Hong, Tae-Hoon;Kim, Hyun-Joong;Koo, Choong-Wan;Park, Sung-Ki
    • Korean Journal of Construction Engineering and Management / v.12 no.6 / pp.101-109 / 2011
  • As climate change becomes a major issue, various efforts are being made to reduce building energy consumption both in Korea and abroad. In particular, saving energy through the maintenance, repair, and rehabilitation of existing multi-family housing complexes is very important, because residential buildings account for a large share of gross energy consumption in Korea and the number of deteriorated complexes is increasing sharply. However, energy saving is not considered a main factor in decision making on rehabilitation projects, and no adequate supporting tool exists in the current process. As the first step toward a decision support system for rehabilitation, this paper developed a breakdown structure that groups multi-family housing complexes into clusters. A decision tree, one of the data mining methods, was used to form clusters based on the characteristics and energy consumption data of multi-family housing complexes. Based on these results and follow-up research, energy saving and CO2 reduction can be maximized by considering energy consumption during the rehabilitation of multi-family housing complexes.

The Identification Framework for source code author using Authorship Analysis and CNN (작성자 분석과 CNN을 적용한 소스 코드 작성자 식별 프레임워크)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Hong, Sung-sam;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.19 no.5 / pp.33-41 / 2018
  • As Internet technology has developed, a wide variety of programs and code are being written by many authors. As a result, some authors pass off programs or code written by others as their own, use other writers' code indiscriminately, or fail to indicate exactly which code they have used. This makes it increasingly difficult to protect code. In this paper, we propose an author identification framework that uses Authorship Analysis theory and Natural Language Processing (NLP) based on a Convolutional Neural Network (CNN). We apply Authorship Analysis theory to extract features for author identification from source code, and combine them with features used in text mining to identify authors with machine learning. In addition, we apply a CNN-based NLP method to source code to classify code authors. Together, these form a framework for identifying authors using Authorship Analysis theory and the CNN. Identifying an author requires features specific to that author, and the CNN-based NLP method can be applied to a language with a special structure, such as source code, to identify the author. The identification accuracy based on Authorship Analysis theory is 95.1%, and the identification accuracy with the CNN is 98%.

Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is very important because the keywords can be used for automated services such as document search, classification, and recommendation systems, as well as for conveying document information quickly. However, extracting keywords based on the frequency of words appearing in web site documents, or with graph algorithms based on word co-occurrence, has two problems: the web page structure may contain many words unrelated to the topic, and semantic keywords are difficult to extract because of the limited performance of Korean tokenizers. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Final semantic keywords are then extracted through a filtering process that removes inconsistent keywords. Experimental results on real web pages of small businesses show that the proposed method improves performance by 34.52% over a keyword selection technique based on statistical similarity. This confirms that keyword extraction from documents improves when semantic similarity between words is considered and inconsistent keywords are removed.
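
The idea of filtering candidate keywords by semantic similarity can be sketched as follows. This is an illustrative assumption, not the paper's algorithm: each candidate is kept only if its mean cosine similarity to the other candidates reaches a threshold. The three-dimensional vectors stand in for Word2Vec embeddings, and the words, the threshold, and the function names are all hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_keywords(
    candidates: Dict[str, List[float]], threshold: float = 0.5
) -> List[str]:
    """Keep candidates whose mean similarity to the other candidates
    is at least `threshold` (a hypothetical consistency filter)."""
    kept = []
    for word, vec in candidates.items():
        sims = [cosine(vec, other) for w, other in candidates.items() if w != word]
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        if mean_sim >= threshold:
            kept.append(word)
    return kept


# Toy embeddings (invented); real ones would come from a trained Word2Vec model.
vectors = {
    "bakery": [0.9, 0.1, 0.0],
    "bread": [0.8, 0.2, 0.1],
    "cake": [0.7, 0.3, 0.0],
    "login": [0.0, 0.1, 0.9],
}
selected = select_keywords(vectors)
print(selected)  # "login" is dropped as inconsistent with the other candidates
```

Here "login" models a word that appears in the page structure but is unrelated to the site's topic, which is the kind of noise the abstract describes.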

Short-term Mortality Prediction of Recurrence Patients with ST-segment Elevation Myocardial Infarction (ST 분절 급상승 심근경색 환자들의 단기 재발 사망 예측)

  • Lim, Kwang-Hyeon;Ryu, Kwang-Sun;Park, Soo-Ho;Shon, Ho-Sun;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information / v.17 no.10 / pp.145-154 / 2012
  • Cardiovascular disease has recently increased due to causes such as westernized diets, smoking, and obesity. In particular, acute myocardial infarction (AMI) accounts for 50% of deaths from cardiovascular disease. In response, research to discover AMI risk factors has been carried out based on national data. However, diagnostic methods suited to Korean patients are lacking. The objective of this paper is to develop a classifier for predicting short-term relapse mortality in cardiovascular disease patients based on prognosis data provided by KAMIR (Korea Acute Myocardial Infarction Registry). We conclude that an ANN is the most suitable method for predicting the short-term relapse mortality of patients with ST-segment elevation myocardial infarction. In addition, the data set obtained through logistic regression analysis yielded better performance than the existing data set. This work is therefore expected to contribute to prognosis estimation through proper classification of high-risk patients.

A proper folder recommendation technique using frequent itemsets for efficient e-mail classification (효과적인 이메일 분류를 위한 빈발 항목집합 기반 최적 이메일 폴더 추천 기법)

  • Moon, Jong-Pil;Lee, Won-Suk;Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.33-46 / 2011
  • Because e-mail has become an important means of communication and information sharing, much effort has gone into classifying e-mails efficiently by their contents. E-mails vary widely in length and style, and the words used in them are often irregular. In addition, the criteria for e-mail classification are subjective. As a result, conventional text classification techniques are difficult to adapt efficiently to e-mail classification. Commercial e-mail programs classify e-mail with a simple text filtering technique in the e-mail client. Previous studies on automatic e-mail classification have used the probability-based Naive Bayesian technique to improve classification accuracy, and most of them deal with e-mail in English. This paper proposes a personalized folder recommendation technique for e-mail in Korean that uses frequent pattern mining. The proposed technique consists of two phases: pre-processing the e-mails in an e-mail folder and generating a profile for that folder. The generated profile is used to classify an incoming e-mail into the most appropriate folder according to the user's subjective criteria. An e-mail classification system that adopts the proposed technique is also implemented.
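
The two phases described above (building a folder profile from frequent itemsets, then recommending a folder for a new e-mail) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: itemsets are counted by brute force up to size 2 rather than with a dedicated mining algorithm, the scoring rule is invented, and the sample folders, words, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    emails: List[Set[str]], min_support: float, max_size: int = 2
) -> Set[FrozenSet[str]]:
    """Word sets (up to max_size words) that occur in at least
    min_support of the folder's e-mails. Brute-force counting."""
    counts: Counter = Counter()
    for words in emails:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                counts[frozenset(combo)] += 1
    needed = min_support * len(emails)
    return {itemset for itemset, count in counts.items() if count >= needed}


def build_profiles(
    folders: Dict[str, List[Set[str]]], min_support: float = 0.6
) -> Dict[str, Set[FrozenSet[str]]]:
    """Phase 1: one profile (set of frequent itemsets) per folder."""
    return {name: frequent_itemsets(emails, min_support) for name, emails in folders.items()}


def recommend_folder(profiles: Dict[str, Set[FrozenSet[str]]], email_words: Set[str]) -> str:
    """Phase 2: pick the folder whose profile itemsets best match the e-mail.
    Assumed score: total size of the profile itemsets contained in the e-mail."""
    def score(itemsets: Set[FrozenSet[str]]) -> int:
        return sum(len(s) for s in itemsets if s <= email_words)
    return max(profiles, key=lambda name: score(profiles[name]))


# Toy pre-processed e-mails: each e-mail is reduced to a set of words (invented data).
folders = {
    "billing": [
        {"invoice", "payment", "due"},
        {"invoice", "payment", "receipt"},
        {"payment", "refund"},
    ],
    "meetings": [
        {"meeting", "agenda", "room"},
        {"meeting", "schedule"},
        {"agenda", "meeting", "minutes"},
    ],
}
profiles = build_profiles(folders)
print(recommend_folder(profiles, {"please", "send", "invoice", "payment"}))
```

For Korean e-mail, the pre-processing phase would need a morphological analyzer to produce the word sets; that step is outside this sketch.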