• Title, Summary, Keyword: C4.5 Algorithm


A Preliminary Investigation of Coastal Groundwater Discharge on Geoje Island and Its Utilization (거제도 해안유출지하수 예비조사 및 활용방안 연구)

  • 이대근;김형수;박찬석;원종호;김규범
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference
    • /
    • /
    • pp.253-256
    • /
    • 2002
  • Geoje Island is the second-largest island in the south of Korea, with a total area of 399.96 $\textrm{km}^2$, 71.85% of which is forest; because its streams are short and their drainage basins small, groundwater recharge is difficult, and a considerable amount of groundwater was thought to be discharging along the coast. Accordingly, several analyses were carried out to characterize this discharging groundwater and to identify areas with a high likelihood of groundwater discharge. For this purpose, meteorological data such as air temperature and precipitation, hydrological data such as groundwater temperature and groundwater level, and oceanographic observations such as sea-surface temperature were used. Landsat 7 ETM+ and NOAA satellite imagery acquired in months with a large temperature difference between seawater and groundwater were used to compare temperature data, and areas with a high likelihood of groundwater discharge were extracted by analyzing the thermal distribution of each image. From the extracted areas, densely populated areas, industrial complexes, and areas already supplied with water were excluded; hydrogeologically favorable areas were then selected, and areas showing a large difference from the mean seawater distribution were extracted so that the sites would be easily accessible in subsequent field surveys. The study identified more than ten candidate sites for coastal groundwater discharge around Geoje Island, and about six areas are expected to be suitable for groundwater development when natural and social conditions are taken into account. When the difference between sea-surface temperature and groundwater temperature was large, it ranged from 11 to 13$^{\circ}C$, so this approach should be fully usable in similar future studies; combined with higher-resolution data, more accurate extraction will be possible, which is also expected to be of great help in developing water resources that have not yet been exploited in Korea.

  • PDF

A Study on the Recognition Algorithm of Paprika in the Images using the Deep Neural Networks (심층 신경망을 이용한 영상 내 파프리카 인식 알고리즘 연구)

  • Hwa, Ji Ho;Lee, Bong Ki;Lee, Dae Weon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • /
    • pp.142-142
    • /
    • 2017
  • As part of developing a system for automatic paprika harvesting, this study designed and trained an artificial neural network that takes as input the RGB values of paprika and non-paprika regions in images acquired in a paprika cultivation environment. The trained network is expected to be able to distinguish paprika regions from non-paprika regions in an image. The deep neural network was implemented using C++ and MFC in MS Visual Studio 2015 together with Python and TensorFlow. The network was designed with an input layer, an output layer, and eight hidden layers, using three input neurons, four output neurons, and five neurons in each hidden layer. In general, the deeper the hidden layers of a deep neural network, the better the results that can be expected from fewer inputs, but training takes longer and the likelihood of overfitting increases. Therefore, Xavier initialization was used to reduce training time, and the ReLU activation function was used to reduce overfitting. RGB values of paprika and non-paprika regions extracted from images of the cultivation environment were used as training inputs, with target outputs of [0 0 1] for red paprika, [0 1 0] for yellow paprika, and [1 0 0] for non-paprika regions, giving 3,538 training samples; 30 test samples were used to evaluate the trained network. Training was performed with several learning rates. With a learning rate of 0.01 or higher, training failed, presumably because the weight updates determined by the learning rate were so large that the cost function diverged instead of converging toward zero. Training succeeded with learning rates of 0.005 and 0.001. At a learning rate of 0.005, training took 3,146 iterations and 20.48 seconds, with 99.77% training accuracy and 100% test accuracy; at 0.001, it took 38,931 iterations and 181.39 seconds, with 99.95% training accuracy and 100% test accuracy. A smaller learning rate allows more accurate training but takes longer and is more likely to fall into a local minimum, whereas a larger learning rate shortens training but often causes the cost to diverge so that training fails. (A minimal sketch of this architecture follows this entry.)

  • PDF
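A minimal sketch of the network described above, assuming a TensorFlow/Keras implementation: the layer counts, ReLU activations, Xavier initialization, and learning rates follow the abstract, while the data, optimizer choice, batch size, and epoch count are illustrative assumptions. The abstract reports four output neurons but three-element target vectors; the sketch uses three outputs to match the targets.

```python
# Illustrative sketch (not the authors' code): an MLP on RGB inputs with 8 hidden layers of
# 5 neurons, ReLU activations, Xavier (Glorot) initialization, and a softmax output layer.
import numpy as np
import tensorflow as tf

def build_paprika_net(hidden_layers=8, hidden_units=5, n_outputs=3, learning_rate=0.005):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(3,)))                      # R, G, B
    for _ in range(hidden_layers):
        model.add(tf.keras.layers.Dense(hidden_units, activation="relu",
                                        kernel_initializer="glorot_uniform"))  # Xavier init
    model.add(tf.keras.layers.Dense(n_outputs, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Random placeholder data standing in for the 3,538 labelled RGB samples in the paper:
    # classes 0 = non-paprika [1 0 0], 1 = yellow paprika [0 1 0], 2 = red paprika [0 0 1].
    X = np.random.rand(3538, 3).astype("float32")
    y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=3538), num_classes=3)
    model = build_paprika_net(learning_rate=0.005)             # 0.001 also converged in the paper
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```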

Prediction of Lung Cancer Based on Serum Biomarkers by Gene Expression Programming Methods

  • Yu, Zhuang;Chen, Xiao-Zheng;Cui, Lian-Hua;Si, Hong-Zong;Lu, Hai-Jiao;Liu, Shi-Hai
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.21
    • /
    • pp.9367-9373
    • /
    • 2014
  • In diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcino-embryonic antigen (CEA), neuron-specific enolase (NSE) and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study classification of lung tumors was made based on biomarkers (measured in 120 NSCLC and 60 SCLC patients) by setting up optimal biomarker joint models with a powerful computerized tool - gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It specifically focuses on relationships between variables in sets of data and then builds models to explain these relationships, and has been successfully used in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, while CRP, LDH and Cyfra21-1 also have significant meaning in lung cancer; on the basis of CEA and NSE we set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH), and GEP3 (CEA, NSE, CRP). The best classification result of GEP was obtained when CEA, NSE and Cyfra21-1 were combined: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, for an accuracy rate of 94.8% in the training set and 88.9% on the collection of samples for testing. With GEP2, the accuracy decreased by 1.5% and 6.6% in the training and test sets, and with GEP3 by 0.82% and 4.45%, respectively. Serum Cyfra21-1 is a useful and sensitive serum biomarker in discriminating between NSCLC and SCLC. GEP modeling is a promising and excellent tool in the diagnosis of lung cancer.
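The abstract above describes GEP only at a high level, so the following is a toy sketch of the underlying idea: Karva-encoded genes decoded breadth-first into expression trees, evolved by mutation and selection, with the expression output thresholded into a two-class prediction. The gene lengths, function set, fitness definition, use of the marker names as variables, and the synthetic data are all assumptions for illustration; this is not the authors' model.

```python
# Toy sketch of the gene expression programming (GEP) idea only; all constants and data
# here are invented.
import operator
import random

FUNCS = {"+": (operator.add, 2), "-": (operator.sub, 2), "*": (operator.mul, 2)}
TERMS = ["CEA", "NSE", "CYFRA"]                      # markers combined in the paper's GEP1
HEAD_LEN = 7
TAIL_LEN = HEAD_LEN * (2 - 1) + 1                    # tail length guarantees valid decoding

def random_gene():
    head = [random.choice(list(FUNCS) + TERMS) for _ in range(HEAD_LEN)]
    tail = [random.choice(TERMS) for _ in range(TAIL_LEN)]
    return head + tail

def evaluate(gene, sample):
    """Decode a Karva gene breadth-first and evaluate it on one sample (a dict of markers)."""
    root = {"sym": gene[0], "kids": []}
    level, idx = [root], 1
    while level:
        nxt = []
        for node in level:
            arity = FUNCS[node["sym"]][1] if node["sym"] in FUNCS else 0
            for _ in range(arity):
                child = {"sym": gene[idx], "kids": []}
                idx += 1
                node["kids"].append(child)
                nxt.append(child)
        level = nxt
    def run(node):
        if node["sym"] in FUNCS:
            fn, _ = FUNCS[node["sym"]]
            return fn(run(node["kids"][0]), run(node["kids"][1]))
        return sample[node["sym"]]
    return run(root)

def mutate(gene, rate=0.1):
    new = gene[:]
    for i in range(len(new)):
        if random.random() < rate:
            # The head may hold functions or terminals; the tail holds terminals only.
            new[i] = random.choice((list(FUNCS) + TERMS) if i < HEAD_LEN else TERMS)
    return new

def fitness(gene, data):
    """Training accuracy when the evolved expression is thresholded at zero."""
    hits = sum((evaluate(gene, s) > 0) == bool(label) for s, label in data)
    return hits / len(data)

if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in data: marker values with a binary label (1 = SCLC, 0 = NSCLC).
    data = [({t: random.uniform(0.0, 10.0) for t in TERMS}, random.randint(0, 1))
            for _ in range(60)]
    population = [random_gene() for _ in range(30)]
    for _ in range(20):                              # evolve: keep the best, mutate copies
        population.sort(key=lambda g: fitness(g, data), reverse=True)
        population = population[:10] + [mutate(random.choice(population[:10])) for _ in range(20)]
    best = max(population, key=lambda g: fitness(g, data))
    print("best training accuracy:", fitness(best, data))
```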

Analytical Methods of Levoglucosan, a Tracer for Cellulose in Biomass Burning, by Four Different Techniques

  • Bae, Min-Suk;Lee, Ji-Yi;Kim, Yong-Pyo;Oak, Min-Ho;Shin, Ju-Seon;Lee, Kwang-Yul;Lee, Hyun-Hee;Lee, Sun-Young;Kim, Young-Joon
    • Asian Journal of Atmospheric Environment
    • /
    • v.6 no.1
    • /
    • pp.53-66
    • /
    • 2012
  • A comparison of analytical approaches for levoglucosan ($C_6H_{10}O_5$; commonly formed from the pyrolysis of carbohydrates such as cellulose and used as a molecular marker for biomass burning) is made among four different analytical systems: 1) a spectrothermography technique that evaluates thermograms of carbon using an Elemental Carbon & Organic Carbon Analyzer; 2) a mass spectrometry technique using a gas chromatograph/mass spectrometer (GC/MS); 3) an Aerosol Mass Spectrometer (AMS) for identifying the particle size distribution and chemical composition; and 4) two-dimensional gas chromatography with time-of-flight mass spectrometry (GC${\times}$GC-TOFMS) for defining the signature of levoglucosan in the chemical analytical process. First, a spectrothermogram, defined as the graphical representation of carbon measured as a function of temperature, can be obtained during the thermal separation process and used for spectrothermographic analysis. GC/MS can detect mass fragment ions of levoglucosan characterized by base peaks at m/z 60 and 73 in mass fragmentograms after methylation, and at m/z 217 and 204 for trimethylsilyl derivatives (TMS derivatives). AMS can be used to analyze the base peaks at m/z 60.021 and 73.029 in mass fragmentograms with a multiple-peak Gaussian curve-fit algorithm. In the analysis of TMS derivatives by GC${\times}$GC-TOFMS, m/z 73 is detected as the base ion for the identification of levoglucosan; m/z 217 and 204 are also observed, together with m/z 333. Although the ratios of m/z 217 and m/z 204 to the base ion (m/z 73) in the GC${\times}$GC-TOFMS mass spectrum are lower than those of GC/MS, levoglucosan can be separated and characterized from D-(-)-ribose in a mixture of sugar compounds. Finally, the environmental significance of levoglucosan is discussed with respect to health effects, offering important opportunities for clinical and epidemiological research aimed at reducing the incidence of cardiovascular and respiratory diseases.
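As a small illustration of the multiple-peak Gaussian curve fit mentioned for the AMS m/z 60 signal, the sketch below fits two overlapping Gaussians to a synthetic high-resolution spectrum. Only the 60.021 peak position comes from the abstract; the second peak, widths, amplitudes, and the spectrum itself are invented for the example.

```python
# Illustrative multiple-peak Gaussian fit on a synthetic m/z window (not the authors' data).
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    return (a1 * np.exp(-(x - mu1) ** 2 / (2 * s1 ** 2)) +
            a2 * np.exp(-(x - mu2) ** 2 / (2 * s2 ** 2)))

# Synthetic spectrum around m/z 60: one peak at 60.021 (the levoglucosan-related fragment
# quoted above) plus a second, invented neighbouring ion.
x = np.linspace(59.95, 60.10, 400)
y = two_gaussians(x, 1.0, 60.021, 0.008, 0.6, 60.044, 0.008)
y += np.random.default_rng(0).normal(scale=0.02, size=x.size)

p0 = [1.0, 60.02, 0.01, 0.5, 60.05, 0.01]        # rough initial guesses for the six parameters
popt, _ = curve_fit(two_gaussians, x, y, p0=p0)
print("fitted peak centres:", popt[1], popt[4])
```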

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews from a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide on whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques, such as decision trees, neural networks, and support vector machines (SVMs), and is used to determine the attitude, position, and sensibility of people who write articles about various topics that are published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing the opinions of customers through their reviews. Sentiment analysis helps with understanding what customers really want instantly through the help of automated text mining techniques. It applies text mining techniques to text on the Web to extract subjective information for analysis, and is used to determine the attitudes or positions of the people who wrote the articles and presented their opinions about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, the decision tree, and SVM. We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for a document by comparing and classifying its terms against a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movement by analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to utilize sentiment analysis. One hundred forty-four topics were selected among the 21 categories at online forums about stock. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on these data. We developed a decision tree model to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel by a grid search to detect the hot topics.
The detection of hot topics by using sentiment analysis provides the latest trends and hot topics in the stock forum for investors so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes towards government policy and firms' products and services.
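As a rough illustration of the detection step described above (not the authors' code, features, or data), the sketch below trains an entropy-based decision tree as a scikit-learn stand-in for C4.5 and an RBF-kernel SVM tuned by grid search, on synthetic feature vectors standing in for the interest-index and sentiment features.

```python
# Illustrative sketch only: DecisionTreeClassifier(criterion="entropy") is a CART-style
# stand-in for C4.5, and GridSearchCV tunes an RBF-kernel SVM as described above.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in features (e.g. interest index, positive/negative sentiment scores)
# and a binary "hot topic" label; the real study worked with 88 crawled forum topics.
X = rng.normal(size=(88, 3))
y = (X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.3, size=88) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_train, y_train)
print("decision-tree test accuracy:", tree.score(X_test, y_test))

grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_train, y_train)
print("best SVM params:", grid.best_params_,
      "test accuracy:", grid.score(X_test, y_test))
```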

Design of a Bit-Serial Divider in GF(2$^{m}$) for Elliptic Curve Cryptosystem (타원곡선 암호시스템을 위한 GF(2$^{m}$)상의 비트-시리얼 나눗셈기 설계)

  • 김창훈;홍춘표;김남식;권순학
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.12C
    • /
    • pp.1288-1298
    • /
    • 2002
  • To implement elliptic curve cryptosystems in GF(2$^m$) at high speed, a fast divider is required. Although a bit-parallel architecture is well suited for high-speed division operations, elliptic curve cryptosystems require a large m (at least 163) to provide sufficient security. In other words, since the bit-parallel architecture has an area complexity of O($m^2$), it is not suited for this application. In this paper, we propose a new serial-in serial-out systolic array for computing division operations in GF(2$^m$) using the standard basis representation. Based on a modified version of the binary extended greatest common divisor algorithm, we obtain a new data dependence graph and design an efficient bit-serial systolic divider. The proposed divider has O(m) time complexity and O(m) area complexity. If input data come in continuously, the proposed divider can produce division results at a rate of one per m clock cycles, after an initial delay of 5m-2 cycles. Analysis shows that the proposed divider provides a significant reduction in both chip area and computational delay time compared to previously proposed systolic dividers with the same I/O format. Since the proposed divider can perform division operations at high speed with reduced chip area, it is well suited for the division circuit of an elliptic curve cryptosystem. Furthermore, since the proposed architecture does not restrict the choice of irreducible polynomial, and has a unidirectional data flow and regularity, it provides high flexibility and scalability with respect to the field size m.
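The abstract builds its divider on a variant of the binary extended GCD; as a software-level illustration only (not a model of the systolic hardware), the sketch below computes standard-basis division in GF(2$^m$) as a * b^(-1), with the inverse obtained from the ordinary extended Euclidean algorithm over GF(2)[x] and a small demonstration field in place of m >= 163.

```python
# Illustrative sketch of standard-basis division in GF(2^m): a / b = a * inverse(b), with the
# inverse from the extended Euclidean algorithm over GF(2)[x]. Field elements are integers
# whose bit i is the coefficient of x^i. This is not the paper's systolic design.

def deg(p):
    """Degree of a binary polynomial (deg(0) = -1)."""
    return p.bit_length() - 1

def poly_mulmod(a, b, f):
    """Multiply two field elements modulo the irreducible polynomial f."""
    m, r = deg(f), 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if deg(a) == m:               # reduce as soon as a reaches degree m
            a ^= f
    return r

def poly_divmod(a, b):
    """Quotient and remainder of binary polynomial division."""
    q = 0
    while a and deg(a) >= deg(b):
        shift = deg(a) - deg(b)
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def gf_inverse(a, f):
    """Inverse of a modulo an irreducible f, by the extended Euclidean algorithm."""
    old_r, r = a, f                    # invariant: old_r = old_s*a (mod f), r = s*a (mod f)
    old_s, s = 1, 0
    while r:
        q, rem = poly_divmod(old_r, r)
        old_r, r = r, rem
        old_s, s = s, old_s ^ poly_mulmod(q, s, f)
    return old_s                       # old_r is gcd(a, f) = 1 here

def gf_div(a, b, f):
    """Compute a / b in GF(2^m)."""
    return poly_mulmod(a, gf_inverse(b, f), f)

if __name__ == "__main__":
    # Demonstration field GF(2^4) with f(x) = x^4 + x + 1; an elliptic-curve implementation
    # would use m >= 163, as the abstract notes, with a suitable irreducible polynomial.
    F = 0b10011
    a, b = 0b1011, 0b0110
    q = gf_div(a, b, F)
    assert poly_mulmod(q, b, F) == a   # (a / b) * b == a
    print(bin(q))
```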

A Destructive Method in the Connection of the Algorithm and Design in the Digital media - Centered on the Rapid Prototyping Systems of Product Design - (디지털미디어 환경(環境)에서 디자인 특성(特性)에 관한 연구(硏究) - 실내제품(室內製品) 디자인을 중심으로 -)

  • Kim Seok-Hwa
    • Journal of Science of Art and Design
    • /
    • v.5
    • /
    • pp.87-129
    • /
    • 2003
  • The purpose of this thesis is to propose a new concept of design for the 21st century, on the basis of a study of the general signification of the structures and signs of industrial product design, by examining the difference between modern and post-modern design, which is expected to lead users to different design practices and interpretations. The starting point of this study is how the styles and patterns of 'Gestalt' in the post-modern design of the late 20th century differ from those of modern design - the determining factor in industrial product design. That is to say, unlike the functional and rational styles of modern product design, the late 20th century is based upon a pluralism characterized by complexity, synthesis, and decorativeness. So far, most previous studies on design seem to have excluded visual aspects and usability, focusing only on effective communication of design phenomena. These partial studies on design, blinded by phenomenal aspects, have failed to discover a fundamental systematic principle. However, design varies according to the times, and the transformation of design is reflected in Design Pragnanz to constitute a new text of design. Therefore, it can be argued that Design Pragnanz serves as an essential factor under the influence of the significance of the text. In this thesis, therefore, I delve into an analysis of 20th-century product design, in the light of Gestalt theory and Design Pragnanz, which have functioned as the principles of past design. For this study, I attempted to discover the fundamental elements in modern and post-modern designs, and to examine the formal structure of product design, users' aesthetic preferences, and their semantics, from an integrative viewpoint. Also, with reference to the history and theory of design, my emphasis is more on fundamental visual phenomena than on structural analysis or the process of visualization in product design, in order to examine the formal properties of modern and post-modern designs. First, in Chapter 1, 'Issues and Background of the Study', I investigated Gestalt theory and Design Pragnanz, on the premise of a formal distinction between modern and post-modern designs. These theories are founded upon the discussion of Gestalt visual perception in Germany in the 1910s, in pursuit of a principle of perception centered on human visual perception. In Chapter 2, I dealt with the functionalism of modern design, as preparation for the further study of the product design of the late 20th century. First of all, in Chapter 2-1, I examined the tendency of modern design focused on functionalism, which can be exemplified by the famous statement 'Form follows function'. Excluding all unessential elements in design - for example, decoration - this tendency attained the position of the International Style based on the spirit of the Bauhaus - universality and regularity - in search of geometric order, standardization, and rationalization. In Chapter 2-2, I investigated the anthropological viewpoint that modern design came to represent culture in a symbolic way encompassing overall aspects of society - politics, economics, and ethics - and its criticism of functionalist design, namely that aesthetic value is lost in exchange for excessive simplicity of style. Moreover, I examined pluralist phenomena in post-modern design such as kitsch, eclecticism, reactionism, hi-tech, and digital design, breaking away from the functionalist purism of modern design.
In Chapter 3, I analyzed Gestalt Pragnanz in design in a practical way, against the background of design trends. To begin with, I selected mass-produced product design of the 20th century as the target of analysis, highlighting representative styles in each category of products. For this analysis, I adopted the theory of J. M. Lehnhardt, who graded in percentages the aesthetic and semantic levels of Pragnanz in design expression, and that of J. K. Grutter, who expressed it in the formula M = O : C. I also employed eight units of dichotomies, according to G. D. Birkhoff's aesthetic criteria, for the purpose of scientifically classifying the degree of order and complexity in design, and I analyzed the phenomenal aspects of design form represented in each unit. In Chapter 4, I administered a questionnaire on the semiological phenomena of Design Pragnanz with 28 pairs of antonymous adjectives, based upon the research in the previous chapter. Then, I analyzed the process of signification of Design Pragnanz, founded on this research. Furthermore, the interpretation of the analysis served as an explanation of preference, through a systematic analysis of Gestalt and Design Pragnanz in product design of the late 20th century. In Chapter 5, I determined the position of Design Pragnanz by integrating the analyses of Gestalt and Pragnanz in modern and post-modern designs. In this process, I revealed the formal differences of each Design Pragnanz, in order to suggest a vision of the future that will provide systemic and structural stimulation to current design.

  • PDF

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.191-204
    • /
    • 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of software (SW) developers in South Korea was 25% in the 2012 fiscal year. Moreover, the unfilled recruitment ratio of highly qualified SW developers reaches almost 80%. This phenomenon is intensified in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea are increasingly avoiding becoming SW developers, and even current SW developers want to change careers, which hinders the national development of IT industries. The Korean government has recently realized the problem and implemented policies to foster young SW developers. Due to this effort, it has become easier to find young SW developers at the beginner level. However, it is still hard for many IT companies to recruit highly qualified SW developers. This is because long-term experience is important for becoming an expert SW developer. Thus, improving the job continuity intentions of current SW developers is more important than fostering new SW developers. Therefore, this study surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. As a method, we carried out a survey from September 2014 to October 2014, targeted at 130 SW developers who were working in IT industries in South Korea. We gathered the demographic information and characteristics of the respondents, the work environment of the SW industry, and the social position of SW developers. Afterward, a regression analysis and a decision tree method were performed to analyze the data. These two methods are widely used data mining techniques, which have explanation ability and are mutually complementary. We first performed a linear regression method to find the important factors associated with the job continuity intention of SW developers. The result showed that the 'expected age' to work as a SW developer was the most significant factor associated with the job continuity intention. We supposed that the major cause of this phenomenon is the structural problem of IT industries in South Korea, which requires SW developers to move from development to management as they are promoted. Also, the 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer are highly important factors associated with the job continuity intention. Next, the decision tree method was performed to extract the characteristics of highly motivated developers and less motivated ones. We used the well-known C4.5 algorithm for the decision tree analysis. The results showed that 'motivation', 'personality', and 'expected age' were also important factors influencing the job continuity intentions, which was similar to the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules of job continuity. In other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospect' of the SW industry were minor factors influencing job continuity intentions. On the other hand, 'type of employment (regular/non-regular position)' and 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method. 
In this research, we examined the job continuity intentions of SW developers actually working at IT companies in South Korea and analyzed the factors associated with them. These results can be used for human resource management in IT companies when recruiting or fostering highly qualified SW experts. They can also help in building SW developer fostering policies and in solving the problem of unfilled recruitment of SW developers in South Korea.
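As a small illustration of the decision tree step described above (synthetic data and invented feature names, not the survey data), the sketch below fits an entropy-based scikit-learn tree as a stand-in for C4.5 and prints the learned rules and feature importances, in the spirit of the decision rules discussed in the abstract.

```python
# Illustrative sketch only: an entropy-based scikit-learn tree as a C4.5 stand-in, with the
# learned rules and feature importances printed for inspection.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)
features = ["motivation", "introversion", "expected_age", "learning_ability", "prospect"]
X = rng.normal(size=(130, len(features)))          # 130 respondents, as in the survey
# Synthetic label standing in for "high job-continuity intention".
y = (0.8 * X[:, 0] + 0.5 * X[:, 3] + 0.3 * X[:, 2]
     + rng.normal(scale=0.5, size=130) > 0).astype(int)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=features))                 # human-readable decision rules
print(dict(zip(features, tree.feature_importances_.round(3))))   # relative feature importances
```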