• Title/Abstract/Keywords: Data Mining Technique

637 search results, processing time 0.034 s

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / Vol. 17, No. 6 / pp.59-72 / 2012
  • A protein displays its specific function when a disorder region of its sequence transitions to an order region in the course of a biological reaction, so separating disorder regions from order regions in sequence data is necessary for predicting the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method that uses sequence data, avoids bias toward specific characteristics of a protein, and improves classification speed. The emerging-pattern-based EPs-TFP method uses only the essential emerging patterns, with redundant emerging patterns removed. The method finds sequence patterns that appear frequently in disorder regions but relatively infrequently in order regions. To improve the performance of the proposed algorithm, the TFP method, which is built on the P-tree and T-tree concepts, is extended into a classification/prediction method. EPs-TFP was evaluated on DisProt 4.9 and CASP 7 data; the order/disorder classification results show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
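
The idea of a pattern that is frequent in disorder regions but infrequent in order regions can be illustrated with the standard growth-rate definition of an emerging pattern. The following is a minimal sketch only: it is not the EPs-TFP algorithm, it uses no P-tree or T-tree, and the function names, the fixed pattern length `k`, the `min_growth` threshold, and the toy sequences are all assumptions made for illustration.

```python
def support(pattern, sequences):
    """Fraction of sequences that contain the pattern as a substring."""
    if not sequences:
        return 0.0
    return sum(pattern in s for s in sequences) / len(sequences)


def growth_rate(pattern, target, background):
    """Support in the target class divided by support in the background class."""
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_background == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_background


def emerging_patterns(target, background, k=2, min_growth=2.0):
    """All length-k substrings of the target class whose growth rate
    reaches min_growth (hypothetical threshold)."""
    candidates = {s[i:i + k] for s in target for i in range(len(s) - k + 1)}
    return sorted(p for p in candidates
                  if growth_rate(p, target, background) >= min_growth)


# Toy residue strings, made up for illustration (not DisProt or CASP data).
disorder = ["PSGE", "GSPE", "SPSG"]
order = ["LVIA", "AVLS", "ILGS"]
patterns = emerging_patterns(disorder, order)
print(patterns)
```

A pattern such as `GS`, which occurs equally often in both toy classes, has a growth rate of 1 and is filtered out, while patterns seen only in the disorder set are kept.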

Analysis of Consulting Research Trends Using Topic Modeling (토픽 모델링을 활용한 컨설팅 연구동향 분석)

  • Kim, Min Kwan;Lee, Yong;Han, Chang Hee
    • Journal of Korean Society of Industrial and Systems Engineering / Vol. 40, No. 4 / pp.46-54 / 2017
  • 'Consulting', a main research topic of the knowledge service industry, is a field of study that is essential for the growth and development of companies and for proliferation into specialized fields. However, it is difficult to grasp the current status of international research on consulting: which topics are mainly being studied and what the latest research topics are. The purpose of this study is to analyze trends in academic research on 'consulting' by applying quantitative methods such as topic modeling and statistical analysis. We collected data related to consulting from Elsevier's Scopus DB, a representative academic database, and conducted a quantitative analysis of 15,888 documents. We analyzed research trends related to consulting based on the bibliographic data of academic research published all over the world. Specifically, trends in the number of articles published in major countries including Korea, author keyword trends, and research topic trends were compared by country and year. This study is significant in that it presents the results of a quantitative analysis based on bibliographic data from an academic DB. In particular, it is meaningful that the traditional frequency-based quantitative bibliographic analysis method and a text mining technique (topic modeling) were used together. The results of this study can be used as a tool to guide the direction of research in the consulting field. The trend analysis is expected to help predict promising fields, changes, and trends in research related to the consulting industry.

Discovering Sequence Association Rules for Protein Structure Prediction (단백질 구조 예측을 위한 서열 연관 규칙 탐사)

  • Kim, Jeong-Ja;Lee, Do-Heon;Baek, Yun-Ju
    • The KIPS Transactions:PartD / Vol. 8D, No. 5 / pp.553-560 / 2001
  • Bioinformatics is a discipline that supports biological experiment projects by storing and managing data arising from genome research. It can also guide experimental design for genome function prediction and regulation. Among the various approaches to genome research, proteomics has been drawing increasing attention because it deals directly with the final products of genomes, i.e., proteins. This paper proposes a data mining technique to predict the structural characteristics of a given protein group, which are among the dominant factors of its functions. After explaining associations among amino acid subsequences in the primary structures of proteins, which can provide important clues for determining their secondary or tertiary structures, it defines a sequence association rule to represent the relationships between subsequences. It also provides support and confidence measures, newly designed to evaluate the usefulness of sequence association rules. After proposing a method to discover useful sequence association rules from a given protein group, it evaluates the performance of the proposed method with protein sequence data from the SWISS-PROT protein database.
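
The abstract does not give the newly designed support and confidence measures, so the sketch below falls back on the conventional association-rule definitions, applied to a rule "subsequence X is followed by subsequence Y". The function names, the "X occurs before Y without overlap" reading of a rule, and the toy sequences are assumptions for illustration, not the paper's definitions.

```python
def contains_in_order(seq, first, second):
    """True if `first` occurs in seq and `second` occurs somewhere after it
    without overlapping it."""
    i = seq.find(first)
    return i != -1 and seq.find(second, i + len(first)) != -1


def rule_measures(sequences, antecedent, consequent):
    """Conventional support and confidence for the rule antecedent -> consequent.

    support    = share of all sequences in which the rule holds
    confidence = share of sequences containing the antecedent
                 in which the rule holds
    """
    total = len(sequences)
    with_antecedent = [s for s in sequences if antecedent in s]
    rule_holds = [s for s in with_antecedent
                  if contains_in_order(s, antecedent, consequent)]
    support = len(rule_holds) / total if total else 0.0
    confidence = len(rule_holds) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Toy sequences, made up for illustration (not SWISS-PROT entries).
proteins = ["MKVLAAGG", "MKVGGLAA", "AAGGMKV", "GGLAAGG"]
sup, conf = rule_measures(proteins, "MKV", "LAA")
print(f"support={sup:.2f} confidence={conf:.2f}")
```

In the toy set, three of the four sequences contain `MKV` and two of those are followed by `LAA`, so the rule holds in half of all sequences and in two thirds of the sequences where the antecedent appears.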

Comparison of ensemble pruning methods using Lasso-bagging and WAVE-bagging (분류 앙상블 모형에서 Lasso-bagging과 WAVE-bagging 가지치기 방법의 성능비교)

  • Kwak, Seungwoo;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 6 / pp.1371-1383 / 2014
  • A classification ensemble is a method that combines diverse classifiers to improve classification accuracy. An ensemble method is known to be successful when the classifiers that participate in the ensemble are accurate and diverse. In practice, however, an ensemble commonly includes less accurate and similar classifiers alongside accurate and diverse ones. Ensemble pruning methods were developed to construct an ensemble by choosing only accurate and diverse classifiers. In this article, we propose an ensemble pruning method called WAVE-bagging and compare its results with those of an existing pruning method called Lasso-bagging. An extensive empirical comparison using 26 real datasets shows that WAVE-bagging performed better than Lasso-bagging.

Physical Properties of Rocks at the Gagok Skarn Deposit (가곡 스카른광상 암석의 물리적 특성)

  • Shin, Seungwook;Park, Samgyu;Kim, Hyoung-Rae
    • Geophysics and Geophysical Exploration / Vol. 16, No. 3 / pp.180-189 / 2013
  • Geophysical exploration is widely used around the world to develop strategic mineral resources because it detects mineralized zones in metallic ore deposits efficiently. Understanding the physical properties of the strata is important for interpreting geophysical data more accurately. This paper aims to characterize the physical properties of the rocks at the Gagok mine, a typical skarn deposit in Korea. To this end, laboratory tests were conducted on specimens of ore and host rocks collected from rock outcrops and drill cores at the Gagok mine. Using a measurement system for rock physical properties, we investigated the density, magnetic susceptibility, resistivity, and spectral induced polarization. According to the results, all physical properties of the specimens differed widely depending on the content of ore minerals formed by skarnization. In particular, using the chargeability and time constant obtained from the spectral induced polarization data by Cole-Cole inversion, we could estimate the volume content as well as the grain size of the sulfide minerals. Therefore, the spectral induced polarization technique may be considered a useful method for exploring metallic ore deposits with sulfide minerals.
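
The chargeability and time constant mentioned above are parameters of the Cole-Cole model that an inversion fits to spectral induced polarization data. The sketch below evaluates the forward model only, in the commonly used Pelton form, rho(w) = rho0 * (1 - m * (1 - 1 / (1 + (i*w*tau)^c))). The abstract does not state which form or inversion scheme the authors used, so the choice of form, the function name, and all parameter values are assumptions for illustration.

```python
import math


def cole_cole_resistivity(freq_hz, rho0, m, tau, c):
    """Complex resistivity from the Pelton form of the Cole-Cole model.

    rho0: DC resistivity (ohm-m), m: chargeability (0..1),
    tau: time constant (s), c: frequency exponent (0..1).
    """
    omega = 2.0 * math.pi * freq_hz
    relaxation = 1.0 / (1.0 + (1j * omega * tau) ** c)
    return rho0 * (1.0 - m * (1.0 - relaxation))


# Illustrative parameter values only, not measurements from the paper.
rho0, m, tau, c = 100.0, 0.3, 0.01, 0.5
for f in (0.01, 1.0, 100.0):
    z = cole_cole_resistivity(f, rho0, m, tau, c)
    phase_mrad = 1000.0 * math.atan2(z.imag, z.real)
    print(f"{f:8.2f} Hz  |rho| = {abs(z):7.2f} ohm-m  phase = {phase_mrad:7.2f} mrad")
```

In this form the amplitude falls from rho0 at low frequency toward rho0 * (1 - m) at high frequency, and tau sets the frequency range in which the phase response is strongest.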

Topic Modeling of Profit Adjustment Research Trend in Korean Accounting (텍스트 마이닝을 이용한 이익조정 연구동향 토픽모델링)

  • Kim, JiYeon;Na, HongSeok;Park, Kyung Hwan
    • Journal of Digital Convergence / Vol. 19, No. 1 / pp.125-139 / 2021
  • This study identifies trends in Korean accounting research on profit adjustment. We analyzed the abstracts of accounting research articles indexed in the Korean Citation Index (KCI) using a text mining technique. Among papers on profit adjustment, topics were divided into four groups: (i) auditing and audit reports, (ii) corporate taxes and debt ratios, (iii) general management strategy of companies, and (iv) financial statements and accounting principles. Contrary to the expectation that financial statements and accounting principles would be the main topic, auditing turned out to be the most studied area. We analyzed topic trends based on the number of papers per topic and identified the impact of the introduction of K-IFRS on profit adjustment research. By using a Big Data method, this study divided research themes in a way that was not available in past studies. The results allow policy makers and business managers to learn about considerations beyond the accounting principles related to profit adjustment.

Research on the Movie Reviews Regarded as Unsuccessful in Box Office Outcomes in Korea: Based on Big Data Posted on Naver Movie Portal

  • Jeon, Ho-Seong
    • Asia-Pacific Journal of Business / Vol. 12, No. 3 / pp.51-69 / 2021
  • Purpose - Based on literature studies of movie reviews and movie ratings, this study raised two research questions on the content of online word of mouth and the number of movie screens as mediator variables. Research question 1 asked which topic word groups had a positive or negative impact on movie ratings. Research question 2 sought to identify the role of the number of movie screens between movie ratings and box office outcomes. Design/methodology/approach - Using an R program, this study collected about 82,000 movie reviews and movie ratings posted on Naver's movie website to examine the role of online word of mouth and movie screen counts for 10 movies that were considered commercially unsuccessful, with fewer than 2 million viewers despite securing about 1,000 movie screens. To address research question 1, topic modeling, a text mining technique, was conducted on the movie reviews. In addition, to address research question 2, this study linked the movie ratings posted on Naver with KOBIS information by date. Findings - Through topic modeling, 5 topics were identified. The topics were largely organized into two groups: the content of the movie (topics 1, 2, 3) and the evaluation of the movie (topics 4, 5). When the relationship between movie reviews and movie ratings was analyzed with the 5 mediators identified in topic modeling to probe research question 1, the topic word groups related to topics 2, 3, and 5 appeared to have a negative effect on netizens' movie ratings. In addition, the analysis for research question 2 was carried out by connecting the two secondary data sets by date. The outcomes showed that the causal relationship between movie ratings and audience numbers was mediated by the number of movie screens.
Research implications or Originality - The results suggested that information presented in text format was harder to quantify than information provided as scores, but that if content information could be digitized through text mining techniques, it could become a variable and be analyzed to identify causality with other variables. The outcomes for research question 2 showed that movie ratings had a direct impact on the number of viewers and also had indirect effects through changes in the number of movie screens. An interesting point is that the direct effect of movie ratings on the number of viewers is found in most American films released in Korea.

'Elderly image' Analysis Using Big Data and Social Networking Techniques (빅데이터와 사회연결망 기법을 이용한 '노인 이미지' 분석)

  • Han, Sun-Bo;Lee, Hyun-Sim
    • The Journal of the Korea Contents Association / Vol. 16, No. 11 / pp.253-263 / 2016
  • We analyzed the social issue 'image of the elderly' using Big Data and Social Network Analysis. First, we analyzed the words extracted by a text mining technique for the input keyword 'elderly'. The analysis showed that, in media that reflect public trends, such as online cafes and blogs, the word 'Senior' was used the most in the image of the elderly. Expressed with the ten most frequent words, the image of the elderly is: "The elderly are 'Senior' people who are respected by society, they are organized to earn money, to earn their qualifications, to health, and to 'Seniors' who desire to work healthy up to 100 years old". This study differs from existing analysis methods in that it analyzes the macro-level image of the elderly, including the social discourse, by collecting a vast amount of data and analyzing it with the social networking technique. Since the image of the elderly that the public perceives is expressed positively as 'Senior', the direction of the current policy for the elderly can be said to be evaluated as desirable. On the other hand, the analysis also conveyed the 'desire' of a public that wanted to be evaluated. Therefore, future policy for the elderly should enable the elderly to be perceived as a 'Necessary existence' in society by taking on social roles. In addition, we proposed implementing a policy for the elderly that reflects priorities such as job creation, welfare, and alienation, so that the elderly can stay active and maintain their health.

Visualization of Flow in a Transonic Centrifugal Compressor

  • Hayami Hiroshi
    • The Korean Society of Visualization: Conference Proceedings / Proceedings of the Korean Society of Visualization 2002 Fall Conference / pp.1-6 / 2002
  • What does the flow in a rotating impeller look like? About 35 years have passed since an experimentalist, rotating with the impeller of a huge centrifugal blower, made flow measurements using a hot-wire anemometer (Fowler 1968). Optical measurement methods have great advantages over intrusive methods, especially for flow measurement in a rotating impeller. One is the optical flow visualization (FV) technique (Senoo, et al., 1968) and the other is the application of laser velocimetry (LV) (Hah and Krain, 1990). Particle image velocimetry (PIV) combines major features of both FV and LV and is very attractive because it allows simultaneous multi-point measurements (Hayami and Aramaki, 1999). A high-pressure-ratio transonic centrifugal compressor with a low-solidity cascade diffuser was tested in a closed loop with HFC134a gas at 18,000 rpm (Hayami, 2000). Two kinds of measurement techniques based on image processing were applied to visualize the flow in the compressor. One is a velocity field measurement at the inducer of the impeller using PIV and the other is a pressure field measurement on the side wall of the cascade diffuser using a pressure sensitive paint (PSP) measurement technique. PIV was successfully applied to visualize the unsteady behavior of a shock wave based on the instantaneous velocity field measurement (Hayami, et al., 2002b) as well as a phase-averaged velocity vector field with a shock wave over one blade pitch (Hayami, et al., 2002a, b). A violent change in pressure during a surge condition was successfully visualized using a PSP measurement, even though some problems remain to be overcome (Hayami, et al., 2002c). Both the PIV and PSP results are discussed in comparison with those of laser-2-focus (L2F) velocimetry and those of semiconductor pressure sensors. Experimental fluid dynamics (EFD) is still advancing in both hardware and software.
On the other hand, computational fluid dynamics (CFD) is very attractive for understanding the details of a flow. A secondary flow on the side wall of the cascade diffuser was visualized based on either steady or unsteady CFD calculations (Bonaiuti, et al., 2002). EFD and CFD methods will be combined into a hybrid method in which each complements the other. Measurement techniques based on image processing, as well as CFD calculations, produce a huge amount of data. Data mining techniques will therefore become more important for understanding the flow mechanism in both EFD and CFD.

Video Data Classification based on a Video Feature Profile (특성정보 프로파일에 기반한 동영상 데이터 분류)

  • Son Jeong-Sik;Chang Joong-Hyuk;Lee Won-Suk
    • The KIPS Transactions:PartD / Vol. 12D, No. 1 / pp.31-42 / 2005
  • Conventional video searching and classification methods are generally based on meta-data. However, it is almost impossible to represent the precise information of video data by its meta-data, so a processing method based on meta-data is limited in how efficiently it can be applied in application fields. In this paper, a classification method based on the low-level data of video data is proposed for efficient classification. The proposed method extracts the characteristics of the given video data by a clustering process and builds a profile of the video data. Subsequently, the similarity between the profile and the video data to be classified is computed by comparing the two. Based on the similarity, the video data is classified. Furthermore, in order to improve the performance of the comparing process, techniques for generating and comparing an integrated profile are presented. A comparing technique based on differentiated weights, which improves the result of the comparing process, is also presented. Finally, the performance of the proposed method is verified through a series of experiments using various video data.