• Title/Summary/Keyword: word selection

A Study of Research on Methods of Automated Biomedical Document Classification using Topic Modeling and Deep Learning (토픽모델링과 딥 러닝을 활용한 생의학 문헌 자동 분류 기법 연구)

  • Yuk, JeeHee;Song, Min
    • Journal of the Korean Society for information Management / v.35 no.2 / pp.63-88 / 2018
  • This research evaluated differences in classification performance across feature selection methods (the LDA topic model and Doc2Vec, a word embedding method based on deep learning), feature corpus sizes, and classification algorithms. To find the feature corpus with the highest classification performance, experiments were conducted with feature corpora composed differently according to location within the document and with adjusted feature corpus sizes. In the deep learning experiments, training frequency was also evaluated, and information for context inference was specifically considered. This study constructed a biomedical document dataset, Disease-35083, which consists of biomedical scholarly documents provided by PMC and categorized by disease category. The study verifies which type and size of feature corpus produces the highest performance, and also suggests feature corpora that can be extended to specific features by demonstrating efficiency in training time. Additionally, this research compares deep learning with existing methods and suggests an appropriate method for each classification environment.

A Design and Implementation of the Division/square-Root for a Redundant Floating Point Binary Number using High-Speed Quotient Selector (고속 지수 선택기를 이용한 여분 부동 소수점 이진수의 제산/스퀘어-루트 설계 및 구현)

  • 김종섭;조상복
    • Journal of the Institute of Electronics Engineers of Korea TE / v.37 no.5 / pp.7-16 / 2000
  • This paper describes the design and implementation of a division/square-root unit for redundant floating-point binary numbers using a high-speed quotient selector. The unit uses redundant binary addition at a clock speed of 25 MHz. Because carry propagation is eliminated, two numbers can be added in constant time, independent of word length. We developed a 16-bit VLSI circuit for the division and square-root operations used extensively in each iterative step. It performs division and square root by redundant binary addition on the shifted binary number every 16 cycles. The circuit uses the nonrestoring method to obtain the quotient. The quotient selection logic uses the three leading digits of the partial remainder so that it can be implemented as a simple circuit. As a result, the operating speed of the proposed scheme is further improved by new quotient selection addition logic that can process the quotient decision field in parallel. It was 13% faster than previously presented schemes that use the same algorithms.
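
The nonrestoring method mentioned in the abstract can be illustrated with a minimal software sketch. This is not the paper's circuit: it works on ordinary unsigned integers, not redundant binary digits, and it picks each quotient bit from the full partial remainder instead of its three leading digits. The function name `nonrestoring_divide` and the default `width` of 16 bits are assumptions made for illustration.

```python
def nonrestoring_divide(dividend: int, divisor: int, width: int = 16) -> tuple[int, int]:
    """Divide two unsigned `width`-bit integers with the nonrestoring method.

    Illustrative sketch only: plain two's-complement style arithmetic on Python
    ints, not the redundant binary hardware described in the abstract.
    """
    limit = 1 << width
    if not (0 <= dividend < limit) or not (0 < divisor < limit):
        raise ValueError("operands must fit in `width` bits and divisor must be nonzero")

    remainder = 0
    quotient = 0
    # One iteration per dividend bit, most significant bit first.
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            # Shift in the next bit, then subtract the divisor.
            remainder = 2 * remainder + bit - divisor
        else:
            # A negative partial remainder is not restored; add the divisor instead.
            remainder = 2 * remainder + bit + divisor
        # The quotient bit is 1 when the new partial remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)

    # Single correction step at the end if the remainder is still negative.
    if remainder < 0:
        remainder += divisor
    return quotient, remainder
```

Each iteration performs exactly one addition or subtraction, which is why the method pairs well with a fast adder; the only restoring step is the final correction.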

Analysis of the Latest Trends of Radioisotope Using in RI-Biomics Fields (RI-Biomics분야 RI의 최신 동향 분석)

  • Jang, Sol-Ah;Yeom, Yu-Sun;Park, Tai-Jin;Hwang, Young Muk;Youn, Dol-Mi
    • Journal of Radiation Industry / v.7 no.2_3 / pp.221-224 / 2013
  • RI-Biomics is a new compound word formed from radiation technology and Biomics, which relates to the study of life. RI-Biomics is an advanced radiation fusion technology that combines in vivo pharmacokinetic evaluation (RI-ADME) of new drugs and medical materials using radioisotopes with molecular imaging technology using nuclear medicine equipment. RI-Biomics fields are emerging as the use of radioisotopes (RI) increases. In this paper, we investigated the latest trends in radioisotope use in RI-Biomics fields. The representative radioisotopes used for optimization and candidate selection in the new drug development process are $^{14}C$, $^3H$ and $^{32}P$. The accumulated income from radioisotopes shows that the amounts used tend to increase every year. In 2012 compared with 2007, accumulated income increased by 61.6% for $^{14}C$, 58.8% for $^3H$ and 33.9% for $^{32}P$. These isotopes are used in a variety of fields, such as $^{14}C$ for microdosing tests, the development of [$^3H$]cholesterol absorption inhibitors, and the study of [$^{131}I$]pyronaridine tetraphosphate for malaria therapy. In vivo tests are proceeding successfully, so the clinical research stage is expected to begin soon. Radioisotopes are therefore necessary for pharmacokinetic evaluation, optimization and the selection of new drug candidates in the new drug development process in RI-Biomics fields. The use of radioisotopes, apart from the primarily used $^{14}C$ and $^3H$, is predicted to increase continuously.

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Because the stock market is driven by traders' expectations, studies have sought to predict stock price movements by analyzing various sources of text data. Research has examined not only the relationship between text data and stock price fluctuations, but also stock trading based on news articles and social media responses. Studies that predict stock price movements have applied classification algorithms to a term-document matrix, as in other text mining approaches. Because a document contains many words, it is better to select the words that contribute most when building the term-document matrix. Words with too low a frequency or importance are removed based on word frequency. Words are also selected according to their contribution, measured as the degree to which a word helps classify a document correctly. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The approach starts from the idea that stock movement is only weakly related to the presence of the neutral words themselves, while the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then passed to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock. We then excluded, from the selected words, those that also appear in news articles about other stocks.
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used three months of news articles as training data and applied the remaining month of articles to the model to predict the next day's stock price movements. We used SVM, Boosting and Random Forest to build the models and predict stock price movements. The stock market was open for a total of 80 days over the four months (2016/02/01 ~ 2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations, and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news about other stocks to determine which words to extract. In other words, it removed not only the words that appeared in both rising and falling cases but also the words that appeared commonly in news about other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. A limitation of this study is that stock price prediction was set up only to classify rises and falls, and the experiment was conducted only for the top ten stocks, which do not represent the entire stock market. In addition, it is difficult to demonstrate investment performance because stock price fluctuation and rate of return may differ. Further research is therefore needed using more stocks and predicting yield through trading simulation.
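
The neutral-word idea in the abstract can be sketched in a few lines. The sketch rests on assumptions the abstract does not confirm: a word is treated as neutral when it occurs in at least one document of every category, the context is a fixed window of tokens, and documents are already tokenized with stop words removed. The function names, the `window` parameter, and the toy data in the usage note are all hypothetical.

```python
from collections import Counter

def neutral_words(docs, labels):
    """Words seen in at least one document of every label (assumed definition)."""
    per_label = {}
    for tokens, label in zip(docs, labels):
        per_label.setdefault(label, set()).update(tokens)
    if not per_label:
        return set()
    return set.intersection(*per_label.values())

def context_terms(tokens, neutral, window=1):
    """Non-neutral tokens lying within `window` positions of a neutral word."""
    picked = []
    for i, tok in enumerate(tokens):
        if tok in neutral:
            continue  # the neutral word itself is not used as a feature
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        if any(tokens[j] in neutral for j in range(lo, hi)):
            picked.append(tok)
    return picked

def term_document_matrix(docs, neutral, window=1):
    """Rows are documents, columns are the sorted context vocabulary, cells are counts."""
    counts = [Counter(context_terms(d, neutral, window)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix
```

For two toy headlines, `["shares", "of", "acme", "surge"]` labeled `"up"` and `["shares", "of", "acme", "plunge"]` labeled `"down"`, the shared words become neutral and only `surge` and `plunge` remain as columns. The resulting matrix could then be passed to any classifier, such as the SVM, Boosting, or Random Forest models named in the abstract.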

Development of Similar Bibliographic Retrieval System based on Neighboring Words and Keyword Topic Information (인접한 단어와 키워드 주제어 정보에 기반한 유사 문헌 검색 시스템 개발)

  • Kim, Kwang-Young;Kwak, Seung-Jin
    • Journal of Korean Library and Information Science Society / v.40 no.3 / pp.367-387 / 2009
  • In a similar bibliographic retrieval system, the search results differ considerably depending on which of the extracted index terms are selected. This research provides a method that minimizes errors in selecting the extracted candidate index terms. It uses information on the words adjacent to the candidate index terms extracted from similar literature, together with keyword topic information. By also using related author information and a reranking method for the search results, we developed a similar bibliographic retrieval system with high accuracy. We conducted experiments with the system on a collection of Korean journal articles in science and technology. The performance of the system was demonstrated through an experiment and a user evaluation.

Heuristic-based Korean Coreference Resolution for Information Extraction

  • Euisok Chung;Soojong Lim;Yun, Bo-Hyun
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.50-58 / 2002
  • Information extraction delimits in advance, as part of the task specification, the semantic range of the output, and filters information from large volumes of text. The most representative words of a document are named entities and pronouns. It is therefore important to resolve coreference in order to extract meaningful information. Coreference resolution finds named entities that co-refer to the same real-world entities in documents. Its results are used for named entity detection and template generation. This paper presents a heuristic-based approach to coreference resolution in Korean. We constructed heuristics, expanded gradually using a corpus, and derived salience factors of antecedents as an importance measure in Korean. Our approach consists of antecedent selection and antecedent weighting. We used three kinds of salience factors to weight each antecedent of an anaphor. The experimental result shows 80% precision.

A Study on the Contents of a Basic Technical Writing Course for Engineering Students (이공계 Technical Writing 기본과정 내용에 대한 고찰)

  • Cho, Jin-Ho
    • Journal of Engineering Education Research / v.15 no.5 / pp.131-139 / 2012
  • This paper emphasizes that writing education for engineering students should be communication-driven writing education based on KEC2005. Communication-driven writing for engineering students is essentially the same as Technical Writing (TW) developed on the basis of ABET. Considering the current writing ability of engineering students and the social need for various types of writing, TW education should be divided into two courses: basic and advanced. This paper deals with the contents of a basic TW course at Myongji University as a model case of a basic TW course for engineering students. It underlines various prewriting methods that should be stressed and practiced in the TW class, because the prewriting step in the writing process determines the overall direction and structure of an essay. In particular, this paper introduces Power Writing (PW), which uses the structure of a paragraph as a means of providing building blocks for the essay, employing logic, and ordering the arrangement of information in a paragraph. This paper also deals with important guidelines on sentence structure and word selection, and proposes various applications of TW, such as the resume, interview, proposal, report, and presentation, as the latter part of the basic course. Finally, this paper highlights the ethics of writing, such as plagiarism and the basic principles of quotation.

A Study on the Applicability of 2-Poisson Model for Selecting Korean Subject Words (2-포아송 모형을 이용한 한글 주제어 선정에 관한 연구)

  • 정영미;최대식
    • Journal of the Korean Society for information Management / v.17 no.1 / pp.129-148 / 2000
  • Experiments were performed on three subsets of a Korean test collection in order to determine whether 2-Poisson model's Z value is a good measure for selecting subject words from a document to be indexed. It was found that subject word selection based on the Z value was effective for only one subset with short texts, i.e., the Science and Technology subset. Correlation analyses between 2-Poisson model's Z and TF.IDF weight for the three subsets showed that the correlation was relatively high for two test subsets with short texts, i.e., the Science and Technology subset and the Newspaper subset.
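
The TF.IDF weight that the abstract correlates with the 2-Poisson Z value can be sketched as follows. The abstract does not say which TF.IDF variant was used, so this sketch assumes the common form, raw term frequency multiplied by log(N / df); the function name `tf_idf` and the tokenized input format are also assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights
```

Under this form, a term that occurs in every document gets a weight of zero, so only terms concentrated in a few documents rank highly as subject word candidates.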

Animal Models in the Neurobehavioral Research (신경행동학적 연구의 동물모형)

  • Kim, Dong-Goo
    • Korean Journal of Psychosomatic Medicine / v.2 no.1 / pp.46-51 / 1994
  • 'Model' is one of the most widely used but poorly understood words in neurobehavioral research. Since Darwin's evolutionary theory, it has been generally believed that humans differ from animals in complexity, not in essence. This notion can be applied to the mind as well as the body, which made it possible to establish animal models in the scientific study of the mind. Experimental analysis of animal behavior has become an important area for establishing animal models of human psychopathology because behavior is the ambassador of the mind. A model emphasizes a structural correspondence between sets of causally related variables in two different domains, such as the animal and the human. The first selection of corresponding elements in the two domains is called the initial analogy. Once the initial analogy is formed, causally related variables in the two domains are examined and arrayed. The structural parallel is the formal analogy of a model, and similarities between corresponding variables are called the material analogy. Models may serve any of three major functions: heuristic, evidential and representative. In many cases, using models may be more practical than directly assessing the domain of primary interest, since technical and/or ethical problems are more serious in the human domain. Although modeling is important for studying human psychopathology, few animal models have so far proved to be good models of human psychopathology. Developing appropriate models is urgently needed to solve the many problems arising from human psychopathology.

New Development of Two-Dimensional Sound Quality Index for Brand sound in Passenger Cars (승용차 브랜드 사운드를 위한 이차원 음질 인덱스 개발)

  • Jo, Byoung-Ok;Lee, Sang-Kwon;Park, Dong-Chul;Lee, Min-Sub;Jung, Seung-Gyoon
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2005.11b / pp.174-179 / 2005
  • In automotive engineering, brand sound is one of a car company's important competitive strategies. In designing a brand sound, selecting descriptive words for a car sound is one of the major tasks in automotive sound quality research. In this paper, booming sound and rumbling sound, which are professional terms used by NVH engineers, are used for the design of brand sound. We employed sound metrics, which are the subjective parameters used in psychoacoustics. According to most research results, the relationship between subjective evaluations and sound metrics is nonlinear and very complex. To link these subjective evaluations to sound metrics, artificial neural network technology was applied to a two-dimensional sound quality index for passenger cars. These indexes were applied to 46 passenger cars, which are samples of well-known cars from around the world. Preference in car sounds was also evaluated by trained NVH engineers. We coupled this preference with booming and rumbling sounds by using an artificial neural network. In the future, the two-dimensional sound index and the preference index will be very useful for the development of brand sound in passenger cars.
