• Title/Summary/Keyword: Sentence Extraction

Music Structure Analysis and Application (악곡구조 분석과 활용)

  • Seo, Jung-Bum;Bae, Jae-Hak
    • The KIPS Transactions:PartB
    • /
    • v.14B no.1 s.111
    • /
    • pp.33-42
    • /
    • 2007
  • This paper presents a new methodology for music structure analysis which facilitates rhetoric-based music summarization. Similarity analysis of musical constituents suggests the structure of a musical piece, and from that structure we can recognize its musical form. Musical forms have rhetorical characteristics of their own, which we have used to locate musical motifs. Motif extraction is to music summarization what topic sentence extraction is to text summarization. We have evaluated the effectiveness of this methodology through a popular music case study.

Conceptual Graph Matching Method for Reading Comprehension Tests

  • Zhang, Zhi-Chang;Zhang, Yu;Liu, Ting;Li, Sheng
    • Journal of information and communication convergence engineering
    • /
    • v.7 no.4
    • /
    • pp.419-430
    • /
    • 2009
  • Reading comprehension (RC) systems aim to understand a given text and return answers to questions about that text. Many previous studies extract the sentences most similar to the question as answers. However, texts for RC tests are generally short, and facts about an event or entity are often expressed across multiple sentences. The answers to some questions may be presented only indirectly, in sentences that share few words with the question. This paper proposes a conceptual graph matching method for extracting answer strings in RC tests. The method first represents the text and the questions as conceptual graphs, and then extracts a sub-graph for every candidate answer concept from the text graph. All candidate answer concepts are scored and ranked according to the matching similarity between their sub-graphs and the question graph. The top-ranked concept is returned as the answer seed from which a concise answer string is formed. Since the sub-graphs for candidate answer concepts are not restricted to a single sentence, our approach improved the performance of answer extraction on the Remedia test data.
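
The sentence-similarity baseline that the abstract above attributes to earlier studies can be sketched as a word-overlap ranking. This is only an illustrative sketch, not the paper's conceptual graph method: the function names `tokenize` and `best_sentence`, the regex tokenizer, and the sample story are all assumptions introduced here.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_sentence(sentences, question):
    """Return the sentence that shares the most word tokens with the question.

    Ties are resolved in favour of the earliest sentence.
    """
    question_tokens = tokenize(question)
    return max(sentences, key=lambda s: len(tokenize(s) & question_tokens))


# Hypothetical two-sentence story used only for illustration.
story = [
    "The library opened in 1990.",
    "It was built by volunteers from the town.",
]
answer = best_sentence(story, "When did the library open?")
```

Because the score counts only exact shared tokens, a sentence that answers the question with different wording, or an answer spread over several sentences, scores poorly. That is the limitation the abstract cites as motivation for matching over graphs instead.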

Development of Optimum Rutin Extraction Process from Fagopyrum tataricum (쓴 메밀에서의 루틴 추출 최적 공정 개발)

  • Yoon, Seong-Jun;Cho, Nam-Ji;Na, Seog-Hwan;Kim, Young-Ho;Kim, Young-Mo
    • Journal of the East Asian Society of Dietary Life
    • /
    • v.16 no.5
    • /
    • pp.573-577
    • /
    • 2006
  • The rutin content of Fagopyrum tataricum is 100-fold higher than that of Fagopyrum esculentum. For the development of a rutin-containing beverage, a suitable method to extract rutin from buckwheat (Fagopyrum tataricum) with high rutin yield was investigated. A roasting temperature range of $310/240^{\circ}C$ was considered to be the best as the basic color reference. Rutin content decreased with increasing roasting time and heating temperature. The optimal extraction temperature and processing time for maximizing the rutin concentration in the extract were $80^{\circ}C$ and 10 minutes.

Event Sentence Extraction for Information Extraction (정보 추출을 위한 이벤트 문장 추출)

  • Kim, Tae-Hyun;Lim, Soo-Jong;Yun, Bo-Hyun;Park, Sang-Gyu
    • Annual Conference on Human and Language Technology
    • /
    • 2002.10e
    • /
    • pp.325-331
    • /
    • 2002
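  • The goal of an information extraction system is to selectively find and present specific information of interest. Information extraction has therefore had to rely on domain-dependent methods, and the burden of building the required domain knowledge has been heavy. To reduce this burden, this study proposes a system that automatically extracts event sentences from documents related to a specific subject area. An event sentence is a sentence that contains the concrete details of an event dealt with in a specific domain. Extracting such sentences not only satisfies basic information extraction needs, but the extracted event sentences can also be used to build domain knowledge. We propose a way to maximize sentence extraction performance using verb, noun, noun phrase, and 3W features, and we run experiments on three evaluation domains. The results show that the best event sentence extraction performance is obtained by computing sentence weights from the when and where features together with the weights of verbs, nouns, and noun phrases.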
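
The sentence weighting described in the abstract above can be sketched as a sum of verb, noun, and noun phrase weights plus bonuses for the when and where features. The abstract does not give the actual formula, so the additive form, the bonus values, the threshold, and every name below (`SentenceFeatures`, `sentence_weight`, `extract_event_sentences`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    """Per-sentence feature values (hypothetical representation)."""
    verb_weight: float         # summed weight of the verbs in the sentence
    noun_weight: float         # summed weight of the nouns
    noun_phrase_weight: float  # summed weight of the noun phrases
    has_when: bool             # sentence contains a time expression
    has_where: bool            # sentence contains a place expression


def sentence_weight(f, when_bonus=1.0, where_bonus=1.0):
    """Combine lexical weights with when/where bonuses (assumed additive)."""
    score = f.verb_weight + f.noun_weight + f.noun_phrase_weight
    if f.has_when:
        score += when_bonus
    if f.has_where:
        score += where_bonus
    return score


def extract_event_sentences(features, threshold):
    """Return indices of sentences whose weight reaches the threshold."""
    return [i for i, f in enumerate(features) if sentence_weight(f) >= threshold]


# Two made-up sentences: the first mentions a time and a place.
features = [
    SentenceFeatures(0.5, 0.25, 0.25, True, True),
    SentenceFeatures(0.5, 0.25, 0.0, False, False),
]
selected = extract_event_sentences(features, threshold=2.0)
```

In practice the individual word weights would be learned or estimated from domain documents; here they are fixed numbers so that the scoring step can be read in isolation.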

The Recognition of Korean Syllables using Parameter Based on Principal Component Analysis (PCA 기반 파라메타를 이용한 숫자음 인식)

  • 박경훈;표창수;김창근;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.181-184
    • /
    • 2000
  • A new feature extraction method is proposed that, unlike conventional voice feature extraction methods, takes the statistical characteristics of the human voice into account. The method applies PCA (Principal Component Analysis), which removes redundancy in the data after finding the axis direction with the greatest variance in the input dimensions. The new method is then applied to real voice recognition to assess its performance. Comparing the number recognition results in this paper with those of the conventional Mel-cepstrum voice feature parameters shows a 0.5% difference in recognition rate. A better recognition rate than in word or sentence recognition is expected, given that extracting voice features takes less convergence time than with the conventional method. A better recognition rate is also expected when the optimum vector is chosen using the statistical features of the data.
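
The PCA step mentioned in the abstract above, finding the axis of greatest variance and projecting the data onto it, can be sketched in plain Python with power iteration on the sample covariance matrix. This is a generic illustration rather than the authors' implementation; the function names, the iteration count, and the toy data are assumptions.

```python
import math


def first_principal_axis(rows, iterations=200):
    """Estimate the unit direction of greatest variance by power iteration.

    Returns (column_means, axis). Assumes the data has nonzero variance and
    that the all-ones starting vector is not orthogonal to the leading axis.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, axis):
    """Project each mean-centered row onto the axis (one value per row)."""
    return [sum((r[j] - means[j]) * axis[j] for j in range(len(axis)))
            for r in rows]


# Toy data lying exactly on the line y = 2x, so one axis explains everything.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
means, axis = first_principal_axis(data)
scores = project(data, means, axis)
```

With real speech features each row would be a feature vector for one frame, and several leading axes would normally be kept rather than just the first.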

Extraction of ObjectProperty-UsageMethod Relation from Web Documents

  • Pechsiri, Chaveevan;Phainoun, Sumran;Piriyakul, Rapeepun
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1103-1125
    • /
    • 2017
  • This paper aims to extract an ObjectProperty-UsageMethod relation, in particular the HerbalMedicinalProperty-UsageMethod relation of the herb-plant object, as a semantic relation between two related sets, a herbal-medicinal-property concept set and a usage-method concept set from several web documents. This HerbalMedicinalProperty-UsageMethod relation benefits people by providing an alternative treatment/solution knowledge to health problems. The research includes three main problems: how to determine EDU (where EDU is an elementary discourse unit or a simple sentence/clause) with a medicinal-property/usage-method concept; how to determine the usage-method boundary; and how to determine the HerbalMedicinalProperty-UsageMethod relation between the two related sets. We propose using N-Word-Co on the verb phrase with the medicinal-property/usage-method concept to solve the first and second problems where the N-Word-Co size is determined by the learning of maximum entropy, support vector machine, and naïve Bayes. We also apply naïve Bayes to solve the third problem of determining the HerbalMedicinalProperty-UsageMethod relation with N-Word-Co elements as features. The research results can provide high precision in the HerbalMedicinalProperty-UsageMethod relation extraction.

Research on Chinese Microblog Sentiment Classification Based on TextCNN-BiLSTM Model

  • Haiqin Tang;Ruirui Zhang
    • Journal of Information Processing Systems
    • /
    • v.19 no.6
    • /
    • pp.842-857
    • /
    • 2023
  • Currently, most sentiment classification models on microblogging platforms analyze sentence parts of speech and emoticons without comprehending users' emotional inclinations and grasping moral nuances. This study proposes a hybrid sentiment analysis model. Given the distinct nature of microblog comments, the model employs a combined stop-word list and word2vec for word vectorization. To mitigate local information loss, a TextCNN model without pooling layers is employed for local feature extraction, while BiLSTM is used for contextual feature extraction. Microblog comment sentiments are then categorized using a classification layer. Given the binary classification task at the output layer and the numerous hidden layers within BiLSTM, the Tanh activation function is adopted in this model. Experimental findings demonstrate that the enhanced TextCNN-BiLSTM model attains a precision of 94.75%. This represents improvements of 1.21%, 1.25%, and 1.25% in precision, recall, and F1, respectively, over the standalone TextCNN model. It also outperforms BiLSTM by 0.78%, 0.9%, and 0.9% in precision, recall, and F1.

General Relation Extraction Using Probabilistic Crossover (확률적 교차 연산을 이용한 보편적 관계 추출)

  • Je-Seung Lee;Jae-Hoon Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.8
    • /
    • pp.371-380
    • /
    • 2023
  • Relation extraction is the task of extracting relationships between named entities from text. Traditionally, relation extraction methods only extract relations between predetermined subject and object entities. In end-to-end relation extraction, however, all possible relations must be extracted by considering the positions of the subject and object for each pair of entities, so this approach uses time and resources inefficiently. To alleviate this problem, this paper proposes a method that sets directions based on the positions of the subject and object and extracts relations according to those directions. The proposed method uses existing relation extraction data to generate direction labels indicating the direction in which the subject points to the object in the sentence, adds entity position tokens and entity types to sentences to predict the directions using a pre-trained language model (KLUE-RoBERTa-base, RoBERTa-base), and generates representations of subject and object entities through a probabilistic crossover operation. These representations are then used to extract relations. Experimental results show that the proposed model performs about 3 to 4%p better than a method that predicts integrated labels. In addition, when the proposed model was trained on Korean and English data, performance was 1.7%p higher in English than in Korean due to the number of data and language disorder, and the parameter values that produced the best performance were different. By excluding the number of directional cases, the proposed model can reduce the waste of resources in end-to-end relation extraction.
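
Two preprocessing steps named in the abstract above, deriving a direction label from the subject and object positions and adding entity position tokens with entity types, can be sketched as follows. The label names `forward` and `backward`, the bracketed marker format, and the function names are hypothetical; the abstract does not specify them, and the probabilistic crossover step is not shown.

```python
def direction_label(subj_start, obj_start):
    """Return 'forward' if the subject precedes the object, else 'backward'.

    The two label names are placeholders chosen for this sketch.
    """
    return "forward" if subj_start < obj_start else "backward"


def mark_entities(tokens, first, second):
    """Wrap two non-overlapping token spans in typed marker tokens.

    Each span is a tuple (start, end_exclusive, entity_type). The marker
    format "[TYPE]" ... "[/TYPE]" is an assumption made for illustration.
    """
    out = list(tokens)
    # Insert around the rightmost span first so earlier indices stay valid.
    for start, end, etype in sorted([first, second], reverse=True):
        out[end:end] = [f"[/{etype}]"]
        out[start:start] = [f"[{etype}]"]
    return out


tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
subject = (0, 2, "PER")
obj = (5, 6, "LOC")
label = direction_label(subject[0], obj[0])
marked = mark_entities(tokens, subject, obj)
```

Predicting one direction per entity pair, instead of testing both subject-object orderings, is what the abstract credits with reducing wasted computation in the end-to-end setting.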

Deep Neural Architecture for Recovering Dropped Pronouns in Korean

  • Jung, Sangkeun;Lee, Changki
    • ETRI Journal
    • /
    • v.40 no.2
    • /
    • pp.257-265
    • /
    • 2018
  • Pronouns are frequently dropped in Korean sentences, especially in text messages in the mobile phone environment. Restoring dropped pronouns can be a beneficial preprocessing task for machine translation, information extraction, spoken dialog systems, and many other applications. In this work, we address the problem of dropped pronoun recovery by resolving two simultaneous subtasks: detecting zero-pronoun sentences and determining the type of dropped pronouns. The problems are statistically modeled by encoding the sentence and classifying types of dropped pronouns using a recurrent neural network (RNN) architecture. Various RNN-based encoding architectures were investigated, and the stacked RNN was shown to be the best model for Korean zero-pronoun recovery. The proposed method does not require any manual features to be implemented; nevertheless, it shows good performance.

The Conceptual Unit Extraction and Knowledge Base Construction from Korean Sentence (한국어 문장으로부터 개념단위의 추출과 지식베이스의 구축)

  • Han, K.R.;Lee, J.K.
    • Annual Conference on Human and Language Technology
    • /
    • 1989.10a
    • /
    • pp.247-251
    • /
    • 1989
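  • This paper discusses the construction of a knowledge base, which is fundamental to developing natural language processing systems for Korean. To separate simple sentences from general Korean sentences, phrase units derived from the results of morphological analysis are combined into clause units by applying the syntactic and semantic analyzer (VCPN) of a Korean-Japanese machine translation system. We then propose a method for constructing a logical knowledge base by handling, for these unit clauses, pronoun anaphora, inference for recovering ellipsis, negation, and tense agreement. The paper places no restrictions on input sentences and extends Horn clause theory to cover a wide range of general sentences, from simple sentences to long ones.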
