• Title/Summary/Keyword: similar relationship between sentences

Search Result 9

A Study on Rhythmic Units in Korean -with Respect to Syntactic Structure- (한국어의 리듬 단위에 관한 연구 - 문법 구조와 관련하여)

  • Kim, Sun-Mi
    • Proceedings of the KSPS conference / 1996.10a / pp.224-228 / 1996
  • This paper is intended as a study on how an utterance is divided into rhythmic units in Standard Korean with respect to its syntactic structure. The data consisted of 150 sentences that contained a similar number of words and various syntactic structures. These sentences were read by 7 speakers of the Seoul dialect in a conversational style. Each sentence was read twice at normal speed and twice at fast speed, so 4200 sentences were recorded in total. While listening to the recordings, the author marked the sentences with two kinds of boundaries, strong and weak. To explore the relationship between rhythmic units and syntactic structure, I devised a framework of grammatical symbols, each designed to carry both syntactic and morphological information at the same time, and assigned these symbols to the sentences. With the sentences marked with grammatical symbols on the one hand and with rhythmic boundaries on the other, I could show the relationship between rhythmic units and syntactic structure: which syntactic structures are likely to be pronounced as one rhythmic unit, and which fall on rhythmic boundaries.
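The total of 4200 recordings follows directly from the reading design stated in the abstract: 150 sentences, each read by 7 speakers, twice at normal speed and twice at fast speed.

$$150 \times 7 \times (2 + 2) = 4200$$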


Multi-source information integration framework using self-supervised learning-based language model (자기 지도 학습 기반의 언어 모델을 활용한 다출처 정보 통합 프레임워크)

  • Kim, Hanmin;Lee, Jeongbin;Park, Gyudong;Sohn, Mye
    • Journal of Internet Computing and Services / v.22 no.6 / pp.141-150 / 2021
  • Based on artificial intelligence technology, AI-enabled warfare is expected to become the main issue in future warfare. Natural language processing is a core AI technology, and it can significantly reduce the information burden on commanders and staff who must understand reports, information objects, and intelligence written in natural language. In this paper, we propose a Language model-based Multi-source Information Integration (LAMII) framework to reduce commanders' information overload and support rapid decision-making. The proposed LAMII framework consists of two key steps: self-supervised representation learning based on language models, and document integration using autoencoders. In the first step, a representation that can identify the similarity relationship between two heterogeneous sentences is learned with a self-supervised learning technique. In the second step, the learned model is used to find and integrate documents from multiple sources that imply similar contents or topics. At this stage, an autoencoder measures the information redundancy of the sentences so that duplicate sentences can be removed. To demonstrate the superiority of the proposed approach, we conducted comparison experiments against existing language models on the benchmark sets used to evaluate their performance. The experiments demonstrated that the proposed LAMII framework predicts the similarity relationship between heterogeneous sentences more effectively than the other language models.

Attention-based word correlation analysis system for big data analysis (빅데이터 분석을 위한 어텐션 기반의 단어 연관관계 분석 시스템)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Lee, Soo-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.27 no.1 / pp.41-46 / 2023
  • With recent advances in machine learning, a variety of techniques can now be applied to big data analysis. However, big data collected in practice lacks an automated technique for refining identical or similar terms based on semantic analysis of the relationships between words. Because most big data is written as ordinary sentences, it is difficult to grasp the meaning and terms of those sentences. Solving these problems requires both morphological analysis and an understanding of sentence meaning. NLP, the family of techniques for analyzing natural language, can capture the relationships between words and sentences. Among NLP techniques, the transformer was proposed to overcome the disadvantages of RNNs by using self-attention within the encoder-decoder structure of seq2seq. In this paper, transformers are used to form associations between words in order to understand the words and phrases of sentences extracted from big data.
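The abstract does not give the authors' model details. As a minimal sketch of the underlying mechanism only, the Python below computes scaled dot-product self-attention weights, softmax(X·Xᵀ/√d), over toy word vectors and reads off each word's strongest association. The embeddings are hypothetical, and the sketch omits the learned query/key projections, multiple heads, and encoder-decoder stack of a real transformer.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(embeddings):
    """Return softmax(X X^T / sqrt(d)) row by row.

    Simplifying assumption: queries and keys are the raw embeddings
    (no learned W_Q / W_K projections), so weights[i][j] is how strongly
    word i attends to word j.
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    weights = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in embeddings]
        weights.append(softmax(scores))
    return weights


def strongest_associations(words, embeddings):
    """Map each word to the other word it attends to most."""
    weights = self_attention_weights(embeddings)
    result = {}
    for i, word in enumerate(words):
        others = [j for j in range(len(words)) if j != i]
        best = max(others, key=lambda j: weights[i][j])
        result[word] = words[best]
    return result


# Hypothetical 3-dimensional word vectors, for illustration only.
words = ["car", "vehicle", "banana"]
embeddings = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0]]
print(strongest_associations(words, embeddings))
# prints {'car': 'vehicle', 'vehicle': 'car', 'banana': 'vehicle'}
```

Each row of the weight matrix sums to 1, so it can be read as a distribution of one word's attention over all words in the sentence.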

Improving the effectiveness of document extraction summary based on the amount of sentence information (문장 정보량 기반 문서 추출 요약의 효과성 제고)

  • Kim, Eun Hee;Lim, Myung Jin;Shin, Ju Hyun
    • Smart Media Journal / v.11 no.3 / pp.31-38 / 2022
  • In document extraction summarization research, various methods have been proposed for selecting important sentences based on the relationships between sentences. In Korean document summarization using the summation similarity of sentences, a sentence's summation similarity was regarded as its amount of information, and the summary was built by selecting important sentences on that basis. However, this approach does not take into account the varying importance that each sentence contributes to the entire document. Therefore, in this study, we propose a document extraction summarization method that selects important sentences based on both the quantitative and the semantic amount of information in each sentence. As a result, the extracted-sentence agreement was 58.56% and the ROUGE-L score was 34, outperforming the method that uses only the summation similarity. Compared with deep learning-based methods, the extraction method is lighter yet performs similarly. This confirms that compressing information based on the semantic similarity between sentences is an important approach in document extraction summarization. In addition, the quickly extracted summary allows the subsequent generative summarization step to be performed effectively.
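As a rough illustration of the summation-similarity baseline described above (not the proposed information-amount method), the sketch below scores each sentence by the sum of its similarities to all other sentences and keeps the top k. The bag-of-words cosine similarity and whitespace tokenizer are assumptions; the abstract does not say which similarity measure was used.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summation_similarity(sentences):
    """Score each sentence by the sum of its similarities to every other sentence."""
    # Assumption: naive lowercase whitespace tokenization (no morphological analysis).
    vectors = [Counter(s.lower().split()) for s in sentences]
    return [
        sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]


def extract_summary(sentences, k):
    """Keep the k highest-scoring sentences, in their original order."""
    scores = summation_similarity(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = [
    "the model summarizes documents",
    "the model extracts sentences from documents",
    "cats sleep all day",
]
print(extract_summary(sentences, 2))
# prints ['the model summarizes documents', 'the model extracts sentences from documents']
```

A sentence that shares nothing with the rest of the document scores zero, which is exactly the weakness the abstract points out: summed similarity rewards central sentences but ignores each sentence's distinct contribution.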

Transformer-based Language Recognition Technique for Big Data (빅데이터를 위한 트랜스포머 기반의 언어 인식 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Lee, Soo-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.267-268 / 2022
  • With recent advances in machine learning, a variety of techniques can now be applied to big data analysis. However, big data collected in practice lacks an automated technique for refining identical or similar terms based on semantic analysis of the relationships between words. Big data usually takes the form of sentences, so morphological analysis or an understanding of those sentences is required. NLP, the family of techniques for analyzing natural language, can capture the relationships between words and sentences. In this paper, we study the advantages and disadvantages of Transformers and Reformers, techniques that compensate for the weaknesses of RNNs, a time-series approach, when applied to big data.


A Diagrammatic Analysis of Dress (복식의 도해적 분석 -언어학 이론을 적용하여-)

  • 한명숙
    • The Research Journal of the Costume Culture / v.2 no.2 / pp.317-335 / 1994
  • The purpose of this thesis is to analyze the relation between mentalistic linguistics and costume, and to clarify the relationship between language as psychological expression and costume, in order to understand the phenomena of human costume behavior more deeply and comprehensively. To analyze costume phenomena, Noam Chomsky's psychological linguistic theory was applied to the costume system; in particular, costume behavior was analyzed by means of Chomsky's later theory (1965). The findings of the analysis are as follows: 1. The Syntactic Component: (※ See Full-Text) 2. The Semantic Component: Costume behavior is similar to a language system. Just as morphemes and phrases combine to make different sentences, various ways of wearing costume make different forms of costume. Both language and costume have dictionary entries that show the meaning of the vocabulary, and rules that combine the individual meanings of those entries into a complete sentence.


Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have been creating enormous amounts of data. Among all kinds of data, the share of unstructured data represented as text has increased geometrically. Because it is difficult to examine all of this text, it is important to access it rapidly and grasp its key points. To meet this need for efficient understanding, many text summarization studies for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms, called "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries based on the frequency of contents in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary includes only the contents of major subjects, bias occurs, and the resulting loss of information makes it hard to ascertain every subject the documents cover. To avoid this bias, one can summarize with attention to the balance between the topics a document has, so that all of its subjects can be ascertained, but an imbalance in the distribution between those subjects still remains. To retain the balance of subjects in a summary, it is necessary to consider the proportion of every subject in the original documents and also to allocate portions to subjects equally, so that even sentences on minor subjects are sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that achieves balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summarization, we use two summary evaluation concepts: "completeness" and "succinctness".
Completeness means that a summary should fully include the contents of the original documents, and succinctness means that a summary contains minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From these weights, the terms most strongly related to every topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. Then a few terms that represent each subject well are selected; in this method they are called "seed terms". However, these terms are too few to explain each subject adequately, so enough terms similar to the seed terms are needed to build a well-constructed subject dictionary. Word2Vec is used for word expansion to find terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from these vectors the similarity between all terms can be derived using cosine similarity; the higher the cosine similarity between two terms, the stronger their relationship is considered to be. Terms with high similarity to the seed terms of each subject are therefore selected, and after these expanded terms are filtered, the subject dictionary is finally constructed. The next phase allocates a subject to every sentence in the original documents. To grasp the contents of all sentences first, a frequency analysis is conducted with the specific terms that make up the subject dictionaries. After the frequency analysis, the TF-IDF weight of each subject is calculated, which shows how much each sentence explains each subject. However, TF-IDF weights are unbounded, so the TF-IDF weights of every subject in the sentences are normalized, converting all values to the range 0 to 1.
Each sentence is then allocated the subject with the maximum TF-IDF weight among all subjects, and finally a sentence group is constructed for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences, from which a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better retains the balance of all subjects that the documents originally have.
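A minimal Python sketch of two steps from the abstract: expanding seed terms by cosine similarity between word vectors, and allocating each sentence to the subject with the largest normalized TF-IDF weight. The word vectors and TF-IDF weights are assumed to be precomputed (for example by a Word2Vec model), the 0.7 threshold is a hypothetical value, and min-max normalization per subject is one assumed reading of "converting all values to the range 0 to 1".

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_seeds(seeds, vectors, threshold=0.7):
    """Add every term whose cosine similarity to some seed is >= threshold.

    `vectors` maps term -> word vector (assumed to come from a trained
    Word2Vec model); `threshold` is a hypothetical cut-off.
    """
    expanded = set(seeds)
    for term, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            expanded.add(term)
    return expanded


def assign_subjects(sentence_weights):
    """Give each sentence the subject with the largest normalized weight.

    `sentence_weights` is a list of {subject: raw TF-IDF weight} dicts, one per
    sentence. Assumption: each subject's weights are min-max normalized to
    [0, 1] across sentences before taking the maximum.
    """
    subjects = list(sentence_weights[0])
    low = {s: min(row[s] for row in sentence_weights) for s in subjects}
    high = {s: max(row[s] for row in sentence_weights) for s in subjects}
    labels = []
    for row in sentence_weights:
        normalized = {
            s: (row[s] - low[s]) / (high[s] - low[s]) if high[s] > low[s] else 0.0
            for s in subjects
        }
        labels.append(max(normalized, key=normalized.get))
    return labels


# Hypothetical 2-dimensional vectors and TF-IDF weights, for illustration only.
vectors = {"room": [1.0, 0.0], "bed": [0.9, 0.1], "food": [0.0, 1.0], "breakfast": [0.1, 0.9]}
print(sorted(expand_seeds(["room"], vectors)))  # prints ['bed', 'room']
weights = [{"room": 3.0, "food": 0.5}, {"room": 0.2, "food": 2.0}, {"room": 1.0, "food": 1.0}]
print(assign_subjects(weights))  # prints ['room', 'food', 'food']
```

The third sentence has equal raw weights, but "food" wins after normalization because 1.0 is relatively higher within the food subject's range. Normalizing per subject rather than per sentence matters here: rescaling within a single sentence would never change which subject is largest.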

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors because it relies entirely on various natural and climatic factors. Climate change has many effects on land and agriculture, including lack of annual rainfall, pests, heat waves, changes in sea level, and fluctuations in global ozone and atmospheric CO2. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase the productivity of their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented for them. A new agricultural ontology focused on subdomains is designed to assist farmers using a Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparison with different state-of-the-art systems. The performance metrics are precision, recall, F-measure, Matthews correlation coefficient, area under the receiver operating characteristic curve, and area under the precision-recall curve. For agricultural ontology construction, the proposed methodology achieves a precision of 94.40%, compared with 83.94% for a decision tree and 86.89% for the K-nearest neighbor algorithm.
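The abstract does not define the Jaccard relative extractor (JRE) precisely. Assuming its core is the standard Jaccard coefficient |A ∩ B| / |A ∪ B| over word sets, sentence-pair similarity can be sketched as follows; the whitespace tokenizer, lowercasing, and 0.5 threshold are illustrative assumptions, not the paper's settings.

```python
from itertools import combinations


def tokens(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop-word removal.
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| between the word sets of two sentences."""
    sa, sb = tokens(a), tokens(b)
    union = sa | sb
    if not union:
        return 0.0  # convention for two empty sentences
    return len(sa & sb) / len(union)


def similar_pairs(sentences, threshold=0.5):
    """Return index pairs whose Jaccard similarity reaches the (hypothetical) threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(sentences)), 2)
        if jaccard(sentences[i], sentences[j]) >= threshold
    ]


sentences = [
    "heavy rainfall damages rice crops",
    "rainfall damages wheat crops",
    "aphids attack cotton leaves",
]
print(jaccard(sentences[0], sentences[1]))  # prints 0.5 (3 shared words / 6 distinct words)
print(similar_pairs(sentences))  # prints [(0, 1)]
```

Because Jaccard ignores word order and frequency, it is cheap enough to run over every sentence pair in a document collection, which suits a preprocessing step that feeds a later classifier.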

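A Study on the Doctrine of Five Elements' Motion and Six Kinds of Natural Factors (運氣學說) through Wang Bing's Commentary on 'The Seven Great Chapters' of The Yellow Emperor's Internal Classic Su Wen (『황제내경소문(黃帝內經素問)·칠편대론(七篇大論)』 왕빙 주본(注本)을 통(通)한 운기학설(運氣學說) 관(關)한 연구(硏究))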

  • Kim, Gi-Uk;Park, Hyeon-Guk
    • The Journal of Dong Guk Oriental Medicine / v.4 / pp.109-140 / 1995
  • Having considered the main subjects, our investigation of the 'Doctrine on five elements' motion and six kinds of natural factors(運氣學說)' through 'Wang Bing's Commentary(王氷 注本)' on 'The seven great chapters in The Yellow Emperor's Internal Classic Su Wen' ("黃帝內經素問 七篇大論") yields the following results. (1) Regarding Wang Bing's supplementation of 'The seven great chapters("七篇大論")' and his scholarship as an interpreter: judging from the character 'forget(亡)' used in 'The missing chapters("素問遺篇")', 'Bonbyung-ron("本病論")', and 'Jabeob-ron(刺法論)', 'The seven great chapters("七篇大論")' must be a supplementary work by Wang Bing. In addition, he quoted some forty books, including medical, Taoist, Confucian, and miscellaneous works, in his commentary, and quotations from the 'Su Wen(素問)' and 'Ling Shu("靈樞")' scriptures occupy nearly the whole book. Following the method of interpreting scripture by scripture, he edited the order of the 'Internal Classic("內經")' handed down from ancient times, and he supplemented the commentary with an exhaustive scholarly mind, observing natural phenomena in practice and writing down pathology and methods of treatment. We found that the book is combined with the study of the 'Doctrine on five elements' motion and six kinds of natural factors(運氣學說)'. (2) When we compare and analyze similar phrases in 'The seven great chapters in The Yellow Emperor's Internal Classic Su Wen'("黃帝內經素問ㆍ七篇大論") through 'Wang Bing's Commentary(王氷 注本)', he discusses the organized 'five elements(五行)' and 'heaven's regular movement(天道運行)' in 'The seven great chapters("七篇大論")' more than 'Emyangengsangdae-ron("陰陽應象大論")' does. Also, in the 'Ohanunhangdae-ron("五運行大論")', the sentences repeated from 'Emyangengsangdae-ron("陰陽應象大論")' are long, so they are omitted.
In the 'Youkmijidae-ron("六微旨大論")', the 'Cheonjin ideology(天眞思想)' based on the 'Sanggocheonjin-ron("上古天眞論")' and 'Sagijosindae-ron("四氣調神大論")' is written; in the 'Gigoupyondae-ron("氣交變大論")', syndromes and symptoms are explained in more detail than in 'Janggibeobsi-ron("藏氣法時論")' and 'Okgijinjang-ron("玉機眞藏論")'; in the 'Osangieongdae-ron("五常政大論")', the concept of the 'five elements(五行)' in the 'Gemgwejineon-ron("金櫃眞言論")' is expanded into 'the five elements' motion concept(五運槪念)'; and in the 'Youkwonjeonggidae-ron("六元正紀大論")', the function of 'the five elements' motion and six kinds of natural factors(運氣)' is mainly explained, whereas a systematic pathology is not revealed as it is in 'Emyangengsangdae-ron("陰陽應象大論")'. In the 'Jijinyodae-ron("至眞要大論")', the changes of atmosphere that correspond to treatment principles based on 'The three Yin and Yang(三陰三陽)' are explained as a more advanced concept. Therefore, there is much similarity between the phrases of 'Emyangengsangdae-ron("陰陽應象大論")' and the 'chapters of addition(補缺之篇)'. Overall, the view that 'The seven great chapters("七篇大論")' were added by Wang Bing(王氷) is supported, because these chapters contain more profound concepts than the other chapters. (3) Studying Wang Bing's(王氷) 'Pattern on five elements' motion and six kinds of natural factors(運氣格局)' in 'The seven great chapters("七篇大論")': in the 'Cheonwongi-dae-ron("天元紀大論")', with the 'Cheonjin ideology(天眞思想)', the concepts of 'Wang(旺)'·'Sang(相)'·'Sa(死)'·'Su(囚)'·'Hu(休)' and 'Cheonbu(天符)'·'Sehwoi(歲會)' are measured in time and space, and the concept of 'Taeulcheonbu(太乙天符)' is explained through the concept of 'Three Sum(三合)'.
In the 'Ounhangdae-ron("五運行大論")', the 'five combinations of the heavenly stems(天干五合)' are compared to the concepts of 'couples(夫婦)' and 'weak-strong(柔强)', and in the 'Youkmijidae-ron("六微旨大論")', 'the relationship of obedience and disobedience(順逆關係)', which conforms to changes in 'energy status(氣位)' and the 'monarch-minister(君相)' position, is mentioned. In the 'Gikyobyeondae-ron("氣交變大論")', the concepts of 'Sang-duk(相得)' and 'Pyungsang(平常)' are emphasized, but concrete measurement is mentioned. In the 'Osangieongdae-ron("五常政大論")', twenty-three forms of the 'system of the five elements' motion(五運體系)' and 'routine and contrary treatment(正治·反治)' with 'chill-fever-warm-cold(寒·熱·溫·凉)' are explained in detail according to the 'analysis and differentiation of pathological conditions by the eight principal syndromes(八綱辨證)'. In the 'Youkwonjeonggidae-ron("六元正紀大論")', Wang Bing does not mention the concept of 'Jungwun(中運)' that is seen in the original classic. Whereas the concepts of 'Jungwun, Dongcheonbu, Dongsehae and Taeulcheonbu(中運, 同天符, 同歲會, 太乙天符)' appear in the newly corrected edition, Wang Bing seems to use only the concepts of 'Daewun, Juwun, and Gaekwun(大運, 主運, 客運)'. In the 'Jijinyodaeron("至眞要大論")', Wang Bing added detailed commentary on pathology and treatment doctrine by explaining the numerous appearances of 'Sebo, sufficiency, deficiency(歲步, 有餘, 不足)', and regarding the relation of 'victory-defeat(勝復)', he argued clearly that it is not a mechanical estimation.
(4) Observing Wang Bing's originality in the study of the 'Doctrine on five elements' motion and six kinds of natural factors(運氣學說)': through 'Wang Bing's Commentary(王氷 注本)' on 'The seven great chapters("七篇大論")', he emphasized 'the idea of Jeongindogi and health preservation(全眞導氣·養生思想)', explained the 'Doctrine on five elements' motion and six kinds of natural factors(運氣學說)' clearly, simplified and expanded the meaning of 'man, as a microcosm, is connected with the macrocosm(天人相應)', and, with the 'Atmosphere theory(大氣論)', also explained the meaning of the 'rising and falling mechanism(升降氣機)'. On the sentence 'By examining the pathology, take care of your health(審察病機 無失氣宜)', he explained the pathological meaning of 'heart-kidney-water-fire(心腎水火)' and suggested a doctrine and management of prescriptions. In diagnosis and treatment, by suggesting estimation by the two methods of 'asthenia and sthenia(虛實)', he demonstrated 'contrary treatment(反治)', the treatment principle of 'lowering heart fire and tonifying kidney water(降心火益腎水)', and the 'two classes of chill and fever(寒熱二綱)'. His sentence 'There are inside and outside in the illness, and so inner and outer in the treatment(病有中外 治有表裏)' concretely suggests the 'two classes of superficies and interior(表裏二綱)' according to the position of the disease. Therefore, Wang Bing was an excellent theorist who introduced the 'Cheonjin ideology(天眞思想)', and as a clinician he put medical science into practice. With these accomplishments, written mainly in the 'Doctrine on five elements' motion and six kinds of natural factors(運氣學說)' of 'The seven great chapters("七篇大論")', he interpreted the ancient medical scriptures, expanded their meaning, and ultimately contributed to the development of 'Korean Oriental Medicine(韓醫學)'.
