• Title/Summary/Keyword: part of speech

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
  • A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its corresponding POS tag; such corpora are widely used as training data for natural language processing. Training data is generally assumed to be error-free, but in reality it contains various types of errors, which degrade the performance of systems trained on it. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier with cross-validation as the evaluation technique. We first train a POS-tagging classifier on the POS-tagged corpus, which contains some errors, and then detect errors in the same corpus using cross-validation. The classifier cannot detect errors directly, however, because there is no training data for POS-tagging errors. We therefore detect errors by comparing the outputs of the classifier (the POS probabilities) while adjusting hyperparameters. The hyperparameters are estimated from a small error-tagged corpus, whose text is sampled from the POS-tagged corpus and in which experts have marked up POS errors. As evaluation metrics we use recall and precision, which are widely used in information retrieval. Because not all detected errors can be checked, we show that the proposed method is valid by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency tree-tagged corpus and a semantic role-tagged corpus.
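
The cross-validated detection idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a simple word-to-tag frequency model stands in for the XGBoost classifier, and the names `detect_errors`, `threshold` and `k` are hypothetical. The threshold plays the role of a hyperparameter that the abstract says is tuned on a small expert-marked sample.

```python
from collections import Counter, defaultdict

def train(tagged):
    """Count how often each word carries each tag (a stand-in for XGBoost)."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return counts

def detect_errors(corpus, k=5, threshold=0.2):
    """Flag positions whose annotated tag gets a low probability from a
    model trained on the other k-1 folds (assumed decision rule)."""
    flagged = []
    for fold in range(k):
        model = train(p for i, p in enumerate(corpus) if i % k != fold)
        for i, (word, tag) in enumerate(corpus):
            if i % k != fold:
                continue
            seen = model.get(word)
            if not seen:          # unseen word: no evidence, do not flag
                continue
            if seen[tag] / sum(seen.values()) < threshold:
                flagged.append(i)
    return sorted(flagged)

def precision_recall(flagged, gold):
    """Precision and recall of flagged positions against expert-marked ones."""
    flagged, gold = set(flagged), set(gold)
    tp = len(flagged & gold)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Toy corpus with one deliberately wrong tag at the last position.
corpus = [("the", "DET"), ("dog", "NOUN")] * 10 + [("the", "VERB")]
flagged = detect_errors(corpus)
print(flagged, precision_recall(flagged, gold=[20]))
```

Lowering `threshold` flags fewer positions (favoring precision), while raising it flags more (favoring recall), which is the trade-off the error-tagged sample would be used to settle.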

Analyzing Vocabulary Characteristics of Colloquial Style Corpus and Automatic Construction of Sentiment Lexicon (구어체 말뭉치의 어휘 사용 특징 분석 및 감정 어휘 사전의 자동 구축)

  • Kang, Seung-Shik;Won, HyeJin;Lee, Minhaeng
    • Smart Media Journal / v.9 no.4 / pp.144-151 / 2020
  • In a mobile environment, communication takes place via SMS text messages. The vocabulary used in SMS texts can be expected to belong to different classes from the vocabulary used in general Korean literary-style sentences. For example, in typical literary style, sentences are properly begun and terminated and are well constructed, while the SMS text corpus often omits components or replaces them with abbreviated expressions. To analyze these vocabulary usage characteristics, existing colloquial-style and literary-style corpora are used. The experiment compares and analyzes the vocabulary usage characteristics of the colloquial corpora (the SMS text corpus and the Naver Sentiment Movie Corpus) and the written Korean corpus. To compare the vocabulary of each corpus, the adjective part-of-speech tag (VA) was used as the criterion, and distinctive collexeme analysis was used to measure collostructural strength. As a result, it was confirmed that adjectives related to emotional expression, such as 'good-', 'sorry-', and 'joy-', were preferred in the SMS text corpus, while adjectives related to evaluative expressions were preferred in the Naver Sentiment Movie Corpus. Word embedding was then used to automatically construct a sentiment lexicon based on the extracted adjectives with high collostructural strength, yielding a total of 343,603 automatically built sentiment representations.
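
As a rough illustration of how collostructural strength can be measured in a distinctive collexeme analysis, the Python sketch below compares one adjective's frequency across two corpora. The abstract does not name the association measure; this sketch assumes the commonly used one-sided Fisher exact test and reports a signed -log10(p). The function names and the counts are hypothetical.

```python
from math import comb, log10

def fisher_p_greater(a, b, c, d):
    """One-sided Fisher exact p-value P(X >= a) for the 2x2 table
    [[a, b], [c, d]] with fixed margins (hypergeometric tail)."""
    row1 = a + b              # all tokens of the adjective
    col1 = a + c              # all adjective tokens in corpus A
    n = a + b + c + d
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)

def collostruction_strength(freq_a, total_a, freq_b, total_b):
    """Signed -log10(p): positive if the adjective is distinctive for
    corpus A, negative if it is distinctive for corpus B."""
    a, b = freq_a, freq_b
    c, d = total_a - freq_a, total_b - freq_b
    if a * total_b >= b * total_a:      # relatively more frequent in A
        return -log10(fisher_p_greater(a, b, c, d))
    return log10(fisher_p_greater(b, a, d, c))

# Hypothetical counts: an adjective seen 30 times among 100 adjective
# tokens in corpus A and 5 times among 100 adjective tokens in corpus B.
print(collostruction_strength(30, 100, 5, 100))
```

Because the tail probability is computed with exact integers before the final division, the sketch stays accurate for small tables; very large corpora would need a log-space formulation to avoid underflow in `log10`.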

A Study about the Users's Preferred Playing Speeds on Categorized Video Content using WSOLA method (WSOLA를 이용한 동영상 미세배속 재생 서비스에 대한 콘텐츠별 배속 선호도 분석 연구)

  • Kim, I-Gil
    • Journal of Digital Contents Society / v.16 no.2 / pp.291-298 / 2015
  • In a fast-paced information technology environment, consumption of video content is shifting from one-way television viewing to VOD (Video on Demand) playback anywhere, anytime, on any device. This viewing trend makes fine speed control an important feature, alongside the inherent strengths of the digital video signal. Currently, many video players provide a fine-speed-control function that can speed up the video to skip a boring part or slow it down to focus on an exciting scene. Audio information is just as important as visual information for understanding speed-controlled video. Thus, a number of audio-processing algorithms have been proposed for fine-speed-control playback to solve the problem of pitch distortion. In this study, WSOLA (Waveform-Similarity-Based Overlap-Add), a well-known technique for prosodic modification of speech signals, is applied to analyze users' needs for fine-speed-control video playback. By surveying users' preferred speeds for categorized video content and analyzing the results, this paper proposes that a range of fine speed adjustments is needed to accommodate users' preferred video consumption.

A Study on the Degree of Physical, Psychological and Social Adaptation of CVA Patients (뇌졸중(腦卒中) 환자(患者)의 신체적(身體的).심리적(心理的).사회적(社會的) 적응도(適應度)에 관(關)한 연구(硏究))

  • Hwang Hyun-Sook;Park Kyung-Sook
    • Journal of Korean Academy of Fundamentals of Nursing / v.3 no.2 / pp.213-233 / 1996
  • This study covered 274 apoplectic patients who, as outpatients at 23 general hospitals in the Seoul and Kyungi area, received rehabilitation therapy and were tested on physical, psychological and social adaptation. It provides basic data on the degree of improvement of apoplectic patients undergoing rehabilitation therapy. Data were collected over a period of 63 days, from February 21 to April 23, 1996. The assigned physical therapist conducted direct interviews with patients after he answered the distributed questionnaires for each individual patient. The collected data were processed with SPSS/PC+. Tests were conducted to measure the degree of ADL dependency, depression and social activity, corresponding to physical, psychological and social adaptation respectively. The details are as follows: 1) The test measuring the degree of ADL dependency, corresponding to the physical adaptation of CVA patients, indicated a mean score of 2.57 (the ideal score is 1.0) with a standard deviation of ±0.75. The worst score was 3.95 while the best score was a perfect 1.0, representing a severe range of dependency. The distribution was centered, with a median of 2.65 and a mode of 2.68. 2) The test measuring the degree of depression, which corresponds to the level of psychological adaptation, yielded a mean of 2.99, which is higher than the normal limit of 2.45. The standard deviation was ±0.52, and the worst and best scores were 4.35 and [illegible], respectively. The distribution was centered, with a median of 3.00 and a mode of 3.00. 3) The test measuring the degree of social activity, for the level of social adaptation, indicated a very low mean score of 26.52 (the perfect score is 144) with a standard deviation of ±16.23. Some patients scored as high as 100, but others scored as low as 3. The distribution of social activity was shifted to the left at a very low level, with a median of 24.00 and a mode of 20.00.
4) Factors influencing the level of physical, psychological and social adaptation are as follows: Factors significantly influencing the level of physical adaptation, measured by ADL dependency, are age, personal guardian, payer of medical expenses, paralysis of the right arm and right leg, and facial paralysis. Factors significantly influencing the level of psychological adaptation, measured by the degree of depression, are age, marital status, education, individual and family medical history, speech impediment, and facial paralysis. Factors significantly influencing the level of social adaptation, measured by the degree of social activity, are age, marital status, education, employment status, and the burden of medical expenses. 5) The correlation between ADL dependency, as the degree of physical adaptation, and depression, as the degree of psychological adaptation, is significant (9.00). ADL dependency is proportional to depression, but social activity is inversely proportional to ADL dependency and depression. In conclusion, increased care for patients' physical function is not the only means needed to facilitate the appropriate adaptation of CVA patients. Introducing a solid rehabilitation program for psychological and social adaptation will also play an integral part in the treatment of CVA patients.

TOOTH MOVEMENTS TO THE SITE OF ALVEOLAR BONE GRAFT (구순구개열 환아에서의 치조골이식)

  • Cho, Hae-Sung;Park, Jae-Hong;Kim, Gwang-Chul;Choi, Seong-Chul;Lee, Keung-Ho;Choi, Yeung-Chul
    • Journal of the korean academy of Pediatric Dentistry / v.34 no.1 / pp.140-149 / 2007
  • Cleft lip and palate are congenital craniofacial malformations. Reconstruction of the dental arch in patients with alveolo-palatal clefts is very important because these patients have many functional and esthetic problems. Without proper treatment during the long period of cleft lip and palate management, malnutrition, poor oral hygiene, respiratory infections, speech malfunctions, maxillofacial deformity, and psychological problems may occur. Treatment should therefore be managed with a multidisciplinary approach. Bone grafting is a consequential step in the dental rehabilitation of the cleft lip and palate patient. A complete alveolar arch should be achieved for the teeth to erupt into and to form a stable dentition. In addition, the presence of the cleft complicates orthodontic treatment. Bone grafting in patients with cleft lip and palate is therefore a widely adopted surgical procedure. Grafted bone stabilizes the alveolar process and allows the canine or incisor to move into the graft site. After bone grafting, orthodontic closure of the maxillary arch has become a common practice for achieving dental reconstruction without any prosthodontic treatment. Various grafting materials have been used in alveolar clefts. Iliac bone is the most widely favoured, but tibia, rib, cranial bone, and mandible have also been used. According to its timing, bone grafting may be classified as primary, early secondary, secondary, or late secondary. Bone grafting is called secondary when performed later, at the end of the mixed dentition. It is the most accepted procedure and has become part of the treatment protocol. A secondary bone graft is preferably performed before the eruption of the permanent canine in order to provide adequate periodontal support for the eruption and preservation of the teeth adjacent to the cleft. We report here on a patient with unilateral cleft lip and palate who underwent an iliac bone graft.
The cleft was fully obliterated by grafted bone in the region of the alveolar process. The presence of bone permitted physiologic tooth movement and orthodontic movement of the adjacent tooth into the former cleft area. Satisfactory arch alignment could be achieved by subsequent orthodontic treatment.

Evaluations of Chinese Brand Name by Different Translation Types: Focusing on The Moderating Role of Brand Concept (영문 브랜드네임의 중문 브랜드네임 전환 방식에 대한 중화권 소비자들의 브랜드 평가에 관한 연구 -브랜드컨셉의 조절효과를 중심으로-)

  • Lee, Jieun;Jeon, Jooeon;Hsiao, Chen Fei
    • Asia Marketing Journal / v.12 no.4 / pp.1-25 / 2011
  • Brand names are often considered part of the product and important extrinsic cues in product evaluation when consumers make purchasing decisions. For a company, brand names are also important assets. Building a strong brand name in the Chinese commonwealth is a major challenge for many global companies. One of the first problems a global company faces is how to translate an English brand name into a Chinese brand name. This is a very difficult decision because of cultural and linguistic differences. Western languages are based on a phonetic alphabet system, whereas Chinese is based on ideograms. Chinese speakers are more likely to recall stimuli presented as brand names in visual rather than spoken recall, whereas English speakers are more likely to recall the names in spoken rather than visual recall. We interpret these findings in terms of the fact that mental representations of verbal information in Chinese are coded primarily in a visual manner, whereas verbal information in English is coded primarily in a phonological manner. A key linguistic difference that affects the decision to standardize or localize when transferring an English brand name to a Chinese brand name is the writing system. Prior Chinese brand naming research suggests that the popular translation approaches foreign companies adopt are phonetic, semantic, and phonosemantic translation. Phonetic translation refers to the speech sound that is produced, such as the pronunciation of the brand name. Semantic translation involves the actual meaning of, and associations made with, the brand name. Phonosemantic translation preserves both the sound and the meaning of the brand name. Prior brand naming research has dealt with word-level analysis in examining English brand names that are desirable for improving memorability. We predict that the suggestiveness of Chinese brand names under different translation methods leads to different levels of consumer evaluation.
This research investigates the structural linguistic characteristics of the Chinese language and their impact on brand name evaluation. Another purpose of this study is to examine the effect of brand concept on the evaluation of brand names. We also examine whether the evaluation is moderated by Chinese translation type. A total of 178 Taiwanese participants were recruited for the research. The following findings come from the empirical analysis of the hypotheses established in this study. In the functional brand concept condition, participants exposed to semantic Chinese translations evaluated the brand name more positively than those exposed to phonetic translations. In contrast, in the symbolic brand concept condition, participants exposed to phonetic translations evaluated it more positively than those exposed to semantic translations. We also found that phonosemantic translations received the most favorable evaluations regardless of brand concept. The implications of these findings for brand name strategies are discussed for Chinese commonwealth marketers. The proposed model helps companies select brand names effectively, making it highly applicable for both academia and practitioners.

The Operation Plan of the Community-Linked Extracurricular Education program for Lifelong Education for the Persons with Disabilities Based on the Memorandum of Understanding (MOU) of Extracurricular Education between Chosun University and Daegu University (조선대학교-대구대학교 비교과 교육 업무협약(MOU) 기반 지역 연계 장애인평생교육 비교과프로그램 운영 방략)

  • Kim, Young-Jun;Kim, Wha-Soo;Rhee, Kun-Yong
    • The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.273-280 / 2022
  • Based on the MOU between Chosun University and Daegu University, this study aimed to explore an operation strategy for a community-linked extracurricular education program on the theme of lifelong education for the disabled. At front-line university sites, extracurricular education programs are often regarded as formalities and procedures that assist subject learning at the major or liberal arts level, but they have a very important status and identity considering that they are classified as "learning competency reinforcement support", "career psychological counseling support", "employment and start-up support", and "subject-linked extracurricular education". Accordingly, extracurricular education programs have the nature and advantage of covering not only one-off, trend-driven programs but also various forms of community-linked problem-solving learning, including links to students' major studies and employment. As part of this, the study aims to present a strategy for operating an extracurricular education program whose main theme and content is "lifelong education for the disabled," focusing on Chosun University and Daegu University. The contents of the study are presented in three parts: "organizational operation strategy between two universities," "operation strategy of curriculum between two universities," and "comprehensive system for extracurricular education programs operation of lifelong education for the disabled between the two universities". The first research content, "Organized Operation Strategy between Two Universities," schematizes in detail the process of collaboration and communication among Chosun University's center of extracurricular activities, the Daegu University Lifelong Education Center, and other committees and departments.
The second research content, "The Curriculum Operation Strategy between Two Universities", is a detailed schematic diagram of the learning contents, methods, and procedures to be organized in the extracurricular education program. The third research content, "Comprehensive System of extracurricular education program Operation for Lifelong Education for the Disabled between Two Universities," presents, at the level of a roadmap, the results of synthesizing the basic elements essential for operating the extracurricular education program. As a result of the study, it was possible to identify project tasks that could be pursued in depth by operating an extracurricular education program on lifelong education for the disabled under the MOU between the two universities.

Effect of Noise in Human Body (소음이 인체에 미치는 영향)

  • 이영노
    • Proceedings of the KOR-BRONCHOESO Conference / 1972.03a / pp.7-8 / 1972
  • The effects of noise exposure are of two types: nonauditory effects and auditory effects. Nonauditory effects of noise exposure include interference with speech communication, sleep, and emotional behavior. Noise causes high blood pressure and a rapid pulse, and it decreases salivation and gastric juice secretion. Experiments showed that corticoid hormone and gonadotropic hormone decreased and thyrotropic hormone increased. As for the auditory effects of noise exposure, when the normal ear is exposed to noise at harmful intensities (above 90 dB) for sufficiently long periods of time, a temporary depression of hearing results, disappearing after minutes or hours of rest. When the exposure is longer or the intensity greater, a permanent threshold shift called noise-induced hearing loss is reached. Hearing loss resulting from noise exposure presents legal as well as medical problems. The otologist who examines and evaluates industrial hearing loss cases must be properly informed not only about the otologic aspects but also about the physical and legal aspects of the problem. The measurement of hearing ability, through both preplacement and periodic follow-up hearing tests, is the most important part of a hearing conservation program. The ideal hearing conservation program would reduce or eliminate hazardous noise at its source or by acoustic isolation of noisy working areas; two types of ear protection (plugs and muffs) were developed for personal protection.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often written in the natural human languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can produce incorrect results that are far from users' intentions. Even though much progress has been made in recent years in enhancing search engine performance to provide users with appropriate results, there is still considerable room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, thereby avoiding expensive sense-tagging processes. It tests the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part of speech and senses. For the experiments in this study, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross-validation. Only nouns, the target of word sense disambiguation, were selected.
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged with sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that better precision and recall are obtained with the merged corpus. This study suggests that the method can practically enhance the performance of internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
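
The Naïve Bayes approach to word sense disambiguation described in this abstract can be sketched in Python as follows. This is an illustrative toy, not the paper's system: the class name `NaiveBayesWSD`, the sense labels, the add-one smoothing and the English example contexts (standing in for dictionary example sentences) are all assumptions.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayesWSD:
    """Pick the sense whose example sentences best explain the context."""

    def __init__(self):
        self.sense_counts = Counter()            # examples per sense
        self.word_counts = defaultdict(Counter)  # context words per sense
        self.vocab = set()

    def add_example(self, sense, context_words):
        """Register one sense-tagged example (e.g. a dictionary example)."""
        self.sense_counts[sense] += 1
        self.word_counts[sense].update(context_words)
        self.vocab.update(context_words)

    def predict(self, context_words):
        """Return the sense maximizing log P(sense) + sum log P(word | sense),
        with add-one smoothing (an assumed choice)."""
        total = sum(self.sense_counts.values())
        best, best_score = None, float("-inf")
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            score = log(n / total)
            for w in context_words:
                score += log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best

# Hypothetical sense-tagged examples for the ambiguous noun "bank".
model = NaiveBayesWSD()
model.add_example("bank_finance", ["deposit", "money", "account"])
model.add_example("bank_finance", ["loan", "money", "interest"])
model.add_example("bank_river", ["river", "water", "fishing"])
print(model.predict(["money", "loan"]))
print(model.predict(["river", "fishing"]))
```

Treating each context word as independent given the sense is the simplifying assumption the abstract itself notes as unrealistic but effective in practice.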

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep learning based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. In this case, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, the morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘(= adjective)' and '고(=connective ending)'. Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use the 'morpheme vector' as the input to a deep learning model rather than the 'word vector' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting the questions and then compare the classification accuracy of a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 Naver News items, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria.
First, they come from two types of data source: Naver News, of high grammatical correctness, and Naver Shopping's cosmetics product reviews, of low grammatical correctness. Second, they differ in the degree of data preprocessing, namely sentence splitting only, or additional spelling and spacing corrections after sentence splitting. Third, they vary in the form of input fed into the word vector model: whether the morphemes are entered by themselves or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. It appears that using text from the same domain even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. The POS tag attachment, which is devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not seem to have any definite influence on the classification accuracy.
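
To make the input preparation concrete, the following Python sketch shows how morpheme tokens, with or without POS tags attached, could be turned into CBOW training pairs using the context window of 5 mentioned in the abstract. It covers only the data preparation, not the training of the 300-dimensional vectors. The function names are hypothetical, and the tag labels `VA` (adjective) and `EC` (connective ending) are assumed to follow the Sejong tagset.

```python
def attach_pos(morphemes, with_tags):
    """Turn (form, pos) pairs into tokens: 'form/POS' or the bare 'form'.
    Attaching the tag keeps homonyms with different POS apart."""
    return [f"{form}/{pos}" if with_tags else form for form, pos in morphemes]

def cbow_pairs(tokens, window=5):
    """Build (context, target) pairs: CBOW predicts each target token from
    up to `window` tokens on either side of it."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:                      # skip one-token sentences
            pairs.append((context, target))
    return pairs

# The abstract's example word '예쁘고' split into its two morphemes.
analyzed = [("예쁘", "VA"), ("고", "EC")]
plain = attach_pos(analyzed, with_tags=False)
tagged = attach_pos(analyzed, with_tags=True)
print(plain, tagged)
print(cbow_pairs(tagged))
```

Switching `with_tags` on or off corresponds to the third criterion above, so the same pair-building step can feed both variants of the morpheme vectors.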