• Title/Abstract/Keywords: related words

Search results: 1,691 (processing time: 0.025 s)

Style-Specific Language Model Adaptation using TF*IDF Similarity for Korean Conversational Speech Recognition

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 23, No. 2E
    • /
    • pp.51-55
    • /
    • 2004
  • In this paper, we propose a style-specific language model adaptation scheme using n-gram-based tf*idf similarity for Korean spontaneous speech recognition. Korean spontaneous speech shows distinctive style-specific characteristics such as filled pauses, word omission, and contraction, which are related to function words and depend on the preceding or following words. To reflect these characteristics and to overcome the shortage of language-model training data, we estimate an in-domain n-gram model by relevance weighting of out-of-domain text data according to its n-gram-based tf*idf similarity; the in-domain language model includes a disfluency model. Recognition results show that n-gram-based tf*idf similarity weighting effectively reflects style differences.
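The relevance-weighting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: it assumes word bigrams as the n-gram unit, a smoothed idf of log((1+N)/(1+df)) + 1, cosine similarity between each out-of-domain document and the pooled in-domain text, and uses that similarity directly as the weight on the document's n-gram counts. All function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """Return the word n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, n=2):
    """Map each tokenized document to a sparse tf*idf vector over its n-grams.
    Assumption: smoothed idf = log((1 + N) / (1 + df)) + 1."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    num_docs = len(docs)
    return [{g: tf * (math.log((1 + num_docs) / (1 + df[g])) + 1)
             for g, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relevance_weights(out_domain_docs, in_domain_docs, n=2):
    """Weight each out-of-domain document by its tf*idf similarity to the pooled in-domain text."""
    in_domain_pool = [tok for d in in_domain_docs for tok in d]
    vectors = tfidf_vectors(out_domain_docs + [in_domain_pool], n)
    target = vectors[-1]
    return [cosine(v, target) for v in vectors[:-1]]

def weighted_ngram_counts(out_domain_docs, weights, n=2):
    """Accumulate out-of-domain n-gram counts, each scaled by its document's relevance weight."""
    totals = Counter()
    for doc, w in zip(out_domain_docs, weights):
        for g, c in Counter(ngrams(doc, n)).items():
            totals[g] += w * c
    return totals
```

The weighted counts could then feed an ordinary n-gram estimator; documents sharing no n-grams with the conversational in-domain text receive weight 0 and contribute nothing.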

북한의 물리 교육 및 교과서 분석 연구 (An Analysis on Education and Textbooks of Physics in North Korea)

  • 민영기
    • Journal of the Korean Association for Science Education
    • /
    • Vol. 16, No. 4
    • /
    • pp.329-339
    • /
    • 1996
  • We examined the science education system in North Korea from elementary through high school. We also analyzed the physics textbooks used in North Korea and compared them with those used in South Korea, covering the goals and system of physics education as well as the content, order of study, and volume of the textbooks. In North Korea, physics education starts in the 4th year of elementary school and continues throughout the school years. Science process skills are regarded as important, and figures, tables, problem sets, experiments, and sample solutions are used extensively in the textbooks. Electromagnetism occupies the largest portion of the physics textbooks, but subjects related to applications of physics receive more emphasis. A few subjects are included in the North Korean textbooks but not in the South Korean ones. We compiled about 60 North Korean physics terms that differ from the South Korean terms used in the textbooks. Overall, integrating the physics education systems and textbooks of the two Koreas should not be very difficult after unification.

Semantic Word Categorization using Feature Similarity based K Nearest Neighbor

  • Jo, Taeho
    • Journal of Multimedia Information System
    • /
    • Vol. 5, No. 2
    • /
    • pp.67-78
    • /
    • 2018
  • This article proposes a modified KNN (K Nearest Neighbor) algorithm that considers feature similarity and applies it to word categorization. The texts used as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and combining word categorization with text categorization is expected to produce a synergy effect. In this research, we define a similarity metric between two vectors that incorporates the feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is empirically validated as the better approach for categorizing words in news articles and opinions. The significance of this research lies in improving classification performance by exploiting feature similarities.
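To make the idea of a feature-aware similarity concrete, here is a sketch that swaps in the soft cosine measure, which is not necessarily the metric defined in the article: a matrix S holds pairwise similarities between features (here, texts), so two words can be similar even when they occur in different but related texts. S is assumed symmetric and positive semidefinite with ones on the diagonal; with S equal to the identity it reduces to ordinary cosine similarity. Function names and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def soft_cosine(x, y, S):
    """Cosine similarity generalized by a feature-similarity matrix S
    (assumed symmetric positive semidefinite, ones on the diagonal)."""
    num = x @ S @ y
    den = np.sqrt(x @ S @ x) * np.sqrt(y @ S @ y)
    return float(num / den) if den > 0 else 0.0

def knn_predict(query, train_X, train_y, S, k=3):
    """Label a query word vector by majority vote among its k most similar training words."""
    sims = [soft_cosine(query, x, S) for x in train_X]
    nearest = np.argsort(sims)[::-1][:k]  # indices of the k highest similarities
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: three texts, where texts 0 and 1 are assumed to be related (similarity 0.8).
S = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
train_X = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]
train_y = ["sports", "politics"]
query = np.array([0.0, 1.0, 0.0])  # occurs only in text 1
print(knn_predict(query, train_X, train_y, S, k=1))  # sports
```

With plain cosine similarity the query is orthogonal to both training words; the feature-similarity matrix is what lets it be matched to the word from the related text.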

건설현장 안전 지적 사항 분석 (Vocabulary Analysis of Safety Warnings in Construction Site)

  • 강경수;류한국
    • Korea Institute of Building Construction: Conference Proceedings
    • /
    • Korea Institute of Building Construction 2019 Autumn Annual Conference
    • /
    • pp.40-41
    • /
    • 2019
  • The purpose of this study is to analyze the vocabulary related to safety accidents based on reports of safety-rule violations recorded at construction sites. We used Word2Vec and a topic model as natural language processing techniques to analyze the safety accidents described in the reports of a large enterprise. The words that appeared for each occupational accident type, such as falls, falling objects, and others, were derived and visualized. We derived the frequency and similarity of the words and the topics of the accidents that occur at construction sites. In future studies, we plan to generate text from site pictures using these images and reports.

연결망 분석을 활용한 우리나라 금연연구 동향분석 (A Social Network Analysis of Research Key Words Related Smoke Cessation in South Korea)

  • 안은성
    • Health Policy and Management
    • /
    • Vol. 29, No. 2
    • /
    • pp.138-145
    • /
    • 2019
  • Background: The purpose of this study is to identify the research keyword network from 2009 to 2018 using social network analysis and to provide research data that can support the Korean government's policy making on smoking cessation. Methods: First, a frequency analysis of the keywords was performed. Next, I applied three classic centrality measures (degree centrality, betweenness centrality, and eigenvector centrality) using R 3.5.1 and visualized the results as a word cloud and a keyword network. Results: In the network analysis, 'smoking' and 'smoking cessation' were the keywords with high frequency, high degree centrality, and high betweenness centrality. Looking at keyword trends, many studies addressed 'secondhand smoke' and 'adolescent' from 2009 to 2013, and 'cigarette graphic warning' and 'electronic cigarette' from 2014 to 2018. Conclusion: This study contributes to understanding trends in smoking-cessation research and to identifying directions for further study through keyword network analysis.
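The three centrality measures named in the Methods can be illustrated on a small keyword co-occurrence network. The study used R 3.5.1; the following is an independent pure-Python sketch, assuming two keywords are linked when they appear in the same paper. Betweenness uses Brandes' algorithm for unweighted, undirected graphs (unnormalized), and eigenvector centrality is computed by power iteration on A + I, which has the same leading eigenvector as the adjacency matrix A but avoids oscillation on bipartite graphs. The keyword lists are made up for illustration.

```python
from collections import deque
from itertools import combinations

def build_cooccurrence(keyword_lists):
    """Undirected keyword network: two keywords are linked if they appear in the same paper."""
    graph = {}
    for kws in keyword_lists:
        for kw in kws:
            graph.setdefault(kw, set())
        for a, b in combinations(set(kws), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                      # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each undirected pair was counted twice

def eigenvector_centrality(graph, iters=200):
    """Power iteration on A + I (same leading eigenvector as A, converges on bipartite graphs)."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(iters):
        new = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = sum(val * val for val in new.values()) ** 0.5
        x = {v: val / norm for v, val in new.items()}
    return x

papers = [["smoking", "smoking cessation", "adolescent"],
          ["smoking", "secondhand smoke"],
          ["smoking cessation", "electronic cigarette"]]
g = build_cooccurrence(papers)
print(degree_centrality(g)["smoking"])       # 0.75
print(betweenness_centrality(g)["smoking"])  # 3.0
```

Eigenvector centrality assumes a connected network; for a disconnected one it is better computed per component.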

Newly Extended Audit Report and Cost of Debt: Empirical Evidence from Thailand

  • WUTTICHINDANON, Suneerat;ISSARAWORNRAWANICH, Panya
    • The Journal of Asian Finance, Economics and Business
    • /
    • Vol. 9, No. 4
    • /
    • pp.261-272
    • /
    • 2022
  • This study examined the association between key audit matters (KAM) and the cost of debt. Financial records and auditors' reports were used to collect data for fiscal years 2016 and 2017, the first two years after KAM was implemented in Thailand. The sample consists of listed companies in Thailand, where the financial system is primarily debt-based and external auditors play an important role in maintaining financial reporting quality. The final sample for the two-year period consists of 770 observations. KAM is measured in three respects (the number of issues, the number of words, and readability), while the cost of debt is measured as the ratio of interest expense to total debt. The study finds that KAM readability is significantly and negatively related to the cost of debt, whereas the number of issues and the number of words have no significant effect on it. The finding suggests that auditors' writing skills play a crucial role in creditors' lending decisions.

An Integrative Literature Review about Sports Participation and Perceived Benefits

  • JEONG, Bong Kyu;YOON, Sang Hoon;SEO, Won Jae
    • Journal of Sport and Applied Science
    • /
    • Vol. 5, No. 2
    • /
    • pp.55-61
    • /
    • 2021
  • Purpose: This study aims to obtain basic data for establishing the concepts of sports participation and perceived benefits by reviewing prior research on the effects of sports participation and deriving variables for the perceived benefits of sports participants. Research design, data, and methodology: This study used an integrative literature review. A conceptual model was designed with reference to a prior study by adopting a guiding theory. Literature was collected online based on the key words, covering studies published between 2015 and 2020. Results: First, a total of seven variables were derived from the literature on sports participation and physical benefits. Second, a total of six variables were derived from the literature on sports participation and mental benefits. Third, a total of four variables were derived from the literature on sports participation and social benefits. Conclusions: Among the perceived benefits of sports participation, health fitness (including body composition), objectified body consciousness, and social body shape anxiety relate to physical benefits, while physical self-efficacy and physical self-concept relate to physical benefits but are also shown to relate to mental benefits. Successful aging is closely linked to social benefits and partly related to mental benefits. Because the derived variables are related to mental and social benefits as well, more in-depth exploration of perceived benefits is needed.

한국어 화자의 영어 어말 폐쇄음 파열의 인지와 발음 연구 (Korean speakers' perception and production of English word-final voiceless stop release)

  • 이보림;이숙향;박천배;강석근
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology)
    • /
    • No. 38
    • /
    • pp.41-70
    • /
    • 1999
  • Research on perception has in recent years become increasingly popular as a means of accounting for cross-linguistic sound patterns (Ohala, 1992; Flemming, 1995; Jun, 1995; Steriade, 1997, among others). In loanword phonology, Silverman (1990, 1992) argues that words from a source language are scanned at the perceptual level and that the features a speaker perceives are stored in the input, to be processed according to the phonological constraints of his/her native language. The purpose of this paper is to test the validity of Silverman's proposal by examining the correlation between perception and production in Korean learners of English. We focused specifically on the perception and production of stop release, contrasting English loanwords with English words learned through education to see whether there were any significant differences. The results showed no substantive correlation between the Korean speakers' perception of the loanwords as pronounced by English speakers and their own production of those words. For English words, however, the Korean speakers' production was closely related to their perception, although some inter-speaker variation was observed. Using Optimality Theory (Prince & Smolensky, 1993) as the theoretical framework, we show that the theory is a useful means of implementing a phonetics-phonology interface and of relating perceptual processes to speech production. Specifically, under the assumption that loanwords with a [t]~[tʰ] alternation (e.g., 'cut') were originally borrowed into Korean as two different input forms, all the alternations can be straightforwardly accounted for by a single unified constraint ranking.
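The core of the Optimality Theory account, one fixed constraint ranking yielding different outputs for different input forms, can be sketched as a standard EVAL step: each candidate's violations are read off in ranking order, and the lexicographically smallest profile wins. The constraint names MAX-RELEASE (hypothetical: preserve a perceived release) and DEP-V (no vowel epenthesis), their ranking, and the violation marks are illustrative assumptions, not the paper's analysis; the candidates mirror the two Korean forms of 'cut', [kʰʌt̚] and [kʰʌtʰɨ].

```python
def ot_eval(candidates, ranking):
    """Return the optimal candidate under Optimality Theory's EVAL.

    candidates: dict mapping each output form to {constraint: violation count}
    ranking: constraint names ordered from highest- to lowest-ranked
    A single violation of a higher-ranked constraint outweighs any number of
    violations of lower-ranked ones, so profiles are compared lexicographically.
    """
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

RANKING = ["MAX-RELEASE", "DEP-V"]  # hypothetical constraints, highest-ranked first

# Input stored without a perceived release: no candidate violates MAX-RELEASE.
input_a = {"kʰʌt̚": {}, "kʰʌtʰɨ": {"DEP-V": 1}}
# Input stored with a perceived release: the unreleased candidate now violates MAX-RELEASE.
input_b = {"kʰʌt̚": {"MAX-RELEASE": 1}, "kʰʌtʰɨ": {"DEP-V": 1}}

print(ot_eval(input_a, RANKING))  # kʰʌt̚
print(ot_eval(input_b, RANKING))  # kʰʌtʰɨ
```

The ranking stays the same in both calls; only the violation profiles differ, which is the sense in which two input forms and one ranking can account for the alternation.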

이미지 단어집과 관심영역 자동추출을 사용한 이미지 분류 (Image Classification Using Bag of Visual Words and Visual Saliency Model)

  • 장현웅;조수선
    • KIPS Transactions on Software and Data Engineering
    • /
    • Vol. 3, No. 12
    • /
    • pp.547-552
    • /
    • 2014
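  • With the growth of large-scale social media sharing sites such as Flickr and Facebook, image data is increasing very rapidly. Accordingly, various studies on accurately retrieving social images are actively under way. These include studies that exploit the semantic relatedness of image tags to improve the accuracy of tag-based image retrieval, as well as studies that classify web images based on a Bag of Visual Words (BoVW). In this paper, we propose applying to this existing work the GBVS (Graph-Based Visual Saliency) model, which finds the important parts of an image by removing less important information such as the background. The proposed method works as follows. First, a BoVW is built with the SIFT algorithm on a database that has been pre-classified using the semantic relatedness of image tags. Second, GBVS is used to select the region of interest in each test image, and that region is used for testing. Applying GBVS to the existing method based on semantically related tags and a SIFT-based BoVW yielded higher accuracy.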
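The BoVW step with a saliency-restricted region of interest might look roughly like the sketch below. It assumes the local descriptors (e.g. SIFT, 128-D) and a saliency map normalized to [0, 1] (e.g. from GBVS) have already been computed by other tools, which are not shown; the codebook is built with plain k-means (Lloyd's algorithm), and the saliency threshold of 0.5 is a hypothetical choice. Only descriptors whose keypoints fall inside the salient region contribute to the histogram.

```python
import numpy as np

def assign_words(descriptors, centers):
    """Index of the nearest visual word (squared Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over local descriptors, giving k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, keypoints, saliency, centers, threshold=0.5):
    """L1-normalized BoVW histogram using only keypoints inside the salient region.
    keypoints: (N, 2) integer (row, col) positions; saliency: 2-D map in [0, 1]."""
    rows, cols = keypoints[:, 0], keypoints[:, 1]
    keep = saliency[rows, cols] >= threshold
    words = assign_words(descriptors[keep], centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

The resulting histograms can be compared with those of the pre-classified database images by any standard classifier; the paper's choice of classifier is not stated here, so none is assumed.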

LSI를 이용한 차원 축소 클러스터 기반 키워드 연관망 자동 구축 기법 (Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI)

  • 유한묵;김한준;장재영
    • Journal of KIISE
    • /
    • Vol. 44, No. 11
    • /
    • pp.1236-1243
    • /
    • 2017
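  • This paper proposes LSI-based ClusterTextRank, which combines the existing TextRank algorithm with a mutual information measure to extract keywords on a per-cluster basis, together with a method that builds an association network among the extracted keywords using Latent Semantic Indexing (LSI). The proposed method represents a document collection as a term-document matrix and uses LSI to reduce it to a low-dimensional concept space. It then partitions the data into several clusters with the k-means clustering algorithm, represents the words in each cluster as a maximum spanning tree graph, and extracts keywords by considering the cluster information content derived from that graph. Next, it computes the similarity between the extracted keywords using the term-concept matrix obtained through LSI and uses these similarities as the keyword association network. The method was evaluated on travel-related blog data, where it improved keyword-extraction accuracy by about 14% over the existing TextRank algorithm.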
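The LSI-based association step can be sketched as follows. This covers only the last stage described above: the TextRank, mutual-information, and maximum-spanning-tree keyword extraction is not reproduced, so the keywords are simply given as input. The sketch assumes term coordinates are the rows of U_k·S_k from a truncated SVD of the term-document matrix, and the edge threshold `min_sim = 0.5` is a hypothetical parameter.

```python
import numpy as np

def lsi_term_vectors(term_doc, k):
    """Truncated SVD of a term-document matrix; rows of U_k * S_k are the
    term coordinates in the k-dimensional concept space (term-concept matrix)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]

def association_network(terms, term_vectors, keywords, min_sim=0.5):
    """Link every pair of keywords whose cosine similarity in concept space is >= min_sim."""
    idx = {t: i for i, t in enumerate(terms)}
    vecs = term_vectors[[idx[kw] for kw in keywords]]
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    sims = unit @ unit.T
    edges = []
    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):
            if sims[i, j] >= min_sim:
                edges.append((keywords[i], keywords[j], float(sims[i, j])))
    return edges

# Toy travel-blog corpus: rows are terms, columns are four documents.
terms = ["beach", "sea", "surf", "museum", "art"]
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
T = lsi_term_vectors(A, k=2)
for a, b, sim in association_network(terms, T, ["beach", "surf", "museum", "art"]):
    print(a, b, round(sim, 3))
```

Because similarity is measured in the reduced concept space, "beach" and "surf" are linked even though they co-occur in only one document, while terms from the unrelated museum documents stay disconnected.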