• Title/Abstract/Keywords: Language Models

884 search results (processing time: 0.032 seconds)

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • Speech Sciences / Vol. 9, No. 1 / pp. 111-135 / 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of linguistic information that is predictable in native speech from the dictionary entry, together with several predefined types of error specification found in native language transfer. In that sense, the proposed model differs from previously explored annotation models in the literature, most of which are based on native speech. The validity of the newly proposed model is shown by its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the newly collected data in this study. The annotation method in this model adopts two widely accepted conventions: the Speech Assessment Methods Phonetic Alphabet (SAMPA) and the TOnes and Break Indices (ToBI). In the proposed annotation model, SAMPA is employed exclusively for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess the speaking ability of English as a Foreign Language (EFL) learners.

Semi-CRF or Linear-chain CRF? A Comparative Study of Joint Models for Korean Morphological Analysis and POS Tagging

  • 나승훈;김창현;김영길
    • KIISE Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing) / 25th Conference on Hangul and Korean Information Processing, 2013 / pp. 9-12 / 2013
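  • This paper presents a preliminary comparison of Semi-CRF and linear-chain CRF as joint models for Korean morpheme segmentation and POS tagging. The linear-chain method attempts joint modeling by combining morpheme segmentation information and POS tags into the output labels. Semi-CRF instead represents the output structure so that it contains both segmentation and tagging information, and thus performs segmentation and tagging simultaneously during decoding. In a comparison on the Sejong POS-tagged corpus, the linear-chain method outperformed the Semi-CRF method.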
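The label combination used by the linear-chain method can be illustrated with a minimal sketch. The abstract does not state the exact label encoding, so the character-level B/I prefix scheme, the function names, and the example analysis below are assumptions made for illustration only. The sketch also ignores surface-form changes between morphemes and the raw text, which a real Korean analyzer has to handle.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]   # (surface form, POS tag)
Labeled = Tuple[str, str]    # (character, joint label such as "B-NNG")


def to_joint_labels(morphemes: List[Morpheme]) -> List[Labeled]:
    """Encode segmentation and POS in one label per character (assumed B/I scheme)."""
    labeled: List[Labeled] = []
    for surface, pos in morphemes:
        for i, ch in enumerate(surface):
            prefix = "B" if i == 0 else "I"  # B marks the first character of a morpheme
            labeled.append((ch, f"{prefix}-{pos}"))
    return labeled


def from_joint_labels(labeled: List[Labeled]) -> List[Morpheme]:
    """Recover (morpheme, POS) pairs from a joint label sequence."""
    morphemes: List[Morpheme] = []
    for ch, label in labeled:
        prefix, pos = label.split("-", 1)
        # Start a new morpheme on B, or when an I label does not continue the last POS.
        if prefix == "B" or not morphemes or morphemes[-1][1] != pos:
            morphemes.append((ch, pos))
        else:
            surface, _ = morphemes[-1]
            morphemes[-1] = (surface + ch, pos)
    return morphemes


# Hypothetical analysis of "학교에": common noun (NNG) + adverbial case particle (JKB).
example = [("학교", "NNG"), ("에", "JKB")]
labels = to_joint_labels(example)
```

With this encoding, a sequence labeler that predicts one joint label per character yields both the segmentation and the tags, and `from_joint_labels(labels)` recovers the original morpheme list. A Semi-CRF would instead score whole segments directly rather than per-character labels.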

Applications of Machine Learning for Online Learning Systems towards Children with Speech Disorders

  • Jadi, Amr;Alzahrani, Ali
    • International Journal of Computer Science & Network Security / Vol. 22, No. 8 / pp. 55-60 / 2022
  • Specific Language Impairment is a serious disorder that interferes with spontaneous communication skills in children. Children suffering from this disorder may have reading, speaking, or listening impairments, and such disorders are also termed Autism Speech Disorder (ASD) in medical terminology. The aim of the article is to define specific language impairment in children and the problems it can cause. It then describes the different methods adopted by speech pathologists to diagnose language impairment. Finally, it implements machine learning models to automate the process and help speech pathologists and pediatricians diagnose specific language impairment.

Reducing Toxic Response Generation in Conversational Models using Plug and Play Language Model

  • 김병주;이근배
    • KIISE Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing) / 33rd Conference on Hangul and Korean Information Processing, 2021 / pp. 433-438 / 2021
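  • Dialogue systems are broadly divided into those in which the user and the system converse for a specific purpose and those in which they converse on open topics. As research on open-domain dialogue systems has become more active, research on controlling toxic utterance generation in open-topic counseling and everyday conversation systems has grown in importance. In this paper, to control toxic response generation in a dialogue model, we apply the Plug-and-Play Language Model (PPLM) method to a BART model trained on an everyday conversation dataset. A toxic response classifier trained on a publicly available toxic dialogue classification dataset is used as the PPLM attribute model to reduce the dialogue model's toxic response generation, and the difference is compared quantitatively through experiments. The results confirm that toxic response generation decreased in every experiment that used the attribute model.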

Utilizing Large Language Models for Non-trained Binary Sentiment Classification

  • 안형진;황태욱;정상근
    • KIISE Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing) / 35th Conference on Hangul and Korean Information Processing, 2023 / pp. 66-71 / 2023
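  • Since the emergence of ChatGPT, a variety of large language models (LLMs) have appeared, and it has become possible to fine-tune such LLMs for specific purposes. However, training an LLM from scratch, and even simple tuning, requires so much computing power that ordinary users can hardly attempt it. In this study, we used publicly available LLMs without any additional training and examined their performance on a binary classification task with zero-shot prompting. Even without training or additional tuning, we observed binary classification performance comparable to that of existing pre-trained language models. The better-performing LLMs showed a low classification failure rate and consistent performance, which confirms their considerable usability.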
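Zero-shot prompting for binary sentiment classification, along with the notion of a classification failure, can be sketched as follows. The abstract does not give the prompt wording, the parsing rule, or the models used, so the template text and the helper names `build_prompt`, `parse_label`, and `failure_rate` are hypothetical. No model is called here; the sketch covers only prompt construction and response handling.

```python
from typing import List, Optional

# Hypothetical prompt wording; the study's actual prompt is not given in the abstract.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: {text}\n"
    "Answer with one word:"
)


def build_prompt(text: str) -> str:
    """Fill the zero-shot template with the input text (no examples, no training)."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_label(response: str) -> Optional[str]:
    """Map a free-form model response to a label, or None on classification failure."""
    lowered = response.lower()
    has_pos = "positive" in lowered
    has_neg = "negative" in lowered
    if has_pos == has_neg:  # neither label or both labels present: treat as a failure
        return None
    return "positive" if has_pos else "negative"


def failure_rate(responses: List[str]) -> float:
    """Fraction of responses that could not be mapped to exactly one label."""
    if not responses:
        return 0.0
    failures = sum(1 for r in responses if parse_label(r) is None)
    return failures / len(responses)
```

Under this assumed parsing rule, a response that names neither label, or both, counts as a classification failure, which is one plausible way to compute a failure rate like the one the abstract reports.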

A Survey Study on Standard Security Models in Wireless Sensor Networks

  • 이상호
    • Journal of Convergence Society for SMB / Vol. 4, No. 4 / pp. 31-36 / 2014
  • Recent advances in Wireless Sensor Networks (WSNs) have paved the way for their deployment in various environments to monitor temperature, motion, sound, and vibration. These applications often involve the detection of sensitive information, such as enemy movements in hostile areas or the locations of personnel in buildings. Because WSNs handle sensitive information and their nodes tend to be exposed to the enemy or placed in hazardous areas, security is a major concern in WSNs. WSNs pose unique challenges, so traditional security techniques used in conventional networks cannot be applied directly, and many researchers have developed security protocols tailored to WSNs. To develop countermeasures against the various attacks on WSNs, current security attacks in the network layers must be described and analyzed using a standard notation. However, no research paper describes and analyzes security models in WSNs using a standard notation such as the Unified Modeling Language (UML). Using the UML helps security developers understand security attacks and design secure WSNs. In this research, we provide standard models for security attacks using UML Sequence Diagrams to describe and analyze possible attacks in the three network layers.

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • Phonetics and Speech Sciences / Vol. 15, No. 3 / pp. 83-88 / 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.

A Study on Applying a Consistent UML Model to Naval Combat System Software Using Model Verification System

  • Jung, Seung-Mo;Lee, Woo-Jin
    • Journal of the Korea Society of Computer and Information / Vol. 27, No. 5 / pp. 109-116 / 2022
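  • Recently, model-based development methods centered on highly readable, standardized UML (Unified Modeling Language) models have been applied to large-scale software development to resolve unclear communication. However, depending on the developers' skill and their understanding of models and modeling tools, it is difficult to apply consistent UML models to large-scale software. This paper therefore presents a method for developing a model verification system that supports applying consistent UML models in software development. The developed model verification system is then partially applied to naval combat system software development to demonstrate its functionality. The model verification system can automatically verify the models written by developers against domain-specific characteristics. Using the proposed model verification system makes it easier to apply consistent UML models to naval combat system software development.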

Whitman's Strategy of Cultural Independence through Reterritorialization and Deterritorialization

  • Jang, Jeong U
    • The Journal of English Language and Literature / Vol. 55, No. 3 / pp. 497-515 / 2009
  • Culture as a source of identity, as Edward Said says, can be a battleground on which various political and ideological causes engage one another. It is not mere individual cultivation or private possession, but a program for social cohesion. Sensitively aware that a national culture should be independent from Europe, Walt Whitman enacts a new form of literature by placing different cultural values against Old World tradition. His interest in autochthonous culture originates from his deep concern about national consciousness. He believes that literary taste directed toward highly ornamented elite culture is an obstacle to the cultural unification of a nation. In order to represent the American culture of the common people, Whitman incorporates a great deal of cultural material into his poetry. Since he believes that America has many respectable writers at home, he urges people to adjust to their own taste instead of running after foreign authors. Whitman differentiated his poetry from previous literary models by disrupting the established literary norms and reconfiguring cultural values on the basis of American ways of life. In his comments on other poets, he concentrates on the originality and nativity of poetry. By claiming that words have characteristics of nativity, independence, and individuality, he envisions an American literature distinguished from British literature in literary materials as well as in language. Whitman's language is composed of a vast number of words that can fully portray the nation. He works over language materials in two ways: reterritorialization and deterritorialization. Not only does his literary language become subversive of the established literary language, but it also makes it possible to express strength and intensity in feeling.