• Title/Summary/Keyword: utterance generation (발화 생성)

A Study on Disassembly Path Generation Using Petri Net (페트리네트를 이용한 분해경로 생성에 관한 연구)

  • 이화조;주해호;경기현
    • Journal of the Korean Society for Precision Engineering / v.17 no.2 / pp.176-184 / 2000
  • Candidate methods for representing product structure were compared and analyzed in order to determine the optimal disassembly path of a product. The Petri net was selected as the most suitable method for representing the disassembly path. In this method, a reachability tree for the product is generated and the disassembly time for each path is calculated. The path with the smallest disassembly time is selected as the optimal path. A software tool for the DPN (Disassembly Petri Net) has been developed and applied, as an example, to search for the optimal disassembly path of a ballpoint pen.
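
The path search described in this abstract can be illustrated with a minimal sketch. The places, transitions, and times below are hypothetical values for a ballpoint pen, not data from the paper, and the net is assumed to be safe (at most one token per place), so a marking can be stored as a set. The sketch expands the reachability tree depth-first, sums the time along each branch, and returns the branch with the smallest total.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

# transition name -> (input places, output places, disassembly time in seconds)
# All names and times are hypothetical illustration values.
Transition = Tuple[FrozenSet[str], FrozenSet[str], float]

TRANSITIONS: Dict[str, Transition] = {
    "remove_cap": (frozenset({"pen"}), frozenset({"cap", "body"}), 2.0),
    "unscrew_tip": (frozenset({"body"}), frozenset({"tip", "barrel+refill"}), 5.0),
    "pull_refill": (frozenset({"barrel+refill"}), frozenset({"barrel", "refill"}), 1.0),
    "pull_plug": (frozenset({"body"}), frozenset({"plug_end", "tube"}), 3.0),
    "push_out_refill": (frozenset({"tube"}), frozenset({"barrel", "refill", "tip"}), 4.0),
}


def enabled(marking: FrozenSet[str]) -> List[str]:
    """Transitions whose input places are all marked."""
    return [name for name, (ins, _, _) in TRANSITIONS.items() if ins <= marking]


def leaf_paths(marking: FrozenSet[str], path: Tuple[str, ...] = (),
               elapsed: float = 0.0) -> Iterator[Tuple[List[str], float]]:
    """Walk the reachability tree and yield (path, total time) for every leaf."""
    fireable = enabled(marking)
    if not fireable:  # dead marking: nothing left to disassemble
        yield list(path), elapsed
        return
    for name in fireable:
        ins, outs, time = TRANSITIONS[name]
        yield from leaf_paths((marking - ins) | outs, path + (name,), elapsed + time)


def optimal_path(initial: FrozenSet[str]) -> Tuple[List[str], float]:
    """Return the leaf path with the smallest total disassembly time."""
    return min(leaf_paths(initial), key=lambda item: item[1])


if __name__ == "__main__":
    print(optimal_path(frozenset({"pen"})))
```

With these made-up numbers the tree has two branches, and the one that unscrews the tip is faster than the one that pulls the end plug. Exhaustive enumeration is only practical for small products; a larger net would need pruning or a shortest-path search.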

Error detection and correction in speech recognition by using lexico-semantic patterns (어휘의미패턴을 이용한 음성인식 오류 검출 및 수정)

  • Yoon, Yong-Wook;Jung, Han-Min;Lee, Gary Geun-Bae
    • Annual Conference on Human and Language Technology / 2002.10e / pp.62-68 / 2002
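  • The output of a speech recognizer may contain errors, so error detection and correction are essential before that output can be used in other natural language processing applications. By its nature, post-processing of speech recognition errors requires a different approach from post-processing of character recognition, and this study focuses on speech utterances restricted to a specific domain, excluding noisy environments. Post-processing methods include statistical approaches and pattern-matching approaches. This study proposes automatically generating patterns that contain the semantic information of the vocabulary used in a specific domain, and using those patterns to detect and correct errors. The domain used in the experiments is a voice information service scenario for a vehicle information center, and a commercial speech recognizer was used as the development tool for post-processing.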
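
As a rough illustration of pattern-based detection and correction, the sketch below checks a recognized utterance against one pattern whose slots are either literal words or semantic classes. The lexicon, the pattern, and the use of string similarity to pick a replacement are all assumptions made for this example. The paper generates its patterns automatically, and its actual matching procedure is not described in the abstract.

```python
import difflib
from typing import Dict, List, Optional, Tuple

# Hypothetical semantic lexicon for a vehicle-information domain.
LEXICON: Dict[str, List[str]] = {
    "INFO": ["traffic", "weather", "parking"],
    "CITY": ["seoul", "busan", "daejeon"],
}

# Hypothetical hand-written pattern; "<X>" marks a semantic-class slot.
PATTERN: List[str] = ["tell", "me", "the", "<INFO>", "in", "<CITY>"]


def candidates(slot: str) -> List[str]:
    """Words allowed in a slot: class members for "<X>", else the literal word."""
    if slot.startswith("<") and slot.endswith(">"):
        return LEXICON[slot[1:-1]]
    return [slot]


def detect_and_correct(tokens: List[str],
                       pattern: List[str] = PATTERN) -> Optional[Tuple[List[str], List[int]]]:
    """Return (corrected tokens, error positions), or None if lengths differ."""
    if len(tokens) != len(pattern):
        return None  # this simple sketch only handles substitution errors
    corrected: List[str] = []
    errors: List[int] = []
    for i, (token, slot) in enumerate(zip(tokens, pattern)):
        allowed = candidates(slot)
        if token in allowed:
            corrected.append(token)
            continue
        errors.append(i)  # token violates the pattern: flag as an error
        match = difflib.get_close_matches(token, allowed, n=1, cutoff=0.6)
        corrected.append(match[0] if match else token)  # keep token if no close word
    return corrected, errors


if __name__ == "__main__":
    print(detect_and_correct("tell me the traffic in bussan".split()))
```

Restricting the replacement to words of the expected semantic class is what distinguishes this from plain spelling correction: a misrecognized city name can only be repaired to another city name.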

Predicting Contextually Appropriate Intonation from Utterances in Korean with Combinatory Categorial Grammar (결합범주문법을 이용한 한국어 문장의 자연스러운 억양 생성에 대한 연구)

  • Lee, Hwa-Jin;Park, Jong-C.
    • Annual Conference on Human and Language Technology / 2000.10d / pp.68-75 / 2000
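  • To convey one's intent accurately to a listener, an utterance must carry intonation that fits the flow of the dialogue. This paper analyzes sentences with Combinatory Categorial Grammar, examines how intonation information such as pitch accent, pause, and emphasis should be realized according to sentence-internal and inter-sentence information (that is, context), and presents a method for adding this information to the information structure of the sentence.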

A Study on Deep Learning Based RobotArm System (딥러닝 기반의 로봇팔 시스템 연구)

  • Shin, Jun-Ho;Shim, Gyu-Seok
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.901-904 / 2020
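  • The system combines models in three stages. In the first stage, human speech is converted to text, and a natural language processing algorithm that understands human commands is built using a bag-of-words (BoW) method to classify the user's utterance intent. Next, a real-time image processing model based on YOLOv3-tiny and an OctoMapping model are used to generate a 3D map of the surroundings, and a mechanism control algorithm operates on that map data. These components are coordinated by a manager system built with ROS actionlib, yielding a convenient human-robot interaction system that uses ROS and deep learning.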
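
The bag-of-words intent classification step can be sketched as follows. The intents, example commands, and the choice of cosine similarity against a per-intent word-count vector are assumptions for illustration only, since the abstract does not say which classifier sits on top of the BoW features.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical intents and example commands for a robot arm.
EXAMPLES: Dict[str, List[str]] = {
    "pick": ["pick up the cup", "grab the bottle"],
    "move": ["move to the table", "go to the door"],
    "stop": ["stop now", "halt"],
}


def bow(text: str) -> Counter:
    """Bag of words: lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One summed word-count vector per intent.
PROFILES: Dict[str, Counter] = {
    intent: sum((bow(s) for s in sentences), Counter())
    for intent, sentences in EXAMPLES.items()
}


def classify(text: str) -> str:
    """Return the intent whose profile is most similar to the utterance."""
    query = bow(text)
    return max(PROFILES, key=lambda intent: cosine(query, PROFILES[intent]))


if __name__ == "__main__":
    print(classify("please pick up the bottle"))
    print(classify("go to the table"))
```

Because word order is discarded, this kind of model only needs the recognized text to contain the right keywords, which suits short spoken commands.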

Korean Generative Chatbot using Topic Embedding (주제 임베딩을 활용한 한국어 생성 기반 챗봇)

  • Oh, Shinhyeok;Kim, Harksoo
    • Annual Conference on Human and Language Technology / 2020.10a / pp.524-528 / 2020
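  • A chatbot is a system in which a computer automatically responds to an utterance. Chatbots today are more often developed for conversation on a specific topic than for open-domain chit-chat. However, training data suited to the particular purpose each user needs is scarce. Building a large corpus on the required topic for chatbot training is difficult in practice because of the time and cost involved. A chatbot is therefore needed that can give topic-appropriate responses even when only a small training corpus is available. This paper proposes a topic embedding method that achieves high performance using a large corpus unrelated to the chatbot's purpose together with a small topic-based corpus.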

Radiolysis of Paraffin Encapsulation Wax (파라핀 고화체의 방사선적 가수분해)

  • Kim, Chang-Lak;Lee, Myung-Chan;Park, Won-Jae;Suk, Tae-Won;Burns William G.
    • Journal of Radiation Protection and Research / v.20 no.4 / pp.237-243 / 1995
  • An estimate is made of the potential generation rate of $H_2$ from radiolysis of the paraffin-wax encapsulant proposed for the solidified liquid concentrate wasteform. The results show that the radiolytic production of $H_2$ from paraffin-wax-encapsulated waste is dominated by the radiation energy released from $^{60}Co$. The radiolytic production of $H_2$ will proceed at an initial rate equivalent to approximately $4.4{\times}10^{2}\,cm^{3}\,yr^{-1}$ in 200 litre drums that are partly filled with 120 litres of encapsulated waste. The gas production rate will fall to $7.2\,cm^{3}\,yr^{-1}$ after 100 years. The lower flammable limit for $H_2$ in air will be reached in about 25 years, and the lower explosive limit for $H_2$ in air will not be reached within 1000 years. The timescale on which these safety-related limits are reached depends strongly on the level of filling of each waste drum. A reduction of the air space inside each drum would reduce the time required to reach the lower flammable limit.

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
  • Speaker diarization, which labels "who spoke when" in speech containing multiple speakers, has been studied with deep neural network-based end-to-end methods that handle overlapping speech and optimize the diarization model directly. Most deep neural network-based end-to-end speaker diarization systems treat the task as a multi-label classification problem, predicting the labels of all speakers active in each frame of speech. However, the performance of a multi-label model varies greatly depending on the threshold setting. This paper studies a speaker diarization system that uses single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates that label from the model output. To account for speaker label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, ways of adding residual connection structures to the model are studied so that deep diarization models can be trained effectively. The experiments used simulated noisy data for two speakers generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method can label without a threshold and improves performance by about 20.7 %.
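
The single-label idea can be sketched as follows. The abstract does not give the exact encoding or loss formulation, so this example assumes a bit-mask encoding (for two speakers: 0 = silence, 1 = speaker A, 2 = speaker B, 3 = overlap) and takes the minimum cross-entropy over all speaker permutations as the permutation-invariant loss. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def to_single_label(active: Sequence[int]) -> int:
    """Encode per-speaker 0/1 activity as one class index (bit mask)."""
    return sum(bit << i for i, bit in enumerate(active))


def from_single_label(label: int, n_speakers: int) -> Tuple[int, ...]:
    """Decode a class index back into per-speaker 0/1 activity."""
    return tuple((label >> i) & 1 for i in range(n_speakers))


def cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-probability of the target class over frames."""
    return -sum(math.log(p[l]) for p, l in zip(probs, labels)) / len(labels)


def pit_cross_entropy(probs: List[List[float]],
                      activity: List[Tuple[int, ...]]) -> float:
    """Smallest cross-entropy over all orderings of the reference speakers."""
    n_speakers = len(activity[0])
    losses = []
    for perm in itertools.permutations(range(n_speakers)):
        labels = [to_single_label([frame[j] for j in perm]) for frame in activity]
        losses.append(cross_entropy(probs, labels))
    return min(losses)


def decode(probs: List[List[float]], n_speakers: int) -> List[Tuple[int, ...]]:
    """Threshold-free decoding: take the argmax class in each frame."""
    return [from_single_label(max(range(len(p)), key=p.__getitem__), n_speakers)
            for p in probs]


if __name__ == "__main__":
    reference = [(1, 0), (0, 1)]  # frame 1: speaker A, frame 2: speaker B
    posteriors = [[0.1, 0.1, 0.7, 0.1], [0.1, 0.7, 0.1, 0.1]]  # speakers swapped
    print(pit_cross_entropy(posteriors, reference))
    print(decode(posteriors, 2))
```

Since each frame receives exactly one class from the argmax, overlap is just another class and no per-speaker activation threshold has to be tuned. The number of classes grows as 2^N with the number of speakers, which is manageable for the two-speaker setting described here.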

Detection of video editing points using facial keypoints (얼굴 특징점을 활용한 영상 편집점 탐지)

  • Joshep Na;Jinho Kim;Jonghyuk Park
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.15-30 / 2023
  • Recently, various services using artificial intelligence (AI) have been emerging in the media field as well. However, most video editing, which involves finding editing points and joining clips, is still carried out manually, requiring a great deal of time and human resources. Therefore, this study proposes a methodology that uses Video Swin Transformer to detect video edit points according to whether the person in the video is speaking. To this end, the proposed structure first detects facial keypoints through face alignment. Through this process, the temporal and spatial changes of the face in the input video data are reflected. The Video Swin Transformer-based model proposed in this study then classifies the behavior of the person in the video. Specifically, the feature map generated by Video Swin Transformer from the video data is combined with the facial keypoints detected through face alignment, and utterance is classified through convolution layers. In conclusion, the video editing point detection model using facial keypoints proposed in this paper improved performance from 87.46% to 89.17% compared with the model without facial keypoints.

An Analysis of Preservice Teachers' Lesson Plays: How Do Preservice Teachers Give Feedbacks to Students in an Imaginary Classroom Discourse? (예비교사들은 학생의 대답에 어떻게 피드백 하는가? - Lesson Play의 분석 -)

  • Lee, Jihyu
    • School Mathematics / v.19 no.1 / pp.19-41 / 2017
  • The purpose of this article was to a) identify how preservice teachers conceive feedback and the subsequent classroom discourse, and b) compare them with those in a reform-oriented mathematics classroom video, in support of mathematics teachers' professional development on classroom discourse. This article analyzes feedback patterns and subsequent classroom discourse in preservice teachers' imaginary classroom scripts (lesson plays) and compares them with those in a reform-oriented classroom video dealing with the same teaching situation. Most of the preservice teachers' feedback focused on the evaluation of students' responses and the transmission of meaning (univocal function), whereas the teacher's feedback in the reform-oriented classroom allowed the whole class to validate or challenge the answers, thereby facilitating students' generation of meaning (dialogic function). The comparative analysis of the univocal discourse in a preservice teacher's lesson play and the dialogic discourse in the reform-oriented classroom video shows that teacher feedback serves as an important indicator of the main function of classroom discourse and of the level of students' cognitive participation, and also as a variable that determines and changes them. This case study suggests that, to improve the quality of classroom discourse, preservice and in-service teachers need experience during teacher training courses in perceiving the variety of feedback patterns available in specific teaching contexts and in exploring ways to balance the univocal and dialogic functions of their feedback moves.

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park;Somin Park;Hyunki Hong
    • KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.303-314 / 2023
  • Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties (tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors for source and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of RawNet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.