• Title/Summary/Keyword: deep Learning

Search Result 5,763, Processing Time 0.028 seconds

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.675-681
    • /
    • 2021
  • With the recent growth of the AI speaker market, the demand for speech synthesis technology that enables natural conversation with users is increasing. Therefore, there is a need for a multi-speaker speech synthesis system that can generate voices of various tones. In order to synthesize natural speech, it is required to train with a large-capacity. high-quality speech DB. However, it is very difficult in terms of recording time and cost to collect a high-quality, large-capacity speech database uttered by many speakers. Therefore, it is necessary to train the speech synthesis system using the speech DB of a very large number of speakers with a small amount of training data for each speaker, and a technique for naturally expressing the tone and rhyme of multiple speakers is required. In this paper, we propose a technology for constructing a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition technology, and synthesizing a new speaker's tone with a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module for synthesizing mel-spectrogram from input text is composed of Tacotron2, and the vocoder generating synthesized speech consists of WaveNet with mixture of logistic distributions applied. The x-vector extracted from the trained speaker embedding neural networks is added to Tacotron2 as an input to express the desired speaker's tone.

Development of Graph based Deep Learning methods for Enhancing the Semantic Integrity of Spaces in BIM Models (BIM 모델 내 공간의 시멘틱 무결성 검증을 위한 그래프 기반 딥러닝 모델 구축에 관한 연구)

  • Lee, Wonbok;Kim, Sihyun;Yu, Youngsu;Koo, Bonsang
    • Korean Journal of Construction Engineering and Management
    • /
    • v.23 no.3
    • /
    • pp.45-55
    • /
    • 2022
  • BIM models allow building spaces to be instantiated and recognized as unique objects independently of model elements. These instantiated spaces provide the required semantics that can be leveraged for building code checking, energy analysis, and evacuation route analysis. However, theses spaces or rooms need to be designated manually, which in practice, lead to errors and omissions. Thus, most BIM models today does not guarantee the semantic integrity of space designations, limiting their potential applicability. Recent studies have explored ways to automate space allocation in BIM models using artificial intelligence algorithms, but they are limited in their scope and relatively low classification accuracy. This study explored the use of Graph Convolutional Networks, an algorithm exclusively tailored for graph data structures. The goal was to utilize not only geometry information but also the semantic relational data between spaces and elements in the BIM model. Results of the study confirmed that the accuracy was improved by about 8% compared to algorithms that only used geometric distinctions of the individual spaces.

Performance Evaluation of YOLOv5s for Brain Hemorrhage Detection Using Computed Tomography Images (전산화단층영상 기반 뇌출혈 검출을 위한 YOLOv5s 성능 평가)

  • Kim, Sungmin;Lee, Seungwan
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.1
    • /
    • pp.25-34
    • /
    • 2022
  • Brain computed tomography (CT) is useful for brain lesion diagnosis, such as brain hemorrhage, due to non-invasive methodology, 3-dimensional image provision, low radiation dose. However, there has been numerous misdiagnosis owing to a lack of radiologist and heavy workload. Recently, object detection technologies based on artificial intelligence have been developed in order to overcome the limitations of traditional diagnosis. In this study, the applicability of a deep learning-based YOLOv5s model was evaluated for brain hemorrhage detection using brain CT images. Also, the effect of hyperparameters in the trained YOLOv5s model was analyzed. The YOLOv5s model consisted of backbone, neck and output modules. The trained model was able to detect a region of brain hemorrhage and provide the information of the region. The YOLOv5s model was trained with various activation functions, optimizer functions, loss functions and epochs, and the performance of the trained model was evaluated in terms of brain hemorrhage detection accuracy and training time. The results showed that the trained YOLOv5s model is able to provide a bounding box for a region of brain hemorrhage and the accuracy of the corresponding box. The performance of the YOLOv5s model was improved by using the mish activation function, the stochastic gradient descent (SGD) optimizer function and the completed intersection over union (CIoU) loss function. Also, the accuracy and training time of the YOLOv5s model increased with the number of epochs. Therefore, the YOLOv5s model is suitable for brain hemorrhage detection using brain CT images, and the performance of the model can be maximized by using appropriate hyperparameters.

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.359-366
    • /
    • 2022
  • In this paper, we present the Korean menu-ordering Sentence Text-to-Speech (TTS) system using conformer-based FastSpeech2. Conformer is the convolution-augmented transformer, which was originally proposed in Speech Recognition. Combining two different structures, the Conformer extracts better local and global features. It comprises two half Feed Forward module at the front and the end, sandwiching the Multi-Head Self-Attention module and Convolution module. We introduce the Conformer in Korean TTS, as we know it works well in Korean Speech Recognition. For comparison between transformer-based TTS model and Conformer-based one, we train FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used this for training our models. This corpus comprises not only general conversation, but also menu-ordering conversation consisting mainly of loanwords. This data set is the solution to the current Korean TTS model's degradation in loanwords. As a result of generating a synthesized sound using ParallelWave Gan, the Conformer-based FastSpeech2 achieved superior performance of MOS 4.04. We confirm that the model performance improved when the same structure was changed from transformer to Conformer in the Korean TTS.

Design and Implementation of Sandcastle Play Guide Application using Artificial Intelligence and Augmented Reality (인공지능과 증강현실 기술을 이용한 모래성 놀이 가이드 애플리케이션 설계 및 구현)

  • Ryu, Jeeseung;Jang, Seungwoo;Mun, Yujeong;Lee, Jungjin
    • Journal of the Korea Computer Graphics Society
    • /
    • v.28 no.3
    • /
    • pp.79-89
    • /
    • 2022
  • With the popularity and the advanced graphics hardware technology of mobile devices, various mobile applications that help children with physical activities have been studied. This paper presents SandUp, a mobile application that guides the play of building sand castles using artificial intelligence and augmented reality(AR) technology. In the process of building the sandcastle, children can interactively explore the target virtual sandcastle through the smartphone display using AR technology. In addition, to help children complete the sandcastle, SandUp informs the sand shape and task required step by step and provides visual and auditory feedback while recognizing progress in real-time using the phone's camera and deep learning classification. We prototyped our SandUp app using Flutter and TensorFlow Lite. To evaluate the usability and effectiveness of the proposed SandUp, we conducted a questionnaire survey on 50 adults and a user study on 20 children aged 4~7 years. The survey results showed that SandUp effectively helps build the sandcastle with proper interactive guidance. Based on the results from the user study on children and feedback from their parents, we also derived usability issues that can be further improved and suggested future research directions.

Training of a Siamese Network to Build a Tracker without Using Tracking Labels (샴 네트워크를 사용하여 추적 레이블을 사용하지 않는 다중 객체 검출 및 추적기 학습에 관한 연구)

  • Kang, Jungyu;Song, Yoo-Seung;Min, Kyoung-Wook;Choi, Jeong Dan
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.5
    • /
    • pp.274-286
    • /
    • 2022
  • Multi-object tracking has been studied for a long time under computer vision and plays a critical role in applications such as autonomous driving and driving assistance. Multi-object tracking techniques generally consist of a detector that detects objects and a tracker that tracks the detected objects. Various publicly available datasets allow us to train a detector model without much effort. However, there are relatively few publicly available datasets for training a tracker model, and configuring own tracker datasets takes a long time compared to configuring detector datasets. Hence, the detector is often developed separately with a tracker module. However, the separated tracker should be adjusted whenever the former detector model is changed. This study proposes a system that can train a model that performs detection and tracking simultaneously using only the detector training datasets. In particular, a Siam network with augmentation is used to compose the detector and tracker. Experiments are conducted on public datasets to verify that the proposed algorithm can formulate a real-time multi-object tracker comparable to the state-of-the-art tracker models.

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.2
    • /
    • pp.101-114
    • /
    • 2023
  • The data augmentation technique is effectively used to solve the problem of overfitting the model by allowing the training dataset to be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, it is possible to use an occlusion-based data-based augmentation technique after converting a 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we intend to compare and study data augmentation techniques that can be used in the problem of false-voice detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions held to detect fake audio, a dataset applied with Cutout, Cutmix, and SpecAugment, an occlusion-based data augmentation method, was trained through an LCNN model. All three augmentation techniques, Cutout, Cutmix, and SpecAugment, generally improved the performance of the model. In ASVspoof2017, Cutmix, in ASVspoof2019 LA, Mixup, and in ASVspoof2019 PA, SpecAugment showed the best performance. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, it is understood that the appropriate augmentation technique differs depending on the situation and data.

Fake News Detection on YouTube Using Related Video Information (관련 동영상 정보를 활용한 YouTube 가짜뉴스 탐지 기법)

  • Junho Kim;Yongjun Shin;Hyunchul Ahn
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.19-36
    • /
    • 2023
  • As advances in information and communication technology have made it easier for anyone to produce and disseminate information, a new problem has emerged: fake news, which is false information intentionally shared to mislead people. Initially spread mainly through text, fake news has gradually evolved and is now distributed in multimedia formats. Since its founding in 2005, YouTube has become the world's leading video platform and is used by most people worldwide. However, it has also become a primary source of fake news, causing social problems. Various researchers have been working on detecting fake news on YouTube. There are content-based and background information-based approaches to fake news detection. Still, content-based approaches are dominant when looking at conventional fake news research and YouTube fake news detection research. This study proposes a fake news detection method based on background information rather than content-based fake news detection. In detail, we suggest detecting fake news by utilizing related video information from YouTube. Specifically, the method detects fake news through CNN, a deep learning network, from the vectorized information obtained from related videos and the original video using Doc2vec, an embedding technique. The empirical analysis shows that the proposed method has better prediction performance than the existing content-based approach to detecting fake news on YouTube. The proposed method in this study contributes to making our society safer and more reliable by preventing the spread of fake news on YouTube, which is highly contagious.

Content-based Korean journal recommendation system using Sentence BERT (Sentence BERT를 이용한 내용 기반 국문 저널추천 시스템)

  • Yongwoo Kim;Daeyoung Kim;Hyunhee Seo;Young-Min Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.37-55
    • /
    • 2023
  • With the development of electronic journals and the emergence of various interdisciplinary studies, the selection of journals for publication has become a new challenge for researchers. Even if a paper is of high quality, it may face rejection due to a mismatch between the paper's topic and the scope of the journal. While research on assisting researchers in journal selection has been actively conducted in English, the same cannot be said for Korean journals. In this study, we propose a system that recommends Korean journals for submission. Firstly, we utilize SBERT (Sentence BERT) to embed abstracts of previously published papers at the document level, compare the similarity between new documents and published papers, and recommend journals accordingly. Next, the order of recommended journals is determined by considering the similarity of abstracts, keywords, and title. Subsequently, journals that are similar to the top recommended journal from previous stage are added by using a dictionary of words constructed for each journal, thereby enhancing recommendation diversity. The recommendation system, built using this approach, achieved a Top-10 accuracy level of 76.6%, and the validity of the recommendation results was confirmed through user feedback. Furthermore, it was found that each step of the proposed framework contributes to improving recommendation accuracy. This study provides a new approach to recommending academic journals in the Korean language, which has not been actively studied before, and it has also practical implications as the proposed framework can be easily applied to services.

Development and Application of Tunnel Design Automation Technology Using 3D Spatial Information : BIM-Based Design for Namhae Seomyeon - Yeosu Shindeok National Highway Construction (3D 공간정보를 활용한 터널 설계 자동화 기술 개발 및 적용 사례 : 남해 서면-여수 신덕 국도 건설공사 BIM기반 설계를 중심으로)

  • Eunji Jo;Woojin Kim;Kwangyeom Kim;Jaeho Jung;Sanghyuk Bang
    • Tunnel and Underground Space
    • /
    • v.33 no.4
    • /
    • pp.209-227
    • /
    • 2023
  • The government continues to announce measures to revitalize smart construction technology based on BIM for productivity innovation in the construction industry. In the design phase, the goal is design automation and optimization by converging BIM Data and other advanced technologies. Accordingly, in the basic design of the Namhae Seomyeon-Yeosu Sindeok National Road Construction Project, a domestic undersea tunnel project, BIM-based design was carried out by developing tunnel design automation technology using 3D spatial information according to the tunnel design process. In order to derive the optimal alignment, more than 10,000 alignment cases were generated in 36hr using the generative design technique and a quantitative evaluation of the objective functions defined by the designer was performed. AI-based ground classification and 3D Geo Model were established to evaluate the economic feasibility and stability of the optimal alignment. AI-based ground classification has improved its precision by performing about 30 types of ground classification per borehole, and in the case of the 3D Geo Model, its utilization can be expected in that it can accumulate ground data added during construction. In the case of 3D blasting design, the optimal charge weight was derived in 5 minutes by reviewing all security objects on the project range on Dynamo, and the design result was visualized in 3D space for intuitive and convenient construction management so that it could be used directly during construction.