• Title/Summary/Keyword: Voice learning

Application for Workout and Diet Assistant using Image Processing and Machine Learning Skills (영상처리 및 머신러닝 기술을 이용하는 운동 및 식단 보조 애플리케이션)

  • Chi-Ho Lee;Dong-Hyun Kim;Seung-Ho Choi;In-Woong Hwang;Kyung-Sook Han
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.5 / pp.83-88 / 2023
  • In this paper, we developed a workout and diet assistance application to meet the growing demand for workout and dietary support services due to the increase in the home training population. The application analyzes the user's workout posture in real-time through the camera and guides the correct posture using guiding lines and voice feedback. It also classifies the foods included in the captured photos, estimates the amount of each food, and calculates and provides nutritional information such as calories. Nutritional information calculations are executed on the server, which then transmits the results back to the application. Once received, this data is presented visually to the user. Additionally, workout results and nutritional information are saved and organized by date for users to review.
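
As a rough illustration of the posture-guidance idea described above (not the authors' actual implementation), the sketch below computes a knee angle from three 2-D keypoints, such as those produced by an off-the-shelf pose estimator, and emits corrective feedback text; the keypoint coordinates and angle thresholds are assumptions.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at vertex b (degrees) formed by 2-D keypoints a-b-c."""
    ba, bc = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def squat_feedback(hip, knee, ankle, target=(80.0, 100.0)):
    """Return a voice-feedback string for a squat; thresholds are illustrative."""
    angle = joint_angle(hip, knee, ankle)
    lo, hi = target
    if angle < lo:
        return f"Knee angle {angle:.0f} deg: you are too low, rise slightly."
    if angle > hi:
        return f"Knee angle {angle:.0f} deg: bend your knees more."
    return f"Knee angle {angle:.0f} deg: good posture."

# Example with hypothetical normalized (x, y) keypoints from a pose estimator:
print(squat_feedback(hip=(0.48, 0.52), knee=(0.50, 0.70), ankle=(0.51, 0.90)))
```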

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology / v.25 no.6 / pp.557-561 / 2021
  • Metadata has become an essential element for recommending video content to users, but it is currently generated manually by video content providers. This paper studies a method for automatically generating metadata in place of the existing manual input process. In addition to the emotion-tag extraction method of the previous study, methods were investigated for automatically generating genre and country-of-production metadata from movie audio. The genre was extracted from the audio spectrogram using a ResNet34 transfer-learning model, and the language of the speakers in the movie was detected through speech recognition. These results confirm the feasibility of automatically generating metadata through artificial intelligence.
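
A minimal sketch of the transfer-learning setup the abstract describes, assuming torchvision's ImageNet-pretrained ResNet34 and spectrograms rendered as 3-channel images; the genre list and the choice to freeze the backbone are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, ResNet34_Weights

GENRES = ["action", "comedy", "drama", "horror"]  # assumed label set

# Load an ImageNet-pretrained ResNet34 and replace the classifier head
# so it predicts genres from spectrogram images.
model = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(GENRES))

# Optionally freeze the pretrained backbone and train only the new head.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# A spectrogram batch: 3-channel 224x224 images (e.g. rendered mel spectrograms).
spectrograms = torch.randn(8, 3, 224, 224)
logits = model(spectrograms)   # shape: (8, len(GENRES))
pred = logits.argmax(dim=1)    # predicted genre indices
```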

An User-Friendly Kiosk System Based on Deep Learning (딥러닝 기반 사용자 친화형 키오스크 시스템)

  • Su Yeon Kang;Yu Jin Lee;Hyun Ah Jung;Seung A Cho;Hyung Gyu Lee
    • Journal of Korea Society of Industrial Information Systems / v.29 no.1 / pp.1-13 / 2024
  • This study aims to provide a customized, dynamic kiosk screen that accounts for user characteristics, in response to the increased use of kiosks. To optimize the screen composition for digitally vulnerable groups such as the visually impaired, the elderly, children, and wheelchair users, users are classified into nine categories based on real-time analysis of their characteristics (wheelchair use, visual impairment, age, etc.), and the kiosk screen is dynamically adjusted to provide efficient service. Communication and operation were demonstrated in an embedded environment, and the object detection, gait recognition, and speech recognition technologies used achieved accuracies of 74%, 98.9%, and 96%, respectively. The effectiveness of the proposed technology was verified with a prototype implementation, showing the potential to reduce the digital divide and provide user-friendly "barrier-free kiosk" services.
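
The nine-way user classification could be realized as a simple decision table over the recognized attributes. The sketch below is an assumed mapping (the abstract does not list the exact categories) from wheelchair use, visual impairment, and age band to a screen configuration.

```python
from dataclasses import dataclass

@dataclass
class ScreenConfig:
    layout: str        # "lowered" places controls within wheelchair reach
    font_scale: float  # enlarge text for elderly or low-vision users
    voice_ui: bool     # enable speech input/output for visually impaired users

def classify_user(wheelchair: bool, visually_impaired: bool, age_band: str) -> ScreenConfig:
    """Map recognized attributes to a dynamic kiosk screen (illustrative rules)."""
    if visually_impaired:
        return ScreenConfig("lowered" if wheelchair else "standard", 1.5, True)
    if wheelchair:
        return ScreenConfig("lowered", 1.2 if age_band == "senior" else 1.0, False)
    if age_band == "senior":
        return ScreenConfig("standard", 1.4, False)
    if age_band == "child":
        return ScreenConfig("lowered", 1.2, False)
    return ScreenConfig("standard", 1.0, False)

print(classify_user(wheelchair=True, visually_impaired=False, age_band="senior"))
```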

Increasing Accuracy of Stock Price Pattern Prediction through Data Augmentation for Deep Learning (데이터 증강을 통한 딥러닝 기반 주가 패턴 예측 정확도 향상 방안)

  • Kim, Youngjun;Kim, Yeojeong;Lee, Insun;Lee, Hong Joo
    • The Journal of Bigdata / v.4 no.2 / pp.1-12 / 2019
  • As artificial intelligence (AI) technology develops, it is being applied to fields such as image, voice, and text, and has shown fine results in certain areas. Researchers have also tried to predict the stock market using AI. Predicting the stock market is known to be a difficult problem, since the market is affected by many factors such as the economy and politics. In AI, there have been attempts to predict the ups and downs of stock prices by learning price patterns with various machine learning techniques. This study suggests a way of predicting stock price patterns based on the Convolutional Neural Network (CNN). A CNN classifies images by extracting features through convolutional layers, so this study classifies candlestick images built from stock data in order to predict patterns. The study has two objectives. The first, Case 1, is to predict patterns from images made with same-day stock price data. The second, Case 2, is to predict the next day's stock price patterns from images produced with daily stock price data. In Case 1, data augmentation methods (random modification and Gaussian noise) are applied to generate more training data, and the generated images are used to fit the model. Given that deep learning requires a large amount of data, this study proposes a data augmentation method for candlestick images and compares the accuracies obtained with Gaussian noise and across different classification problems. All data were collected through the OpenAPI provided by DaiShin Securities. Case 1 has five labels depending on the pattern: up with up closing, up with down closing, down with up closing, down with down closing, and staying. Its images are created by removing the last candle (-1 candle), the last two candles (-2 candles), or the last three candles (-3 candles) from 60-minute, 30-minute, 10-minute, and 5-minute candle charts, where in a 60-minute chart each candle carries 60 minutes of information (open, high, low, and close prices). Case 2 has two labels, up and down, and its images are generated from the 60-minute, 30-minute, 10-minute, and 5-minute candle charts without removing any candle. Considering the nature of stock data, moving the candles in the images is proposed instead of existing augmentation techniques; how far the candles are moved is defined as the modified value. Since the average difference of closing prices between candles was 0.0029, modified values of 0.003, 0.002, 0.001, and 0.00025 were used, and the number of images was doubled after augmentation. For Gaussian noise, the mean was 0 and the variance 0.01. For both cases the model is based on VGGNet-16, which has 16 layers. As a result, the 10-minute chart with one candle removed (-1 candle) showed the best accuracy among the 60-minute, 30-minute, 10-minute, and 5-minute charts, so 10-minute images were used for the rest of the Case 1 experiments, with the three-candle removal selected for data augmentation and Gaussian noise: 10-minute -3 candle reached 79.72% accuracy, the images with a 0.00025 modified value and 100% of candles changed reached 79.92%, and applying Gaussian noise raised accuracy to 80.98%. In Case 2, 60-minute candle charts predicted the next day's patterns with 82.60% accuracy. In sum, this study is expected to contribute to further research on predicting stock price patterns from images and provides a possible data augmentation method for stock data.
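
A sketch of the two augmentations described above, under the stated parameters (modified values such as 0.00025, and Gaussian noise with mean 0 and variance 0.01); the OHLC array layout and the per-candle direction choice are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_candles(ohlc: np.ndarray, modified_value: float, change_ratio: float = 1.0) -> np.ndarray:
    """Randomly move a fraction of candles up or down by `modified_value`.
    `ohlc` has shape (n_candles, 4) = (open, high, low, close)."""
    out = ohlc.copy()
    moved = rng.random(len(out)) < change_ratio        # which candles to move
    signs = rng.choice([-1.0, 1.0], size=len(out))     # direction per candle
    out[moved] += (signs[moved] * modified_value)[:, None]
    return out

def add_gaussian_noise(ohlc: np.ndarray, var: float = 0.01) -> np.ndarray:
    """Add zero-mean Gaussian noise (variance `var`) to every price."""
    return ohlc + rng.normal(0.0, np.sqrt(var), size=ohlc.shape)

chart = rng.uniform(0.99, 1.01, size=(60, 4))          # a dummy 60-candle chart
augmented = add_gaussian_noise(shift_candles(chart, modified_value=0.00025))
```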

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. It plays an important role in automatic voice systems, and its importance grows as portable devices, voice technology, and audio content continue to expand. Previous speaker recognition studies have aimed to automatically determine who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it carries very useful information revealing the speaker's attitude, conversational intention, and personality, which can be an important clue for speaker recognition. The final ending used in a speaker's utterance determines the sentence type and conveys the speaker's intention, psychological attitude, or relationship to the listener. Because the use of final endings varies with the characteristics of the speaker, the type and distribution of the final endings produced by an unidentified speaker can help in recognizing that speaker. However, few existing text-based speaker recognition studies have considered speech style, and adding speech-style information to speech-signal-based speaker recognition could further improve accuracy. Hence, this paper proposes a novel method that uses speech style, expressed through sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed, which generates vector values from the type and frequency of the sentence-final endings appearing in a specific person's utterances. To evaluate the proposed method, training and performance evaluation were conducted with an actual drama script. The proposed method can serve as a means to improve the performance of Korean speaker recognition services.
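
A minimal sketch of the sentence-sequencing idea: count the sentence-final endings in a speaker's utterances and normalize them into a frequency vector. The ending inventory below is a small assumed sample of Korean final endings, not the paper's actual feature set.

```python
from collections import Counter

# Assumed sample of Korean sentence-final endings (the paper's inventory is larger).
ENDINGS = ["-습니다", "-어요", "-야", "-지", "-네", "-군"]

def ending_of(sentence: str) -> str:
    """Return the first known ending the sentence terminates with, else 'other'."""
    s = sentence.rstrip(".?! ")
    for e in ENDINGS:
        if s.endswith(e.lstrip("-")):
            return e
    return "other"

def style_vector(utterances: list[str]) -> list[float]:
    """Normalized frequency vector of final endings over a speaker's utterances."""
    counts = Counter(ending_of(u) for u in utterances)
    total = sum(counts.values()) or 1
    return [counts[e] / total for e in ENDINGS + ["other"]]

speaker_a = ["안녕하세요, 반갑습니다.", "오늘 날씨가 좋네.", "같이 갈 거야?"]
print(style_vector(speaker_a))  # one fixed-length vector per speaker
```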

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results from several kinds of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition of a person's description to generate a montage image. Whereas existing montage generation technology is based on Western facial appearance, the montage generation system developed here learns a model based on Korean facial features, so it can create more accurate and effective Korean montage images from Korean-specific multi-modal voice and text input. Since the developed app can produce a draft montage, it can dramatically reduce the manual labor of existing montage production staff. For this purpose, we used the persona-based virtual-person montage data provided by the AI-Hub of the National Information Society Agency; AI-Hub is an AI integration platform that builds the training data needed for AI technology and service development and aims to provide it as a one-stop service. The image generation system was implemented with VQGAN, a deep learning model used to generate high-resolution images, and KoDALLE, a Korean-based image generation model. The trained AI model was confirmed to create montage images of faces very similar to those described by voice and text. To verify the app's practicality, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in fields such as criminal investigation, where facial features must be described and visualized.
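
At a high level, the pipeline chains speech recognition, modality fusion, and text-conditioned image generation. The sketch below only shows that data flow: `transcribe` and `generate_montage` are hypothetical stand-ins (the real system uses Korean speech recognition, KoDALLE, and a VQGAN decoder, whose concrete APIs are not given in the abstract).

```python
from dataclasses import dataclass

@dataclass
class MontageRequest:
    audio_path: str | None  # spoken description of the face, if given
    text: str | None        # typed description, if given

def transcribe(audio_path: str) -> str:
    """Hypothetical stand-in for Korean speech recognition."""
    return "각진 얼굴형, 짙은 눈썹"  # placeholder transcript

def generate_montage(description: str) -> bytes:
    """Hypothetical stand-in for KoDALLE token generation + VQGAN decoding."""
    return b"<png bytes>"  # placeholder image bytes

def make_montage(req: MontageRequest) -> bytes:
    """Fuse voice and text descriptions, then generate the montage image."""
    parts = []
    if req.audio_path:
        parts.append(transcribe(req.audio_path))
    if req.text:
        parts.append(req.text)
    return generate_montage(" ".join(parts))  # naive concatenation as fusion

img = make_montage(MontageRequest(audio_path="desc.wav", text="안경을 쓴 남성"))
```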

A Case Study on the Teaching Mathematics Carried by a Researcher as a Parent of One Elementary School Child - Focused on the area of figures in the 5th grade - (부모로서 연구자의 초등 자녀 수학지도에 대한 사례 연구: 초등 5학년 도형의 넓이를 중심으로)

  • Son, Byoung Im;Choi-Koh, Sang Sook
    • Education of Primary School Mathematics / v.22 no.4 / pp.261-280 / 2019
  • This study is a qualitative case study of mathematics teaching between a parent and a child. Twelve lesson units were taught to a 5th-grade elementary school child during the first semester of 2019. The purpose was to identify the child's conceptual understanding of area, the types of problems the child found difficult during learning, and the parent's advantages and difficulties in this setting. Video and voice recordings were collected for each lesson. The child came to recognize the concept of area correctly, his awareness of reconstruction became clear, and the concepts of partitioning, unit iteration, and structuring an array were rebuilt more clearly. He had difficulty converting between units of area, marking the height of a shape when the height lies outside the figure, and drawing a figure of a given area once its value was provided. In a parent-child learning situation, a parent who is also a researcher has the advantage of tailoring instruction to the child and being free from time and cost constraints. Difficulties included controlling negative emotions toward the child, gauging the child's level, allocating class time, and deciding on the degree of intervention. Further research on parent-child mathematics teaching is recommended.

Evaluation of the Subjective Acoustic Performance of University Small Hall Remodeled as a Lecture Room : Based on the case of the W University (강의전용 공간으로 리모델링된 대학 소공연장의 주관적 음향성능 평가 : W대학의 사례를 바탕으로)

  • Kim, Min-Ju;Kim, Jae-Soo
    • The Journal of Sustainable Design and Educational Environment Research / v.19 no.4 / pp.40-49 / 2020
  • Recently, education has shifted from the existing one-way delivery format to two-way, mutually interactive forms, and learning environments suitable for each type of education need to be created accordingly. The basic form of education is the delivery of knowledge by teachers to learners through the voice, so the acoustic environment is an essential factor in a pleasant learning environment. The indoor acoustic environment is closely related to occupants' mental stress, and the quality of education changes greatly depending on whether an appropriate acoustic environment is provided. However, the importance of the acoustic environment in educational facilities such as classrooms has not been highlighted, owing to a lack of research and of related regulations. Therefore, in this study, auditory tests were conducted using auralization based on the physical acoustic performance data presented in a preceding study, in order to verify the validity of that research by analyzing occupants' subjective satisfaction with the improved physical acoustic performance. Based on these results, future improvements to the acoustic environment of educational facilities through remodeling should verify that a suitable acoustic environment has been created by analyzing the improvement of subjective as well as physical acoustic performance.

Analysis of User Experience for the Class Using Metaverse - Focus on 'Spatial' - (메타버스의 수업활용에 관한 사용자 경험 분석 - 스페이셜(Spatial)을 중심으로 -)

  • Lee, Yejin;Jung, Kwang-Tae
    • Journal of Practical Engineering Education / v.14 no.2 / pp.367-376 / 2022
  • In this study, user experience was analyzed from the learner's point of view, focusing on the metaverse platform 'Spatial'. The System Usability Scale (SUS) was used to evaluate the usability of 'Spatial' in a college class, and the magnitude estimation technique was used to evaluate immersion in and satisfaction with the class. In addition, a questionnaire survey collected user-experience opinions on using 'Spatial' as a teaching tool. In the usability evaluation, the students rated usability, immersion, and satisfaction quite positively. Regarding the user experience, students valued the metaverse highly as an educational tool that provides a place for many people to gather and communicate even in a non-face-to-face space. Compared with other online platforms, the metaverse has advantages in ease of use, interaction, immersion, and interest; in particular, beyond keyboard, touch, and display, interaction using the five senses, such as voice, motion, and gaze, was recognized as a great advantage. On the other hand, its high openness, freedom, and interest factors were found to be able both to promote and to inhibit learning. Nevertheless, the metaverse platform 'Spatial' is judged to be effectively applicable in college classes because it enables various interactions between instructor and learner or among learners.
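
For reference, the standard SUS score used above is computed from ten 5-point items: odd-numbered (positively worded) items contribute `response - 1`, even-numbered items contribute `5 - response`, and the sum is scaled by 2.5 onto a 0-100 range. A minimal sketch with a hypothetical questionnaire:

```python
def sus_score(responses: list[int]) -> float:
    """System Usability Scale: ten 1-5 responses -> 0-100 score."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum(r - 1 if i % 2 == 0 else 5 - r   # items 1,3,5,... are positive
                for i, r in enumerate(responses))
    return total * 2.5

# One student's hypothetical questionnaire:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```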

Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers (머신러닝 분류기를 사용한 만성콩팥병 자동 진단 및 중증도 예측 연구)

  • Jihyun Mun;Sunhee Kim;Myeong Ju Kim;Jiwon Ryu;Sejoong Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.14 no.4 / pp.45-56 / 2022
  • This paper proposes an optimal methodology for automatically diagnosing chronic kidney disease (CKD) and predicting its severity from patients' utterances. In patients with CKD, the voice changes owing to weakening of the respiratory and laryngeal muscles and to vocal fold edema. Previous studies have analyzed the voices of CKD patients phonetically, but none have classified them. In this paper, the utterances of patients with CKD were classified using a variety of utterance types (sustained vowel, sentence, general sentence), feature sets [handcrafted features, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), CNN-extracted features], and classifiers (SVM, XGBoost). A total of 1,523 utterances, 3 hours, 26 minutes, and 25 seconds in length, were used. F1-scores of 0.93 for automatic disease diagnosis, 0.89 for the 3-class severity problem, and 0.84 for the 5-class problem were achieved, with the highest performance obtained from the combination of general sentence utterances, the handcrafted feature set, and XGBoost. The results suggest that general sentence utterances, which can reflect all of a speaker's speech characteristics, together with an appropriate feature set extracted from them, are adequate for automatically classifying the utterances of CKD patients.
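
A sketch of one configuration from the comparison above (eGeMAPS functionals + XGBoost), assuming the `opensmile` and `xgboost` Python packages; the file paths and labels are placeholders, since the patient recordings are not public.

```python
import numpy as np
import opensmile
from xgboost import XGBClassifier

# eGeMAPS functionals: 88 utterance-level acoustic features per file.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def featurize(wav_paths: list[str]) -> np.ndarray:
    """Stack one eGeMAPS feature row per utterance file."""
    return np.vstack([smile.process_file(p).to_numpy() for p in wav_paths])

# Placeholder data: real use needs the patient utterances and severity labels.
train_paths = ["utt_0001.wav", "utt_0002.wav"]
train_labels = [0, 1]  # e.g. 0 = healthy, 1 = CKD (binary diagnosis case)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(featurize(train_paths), train_labels)
pred = clf.predict(featurize(["utt_new.wav"]))  # classify a new utterance
```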