• Title/Summary/Keyword: Voice Processing

Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

  • Guiyoung Son;Soonil Kwon
    • The Transactions of the Korea Information Processing Society / v.13 no.6 / pp.284-290 / 2024
  • Speech emotion recognition (SER) is a technique that analyzes a speaker's voice patterns, including vibration, intensity, and tone, to determine their emotional state. Interest in artificial intelligence (AI) techniques has grown, and they are now widely used in medicine, education, industry, and the military. Most existing studies, however, have attained impressive results by using acted speech recorded by skilled actors in a controlled environment for various scenarios. There is a mismatch between acted and spontaneous speech, since acted speech includes more explicit emotional expressions than spontaneous speech. For this reason, spontaneous speech emotion recognition remains a challenging task. This paper aims to conduct emotion recognition and improve performance using spontaneous speech data. To this end, we implement deep learning-based speech emotion recognition using VGG (Visual Geometry Group) after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, which covers 7 emotions: joy, love, anger, fear, sadness, surprise, and neutral. Using the time-frequency 2-dimensional spectrogram, we achieved average accuracies of 83.5% and 73.0% for adults and young people, respectively. Our findings show that the suggested framework outperformed current state-of-the-art techniques for spontaneous speech and performed promisingly despite the difficulty of quantifying emotional expression in spontaneous speech.
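
The conversion of a 1-dimensional audio signal into a 2-dimensional time-frequency representation, as mentioned in the abstract above, can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the frame length, hop size, Hann window, and synthetic test tone are all assumptions chosen for the example, and the VGG classification stage is not shown.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier magnitude: one row per frame, one column per frequency bin.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        # Only bins 0..frame_len/2 are needed for a real-valued signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a 1 kHz sine sampled at 8 kHz (stands in for real speech).
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
peak_bins = [row.index(max(row)) for row in spec]
print(len(spec), "frames x", len(spec[0]), "bins; peak bins:", sorted(set(peak_bins)))
```

Each row of `spec` is one time frame and each column one frequency bin, so the nested list can be treated as a grayscale image. A practical system would more likely use an FFT library and a log or mel frequency scale.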

A Comparative Study on the Aesthetic Aspect of Design Preferred Between Countries Centering Around the Analysis on the Aesthetic Aspect of Mobile Phone Preferred by Korean and Chinese Consumers - (국가 간 선호 디자인의 심미성요소 비교연구 - 한.중 소비자 선호휴대폰의 심미성요소 분석을 중심으로 -)

  • Jeong Su-Kyoung;Hong Jung-Pyo
    • Science of Emotion and Sensibility / v.9 no.1 / pp.49-61 / 2006
  • The mobile phone industry has a significant effect on the domestic economy and has taken root as a core sector expected to lead the Korean economy for a considerable period of time. As the mobile phone market grows, mobile phones are being used by people across a broader age bracket, and the functions and designs preferred by people of various ages are becoming more diverse. As the mobile phone takes on greater influence and meaning in our daily lives, consumers' expectations of it are growing. The core function of voice communication is no longer a great concern to consumers. Instead, they demand functions such as more convenient and friendly information input and output, processing, and storage, along with designs that are more sophisticated and optimized for the user environment. As modern design comes to resemble the objects of traditional high art that consumers encounter every day, the aesthetic aspect of design can play an important role as a factor that differentiates the product, creating new value that forms the spiritual and emotional value of human beings and improves the quality of living; in addition, consumers' willingness to buy is determined by the design they prefer most. A new mobile phone design, based on a new dimension and most preferred by consumers, therefore urgently needs to be developed by shedding light on the factors behind consumer preference through an analysis of the aesthetic aspect, which can be said to be the most critical factor in the design process.
Therefore, this study aims to identify the common and differing factors in aesthetic preference through an analysis of the aesthetic aspects of the mobile phones preferred by users in different countries, and to determine the formative artistic factors of the aesthetic aspect that are considered important, in order to propose a guideline on the aesthetic aspect that can be applied in practice to mobile phone design.

Design and Implementation of a Bluetooth Baseband Module with DMA Interface (DMA 인터페이스를 갖는 블루투스 기저대역 모듈의 설계 및 구현)

  • Cheon, Ik-Jae;O, Jong-Hwan;Im, Ji-Suk;Kim, Bo-Gwan;Park, In-Cheol
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.3 / pp.98-109 / 2002
  • Bluetooth technology is a publicly available specification proposed for radio frequency (RF) communication for short-range and point-to-multipoint voice and data transfer. It operates in the 2.4 GHz ISM (Industrial, Scientific and Medical) band and offers the potential for low-cost, broadband wireless access for various mobile and portable devices at a range of about 10 meters. In this paper, we describe the structure and the test results of the Bluetooth baseband module with a direct memory access (DMA) method that we have developed. This module consists of three blocks: a link controller, a UART interface, and an audio CODEC. It has a bus interface for data communication between the module and the main processor, and an RF interface for transmitting the bit-stream between the module and the RF module. The bus interface includes a DMA interface. Compared with a link controller with FIFOs, the module with DMA differs widely in module size and data processing speed. The small module size enables low cost and various applications. In addition, the module supports firmware upgrades through the UART. An FPGA and an ASIC implementation of this module, designed as soft IP, are tested for file and bit-stream transfers between PCs.

A Study of Efficient Algorithm for Survivable Network Design with Conduit (관로가 있는 생존가능망 설계에 관한 효율적인 알고리즘 연구)

  • Kang, Hyo-Kwan;Han, Chi-Geun
    • The KIPS Transactions:PartC / v.8C no.5 / pp.629-636 / 2001
  • With the development of communication technology and multimedia services, networks are changing from voice-based to multimedia-based. Multimedia services need a large bandwidth. Optical fiber is a more suitable medium than existing copper-based cable for large bandwidth, but it is more expensive. Minimizing total cost therefore becomes more important. To construct a minimum-cost network, we have to consider the conduits that already exist in the network. An optical fiber network can carry a larger amount of traffic than a copper-based network, but a failure of a node or link can seriously damage the network service. Thus, we need multiple paths to support continuous service even if a failure occurs at some point in the network. The network survivability problem is to design a network that can provide reliable service to customers at any time with minimum total cost. In an existing solution to the network survivability problem with conduits, a conduit is considered only once. However, a conduit is reusable as long as the network satisfies the required survivability. The proposed algorithm considers existing conduits more effectively. Network survivability and edge costs are predetermined. The proposed algorithm finds the best solution by sharing conduits within the limits of network survivability. According to the simulation results, the proposed method reduces total cost by 7% compared with an existing method through effective use of conduits.

DFT-spread OFDM Communication System for the Power Efficiency and Nonlinear Distortion in Underwater Communication (수중통신에서 비선형 왜곡과 전력효율을 위한 DFT-spread OFDM 통신 시스템)

  • Lee, Woo-Min;Ryn, Heung-Gyoon
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.8A / pp.777-784 / 2010
  • Recently, the need for underwater communication and the demand for transmitting and receiving various data, such as voice or high-resolution image data, have been increasing. The performance of an underwater acoustic communication system is influenced by the characteristics of the underwater communication channel. In particular, ISI (inter-symbol interference) occurs because of the delay spread caused by multipath, and communication performance is degraded. In this paper, we study the OFDM technique to overcome the delay spread in the underwater channel, and we compensate for the delay spread by using a CP (cyclic prefix). However, an OFDM system has the problem of a very high PAPR (peak-to-average power ratio). Therefore, we use the DFT-spread OFDM method to avoid the nonlinear distortion caused by high PAPR and to improve amplifier efficiency. The DFT-spread OFDM technique achieves a large PAPR reduction because each parallel data symbol is loaded onto all subcarriers by DFT spreading before the IFFT. We show the performance of the OFDM system under delay spread and verify that DFT-spread OFDM is more suitable than OFDM for underwater communication. We also analyze performance for two subcarrier mapping methods (interleaved and localized). In the simulation results, DFT-spread OFDM outperforms OFDM by about 5 to 6 dB at $10^{-4}$. When BER is compared across subcarrier mappings, the interleaved method is better than the localized method by about 3.5 dB at $10^{-4}$.
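
The PAPR reduction from DFT spreading described above can be illustrated with a small sketch. It does not reproduce the paper's simulation: there is no underwater channel, cyclic prefix, amplifier model, oversampling, or pulse shaping, and the QPSK phase pattern, block size (16 symbols), and subcarrier spacing (4) are arbitrary assumptions. It only compares the peak-to-average power ratio of one plain OFDM symbol with one DFT-spread OFDM symbol under interleaved subcarrier mapping.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; the inverse transform is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def interleaved_map(freq_symbols, spacing):
    """Place symbol k on subcarrier k * spacing; all other subcarriers stay zero."""
    grid = [0j] * (len(freq_symbols) * spacing)
    for k, value in enumerate(freq_symbols):
        grid[k * spacing] = value
    return grid


def papr(signal):
    """Peak-to-average power ratio as a linear ratio."""
    powers = [abs(s) ** 2 for s in signal]
    return max(powers) / (sum(powers) / len(powers))


def ofdm_symbol(data, spacing):
    """Plain OFDM: data symbols go straight onto the subcarriers, then IFFT."""
    return dft(interleaved_map(data, spacing), inverse=True)


def dft_spread_ofdm_symbol(data, spacing):
    """DFT-spread OFDM: DFT precoding, interleaved mapping, then IFFT."""
    return dft(interleaved_map(dft(data), spacing), inverse=True)


def to_db(ratio):
    return 10 * math.log10(ratio)


# Arbitrary QPSK phase indices chosen for illustration (not data from the paper).
QPSK = [1, 1j, -1, -1j]
PHASES = [0, 0, 1, 0, 2, 3, 0, 1, 0, 0, 3, 2, 1, 0, 0, 1]
data = [QPSK[p] for p in PHASES]

papr_ofdm = papr(ofdm_symbol(data, spacing=4))
papr_dfts = papr(dft_spread_ofdm_symbol(data, spacing=4))
print(f"OFDM PAPR:            {to_db(papr_ofdm):.2f} dB")
print(f"DFT-spread OFDM PAPR: {to_db(papr_dfts):.2f} dB")
```

With interleaved mapping and no pulse shaping, the DFT-spread time signal is a scaled repetition of the original constant-modulus symbols, so its PAPR is essentially 0 dB, while the plain OFDM symbol peaks well above its average power.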

User Detection and Main Body Parts Estimation using Inaccurate Depth Information and 2D Motion Information (정밀하지 않은 깊이정보와 2D움직임 정보를 이용한 사용자 검출과 주요 신체부위 추정)

  • Lee, Jae-Won;Hong, Sung-Hoon
    • Journal of Broadcast Engineering / v.17 no.4 / pp.611-624 / 2012
  • Gesture is the most intuitive means of communication after the voice. There has therefore been much research on methods that control a computer with gesture input in place of the keyboard or mouse. In this research, user detection and main body parts estimation is one of the most important processes. In this paper, we propose a method for user object detection and main body parts estimation from inaccurate depth information for pose estimation. We present a user detection method that uses 2D and 3D depth information, so it is robust to changes in lighting and to noise; that processes 2D signals as 1D signals, so it is suitable for real-time use; and that uses previous object information, so it is more accurate and robust. We also present a main body parts estimation method that uses 2D contour information, 3D depth information, and tracking. Experimental results show that the proposed user detection method is more robust than a method using only 2D information and detects objects accurately from inaccurate depth information. The proposed main body parts estimation method also overcomes two disadvantages: the inability to detect main body parts in occluded areas when only 2D contour information is used, and the sensitivity to changes in illumination or environment when color information is used.

A Basic Study on the Differential Diagnostic System of Laryngeal Diseases using Hierarchical Neural Networks (다단계 신경회로망을 이용한 후두질환 감별진단 시스템의 개발)

  • 전계록;김기련;권순복;예수영;이승진;왕수건
    • Journal of Biomedical Engineering Research / v.23 no.3 / pp.197-205 / 2002
  • The objective of this paper is to implement a classifier for the differential diagnosis of laryngeal diseases from acoustic signals acquired in a noisy room. For this purpose, voice signals of the vowel /a/ were collected from patients in a soundproof chamber and mixed with noise. The acoustic parameters were then analyzed, and hierarchical neural networks were applied to classify the data. The classifier had a structure of five-step hierarchical neural networks. The first neural network classified the group into normal and benign or malignant laryngeal disease cases. The second network classified the group into normal or benign laryngeal disease cases. The following network distinguished polyp, nodule, and palsy among the benign laryngeal cases. Glottic cancer cases were discriminated into T1, T2, T3, and T4 by the fourth and fifth networks. All the neural networks were based on the multilayer perceptron model, which classifies non-linear patterns effectively, and were trained with the error back-propagation algorithm. We chose acoustic parameters for classification by investigating the distribution of laryngeal diseases and the pilot classification results of the parameters derived from MDVP. The classifier was tested with the chosen parameters to find the optimum ones. The networks were then improved by including pre-processing steps such as linear and z-score transformation. Results showed that 90% of T1 and 100% of T2-4 cases were correctly distinguished. In addition, 88.23% of vocal polyps and 100% of normal cases, vocal nodules, and vocal cord paralysis were classified correctly from the data collected in a noisy room.
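
The pre-processing steps named in the abstract above (linear and z-score transformation) can be sketched as follows. This is an illustration only: the abstract does not define the "linear" transformation, so min-max rescaling is assumed here as one plausible reading, and the feature values are made-up numbers, not measurements from the study.

```python
import statistics


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Linear rescaling to [0, 1]; an assumed interpretation of 'linear transformation'."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of one acoustic parameter across eight recordings.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

standardized = z_score(raw)
rescaled = min_max(raw)
print("z-score:", standardized)
print("min-max:", rescaled)
```

Either transformation puts parameters measured on different scales into a comparable range before they are fed to a multilayer perceptron, which generally helps back-propagation training converge.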

Research Suggestion for Disaster Prediction using Safety Report of Korea Government (안전신문고를 이용한 재난 예측 방법론 제안)

  • Lee, Jun;Shin, Jindong;Cho, Sangmyeong;Lee, Sanghwa
    • Journal of Korean Society of Disaster and Security / v.12 no.4 / pp.15-26 / 2019
  • Anjunshinmungo (the safety e-report) has been in operation since 2014, and about 1 million reports had accumulated by June 2019. This study analyzes the contents of more than 1 million safety e-reports to determine how powerful and meaningful the people's voice and interest are in the information age. In particular, we are interested in forecasting ability: we wanted to check whether the safety e-reports were related to possible disasters. To this end, the researchers received the reported data as text and analyzed it with natural language analysis methods. On this basis, newspaper articles from the period covered by the safety e-reports were analyzed, and the correlation between the contents of the reports and the newspaper articles was analyzed. As a result, accidents were found to occur within a few months as the number of reports related to response and confirmation increased, and analyzing the contents of earlier safety reports on social instability can be used to predict future disasters.

Testimony of the Real World, Documentary-Animation (현실세계의 증언, 다큐멘터리-애니메이션 분석)

  • Oh, Jin-Hee
    • Cartoon and Animation Studies / s.45 / pp.27-50 / 2016
  • The present study argues that documentary-animation films, which are based on actual human voices, on the level of representation, constitute a new expansion for the medium of animation films, which serve as testimonies to the real world. Animation films are produced using very diverse techniques so that they are complex to the degree of being indefinable, and documentary films, though based on objective representation, increase in complexity in that there exist various types of artificial interventions such as direction and digital image processing. Having emerged as a hybrid genre of the two media, documentary-animation films draw into themselves actual events and elements so that they conceptually share reality-based narratives and are visually characterized by the trappings of animation films. Generally classified as 'animated documentaries', this genre triggered discussions following the release of , a work that is mistaken as having used rotoscoping transforming live action in terms of the technique. When analyzed in detail, however, this work is presented as an ambiguous medium where the characteristics of animation films, which are virtual simulacra without reality, and of documentaries, which are based on the objective indexicality of the referents, coexist because of its mixed use of typical animation techniques, 3D programs, and live-action images. Discussed in the present study, , , and share the characteristics of the medium of documentaries in that the narratives develop as testimonies of historical figures but, at the same time, are connected to animation films because of their production techniques and direction characteristics. Consequently, this medium must be discussed as a new expansion rather than being included in the existing classification system, and such a presupposition is an indispensable process for directly facing the reality of the works and for developing discussions. 
Through works that directly use the interviewees' voices yet do not transcend the characteristics of animation films, the present study seeks to define documentary-animation films and to discuss the possibility of the medium, which has expanded as a testimony to the real world.

A Thought on the Right to Be Forgotten Articulated in the European Commission's Proposal for General Data Protection Regulation (유럽연합(EU) 정보보호법(General Data Protection Regulation)개정안상의 잊혀질 권리와 현행 우리 법의 규율 체계 및 앞으로의 입법방향에 관한 소고)

  • Hah, Jung Chul
    • Journal of Digital Convergence / v.10 no.11 / pp.87-92 / 2012
  • In early 2012, the European Union proposed a new legal framework for the protection of personal data, including the right to be forgotten. The new Proposal articulates a sweeping new privacy right, and there have been debates on its potential threat to free speech in the digital age. As the situation is similar in Korea, I introduce the right to be forgotten in the Proposal. I then analyze the current legal system in Korea with regard to the new privacy right and suggest some guidelines for the direction of coming legislation on the right to be forgotten. The right to be forgotten should not be promulgated without fully considering its effect on free speech, especially in Korea, where the voice for direct democracy and the movement toward citizen participation, mainly through cyberspace or social network services, have grown much stronger. In particular, the new right does not seem to cover the data subject's control over a third party who expresses an opinion by posting another person's personal data on his or her own blog or elsewhere.