• Title/Summary/Keyword: Voice function

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ from one modality to another, and each modality influences the emotional state differently. This paper therefore studies the fusion of the two most important modalities in emotion recognition (voice and visual expression) and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function addresses the missing-modality problem, improving both the robustness of the model and its emotion recognition performance. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model on the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to those of several comparison algorithms.
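
The concordance correlation coefficient reported in the abstract above can be computed as in the following minimal sketch. It uses the standard definition of the CCC with population (biased) variances; the function name and the sample values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """Standard CCC: 2*cov / (var_true + var_pred + (mean_true - mean_pred)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    # Population variance and covariance (divide by N, not N - 1).
    var_true, var_pred = y_true.var(), y_pred.var()
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2.0 * covariance / (var_true + var_pred + (mean_true - mean_pred) ** 2)

# Hypothetical arousal annotations and predictions, for illustration only.
labels = [1.0, 2.0, 3.0]
shifted_predictions = [2.0, 3.0, 4.0]
print(concordance_correlation_coefficient(labels, labels))
print(concordance_correlation_coefficient(labels, shifted_predictions))
```

Unlike the Pearson correlation, the CCC penalizes a constant offset between predictions and labels, which is why the shifted predictions score below 1 even though they are perfectly correlated with the labels.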

Strength in Numbers and Voice: An Assessment of the Networking Capacity of Chinese ENGOs

  • Shapiro, Matthew A.;Brunner, Elizabeth;Li, Hui
    • Journal of Contemporary Eastern Asia / v.17 no.2 / pp.147-175 / 2018
  • Under authoritarian regimes, citizen-led NGOs such as environmental NGOs (ENGOs) often operate under close government scrutiny. While this presents a challenge to a single ENGO, we propose here, in line with existing research on network effects, that there are opportunities for multiple ENGOs to coordinate and thus work in ways that supersede government controls, affect public opinion, and contribute to policy revision and/or creation. In this paper, we specifically examine the possibility that the gamut of citizen-based ENGOs in China is coordinating. Based on network analysis of ENGOs' web pages as well as interviews with more than a dozen ENGO leaders between 2014 and 2016, we find that ENGOs have few direct and public connections to each other, but social media sites and personal connections offline provide a crucial function in creating bridges. A closer examination of these bridges reveals, however, that they can be substantive to the environmental discussion or functional to the dissemination of web page information but typically not both. In short, ENGOs in China are not directly connected but rather are connected in a way that responds to the available social media and the government's censorship practices.

A Multi-Sensor Module of Snake Robot for Searching Survivors in Narrow Space (협소 공간 생존자 탐색을 위한 뱀형 로봇의 다중 센서 모듈)

  • Kim, Sungjae;Shin, Dong-Gwan;Pyo, Juhyun;Shin, Juseong;Jin, Maolin;Suh, Jinho
    • The Journal of Korea Robotics Society / v.16 no.4 / pp.291-298 / 2021
  • In this study, we present a multi-sensor module for a snake robot that searches for survivors in narrow spaces. To this end, we integrated five sensor systems, taking into account the opinions of first responders: a gas sensor to detect CO2 exhaled by survivors, a CMOS camera to provide images of survivors, an IR camera to see in dark and smoky environments, two microphones to detect the voices of survivors, and an IMU to estimate the approximate location and direction of the robot and survivors. Furthermore, we integrated a speaker into the sensor module to provide a communication channel between first responders and survivors. To integrate all these mechatronic systems into a small, compact snake head, we optimized the positions of the sensors and designed a stacked structure for the whole system. We also developed a user-friendly GUI that displays the information from the proposed sensor systems visually. Experimental results verified the search function of the proposed sensor module.

Translating English By-Phrase Passives into Korean: A Parallel Corpus Analysis (영한 병렬 코퍼스에 나타난 영어 수동문의 한국어 번역)

  • Lee, Seung-Ah
    • Journal of English Language & Literature / v.56 no.5 / pp.871-905 / 2010
  • This paper is motivated by Watanabe's (2001) observation that English by-phrase passives are sometimes translated into Japanese object topicalization constructions. That is, the original English sentence in the passive may be translated into the active voice with the logical object topicalized. A number of scholars, including Chomsky (1981) and Baker (1992), have remarked that languages have various ways to avoid focusing on the logical subject. The aim of the present study is to examine the translation equivalents of the English by-phrase passives in an English-Korean parallel corpus compiled by the author. A small sample of articles from Newsweek magazine and its published Korean translation reveals that there are indeed many ways to translate English by-phrase passives, including object topicalization (12.5%). Among the 64 translated sentences analyzed and classified, 12 (18.8%) examples were problematic in terms of agent defocusing, which is the primary function of passives. Of these 12 instances, five cases were identified where an alternative translation would be more suitable. The results suggest that the functional characteristics of English by-phrase passives should be highlighted in translator training as well as language teaching.

A study on combination of loss functions for effective mask-based speech enhancement in noisy environments (잡음 환경에 효과적인 마스크 기반 음성 향상을 위한 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.234-240 / 2021
  • In this paper, mask-based speech enhancement is improved for effective speech recognition in noisy environments. In mask-based speech enhancement, the enhanced spectrum is obtained by multiplying the noisy speech spectrum by a mask. The VoiceFilter (VF) model is used for mask estimation, and the Spectrogram Inpainting (SI) technique is used to remove residual noise from the enhanced spectrum. We propose a combined loss to further improve speech enhancement. To remove the residual noise in the speech effectively, the positive part of the triplet loss is used together with the component loss. For the experiments, the TIMIT database is reconstructed using NOISEX92 noise and background music samples under various Signal to Noise Ratio (SNR) conditions. Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) are used as performance metrics. When the VF model was trained with the mean squared error and the SI model was trained with the combined loss, SDR, PESQ, and STOI improved by 0.5, 0.06, and 0.002, respectively, compared with the system trained only with the mean squared error.
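
The masking step described in the abstract above (enhanced spectrum = noisy spectrum × mask) and the mean squared error baseline can be sketched as follows. This is only an illustration under stated assumptions: it uses an oracle ideal ratio mask instead of the VoiceFilter estimate, treats magnitudes as additive, and does not reproduce the paper's combined loss. All function names and values are hypothetical.

```python
import numpy as np

def ideal_ratio_mask(clean_magnitude, noise_magnitude, eps=1e-8):
    """Oracle mask (assumption): the paper estimates the mask with a trained model."""
    return clean_magnitude / (clean_magnitude + noise_magnitude + eps)

def apply_mask(noisy_magnitude, mask):
    """Element-wise product of the noisy spectrum and the mask."""
    return noisy_magnitude * mask

def mse_loss(estimate, target):
    """Mean squared error between an enhanced spectrum and the clean target."""
    return float(np.mean((estimate - target) ** 2))

# Toy 2x2 magnitude "spectrograms" (frequency bins x frames), for illustration only.
clean = np.array([[1.0, 2.0], [3.0, 4.0]])
noise = np.array([[1.0, 2.0], [1.0, 0.0]])
noisy = clean + noise  # simplification: magnitudes are assumed to add

enhanced = apply_mask(noisy, ideal_ratio_mask(clean, noise))
unprocessed = apply_mask(noisy, np.ones_like(noisy))  # all-ones mask leaves the input as-is

print(mse_loss(enhanced, clean))     # close to zero with the oracle mask
print(mse_loss(unprocessed, clean))  # error of the unenhanced noisy input
```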

Speed Limit Violation Warning Function in Trade Ports and Fairways - GPS Plotter and ECDIS Enhancements (항만과 진입수로에서 속력제한 위반 경고기능에 관한 연구 - GPS 플로터 및 ECDIS 기능개선을 중점으로 -)

  • Kim, Do-Hoon
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.7 / pp.841-850 / 2019
  • The Korean government has designated speed-limit zones and speed limits in 19 ports and 3 routes to ensure safe navigation and transportation. However, the speed limit differs from port to port, and no practical means of management exists. This often leads to violations of the speed limit. Additionally, ship collisions due to human error continue to occur. First, the study analyzed marine accidents that occurred at trade ports and fairways. The analysis revealed 1344 accidents (an average of 269 cases per year) from 2014 to 2018. Fishing boats were involved in 563 accidents, whereas merchant vessels were involved in 508. Second, the efficacy of adding voice and message warnings to GPS plotters and electronic chart display and information systems (ECDIS) was reviewed, and these warnings were proposed as measures to inform vessel operators of the hazards of speed limit violation. Third, experts from relevant agencies and navigation system manufacturers were consulted, and the proposed warning function was found to be technically implementable. The findings are expected to help reduce human error among ship operators and establish a Korean e-navigation system.
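
The warning logic proposed in the abstract above could look roughly like the sketch below. Everything here is an assumption made for illustration: the zone name, the limit value, the message wording, and the function signature are hypothetical and do not come from the paper or from any GPS plotter or ECDIS product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SpeedLimitZone:
    name: str            # hypothetical zone label
    limit_knots: float   # hypothetical speed limit for the zone

def speed_warning(zone: SpeedLimitZone, speed_over_ground_knots: float) -> Optional[str]:
    """Return a warning message when the vessel exceeds the zone limit, else None."""
    if speed_over_ground_knots > zone.limit_knots:
        excess = speed_over_ground_knots - zone.limit_knots
        return (f"Speed limit exceeded in {zone.name}: "
                f"{speed_over_ground_knots:.1f} kn ({excess:.1f} kn over the limit)")
    return None

example_zone = SpeedLimitZone(name="Example Port Fairway", limit_knots=10.0)
print(speed_warning(example_zone, 12.5))
print(speed_warning(example_zone, 9.0))
```

In a real device the returned text would presumably feed both the on-screen message and the voice warning mentioned in the abstract; the sketch covers only the comparison itself.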

Experience Design Guideline for Smart Car Interface (스마트카의 인터페이스를 위한 경험 디자인 가이드라인)

  • Yoo, Hoon Sik;Ju, Da Young
    • Design Convergence Study / v.15 no.1 / pp.135-150 / 2016
  • With the development of communication technology and the expansion of Intelligent Transport Systems (ITS), the car is changing from a simple mechanical device into a second living space with comprehensive convenience functions, and it is evolving into a platform that serves as an interface for this role. As the interface area that provides information to passengers expands, research on smart-car user experience is becoming more important. This study aims to propose guidelines for the elements of the smart-car user experience. To this end, the user experience elements were defined as function, interaction, and surface, and through discussions with UX/UI experts, 8 representative techniques, 14 representative techniques, and 8 glass window locations were specified for the respective elements. Smart-car users' priorities among these experience elements were then analyzed through a questionnaire survey of 100 drivers. The analysis showed that users prioritized the main techniques in the order of safety, distance, and sensibility. The priorities for the production method were, in order, voice recognition, touch, gesture, physical button, and eye tracking. Regarding glass window locations, users prioritized the front of the driver's seat over the back. A demographic analysis by gender showed no significant differences except for two functions, indicating that the same guidelines can be applied to male and female users. Through a user requirement analysis of the individual elements, this study provides prioritized guidance on the requirements for each element to be applied to commercial products.

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. It plays an important role in automatic voice systems, and its importance is growing as portable devices, voice technology, and audio content continue to expand. Previous speaker recognition studies have aimed to determine automatically who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it contains useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending a speaker uses determines the sentence type and conveys information such as the speaker's intention, psychological attitude, and relationship to the listener. The use of sentence-final endings varies with the characteristics of the speaker, so the type and distribution of the endings used by an unidentified speaker can help in recognizing that speaker. However, few existing text-based speaker recognition studies have considered speech style, and adding speech style information to speech signal-based speaker recognition techniques could further improve accuracy. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed, which generates vector values from the type and frequency of the sentence-final endings appearing in the utterances of a specific person. To evaluate the performance of the proposed method, training and performance evaluation were conducted with an actual drama script. The method proposed in this study can be used to improve the performance of Korean speech recognition services.
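
The idea of turning a speaker's sentence-final endings into a vector of type frequencies can be illustrated with the sketch below. It is not the paper's sentence sequencing method: the ending inventory, the suffix-matching rule, and the function names are all hypothetical simplifications chosen for illustration.

```python
from collections import Counter

# Hypothetical, tiny inventory of Korean sentence-final endings (assumption).
ENDINGS = ("습니다", "어요", "니까", "지")

def final_ending(sentence):
    """Return the inventory ending that the sentence finishes with, or None."""
    stripped = sentence.strip().rstrip(".?!")
    # Try longer endings first so a short ending does not shadow a longer one.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if stripped.endswith(ending):
            return ending
    return None

def ending_frequency_vector(sentences):
    """Relative frequency of each inventory ending across a speaker's sentences."""
    counts = Counter(e for e in map(final_ending, sentences) if e is not None)
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in ENDINGS]

# Made-up utterances for one speaker, for illustration only.
utterances = ["좋습니다.", "먹습니다.", "먹어요?", "맞지!"]
print(ending_frequency_vector(utterances))
```

Vectors of this kind, computed per speaker, could then be compared or passed to a classifier; that step is outside the scope of the sketch.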

Robust Speech Segmentation Method in Noise Environment for Speech Recognizer (음성인식기 구현을 위한 잡음에 강인한 음성구간 검출기법)

  • 김창근;박정원;권호민;허강인
    • Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.18-24 / 2003
  • One of the most important issues in implementing a real-time speech recognizer is designing both reliable VAD (Voice Activity Detection) and a suitable speech feature vector. However, because reliable VAD is difficult to compute in the presence of surrounding noise, a suitable speech feature vector may not be obtained. To solve this problem, this paper uses not only the commonly used short-time power spectrum but also two additional parameters: a spectral density comparison measure that is robust to noise, and a linear discriminant function based on linear regression. VAD is then performed using a combination of these parameters, weighted appropriately for different levels of surrounding noise. Recognition experiments using DTW (Dynamic Time Warping) confirm that the proposed parameters are robust in noisy conditions.
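
As a point of reference for the abstract above, a baseline VAD that uses only short-time power might look like the following sketch. It does not implement the paper's two additional parameters or their weighting; the frame sizes, the fixed threshold, and the function names are assumptions chosen to keep the example small.

```python
import numpy as np

def short_time_power(signal, frame_length, hop_length):
    """Mean squared amplitude of each frame (frames that do not fit are dropped)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = max(0, 1 + (len(signal) - frame_length) // hop_length)
    return np.array([
        np.mean(signal[i * hop_length : i * hop_length + frame_length] ** 2)
        for i in range(n_frames)
    ])

def energy_vad(signal, frame_length=4, hop_length=4, threshold=0.1):
    """Mark a frame as speech when its short-time power exceeds a fixed threshold."""
    return short_time_power(signal, frame_length, hop_length) > threshold

# Synthetic signal: silence, a burst of activity, then silence again.
toy_signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 4
print(energy_vad(toy_signal))
```

A fixed power threshold like this one is exactly what tends to fail as the noise level changes, which is the weakness the additional noise-robust parameters in the abstract are meant to address.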

A Reservation-based HWMP Routing Protocol Design Supporting E2E Bandwidth in TICN Combat Wireless Network (TICN 전투무선망에서의 종단간 대역폭을 보장하는 예약 기반 HWMP 라우팅 프로토콜 설계)

  • Jung, Whoi Jin;Min, Seok Hong;Kim, Bong Gyu;Choi, Hyung Suk;Lee, Jong Sung;Lee, Jae Yong;Kim, Byung Chul
    • Journal of the Korea Institute of Military Science and Technology / v.16 no.2 / pp.160-168 / 2013
  • In tactical environments, tactical wireless networks generally consist of Tactical MANETs (T-MANETs) or Tactical WMNs (T-WMNs). The most important services in a tactical network are voice and low-rate data such as command and control and situation awareness. These data must be forwarded over multiple hops in tactical wireless networks. Urgent and mission-critical data must be protected in this environment, so QoS (Quality of Service) must be guaranteed for specific types of traffic to satisfy user requirements. In IEEE 802.11s, the TDMA-based MAC protocol MCCA (MCF Controlled Channel Access) provides a resource reservation function. However, the 802.11s protocol cannot guarantee end-to-end QoS, because it only supports reservation with neighbors. In this paper, we propose a routing protocol, R-HWMP (Reservation-based HWMP), which uses resource reservation to support end-to-end QoS. The proposed protocol can reserve channel slots and find the optimal path in T-WMNs. We analyzed the performance of the proposed protocol using NS-2 simulation and showed that end-to-end QoS is guaranteed.