• Title/Summary/Keyword: AI Voice Speaker

Convergence research on the speaker's voice perceived by listener, and suggestions for future research application

  • Hahm, SangWoo
    • International journal of advanced smart convergence / v.11 no.1 / pp.55-63 / 2022
  • Although research on leaders' and speakers' voices has been conducted continuously, existing studies take a single point of view. Acoustic analysis of voice characteristics has been studied from an engineering perspective, and leadership trait theory from a business perspective. Convergence studies on leader voice and member cognition are now being attempted. Convergence research on voice contributes to the refinement of voice analysis, the diversification of voice use, and the establishment of voice utilization strategies. This study explains the current flow of convergence research on the speaker's voice and the listener's perception, and suggests a direction for the future development of voice convergence research. Furthermore, in connection with AI in the era of the Fourth Industrial Revolution, new approaches to voice research are sought. First, advances in AI focus on strategically generating the voices needed for individual situations. Second, real-time voice correction will help leaders and speakers use the desired voice type. Third, AI voices based on big data will affect the cognition, attitudes, and behavior of individual listeners, such as members, customers, and students, in more diverse situations. The purpose and significance of this study are to suggest a way to research the leader's voice as perceived by members, and to suggest a method that can be applied in various situations.

The Effect of Perceived Anthropomorphic Characteristics on Continuous Usage Intention of Artificial Intelligence Voice Speaker : Based on the Integrated Adoption Model (인공지능 음성 스피커의 의인화 특성 지각 정도가 지속적 이용 의향에 미치는 영향: 통합 수용 모델을 기반으로)

  • Lee, Sungjoon
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.41-55 / 2021
  • AI voice speakers have played an important role in forming and developing an early market for AI-based goods and services, drawing growing attention from many people. In this context, this research examined factors affecting the continuous usage intention of AI voice speakers based on an integrated adoption model, which combines two factors, perceived playfulness and innovation resistance, with the extended technology acceptance model. It also examined whether three perceived anthropomorphic features (i.e., perceived rational support, perceived intimacy, and perceived cognitive openness) influence the continuous usage intention of AI voice speakers. The data were collected through an online survey of respondents in their 20s and 30s who had experience using AI voice speakers, and were analyzed using structural equation modeling (SEM). The results showed that perceived ease of use, perceived usefulness, perceived playfulness, and innovation resistance all had significant influences on continuous usage intention. In addition, perceived rational support, perceived intimacy, and perceived cognitive openness, as perceived anthropomorphic features, all had significant influences on perceived ease of use, perceived usefulness, and perceived playfulness. The implications of these results are also discussed.

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

  • Hong, Chunho;Cho, Youngho
    • Journal of Internet Computing and Services / v.22 no.6 / pp.33-40 / 2021
  • Automatic Speech Recognition (ASR) is a technology that analyzes human speech sounds as speech signals and then automatically converts them into character strings that humans can understand. Speech recognition technology has evolved from the basic level of recognizing a single word to the advanced level of recognizing sentences consisting of multiple words. In real-time voice conversation, a high recognition rate improves the convenience of natural information delivery and expands the scope of voice-based applications. On the other hand, as speech recognition technology is actively applied, concerns about related cyber attacks and threats are also increasing. Existing studies actively pursue the development of the technology itself, such as the design of Automatic Speaker Verification (ASV) techniques and the improvement of accuracy. However, few studies analyze attacks and threats in depth and variety. In this study, we propose a cyber attack model that bypasses voice authentication by simply manipulating voice frequency and voice speed against AI voice recognition services equipped with automated identification technology, and we analyze cyber threats by conducting extensive experiments on the automated identification systems of commercial smartphones. Through this, we intend to convey the seriousness of the related cyber threats and raise interest in research on effective countermeasures.

A Study on the Selection Factors of Contents Service for the Popularization of AI Speaker based on AHP (AI Speaker 대중화를 위한 콘텐츠 서비스 선택 요인에 관한 연구 - AHP(계층화 분석)를 중심으로)

  • Lee, Hweejae;Kim, Sunmoo;Byun, Hyung Gyoun
    • The Journal of the Korea Contents Association / v.20 no.11 / pp.38-48 / 2020
  • The domestic AI speaker market is growing beyond the innovative consumer market into a full-fledged early audience market, with 3 million units supplied domestically as of the end of 2018, but in reality users are, for various reasons, not satisfied with their use. There are many previous papers on AI speakers, but most research so far tends to be biased toward the acceptance of the device's own performance. Many changes are under way, such as OTT providers trying to secure the market through collaboration with AI speaker providers. This study set out to identify the priorities for content services, which can be another major selection factor for AI speakers, setting aside the factors of unsatisfactory technology. The study identified the priorities among AI speaker selection factors using AHP (Analytic Hierarchy Process), based on the selection factors derived through literature research. The most important top-level factors in AI speaker selection were, in order, Concierge Service, Education Service, and Entertainment Service. Among the individual factors, weather/temperature/fine dust content ranked first (11.6%), child care content second (10.8%), and music service third (9.8%). These three top-priority items came from the first-, second-, and third-ranked top-level factors, respectively. Of the 15 individual services in total, six sub-items of Concierge Service (weather/temperature/fine dust, news, voice schedule notification) and Education Service (foreign language, toddler, reading books) were in the top 8, and two Entertainment Service items, music service and movie service, ranked third and sixth.

State Visualization Design of AI Speakers using Color Field Painting (색면추상 기법을 통한 AI 스피커의 상태 시각화 디자인 연구)

  • Hong, Seung Yoon;Choe, Jong-Hoon
    • The Journal of the Korea Contents Association / v.20 no.2 / pp.572-580 / 2020
  • Recently released AI speakers interact with the user mainly by voice while simultaneously displaying simple and formal visual feedback through a status LED light. This is due to the limitations of the speaker as a product, which make varied interaction difficult; moreover, such visual feedback is not standardized across products and thus does not give a consistent user experience. By maximizing the visual elements that can be expressed through color and abstract movement to assist voice feedback, the product can provide the user with an extended experience that includes not only functional but also emotional satisfaction. In this study, after analyzing the interaction methods of existing AI speakers, we examined the theory of color communication in order to expand the visual feedback effect, and examined the meaning and expression techniques of Color Field Painting, an art genre that maximizes emotional experience by using only color. Through this, the AI speaker's visual communication function was expanded by designing a way to give feedback on communication status using LED light.

Study on the AI Speaker Security Evaluations and Countermeasure (AI 스피커의 보안성 평가 및 대응방안 연구)

  • Lee, Ji-seop;Kang, Soo-young;Kim, Seung-joo
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.6 / pp.1523-1537 / 2018
  • AI speakers provide users with useful functions, such as music playback and online search, through simple operation, and so the AI speaker market is growing at a very fast pace. However, AI speakers are always listening for the user's voice, which can cause serious problems such as eavesdropping and personal information exposure if they are exposed to security threats. Therefore, to improve the overall security of all AI speakers, it is necessary to identify potential security threats and analyze them systematically. In this paper, security threat modeling is performed on four selected products with high market share. Data Flow Diagrams and STRIDE and LINDDUN threat modeling were used to derive a systematic and objective checklist for vulnerability checks. Finally, we propose a method to improve the security of AI speakers by comparing the vulnerability analysis results and the vulnerabilities of each product.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech using unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach not only includes an English one-shot multi-speaker TTS but also introduces a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the performance of our proposed model is improved over the baseline models in terms of both naturalness and speaker similarity.

Interactions between AI Speaker and Children : A Field Study on the Success/Failure Cases by Types of Interactions (인공지능 스피커와 아동들의 상호작용 :유형별 성공/실패 사례 도출을 위한 현장 연구)

  • Hong, Junglim;Choi, Boreum
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.19-29 / 2020
  • As the AI speaker market has grown rapidly in recent years, competition among related companies to win over children, who are main users and prospective future customers, has become very intense. However, there is a lack of empirical research on how children interact with AI speakers. Therefore, this research examines the interactions between children and AI speakers, primarily through field studies, to identify which functions children use and what characteristics their interactions show. For this purpose, 799 conversations were collected and analyzed using the log data of the AI speaker recorded in real time. The results showed that children were more likely than adults to use children's songs, fairy tales, emotional conversations, and personification. In addition, content analysis by specific type yielded success and failure cases of interaction between children and AI speakers, and improvements are proposed for each failure type. This study is meaningful in that it identifies children's AI speaker preferences, content, and major conversation patterns, and provides guidelines for developing services suited to children's level.

A Study on the User Experience of Smart Speaker in China - Focused on Tmall Genie and Mi AI Speaker - (중국 인공지능 스피커 사용자 경험에 관한 연구 - 티몰 지니와 샤오미 스마트 스피커를 중심으로 -)

  • Xiao, Xin-Ting;Kim, Seung-In
    • Journal of Digital Convergence / v.16 no.10 / pp.409-414 / 2018
  • In China, the use of smart speakers is continuously increasing. This study aims to investigate the user experience of Chinese smart speakers. To this end, we conducted a literature review on the theoretical background of smart speakers and a case study of globally popular smart speaker brands. On this basis, we conducted in-depth interviews with 8 users who had experience with the top-selling Chinese smart speaker products "Tmall Genie" and "Mi AI speaker". The interviews were based on the 7 principles of the Honeycomb model created by Peter Morville. As a result, user discomfort was found in the functional and usability aspects of the smart speakers. Furthermore, users were highly dissatisfied with the smart speakers in terms of credibility. Accordingly, Chinese smart speakers should consider user experience aspects in order to improve their functionality and usability.

Sensor Control and Aquisition Information Using Voice I/O (음성 입출력을 이용한 센서 제어 및 정보 획득)

  • Youn, Hyung Jin;Lee, Chang Woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.495-496 / 2018
  • As more and more companies introduce artificial intelligence (AI) speakers, the price of the speakers has become a burden for some people. With some knowledge and dexterity, it is not difficult to make an AI speaker that acquires sensor information and environmental information about the house according to one's own taste. In this paper, we implement an AI speaker using Raspberry Pi, Google Cloud Speech (GCS), and Naver's Clova Speech Synthesis (CSS) API.
