• Title/Summary/Keyword: Automatic Transcription

A study on improving the performance of the machine-learning based automatic music transcription model by utilizing pitch number information (음고 개수 정보 활용을 통한 기계학습 기반 자동악보전사 모델의 성능 개선 연구)

  • Daeho Lee;Seokjin Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.2
    • /
    • pp.207-213
    • /
    • 2024
  • In this paper, we study how to improve the performance of a machine learning-based automatic music transcription model by adding musical information to the input data. The added information is the number of pitches active in each time frame, obtained by counting the notes activated in the ground-truth annotation. This pitch-count information is concatenated to the log mel-spectrogram that serves as the input of the existing model. Using an automatic music transcription model composed of four blocks, each predicting one of four types of musical information, we demonstrate that simply appending to the existing input the pitch-count information corresponding to the musical information predicted by each block helps train the model. To evaluate the improvement, we conducted experiments on the MIDI Aligned Piano Sounds (MAPS) dataset; when all pitch-count information was used, performance improved by 9.7 % in frame-based F1 score and by 21.8 % in note-based F1 score including offset.
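
A rough sketch (not the authors' code) of the input-augmentation idea described above: a per-frame pitch count, taken from the ground-truth piano roll, is appended to the log mel-spectrogram input. The array shapes and the 229-bin mel resolution are assumptions.

```python
import numpy as np

def add_pitch_count_feature(log_mel: np.ndarray, piano_roll: np.ndarray) -> np.ndarray:
    """log_mel: (frames, n_mels); piano_roll: (frames, 88) binary activations."""
    # Count the pitches active in each time frame of the ground truth.
    pitch_count = piano_roll.sum(axis=1, keepdims=True).astype(log_mel.dtype)
    # Concatenate the count to the spectrogram as one extra input dimension.
    return np.concatenate([log_mel, pitch_count], axis=1)

log_mel = np.random.randn(100, 229).astype(np.float32)           # 100 frames
piano_roll = (np.random.rand(100, 88) > 0.95).astype(np.float32)
print(add_pitch_count_feature(log_mel, piano_roll).shape)        # (100, 230)
```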

A Threshold Adaptation based Voice Query Transcription Scheme for Music Retrieval (음악검색을 위한 가변임계치 기반의 음성 질의 변환 기법)

  • Han, Byeong-Jun;Rho, Seung-Min;Hwang, Een-Jun
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.59 no.2
    • /
    • pp.445-451
    • /
    • 2010
  • This paper presents a threshold adaptation based voice query transcription scheme for music information retrieval. The proposed scheme analyzes a monophonic voice signal and generates its transcription for diverse music retrieval applications. For accurate transcription, we propose several advanced features: (i) the Energetic Feature eXtractor (EFX) for onset, peak, and transient-area detection; (ii) Modified Windowed Average Energy (MWAE), which defines multiple small but coherent windows with local threshold values as an offset detector; and (iii) the Circular Average Magnitude Difference Function (CAMDF) for accurate acquisition of the fundamental frequency (F0) of each frame. To evaluate the performance of the proposed scheme, we implemented a prototype music transcription system called AMT2 (Automatic Music Transcriber version 2) and carried out various experiments using the QBSH corpus [1], adopted in the MIREX 2006 contest data set. Experimental results show that the proposed scheme improves transcription performance.
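
A minimal sketch of CAMDF-based F0 estimation as named in the abstract; the frame length, lag search range, and test tone are illustrative assumptions.

```python
import numpy as np

def camdf(frame: np.ndarray) -> np.ndarray:
    """D(tau) = sum_n |x[(n + tau) mod N] - x[n]|, computed circularly."""
    return np.array([np.abs(np.roll(frame, -tau) - frame).sum()
                     for tau in range(len(frame))])

def estimate_f0(frame: np.ndarray, sr: int, fmin=80.0, fmax=400.0) -> float:
    d = camdf(frame)
    lo, hi = int(sr / fmax), int(sr / fmin)   # plausible pitch-period lags
    tau = lo + int(np.argmin(d[lo:hi]))       # deepest valley = pitch period
    return sr / tau

SR = 16000
t = np.arange(1024) / SR
frame = np.sin(2 * np.pi * 220.0 * t)         # 220 Hz test tone
print(round(estimate_f0(frame, SR), 1))       # ~219.2 (true pitch 220 Hz)
```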

Reducing latency of neural automatic piano transcription models (인공신경망 기반 저지연 피아노 채보 모델)

  • Dasol Lee;Dasaem Jeong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.2
    • /
    • pp.102-111
    • /
    • 2023
  • Automatic Music Transcription (AMT) is the task of detecting and recognizing musical note events from a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems for piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which hinders their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural network for piano transcription: reducing the window and hop sizes of the Fast Fourier Transform (FFT), modifying the kernel sizes of the convolutional layers, and shifting the labels along the time axis to train the model to predict onsets earlier. Our experiments demonstrate that combining these approaches can lower latency while maintaining high transcription accuracy. Specifically, our modified models achieved note F1 scores of 92.67 % and 90.51 % at latencies of 96 ms and 64 ms, respectively, compared with the baseline model's note F1 score of 93.43 % at a latency of 160 ms. This methodology can be used to train AMT models for various interactive scenarios, including real-time feedback for piano education.
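
A back-of-the-envelope latency model of how these knobs interact: a frame's prediction cannot be emitted until its analysis window (plus any future-context frames) has been observed, and training with time-shifted labels claws some of this back. The formula and numbers below are illustrative assumptions, not the paper's measured configurations.

```python
def intrinsic_latency_ms(window: int, hop: int, sr: int,
                         future_frames: int, label_shift_frames: int) -> float:
    frame_ms = 1000.0 * hop / sr
    window_ms = 1000.0 * window / sr
    return window_ms + (future_frames - label_shift_frames) * frame_ms

SR = 16000
# Large FFT window, no label shift: high latency.
print(intrinsic_latency_ms(window=2048, hop=512, sr=SR,
                           future_frames=2, label_shift_frames=0))  # 192.0
# Smaller window/hop plus a one-frame label shift: much lower latency.
print(intrinsic_latency_ms(window=1024, hop=256, sr=SR,
                           future_frames=2, label_shift_frames=1))  # 80.0
```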

Automatic Speech Database Verification Method Based on Confidence Measure

  • Kang Jeomja;Jung Hoyoung;Kim Sanghun
    • MALSORI
    • /
    • no.51
    • /
    • pp.71-84
    • /
    • 2004
  • In this paper, we propose an automatic speech database verification method (also called automatic verification) based on a confidence measure for a large speech database. The method verifies the consistency between a given transcription and the speech using the confidence measure. The automatic verification process consists of two stages: a word-level likelihood computation stage and a multi-level likelihood-ratio computation stage. In the word-level likelihood computation stage, we calculate the word-level likelihood using the Viterbi decoding algorithm and generate the segment information. In the multi-level likelihood-ratio computation stage, we calculate word-level and phone-level likelihood ratios based on the confidence measure with an anti-phone model. With automatic verification, we achieved about 61 % error reduction and reduced the verification time from roughly one month of manual work to 1-2 days.
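
A minimal sketch of the likelihood-ratio confidence idea: the forced-alignment likelihood of the given transcription is compared with an anti-model likelihood and thresholded. The per-frame normalization and the threshold value are assumptions, not the paper's calibrated values.

```python
def log_likelihood_ratio(ll_model: float, ll_anti: float, n_frames: int) -> float:
    """Per-frame log-likelihood ratio; higher = transcription fits the speech better."""
    return (ll_model - ll_anti) / n_frames

def verify_segment(ll_model: float, ll_anti: float, n_frames: int,
                   threshold: float = -0.5) -> bool:
    return log_likelihood_ratio(ll_model, ll_anti, n_frames) >= threshold

# Example: a word aligned over 120 frames passes verification.
print(verify_segment(ll_model=-310.0, ll_anti=-350.0, n_frames=120))  # True
```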

Automatic Transcription of the Union Symbols in Korean Texts (한국어 텍스트에 사용된 이음표의 자동 전사)

  • 윤애선;권혁철
    • Language and Information
    • /
    • v.7 no.1
    • /
    • pp.23-40
    • /
    • 2003
  • In this paper, we propose Auto-TUS, an automatic transcription module for three union symbols, the hyphen, dash, and tilde ('-', '―', '∼'), based on their linguistic contexts. Few previous studies have discussed the ambiguities in transcribing these symbols into Korean alphabetic letters. We classify six different reading formulae for the union symbols, analyze the left and right contexts of the symbols, and investigate selection rules and distributions between the symbols and their contexts. Based on these linguistic features, 86 stereotyped patterns, 78 rules, and 8 heuristics that determine the type of reading formula are implemented in Auto-TUS, which works modularly in three steps. A pilot test was conducted with three test suites containing 418, 987, and 1,014 clusters of words with a union symbol, respectively. Encouraging accuracies of 97.36 %, 98.48 %, and 96.55 % were obtained for the three test suites. Our next steps are to develop a guessing routine for unknown contexts of the union symbols using statistical information, to refine the proper-noun and terminology detection module, and to apply Auto-TUS on a larger scale.
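
A toy rule stage in the spirit of Auto-TUS: choose a Korean reading for a union symbol from its numeric context. The two rules and readings below are illustrative stand-ins for the paper's 86 patterns and 78 rules.

```python
import re

RULES = [
    # A tilde between numbers is read as a range: "3∼5" -> "3에서 5".
    (re.compile(r"(\d+)\s*[∼~]\s*(\d+)"), r"\1에서 \2"),
    # A hyphen between digit groups (e.g., phone numbers) is read as "에".
    (re.compile(r"(\d+)\s*-\s*(\d+)"), r"\1에 \2"),
]

def transcribe_union_symbols(text: str) -> str:
    for pattern, reading in RULES:
        text = pattern.sub(reading, text)
    return text

print(transcribe_union_symbols("3∼5명"))    # 3에서 5명
print(transcribe_union_symbols("02-1234"))  # 02에 1234
```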

Humming based High Quality Music Creation (허밍을 이용한 고품질 음악 생성)

  • Lee, Yoonjae;Kim, Sunmin
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2014.10a
    • /
    • pp.146-149
    • /
    • 2014
  • In this paper, a humming-based automatic music creation method is described. It is generally difficult for people without training in music theory to compose music, but almost anyone can produce a main melody by humming. With this motivation, melody and chord sequences are estimated by analyzing the humming, which is recorded without a metronome. Based on the estimated chord sequence, an accompaniment is then generated using MIDI templates matched to each chord. Five genres are supported for music creation. The melody transcription is evaluated in terms of onset and pitch estimation accuracy, and a Mean Opinion Score (MOS) evaluation is used to assess the created music.
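
A minimal sketch of the melody front end such a system needs: frame-wise autocorrelation pitch estimation of a monophonic hum. The chord-estimation and MIDI-accompaniment stages are not reproduced, and the parameters are assumptions.

```python
import numpy as np

def frame_pitch(frame: np.ndarray, sr: int, fmin=80.0, fmax=500.0) -> float:
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    tau = lo + int(np.argmax(ac[lo:hi]))  # strongest periodicity in range
    return sr / tau

SR = 16000
t = np.arange(2048) / SR
hum = np.sin(2 * np.pi * 196.0 * t)       # G3 test tone
print(round(frame_pitch(hum, SR)))        # 195 (true pitch 196 Hz)
```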

A semi-automatic cell type annotation method for single-cell RNA sequencing dataset

  • Kim, Wan;Yoon, Sung Min;Kim, Sangsoo
    • Genomics & Informatics
    • /
    • v.18 no.3
    • /
    • pp.26.1-26.6
    • /
    • 2020
  • Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into cell-by-cell expression differences within a given bulk sample, and numerous analysis methods have been developed accordingly. As the analysis involves many cells and genes simultaneously, efficiency of the methods is crucial. The conventional cell-type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied list of cell type-specific marker genes. The method was applied to a publicly available scRNA-seq dataset of a mouse cardiac non-myocyte cell pool. Annotating the 35 t-stochastic neighbor embedding clusters into 12 cell types was straightforward, and the accuracy was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell types, and the corollary regulatory-network analysis identified upstream transcription factors with well-supported literature evidence. The source code is available as an R script upon request.
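
A minimal sketch of the marker-based scoring idea: z-score each gene across cells, average a cluster's z-scores over each cell type's marker list, and assign the top-scoring type. The z-score normalization is an assumption, and the sketch is in Python rather than the paper's R.

```python
import numpy as np

def annotate_clusters(expr, genes, clusters, markers):
    """expr: (cells, genes) matrix; markers: {cell_type: [gene names]}."""
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-9)
    idx = {g: i for i, g in enumerate(genes)}
    labels = {}
    for c in np.unique(clusters):
        in_c = clusters == c
        scores = {t: np.mean([z[in_c, idx[g]].mean() for g in ms if g in idx])
                  for t, ms in markers.items()}
        labels[int(c)] = max(scores, key=scores.get)  # top-scoring cell type
    return labels

expr = np.random.rand(50, 4)                      # 50 cells, 4 genes
genes = ["Pecam1", "Col1a1", "Ptprc", "Acta2"]
clusters = np.repeat([0, 1], 25)                  # two toy clusters
markers = {"endothelial": ["Pecam1"], "fibroblast": ["Col1a1"]}
print(annotate_clusters(expr, genes, clusters, markers))
```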

Statistical Analysis of Korean Phonological Rules Using an Automatic Phonetic Transcription (발음열 자동 변환을 이용한 한국어 음운 변화 규칙의 통계적 분석)

  • Lee Kyong-Nim;Chung Minhwa
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.81-85
    • /
    • 2002
  • We present a statistical analysis of Korean phonological variations using automatic generation of phonetic transcriptions. We constructed an automatic generation system for Korean pronunciation variants by applying rules that model obligatory and optional phonemic changes as well as allophonic changes. These rules are derived from knowledge-based morphophonological analysis and the government-standard pronunciation rules. The system is optimized for continuous speech recognition by generating phonetic transcriptions for training and by constructing a pronunciation dictionary for recognition. In this paper, we describe Korean phonological variations by analyzing the statistics of phonemic-change rule applications over the 60,000 sentences in the Samsung PBS (Phonetically Balanced Sentence) Speech DB. Our results show that the most frequent obligatory phonemic variations are, in order, liaison, tensification, aspirationalization, and nasalization of obstruents, and that the most frequent optional phonemic variations are, in order, initial-consonant h-deletion, insertion of a final consonant with the same place of articulation as the next consonant, and deletion of a final consonant with the same place of articulation as the next consonant. These statistics can be used to improve the performance of speech recognition systems.
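
A minimal sketch of tallying how often each phonological rule fires while generating pronunciation variants; the two stand-in rules operate on romanized toy forms, not the system's actual Korean rule set.

```python
from collections import Counter

def apply_rules(word: str, rules: dict, counts: Counter) -> str:
    """Apply each rewrite rule in order, counting the rules that fire."""
    for name, rule in rules.items():
        rewritten = rule(word)
        if rewritten != word:
            counts[name] += 1
            word = rewritten
    return word

rules = {
    "nasalization": lambda w: w.replace("km", "ngm"),   # e.g., hakmun -> hangmun
    "tensification": lambda w: w.replace("kt", "ktt"),  # e.g., haktong -> hakttong
}
counts = Counter()
for w in ["hakmun", "haktong", "sarang"]:
    apply_rules(w, rules, counts)
print(counts)  # Counter({'nasalization': 1, 'tensification': 1})
```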

Implementation of Semi-automatic Labeling Using Transcript Text (전사텍스트를 이용한 반자동 레이블링 구현)

  • Won, Dong-Jin;Chang, Moon-soo;Kang, Sun-Mee
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.6
    • /
    • pp.585-591
    • /
    • 2015
  • In transcription for spoken-language research, labeling is the task of linking text-represented utterances to the recorded speech. Most existing labeling tools operate manually. The semi-automatic labeling we propose consists of an automation module and a manual adjustment module. The automation module extracts voice boundaries using G. Saha's algorithm and predicts utterance boundaries from the number and length of the utterances in the established transcription text. To preserve the accuracy of existing manual tools, we provide a manual-adjustment user interface for revising the automatically labeled utterance boundaries. The implemented semi-automatic tool is up to 27 % faster than existing manual labeling tools.
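
A minimal sketch of energy-based voice-boundary extraction of the kind the automation module performs (G. Saha's algorithm itself is not reproduced here); the frame size and threshold ratio are assumptions.

```python
import numpy as np

def voice_boundaries(signal: np.ndarray, sr: int, frame_ms: int = 25,
                     thresh_ratio: float = 0.1):
    """Return (start, end) sample indices of regions above an energy threshold."""
    n = int(sr * frame_ms / 1000)
    frames = signal[: len(signal) // n * n].reshape(-1, n)
    energy = (frames ** 2).mean(axis=1)
    active = energy > thresh_ratio * energy.max()
    bounds, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * n                      # voice region begins
        elif not a and start is not None:
            bounds.append((start, i * n))      # voice region ends
            start = None
    if start is not None:
        bounds.append((start, len(frames) * n))
    return bounds

SR = 16000
sig = np.concatenate([np.zeros(SR // 2),
                      np.sin(np.linspace(0, 3000, SR)),   # 1 s of tone
                      np.zeros(SR // 2)])
print(voice_boundaries(sig, SR))  # [(8000, 24000)]
```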

Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

  • HyunJung Choi;Muyeol Choi;Seonhui Kim;Yohan Lim;Minkyu Lee;Seung Yun;Donghyun Kim;Sang Hun Kim
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.127-136
    • /
    • 2024
  • The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.
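
A toy sketch of one spoken-to-written subtask, converting a spoken Korean sino-numeral string to digits with a fixed table; a data-driven model as in the paper would learn such mappings from aligned transcripts rather than use a hand-written table.

```python
SINO = {"영": 0, "일": 1, "이": 2, "삼": 3, "사": 4,
        "오": 5, "육": 6, "칠": 7, "팔": 8, "구": 9}
UNITS = {"십": 10, "백": 100, "천": 1000}

def spoken_to_number(text: str) -> int:
    """'삼백이십오' -> 325 (digits and small units only)."""
    total, digit = 0, 0
    for ch in text:
        if ch in SINO:
            digit = SINO[ch]
        elif ch in UNITS:
            total += (digit or 1) * UNITS[ch]  # bare unit implies 1
            digit = 0
    return total + digit

print(spoken_to_number("삼백이십오"))  # 325
print(spoken_to_number("천구십"))      # 1090
```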