Search | Korea Science

Conformer with lexicon transducer for Korean end-to-end speech recognition (Lexicon transducer를 적용한 conformer 기반 한국어 end-to-end 음성인식)

Son, Hyunsoo; Park, Hosung; Kim, Gyujin; Cho, Eunsoo; Kim, Ji-Hwan
- The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.530-536 / 2021
Recently, owing to advances in deep learning, end-to-end speech recognition, which directly maps speech signals to graphemes, has shown good performance. Among end-to-end models, the conformer shows the best performance. However, end-to-end models focus only on the probability of which grapheme appears at each time step. Decoding uses greedy search or beam search, so it is easily affected by the final probabilities output by the model. In addition, end-to-end models cannot use external pronunciation and language information because of structural limitations. Therefore, this paper proposes a conformer with a lexicon transducer. We compare a phoneme-based model with a lexicon transducer against a grapheme-based model with beam search. The test set consists of words that do not appear in the training data. The grapheme-based conformer with beam search achieves a CER of 3.8 %, and the phoneme-based conformer with the lexicon transducer achieves a CER of 3.4 %.
https://doi.org/10.7776/ASK.2021.40.5.530
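
The CER values in this abstract are character error rates. As a minimal sketch, assuming the common definition (character-level edit distance divided by reference length), the metric could be computed as follows. The abstract does not describe the paper's exact scoring procedure, and the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` counts one substitution over four reference characters and returns 0.25.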

Study on the Retrieval of Vertical Air Motion from the Surface-Based and Airborne Cloud Radar (구름레이더를 이용한 대기 공기의 연직속도 추정연구)

Jung, Eunsil
- Atmosphere / v.29 no.1 / pp.105-112 / 2019
Measurements of vertical air motion and microphysics are essential for improving our understanding of convective clouds. In this paper, the author reviews current research on the retrieval of vertical air motion using cloud radar. At a radar wavelength of 3 mm (W-band radar; 94-GHz radar; cloud radar), the raindrop backscattering cross-section ($\sigma_b$) oscillates between successive maxima and minima as a function of raindrop diameter ($D$), a behavior well described by Mie theory. The first Mie minimum in the backscattering cross-section occurs at $D \approx 1.68$ mm, which corresponds to a raindrop terminal fall velocity of $\sim 5.85\ \mathrm{m\,s^{-1}}$ based on the Gunn and Kinzer relationship. Since raindrop diameters often exceed this size, the minimum shows up in the radar Doppler spectrum, and its location can therefore be used as a reference for retrieving the vertical air motion. The Mie technique is applied to radar Doppler spectra from surface-based and airborne upward-pointing W-band radars. The contribution of aircraft motion to the retrieved vertical air motion is also described, and a first-order equation corrected for aircraft motion is presented. The review also shows that the separate spectral peaks due to cloud droplets can provide independent validation of the vertical air motion retrieved by the Mie technique, because cloud droplets act as tracers of vertical air motion.
https://doi.org/10.14191/Atmos.2019.29.1.105
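
The core idea of the Mie technique in this abstract can be illustrated with a short sketch. It rests on assumptions that the abstract does not state: Doppler velocities are taken as positive downward, the retrieved air motion is positive upward, and corrections for air density and aircraft motion are ignored. The names `MIE_MIN_FALL_SPEED` and `vertical_air_motion` are hypothetical.

```python
# Terminal fall speed (m/s) at the first Mie minimum, as quoted in the abstract.
MIE_MIN_FALL_SPEED = 5.85


def vertical_air_motion(observed_min_velocity: float) -> float:
    """Estimate vertical air motion (m/s, positive upward).

    observed_min_velocity is the Doppler velocity (m/s, positive downward,
    an assumed convention) at which the first Mie minimum appears in the
    measured spectrum. In still air the minimum would sit at the terminal
    fall speed, so any shift is attributed to air motion.
    """
    return MIE_MIN_FALL_SPEED - observed_min_velocity
```

Under these conventions, a Mie minimum observed at 3.85 m/s would imply an updraft of about 2 m/s, and one observed at 6.85 m/s would imply a downdraft of about 1 m/s.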

Flood Mapping Using Modified U-NET from TerraSAR-X Images (TerraSAR-X 영상으로부터 Modified U-NET을 이용한 홍수 매핑)

Yu, Jin-Woo; Yoon, Young-Woong; Lee, Eu-Ru; Baek, Won-Kyung; Jung, Hyung-Sup
- Korean Journal of Remote Sensing / v.38 no.6_2 / pp.1709-1722 / 2022
The rise in temperature induced by global warming has contributed to El Niño and La Niña events and has abnormally changed seawater temperatures. Rainfall concentrates in some locations because of these abnormal variations in seawater temperature, causing frequent abnormal floods. Rapid detection of flooded regions is important for recovering from and preventing the human and property damage caused by floods, and synthetic aperture radar (SAR) makes this possible. This study aims to build a model that directly derives flood-damaged areas by applying a modified U-NET to TerraSAR-X images. The modified U-NET is based on a multi-kernel design that reduces the effect of speckle noise by extracting diverse feature maps, and it takes two images, acquired before and after flooding, as input. To this end, two SAR images were preprocessed to generate the model's input data, which were then fed to the modified U-NET structure to train the flood detection deep learning model. With this method, flooded areas were detected with an average F1 score of 0.966. This result is expected to contribute to the rapid recovery of flood-stricken areas and to the derivation of flood-prevention measures.
https://doi.org/10.7780/kjrs.2022.38.6.2.11
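
The F1 score reported in this abstract is the harmonic mean of precision and recall. A minimal sketch of the standard definition for binary flood masks follows. It assumes flattened per-pixel labels where a truthy value means "flooded"; the abstract does not say how the paper averaged scores across images, and the name `f1_score` is illustrative.

```python
from typing import Sequence


def f1_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F1 score for binary labels (truthy = flooded, falsy = not flooded)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # false negatives
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0
```

For example, with `pred = [1, 1, 0, 0]` and `truth = [1, 0, 1, 0]` there is one true positive, one false positive, and one false negative, so the score is 0.5.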

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

Bogyung Park; Somin Park; Hyunki Hong
- KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.303-314 / 2023
Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties (tone, cadence, gender) of another speaker, has many applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors for source and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained RawNet3 model. This yields a latent space in which voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, the Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it is based.
https://doi.org/10.3745/KTSDE.2023.12.7.303

A Topic Analysis of Requested Books by User Types at a University Library for Patron-Driven Acquisition (이용자 요구 기반 장서개발을 위한 대학도서관 희망도서 주제 분석)

Sanghee Choi
- Journal of the Korean Society for Library and Information Science / v.58 no.1 / pp.395-415 / 2024
In the development of a university library's collection, patron-driven acquisition refers to a collection strategy that addresses users' direct information needs. In this study, ten years' worth of book requests were analyzed by user type to understand topic preferences for efficient collection development in the university library. Identifying the subject areas of the books users request helps librarians determine key areas of collection development and establish balanced collection development policies. To identify the major subject areas for each user group, KDC (Korean Decimal Classification) subject classifications were used, and network analysis techniques were applied to investigate the relationships between book topics in detail. The analysis revealed that "social sciences" was the major topic across all user groups. In the analysis of sub-topics, however, "medicine" and "psychology" were distinctively identified as the major subject areas for graduate students, setting them apart from other user groups. The results of the network analysis further indicated that undergraduate students showed unique topics such as civil service, job placement, and career, which were not observed as major topic clusters in other user groups. Graduate students, on the other hand, tended to concentrate on a few specialized subjects, forming distinct topic clusters in the analysis.
https://doi.org/10.4275/KSLIS.2024.58.1.395

Physical Offset of UAVs Calibration Method for Multi-sensor Fusion (다중 센서 융합을 위한 무인항공기 물리 오프셋 검보정 방법)

Kim, Cheolwook; Lim, Pyeong-chae; Chi, Junhwa; Kim, Taejung; Rhee, Sooahm
- Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1125-1139 / 2022
In an unmanned aerial vehicle (UAV) system, a physical offset can exist between the global positioning system/inertial measurement unit (GPS/IMU) sensor and an observation sensor such as a hyperspectral sensor or a lidar sensor. As a result of this physical offset, misalignment between images can occur along the flight direction. In a multi-sensor system in particular, the observation sensor has to be replaced regularly to mount another observation sensor, and acquiring calibration parameters each time is costly. In this study, we establish a precise sensor model equation that applies to multiple sensors in common and propose an independent physical offset estimation method. The proposed method consists of three steps. First, we define an appropriate rotation matrix for our system and an initial sensor model equation for direct georeferencing. Next, an observation equation for physical offset estimation is established by extracting corresponding points between ground control points and the data observed by a sensor. Finally, the physical offset is estimated from the observed data, and the precise sensor model equation is established by applying the estimated parameters to the initial sensor model equation. Datasets from four regions (Jeon-ju, Incheon, Alaska, Norway) at different latitudes and longitudes were compared to analyze the effects of the calibration parameters. We confirmed that the misalignment between images was corrected after applying the physical offset in the sensor model equation. Absolute position accuracy was analyzed for the Incheon dataset against ground control points. For the hyperspectral image, the root mean square error (RMSE) in the X and Y directions was 0.12 m, and for the point cloud, the RMSE was 0.03 m. Furthermore, the relative position accuracy at specific points between the adjusted point cloud and the hyperspectral images was 0.07 m. We therefore confirmed that the proposed estimation method enables precise data mapping for observations without ground control points, and we also confirmed the feasibility of multi-sensor fusion. From this study, we expect that a flexible multi-sensor platform can be operated at reduced cost through the independent parameter estimation method.
https://doi.org/10.7780/kjrs.2022.38.6.1.13
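
The accuracy figures in this abstract are root mean square errors. A minimal sketch of the standard RMSE formula for one coordinate axis is shown below. It assumes paired lists of measured and reference coordinates in meters; the abstract does not say how the paper combined the X and Y directions, and the name `rmse` is illustrative.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between paired measurements and reference values."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared differences, then the square root.
    squared = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, measured X coordinates that differ from their ground control points by +0.1 m and -0.1 m give an RMSE of 0.1 m.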

Search Result 96, Processing Time 0.018 seconds
