Search | Korea Science

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

Jaehee Jung;Wooil Kim
- The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
Speaker diarization labels "who spoke when" in speech with multiple speakers. Deep neural network-based end-to-end methods have been studied to handle overlapping speech and to optimize diarization models. Most such end-to-end systems treat diarization as a multi-label classification problem that predicts the labels of all speakers active in each frame of speech. However, the performance of a multi-label model varies greatly with the chosen threshold. This paper studies a speaker diarization system based on single-label classification so that diarization can be performed without a threshold. The proposed model converts the speaker labels into a single label and estimates that label from the model output. To account for speaker label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, ways of adding residual connections to the model are studied so that deep diarization models can be trained effectively. The experiments used simulated noisy two-speaker data generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method performs labeling without a threshold and improves performance by about 20.7%.
https://doi.org/10.7776/ASK.2023.42.6.536
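
The abstract above does not say how speaker labels are converted into a single label. One common option, assumed here purely for illustration, is a power-set encoding in which each combination of active speakers becomes its own class. With that encoding the predicted class is an argmax over the model output, so no per-speaker threshold is needed. The function names are hypothetical.

```python
import numpy as np

def multilabel_to_single(activity):
    """Map per-frame speaker activity (frames x speakers, 0/1) to one class index per frame.

    Assumed power-set encoding: speaker k contributes bit k of the class index.
    """
    activity = np.asarray(activity, dtype=int)
    weights = 2 ** np.arange(activity.shape[1])
    return activity @ weights

def single_to_multilabel(classes, num_speakers):
    """Invert the encoding: recover the 0/1 activity of each speaker from the class index."""
    classes = np.asarray(classes, dtype=int)
    return (classes[:, None] >> np.arange(num_speakers)) & 1

# Two speakers: silence, speaker 1 only, speaker 2 only, overlap.
frames = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
labels = multilabel_to_single(frames)
recovered = single_to_multilabel(labels, num_speakers=2)
```

For two speakers this gives four classes, and the overlap case is an ordinary class rather than two separate threshold decisions.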

Applying deep learning based super-resolution technique for high-resolution urban flood analysis (고해상도 도시 침수 해석을 위한 딥러닝 기반 초해상화 기술 적용)

Choi, Hyeonjin;Lee, Songhee;Woo, Hyuna;Kim, Minyoung;Noh, Seong Jin
- Journal of Korea Water Resources Association / v.56 no.10 / pp.641-653 / 2023
As climate change and urbanization cause unprecedented natural disasters in urban areas, urban flood predictions with high fidelity and accuracy are crucial. However, conventional physically based and deep learning-based urban flood modeling methods are limited in that they require large computing resources or large amounts of data for high-resolution flood analysis. In this study, we propose and implement a method for improving the spatial resolution of urban flood analysis using a deep learning-based super-resolution technique. The proposed approach converts low-resolution flood maps produced by physically based modeling into high-resolution maps using a super-resolution deep learning model trained on high-resolution modeling data. When applied to two cases of retrospective flood analysis for part of the City of Portland, Oregon, U.S., the results of the 4-m resolution physical simulation were successfully converted into 1-m resolution flood maps through super-resolution. High structural similarity was found between the super-resolution image and the high-resolution original. The image quality loss was within an acceptable limit, at 22.80 dB (PSNR) and 0.73 (SSIM). The proposed super-resolution method can provide efficient model training with a limited number of flood scenarios, significantly reducing data acquisition efforts and computational costs.
https://doi.org/10.3741/JKWRA.2023.56.10.641
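
The PSNR figure quoted above follows the standard definition, 10·log10(range² / MSE). A minimal sketch of that definition is given below. The function name, the toy arrays, and the choice of `data_range` are assumptions for illustration, not the authors' evaluation code. SSIM is not reproduced here.

```python
import numpy as np

def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between a reference map and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a uniform error of 0.1 on maps scaled to [0, 1] gives about 20 dB.
high_res = np.zeros((4, 4))
super_res = np.full((4, 4), 0.1)
value_db = psnr(high_res, super_res, data_range=1.0)
```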

Exploring the power of physics-informed neural networks for accurate and efficient solutions to 1D shallow water equations (물리 정보 신경망을 이용한 1차원 천수방정식의 해석)

Nguyen, Van Giang;Nguyen, Van Linh;Jung, Sungho;An, Hyunuk;Lee, Giha
- Journal of Korea Water Resources Association / v.56 no.12 / pp.939-953 / 2023
Shallow water equations (SWE) are the fundamental equations governing the movement of water. Traditional numerical approaches for solving these equations face various challenges, such as sensitivity to mesh generation and numerical oscillation, and can become computationally unstable around shocks and discontinuities. In this study, we present a novel approach that uses physics-informed neural networks (PINNs) to approximate the solution of the SWE. PINNs integrate physical laws directly into the neural network, enabling accurate approximation of solutions to the SWE. We provide a comprehensive methodology for formulating the SWE within the PINNs framework, encompassing network architecture, training strategy, and data generation techniques. In our experiments, PINNs produced accurate solutions of the SWE when compared with the analytical solution. In addition, PINNs performed better than a conventional Artificial Neural Network. This study highlights the potential of PINNs in water resources research, offering a new paradigm for accurate and efficient solutions to the SWE.
https://doi.org/10.3741/JKWRA.2023.56.12.939
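
For orientation, the standard conservative form of the 1D SWE is written out below, with water depth h, velocity u, bed elevation z_b, and gravitational acceleration g. Friction is omitted. The abstract does not give the paper's exact formulation, so the loss shown (a data term plus a weighted mean squared PDE residual with an assumed weight λ, evaluated at N assumed collocation points) is only the usual PINN pattern.

```latex
\begin{aligned}
r_1 &= \frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x}, \\
r_2 &= \frac{\partial (hu)}{\partial t}
      + \frac{\partial}{\partial x}\left( h u^2 + \tfrac{1}{2} g h^2 \right)
      + g h \frac{\partial z_b}{\partial x}, \\
\mathcal{L} &= \mathcal{L}_{\text{data}}
      + \lambda \, \frac{1}{N} \sum_{i=1}^{N} \left( r_1(x_i, t_i)^2 + r_2(x_i, t_i)^2 \right).
\end{aligned}
```

An exact solution satisfies r_1 = r_2 = 0, so minimizing the residual term pushes the network output toward the governing equations at the collocation points.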

Development of a deep learning-based cabbage core region detection and depth classification model (딥러닝 기반 배추 심 중심 영역 및 깊이 분류 모델 개발)

Ki Hyun Kwon;Jong Hyeok Roh;Ah-Na Kim;Tae Hyong Kim
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.6 / pp.392-399 / 2023
This paper proposes a deep learning model that determines the region and depth of cabbage cores, for robotic automation of the cabbage core removal step in kimchi manufacturing. Rather than predicting the measured core depth directly, the depth is converted into discrete classes so that a single model simultaneously detects the core region and classifies its depth. For training and validation of the model, RGB images of 522 harvested cabbages were obtained. The acquired images were labeled with core region and depth, and data augmentation techniques were applied. mAP, IoU, accuracy, sensitivity, specificity, and F1-score were selected to evaluate the proposed YOLO-v4-based cabbage core region detection and classification model. The mAP and IoU values were 0.97 and 0.91, respectively, and the accuracy and F1-score for depth classification were 96.2% and 95.5%, respectively. These results confirm that the depth information of cabbage cores can be classified and that the model can be used in the future development of a robot automation system for the cabbage core removal process.
https://doi.org/10.17661/jkiiect.2023.16.6.392
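
Two ideas from the abstract above can be sketched briefly: the IoU metric for a detected core region, and the conversion of a continuous depth into a discrete class. The box format, the function names, and the bin edges are hypothetical; the abstract does not give the class boundaries that were used.

```python
from bisect import bisect_right

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def depth_to_class(depth_mm, bin_edges_mm):
    """Turn a measured depth into a class index using sorted bin edges (assumed values)."""
    return bisect_right(bin_edges_mm, depth_mm)

iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))      # overlap 1, union 7
depth_class = depth_to_class(40.0, [30.0, 50.0])  # falls in the middle bin
```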

Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

Guiyoung Son;Soonil Kwon
- The Transactions of the Korea Information Processing Society / v.13 no.6 / pp.284-290 / 2024
Speech emotion recognition (SER) is a technique that analyzes a speaker's voice patterns, including vibration, intensity, and tone, to determine their emotional state. Interest has grown in artificial intelligence (AI) techniques, which are now widely used in medicine, education, industry, and the military. Most existing studies, however, have attained impressive results using acted speech recorded by skilled actors in controlled environments across various scenarios. There is a mismatch between acted and spontaneous speech, since acted speech includes more explicit emotional expressions than spontaneous speech. For this reason, spontaneous speech emotion recognition remains a challenging task. This paper aims to perform emotion recognition on spontaneous speech data and to improve its performance. To this end, we implement deep learning-based speech emotion recognition using VGG (Visual Geometry Group) after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, which covers 7 emotions: joy, love, anger, fear, sadness, surprise, and neutral. Using the 2-dimensional time-frequency spectrogram, we achieved average accuracies of 83.5% and 73.0% for adults and young people, respectively. In conclusion, our findings demonstrated that the suggested framework outperformed current state-of-the-art techniques for spontaneous speech and showed promising performance despite the difficulty of quantifying emotional expression in spontaneous speech.
https://doi.org/10.3745/TKIPS.2024.13.6.284
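
The conversion of a 1-dimensional audio signal into a 2-dimensional spectrogram can be sketched with a short-time Fourier transform. The frame length, hop size, Hann window, and log scaling below are assumed values for illustration; the abstract does not state the settings used in the paper.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-power spectrogram with shape (frequency bins, time frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-10).T

# One second of a 440 Hz tone sampled at 16 kHz (synthetic test signal).
sample_rate = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
spec = spectrogram(tone)
```

The resulting 2D array can be saved or rescaled as an image and fed to an image classifier such as a VGG-style network.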

Dimensional Quality Assessment for Assembly Part of Prefabricated Steel Structures Using a Stereo Vision Sensor (스테레오 비전 센서 기반 프리팹 강구조물 조립부 형상 품질 평가)

Jonghyeok Kim;Haemin Jeon
- Journal of the Computational Structural Engineering Institute of Korea / v.37 no.3 / pp.173-178 / 2024
This study presents a technique for assessing the dimensional quality of assembly parts in Prefabricated Steel Structures (PSS) using a stereo vision sensor. The stereo vision system captures images and point cloud data of the assembly area, and image processing algorithms such as fuzzy-based edge detection and Hough transform-based circular bolt hole detection are then applied to identify bolt hole locations. The 3D center position of each bolt hole is determined by matching the extracted bolt hole positions with the 3D real-world position information from the depth images. Principal Component Analysis (PCA) is then employed to calculate coordinate axes so that distances between bolt holes can be measured precisely even when the sensor and structure orientations differ. Bolt holes are sorted based on their 2D positions, and the distances between sorted bolt holes are calculated to assess the assembly part's dimensional quality. Comparison with actual drawing data confirms the measurement accuracy, with an absolute error of 1 mm and a relative error within 4% based on median criteria.
https://doi.org/10.7734/COSEIK.2024.37.3.173
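
The PCA step described above can be illustrated with a small sketch: principal axes are computed from the 3D bolt hole centers, the points are expressed in those axes, and spacings are measured along the dominant axis regardless of how the structure is oriented relative to the sensor. The function name and the synthetic hole positions are assumptions, and sorting along the first principal axis is a simplification of the 2D sorting mentioned in the abstract.

```python
import numpy as np

def align_to_principal_axes(points):
    """Express 3D points in the coordinate axes found by PCA (via SVD)."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

# Four synthetic bolt hole centers, 100 mm apart, on a line tilted in the sensor frame.
direction = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
holes = np.array([5.0, -3.0, 800.0]) + np.outer(100.0 * np.arange(4), direction)

aligned = align_to_principal_axes(holes)
spacing = np.diff(np.sort(aligned[:, 0]))  # distances between neighboring holes
```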

Multi-View 3D Human Pose Estimation Based on Transformer (트랜스포머 기반의 다중 시점 3차원 인체자세추정)

Seoung Wook Choi;Jin Young Lee;Gye Young Kim
- Smart Media Journal / v.12 no.11 / pp.48-56 / 2023
Three-dimensional human pose estimation is used in sports, motion recognition, and special effects for video media. Among the various methods, multi-view 3D human pose estimation is essential for precise estimation even in complex real-world environments. However, existing models for multi-view 3D human pose estimation have high time complexity because they use 3D feature maps. This paper proposes a method that extends an existing Transformer-based monocular multi-frame model, which has lower time complexity, to multi-view 3D human pose estimation. To extend the model to multiple viewpoints, the proposed method first generates an 8-dimensional joint coordinate by concatenating the 2-dimensional coordinates of each of 17 joints from 4 viewpoints, obtained with the 2-dimensional human pose detector CPN (Cascaded Pyramid Network). These coordinates are then converted into 17×32 data by patch embedding and fed into a transformer model. Finally, the MLP (Multi-Layer Perceptron) block that outputs the 3D human pose updates the 3D pose estimates for all 4 viewpoints simultaneously at every iteration. Compared with the method of Zheng[5], the proposed method has 48.9% of the model parameters, reduces MPJPE (Mean Per Joint Position Error) by 20.6 mm (43.8%), and is more than 20 times faster in average training time per epoch.
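
The tensor shapes described above can be traced in a few lines: 2D coordinates from 4 views are concatenated into an 8-dimensional vector per joint, then embedded to 17×32. Treating the patch embedding as a single linear projection is an assumption, and random numbers stand in for detector outputs and learned weights. The MPJPE function follows the usual definition, the mean Euclidean distance between predicted and reference joints.

```python
import numpy as np

rng = np.random.default_rng(0)
num_views, num_joints, embed_dim = 4, 17, 32

# Placeholder 2D detections: one (x, y) pair per joint per view.
keypoints_2d = rng.random((num_views, num_joints, 2))

# Concatenate the 4 views for each joint: (4, 17, 2) -> (17, 4, 2) -> (17, 8).
joint_tokens = keypoints_2d.transpose(1, 0, 2).reshape(num_joints, num_views * 2)

# Assumed linear patch embedding from 8 to 32 dimensions per joint token.
weight = rng.standard_normal((num_views * 2, embed_dim))
bias = np.zeros(embed_dim)
embedded = joint_tokens @ weight + bias  # shape (17, 32)

def mpjpe(predicted, target):
    """Mean per joint position error: average Euclidean distance over joints."""
    return np.linalg.norm(predicted - target, axis=-1).mean()

# Every joint is off by (1, 1, 1), so the error is sqrt(3) per joint.
error = mpjpe(np.zeros((num_joints, 3)), np.ones((num_joints, 3)))
```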

Large-Scale Current Source Development in Nuclear Power Plant (원전에 사용되는 직류전압제어 대전류원의 개발)

Jong-ho Kim;Gyu-shik Che
- Journal of Advanced Navigation Technology / v.28 no.3 / pp.348-355 / 2024
When a nuclear power plant is dismantled, a current source that can stably supply current as a measurement medium is required to measure and test important facilities that need a large measurement current, such as the control element drive mechanism control system (CEDMCS). However, the supplies that are available provide only voltage, not current, even though a DC-voltage-controlled constant current source is essential for testing major equipment. Such a supply cannot deliver a stable constant current regardless of load variation; it is simply a voltage supply. Developing a current source is more difficult than developing a voltage source. Testing the above equipment requires a very large current source, of ampere class and more than ten times the normal current. To meet this requirement, we developed a large-scale current source that is controlled by an input DC voltage and supplies a stable constant current to the equipment under test. We measured and tested nuclear power plant equipment using real site data over a long period, carried out a long-duration load test, and verified the validity of the source. The developed invention will be used to measure and test important equipment installed in the future.
https://doi.org/10.12673/jant.2024.28.3.348

Investigation of the level difference of floor impact noises through the shape variation of EVA resilient materials with composite floor structure (EVA 완충재의 형상변환을 통한 복합구조의 바닥충격음 변이 조사)

Jakin Lee;Seung-Min Lee;Chan-Hoon Haan
- The Journal of the Acoustical Society of Korea / v.43 no.1 / pp.60-71 / 2024
The present study investigates the level difference of floor impact noise in composite floor structures that use EVA resilient materials. To this end, four types of resilient material (Flat, Deck, Cavity, and Mount) were designed by combining PET, PP sheet, and EVA mounts. In total, 9 samples were made for acoustic measurements, which were carried out twice, with a bang machine and an impact ball as the heavy-weight floor impact noise sources. All floor impact noise measurements were undertaken at a certification institution. For the Flat and Cavity types, adding PET reduced heavy-weight floor impact noise by a further 2 dB to 5 dB, whereas floor impact noise above 50 dB was measured when a single resilient material was used. The highest performance was obtained for the Mount type, with 1st grade for light-weight floor impact noise and 2nd grade for heavy-weight floor impact noise. This is attributed to the properties of the low-density PET sound-absorbing material that fills the space around the EVA mounts. These results are also considered to be due to impact absorption by both the EVA mounts and the air cavity between the EVA mounts and the PP sheet. Finally, at least 36 EVA mounts per 1 m² of resilient panel were found to give greater reduction of heavy-weight floor impact noise.
https://doi.org/10.7776/ASK.2024.43.1.060

A 10b 50MS/s Low-Power Skinny-Type 0.13um CMOS ADC for CIS Applications (CIS 응용을 위해 제한된 폭을 가지는 10비트 50MS/s 저 전력 0.13um CMOS ADC)

Song, Jung-Eun;Hwang, Dong-Hyun;Hwang, Won-Seok;Kim, Kwang-Soo;Lee, Seung-Hoon
- Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.5 / pp.25-33 / 2011
This work proposes a skinny-type 10b 50MS/s 0.13um CMOS three-step pipeline ADC for CIS applications. Analog circuits for CIS applications commonly employ a high supply voltage to obtain a sufficient dynamic range, while digital circuits use a low supply voltage to minimize power consumption. The proposed ADC uses both supply voltages to convert wide-swing analog signals into low-voltage digital data. An op-amp sharing technique employed in the residue amplifiers controls currents according to the amplification mode of each pipeline stage, optimizes the performance of the op-amps, and improves power efficiency. In the three FLASH ADCs, the number of input stages is halved by an interpolation technique, and each comparator consists of only a latch, with low kick-back noise achieved by pull-down switches that separate the input and output nodes. The reference circuits achieve the required settling time with only on-chip low-power drivers, and the digital correction logic has two kinds of level shifter, chosen according to the signal voltage levels to be processed. The prototype ADC, implemented in a 0.13um CMOS process that supports 0.35um thick-gate-oxide transistors, shows measured DNL and INL within 0.42LSB and 1.19LSB, respectively. The ADC shows a maximum SNDR of 55.4dB and a maximum SFDR of 68.7dB at 50MS/s. With an active die area of 0.53 mm², the ADC consumes 15.6mW at 50MS/s with an analog voltage of 2.0V and two digital voltages of 2.8V (D_H) and 1.2V (D_L).
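
The reported SNDR can be related to an effective number of bits (ENOB) with the standard ideal-quantizer formula ENOB = (SNDR − 1.76) / 6.02, and the reported power and sample rate give an energy per sample. The short calculation below is an illustration based only on the figures in the abstract; neither derived value is reported by the authors.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB (standard ideal-quantizer relation)."""
    return (sndr_db - 1.76) / 6.02

def energy_per_sample(power_w, sample_rate_hz):
    """Average energy spent per conversion, in joules."""
    return power_w / sample_rate_hz

effective_bits = enob(55.4)                      # about 8.91 bits
energy_j = energy_per_sample(15.6e-3, 50e6)      # about 312 pJ per sample
```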
