Search | Korea Science

End-to-end non-autoregressive fast text-to-speech (End-to-end 비자기회귀식 가속 음성합성기)

Kim, Wiback;Nam, Hosung
- Phonetics and Speech Sciences / v.13 no.4 / pp.47-53 / 2021
Autoregressive Text-to-Speech (TTS) models suffer from inference instability and slow inference speed. Inference instability occurs when a poorly predicted sample at time step t affects all the subsequent predictions. Slow inference speed arises from a model structure that forces the predicted samples from time steps 1 to t-1 to predict the sample at time step t. In this study, an end-to-end non-autoregressive fast text-to-speech model is suggested as a solution to these problems. The results of this study show that this model's Mean Opinion Score (MOS) is close to that of Tacotron 2 - WaveNet, while this model's inference speed and stability are higher than those of Tacotron 2 - WaveNet. Further, this study aims to offer insight into the improvement of non-autoregressive models.
https://doi.org/10.13064/KSSS.2021.13.4.047

2-Stage Detection and Classification Network for Kiosk User Analysis (디스플레이형 자판기 사용자 분석을 위한 이중 단계 검출 및 분류 망)

Seo, Ji-Won;Kim, Mi-Kyung
- Journal of the Korea Institute of Information and Communication Engineering / v.26 no.5 / pp.668-674 / 2022
Machine learning techniques using visual data are widely usable in industry and service fields such as scene recognition, fault detection, security, and user analysis. Among these, user analysis of CCTV video is one of the practical ways of using vision data. Many studies on lightweight artificial neural networks have also been published to improve usability in mobile and embedded environments. In this study, we propose a network that combines object detection and classification for mobile graphics processing units. The network detects pedestrians and faces, and classifies age and gender from each detected face. The proposed network is built on MobileNet, YOLOv2, and skip connections. The detection and classification models are trained individually and combined into a 2-stage structure. An attention mechanism is also used to improve detection and classification ability. An Nvidia Jetson Nano is used to run and evaluate the proposed system.
https://doi.org/10.6109/jkiice.2022.26.5.668

Analysis of methods for the model extraction without training data (학습 데이터가 없는 모델 탈취 방법에 대한 분석)

Hyun Kwon;Yonggi Kim;Jun Lee
- Convergence Security Journal / v.23 no.5 / pp.57-64 / 2023
In this study, we analyzed how to steal a target model without training data. Input data is generated using a generative model, and a similar model is created by defining a loss function so that the predicted values of the target model and the similar model are close to each other. The similar model is trained by gradient descent, using the target model's logit value for each class on the input data, so that it becomes similar to the target model. The TensorFlow machine learning library was used as the experimental environment, and CIFAR10 and SVHN were used as datasets. A similar model was created using a ResNet model as the target model. The experiment showed that the model stealing method generated a similar model with an accuracy of 86.18% for CIFAR10 and 96.02% for SVHN, producing predicted values similar to those of the target model. In addition, considerations on the model stealing method, military use, and limitations were also analyzed.
https://doi.org/10.33778/kcsa.2023.23.5.057
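
The logit-matching idea in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes small linear classifiers in place of ResNet, uniformly random inputs in place of the generative model, and a squared-error loss on the logits (the abstract only says the loss brings the two models' predictions close together). The names `logits`, `extract_clone`, and `query_target` are hypothetical.

```python
import random


def logits(weights, x):
    """Linear classifier: one logit per class (one weight row per class)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def extract_clone(query_target, n_classes, n_features,
                  steps=2000, lr=0.05, seed=0):
    """Fit a clone to a black-box target using only its logit outputs.

    Assumptions for illustration: inputs are random vectors (standing in
    for generator samples) and the loss is the squared logit difference.
    """
    rng = random.Random(seed)
    clone = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(steps):
        # Synthetic query; no real training data is used.
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        target_out = query_target(x)
        clone_out = logits(clone, x)
        for k in range(n_classes):
            err = clone_out[k] - target_out[k]
            for j in range(n_features):
                # Gradient of (clone_logit - target_logit)^2 w.r.t. clone[k][j].
                clone[k][j] -= lr * 2.0 * err * x[j]
    return clone


# Hypothetical black-box target with 2 classes and 3 input features.
target_w = [[1.0, -2.0, 0.5], [0.3, 0.8, -1.0]]
clone_w = extract_clone(lambda x: logits(target_w, x), n_classes=2, n_features=3)
```

In this toy setting the clone only ever sees the target's outputs on synthetic queries, which is the property the abstract emphasizes.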

Fashion Category Oversampling Automation System

Minsun Yeu;Do Hyeok Yoo;SuJin Bak
- Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.31-40 / 2024
In the domestic online fashion platform industry, the manual registration of product information by individual business owners leads to inconvenience and reliability issues, especially when dealing with simultaneous registrations of numerous product groups. Moreover, bias is significantly heightened due to the low quality of product images and an imbalance in data quantity. Therefore, this study proposes a ResNet50 model aimed at minimizing data bias through oversampling techniques and conducting multiple classifications for 13 fashion categories. Transfer learning is employed to optimize resource utilization and reduce prolonged learning times. The results indicate improved discrimination of up to 33.4% for data augmentation in classes with insufficient data compared to the basic convolutional neural network (CNN) model. The reliability of all outcomes is underscored by precision and affirmed by the recall curve. This study is suggested to advance the development of the domestic online fashion platform industry to a higher echelon.
https://doi.org/10.9708/jksci.2024.29.01.031
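
The abstract does not say which oversampling technique was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class items until every class matches the largest one), class balancing could look like the following. The function name `random_oversample` and the example labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are the same size.

    Assumption for illustration: simple random duplication; the paper's
    actual oversampling method is not specified in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies at random from the class's own samples.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced data: image file names with category labels.
images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]
cats = ["shirt", "shirt", "shirt", "shirt", "skirt", "coat"]
bal_images, bal_cats = random_oversample(images, cats)
class_counts = Counter(bal_cats)
```

Duplication only rebalances class frequencies; it adds no new information, which is why it is often paired with image augmentation.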

Dual-stream Co-enhanced Network for Unsupervised Video Object Segmentation

Hongliang Zhu;Hui Yin;Yanting Liu;Ning Chen
- KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.4 / pp.938-958 / 2024
Unsupervised Video Object Segmentation (UVOS) is a highly challenging problem in computer vision because the annotation of the target object in the testing video is entirely unknown. The main difficulty is to effectively handle the complicated and changeable motion state of the target object and the confusion caused by similar background objects in the video sequence. In this paper, we propose a novel deep Dual-stream Co-enhanced Network (DC-Net) for UVOS via bidirectional motion cues refinement and multi-level feature aggregation, which can take full advantage of motion cues and effectively integrate features from different levels to produce high-quality segmentation masks. DC-Net is a dual-stream architecture in which the two streams enhance each other. One is a motion stream with a Motion-cues Refine Module (MRM), which learns from bidirectional optical flow images and produces a fine-grained, complete, and distinctive motion saliency map. The other is an appearance stream with a Multi-level Feature Aggregation Module (MFAM) and a Context Attention Module (CAM), which are designed to integrate features from different levels effectively. Specifically, the motion saliency map obtained by the motion stream is fused with each stage of the decoder in the appearance stream to improve the segmentation, and in turn the segmentation loss in the appearance stream feeds back into the motion stream to enhance the motion refinement. Experimental results on three datasets (Davis2016, VideoSD, SegTrack-v2) demonstrate that DC-Net achieves results comparable with some state-of-the-art methods.
https://doi.org/10.3837/tiis.2024.04.007

Classification of mandibular molar furcation involvement in periapical radiographs by deep learning

Katerina Vilkomir;Cody Phen;Fiondra Baldwin;Jared Cole;Nic Herndon;Wenjian Zhang
- Imaging Science in Dentistry / v.54 no.3 / pp.257-263 / 2024
Purpose: The purpose of this study was to classify mandibular molar furcation involvement (FI) in periapical radiographs using a deep learning algorithm. Materials and Methods: Full mouth series taken at East Carolina University School of Dental Medicine from 2011-2023 were screened. Diagnostic-quality mandibular premolar and molar periapical radiographs with healthy or FI mandibular molars were included. The radiographs were cropped into individual molar images, annotated as "healthy" or "FI," and divided into training, validation, and testing datasets. The images were preprocessed by PyTorch transformations. ResNet-18, a convolutional neural network model, was refined using the PyTorch deep learning framework for the specific imaging classification task. CrossEntropyLoss and the AdamW optimizer were employed for loss function training and optimizing the learning rate, respectively. The images were loaded by PyTorch DataLoader for efficiency. The performance of ResNet-18 algorithm was evaluated with multiple metrics, including training and validation losses, confusion matrix, accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the ROC curve. Results: After adequate training, ResNet-18 classified healthy vs. FI molars in the testing set with an accuracy of 96.47%, indicating its suitability for image classification. Conclusion: The deep learning algorithm developed in this study was shown to be promising for classifying mandibular molar FI. It could serve as a valuable supplemental tool for detecting and managing periodontal diseases.
https://doi.org/10.5624/isd.20240020
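
Several of the metrics listed in this abstract (accuracy, sensitivity, specificity) follow directly from the binary confusion matrix. The sketch below shows the standard definitions on made-up labels. The function name `binary_metrics` and the example predictions are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred, positive="FI"):
    """Compute confusion-matrix counts and derived metrics for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positive cases).
        return num / den if den else float("nan")

    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Made-up example: 3 molars with FI and 5 healthy molars.
truth = ["FI", "FI", "FI", "healthy", "healthy", "healthy", "healthy", "healthy"]
preds = ["FI", "FI", "healthy", "healthy", "healthy", "healthy", "healthy", "FI"]
metrics = binary_metrics(truth, preds)
```

Sensitivity and specificity are reported separately because accuracy alone can look high on an imbalanced test set even when one class is poorly detected.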

Deep Learning-Based Plant Health State Classification Using Image Data (영상 데이터를 이용한 딥러닝 기반 작물 건강 상태 분류 연구)

Ali Asgher Syed;Jaehawn Lee;Alvaro Fuentes;Sook Yoon;Dong Sun Park
- Journal of Internet of Things and Convergence / v.10 no.4 / pp.43-53 / 2024
Tomatoes are rich in nutrients like lycopene, β-carotene, and vitamin C. However, they often suffer from biological and environmental stressors, resulting in significant yield losses. Traditional manual plant health assessments are error-prone and inefficient for large-scale production. To address this need, we collected a comprehensive dataset covering the entire life span of tomato plants, annotated across 5 health states from 1 to 5. Our study introduces an Attention-Enhanced DS-ResNet architecture with Channel-wise attention and Grouped convolution, refined with new training techniques. Our model achieved an overall accuracy of 80.2% using 5-fold cross-validation, showcasing its robustness in precisely classifying the health states of tomato plants.
https://doi.org/10.20465/KIOTS.2024.10.4.043

Color Correction Using Back Propagation Neural Network in Film Scanner (필름 스캐너에서 역전파 신경회로망을 이용한 색 보정)

홍승범;백중환
- Journal of the Institute of Convergence Signal Processing / v.4 no.4 / pp.15-22 / 2003
A film scanner is an input device for acquiring high-resolution, high-quality digital images from existing optical film. Recently, demand for film scanners has risen among experts in the image printing and editing fields. However, due to the nonlinear characteristics of the light source and sensor, the colors of the original film image do not correspond to the colors of the scanned image. Therefore, color correction of the scanned digital image is essential in a film scanner. In this paper, a neural network method is applied for color correction to CIE L*a*b* color model data converted from RGB color model data. Film scanner hardware with 12-bit color resolution for each of R, G, and B and 2400 dpi is also implemented using the TMS320C32 DSP chip and a high-resolution line sensor. Experimental results show that the average color correction rate is 79.8%, an improvement of 43.5% over our previous method, the polygonal regression method.

Markov chain-based mass estimation method for loose part monitoring system and its performance

Shin, Sung-Hwan;Park, Jin-Ho;Yoon, Doo-Byung;Han, Soon-Woo;Kang, To
- Nuclear Engineering and Technology / v.49 no.7 / pp.1555-1562 / 2017
A loose part monitoring system is used to identify unexpected loose parts in a nuclear reactor vessel or steam generator. Mass estimation of loose parts, one function of such a system, still needs a new method because conventional methods such as Hertz's impact theory and the frequency ratio method have high estimation error. The purpose of this study is to propose a mass estimation method using a Markov decision process and compare its performance with a method using an artificial neural network model proposed in a previous study. First, the extraction of feature vectors using the discrete cosine transform is explained. Second, Markov chains are designed with codebooks obtained from the feature vectors. A 1/8-scaled mockup of the reactor vessel for OPR1000 was employed, and all signals used were obtained by impacting its surface with several solid spherical masses. Next, the mass estimation performance of the proposed Markov model was compared with that of the artificial neural network model. Finally, the proposed Markov model was found to have a matching error below 20% in mass estimation. This performance is similar to that of the method using an artificial neural network model and considerably better than that of the conventional methods.
https://doi.org/10.1016/j.net.2017.05.005
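
One common way to use Markov chains built over codebook symbols is to fit one transition matrix per class and assign a new symbol sequence to the class with the highest likelihood. The sketch below illustrates that general idea only. It is an assumption about the approach, not the paper's method: it uses a first-order chain, ignores initial-state probabilities, applies add-one smoothing, and the class labels "light" and "heavy" and the symbol sequences are invented.

```python
import math


def fit_transitions(sequences, n_symbols, alpha=1.0):
    """Estimate a first-order transition matrix from codebook-index sequences.

    alpha is an additive smoothing count so unseen transitions keep a
    nonzero probability (an assumption made for this sketch).
    """
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(seq, trans):
    """Log-probability of the transitions in seq under one chain."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))


def classify(seq, models):
    """Return the label whose chain gives seq the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(seq, models[label]))


# Invented training sequences over a 2-symbol codebook.
models = {
    "light": fit_transitions([[0, 1, 0, 1, 0, 1]], n_symbols=2),
    "heavy": fit_transitions([[0, 0, 0, 1, 1, 1]], n_symbols=2),
}
label_a = classify([0, 1, 0, 1], models)
label_b = classify([0, 0, 1, 1], models)
```

In a real system the symbols would come from quantizing feature vectors (here, discrete cosine transform features) against a learned codebook, and there would be one chain per candidate mass.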

Structure Pruning of Dynamic Recurrent Neural Networks Based on Evolutionary Computations (진화연산을 이용한 동적 귀환 신경망의 구조 저차원화)

김대준;심귀보
- Journal of the Korean Institute of Intelligent Systems / v.7 no.4 / pp.65-73 / 1997
This paper proposes a new method for structure pruning of dynamic recurrent neural networks (DRNN) using evolutionary computations. In general, evolutionary computations are population-based search methods, so they are very useful when several different properties of neural networks need to be optimized. To prune the structure of the DRNN, we used evolutionary programming, which searches the structure and weights of the DRNN, and evolution strategies, which train the neuron weights and prune the network structure. The addition or elimination of a hidden-layer node of the DRNN is decided by a mutation probability. The strategy is as follows: the node with the minimum sum of input weights is eliminated, and a node is added according to a predesignated probability function. In this case, the weight is connected to the other nodes according to the probability in all cases which can interact with the other nodes. The proposed pruning scheme is exemplified on the stabilization and position control of an inverted-pendulum system and the visual servoing of a robot manipulator, and the effectiveness of the proposed method is demonstrated by numerical simulations.

