• Title/Summary/Keyword: Channel Prediction


A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI
    • /
    • no.54
    • /
    • pp.27-43
    • /
    • 2005
  • Existing standard speech coders can provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main challenge in using MFCC, however, is developing an efficient MFCC quantization scheme at a low bit rate. First, we exploit the interframe correlation of MFCCs, which leads to predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder achieves speech quality comparable to that of 8 kbps G.729, while the speech recognition performance obtained with the proposed coder is better than that obtained with G.729.

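A minimal sketch of the interframe predictive MFCC quantization with a safety-net mode described in the entry above, in Python with NumPy. The uniform scalar quantizer, the prediction coefficient alpha, and the per-frame mode decision are illustrative assumptions rather than the coder's actual codebooks or bit allocation.

```python
import numpy as np

def quantize(vec, step=0.5):
    """Toy uniform scalar quantizer standing in for the coder's codebook."""
    return np.round(vec / step) * step

def encode_mfcc_frames(mfcc, alpha=0.75, step=0.5):
    """Predictively quantize MFCC frames with a safety-net fallback.

    For each frame, try (a) quantizing the prediction residual against the
    previously reconstructed frame and (b) quantizing the frame directly
    (the safety net, which stops channel/quantization errors from
    propagating). Keep whichever reconstruction is closer to the original.
    """
    prev = np.zeros(mfcc.shape[1])
    modes, recon = [], []
    for frame in mfcc:
        pred_q = quantize(frame - alpha * prev, step)   # predictive mode
        safe_q = quantize(frame, step)                  # safety-net mode
        rec_pred = alpha * prev + pred_q
        if np.sum((frame - rec_pred) ** 2) <= np.sum((frame - safe_q) ** 2):
            mode, rec = "pred", rec_pred
        else:
            mode, rec = "safe", safe_q
        modes.append(mode)
        recon.append(rec)
        prev = rec
    return modes, np.array(recon)

# Example with random 13-dimensional MFCC-like frames
frames = np.cumsum(np.random.randn(100, 13) * 0.1, axis=0)
modes, recon = encode_mfcc_frames(frames)
print(modes[:10], np.mean((frames - recon) ** 2))
```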

A Multi-Stage Convolution Machine with Scaling and Dilation for Human Pose Estimation

  • Nie, Yali;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.6
    • /
    • pp.3182-3198
    • /
    • 2019
  • Vision-based human pose estimation has been considered one of the most challenging research subjects due to problems including confounding background clutter, the diversity of human appearances, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap prediction of body joints, the proposed machine repeatedly produces predictions over stages, with receptive fields large enough to learn long-range spatial relationships. Each stage is composed of various modules according to its strategic purpose. A pyramid stacking module and a dilation module are used to handle human pose at multiple scales. Their multi-scale information from different receptive fields is fused by concatenation, which captures more contextual information from different features. In addition, the spatial and channel information of a given input is converted into gating factors by squeezing each feature map to a single value according to its importance, so that each network channel receives a different weight. Compared with other ConvNet-based architectures, the proposed architecture achieves higher accuracy in experiments on the standard LSP and MPII pose benchmarks.
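
The channel gating described above, squeezing each feature map to a single value and converting it into per-channel weights, can be sketched as a squeeze-and-excitation-style module in PyTorch. The channel count and reduction ratio are illustrative assumptions, not the values used in the paper.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Squeeze each feature map to one value, then produce per-channel gates."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: HxW -> 1x1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # gating factors in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        g = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * g                                 # reweight each channel

# Example: gate a fused multi-scale feature map
feat = torch.randn(2, 64, 32, 32)
print(ChannelGate(64)(feat).shape)                   # torch.Size([2, 64, 32, 32])
```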

Dimmable Spatial Intensity Modulation for Visible-light Communication: Capacity Analysis and Practical Design

  • Kim, Byung Wook;Jung, Sung-Yoon
    • Current Optics and Photonics
    • /
    • v.2 no.6
    • /
    • pp.532-539
    • /
    • 2018
  • Multiple LED arrays can be utilized in visible-light communication (VLC) to improve communication efficiency while maintaining smart illumination functionality through dimming control. This paper proposes a modulation scheme called "Spatial Intensity Modulation" (SIM), where the effective number of turned-on LEDs is employed for data modulation and dimming control in VLC systems. Unlike conventional pulse-amplitude modulation (PAM), symbol intensity levels are determined not by the amplitude levels of a VLC signal from each LED, but by counting the number of turned-on LEDs, each illuminating at a single amplitude level. Because the intensity of a SIM symbol and the target dimming level are determined solely in the spatial domain, the problems of conventional PAM-based VLC and related MIMO VLC schemes, such as unstable dimming control, non-uniform illumination, and the burden of channel prediction, can be avoided. By varying the number and formation of turned-on LEDs around the target dimming level over time, the proposed SIM scheme guarantees homogeneous illumination over a target area. An analysis of the dimming capacity, which is the achievable communication rate under the target dimming level in VLC, is provided by deriving the turn-on probability that maximizes the entropy of the SIM-based VLC system. In addition, a practical design of the dimmable SIM scheme applying the multilevel inverse source coding (MISC) method is proposed. Simulation results under a range of parameters provide baseline data to verify the performance of the proposed dimmable SIM scheme and its applications in real systems.
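
A minimal sketch of the spatial-intensity-modulation idea from the entry above: data symbols are mapped to the number of turned-on LEDs around a target dimming level, and the placement of the on LEDs is randomized over time to keep illumination spatially homogeneous. The LED count, symbol alphabet, and symbol-to-count mapping are illustrative assumptions, not the paper's exact design or its MISC-based scheme.

```python
import numpy as np

def sim_modulate(symbols, n_leds=16, dimming=0.5, spread=4, rng=None):
    """Map each symbol to a count of turned-on LEDs around the dimming level.

    symbols : iterable of integers in [0, spread-1]
    dimming : target fraction of LEDs that are on, averaged over time
    Returns a (num_symbols, n_leds) on/off pattern matrix.
    """
    rng = rng or np.random.default_rng()
    base = int(round(dimming * n_leds)) - spread // 2
    patterns = []
    for s in symbols:
        n_on = int(np.clip(base + s, 0, n_leds))    # symbol -> number of on LEDs
        pattern = np.zeros(n_leds, dtype=int)
        on_idx = rng.choice(n_leds, size=n_on, replace=False)  # randomize placement
        pattern[on_idx] = 1
        patterns.append(pattern)
    return np.array(patterns)

def sim_demodulate(patterns, n_leds=16, dimming=0.5, spread=4):
    """Recover symbols by counting turned-on LEDs (the spatial intensity)."""
    base = int(round(dimming * n_leds)) - spread // 2
    return patterns.sum(axis=1) - base

tx = np.array([0, 3, 1, 2, 2, 0])
patterns = sim_modulate(tx, dimming=0.5)
print(sim_demodulate(patterns))   # should reproduce tx
```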

A QoS-aware Adaptive Coloring Scheduling Algorithm for Co-located WBANs

  • Wang, Jingxian;Sun, Yongmei;Luo, Shuyun;Ji, Yuefeng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.12
    • /
    • pp.5800-5818
    • /
    • 2018
  • Interference may occur when several co-located wireless body area networks (WBANs) share the same channel simultaneously; it is generally mitigated by resource scheduling. In this paper, a QoS-aware Adaptive Coloring (QAC) scheduling algorithm is proposed, which contains two components: interference-set determination and time-slot assignment. The highlight of QAC is that it determines the interference graph based on the relay scheme and adapts to the network QoS through a multi-coloring approach. However, frequent resource assignment brings extra energy consumption and packet loss. Thus we introduce a launch condition for the QAC scheduling algorithm: if the interference duration is longer than a predetermined threshold, time-slot rescheduling is activated. Furthermore, based on the relative distance and moving speed between WBANs, a prediction model for the interference duration is proposed. Simulation results show that, compared with state-of-the-art approaches, the QAC scheduling algorithm has better performance in terms of network capacity, average delay, and resource utilization.
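
A minimal sketch of the time-slot assignment step by greedy coloring of an interference graph among co-located WBANs. The full QAC algorithm additionally weights the coloring by QoS, uses multi-coloring, and applies the rescheduling launch condition, none of which are reproduced here; the graph representation and the priority ordering are assumptions.

```python
from collections import defaultdict

def greedy_slot_coloring(interference_edges, wbans):
    """Assign each WBAN the smallest time slot not used by its interferers.

    interference_edges : iterable of (a, b) pairs of WBAN ids that interfere
    wbans              : iterable of WBAN ids (ordering acts as priority)
    Returns {wban_id: slot_index}.
    """
    neighbors = defaultdict(set)
    for a, b in interference_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    slots = {}
    for w in wbans:                       # higher-priority WBANs are colored first
        used = {slots[n] for n in neighbors[w] if n in slots}
        slot = 0
        while slot in used:               # smallest slot free of interference
            slot += 1
        slots[w] = slot
    return slots

# Example: 5 co-located WBANs; edges mark pairs that interfere on the shared channel
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3), (3, 4)]
print(greedy_slot_coloring(edges, wbans=range(5)))
```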

Correcting Misclassified Image Features with Convolutional Coding

  • Mun, Ye-Ji;Kim, Nayoung;Lee, Jieun;Kang, Je-Won
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2018.11a
    • /
    • pp.11-14
    • /
    • 2018
  • The aim of this study is to rectify misclassified image features and enhance the performance of image classification tasks by incorporating a channel-coding technique widely used in telecommunication. Specifically, the proposed algorithm employs the error-correcting mechanism of convolutional coding combined with convolutional neural networks (CNNs), the state-of-the-art image classifiers. We develop an encoder and a decoder to exploit the error-correcting capability of convolutional coding. In the encoder, the label values of the image data are converted to convolutional codes that are used as target outputs of the CNN, and the network is trained to minimize the Euclidean distance between the target output codes and the actual output codes. In order to correct misclassified features, the outputs of the network are decoded through the trellis structure with the Viterbi algorithm before determining the final prediction. This paper demonstrates that the proposed architecture improves the performance of the neural networks compared to the traditional one-hot encoding method.

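A minimal sketch of the underlying channel-coding idea from the entry above: class-label bits are encoded with a convolutional code, and a noisy (soft) version of the code, standing in for the network's real-valued outputs, is decoded with the Viterbi algorithm using a Euclidean branch metric. The rate-1/2 generator polynomials (7, 5 in octal) and the noise model are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

G = [0b111, 0b101]          # rate-1/2 generators (octal 7, 5), constraint length 3

def conv_encode(bits):
    """Encode a bit sequence with the rate-1/2 convolutional code above."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state              # newest bit in the MSB of a 3-bit register
        out += [bin(reg & g).count("1") & 1 for g in G]
        state = reg >> 1                    # shift: keep the two newest bits
    return np.array(out)

def viterbi_decode(soft, n_bits):
    """Viterbi decoding with a Euclidean metric over soft (real-valued) outputs."""
    INF = float("inf")
    metric = [0.0] + [INF] * 3              # 4 trellis states for constraint length 3
    paths = [[] for _ in range(4)]
    for t in range(n_bits):
        r = soft[2 * t:2 * t + 2]
        new_metric, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                reg = (b << 2) | s
                ns = reg >> 1
                exp = np.array([bin(reg & g).count("1") & 1 for g in G])
                m = metric[s] + float(np.sum((r - exp) ** 2))
                if m < new_metric[ns]:
                    new_metric[ns], new_paths[ns] = m, paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[int(np.argmin(metric))]

label_bits = [1, 0, 1, 1, 0, 0]             # e.g. a class index written in binary
noisy = conv_encode(label_bits) + 0.4 * np.random.randn(2 * len(label_bits))
print(viterbi_decode(noisy, len(label_bits)))   # likely recovers label_bits
```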

Numerical analysis of the temperature distribution of the EM pump for the sodium thermo-hydraulic test loop of the GenIV PGSFR

  • Kwak, Jaesik;Kim, Hee Reyoung
    • Nuclear Engineering and Technology
    • /
    • v.53 no.5
    • /
    • pp.1429-1435
    • /
    • 2021
  • The temperature distribution of an electromagnetic pump designed for a flow rate of 1380 L/min and a pressure of 4 bar was analyzed for the sodium thermo-hydraulic test in the Sodium Test Loop for Safety Simulation and Assessment-Phase 1 (STELLA-1). The electromagnetic pump was used for circulating the liquid sodium coolant in the Intermediate Heat Transport System (IHTS) of the Prototype Gen-IV Sodium-cooled Fast Reactor (PGSFR) with an electric power of 150 MWe. The temperature distribution of the pump components was numerically analyzed to prevent functional degradation in the high-temperature environment during pump operation. The heat transfer was calculated using ANSYS Fluent to predict the temperature distribution in the excited coils, the electromagnet core, and the liquid sodium flow channel of the electromagnetic pump. The temperatures in the coil, the core, and the flow gap of the operating pump were compared between natural and forced air-circulation cooling. The electromagnetic pump with forced-circulation cooling had better efficiency than with natural circulation, even when the input power consumed by the air blower is taken into account. Accordingly, this study concluded that forced cooling is preferable for both the maintenance and the efficiency of the electromagnetic pump.

Electroencephalography-based imagined speech recognition using deep long short-term memory network

  • Agarwal, Prabhakar;Kumar, Sandeep
    • ETRI Journal
    • /
    • v.44 no.4
    • /
    • pp.672-685
    • /
    • 2022
  • This article proposes a subject-independent application of brain-computer interfacing (BCI). A 32-channel electroencephalography (EEG) device is used to measure imagined speech (SI) of four words (sos, stop, medicine, washroom) and one phrase (come-here) across 13 subjects. A deep long short-term memory (LSTM) network is adopted to recognize these signals in seven EEG frequency bands individually, in nine major regions of the brain. The results show a maximum accuracy of 73.56% and a network prediction time (NPT) of 0.14 s, which are superior to other state-of-the-art techniques in the literature. Our analysis reveals that the alpha band can recognize SI better than other EEG frequencies. To reinforce our findings, the above work has been compared with models based on the gated recurrent unit (GRU), convolutional neural network (CNN), and six conventional classifiers. The results show that the LSTM model has 46.86% higher average accuracy in the alpha band and 74.54% lower average NPT than the CNN. The maximum accuracy of the GRU was 8.34% less than that of the LSTM network. Deep networks performed better than traditional classifiers.
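
A minimal PyTorch sketch of a deep LSTM classifier over windows of multichannel EEG, with one output per imagined word or phrase. The window length, hidden size, number of layers, and dropout are illustrative assumptions; the abstract does not specify the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ImaginedSpeechLSTM(nn.Module):
    """Deep LSTM over EEG time steps, one logit per imagined word/phrase."""
    def __init__(self, n_channels=32, hidden=128, layers=2, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, num_layers=layers,
                            batch_first=True, dropout=0.3)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # classify from the last time step

# Example: a batch of 8 one-second windows sampled at 128 Hz from 32 channels
x = torch.randn(8, 128, 32)
model = ImaginedSpeechLSTM()
print(model(x).shape)                     # torch.Size([8, 5])
```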

Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.33-49
    • /
    • 2018
  • The success of a TV drama has a strong impact on ratings and on the promotional effectiveness of its channel, and its cultural and business impact has also been demonstrated through the Korean Wave. Therefore, early prediction of a blockbuster TV drama is very important from the strategic perspective of the media industry. Previous studies have tried to predict audience ratings and the success of dramas by various methods, but most have made simple predictions using intuitive factors such as the main actors and the time slot, which limits their predictive power. In this study, we propose a model for predicting the popularity of a drama by analyzing viewers' viewing patterns, grounded in existing theories. This is not only a theoretical contribution but also a practical one, since the model can be used by actual broadcasting companies. We collected data on 280 TV mini-series dramas broadcast over terrestrial channels for 10 years, from 2003 to 2012. From these data, we selected the most highly ranked and the least highly ranked 45 TV dramas and analyzed their viewing patterns in 11 steps. The assumptions and conditions for modeling are based on existing studies, on the opinions of actual broadcasters, and on data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) using Euclidean and correlation methods, which we term similarity (the sum of distances). Through this similarity measure, we predicted the success of dramas from viewers' initial viewing-time pattern distribution over episodes 1-5. To confirm whether the model is sensitive to the measurement method, various distance measures were applied and the robustness of the model was checked, and once the model was established, its predictive power was improved using a grid search. Furthermore, we classified viewers who had watched more than 70% of a newly broadcast drama's total airtime as "passionate viewers" and compared the percentage of passionate viewers between the most highly ranked and the least highly ranked dramas, so that the possibility of a blockbuster TV mini-series can be assessed. We find that the initial viewing-time pattern is the key factor for predicting blockbuster dramas: with the initial viewing-time pattern analysis, our model correctly classified blockbuster dramas with 75.47% accuracy. This paper shows a high prediction rate while suggesting an audience-measurement method different from existing ones. Broadcasters currently rely heavily on a few famous actors, the so-called star system, and face more severe competition than ever due to rising production costs, a long-term recession, aggressive investment by comprehensive programming channels, and large corporations, leaving everyone in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and the execution of advertising is based on audience ratings as a basic index. In the drama market, demand is difficult to forecast due to the nature of the product, while dramas contribute substantially to the financial success of a broadcaster's content; it is therefore important to minimize the risk of failure.
Thus, by analyzing the distribution of initial viewing time, the model can provide practical help in establishing a response strategy (programming/marketing/story changes, etc.) for the companies involved. We also find that audience behavior is crucial to the success of a program. In this paper, we treat viewing time as a measure of how enthusiastically a program is watched, and we can predict the success of a program by calculating the loyalty of these passionate viewers. This way of calculating loyalty can also be applied to other platforms, and it can be used for marketing efforts such as highlights, script previews, making-of films, characters, games, and other marketing projects.
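
A minimal sketch of the similarity measure described above: the sum of Euclidean or correlation distances between a new drama's early viewing-time pattern and the patterns of past hits and flops, with the smaller sum deciding the predicted class. The array shapes, the made-up example patterns, and the nearest-group rule are illustrative assumptions.

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def correlation_distance(a, b):
    return 1.0 - float(np.corrcoef(a, b)[0, 1])

def similarity_to_group(pattern, group, dist=euclidean):
    """Sum of distances from one viewing-time pattern to every pattern in a group
    (a smaller sum means more similar, as in the paper's 'similarity' definition)."""
    return sum(dist(pattern, g) for g in group)

def predict_hit(pattern, hit_patterns, flop_patterns, dist=euclidean):
    """Label a new drama a 'hit' if its early pattern is closer to past hits."""
    return (similarity_to_group(pattern, hit_patterns, dist)
            < similarity_to_group(pattern, flop_patterns, dist))

# Example: per-episode viewing-time shares over the first 5 episodes (made up)
hits = np.array([[0.30, 0.25, 0.20, 0.15, 0.10],
                 [0.28, 0.26, 0.21, 0.14, 0.11]])
flops = np.array([[0.45, 0.25, 0.15, 0.10, 0.05],
                  [0.50, 0.22, 0.13, 0.09, 0.06]])
new_drama = np.array([0.31, 0.24, 0.20, 0.16, 0.09])
print(predict_hit(new_drama, hits, flops))                      # True
print(predict_hit(new_drama, hits, flops, correlation_distance))
```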

Comparison of Wind Vectors Derived from GK2A with Aeolus/ALADIN (위성기반 GK2A의 대기운동벡터와 Aeolus/ALADIN 바람 비교)

  • Shin, Hyemin;Ahn, Myoung-Hwan;Kim, Jisoo;Lee, Sihye;Lee, Byung-Il
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.6_1
    • /
    • pp.1631-1645
    • /
    • 2021
  • This research aims to characterize wind data from the world's first active lidar sensor, the Atmospheric Laser Doppler Instrument (ALADIN), and Geostationary Korea Multi Purpose Satellite 2A (GK2A) Atmospheric Motion Vector (AMV) data by comparing the two wind datasets. Comparing data from September 2019 to August 1, 2020, the total number of collocated data points for the AMV (using the IR channel) and Mie-channel ALADIN data is 177,681, which gives a Root Mean Square Error (RMSE) of 3.73 m/s and a correlation coefficient of 0.98. In a more detailed comparison considering altitude and latitude, the Normalized Root Mean Squared Error (NRMSE) is 0.2-0.3 in most latitude bands. However, the upper and middle layers at low latitudes and the lower layer in the southern hemisphere show values larger than 0.4 at specific latitudes. These results are the same for the water vapor and visible channels regardless of season, and channel-specific and seasonal characteristics are not prominent. Furthermore, analysis of the cloud distribution in the latitude bands with large differences between the two wind datasets shows that cirrus and cumulus clouds, which can lower the accuracy of the AMV height assignment, are more prevalent there than at other latitude bands. Accordingly, it is suggested that ALADIN wind data in the southern hemisphere and at low latitudes, where the error of the AMV is large, can have a positive effect on numerical forecast models.
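
A minimal sketch of the collocation statistics quoted above (RMSE, correlation coefficient, and a normalized RMSE) computed over paired GK2A AMV and ALADIN wind speeds. The normalization by the mean reference speed is an assumption, since the abstract does not define how the NRMSE is normalized.

```python
import numpy as np

def wind_comparison_stats(amv, aladin):
    """RMSE, correlation, and NRMSE between collocated wind-speed pairs (m/s)."""
    amv, aladin = np.asarray(amv, float), np.asarray(aladin, float)
    diff = amv - aladin
    rmse = np.sqrt(np.mean(diff ** 2))
    corr = np.corrcoef(amv, aladin)[0, 1]
    nrmse = rmse / np.mean(aladin)        # normalization choice is an assumption
    return rmse, corr, nrmse

# Example with synthetic collocated wind speeds
rng = np.random.default_rng(0)
aladin = rng.uniform(5, 60, 1000)                 # reference winds (m/s)
amv = aladin + rng.normal(0, 3.7, 1000)           # AMV with ~3.7 m/s random error
print(wind_comparison_stats(amv, aladin))
```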

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, CNNs (Convolutional Neural Networks), known as an effective solution for recognizing and classifying images and voices, have been popularly applied to classification and prediction problems. In this study, we investigate how to apply CNNs in business problem solving. Specifically, this study proposes to apply a CNN to stock market prediction, one of the most challenging tasks in machine learning research. Since CNNs have strength in interpreting images, the model proposed in this study adopts a CNN as a binary classifier that predicts the stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG' (Convolutional Neural Network using Fluctuation Graph), consists of five steps. In the first step, it divides the dataset into intervals of 5 days. It then creates time series graphs for the divided dataset in step 2. The image in which each graph is drawn is 40 pixels × 40 pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices: each image is converted into a combination of three matrices to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets; we used 80% of the total dataset for training and the remaining 20% for validation. CNN classifiers are then trained on the training images in the final step. Regarding the parameters of CNN-FG, we adopted two convolution filters (5×5×6 and 5×5×9) in the convolution layers and a 2×2 max pooling filter in the pooling layer. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, the other for a downward trend). The activation functions for the convolution and hidden layers were ReLU (Rectified Linear Unit), and that for the output layer was the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of the KOSPI200 over 2,026 days in eight years (from 2009 to 2016). To balance the proportions of the two classes of the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset from 80% of the total dataset (1,560 samples) and the validation dataset from the remaining 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies, such as Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), and CCI (commodity channel index). To confirm the superiority of CNN-FG, we compared its prediction accuracy with those of other classification models.
Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective in terms of prediction accuracy. Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
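
A minimal PyTorch sketch following the CNN-FG parameters reported above: convolution filters of size 5×5 with 6 and 9 channels, 2×2 max pooling, hidden layers of 900 and 32 nodes, ReLU activations, and a 2-node output for 40×40 RGB fluctuation-graph images. Applying pooling after each convolution, using no padding, and the training details are assumptions not stated in the abstract.

```python
import torch
import torch.nn as nn

class CNNFG(nn.Module):
    """Sketch of the CNN-FG binary classifier for 40x40 RGB fluctuation graphs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(),    # 5x5x6 filters -> 6x36x36
            nn.MaxPool2d(2),                              # -> 6x18x18
            nn.Conv2d(6, 9, kernel_size=5), nn.ReLU(),    # 5x5x9 filters -> 9x14x14
            nn.MaxPool2d(2),                              # -> 9x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(9 * 7 * 7, 900), nn.ReLU(),         # hidden layer of 900 nodes
            nn.Linear(900, 32), nn.ReLU(),                # hidden layer of 32 nodes
            nn.Linear(32, 2),                             # upward / downward logits
        )

    def forward(self, x):                                 # x: (batch, 3, 40, 40)
        return self.classifier(self.features(x))

# Example: a batch of 16 fluctuation-graph images
model = CNNFG()
logits = model(torch.randn(16, 3, 40, 40))
print(logits.shape)                                       # torch.Size([16, 2])
# Training would use nn.CrossEntropyLoss(), which applies the softmax internally.
```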