• Title/Summary/Keyword: Multiple Noise

Search Results: 1,536

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.6
    • /
    • pp.536-543
    • /
    • 2023
  • Speaker diarization, which labels "who spoke when?" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods that can label overlapped speech and optimize the diarization model directly. Most deep neural network-based end-to-end speaker diarization systems treat diarization as a multi-label classification problem, predicting the labels of all speakers active in each frame of speech. However, the performance of multi-label-based models varies greatly depending on how the decision threshold is set. This paper studies a speaker diarization system based on single-label classification, so that diarization can be performed without a threshold. The proposed model converts the per-speaker labels into a single joint label and estimates that label from the model output. To account for speaker label permutations during training, the model is trained with a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss, as sketched below. In addition, the paper studies how to add residual connections to the model for effective training of diarization models with deep structures. The experiments used simulated noisy two-speaker mixtures generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method labels frames without a threshold and improves performance by about 20.7 %.
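
As a rough illustration of the objective described above, the following sketch (assuming PyTorch and a two-speaker setup) maps each frame's per-speaker binary activity to one of four joint classes and takes the minimum cross-entropy over speaker permutations; names like `pit_single_label_loss` are hypothetical, not from the paper.

```python
import itertools
import torch
import torch.nn.functional as F

def pit_single_label_loss(logits, activity):
    """logits: (T, 4) frame scores over joint classes; activity: (T, 2) binary speaker activity."""
    losses = []
    for perm in itertools.permutations(range(activity.shape[1])):
        permuted = activity[:, list(perm)]
        # joint-class label per frame: 0 = silence, 1 = speaker A only,
        # 2 = speaker B only, 3 = overlap
        labels = (permuted[:, 0] + 2 * permuted[:, 1]).long()
        losses.append(F.cross_entropy(logits, labels))
    return torch.stack(losses).min()

# toy usage: at inference the label is simply argmax(logits), so no
# per-speaker threshold is needed
logits = torch.randn(100, 4)
activity = torch.randint(0, 2, (100, 2)).float()
print(pit_single_label_loss(logits, activity))
```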

TAGS: Text Augmentation with Generation and Selection (생성-선정을 통한 텍스트 증강 프레임워크)

  • Kim Kyung Min;Dong Hwan Kim;Seongung Jo;Heung-Seon Oh;Myeong-Ha Hwang
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.10
    • /
    • pp.455-460
    • /
    • 2023
  • Text augmentation is a methodology that creates new augmented texts by transforming or generating original texts, with the aim of improving the performance of NLP models. However, existing text augmentation techniques have limitations such as a lack of expressive diversity, semantic distortion, and a limited number of augmented texts. Recent text augmentation using large language models and few-shot learning can overcome these limitations, but it carries a risk of noise due to incorrect generation. In this paper, we propose a text augmentation method called TAGS, which generates multiple candidate texts and then selects the appropriate ones as augmented texts. TAGS generates diverse expressions using few-shot learning, while contrastive learning and similarity comparison allow it to select suitable data even from a small amount of original text; a sketch of this selection step follows. Applied to task-oriented chatbot data, the method achieved a more than sixty-fold quantitative increase in data. Analysis of the generated texts confirmed that they were semantically and expressively more diverse than the originals. Moreover, a classification model trained and evaluated on the augmented texts improved performance by more than 0.1915, confirming that the method helps improve actual model performance.
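
A minimal sketch of the generate-then-select idea, assuming candidate texts have already been produced by few-shot generation; the `embed` function and the similarity band are illustrative placeholders, not the paper's actual components or values.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_augmentations(original, candidates, embed, lo=0.7, hi=0.95):
    """Keep candidates close enough in meaning (>= lo) but not near-duplicates (<= hi)."""
    ref = embed(original)
    kept = [(cosine(ref, embed(c)), c) for c in candidates]
    return [text for sim, text in sorted(kept, reverse=True) if lo <= sim <= hi]

# toy usage with a stand-in bag-of-characters "encoder"
embed = lambda s: np.array([s.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"], float)
print(select_augmentations("book a table", ["reserve a table", "book a flight"], embed))
```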

Study on Analysis of Queen Bee Sound Patterns (여왕벌 사운드 패턴 분석에 대한 연구)

  • Kim Joon Ho;Han Wook
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.5
    • /
    • pp.867-874
    • /
    • 2023
  • Recently, rapid climate change has been causing many problems in the bee ecosystem. The decline in the bee population and changes in the flowering period are having a huge impact on beekeepers' harvests. Since it is impossible to continuously observe the inside of beehives with the naked eye, most beekeepers rely on experience-based knowledge about the state of the hive. Interest has therefore turned to smart beekeeping incorporating IoT technology. In particular, regarding swarming, one of the most important events in beekeeping, it is known empirically that the swarming time can be judged from the sound of the queen bee, but there has been no way to analyze this systematically with data. One might think it would suffice to simply record and analyze the queen bee's sound, but this does not solve problems such as the various noise sources around the hive and the inability to record continuously. In this study, we developed a system that records queen bee sounds to a real-time cloud system and analyzes their sound patterns. After receiving real-time analog sound from the hive through multiple channels and converting it to digital, we discovered a sound pattern that appeared continuously in the queen bee's frequency band; a sketch of this band-monitoring step is given below. By accessing the cloud system, users can monitor sounds around the hive, temperature/humidity inside the hive, weight, and internal movement data. The system developed in this study made it possible to analyze the queen bee's sound patterns and learn about the situation inside the hive. This should make it possible to predict the swarming period of bees or provide information to control it.
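
A small sketch of the kind of band-energy monitoring the abstract describes, assuming digitized microphone samples and SciPy; the frequency band and threshold below are placeholders, not the study's measured values.

```python
import numpy as np
from scipy.signal import spectrogram

def queen_band_activity(samples, fs, band=(300.0, 550.0), ratio_threshold=3.0):
    """Return times where energy in an assumed queen-sound band stands out."""
    f, t, Sxx = spectrogram(samples, fs=fs, nperseg=2048)
    in_band = (f >= band[0]) & (f <= band[1])
    band_energy = Sxx[in_band].sum(axis=0)
    total_energy = Sxx.sum(axis=0) + 1e-12
    # flag windows where the band's share of energy dominates its median share
    share = band_energy / total_energy
    return t[share > ratio_threshold * np.median(share)]
```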

Indoor Exposure and Health Risk of Polycyclic Aromatic Hydrocarbons (PAHs) via Public Facilities PM2.5, Korea (II)

  • Kim, Ho-Hyun;Lee, Geon-Woo;Yang, Ji-Yeon;Jeon, Jun-Min;Lee, Woo-Seok;Lim, Jung-Yun;Lee, Han-Seul;Gwak, Yoon-Kyung;Shin, Dong-Chun;Lim, Young-Wook
    • Asian Journal of Atmospheric Environment
    • /
    • v.8 no.1
    • /
    • pp.35-47
    • /
    • 2014
  • The purpose of this study is to evaluate the pollution level (gaseous and particle phase) of PAHs, which are non-regulated materials, in public facilities, to forecast the risk level by health risk assessment (HRA), and to propose a guideline level. PAHs were assessed through sampling of particulate matter with a diameter < 2.5 µm (PM2.5). The user and worker exposure scenarios for PAHs consist of a 24-hour worst-case exposure scenario (WIES) and a normal exposure scenario (MIES) based on a survey. This study investigated 20 PAH substances selected out of 32 substances known to be carcinogenic or potentially carcinogenic. The risk assessment applies toxic equivalency factors (TEFs) proposed in existing studies and estimates the individual excess cancer risk (ECR), as sketched below. The study assessed fine dust (PM2.5) and the exposure levels of gaseous and particle-phase PAHs at 6 spots in each of 8 facility types: underground subway stations, child-care facilities, elderly care facilities, supermarkets, indoor parking lots, terminal waiting rooms, internet cafés (PC-rooms), and movie theaters. Internet cafés (PC-rooms) in particular marked the highest PM2.5 concentration, with an average over 10 spots (2 spots per café) of 73.3 µg/m³ (range: 6.8-185.2 µg/m³). The high PM2.5 levels seen in internet cafés were likely due to indoor smoking in most cases. For gaseous PAHs, the detection frequency was high for 4-5-ring compounds and low for 6-ring compounds; for particle-phase PAHs, it was low for 2-3-ring compounds and high for 6-ring compounds. As a result, the most important PAHs identified in the study of Kim et al. (2013) and this year's study are naphthalene, acenaphthene, and phenanthrene. The health risk assessment shows that each facility is at a level of 10⁻⁶-10⁻⁴. Considering existing standards and local pollution sources, it is judged that benzo(a)pyrene, one of the PAHs, should be managed within the range of 0.5-1.2 ng/m³. Smoking and ventilation were considered the most important factors in PAH exposure associated with public facility PM2.5. This study only estimated the inhalation health risk of PAHs and focused on the associated cancer risk; multiple measurements would be necessary for public health and policy.
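
A hedged sketch of the TEF-weighted ECR calculation: concentrations are converted to a benzo(a)pyrene-equivalent value and multiplied by an inhalation unit risk. The TEFs and unit risk below are commonly cited literature values, not necessarily the ones applied in this paper.

```python
# Nisbet & LaGoy-style TEFs (assumed here, partial list)
TEF = {"benzo(a)pyrene": 1.0, "naphthalene": 0.001, "phenanthrene": 0.001, "acenaphthene": 0.001}
UNIT_RISK_PER_NG_M3 = 8.7e-5  # WHO inhalation unit risk for BaP (assumed here)

def excess_cancer_risk(concentrations_ng_m3):
    """concentrations_ng_m3: dict of PAH name -> measured concentration in ng/m3."""
    bap_eq = sum(c * TEF.get(name, 0.0) for name, c in concentrations_ng_m3.items())
    return bap_eq * UNIT_RISK_PER_NG_M3

# e.g. a facility measuring 1.0 ng/m3 of benzo(a)pyrene alone gives ~8.7e-5,
# which falls in the 1e-6 to 1e-4 band reported above
print(excess_cancer_risk({"benzo(a)pyrene": 1.0}))
```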

Imaging Characteristics of Computed Radiography Systems (CR 시스템의 종류와 I.P 크기에 따른 정량적 영상특성평가)

  • Jung, Ji-Young;Park, Hye-Suk;Cho, Hyo-Min;Lee, Chang-Lae;Nam, So-Ra;Lee, Young-Jin;Kim, Hee-Joung
    • Progress in Medical Physics
    • /
    • v.19 no.1
    • /
    • pp.63-72
    • /
    • 2008
  • With the recent advancement of medical imaging systems and picture archiving and communication systems (PACS), the installation of digital radiography has accelerated over the past few years. Computed radiography (CR), which established the foundation of digital x-ray imaging at low cost, is widely used in clinical applications. This study analyzes the imaging characteristics of two systems with different pixel sizes through the modulation transfer function (MTF), noise power spectrum (NPS), and detective quantum efficiency (DQE); how these three quantities combine is sketched below. In addition, the influence of radiation dose on the imaging characteristics was measured by quantitative assessment. A standard beam quality, RQA5, based on the International Electrotechnical Commission (IEC) standard, was used for the x-ray imaging studies. The spatial resolution at 10% MTF for the Agfa CR system with I.P sizes of 8×10 inches and 14×17 inches was measured as 3.9 cycles/mm and 2.8 cycles/mm, respectively; for the Fuji CR system it was 3.4 cycles/mm and 3.2 cycles/mm, respectively. There was a difference in spatial resolution for the 14×17 inch plates, although radiation dose did not affect the MTF. The NPS of the Agfa CR system shows similar results for the different pixel sizes (100 µm for the 8×10 inch I.P and 150 µm for the 14×17 inch I.P). For both systems, NPS improved with increased radiation dose due to the increased number of photons. The DQE of the Agfa CR system for the 8×10 inch and 14×17 inch I.P was 11% and 8.8% at 1.5 cycles/mm, respectively. Both systems showed that a higher radiation dose leads to lower DQE. Measuring DQE across multiple imaging characteristics plays a very important role in determining equipment efficiency and reducing radiation dose to patients. In conclusion, the results of this study could serve as a baseline for optimizing imaging systems and their imaging characteristics by measuring MTF, NPS, and DQE at different radiation dose levels.
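
The three measured quantities combine in a standard way: DQE(f) is commonly estimated from the presampled MTF, the normalized NPS, and the incident photon fluence q. The sketch below uses that textbook relation; the numbers in the usage comment are illustrative, not the paper's measurements.

```python
import numpy as np

def dqe(mtf, nnps, q):
    """mtf: presampled MTF values; nnps: normalized NPS (mm^2);
    q: incident photon fluence (photons/mm^2). All at the same frequencies."""
    return np.asarray(mtf) ** 2 / (q * np.asarray(nnps))

# e.g. an MTF of 0.5 at 1.5 cycles/mm with NNPS = 2.5e-5 mm^2 and
# q = 1.0e5 photons/mm^2 gives DQE = 0.25 / 2.5 = 0.10, i.e. 10%
print(dqe(0.5, 2.5e-5, 1.0e5))
```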


The Optimal Turbo Coded V-BLAST Technique in the Adaptive Modulation System corresponding to each MIMO Scheme (적응 변조 시스템에서 각 MIMO 기법에 따른 최적의 터보 부호화된 V-BLAST 기법)

  • Lee, Kyung-Hwan;Ryoo, Sang-Jin;Choi, Kwang-Wook;You, Cheol-Woo;Hong, Dae-Ki;Kim, Dae-Jin;Hwang, In-Tae;Kim, Cheol-Sung
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.44 no.6 s.360
    • /
    • pp.40-47
    • /
    • 2007
  • In this paper, we propose and analyze an adaptive modulation system with an optimal turbo coded V-BLAST (Vertical Bell Labs Layered Space-Time) technique that adopts the extrinsic information from the MAP (Maximum A Posteriori) decoder with iterative decoding as a priori probability in the two decoding procedures of V-BLAST: the ordering and the slicing (a conventional form of these two steps is sketched below). For comparison, we also consider an adaptive modulation system using a conventional turbo coded V-BLAST technique that simply combines V-BLAST with turbo coding, and one decoded by the ML (Maximum Likelihood) decoding algorithm, observing throughput performance and complexity. The comparison shows that the complexity of the proposed decoding algorithm is lower than that of ML decoding but higher than that of conventional V-BLAST decoding. However, the proposed system achieves better throughput than the conventional system over the whole SNR (Signal to Noise Ratio) range, and a throughput close to that of the ML-decoded system. Specifically, simulation shows that the maximum throughput improvement for each MIMO scheme is about 350 kbps, 460 kbps, and 740 kbps, respectively, compared to the conventional system, and that the benefit of the proposed decoding algorithm grows as the number of antennas increases.
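
For reference, a bare-bones zero-forcing version of the two V-BLAST steps the paper modifies (ordering and slicing), without the turbo extrinsic information; this is a conventional OSIC detector, not the proposed decoder.

```python
import numpy as np

def vblast_zf_osic(H, y, constellation):
    """Zero-forcing V-BLAST: ordering by noise amplification, slicing, cancellation."""
    nt = H.shape[1]
    detected = np.zeros(nt, dtype=complex)
    remaining = list(range(nt))
    y = y.astype(complex).copy()
    while remaining:
        G = np.linalg.pinv(H[:, remaining])
        # ordering: choose the layer with the least noise amplification
        k = int(np.argmin(np.sum(np.abs(G) ** 2, axis=1)))
        z = G[k] @ y
        # slicing: hard decision to the nearest constellation point
        s = constellation[np.argmin(np.abs(constellation - z))]
        idx = remaining.pop(k)
        detected[idx] = s
        y -= H[:, idx] * s  # cancel the detected layer
    return detected

# toy 2x2 QPSK usage
rng = np.random.default_rng(1)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
sent = qpsk[rng.integers(0, 4, 2)]
print(vblast_zf_osic(H, H @ sent, qpsk))
```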

The Intelligent Determination Model of Audience Emotion for Implementing Personalized Exhibition (개인화 전시 서비스 구현을 위한 지능형 관객 감정 판단 모형)

  • Jung, Min-Kyu;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.39-57
    • /
    • 2012
  • Recently, with the introduction of high-tech equipment, much attention has been focused on interactive exhibits that can double the exhibition effect through interaction with the audience. Interactive exhibitions also make it possible to measure a variety of audience reactions. Among these, this research uses changes in facial features that can be collected in an interactive exhibition space. We develop an artificial neural network-based prediction model that predicts the audience's response by measuring the change in facial features when the audience is stimulated from a non-excited state. To represent the audience's emotional state, the research uses a valence-arousal model (a sketch of such a network follows the abstract). The overall framework consists of six steps. The first step collects data for modeling; the data were collected from participants in the 2012 Seoul DMC Culture Open and used for the experiments. The second step extracts 64 facial features from the collected data and compensates the feature values. The third step generates the independent and dependent variables of the artificial neural network model. The fourth step uses statistical techniques to extract the independent variables that affect the dependent variable. The fifth step builds the artificial neural network model and performs learning with the train and test sets. The sixth step validates the prediction performance of the model on the validation set. The proposed model was compared with a statistical prediction model; although the data set contained much noise, the proposed model showed better results than a multiple regression model. If this prediction model were used in a real exhibition, it could provide countermeasures and services appropriate to the audience's reaction to the exhibits. For example, if the audience's arousal toward an exhibit is low, actions to increase arousal could be taken, such as recommending other preferred contents or using light or sound to draw attention to the exhibit. In other words, when planning future exhibitions, it would be possible to design them to satisfy various audience preferences and foster a personalized environment for engaging with the exhibits. However, the proposed model still shows low prediction accuracy, for two reasons. First, the data cover diverse visitors to real exhibitions, so it was difficult to control an optimized experimental environment; the collected data therefore contain much noise, lowering accuracy. Further research will collect data in a more controlled experimental environment and work to increase the model's accuracy. Second, changes of facial expression alone appear insufficient to extract audience emotions; combining facial expression with other responses, such as sound or audience behavior, would likely yield better results.
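
A hypothetical sketch of the kind of network described: a small PyTorch MLP regressing valence and arousal from a 64-dimensional facial-feature vector (the feature count follows the abstract; everything else is illustrative).

```python
import torch
import torch.nn as nn

class ValenceArousalNet(nn.Module):
    def __init__(self, n_features=64, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # outputs: (valence, arousal)
        )

    def forward(self, x):
        return self.net(x)

model = ValenceArousalNet()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# one toy training step on random stand-in data
x, y = torch.randn(16, 64), torch.randn(16, 2)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```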

Seismic wave propagation through surface basalts - implications for coal seismic surveys (지표 현무암을 통해 전파하는 탄성파의 거동 - 석탄 탄성파탐사에 적용)

  • Sun, Weijia;Zhou, Binzhong;Hatherly, Peter;Fu, Li-Yun
    • Geophysics and Geophysical Exploration
    • /
    • v.13 no.1
    • /
    • pp.1-8
    • /
    • 2010
  • Seismic reflection surveying is one of the most widely used and effective techniques for delineating coal seam structure and mitigating risk in underground longwall mining. However, the method can be compromised by the presence of volcanic cover. This problem arises within parts of the Bowen and Sydney Basins of Australia, where seismic surveying can be unsuccessful; as a consequence, such areas are less attractive for coal mining, and techniques to improve the success of seismic surveying over basalt flows are needed. In this paper, we use elastic wave-equation-based forward modelling (a toy acoustic analogue is sketched below) to investigate the effects and characteristics of seismic wave propagation under different settings involving changes in basalt properties, thickness, lateral extent, position relative to the shot, and various forms of inhomogeneity. The modelling results suggest that: 1) basalts with high impedance contrasts and multiple flows generate strong multiples and weak reflections; 2) thin basalts have less effect than thick basalts; 3) partial basalt cover has less effect than full basalt cover; 4) low-frequency seismic waves (especially at large offsets) penetrate basalt better than high-frequency waves; and 5) the deeper the coal seams lie below basalts of limited extent, the less influence the basalts have on wave propagation. In addition to providing insight into the issues that arise when surveying under basalts, these observations suggest that careful management of seismic noise and the acquisition of long-offset seismic data with low-frequency geophones have the potential to improve seismic results.
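
A toy 1-D acoustic finite-difference analogue of the elastic forward modelling used in the paper: a pulse propagates through a velocity model containing a high-velocity "basalt" layer whose impedance contrast generates strong reverberations. Grid parameters and velocities are illustrative only.

```python
import numpy as np

nz, dz, dt, nt = 600, 2.0, 2.0e-4, 4000
v = np.full(nz, 2200.0)          # sediments (m/s)
v[100:140] = 5500.0              # high-velocity "basalt" flow near the surface
v[400:] = 2800.0                 # deeper coal-bearing section

p_prev, p, seis = np.zeros(nz), np.zeros(nz), np.zeros(nt)
src, rec = 50, 50
for it in range(nt):
    t = it * dt
    a = (np.pi * 30.0 * (t - 0.04)) ** 2  # Ricker-like wavelet, ~30 Hz
    lap = np.zeros(nz)
    lap[1:-1] = (p[2:] - 2 * p[1:-1] + p[:-2]) / dz ** 2
    p_next = 2 * p - p_prev + (v * dt) ** 2 * lap  # second-order update
    p_next[src] += (1 - 2 * a) * np.exp(-a)        # inject source
    p_prev, p = p, p_next
    seis[it] = p[rec]  # seis records the shot gather trace at the receiver
```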

Diagnostic testing for Duchenne/Becker Muscular dystrophy using Dual Priming Oligonucleotide (DPO) system (Dual Priming Oligonucleotide (DPO) system을 이용한 듀시엔/베커형 근이영양증 진단법)

  • Kim, Joo-Hyun;Kim, Gu-Hwan;Lee, Jin-Joo;Lee, Dae-Hoon;Kim, Jong-Kee;Yoo, Han-Wook
    • Journal of Genetic Medicine
    • /
    • v.5 no.1
    • /
    • pp.15-20
    • /
    • 2008
  • Purpose: Large exon deletions in the DMD gene are found in about 60% of DMD/BMD patients. Multiplex PCR has been employed to detect these deletion mutations, but it frequently generates spurious PCR products due to the presence of multiple primers in a single reaction and the stringency of PCR conditions, often leading to false-negative or false-positive results. To address this issue, we introduced the dual priming oligonucleotide (DPO) system. A DPO contains two separate priming regions joined by a polydeoxyinosine linker, which yields high PCR specificity even under suboptimal PCR conditions. Methods: Using DPO-multiplex PCR, we tested 50 healthy male controls, 50 patients with deletion mutations as deletion-positive patient controls, and 20 patients without deletions as deletion-negative patient controls. Both the presence and extent of deletion were verified by simplex PCR spanning the promoter region (PM) and 18 exons, including exons 3, 4, 6, 8, 12, 13, 17, 19, 43-48, 50-52, and 60, in all 120 controls. Results: DPO-multiplex PCR showed 100% sensitivity and specificity for detecting a deletion, and 97.1% sensitivity and 100% specificity for determining the extent of deletions (the calculation is sketched below). Conclusion: The DPO-multiplex PCR method is a useful molecular test for detecting large DMD deletions in the diagnosis of DMD/BMD patients because it is easy to perform, fast, and cost-effective, with excellent sensitivity and specificity.
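
The reported figures follow from the usual definitions of sensitivity and specificity against the simplex-PCR reference; the counts in the example below are illustrative placeholders consistent with the cohort sizes above, not the study's raw tabulation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Standard definitions against a reference test."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# e.g. 50 deletion-positive patients all detected (tp=50, fn=0) and all
# 70 deletion-negative subjects correctly negative (tn=70, fp=0)
# reproduces the 100%/100% detection result above
print(sensitivity_specificity(tp=50, fn=0, tn=70, fp=0))
```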


Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.99-112
    • /
    • 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms: it finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in machine learning and artificial intelligence because of its remarkable performance improvements and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In that literature, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensembles have not shown comparably remarkable gains. Recently, several works have reported that ensemble performance can degrade when the multiple classifiers of an ensemble are highly correlated with one another, resulting in a multicollinearity problem, and have proposed differentiated learning strategies to cope with it. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that it contain diverse classifiers. Breiman (1996) found that ensemble learning can increase the performance of unstable learning algorithms, but shows no remarkable improvement on stable ones. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers; ensembles of unstable learners therefore guarantee some diversity among the classifiers. By contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, so the correlation among the resulting classifiers is very high. This high correlation produces the multicollinearity problem, which degrades ensemble performance. Kim's work (2009) compared traditional prediction algorithms such as NN, DT, and SVM for bankruptcy prediction on Korean firms. It reports that the stable algorithms NN and SVM have higher predictability than the unstable DT, whereas, with respect to ensemble learning, the DT ensemble shows more improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically shows that the performance degradation of the ensemble is due to multicollinearity, and proposes that ensemble optimization is needed to cope with it. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) to improve NN ensemble performance. Coverage optimization is a technique of choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of the classifiers. CO-NN uses a genetic algorithm (GA), widely used for optimization problems, to handle the coverage optimization: the GA chromosomes are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the standard measures of multicollinearity, is added to ensure classifier diversity by removing high correlation among the classifiers (a conceptual sketch of this GA follows). The implementation used Microsoft Excel and the GA software package Evolver. Experiments on company failure prediction show that CO-NN stably enhances the performance of NN ensembles through its choice of classifiers based on the correlations within the ensemble: classifiers with potential multicollinearity problems are removed by the coverage optimization process, and CO-NN outperforms a single NN classifier and the NN ensemble at the 1% significance level, and the DT ensemble at the 5% significance level. Further research issues remain: first, a decision optimization process to find the optimal combination function should be considered; second, various learning strategies to deal with data noise should be introduced in more advanced future research.
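
A conceptual sketch of the CO-NN selection loop under stated assumptions: a binary chromosome picks a sub-ensemble, fitness rewards majority-vote accuracy, and a VIF check penalizes highly correlated members. This bare-bones, mutation-only GA stands in for the Evolver package named above; all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def vif_max(preds):
    """Largest variance inflation factor among the selected classifiers' outputs."""
    vifs = []
    for j in range(preds.shape[1]):
        y, X = preds[:, j], np.delete(preds, j, axis=1)
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        r2 = 1 - (y - X1 @ beta).var() / max(y.var(), 1e-12)
        vifs.append(1.0 / max(1 - r2, 1e-9))
    return max(vifs)

def fitness(mask, preds, labels, vif_limit=10.0):
    if mask.sum() < 2:
        return 0.0
    sel = preds[:, mask.astype(bool)]
    acc = ((sel.mean(axis=1) > 0.5).astype(int) == labels).mean()
    # VIF constraint: penalize sub-ensembles with multicollinear members
    return acc if vif_max(sel) <= vif_limit else acc - 1.0

def ga_select(preds, labels, pop=30, gens=50, pmut=0.1):
    """preds: (n_samples, n_classifiers) predictions; returns best binary mask."""
    population = rng.integers(0, 2, size=(pop, preds.shape[1]))
    for _ in range(gens):
        scores = np.array([fitness(m, preds, labels) for m in population])
        parents = population[np.argsort(scores)[-pop // 2:]]  # keep top half
        children = parents[rng.integers(0, len(parents), pop - len(parents))].copy()
        flips = rng.random(children.shape) < pmut  # bit-flip mutation
        children[flips] = 1 - children[flips]
        population = np.vstack([parents, children])
    scores = np.array([fitness(m, preds, labels) for m in population])
    return population[np.argmax(scores)]
```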