• Title/Abstract/Keyword: evaluation dataset

483 search results

Residual Convolutional Recurrent Neural Network-Based Sound Event Classification Applicable to Broadcast Captioning Services (자막방송을 위한 잔차 합성곱 순환 신경망 기반 음향 사건 분류)

  • Kim, Nam Kyun;Kim, Hong Kook;Ahn, Chung Hyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2021.06a / pp.26-27 / 2021
  • This paper proposes a sound event classification method based on a residual convolutional recurrent neural network as a way of understanding broadcast content for closed captioning services. The proposed method connects a residual convolutional neural network (CNN) to a recurrent neural network (RNN). Mel-filterbank features serve as the network input, and the residual CNN consists of one stem block and five residual CNN blocks. Each residual block combines convolution layers trained with residual learning and a convolutional block attention module, which improves the representational power of the feature maps over a conventional CNN. The extracted feature maps are passed to the RNN and finally to a fully connected layer that outputs the sound event class and its timing information. The model was trained with a mean-teacher approach, which can exploit unlabeled data. Evaluated on the DCASE 2020 Challenge Task 4 dataset, the proposed model achieved an event-based F1-score of 46.8%.

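The residual blocks described above pair residual learning with a convolutional block attention module (CBAM). Below is a minimal numpy sketch of CBAM-style channel attention only, with illustrative shapes and a random two-layer MLP; it is not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention for a (C, H, W) feature map.

    Average- and max-pooled channel descriptors are passed through a
    shared two-layer MLP (w1, w2), summed, and squashed into per-channel
    weights in (0, 1) that rescale the feature map.
    """
    avg = fmap.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = fmap.max(axis=(1, 2))            # (C,) max-pooled descriptor
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return fmap * att[:, None, None]      # broadcast weights over H, W

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4                         # illustrative sizes
fmap = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C))     # channel-reduction layer
w2 = rng.standard_normal((C, C // 2))     # channel-expansion layer
out = channel_attention(fmap, w1, w2)
```

Because the attention weights lie strictly in (0, 1), the module can only attenuate channels; the residual shortcut around the block preserves the original signal.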

A Probing Task on Linguistic Properties of Korean Sentence Embedding (한국어 문장 임베딩의 언어적 속성 입증 평가)

  • Ahn, Aelim;Ko, ByeongiI;Lee, Daniel;Han, Gyoungeun;Shin, Myeongcheol;Nam, Jeesun
    • Annual Conference on Human and Language Technology / 2021.10a / pp.161-166 / 2021
  • This study introduces probing tasks for evaluating the linguistic properties captured in Korean sentence embeddings. A probing task is the problem of classifying surface, syntactic, and semantic properties of a sentence from its embedding. We review probing tasks applied to English, Polish, and Russian sentences and, building on them, design probing tasks for Korean sentence embeddings that reflect the properties of Korean sentences. In addition to six language-independent probing tasks, we devise three tasks targeting key features of Korean, namely subject omission (SubjOmission), negation (Negation), and honorifics (Honorifics), for a total of nine probing tasks. The dataset for each task was built automatically after converting the Sejong syntactically parsed corpus into Universal Dependency Grammar structures. Comparing the embeddings obtained from four multilingual sentence encoders and four Korean sentence encoders released on HuggingFace through the probing tasks, the multilingual encoder mBART showed generally high performance across the nine tasks. We also found that Korean sentence embeddings capture deeper semantic properties better than surface or syntactic ones.

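The probing setup above trains a simple classifier to read a linguistic property off frozen sentence embeddings. A minimal sketch with synthetic stand-in "embeddings" and a hand-rolled logistic-regression probe; all data, dimensions, and hyperparameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 16                                     # embedding dimension (illustrative)
# Synthetic "sentence embeddings": two clusters standing in for a binary
# linguistic property (e.g. negated vs. non-negated sentences).
X0 = rng.standard_normal((100, d)) - 1.0
X1 = rng.standard_normal((100, d)) + 1.0
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Logistic-regression probe trained by plain gradient descent; if the
# probe classifies well, the embeddings encode the property.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

acc = np.mean(((X @ w + b) > 0) == y)      # probe accuracy on the property
```

In the study itself, one such probe is trained per task (nine tasks) and per encoder, and the accuracies are compared across encoders.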

Sound Event Classification Based on Concatenated Residual Network Applicable to Closed Captioning Services for the Hearing Impaired (청각장애인용 자막방송 서비스를 위한 연쇄잔차 신경망 기반 음향 사건 분류 기법)

  • Kim, Nam Kyun;Park, Dong Keun;Kim, Jun Ho;Kim, Hong Kook;Ahn, Chung Hyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.472-475 / 2020
  • This paper proposes a method for classifying the sound events that appear in audio content, in order to provide closed captions for the hearing impaired. The proposed method uses a concatenated residual network structure that joins multiple residual networks (ResNets). As network inputs, a 2-D image is formed by stacking mel-frequency cepstral vectors over multiple frames, and a 1-D statistical feature vector is obtained from the mel-frequency cepstral vectors of all frames. Each input is modeled by a 2-D ResNet and a 1-D ResNet, respectively, and the two networks are joined by concatenation into a single concatenated residual network. Evaluated by 6-fold cross-validation on a collected dataset, the method achieved a classification accuracy of 85.48%.

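The two-branch design above can be sketched as follows; the feature sizes and the random linear maps standing in for the 2-D and 1-D ResNet branches are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_mfcc = 40, 13                  # illustrative sizes

# 2-D input: MFCC vectors stacked over frames, treated as an image.
mfcc_image = rng.standard_normal((n_frames, n_mfcc))
# 1-D input: per-band statistics of the MFCC vectors over all frames.
stats_vec = np.concatenate([mfcc_image.mean(axis=0), mfcc_image.std(axis=0)])

# Stand-ins for the two ResNet branches: each maps its input to an
# embedding; random linear maps are used here for illustration only.
embed_2d = rng.standard_normal((32, mfcc_image.size)) @ mfcc_image.ravel()
embed_1d = rng.standard_normal((32, stats_vec.size)) @ stats_vec

# Concatenation joins the two branches ahead of the final classifier.
joint = np.concatenate([embed_2d, embed_1d])
```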

Generation Methodology Using Super In-Context Learning (Super In-Context Learning을 활용한 생성 방법론)

  • Seongtae Hong;Seungjun Lee;Gyeongmin Kim;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.382-387 / 2023
  • Large language models such as GPT-4 currently show overwhelming performance on a variety of tasks such as machine translation, summarization, and dialogue. However, these models have several problems, including the substantial computational resources required for training and deployment and the difficulty of domain-specific fine-tuning. In-context learning partially solves these problems by operating effectively using only the information in contexts extracted from a dataset, but it is sensitive to the number and order of shots in the context. To address these challenges, we propose a new methodology based on Super In-Context Learning (SuperICL). Existing SuperICL constructs a new context using the outputs of a plug-in model and uses it to help the large language model classify better. Super In-Context Learning for Generation provides a way to optimize effectively for a variety of natural language generation tasks. Experiments confirm that the method can adapt to different tasks by swapping the plug-in model and show strong performance on natural language generation. Evaluations including BLEU and ROUGE metrics show performance gains, and a preference evaluation confirms the model's effectiveness.

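The SuperICL idea of enriching the context with a plug-in model's outputs can be sketched as prompt construction. The prompt format and the keyword-rule "plug-in model" below are invented for illustration; a real system would use a fine-tuned small model:

```python
def build_supericl_prompt(examples, plugin_predict, query):
    """Compose a SuperICL-style context: each demonstration carries the
    plug-in model's prediction alongside the gold answer, and the query
    is annotated with the plug-in's prediction for the LLM to revise."""
    lines = []
    for text, gold in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Plug-in prediction: {plugin_predict(text)}")
        lines.append(f"Answer: {gold}")
    lines.append(f"Input: {query}")
    lines.append(f"Plug-in prediction: {plugin_predict(query)}")
    lines.append("Answer:")
    return "\n".join(lines)

# Toy plug-in "model": a keyword rule standing in for a fine-tuned classifier.
plugin = lambda text: "positive" if "good" in text else "negative"

prompt = build_supericl_prompt(
    [("The movie was good.", "positive"), ("A dull plot.", "negative")],
    plugin,
    "Surprisingly good acting.",
)
```

Swapping `plugin_predict` for a model fine-tuned on another task is what lets the same scheme adapt across tasks, as the abstract describes.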

Development of a Deep Learning Network for Quality Inspection in a Multi-Camera Inline Inspection System for Pharmaceutical Containers (의약 용기의 다중 카메라 인라인 검사 시스템에서의 품질 검사를 위한 딥러닝 네트워크 개발)

  • Tae-Yoon Lee;Seok-Moon Yoon;Seung-Ho Lee
    • Journal of IKEEE / v.28 no.3 / pp.474-478 / 2024
  • In this paper, we propose a deep learning network for quality inspection in a multi-camera inline inspection system for pharmaceutical containers. The proposed network is designed specifically for pharmaceutical containers using data produced in real manufacturing environments, leading to more accurate quality inspection, and its inline-capable design allows the inspection speed to be increased. Development proceeds in three steps. First, a dataset of approximately 10,000 images is constructed at the production site using one line-scan camera for foreign-substance inspection and three area-scan cameras for dimensional inspection. Second, the pharmaceutical container data are preprocessed by designating regions of interest (ROIs) in areas where defects are likely to occur, tailored to the foreign-substance and dimensional inspections. Third, the preprocessed data are used to train the deep learning network. The network improves inference speed by reducing the number of channels and eliminating linear layers, while accuracy is enhanced by applying PReLU and residual learning. This yields four deep learning modules, one per camera-specific dataset. The performance of the proposed network was evaluated through experiments conducted by a certified testing agency. The modules achieved a classification accuracy of 99.4%, exceeding the world-class level of 95%, and an average classification speed of 0.947 seconds, better than the world-class level of 1 second. Therefore, the effectiveness of the proposed deep learning network for quality inspection in a multi-camera inline inspection system for pharmaceutical containers has been demonstrated.
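Among the speed and accuracy choices above are PReLU activations and residual learning. A minimal numpy sketch of a residual block with PReLU follows; the layer sizes and plain matrix multiplications are illustrative, not the deployed network:

```python
import numpy as np

def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for x > 0, alpha-scaled for x <= 0,
    so negative activations keep a small learnable gradient."""
    return np.where(x > 0, x, alpha * x)

def residual_block(x, w1, w2, alpha=0.25):
    """y = x + W2 * PReLU(W1 * x): the identity shortcut means the
    weighted layers only need to model a correction to the input."""
    return x + w2 @ prelu(w1 @ x, alpha)

rng = np.random.default_rng(1)
d = 16                                     # illustrative feature width
x = rng.standard_normal(d)
w1 = rng.standard_normal((d, d)) * 0.1
w2 = rng.standard_normal((d, d)) * 0.1
y = residual_block(x, w1, w2)
```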

Automated Areal Feature Matching in Different Spatial Data-sets (이종의 공간 데이터 셋의 면 객체 자동 매칭 방법)

  • Kim, Ji Young;Lee, Jae Bin
    • Journal of Korean Society for Geospatial Information Science / v.24 no.1 / pp.89-98 / 2016
  • In this paper, we propose an automated areal feature matching method based on geometric similarity that requires no user intervention and applies to areal features in many-to-many relations, for the conflation of spatial data-sets with different scales and update cycles. First, areal features (nodes) whose inclusion-function value exceeds 0.4 are connected as edges in an adjacency matrix, and candidate corresponding areal features, including those in many-to-many relations, are identified by multiplying the adjacency matrix. For geometric matching, these multiple candidate features are aggregated into a single polygon as a convex hull generated by a curve-fitting algorithm. Second, we define matching criteria that measure geometric quality and normalize them into similarities through a similarity function. Shape similarity is then defined as a weighted linear combination of these similarities, with weights calculated by the Criteria Importance Through Intercriteria Correlation (CRITIC) method. Finally, on training data, we take as the decision threshold the Equal Error Rate (EER), the trade-off point in a plot of precision versus recall over all threshold values (PR curve), and decide whether each candidate pair is a corresponding pair. Applying the proposed method to a digital topographic map and a base map of the address system (KAIS), visual evaluation showed that some many-to-many areal features were mis-detected, while statistical evaluation gave high precision, recall, and F-measure of 0.951, 0.906, and 0.928, respectively. This indicates that the proposed method matches different spatial data-sets with high accuracy. However, future work should study the inclusion function and more detailed matching criteria to quantify many-to-many areal features exactly.
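The candidate-identification step above can be sketched with a small numpy example; the inclusion values are invented, and only the adjacency-matrix multiplication that surfaces many-to-many groups is shown:

```python
import numpy as np

# Inclusion-function values between areal features of data-set A (rows)
# and data-set B (columns); the numbers are invented for illustration.
inclusion = np.array([
    [0.9, 0.5, 0.0],
    [0.0, 0.6, 0.0],
    [0.0, 0.0, 0.8],
])
edge = inclusion > 0.4                     # connect nodes whose value > 0.4

# Adjacency matrix over all nodes (A-features first, then B-features).
nA, nB = edge.shape
adj = np.zeros((nA + nB, nA + nB), dtype=int)
adj[:nA, nA:] = edge
adj[nA:, :nA] = edge.T

# Squaring the adjacency matrix links features that share a counterpart:
# A0 and A1 both overlap B1, so they form one many-to-many candidate group.
reach2 = adj @ adj
grouped = reach2[0, 1] > 0                 # A0 reaches A1 through B1
```

In the method itself, each such candidate group is then merged into a convex-hull polygon before the CRITIC-weighted shape similarity is computed.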

Crack detection in concrete using deep learning for underground facility safety inspection (지하시설물 안전점검을 위한 딥러닝 기반 콘크리트 균열 검출)

  • Eui-Ik Jeon;Impyeong Lee;Donggyou Kim
    • Journal of Korean Tunnelling and Underground Space Association / v.25 no.6 / pp.555-567 / 2023
  • Cracks in tunnels are currently identified through visual inspections by inspectors, based on images acquired with tunnel image acquisition systems. This labor-intensive approach has inherent limitations, as it depends on the inspectors' subjective judgment. Recently, research efforts have actively explored the use of deep learning to detect tunnel cracks automatically. However, most studies use public datasets or lack sufficient objectivity in the analysis process, making the results difficult to apply in practical operations. In this study, we selected test datasets consisting of images in the same format as those obtained from the actual inspection system, in order to evaluate deep learning models objectively. Additionally, we introduced ensemble techniques to complement the strengths and weaknesses of the individual models, thereby improving the accuracy of crack detection. As a result, we achieved high recall rates of 80%, 88%, and 89% for cracks of 0.2 mm, 0.3 mm, and 0.5 mm, respectively, in the test images. The deep learning detections also included numerous cracks that the inspectors could not find. If cracks can be detected with sufficient accuracy in a more objective evaluation using images from other tunnels not used in this study, we judge that deep learning can be introduced into facility safety inspection.
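The ensemble idea above, combining several models so their strengths cover each other's misses, can be sketched as a union over detections; the crack IDs and model outputs are hypothetical:

```python
def ensemble_detections(model_outputs):
    """Union ensemble: a crack is reported if any model detects it, so
    models with different strengths complement each other's misses."""
    detected = set()
    for output in model_outputs:
        detected |= set(output)
    return detected

def recall(detected, ground_truth):
    """Fraction of the true cracks that the detector found."""
    return len(detected & ground_truth) / len(ground_truth)

# Hypothetical crack IDs in one test image; each model misses some.
truth = {"c1", "c2", "c3", "c4", "c5"}
model_a = ["c1", "c2", "c4"]
model_b = ["c2", "c3", "c5"]

combined = ensemble_detections([model_a, model_b])
r = recall(combined, truth)
```

A union ensemble raises recall at the cost of admitting every model's false positives, which is the trade-off an objective test set makes visible.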

A Combined Forecast Scheme of User-Based and Item-based Collaborative Filtering Using Neighborhood Size (이웃크기를 이용한 사용자기반과 아이템기반 협업여과의 결합예측 기법)

  • Choi, In-Bok;Lee, Jae-Dong
    • The KIPS Transactions:PartB / v.16B no.1 / pp.55-62 / 2009
  • Collaborative filtering is a popular technique in recommender systems that recommends items based on the opinions of other people. Memory-based collaborative filtering, which uses a user database, can be divided into user-based and item-based approaches. User-based collaborative filtering predicts a user's preference for an item using the preferences of a neighborhood of similar users, while item-based collaborative filtering predicts the preference for an item based on the similarity of items. This paper proposes a combined forecast scheme that predicts a user's preference for an item by combining the user-based and item-based predictions, weighted by the ratio of the number of similar users to the number of similar items. Experimental results on the MovieLens and BookCrossing data sets show that the proposed scheme improves prediction accuracy for movies and books compared with the user-based and item-based schemes alone.
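The combined forecast above can be sketched directly; the exact weighting formula below, using the ratio of the neighborhood sizes, is an assumption based on the abstract's description rather than the paper's equation:

```python
def combined_prediction(user_pred, item_pred, n_similar_users, n_similar_items):
    """Blend the user-based and item-based predictions, weighting each by
    the relative size of its neighborhood: the estimate backed by more
    similar neighbors contributes more to the final rating."""
    total = n_similar_users + n_similar_items
    w_user = n_similar_users / total
    return w_user * user_pred + (1 - w_user) * item_pred

# A similar-user neighborhood of 30 versus a similar-item neighborhood
# of 10: the user-based estimate dominates the combined rating.
rating = combined_prediction(user_pred=4.0, item_pred=2.0,
                             n_similar_users=30, n_similar_items=10)
```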

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model under Imbalanced Data (불균형 데이터 환경에서 로지스틱 회귀모형을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Bak, Su-Ho;Kim, Heung-Min;Kim, Bum-Kyu;Hwang, Do-Hyun;Enkhjargal, Unuzaya;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1353-1364 / 2018
  • This study proposes a method to detect Cochlodinium polykrikoides red tide pixels in satellite images using a logistic regression model, a machine learning technique, under imbalanced data. Spectral profiles extracted from red tide, clear water, and turbid water were used as the training dataset. 70% of the entire data set was used for model training, and the classification accuracy of the model was evaluated on the remaining 30%. White noise was added to the spectral profiles of red tide, which has relatively few samples compared with clear water and turbid water, and over-sampling was performed to mitigate the imbalanced-data problem. In the accuracy evaluation, the proposed algorithm showed about 94% classification accuracy.
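The white-noise over-sampling step can be sketched in numpy; the band count, class sizes, and noise level are illustrative assumptions:

```python
import numpy as np

def oversample_with_noise(spectra, target_count, noise_std, rng):
    """Balance a minority class by replicating randomly chosen spectral
    profiles with additive Gaussian white noise until target_count
    samples exist; the originals are kept unchanged."""
    n_extra = target_count - len(spectra)
    picks = rng.integers(0, len(spectra), size=n_extra)
    noisy = spectra[picks] + rng.normal(0.0, noise_std,
                                        (n_extra, spectra.shape[1]))
    return np.vstack([spectra, noisy])

rng = np.random.default_rng(0)
n_bands = 8                                  # illustrative number of bands
red_tide = rng.random((20, n_bands))         # minority class (red tide)
clear = rng.random((200, n_bands))           # majority class (clear water)

balanced_red_tide = oversample_with_noise(red_tide, len(clear), 0.01, rng)
```

The jittered copies keep the minority class from being swamped during training while avoiding exact duplicates that would encourage overfitting.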

Study on the Performance Evaluation of Encoding and Decoding Schemes in Vector Symbolic Architectures (벡터 심볼릭 구조의 부호화 및 복호화 성능 평가에 관한 연구)

  • Youngseok Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.4 / pp.229-235 / 2024
  • Recent years have seen active research on methods for efficiently processing and interpreting large volumes of data in the fields of artificial intelligence and machine learning. One of these data processing technologies, Vector Symbolic Architecture (VSA), offers an innovative approach to representing complex symbols and data using high-dimensional vectors. VSA has garnered particular attention in various applications such as natural language processing, image recognition, and robotics. This study quantitatively evaluates the characteristics and performance of VSA methodologies by applying five VSA methodologies to the MNIST dataset and measuring key performance indicators such as encoding speed, decoding speed, memory usage, and recovery accuracy across different vector lengths. BSC and VT demonstrated relatively fast performance in encoding and decoding speeds, while MAP and HRR were relatively slow. In terms of memory usage, BSC was the most efficient, whereas MAP used the most memory. The recovery accuracy was highest for MAP and lowest for BSC. The results of this study provide a basis for selecting appropriate VSA methodologies depending on the application area.
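For one of the methodologies compared above, Binary Spatter Codes (BSC), encoding and decoding reduce to XOR binding and majority-vote bundling; a minimal sketch with an illustrative hypervector length:

```python
import numpy as np

def bind(a, b):
    """BSC binding is element-wise XOR; it is its own inverse, so
    bind(bind(a, b), b) recovers a exactly."""
    return np.bitwise_xor(a, b)

def bundle(vectors):
    """BSC bundling is a per-bit majority vote over the vectors
    (ties resolve to 0 in this sketch)."""
    return (np.sum(vectors, axis=0) * 2 > len(vectors)).astype(np.uint8)

rng = np.random.default_rng(0)
d = 1024                                   # hypervector length (illustrative)
role = rng.integers(0, 2, d, dtype=np.uint8)
value = rng.integers(0, 2, d, dtype=np.uint8)

pair = bind(role, value)                   # encode a role-value pair
recovered = bind(pair, role)               # decode by re-binding the role
```

That XOR unbinding is exact is consistent with the study's finding that BSC is fast and memory-efficient; its lower recovery accuracy arises when many bundled pairs interfere, not in the single-pair case shown here.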