• Title/Abstract/Keywords: Open set recognition

Search results: 35 (processing time: 0.027 s)

2D Artificial Data Set Construction System for Object Detection and Detection Rate Analysis According to Data Characteristics and Arrangement Structure: Focusing on Vehicle License Plate Detection

  • 김상준;최진원;김도영;박구만
    • 방송공학회논문지 / Vol. 27, No. 2 / pp.185-197 / 2022
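  • Deep learning networks with high object recognition performance have recently emerged. For deep-learning-based object recognition, building the training dataset is important for improving performance. Building a dataset requires collecting and labeling images, a process that takes considerable time and labor, so open datasets are often used instead. However, some objects lack a large open dataset; one example is the data needed for license plate detection and recognition. This paper therefore proposes an artificial license plate generator system that can create a large-scale dataset while minimizing the number of images required. It also analyzes the detection rate according to the arrangement structure of the artificial plates. The analysis found that the best arrangement structure was FVC_III, B and that the most suitable network was D2Det. Although the artificial dataset performed 2~3% lower than the real dataset, building the artificial data was about 11 times faster than building the real dataset, demonstrating that the system is a time-efficient way to construct datasets.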

Multiple Discriminative DNNs for I-Vector Based Open-Set Language Recognition

  • 강우현;조원익;강태균;김남수
    • 한국통신학회논문지 / Vol. 41, No. 8 / pp.958-964 / 2016
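  • This paper proposes an i-vector-based language recognition system that uses multiple discriminative deep neural network (DNN) models, analogous to a multi-class SVM that classifies three or more classes using several binary support vector machines (binary SVMs). The proposed system was trained and tested on the i-vectors in the NIST 2015 i-vector Machine Learning Challenge database, and on the open set it was confirmed to outperform conventional language recognition systems based on cosine distance, multi-class SVM, and a single neural network (NN).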

Knowledge-Based Numeric Open Caption Recognition for Live Sportscast

  • Sung, Si-Hun
    • 대한전자공학회:학술대회논문집 / Proceedings of the 대한전자공학회 2003 Summer Conference, Ⅳ / pp.1871-1874 / 2003
  • Knowledge-based numeric open caption recognition is proposed that can recognize numeric captions generated by a character generator (CG) and automatically superimpose a modified caption using the recognized text only when a valid numeric caption appears in the targeted region of a live sportscast scene produced by other broadcasting stations. In the proposed method, mesh features are extracted from an enhanced binary image as feature vectors, then valuable information is recovered from a numeric image by recognizing the characters using a multilayer perceptron (MLP) network. The result is verified using a knowledge-based rule set designed for a more stable and reliable output, and then the modified information is displayed on screen by the CG. MLB Eye Caption, based on the proposed algorithm, has already been used for regular Major League Baseball (MLB) programs broadcast live over a Korean nationwide TV network and has produced a favorable response from Korean viewers.

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • 말소리와 음성과학 / Vol. 15, No. 3 / pp.83-88 / 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.

The Method of Abandoned Object Recognition based on Neural Networks

  • 류동균;이재흥
    • 전기전자학회논문지 / Vol. 22, No. 4 / pp.1131-1139 / 2018
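  • This paper proposes a method for recognizing abandoned objects using a convolutional neural network. The method first detects regions containing abandoned objects in the video; when such a region is detected, a convolutional neural network is applied to it to recognize what object it shows. Experiments were carried out with an application system that detects illegal dumping of garbage. The results showed that regions containing abandoned objects were detected efficiently. Each detected region was then fed to the convolutional neural network and classified as garbage or not garbage. For this purpose, the network was trained on self-collected garbage data and an open database. After training, it achieved about 97% accuracy on a test set not included in training.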

Impostor Detection in Speaker Recognition Using Confusion-Based Confidence Measures

  • Kim, Kyu-Hong;Kim, Hoi-Rin;Hahn, Min-Soo
    • ETRI Journal / Vol. 28, No. 6 / pp.811-814 / 2006
  • In this letter, we introduce confusion-based confidence measures for detecting an impostor in speaker recognition, which do not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of an alternative hypothesis. Compared with the conventional Gaussian mixture model-universal background model (GMM-UBM) scheme, our confusion-based measures show better performance in noise-corrupted speech. The additional computational requirements for our methods are negligible when used to detect or reject impostors.

Noise robust distant sound recognition

  • 유인철;육동석
    • 대한음성학회:학술대회논문집 / Proceedings of the 2007 Joint Conference of 대한음성학회 and 한국음성과학회 / pp.37-38 / 2007
  • This paper reviews the issues in implementing sound recognizers in real environments. First is the signal corruption caused by background noises and reverberation. Second is the open-set problem which is the problem of rejecting out-of-vocabulary words and noises. These two issues must be solved for noise robust recognizers.

Automatic Real-time Identification of Fingerprint Images Using Block-FFT

  • 안도성;김학일
    • 전자공학회논문지B / Vol. 32B, No. 6 / pp.909-921 / 1995
  • The objective of this paper is to develop an algorithm for a real-time automatic fingerprint recognition system. The algorithm employs the Fast Fourier Transform (FFT) to determine the directions of ridges in fingerprint images, and uses statistical information to recognize the fingerprints. The information used in fingerprint recognition is based on the directions along ridge curves and on characteristic points such as core points and delta points. To find ridge directions, the algorithm applies the FFT to a small block of 8x8 pixels and decides the directions by interpreting the resulting Fourier spectrum. By using the FFT, the algorithm does not require conventional preprocessing procedures such as smoothing, binarization, thinning, and restoration. Finally, in matching two fingerprint images, the algorithm searches for and compares two kinds of feature blocks: blocks where the directions cannot be defined from the Fourier spectrum, and blocks where the directions change abruptly. The proposed algorithm has been implemented on a SunSparc-2 workstation under the Open Window environment. In the experiment, the proposed algorithm was applied to a set of fingerprint images obtained by a prism system. The results show that while the rate of Type II error (incorrect recognition of two different fingerprints as identical) is held at 0.0%, the rate of Type I error (incorrect recognition of two identical fingerprints as different) is 2.2%.

Novel Category Discovery in Plant Species and Disease Identification through Knowledge Distillation

  • Jiuqing Dong;Alvaro Fuentes;Mun Haeng Lee;Taehyun Kim;Sook Yoon;Dong Sun Park
    • 스마트미디어저널 / Vol. 13, No. 7 / pp.36-44 / 2024
  • Identifying plant species and diseases is crucial for maintaining biodiversity and achieving optimal crop yields, making it a topic of significant practical importance. Recent studies have extended plant disease recognition from traditional closed-set scenarios to open-set environments, where the goal is to reject samples that do not belong to known categories. However, in open-world tasks, it is essential not only to define unknown samples as "unknown" but also to classify them further. This task assumes that images and labels of known categories are available and that samples of unknown categories can be accessed. The model classifies unknown samples by learning the prior knowledge of known categories. To the best of our knowledge, there is no existing research on this topic in plant-related recognition tasks. To address this gap, this paper utilizes knowledge distillation to model the category space relationships between known and unknown categories. Specifically, we identify similarities between different species or diseases. By leveraging a fine-tuned model on known categories, we generate pseudo-labels for unknown categories. Additionally, we enhance the baseline method's performance by using a larger pre-trained model, dino-v2. We evaluate the effectiveness of our method on the large plant specimen dataset Herbarium 19 and the disease dataset Plant Village. Notably, our method outperforms the baseline by 1% to 20% in terms of accuracy for novel category classification. We believe this study will contribute to the community.

A Low-Cost Speech to Sign Language Converter

  • Le, Minh;Le, Thanh Minh;Bui, Vu Duc;Truong, Son Ngoc
    • International Journal of Computer Science & Network Security / Vol. 21, No. 3 / pp.37-40 / 2021
  • This paper presents a design of a speech to sign language converter for deaf and hard of hearing people. The device is low-cost, has low power consumption, and can work entirely offline. The speech recognition is implemented using an open-source API, the Pocketsphinx library. In this work, we propose a context-oriented language model, which measures the similarity between the recognized speech and the predefined speech to decide the output. The output speech is selected from the recommended speech stored in the database as the best match to the recognized speech. The proposed context-oriented language model can improve the speech recognition rate by 21% when working entirely offline. A decision module based on determining the similarity between the two texts using Levenshtein distance decides the output sign language. The output sign language corresponding to the recognized speech is generated as a set of sequential images. The speech to sign language converter is deployed on a Raspberry Pi Zero board for low-cost deaf assistive devices.
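
The Levenshtein-based decision step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `levenshtein` and `best_match` and the phrase list `PHRASES` are hypothetical, and the sketch assumes the decision simply picks the stored phrase with the smallest edit distance to the recognized text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_match(recognized: str, phrases: list[str]) -> str:
    """Return the stored phrase closest to the recognized text (assumed rule)."""
    return min(phrases, key=lambda p: levenshtein(recognized, p))


# Hypothetical database of predefined phrases, for illustration only.
PHRASES = ["good morning", "thank you", "how are you"]

if __name__ == "__main__":
    print(best_match("thank yu", PHRASES))
```

A real system would likely also apply a maximum-distance threshold so that speech far from every stored phrase is rejected rather than forced onto the nearest one; the abstract does not say whether the authors do this.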