• Title/Summary/Keyword: End-to-end learning

Conformer with lexicon transducer for Korean end-to-end speech recognition (Lexicon transducer를 적용한 conformer 기반 한국어 end-to-end 음성인식)

  • Son, Hyunsoo;Park, Hosung;Kim, Gyujin;Cho, Eunsoo;Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.530-536 / 2021
  • Recently, owing to advances in deep learning, end-to-end speech recognition, which directly maps speech signals to graphemes, has shown good performance. Among end-to-end models, the conformer performs best. However, end-to-end models focus only on the probability of which grapheme appears at each time step, and decoding relies on greedy search or beam search. Such decoding is easily affected by the final probabilities output by the model. In addition, end-to-end models cannot use external pronunciation and language information because of structural limitations. Therefore, this paper proposes a conformer with a lexicon transducer. We compare a phoneme-based model with a lexicon transducer against a grapheme-based model with beam search. The test set consists of words that do not appear in the training data. The grapheme-based conformer with beam search achieves a CER of 3.8%, while the phoneme-based conformer with the lexicon transducer achieves a CER of 3.4%.
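
The abstract reports results as character error rate (CER) but does not say how it was computed. As a minimal sketch, assuming the conventional definition (character-level Levenshtein distance divided by reference length), CER can be calculated as follows. The function names `edit_distance` and `cer` are illustrative and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Whether whitespace counts as a character is a scoring choice that varies between evaluations; this sketch counts every character in the strings it is given.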

Analysis and Application of Front-End Code Playground Tools for Web Programming Education

  • Aaron Daniel Snowberger;Semin Kim;SungHee Woo
    • Journal of Practical Engineering Education / v.16 no.1_spc / pp.11-19 / 2024
  • Web programming courses are often included in university Computer Science programs as introductory and foundational computer programming courses. However, amateur programmers often have difficulty learning how to integrate HTML, CSS, JavaScript, and various preprocessors or libraries to create websites. Additionally, many web programming mistakes do not produce visible output in the browser. Therefore, in recent years, Front-End Code Playground (FECP) tools that incorporate HTML, CSS, and JavaScript into a single, online web-based application have become popular. These tools allow web coding to happen directly in the browser and provide immediate visual feedback to users. Such immediate visual feedback can be particularly beneficial for amateur coders to learn and practice with. Therefore, this study gathers data on various FECP tools, compares their differences, and provides an analysis of how such tools benefit students. This study concludes with an outline of the application of FECP to web programming courses to enhance the learning experience.

Development of a Visual Servo System in a Mobile Manipulator for Operating Numeral Buttons (이동형 머니퓰레이터의 숫자버튼 조작을 위한 시각제어 시스템 개발)

  • 박민규;이민철;주원동
    • Journal of the Korean Society for Precision Engineering / v.21 no.7 / pp.92-100 / 2004
  • Service robots are expected to be useful in indoor environments such as hotels and hospitals. However, many service robots are driven by wheels, so they cannot climb stairs and cannot move to other floors unless they can use elevators. In this paper, a mobile manipulator system was developed that can operate the numeral buttons on an elevator's operating panel. To perform this task, the robot is composed of an image recognition module, an ultrasonic sensor module, and a manipulator. The robot recognizes the numeral buttons and the manipulator's end-effector through the vision system. The learning vector quantization (LVQ) algorithm is used to recognize the number on each button, and a barcode mark on the end-effector is used to recognize the end-effector. The manipulator pushes the numeral buttons using information captured by the vision system. The proposed method is evaluated by experiments.
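
The abstract names learning vector quantization (LVQ) for digit recognition but gives no variant or parameters. The following is a minimal sketch of a single update step of the basic LVQ1 rule, offered only as an illustration. The names `nearest_prototype` and `lvq1_step`, the list-based feature vectors, and the learning rate `lr` are assumptions, not details from the paper.

```python
def nearest_prototype(x, prototypes):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(x, p))
    return min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i]))


def lvq1_step(x, label, prototypes, proto_labels, lr=0.1):
    """One LVQ1 update, applied in place to `prototypes`.

    The winning prototype moves toward x if its class matches `label`,
    and away from x otherwise. Returns the index of the winner.
    """
    w = nearest_prototype(x, prototypes)
    sign = 1.0 if proto_labels[w] == label else -1.0
    prototypes[w] = [p + sign * lr * (a - p) for a, p in zip(x, prototypes[w])]
    return w
```

At inference time, a sample would be assigned the class label of its nearest prototype, i.e. `proto_labels[nearest_prototype(x, prototypes)]`.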

Self-Organization of Visuo-Motor Map Considering an Obstacle

  • Maruki, Yuji
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2003.10a / pp.1168-1171 / 2003
  • The visuo-motor map is based on Kohonen's self-organizing map. The map learns the relation between the end-effector coordinates and the joint angles. In this paper, a 3-DOF manipulator that moves in 2D space is targeted. A CCD camera is placed beside the manipulator, and the end-effector coordinates are obtained from the image of the manipulator. As a result of learning, the end-effector can be moved to the destination without exact teaching.

Uncertainty Sequence Modeling Approach for Safe and Effective Autonomous Driving (안전하고 효과적인 자율주행을 위한 불확실성 순차 모델링)

  • Yoon, Jae Ung;Lee, Ju Hong
    • Smart Media Journal / v.11 no.9 / pp.9-20 / 2022
  • Deep reinforcement learning (RL) is an end-to-end, data-driven control method that is widely used in the autonomous driving domain. However, conventional RL approaches are difficult to apply to autonomous driving tasks because of problems such as inefficiency, instability, and uncertainty. These issues play an important role in the autonomous driving domain. Although recent studies have attempted to solve these problems, they are computationally expensive and rely on special assumptions. In this paper, we propose a new algorithm, MCDT, that addresses inefficiency, instability, and uncertainty by introducing a method called uncertainty sequence modeling to the autonomous driving domain. The sequence modeling method, which views reinforcement learning as a problem of generating decisions that obtain high rewards, avoids the disadvantages of existing studies, guarantees efficiency and stability, and also considers safety by integrating uncertainty estimation techniques. The proposed method was tested in the OpenAI Gym CarRacing environment, and the experimental results show that the MCDT algorithm provides efficient, stable, and safe performance compared to the existing reinforcement learning method.

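Analysis of International Political Factors Influencing the Technology Development Environment of Developing Countries (개도국의 기술개발 환경에 대한 국제 정치적 영향 요인 분석)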

  • 이태준;이광석
    • Journal of Technology Innovation / v.10 no.2 / pp.131-148 / 2002
  • This paper explores how international political factors influence the role of conventional external factors in the course of technological learning. It then investigates whether the role of techno-economic factors has changed because of the involvement of international political factors in the technological learning mechanism. To this end, the paper examines how US political intervention affected Korean technological learning in the back-end of the nuclear fuel cycle. The export policy, prior consent policy, and international political influence of the US are employed as international political factors. The empirical findings show that international political factors are very likely to restrain the impact of techno-economic factors on the technological learning process. Accordingly, this paper hypothesizes that the role of techno-economic factors in the technological learning mechanism is weaker when international political intervention is involved.

Superpixel-based Apple Leaf Disease Classification using Convolutional Neural Network (합성곱 신경망을 이용하는 수퍼픽셀 기반 사과잎 병충해의 분류)

  • Kim, Manbae;Choi, Changyeol
    • Journal of Broadcast Engineering / v.25 no.2 / pp.208-217 / 2020
  • The classification of plant diseases from images captured by a camera sensor has been studied over the past decades. One method that has gained much interest uses image segmentation, from which statistical features are derived and analyzed by machine learning. Recently, deep learning has been adopted in this area. However, it is still difficult for image segmentation to achieve stable performance because of a variety of environmental variations. End-to-end learning in neural networks has the drawback that training images may differ from real images acquired in outdoor fields. To solve these problems, we propose a superpixel-based disease classification method using end-to-end CNN (convolutional neural network) learning. In experiments performed on PlantVillage apple images, the classification accuracy is 98.29% for full images and 92.43% for superpixels. Likewise, the multivariate F1-score is (0.98, 0.93). We therefore validate that the superpixel-based method is comparable to the full-image method.

Deep Learning for Classification of High-End Fashion Brand Sensibility (딥러닝을 통한 하이엔드 패션 브랜드 감성 학습)

  • Jang, Seyoon;Kim, Ha Youn;Lee, Yuri;Seol, Jinseok;Kim, Seongjae;Lee, Sang-goo
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.1 / pp.165-181 / 2022
  • The fashion industry is creating innovative business models using artificial intelligence (AI). To use AI efficiently, fashion data must be classified. Until now, such data have been classified based only on the objective properties of fashion products. Their subjective attributes, such as fashion brand sensibilities, are holistic and heuristic intuitions created by a combination of design elements. This study aims to improve the performance of collaborative filtering in the fashion industry by extracting fashion brand sensibility using computer vision technology. The image data set of fashion brand sensibility consists of high-end fashion brand photos that share sensibilities and communicate well in fashion. About 26,000 fashion photos with 11 high-end fashion brand sensibility labels were collected from the 16FW to 21SS runways and from 50 years of US Vogue magazines beginning in 1971. We use EfficientNet-B1 as the main architecture and fine-tune the network with ImageNet-ILSVRC. After training on fashion brand sensibilities through deep learning, the proposed model achieved an F1 score of 74% on accuracy tests. Furthermore, based on a comparison between the AI model and human experts, the proposed model is expected to be extended to mass fashion brands.

Prerequisite Research for the Development of an End-to-End System for Automatic Tooth Segmentation: A Deep Learning-Based Reference Point Setting Algorithm (자동 치아 분할용 종단 간 시스템 개발을 위한 선결 연구: 딥러닝 기반 기준점 설정 알고리즘)

  • Kyungdeok Seo;Sena Lee;Yongkyu Jin;Sejung Yang
    • Journal of Biomedical Engineering Research / v.44 no.5 / pp.346-353 / 2023
  • In this paper, we propose an approach that leverages deep learning to find optimal reference points for precise tooth segmentation in three-dimensional tooth point cloud data. A dataset of 350 aligned maxillary and mandibular point clouds was used as input, and the coordinates of both ends of each tooth were used as ground-truth labels. A two-dimensional image was created by projecting the rendered point cloud data along the Z-axis, and images of individual teeth were then extracted using an object detection algorithm. The proposed algorithm adds various modules to the U-Net model that allow effective learning of a narrow range, and it detects both end points of a tooth from the generated tooth image. In an evaluation using DSC, Euclidean distance, and MAE as indicators, we achieved superior performance compared to other U-Net-based models. In future research, we will develop an algorithm that finds the reference point in the point cloud by back-projecting the reference point detected in the image into three dimensions, and, based on this, an algorithm that separates individual teeth in the point cloud through image processing techniques.
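
The abstract lists DSC, Euclidean distance, and MAE as evaluation indicators without giving formulas. As a rough sketch, assuming their standard definitions, they can be computed as below. The function names, the flat 0/1 mask representation, and the handling of empty masks are assumptions for illustration and are not taken from the paper.

```python
import math


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def euclidean_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_absolute_error(pred, target):
    """Mean absolute error over paired coordinate values."""
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)
```

For end-point detection, `euclidean_distance` would typically be applied between each predicted end point and its labeled counterpart, with lower values indicating better localization.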

Korean Semantic Role Labeling using Stacked Bidirectional LSTM-CRFs (Stacked Bidirectional LSTM-CRFs를 이용한 한국어 의미역 결정)

  • Bae, Jangseong;Lee, Changki
    • Journal of KIISE / v.44 no.1 / pp.36-43 / 2017
  • Syntactic information represents the dependency relation between predicates and arguments, and it helps improve the performance of Semantic Role Labeling (SRL) systems. However, syntax analysis can cause computational overhead and propagate incorrect syntactic information. To solve this problem, we exclude syntactic information and use only morpheme information to construct SRL systems. In this study, we propose an end-to-end SRL system that uses only morpheme information with a Stacked Bidirectional LSTM-CRFs model, which extends the LSTM RNN and is suitable for sequence labeling problems. Our experimental results show that the proposed model performs better than other models.