• Title/Summary/Keyword: positional encoding

Enhancing A Neural-Network-based ISP Model through Positional Encoding (위치 정보 인코딩 기반 ISP 신경망 성능 개선)

  • DaeYeon Kim;Woohyeok Kim;Sunghyun Cho
    • Journal of the Korea Computer Graphics Society / v.30 no.3 / pp.81-86 / 2024
  • The Image Signal Processor (ISP) converts RAW images captured by the camera sensor into user-preferred sRGB images. Although RAW images contain more information useful for image processing than sRGB images, they are rarely shared because of their large file sizes. Moreover, the actual ISP pipeline of a camera is not disclosed, which makes its inverse difficult to model. Consequently, learning the conversion between sRGB and RAW has been studied. Recently, ParamISP[1] was proposed; going beyond simple network structures, it directly incorporates camera parameters (exposure time, sensitivity, aperture size, and focal length) to mimic the operations of a real camera ISP. However, existing studies, including ParamISP[1], do not account for degradations caused by lens shading, optical aberration, and lens distortion, which limits how well they model the camera ISP and, in turn, their restoration performance. This study introduces positional encoding so that a camera ISP neural network can better handle lens-induced degradations. The proposed positional encoding is suited to camera ISP networks that are trained on image patches: by reflecting the spatial context of each patch within the full image, it enables more precise restoration than existing models.
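
Lens shading and distortion vary with a pixel's position relative to the optical center, and that position is lost once an image is cut into patches. The following is a minimal sketch of one way to hand it back to a patch-based network, assuming the encoding is simply each pixel's normalized full-image coordinates plus its radial distance from the image center (taken to be the optical center). The function `patch_position_channels` and this choice of channels are illustrative assumptions, not the encoding defined in the paper.

```python
import math


def patch_position_channels(top, left, patch_h, patch_w, img_h, img_w):
    """Hypothetical positional channels for a patch cropped at (top, left).

    Returns three patch_h x patch_w maps:
      x, y: full-image column/row normalized to [-1, 1], with 0 at the image center
      r:    radial distance from the image center, scaled so the corners are 1
    Assumption: the image center coincides with the optical center.
    """
    if img_h < 2 or img_w < 2:
        raise ValueError("image must be at least 2x2")
    cx, cy = (img_w - 1) / 2, (img_h - 1) / 2
    r_max = math.hypot(cx, cy)  # distance from the center to a corner
    xs, ys, rs = [], [], []
    for row in range(top, top + patch_h):
        xs.append([(col - cx) / cx for col in range(left, left + patch_w)])
        ys.append([(row - cy) / cy] * patch_w)
        rs.append([math.hypot(col - cx, row - cy) / r_max
                   for col in range(left, left + patch_w)])
    return xs, ys, rs
```

A network would typically receive these maps concatenated with the image patch as extra input channels, so that two visually similar patches from the center and from a corner of the frame become distinguishable. Whether the paper feeds raw coordinates, radial distance, or a sinusoidal expansion of them is not stated in the abstract.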

CoNSIST : Consist of New methodologies on AASIST, leveraging Squeeze-and-Excitation, Positional Encoding, and Re-formulated HS-GAL

  • Jae-Hoon Ha;Joo-Won Mun;Sang-Yup Lee
    • Annual Conference of KIPS / 2024.05a / pp.692-695 / 2024
  • With recent advances in artificial intelligence (AI), the performance of deep learning-based audio deepfake technology has improved significantly. This technology has been exploited for criminal purposes, producing numerous victims. To prevent such harm, this paper proposes a deep learning-based audio deepfake detection model. Specifically, we propose CoNSIST, an improved audio deepfake detection model that adds three components to the graph-based end-to-end model AASIST: (i) Squeeze-and-Excitation, (ii) Positional Encoding, and (iii) Reformulated HS-GAL. These additions are expected to enable more effective feature extraction, eliminate unnecessary operations, and incorporate more diverse information, thereby improving on the original AASIST. The results of multiple experiments indicate that CoNSIST improves audio deepfake detection performance compared with existing models.
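
Squeeze-and-Excitation, the first component listed above, reweights feature channels using statistics pooled over the whole input. The sketch below follows the standard SE formulation (global average pooling, a bottleneck of two fully connected layers with ReLU and sigmoid, then channel-wise scaling) on a `C x T` feature map stored as nested lists. The weights would normally be learned, and where CoNSIST places the block inside AASIST is not specified in the abstract.

```python
import math


def squeeze_excitation(x, w1, b1, w2, b2):
    """Apply a Squeeze-and-Excitation block to x, a list of C channels of length T.

    w1: (C // r) x C weights, b1: C // r biases  (reduction FC layer)
    w2: C x (C // r) weights, b2: C biases       (expansion FC layer)
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC + ReLU bottleneck, then FC + sigmoid -> one gate in (0, 1) per channel.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b) for row, b in zip(w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-(sum(w * hc for w, hc in zip(row, h)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every value in channel c by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)], gates
```

The gates let the model emphasize informative channels and suppress others at almost no parameter cost, since the bottleneck width is C // r.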

Vessel Positional Information Service using AIS and XML (선박자동식별시스템(AIS)과 XML을 이용한 선박위치정보 서비스)

  • Seo, Min-Ho;Kim, Geon-Ung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.12 / pp.2590-2598 / 2011
  • AIS (Automatic Identification System) is a key maritime information system that helps prevent maritime accidents through communication between nearby ships and that can establish an information base by fusing the information collected from ships with other information. AIS data contain a wide range of potentially useful information, but they are not easy to use because of inadequate storage and management. Furthermore, an AIS transponder is required to acquire the information. In this paper, we propose a vessel positional information service using AIS and XML. We decode AIS information from NMEA-0183-encoded data, store it in a database, and provide access to it over the internet using XML.
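
The decoding step described above starts from AIVDM sentences, whose payload field packs a binary message into 6-bit ASCII characters. The sketch below undoes that armoring, extracts MMSI, speed over ground, longitude, and latitude from a single-sentence class A position report (message types 1-3) using the standard field layout, and wraps the result in XML. It skips checksum validation and multi-sentence messages, and the XML element names (`vessel`, `lat`, `lon`, `sog_knots`) are hypothetical, not the schema used in the paper.

```python
import xml.etree.ElementTree as ET


def payload_to_bits(payload):
    """Undo AIVDM 6-bit armoring: each payload character carries 6 bits."""
    bits = []
    for ch in payload:
        v = ord(ch) - 48
        if v > 40:
            v -= 8
        bits.append(format(v, "06b"))
    return "".join(bits)


def uint(bits, start, length):
    return int(bits[start:start + length], 2)


def sint(bits, start, length):
    """Two's-complement signed field."""
    v = uint(bits, start, length)
    return v - (1 << length) if v >= 1 << (length - 1) else v


def decode_position_report(sentence):
    """Decode a single-fragment !AIVDM position report (types 1-3); no checksum check."""
    bits = payload_to_bits(sentence.split(",")[5])
    msg_type = uint(bits, 0, 6)
    if msg_type not in (1, 2, 3):
        raise ValueError(f"not a class A position report: type {msg_type}")
    return {
        "mmsi": uint(bits, 8, 30),
        "sog_knots": uint(bits, 50, 10) / 10,   # units of 0.1 knot
        "lon": sint(bits, 61, 28) / 600000,     # units of 1/10000 minute
        "lat": sint(bits, 89, 27) / 600000,
    }


def to_xml(report):
    """Serialize one report as XML (element names are illustrative only)."""
    vessel = ET.Element("vessel", mmsi=str(report["mmsi"]))
    for key in ("lat", "lon", "sog_knots"):
        ET.SubElement(vessel, key).text = f"{report[key]:.5f}"
    return ET.tostring(vessel, encoding="unicode")
```

Storing the decoded dictionary in a database and serving `to_xml` output over HTTP would complete the pipeline the abstract outlines; those parts depend on the chosen database and web stack.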

High-Speed Transformer for Panoptic Segmentation

  • Baek, Jong-Hyeon;Kim, Dae-Hyun;Lee, Hee-Kyung;Choo, Hyon-Gon;Koh, Yeong Jun
    • Journal of Broadcast Engineering / v.27 no.7 / pp.1011-1020 / 2022
  • Recent high-performance panoptic segmentation models are based on transformer architectures. However, transformer-based panoptic segmentation methods are inherently slower than convolution-based methods, because the attention mechanism in the transformer has quadratic complexity with respect to image resolution. In addition, the sine and cosine computations for positional embedding in the transformer create a computation-time bottleneck. To address these problems, we adopt three modules to speed up the inference of transformer-based panoptic segmentation. First, we perform channel-level reduction using depth-wise separable convolution on the inputs of the transformer decoder. Second, we replace sine- and cosine-based positional encoding with convolution operations, called conv-embedding. Third, we apply separable self-attention to the transformer encoder, reducing its complexity from quadratic to linear in the number of image pixels. As a result, with all three modules applied, the proposed model runs at 44% higher frames per second than the baseline on the ADE20K panoptic validation dataset.
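
For reference, the sine and cosine positional encoding that the paper replaces is, in its standard 1D form from the original Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). The pure-Python sketch below computes it to make the cost visible: one sine and one cosine per position per channel pair, which for a dense image feature map grows with the number of pixels. The 2D variant used in segmentation transformers applies this per axis; the exact variant in the paper's baseline is not given in the abstract.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table with
    PE[pos][2i] = sin(pos / 10000^(2i/d_model)) and PE[pos][2i+1] = cos(same angle)."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    pe = []
    for pos in range(num_positions):
        row = []
        for two_i in range(0, d_model, 2):
            angle = pos / (10000 ** (two_i / d_model))
            row.extend((math.sin(angle), math.cos(angle)))
        pe.append(row)
    return pe
```

A conv-embedding, by contrast, lets convolution operations produce position-dependent features directly from the feature map, avoiding these transcendental evaluations; its exact design is not described in the abstract.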

Genome-wide association studies to identify quantitative trait loci and positional candidate genes affecting meat quality-related traits in pigs

  • Jae-Bong Lee;Ji-Hoon Lim;Hee-Bok Park
    • Journal of Animal Science and Technology / v.65 no.6 / pp.1194-1204 / 2023
  • Meat quality comprises a set of key traits such as pH, meat color, water-holding capacity, tenderness and marbling. These traits are complex because they are affected by multiple genetic and environmental factors. The aim of this study was to investigate the molecular genetic basis underlying nine meat quality-related traits in a Yorkshire pig population using a genome-wide association study (GWAS) and subsequent biological pathway analysis. In total, 45,926 single nucleotide polymorphism (SNP) markers from 543 pigs were selected for the GWAS after quality control. Data were analyzed using a genome-wide efficient mixed model association (GEMMA) method. This linear mixed model-based approach identified two quantitative trait loci (QTLs) for meat color (b*) on chromosome 2 (SSC2) and one QTL for shear force on chromosome 8 (SSC8). These QTLs acted additively on the two phenotypes and explained 3.92%-4.57% of the phenotypic variance of the traits of interest. The genes encoding HAUS8 on SSC2 and an lncRNA on SSC8 were identified as positional candidate genes for these QTLs. The results of the biological pathway analysis revealed that positional candidate genes for meat color (b*) were enriched in pathways related to muscle development, muscle growth, intramuscular adipocyte differentiation, and lipid accumulation in muscle, whereas positional candidate genes for shear force were overrepresented in pathways related to cell growth, cell differentiation, and fatty acid synthesis. Further verification of these identified SNPs and genes in other independent populations could provide valuable information for understanding the variations in pork quality-related traits.
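
The statement that the QTLs "explained 3.92%-4.57% of the phenotypic variance" can be made concrete with a simplified single-SNP model. The sketch below regresses a phenotype on additive genotype dosage (0, 1, or 2 copies of the effect allele) by ordinary least squares and reports the additive effect and the fraction of variance explained (R^2). This is an illustrative simplification: GEMMA fits a linear mixed model that also accounts for relatedness among animals, and the paper's exact variance-explained formula is not given in the abstract.

```python
def single_snp_association(dosages, phenotypes):
    """OLS fit of phenotype = a + b * dosage for one SNP (no kinship correction).

    Returns (b, r2): the additive effect per allele copy and the fraction of
    phenotypic variance explained by the SNP.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching lists of at least 3 animals")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    if s_gg == 0 or s_yy == 0:
        raise ValueError("genotype or phenotype has no variance")
    b = s_gy / s_gg
    r2 = s_gy ** 2 / (s_gg * s_yy)  # equals the regression R^2 for a single predictor
    return b, r2
```

A value of r2 around 0.04 would correspond to the roughly 4% of phenotypic variance reported for these QTLs.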

A Study about Learning Graph Representation on Farmhouse Apple Quality Images with Graph Transformer (그래프 트랜스포머 기반 농가 사과 품질 이미지의 그래프 표현 학습 연구)

  • Ji Hun Bae;Ju Hwan Lee;Gwang Hyun Yu;Gyeong Ju Kwon;Jin Young Kim
    • Smart Media Journal / v.12 no.1 / pp.9-16 / 2023
  • Recently, convolutional neural network (CNN)-based systems have been developed to overcome the shortage of human labor in classifying apple quality on farms. However, because CNNs accept only images of a fixed size, preprocessing such as resampling may be required, and oversampling causes loss of information from the original image, such as quality degradation and blurring. To minimize this problem, this paper generates an image-patch-based graph from the original image and proposes a random walk-based positional encoding method for applying a graph transformer model. The method continuously learns position embeddings, based on the random walk algorithm, for patches that have no inherent positional information, and it finds the optimal graph structure by aggregating useful node information through the self-attention mechanism of the graph transformer. It is therefore robust and performs well even on new graph structures with random node order and on arbitrary graph structures that depend on the location of an object in the image. In experiments on five apple quality datasets, its learning accuracy exceeded that of other GNN models by 1.3% to 4.7%, and its parameter count was 3.59M, about 15% of the 23.52M parameters of the ResNet18 model. The reduced computation yields fast inference, demonstrating the method's effectiveness.
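
The abstract does not spell out its random walk-based encoding. One widely used formulation in graph transformers assigns node i the probabilities that a uniform random walk starting at i is back at i after 1, 2, ..., k steps, i.e. the diagonal entries of P, P^2, ..., P^k with P = D^-1 A. The sketch below computes that feature for a small adjacency matrix, assuming an unweighted graph without isolated nodes; a learned linear layer would usually map it to the embedding size, and the paper's exact variant may differ.

```python
def random_walk_pe(adj, k):
    """Return, for each node i, [P_ii, (P^2)_ii, ..., (P^k)_ii], where
    P = D^-1 A is the transition matrix of a uniform random walk on the graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    if any(d == 0 for d in deg):
        raise ValueError("graph has an isolated node")
    P = [[adj[i][j] / deg[i] for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in P]  # current power of P, starting at P^1
    pe = [[] for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            pe[i].append(Pk[i][i])  # probability of being back at i
        Pk = [[sum(Pk[i][m] * P[m][j] for m in range(n)) for j in range(n)]
              for i in range(n)]
    return pe
```

Because the feature depends only on graph structure, each node keeps the same encoding when nodes are renumbered, which fits the robustness to random node order claimed above.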

A Comparative Study of Spoken and Written Sentence Production in Adults with Fluent Aphasia (유창성 실어증 환자의 구어와 문어 문장산출 능력 비교)

  • Ha, Ji-Wan;Pyun, Sung-Bom;Hwang, Yu Mi;Yi, Hoyoung;Sim, Hyun Sub
    • Phonetics and Speech Sciences / v.5 no.3 / pp.103-111 / 2013
  • Traditionally, it has been assumed that writing ability depends entirely on phonology. Accordingly, spoken and written language skills in aphasic patients have been thought to show similar types of impairment. However, a number of recent studies have reported findings that support the orthographic autonomy hypothesis. The purpose of this study was to examine whether patients with fluent aphasia show a discrepancy between speaking and writing skills, and thereby to determine whether the two skills are realized through independent processes. To this end, this study compared the K-FAST speaking and writing tasks of 30 patients with aphasia. In addition, 16 patients who could produce sentences in both speaking and writing were compared on their performance at each phase of the sentence production process. The subjects performed differently in speaking and writing, with statistically significant differences between the two modalities at the positional and phonological encoding phases of sentence production. These results suggest that, from a certain phase of sentence production onward, written language is more likely to be produced via independent routes rather than being mediated by the spoken language production process.

A GIS Vector Data Compression Method Considering Dynamic Updates

  • Chun Woo-Je;Joo Yong-Jin;Moon Kyung-Ky;Lee Yong-Ik;Park Soo-Hong
    • Spatial Information Research / v.13 no.4 s.35 / pp.355-364 / 2005
  • Vector data sets (e.g., maps) are currently a major source for displaying, querying, and identifying the locations of spatial features in a variety of applications. Especially in mobile environments, the need for spatial data is increasing, and the relatively large size of vector maps must be reduced. Several studies on vector map compression have appeared recently, including a clustering-based compression method with a novel encoding/decoding scheme. However, previous studies did not consider that spatial data must be updated periodically. This paper examines this problem in the existing clustering-based compression method and proposes an adaptive approximation method that can handle data updates while reducing error levels. Experimental evaluation showed that, when an update event occurred, the proposed adaptive approximation method achieved better positional accuracy than the simple clustering-based compression method.

Korean End-to-End Coreference Resolution with BERT for Long Document (긴 문서를 위한 BERT 기반의 End-to-End 한국어 상호참조해결)

  • Jo, Kyeongbin;Jung, Youngjun;Lee, Changki;Ryu, Jihee;Lim, Joonho
    • Annual Conference on Human and Language Technology / 2021.10a / pp.259-263 / 2021
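  • Coreference resolution is a natural language processing task that identifies the mentions in a given document that are subject to coreference resolution and then finds and groups the mentions that refer to the same entity. Recent coreference resolution research has mainly studied end-to-end models that obtain contextual word representations with BERT and then perform mention detection and coreference resolution jointly. However, to process long documents exceeding 512 tokens, these models split the document into segments of at most 512 tokens, which lowers coreference resolution performance on long documents. In this paper, we propose a BERT-based end-to-end coreference resolution model for long documents of 512 tokens or more. The model splits a long document into segments of at most 512 tokens and obtains first-level contextual word representations from the existing BERT; it then concatenates these representations, adds a Global Positional Encoding or Embedding computed over the whole long document, and passes the result through a Global BERT layer to obtain the final contextual word representations, to which the end-to-end coreference resolution model is applied. Experimental results show that the proposed model performs similarly to the existing model (a 0.16% improvement on the test set) while using 1.4 times less GPU memory and running 2.1 times faster.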
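
The difference between the per-segment positions seen by the first BERT pass and the document-level positions behind the Global Positional Encoding can be sketched as follows. The helper `split_with_positions` is hypothetical, and it ignores special tokens such as [CLS] and [SEP], which in practice also count toward the 512-token limit.

```python
def split_with_positions(tokens, max_len=512):
    """Hypothetical helper: split a long token sequence into segments of at most
    max_len tokens and return (segments, local_positions, global_positions).

    Local positions restart at 0 in every segment (what a per-segment BERT sees);
    global positions index the concatenated document.
    """
    if max_len < 1:
        raise ValueError("max_len must be positive")
    segments, local_pos, global_pos = [], [], []
    for start in range(0, len(tokens), max_len):
        seg = tokens[start:start + max_len]
        segments.append(seg)
        local_pos.append(list(range(len(seg))))
        global_pos.append(list(range(start, start + len(seg))))
    return segments, local_pos, global_pos
```

Local positions cannot tell the first token of the third segment from the first token of the first segment; global positions continue across segment boundaries, which is the information a document-level positional encoding added to the concatenated representations would supply.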

A statistical journey to DNN, the third trip: Language model and transformer (심층신경망으로 가는 통계 여행, 세 번째 여행: 언어모형과 트랜스포머)

  • Yu Jin Kim;In Jun Hwang;Kisuk Jang;Yoon Dong Lee
    • The Korean Journal of Applied Statistics / v.37 no.5 / pp.567-582 / 2024
  • Over the past decade, the remarkable advancements in deep neural networks have paralleled the development and evolution of language models. Initially, language models were developed in the form of Encoder-Decoder models using early RNNs. However, with the introduction of Attention in 2015 and the emergence of the Transformer in 2017, the field saw revolutionary growth. This study briefly reviews the development process of language models and examines in detail the working mechanism and technical elements of the Transformer. Additionally, it explores statistical models and methodologies related to language models and the Transformer.
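
As a concrete anchor for the Transformer mechanism reviewed above, the sketch below implements scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, as defined in the original Transformer, on plain Python lists of row vectors. Nothing in it depends on token order: permuting the keys and values together leaves each output unchanged, which is why Transformers add a positional encoding to the input embeddings. The function names are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per key, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

Each query thus receives a weighted average of the value vectors, with weights set by how strongly the query matches each key.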