• Title/Summary/Keyword: large-language model


Generalization of error decision rules in a grammar checker using Korean WordNet, KorLex (명사 어휘의미망을 활용한 문법 검사기의 문맥 오류 결정 규칙 일반화)

  • So, Gil-Ja;Lee, Seung-Hee;Kwon, Hyuk-Chul
    • The KIPS Transactions:PartB / v.18B no.6 / pp.405-414 / 2011
  • Korean grammar checkers typically detect context-dependent errors with heuristic rules formulated manually by a language expert. Because a new rule is appended each time a new error pattern is found, such grammar checkers lack consistency. To resolve this shortcoming, we propose a new method for generalizing the error decision rules used to detect these errors. For this purpose, we use an existing thesaurus, KorLex, the Korean version of Princeton WordNet. KorLex provides a hierarchy of word senses for nouns but contains no information about the case relations between words in a sentence. Using the Tree Cut Model and the minimum description length (MDL) principle, both based on information theory, we extract noun classes from KorLex and generalize error decision rules from these classes. To verify the accuracy of the new method experimentally, we extracted from a large corpus the nouns used as objects of four commonly confused predicates and then extracted noun classes from these nouns. The number of error decision rules generalized from these noun classes decreased to about 64.8%, and as a result the precision of our grammar checker exceeds that of conventional checkers by 6.2%.
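The abstract does not spell out the tree-cut computation. A minimal sketch, assuming the standard formulation of Li and Abe's tree cut model with MDL: each candidate cut of the noun hierarchy is scored by its parameter description length plus its data description length, and the recursive Find-MDL procedure keeps a parent class whenever it describes the observed nouns more cheaply than the cut built from its children. The toy hierarchy, its node names, and the frequencies are hypothetical, not taken from KorLex or the paper's corpus.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A class in a toy noun hierarchy (hypothetical); leaves are individual nouns."""
    name: str
    count: int = 0  # corpus frequency as an object of one predicate (leaves only)
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> list[Node]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def freq(self) -> int:
        return sum(leaf.count for leaf in self.leaves())


def description_length(cut: list[Node], sample_size: int) -> float:
    """Parameter bits + data bits of a cut. Counting |cut| instead of |cut| - 1
    parameters adds a constant that cancels when two cuts are compared."""
    param_bits = len(cut) * 0.5 * math.log2(sample_size)
    data_bits = 0.0
    for cls in cut:
        f = cls.freq()
        if f > 0:
            # Each noun in class C gets probability P(C) / |C|, with P(C) = f(C) / |S|.
            data_bits -= f * math.log2(f / (sample_size * len(cls.leaves())))
    return param_bits + data_bits


def find_mdl(node: Node, sample_size: int) -> list[Node]:
    """Recursive Find-MDL: keep `node` if it is no more costly than its children's cut."""
    if not node.children:
        return [node]
    child_cut = [c for child in node.children for c in find_mdl(child, sample_size)]
    if description_length([node], sample_size) <= description_length(child_cut, sample_size):
        return [node]
    return child_cut


# Hypothetical object-noun counts for a single predicate.
root = Node("ENTITY", children=[
    Node("FOOD", children=[Node("apple", 5), Node("bread", 4), Node("rice", 6)]),
    Node("TOOL", children=[Node("hammer", 0), Node("saw", 1)]),
])
cut = find_mdl(root, root.freq())
print([c.name for c in cut])  # ['FOOD', 'TOOL']
```

With these counts the procedure generalizes the food nouns to FOOD and the tool nouns to TOOL but stops short of ENTITY, so a checker built this way could state one rule per selected class rather than one per noun.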

Research on the Utilization of Recurrent Neural Networks for Automatic Generation of Korean Definitional Sentences of Technical Terms (기술 용어에 대한 한국어 정의 문장 자동 생성을 위한 순환 신경망 모델 활용 연구)

  • Choi, Garam;Kim, Han-Gook;Kim, Kwang-Hoon;Kim, You-eil;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.51 no.4 / pp.99-120 / 2017
  • To support the development of a semi-automatic system that helps researchers efficiently analyze technical trends in ever-growing industries and markets, this paper introduces two Korean sentence generation models that automatically generate definitional statements and descriptions of technical terms and concepts. The proposed models are based on LSTM (Long Short-Term Memory), a deep learning model that labels textual sequences effectively by taking into account the contextual relations among the items in each sequence. Our models take technical terms as input and generate a broad range of heterogeneous textual descriptions that explain the concepts behind the terms. In experiments on large-scale training collections, we confirmed that the CHAR-CNN-LSTM model, a word-based LSTM that exploits character embeddings built with convolutional neural networks (CNNs), generates more accurate and reasonable sentences. These results can serve as a basis for an extended model that generates a set of sentences covering the same subject and, further, for an artificial intelligence model that automatically writes technical literature.

Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

  • Jeong, Hyeonjae;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences / v.12 no.2 / pp.29-37 / 2020
  • Recently, neural network-based deep learning algorithms have dramatically improved performance over classical Gaussian mixture model-based hidden Markov model (GMM-HMM) automatic speech recognition (ASR) systems. In addition, research on end-to-end (E2E) speech recognition systems, which integrate language modeling and decoding, has been actively conducted to better exploit the advantages of deep learning. E2E ASR systems generally consist of multiple layers of an attention-based encoder-decoder structure and therefore require a large amount of paired speech-text data to perform well. Obtaining paired speech-text data requires considerable human labor and time and is a major barrier to building E2E ASR systems. Previous studies have improved E2E ASR performance with relatively small amounts of paired data, but most of them used only speech-only data or only text-only data. In this study, we propose a semi-supervised training method that uses both speech-only and text-only data to make an E2E ASR system perform well on corpora from different domains. The proposed method adapts effectively to new domains, performing well in the target domain while degrading only slightly in the source domain.

Development of Algorithm in Analysis of Single Trait Animal Model for Genetic Evaluation of Hanwoo (단형질 개체모형을 이용한 한우 육종가 추정프로그램 개발)

  • Koo, Yangmo;Kim, Jungil;Song, Chieun;Lee, Kihwan;Shin, Jaeyoung;Jang, Hyungi;Choi, Taejeong;Kim, Sidong;Park, Byoungho;Cho, Kwanghyun;Lee, Seungsoo;Choy, Yunho;Kim, Byeongwoo;Lee, Junggyu;Song, Hoon
    • Journal of Animal Science and Technology / v.55 no.5 / pp.359-365 / 2013
  • A program for estimating breeding values with a single-trait animal model was developed directly in Fortran. The program is based on solutions computed repeatedly by an indirect (iterative) method. It implements a common algorithm and an improved, more efficient one, and the efficiency of the two was compared. To make the estimated solutions easy to deliver as a service to farms and brands, the program was linked to a pedigree database as part of developing an improved system. Existing programs for the single-trait animal model are inefficient because the conventional algorithm, programmed from the standard formulation, involves many repeated operations when estimating the solutions; the new algorithm was therefore designed to improve speed by reducing this repetition. The single-trait animal model was solved with the Gauss-Seidel iteration method, and the two algorithms were compared on the mixed model equations, currently the most common approach to estimating breeding values, after preprocessing steps such as preparing the information needed for modelling, removing duplicate records, verifying the parent information of the base population in the pedigree data, and assigning sequential numbers. The conventional algorithm reads and records the data through successive repetitive statements, whereas the new algorithm directly generates the left-hand side of the equations for each effect. To verify that both programs evaluate accurately, their estimated solutions were compared with those of BLUPF90 and MTDFREML. Both the Pearson and Spearman correlation coefficients of the estimated breeding values exceeded 99.5% for all traits, indicating that Model I and Model II both evaluate breeding values accurately.
The number of iterations to convergence was 2,568 for Model I and 1,038 for Model II, and the solving time was 256.008 seconds for Model I and 235.729 seconds for Model II, so Model II needed about 8% less solving time than Model I. The improved algorithm is therefore expected to be considerably more effective than the existing method for analyzing large datasets. If the program is systematized and used in consulting services for farms and industry, it could contribute to the early selection of individuals, shorter generation intervals, and the cultivation of superior groups, and help further develop the Hanwoo industry through breeding-value-based genetic improvement, ultimately helping the country become an advanced livestock nation.
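The abstract names Gauss-Seidel iteration as the solver for the mixed model equations. A minimal sketch of that iteration in plain Python, assuming the left-hand side has already been assembled as a small dense symmetric positive-definite matrix; real animal-model programs store it sparsely and build it from pedigree and effect data, which is omitted here. The 3-by-3 system is a hypothetical stand-in, not Hanwoo data.

```python
def gauss_seidel(lhs, rhs, tol=1e-10, max_iter=10_000):
    """Solve lhs @ x = rhs by Gauss-Seidel; returns (solution, sweeps used)."""
    n = len(rhs)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Newly updated x[j] values are used immediately within the same sweep.
            off_diag = sum(lhs[i][j] * x[j] for j in range(n) if j != i)
            updated = (rhs[i] - off_diag) / lhs[i][i]
            max_change = max(max_change, abs(updated - x[i]))
            x[i] = updated
        if max_change < tol:
            return x, sweep
    raise RuntimeError("Gauss-Seidel did not converge within max_iter sweeps")


# Hypothetical symmetric positive-definite left-hand side and right-hand side.
lhs = [[4.0, 1.0, 0.0],
       [1.0, 3.0, 1.0],
       [0.0, 1.0, 2.0]]
rhs = [1.0, 2.0, 3.0]
solution, sweeps = gauss_seidel(lhs, rhs)
print([round(v, 4) for v in solution])  # [0.2222, 0.1111, 1.4444]
```

Counting sweeps until the largest update falls below a tolerance is one way to obtain iteration counts like those reported for Model I and Model II; the convergence criterion the authors actually used is not stated in the abstract.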

High-resolution Urban Flood Modeling using Cellular Automata-based WCA2D in the Oncheon-cheon Catchment in Busan, South Korea (셀룰러 오토마타 기반 WCA2D 모형을 이용한 부산 온천천 유역 고해상도 도시 침수 해석)

  • Choi, Hyeonjin;Lee, Songhee;Woo, Hyuna;Noh, Seong Jin
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.5 / pp.587-599 / 2023
  • As climate change increases the frequency and risk of flooding in major cities around the world, simulation technology that can quickly and accurately analyze high-resolution 2D flooding over large areas is becoming increasingly important. Physically based approaches built on the Shallow Water Equations (SWE) often require huge computing resources, which hinders high-resolution flood prediction. This study investigated the theoretical background of Weighted Cellular Automata 2D (WCA2D), which simulates spatio-temporal changes in flooding using transition rules and a weight-based system, and assessed its feasibility for simulating pluvial flooding in an urban catchment, the Oncheon-cheon catchment in Busan, South Korea. In addition, computational performance was compared across versions using the Open Computing Language (OpenCL) and Open Multi-Processing (OpenMP) parallel computing techniques. Simulation results showed that the maximum inundation depth map from the WCA2D model is similar to historical inundation maps, and that the model can precisely simulate spatio-temporal changes in flood extent in an urban catchment with complex topography. In terms of computational efficiency, OpenCL and OpenMP sped up the computation by about 8 to 14 times and 5 to 6 times, respectively, compared with sequential computation.
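WCA2D replaces the shallow water equations with local transition rules in which water leaving a cell is shared among lower neighbours according to weights. The sketch below illustrates only that weight-based idea: outflow is split among lower neighbours in proportion to their water-level differences. It is a simplified assumption, not the published WCA2D rule set, which also involves flow limits and time-step control omitted here. The function name, the `relax` damping parameter, and the grids are hypothetical.

```python
def ca_flood_step(elev, depth, relax=0.5):
    """One synchronous update of a simplified weight-based flood CA (illustrative only)."""
    rows, cols = len(elev), len(elev[0])
    change = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] <= 0.0:
                continue
            level = elev[r][c] + depth[r][c]
            # Collect von Neumann neighbours whose water level is lower.
            lower = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    diff = level - (elev[nr][nc] + depth[nr][nc])
                    if diff > 0.0:
                        lower.append((nr, nc, diff))
            if not lower:
                continue
            total_diff = sum(d for _, _, d in lower)
            # Outflow is capped by the water present and damped by `relax`.
            outflow = min(depth[r][c], relax * max(d for _, _, d in lower))
            for nr, nc, d in lower:
                share = outflow * d / total_diff  # weight proportional to level drop
                change[nr][nc] += share
                change[r][c] -= share
    return [[depth[r][c] + change[r][c] for c in range(cols)] for r in range(rows)]


# Hypothetical 3x3 basin with a central pit; 1.0 m of rain ponded on one corner cell.
elev = [[2.0, 1.5, 2.0],
        [1.5, 0.0, 1.5],
        [2.0, 1.5, 2.0]]
depth = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
for _ in range(100):
    depth = ca_flood_step(elev, depth)
print(round(depth[1][1], 6))  # 1.0: the ponded water has drained into the pit
```

No cell sends more water than it holds, and every share subtracted from one cell is added to a neighbour, so the update conserves volume. Because each cell depends only on its neighbours from the previous step, the double loop is also the natural target for OpenCL or OpenMP parallelization.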

Analysis of the Abstract Structure in Scientific Papers by Gifted Students and Exploring the Possibilities of Artificial Intelligence Applied to the Educational Setting (과학 영재의 논문 초록 구조 분석 및 이에 대한 인공지능의 활용 가능성 탐색)

  • Bongwoo Lee;Hunkoog Jho
    • Journal of The Korean Association For Science Education / v.43 no.6 / pp.573-582 / 2023
  • This study aimed to explore the potential use of artificial intelligence in science education for gifted students by analyzing the structure of abstracts written by students at a gifted science academy and evaluating how accurately AI extracts the elements of these abstracts. The study analyzed 263 graduation theses from S Science High School written over five years (2017-2021), focusing on the frequency and types of background, objectives, methods, results, and discussions included in their abstracts. The accuracy of AI classification of these elements was then evaluated using fine-tuning and prompting. The results revealed that the elements in the abstracts written by gifted students appeared, from most to least frequent, in the order objectives, methods, results, background, and discussions. However, only 57.4% of the abstracts contained all the essential elements, namely objectives, methods, and results. Fine-tuned AI classification showed the highest accuracy, with background, objectives, and results classified relatively well, while methods and discussions were often misclassified. These findings suggest that AI could be used more effectively by providing a more balanced distribution of elements or more appropriate training datasets. Educational implications of these findings are also discussed.

Automated Story Generation with Image Captions and Recursive Calls (이미지 캡션 및 재귀호출을 통한 스토리 생성 방법)

  • Isle Jeon;Dongha Jo;Mikyeong Moon
    • Journal of the Institute of Convergence Signal Processing / v.24 no.1 / pp.42-50 / 2023
  • Technological development has driven digital innovation throughout the media industry, including production and editing techniques, and OTT services and the streaming era have diversified how consumers watch content. The convergence of big data and deep learning networks has enabled automatic generation of text in formats such as news articles, novels, and scripts, but few studies have generated contextually smooth stories that reflect the author's intention. In this paper, we describe the flow of pictures in a storyboard using image caption generation techniques and automatically generate story-tailored scenarios with a language model. Using image captioning based on a CNN and an attention mechanism, we generate sentences describing the pictures on the storyboard, then feed these sentences into the natural language processing model KoGPT-2 to automatically generate scenarios that match the planning intention. This approach can produce large numbers of scenarios tailored to the author's intention and story, easing the burden of content creation and involving artificial intelligence throughout digital content production to promote media intelligence.

Numerical investigation of the impact of geological discontinuities on the propagation of ground vibrations

  • Haghnejad, Ali;Ahangari, Kaveh;Moarefvand, Parviz;Goshtasbi, Kamran
    • Geomechanics and Engineering / v.14 no.6 / pp.545-552 / 2018
  • Ground vibrations induced by blasting large amounts of explosives can cause many problems for the stability of mine slopes. Geological discontinuities strongly influence the transmission of the dynamic detonation pressure and, depending on their position relative to the slope face, can either harm or benefit slope stability. In this study, the effect of geological discontinuities was investigated by modelling a slope containing discontinuities and applying the dynamic pressure in the three-dimensional discrete element code 3DEC. Four discontinuity configurations commonly observed in mine slopes were considered. Given its advantages, a pressure decay function defined in earlier research was used to construct the pressure-time profile. Peak particle velocity (PPV) values were monitored along an axis using the FISH programming language, and the results were used as an indicator of the discontinuities' effects. The discontinuity-free model shows that empirical PPV models are reliable for rock without discontinuities or for tightly jointed rock masses. The other results, however, show that the empirical models cannot be used when the rock mass contains discontinuities, regardless of their direction or dip. The PPVs show that when the discontinuities dip opposite to the slope face, the dynamic detonation pressure is strongly damped toward the slope at the discontinuity surfaces. When the discontinuities are horizontal, by contrast, the detonation pressure affects the rock mass over a large distance.
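The abstract does not name the empirical PPV models it tested. One widely used form, assumed here for illustration, is the square-root scaled-distance law PPV = K (D / sqrt(Q))^(-β), where D is the distance from the blast, Q is the charge weight per delay, and K and β are site constants fitted by linear regression in log-log space. The study's finding is that relations of this kind hold only for rock without discontinuities or for tightly jointed rock. The function names, the synthetic records, and the constants below are hypothetical.

```python
import math


def scaled_distance(distance_m, charge_kg):
    """Square-root scaled distance SD = D / sqrt(Q)."""
    return distance_m / math.sqrt(charge_kg)


def fit_ppv_law(distances, charges, ppvs):
    """Fit ln(PPV) = ln(K) - beta * ln(SD) by ordinary least squares; returns (K, beta)."""
    xs = [math.log(scaled_distance(d, q)) for d, q in zip(distances, charges)]
    ys = [math.log(v) for v in ppvs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def predict_ppv(k, beta, distance_m, charge_kg):
    """PPV predicted by the scaled-distance law (same units as K)."""
    return k * scaled_distance(distance_m, charge_kg) ** (-beta)


# Synthetic records generated from hypothetical constants K = 1140, beta = 1.6 (PPV in mm/s).
distances = [20.0, 40.0, 80.0, 160.0]
charges = [100.0, 100.0, 200.0, 200.0]
ppvs = [predict_ppv(1140.0, 1.6, d, q) for d, q in zip(distances, charges)]
k_fit, beta_fit = fit_ppv_law(distances, charges, ppvs)
print(round(k_fit, 3), round(beta_fit, 3))  # 1140.0 1.6
```

Because the law depends only on distance and charge, it has no term for joint orientation; comparing its predictions with PPVs monitored in a jointed model is one way to see the discrepancy the study reports.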

Crisis Prediction of Regional Industry Ecosystem based on Text Sentiment Analysis Using News Data - Focused on the Automobile Industry in Gwangju - (뉴스 데이터를 활용한 텍스트 감성분석에 따른 지역 산업생태계 위기 예측 - 광주 지역 자동차 산업을 중심으로 -)

  • Kim, Hyun-Ji;Kim, Sung-Jin;Kim, Han-Gook
    • The Journal of the Korea Contents Association / v.20 no.8 / pp.1-9 / 2020
  • As the aging of regional industry ecosystems has become increasingly serious, research on measuring and reversing their decline has been actively conducted. However, little research has been done on crises in regional industry ecosystems. A crisis emerges abruptly over a short period and often cannot be handled after the fact, so it must be addressed before it occurs; in other words, crises need to be detected early and met with proactive responses from a long-term perspective. It is therefore necessary to develop a predictive model that can recognize and respond to crises in the regional industry ecosystem in advance. This study therefore examined the feasibility of predicting risks to regional industries and markets from the sentiment scores of large-scale news data. Sentiment analysis was performed with the Google sentiment analysis API, and the scores were aggregated by month to examine their correlation with actual events.
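The monthly aggregation step can be sketched as follows, assuming each article has already been scored by the sentiment API (Google's Natural Language API reports a document sentiment score between -1.0 and 1.0) and that the monthly series is compared with a monthly industry indicator through a Pearson correlation. The correlation measure, the indicator, the function names, and all values are hypothetical; the abstract does not specify them, and no API call is made here.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_mean_sentiment(articles):
    """Average per-article sentiment scores by (year, month), in chronological order."""
    buckets = defaultdict(list)
    for published, score in articles:
        buckets[(published.year, published.month)].append(score)
    return {month: sum(s) / len(s) for month, s in sorted(buckets.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scored articles and a hypothetical monthly production index.
articles = [
    (date(2019, 1, 5), 0.2), (date(2019, 1, 20), 0.4),
    (date(2019, 2, 3), -0.1), (date(2019, 2, 14), -0.3),
    (date(2019, 3, 9), -0.6),
]
monthly = monthly_mean_sentiment(articles)
production_index = [102.0, 98.0, 91.0]  # Jan to Mar 2019, hypothetical
r = pearson(list(monthly.values()), production_index)
print(list(monthly), round(r, 3))
```

In practice the month keys of the sentiment series and the indicator must be aligned explicitly; here both are simply listed in the same chronological order.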

A Study on Implementation of Emotional Speech Synthesis System using Variable Prosody Model (가변 운율 모델링을 이용한 고음질 감정 음성합성기 구현에 관한 연구)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.8 / pp.3992-3998 / 2013
  • This paper presents a method for adding an emotional speech corpus to a high-quality, large-corpus-based speech synthesizer to generate varied synthesized speech. We built the emotional speech corpus in a form usable by a concatenative waveform synthesizer and implemented a synthesizer that generates varied speech through the same synthesis unit selection process as a normal speech synthesizer. Emotion in the input text is specified with a markup language. Emotional speech is generated when the input text matches the emotional speech corpus over the length of an intonation phrase; otherwise, normal speech is generated. Because the break indices (BIs) of emotional speech are more irregular than those of normal speech, the BIs generated by the synthesizer are difficult to use as they are. To solve this problem, we applied Variable Break modeling [3]. A Japanese speech synthesizer was used for the experiments. As a result, natural emotional synthesized speech was obtained using the break prediction module of the normal speech synthesizer.
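The corpus selection rule in the abstract (use the emotional corpus only when the input matches it over a whole intonation phrase, otherwise fall back to normal speech) can be sketched as below. The phrase representation, the emotion labels, the function name, and the exact-string lookup are simplifying assumptions; the actual system performs unit selection over waveform segments and reads emotion tags from its markup, neither of which is modelled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class IntonationPhrase:
    text: str
    emotion: Optional[str]  # emotion tag from the input markup, or None for neutral text


def plan_corpora(phrases: List[IntonationPhrase],
                 emotional_index: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Pick a corpus per intonation phrase: emotional only if the whole phrase is covered."""
    plan = []
    for phrase in phrases:
        covered = (phrase.emotion is not None
                   and phrase.text in emotional_index.get(phrase.emotion, set()))
        plan.append((phrase.text, phrase.emotion if covered else "normal"))
    return plan


# Hypothetical index of intonation phrases recorded in each emotional sub-corpus.
emotional_index = {"joy": {"thank you so much"}, "sadness": {"I miss you"}}
phrases = [
    IntonationPhrase("thank you so much", "joy"),
    IntonationPhrase("for the lovely gift", "joy"),  # tagged, but not in the joy corpus
    IntonationPhrase("see you tomorrow", None),
]
print(plan_corpora(phrases, emotional_index))
# [('thank you so much', 'joy'), ('for the lovely gift', 'normal'), ('see you tomorrow', 'normal')]
```

Deciding per intonation phrase keeps emotional and normal units from being mixed inside one phrase, which is also the unit at which the irregular break indices mentioned above become relevant.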