• Title/Summary/Keyword: Word language model


The Parallel Corpus Approach to Building the Syntactic Tree Transfer Set in the English-to-Vietnamese Machine Translation

  • Dien Dinh;Ngan Thuy;Quang Xuan;Nam Chi
    • Proceedings of the IEEK Conference / summer / pp.382-386 / 2004
  • Recently, following the machine learning trend, most machine translation systems around the world use two syntax tree sets in the two relevant languages to learn syntactic tree transfer rules. For the English-Vietnamese language pair, however, this approach is not possible because, until now, there has been no Vietnamese syntactic tree set corresponding to an English one. Building a very large corresponding Vietnamese syntactic tree set (thousands of trees) requires a great deal of work and the involvement of specialists in linguistics. To take advantage of our available English-Vietnamese Corpus (EVC), which is annotated with word alignments, we chose the SITG (Stochastic Inversion Transduction Grammar) model to construct English-Vietnamese syntactic tree sets automatically. This model parses the two languages at the same time and then carries out the syntactic tree transfer. The resulting English-Vietnamese bilingual syntactic tree set is the basic training data for machine learning models that automatically transfer English syntactic trees into Vietnamese ones. We tested the syntactic analysis on over 10,000 of the 500,000 sentences in our English-Vietnamese bilingual corpus, and the first stage gave encouraging results (about 80% analyzed) [5]. We have used the TBL (Transformation-Based Learning) algorithm to carry out automatic transformations from English syntactic trees to Vietnamese ones based on that parallel syntactic tree transfer set [6].
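SITG parses the English and Vietnamese sentences at once, so every tree node either keeps its two children in the same order on both sides (straight) or swaps them on the Vietnamese side (inverted). The sketch below illustrates only that structural idea, not the authors' stochastic parser: given a one-to-one word alignment written as a permutation (`perm[i]` is the target position of source word `i`), a greedy shift-reduce pass builds a bracketing with straight `[A B]` and inverted `<A B>` nodes, or returns `None` when no ITG tree fits. Rule probabilities, unaligned words and many-to-many links are ignored, and the function name `itg_bracket` is hypothetical.

```python
from typing import List, Optional, Tuple

# A stack item covers a contiguous target span [lo, hi) and carries its bracketing.
Node = Tuple[int, int, str]


def itg_bracket(perm: List[int]) -> Optional[str]:
    """Return an ITG bracketing for a source->target permutation, or None.

    perm[i] is the target position of source word i (one-to-one alignment,
    an illustrative simplification). Straight nodes keep child order in both
    languages ("[A B]"); inverted nodes swap it on the target side ("<A B>").
    """
    stack: List[Node] = []
    for i, t in enumerate(perm):
        stack.append((t, t + 1, str(i)))  # shift: a leaf covering one target slot
        # reduce greedily while the two topmost spans are adjacent on the target side
        while len(stack) >= 2:
            (lo1, hi1, s1), (lo2, hi2, s2) = stack[-2], stack[-1]
            if hi1 == lo2:  # same order in the target: straight node
                merged = (lo1, hi2, f"[{s1} {s2}]")
            elif hi2 == lo1:  # reversed order in the target: inverted node
                merged = (lo2, hi1, f"<{s1} {s2}>")
            else:
                break
            stack[-2:] = [merged]
    # a single remaining span means the whole sentence pair got one ITG tree
    return stack[0][2] if len(stack) == 1 else None
```

For instance, an English adjective-noun pair whose Vietnamese counterpart puts the noun first aligns as `[1, 0]` and comes out as the inverted node `<0 1>`. To our understanding, greedy reduction is sufficient for binarizable (ITG-compatible) alignments, which is why a stack with more than one span left can be reported as "no ITG parse".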


An automatic pronunciation evaluation system using non-native teacher's speech model (비원어민 교수자 음성모델을 이용한 자동발음평가 시스템)

  • Park, Hye-bin;Kim, Dong Heon;Joung, Jinoo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.2 / pp.131-136 / 2016
  • Appropriate evaluation of learners' pronunciation has been an important part of foreign language education: learners should be evaluated and receive proper feedback to improve their pronunciation. Because human evaluation is costly and inconsistent, automatic pronunciation evaluation systems have been studied, and most current systems rely on underlying Automatic Speech Recognition (ASR) technology. In this work, we propose evaluating learners' pronunciation accuracy and fluency at the word level using ASR and a non-native teacher's speech model. A performance evaluation of our system confirms that its overall pronunciation accuracy and fluency results reflect the learner's English skill level quite accurately.

Development Web-based Arabic Assessments for Deaf and Hard-of-Hearing Students

  • Atwan, Jaffar;Wedyan, Mohammad;Abbas, Abdallah;Gazzawe, Foziah;Alturki, Ryan
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.359-367 / 2022
  • Arabic skills are the tools by which children are prepared for the educational processes on which their lives depend. Deaf and hard-of-hearing (DHH) students must be able to grasp the same Arabic terms as hearing students, along with their different meanings in the contexts of different sentences, yet because of their impairment their understanding falls short of what is expected. Problems arise for DHH students when the same Arabic word has different meanings in different contexts, because the way these words are presented for comprehension does not meet the needs and circumstances of DHH students. Therefore, the researchers introduce a web-based prototype for Arabic words and their meanings in context that can overcome these problems. Methodology: The study sample consists of 30 DHH students at Al Amal City of Palestine, Gaza Region (GR). Participants who agreed to take part in the study were recruited using a purposeful sampling method. The survey data were analyzed descriptively using the Statistical Package for the Social Sciences (SPSS), version 24.0. A sign-language teaching video is used in the prototype to standardize the process and to verify that Arabic vocabulary and its meanings are understood. The system was built using the evolutionary prototyping process model. Findings: The results show that the prototype is workable and can help DHH students differentiate between phrases that have the same letters but distinct meanings. The findings are expected to contribute to a better understanding and application of web-based Arabic assessments for DHH students in developing countries, which will help increase their use in those countries. The empirical models of web-based Arabic for DHH students are established as a proof of concept for the proposed model.
The results of this study are expected to have a significant impact on information systems practitioners and on the body of knowledge.

Design of a Deep Neural Network Model for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델의 설계)

  • Kim, Dongha;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.6 no.4 / pp.203-210 / 2017
  • In this paper, we propose an effective neural network model for image caption generation and model transfer. The model is a kind of multimodal recurrent neural network model. It consists of five distinct layers: a convolutional neural network layer for extracting visual information from images, an embedding layer for converting each word into a low-dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multimodal layer for combining visual and language information. In this model, the recurrent neural network layer is built from LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, the model has a distinctive structure in which the output of the convolutional neural network layer is linked not only to the initial state of the recurrent neural network layer but also to the input of the multimodal layer, so that visual information extracted from the image is used at each recurrent step to generate the corresponding textual caption. Through various comparative experiments on open datasets such as Flickr8k, Flickr30k, and MSCOCO, we demonstrate that the proposed multimodal recurrent neural network model performs well in terms of caption accuracy and model transfer.
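The distinctive wiring here is that the CNN image feature feeds both the initial recurrent state and the multimodal layer at every step. Below is a minimal, untrained sketch of that data flow in pure Python so it runs without extra libraries. It is not the authors' model: a plain tanh RNN stands in for the LSTM units, all weights are small random values, the layer sizes are arbitrary, and the names (`multimodal_states`, `W_img0`, `V_w`, `V_h`, `V_i`) are hypothetical.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[Vec]


def matvec(m: Mat, v: Vec) -> Vec:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs: Vec) -> Vec:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def vtanh(v: Vec) -> Vec:
    return [math.tanh(x) for x in v]


def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def multimodal_states(img_feat: Vec, word_ids: List[int], vocab: int = 10,
                      emb: int = 4, hid: int = 5, mm: int = 3,
                      seed: int = 0) -> List[Vec]:
    """Forward pass through the wiring only; weights are random, not trained."""
    rng = random.Random(seed)
    d = len(img_feat)
    E = rand_mat(vocab, emb, rng)    # embedding layer: word id -> low-dim vector
    W_img0 = rand_mat(hid, d, rng)   # CNN feature -> initial recurrent state
    W_x, W_h = rand_mat(hid, emb, rng), rand_mat(hid, hid, rng)
    V_w, V_h, V_i = rand_mat(mm, emb, rng), rand_mat(mm, hid, rng), rand_mat(mm, d, rng)

    h = vtanh(matvec(W_img0, img_feat))  # link 1: the image initializes the RNN
    states = []
    for wid in word_ids:
        x = E[wid]
        h = vtanh(vadd(matvec(W_x, x), matvec(W_h, h)))  # tanh RNN in place of LSTM
        # link 2: the same image feature enters the multimodal layer at every step
        m = vtanh(vadd(matvec(V_w, x), matvec(V_h, h), matvec(V_i, img_feat)))
        states.append(m)
    return states
```

Turning each multimodal state into a next-word distribution would need a further output layer over the vocabulary, which this sketch omits.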

Automatic Target Recognition Study using Knowledge Graph and Deep Learning Models for Text and Image data (지식 그래프와 딥러닝 모델 기반 텍스트와 이미지 데이터를 활용한 자동 표적 인식 방법 연구)

  • Kim, Jongmo;Lee, Jeongbin;Jeon, Hocheol;Sohn, Mye
    • Journal of Internet Computing and Services / v.23 no.5 / pp.145-154 / 2022
  • Automatic Target Recognition (ATR) technology is emerging as a core technology of Future Combat Systems (FCS). Conventional ATR is performed on IMINT (image information) collected from SAR sensors, using various image-based deep learning models. However, although the development of IT and sensing technology has expanded ATR-related data and information to HUMINT (human information) and SIGINT (signal information), ATR still relies only on image-oriented IMINT data. In complex and diverse battlefield situations, it is difficult to guarantee high ATR accuracy and generalization performance with image data alone. Therefore, in this paper we propose a knowledge graph-based ATR method that can use image and text data simultaneously. The main idea of the knowledge graph and deep model-based ATR method is to convert ATR images and text into graphs according to the characteristics of each data type, align them to the knowledge graph, and connect the heterogeneous ATR data through the knowledge graph. To convert an ATR image into a graph, an object-tag graph whose nodes are object tags is generated from the image using a pre-trained image object recognition model and the vocabulary of the knowledge graph. For ATR text, a pre-trained language model, TF-IDF, a co-occurrence word graph, and the knowledge graph vocabulary are used to generate a word graph whose nodes are key ATR vocabulary. The two generated types of graphs are connected to the knowledge graph using an entity alignment model to improve ATR performance on images and text. To demonstrate the superiority of the proposed method, 227 web documents and 61,714 RDF triples from DBpedia were collected, and comparison experiments were performed on precision, recall, and F1-score from the perspective of entity alignment.
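Of the text pipeline described above, the TF-IDF and co-occurrence word graph steps are easy to illustrate. The sketch below is an assumption-laden simplification, not the authors' code: it keeps up to `top_k` positive-TF-IDF terms per document (optionally restricted to a knowledge-graph vocabulary set), then links two keywords whenever they occur within `window` tokens of each other. The pre-trained language model and the entity alignment step are omitted, and the function names, `top_k`, `window` and the toy tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def tfidf_keywords(docs: List[List[str]], top_k: int,
                   vocab: Optional[Set[str]] = None) -> List[Set[str]]:
    """Pick up to top_k positive-TF-IDF terms per document, optionally limited to vocab."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    picked = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t])
                  for t, c in tf.items() if vocab is None or t in vocab}
        # drop zero scores (terms found in every document), break ties alphabetically
        ranked = sorted((t for t, s in scores.items() if s > 0),
                        key=lambda t: (-scores[t], t))
        picked.append(set(ranked[:top_k]))
    return picked


def cooccurrence_graph(docs: List[List[str]], keywords: List[Set[str]],
                       window: int = 2) -> Dict[Tuple[str, str], int]:
    """Weighted edges between keywords of the same document within `window` tokens."""
    edges: Counter = Counter()
    for doc, keys in zip(docs, keywords):
        for i, a in enumerate(doc):
            for b in doc[i + 1:i + 1 + window]:
                if a != b and a in keys and b in keys:
                    edges[tuple(sorted((a, b)))] += 1
    return dict(edges)
```

Sorting each pair makes the graph undirected, so (a, b) and (b, a) accumulate on the same edge.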

Development and Validation of the Letter-unit based Korean Sentimental Analysis Model Using Convolution Neural Network (회선 신경망을 활용한 자모 단위 한국형 감성 분석 모델 개발 및 검증)

  • Sung, Wonkyung;An, Jaeyoung;Lee, Choong C.
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.13-33 / 2020
  • This study proposes a Korean sentiment analysis algorithm that uses letter-unit embeddings and convolutional neural networks. Sentiment analysis is a natural language processing technique for analyzing subjective data expressed in text, such as a person's attitude, opinion, and propensity. Research on Korean sentiment analysis has recently increased steadily. However, it has not used a general-purpose sentiment dictionary; instead, each field has built and used its own. The problem with this practice is that it does not reflect the characteristics of Korean. In this study, we developed a sentiment analysis model that produces syllable vectors based on the onset, peak (nucleus), and coda, omitting morphological analysis from the procedure. As a result, we were able to minimize the word-learning problem and the problem of unregistered (out-of-vocabulary) words, and the model achieved an accuracy of 88%. The model is less affected by the unstructured nature of the input data and allows polarity classification according to the context of the text. We hope this model will make Korean sentiment analysis easier for non-experts who wish to perform it.
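Syllable vectors built from onset, peak (nucleus) and coda start from splitting each Hangul syllable into those three letters. As a minimal sketch of just that decomposition step (not the authors' embedding or CNN), standard Unicode arithmetic splits any precomposed syllable in U+AC00 to U+D7A3, whose index is `(onset * 21 + nucleus) * 28 + coda`. The `_` placeholder for an empty coda and the function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Compatibility jamo in Unicode index order: 19 onsets, 21 nuclei, 27 codas + "no coda".
ONSETS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
NUCLEI = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
assert (len(ONSETS), len(NUCLEI), len(CODAS)) == (19, 21, 28)

SBASE, SLAST = 0xAC00, 0xD7A3  # first and last precomposed Hangul syllables


def decompose(ch: str) -> Optional[Tuple[str, str, str]]:
    """Split one Hangul syllable into (onset, nucleus, coda); None for other characters."""
    code = ord(ch)
    if not SBASE <= code <= SLAST:
        return None
    onset, rest = divmod(code - SBASE, len(NUCLEI) * len(CODAS))
    nucleus, coda = divmod(rest, len(CODAS))
    return ONSETS[onset], NUCLEI[nucleus], CODAS[coda]


def to_letter_units(text: str, empty_coda: str = "_") -> List[str]:
    """Flatten text into letter-unit tokens, three per syllable; other chars pass through."""
    tokens: List[str] = []
    for ch in text:
        parts = decompose(ch)
        if parts is None:
            tokens.append(ch)
        else:
            onset, nucleus, coda = parts
            tokens.extend([onset, nucleus, coda or empty_coda])
    return tokens
```

Keeping a fixed three slots per syllable lets a downstream model tell an onset ㄱ from a coda ㄱ purely by position.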

KorLexClas 1.5: A Lexical Semantic Network for Korean Numeral Classifiers (한국어 수분류사 어휘의미망 KorLexClas 1.5)

  • Hwang, Soon-Hee;Kwon, Hyuk-Chul;Yoon, Ae-Sun
    • Journal of KIISE:Software and Applications / v.37 no.1 / pp.60-73 / 2010
  • This paper describes KorLexClas 1.5, which provides a very large list of Korean numeral classifiers together with the co-occurring noun categories that select each numeral classifier. Unlike the KorLex networks for other parts of speech, whose structure depends largely on their reference model (Princeton WordNet), KorLexClas 1.0 and its extended version 1.5 were built directly, which demands considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For both efficient construction and reliability of KorLexClas 1.5, we use the following processes: (1) using various language resources and cross-checking them to select classifier candidates; (2) extending the list of numeral classifiers using shallow parsing techniques; (3) setting up the hierarchies of numeral classifiers based on previous linguistic studies; and (4) determining the LUB (Least Upper Bound) of the numeral classifiers in KorLexNoun 1.5. The last process gives KorLexClas 1.5 an open, extensible list of co-occurring nouns. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including machine translation (MT).
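Step (4) looks for the LUB, in the noun hierarchy, of the nouns that co-occur with a classifier. Assuming a single-parent hierarchy, the LUB reduces to the lowest common ancestor, as in the sketch below; WordNet-style networks such as KorLexNoun can give a node several parents, which this sketch ignores. The function names and the toy hierarchy with English labels are hypothetical. In Korean, for example, the classifier 마리 counts animals, so its co-occurring nouns would be expected to meet at a node like "animal".

```python
from typing import Dict, List, Optional


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from node up to the root, most specific first (single parent assumed)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def least_upper_bound(nouns: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Lowest node that is an ancestor-or-self of every noun, or None if none exists."""
    if not nouns:
        return None
    common = ancestors(nouns[0], parent)  # ordered from specific to general
    for noun in nouns[1:]:
        above = set(ancestors(noun, parent))
        common = [a for a in common if a in above]
    return common[0] if common else None


# Hypothetical toy hierarchy: child -> parent
TOY_PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "chicken": "bird", "bird": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "artifact", "artifact": "entity",
}
```

Because `common` stays ordered from the first noun upward, its first surviving element is the most specific shared node.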

A Study on Christian Educational Implications for 6 Key Competencies of 2015 Revised National Curriculum (2015 개정 교육과정의 6개 핵심역량에 대한 기독교교육적 함의)

  • Seo, Mikyoung
    • Journal of Christian Education in Korea / v.63 / pp.221-253 / 2020
  • The purpose of this study is to define a Christian key competency and to interpret the six key competencies of the 2015 revised curriculum from a Christian educational perspective. As an alternative to the key competencies model of the 2015 revised curriculum, this study also aims to formulate a Christian key competencies model based on Christian faith. Through an analysis of preceding research, this study derived 'faith' as the Christian key competency. 'Faith' as a Christian key competency means the ability to know oneself, the world, and God through the knowledge of the Bible (knowledge of God) within a personal relationship with God; it is also the ability to think, judge, and act according to biblical values, a Christian worldview, and a Christian self-identity. The key competency 'faith' can serve as the basis (standard) for the motivation, attitude, and value of all competencies as they are cultivated and exercised. In the Christian key competencies model, each of the six key competencies is cultivated on the basis of the Christian key competency called "faith." Based on this structure, the six key competencies of the 2015 revised curriculum were interpreted and explained from the perspective of Christian education. In the self-management competency, self-identity can be formed correctly in relation to the transcendent God. In the aesthetic-emotional competency, empathic understanding of human beings comes from understanding the image of God, the supreme beauty and the source of beauty. In the community competency, the source of human community is God, who created the universe, human beings, and all things, because a Christian community exists within the relationship among the Triune God, nature, and others. Therefore, regions, countries, and the world become one community.
Communication competency stems first from good, respectful attitudes toward oneself and others, which come from a Christian understanding of human beings. A common language for communication is also needed, and that common language is the Bible, given to us for communicative companionship. Through the language of the Bible, God enables us to know God, human beings, and the created world, and to continue communicating with God, others, and the world. The knowledge-information processing competency requires a standard of value for processing and using knowledge and information, and this standard should be grounded in moral and ethical values that respect human beings. In the creative thinking competency, the source of creativity is God, who created the world, and human beings, who bear the image of God, possess creative potential. In addition, creativity takes different forms of expression depending on individual preferences and interests, and approaches will differ according to what each individual considers important and achieves. Individual creativity can be discovered through education and embodied by integrating knowledge, skills, and experience.

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written texts, the sentiments that consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) goes further, performing a fine-grained analysis of the sentiment toward each aspect of an object. Because it has greater practical value for business, ABSA is drawing attention from both academic and industrial organizations. For example, given the review "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment toward the 'restaurant' as 'positive', whereas ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. ABSA thus enables more specific and effective marketing strategies. To perform ABSA, it is necessary to identify the aspect terms or aspect categories in the text and judge the sentiment toward each. Accordingly, ABSA has four main areas: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted either by extracting aspect terms and then performing ATSC to analyze the sentiment toward the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze the sentiment toward the given aspect categories. Here, an aspect category is expressed by one or more aspect terms, or inferred indirectly from other words. In the example sentence above, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' in the review. If the review sentence included 'pasta', 'steak', or 'grilled chicken special', these could all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect.
On the other hand, an aspect category like 'price', which has no specific aspect terms but can be inferred indirectly from an emotional word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and use 'aspect' for convenience. Note that ATSC analyzes the sentiment toward given aspect terms, so it deals only with explicit aspects, whereas ACSC treats both explicit and implicit aspects. This study seeks answers to the following issues, ignored in previous studies, that arise when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair configuration for the input data? Third, is there any performance difference depending on the order of the sentence containing the aspect category in the QA or NLI type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that perform better than those in existing studies without expanding the training dataset. In addition, we found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector. We also found that QA-type input generally provides better performance than NLI-type input, and that the order of the sentence containing the aspect category in QA-type input does not affect performance.
There may be some differences depending on the characteristics of the dataset, but with NLI-type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing ACSC models used in this study could be applied similarly to other tasks such as ATSC.
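The three research questions all concern how the sentence pair is laid out and which output positions are read. The sketch below shows one way to build a BERT-style `[CLS] A [SEP] B [SEP]` sequence in either QA or NLI form, with the aspect sentence first or second, and to record where the aspect-category tokens land so their output vectors could be used alongside `[CLS]`. It is an illustration under assumptions, not the authors' code: whitespace splitting stands in for WordPiece tokenization, and the QA question template and the function name `build_pair` are hypothetical.

```python
from typing import List, Tuple


def build_pair(review: str, aspect: str, style: str = "QA",
               aspect_first: bool = False) -> Tuple[List[str], List[int]]:
    """Return a [CLS] A [SEP] B [SEP] token list and the positions of the aspect tokens.

    Whitespace tokens stand in for WordPiece; the QA template is hypothetical.
    """
    if style == "QA":
        aux = f"what do you think of the {aspect} of it ?"  # question-style sentence
    elif style == "NLI":
        aux = aspect  # bare aspect as a pseudo-hypothesis
    else:
        raise ValueError(f"unknown style: {style}")
    review_toks, aux_toks = review.split(), aux.split()
    first, second = (aux_toks, review_toks) if aspect_first else (review_toks, aux_toks)
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    # the aspect sentence starts right after [CLS], or after the review and its [SEP]
    start = 1 if aspect_first else len(review_toks) + 2
    aspect_words = set(aspect.split())
    positions = [start + k for k, w in enumerate(aux_toks) if w in aspect_words]
    return tokens, positions
```

Positions are taken only from the aspect sentence, so a review that happens to mention "food" does not add extra positions for the 'food' aspect.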

A Study on Research Trends in Literacy Education through a Key word Network Analysis (키워드 네트워크 분석을 통한 리터러시 교육 연구 동향)

  • Lee, Woo-Jin;Baek, Hye-Jin
    • Journal of Digital Convergence / v.20 no.5 / pp.53-59 / 2022
  • The purpose of this study is to examine factors related to learning by analyzing domestic research trends in literacy and to present directions for literacy education. Research papers published from 1993 to February 2022 were collected using RISS with 'Literacy' and 'Education' as search keywords, and 200 papers were selected for analysis. The keyword network analysis showed that 118 of the 810 keywords appeared at least three times. The most frequent keywords were, in order, 'digital literacy', 'media literacy', and 'elementary school'. Based on these results, the following directions are suggested. First, an online teaching and learning resource platform should be established and linked with education policy. Second, literacy competencies need to be defined, and ways to improve them should be sought. Third, a digital-based convergence education model should be developed. This study is meaningful in that it analyzes the most recent literacy studies and suggests directions for literacy education.
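The frequency threshold and the keyword network can be sketched as follows. This is an illustrative assumption, not the authors' tool: each keyword is counted once per paper, keywords below `min_freq` are dropped, two kept keywords are linked when they appear in the same paper (edge weight = number of such papers), and a keyword's degree is its number of distinct neighbours. The function name and parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def keyword_network(papers: List[List[str]], min_freq: int = 3
                    ) -> Tuple[Counter, Dict[Tuple[str, str], int], Dict[str, int]]:
    """Return (keyword frequencies, weighted co-occurrence edges, degree per kept keyword)."""
    freq = Counter(k for kws in papers for k in set(kws))  # once per paper
    kept = {k for k, c in freq.items() if c >= min_freq}
    edges: Counter = Counter()
    for kws in papers:
        present = sorted(set(kws) & kept)  # sorted so each pair has one canonical order
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return freq, dict(edges), dict(degree)
```

`freq.most_common()` then gives the frequency ranking, and sorting by degree instead would rank keywords by how many others they connect to.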