• Title/Summary/Keyword: vocabulary data

Utilizing Deep Learning for Early Diagnosis of Autism: Detecting Self-Stimulatory Behavior

  • Seongwoo Park;Sukbeom Chang;JooHee Oh
    • International Journal of Advanced Culture Technology / v.12 no.3 / pp.148-158 / 2024
  • We investigate Autism Spectrum Disorder (ASD), which is typified by deficits in social interaction, repetitive behaviors, limited vocabulary, and cognitive delays. Traditional diagnostic methodologies, reliant on expert evaluations, frequently result in deferred detection and intervention, particularly in South Korea, where there is a dearth of qualified professionals and limited public awareness. In this study, we employ advanced deep learning algorithms to enhance early ASD screening through automated video analysis. Utilizing architectures such as Convolutional Long Short-Term Memory (ConvLSTM), Long-term Recurrent Convolutional Network (LRCN), and Convolutional Neural Networks with Gated Recurrent Units (CNN+GRU), we analyze video data from platforms like YouTube and TikTok to identify stereotypic behaviors (arm flapping, head banging, spinning). Our results indicate that the LRCN model exhibited superior performance with 79.61% accuracy on the augmented platform video dataset and 79.37% on the original SSBD dataset. The ConvLSTM and CNN+GRU models also achieved higher accuracy on the augmented dataset than on the original SSBD dataset. Through this research, we underscore AI's potential in early ASD detection by automating the identification of stereotypic behaviors, thereby enabling timely intervention. We also emphasize the significance of utilizing expanded datasets from social media platform videos in augmenting model accuracy and robustness, thus paving the way for more accessible diagnostic methods.

A Study on Recognition Units and Methods to Align Training Data for Korean Speech Recognition (한국어 인식을 위한 인식 단위와 학습 데이터 분류 방법에 대한 연구)

  • 황영수
    • Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.40-45 / 2003
  • This study examines recognition units and phoneme segmentation. For a large-vocabulary speech recognition system, it is better to use the segment than the syllable or the word as the recognition unit. In this paper, we study the proper recognition units and phoneme segmentation for Korean speech recognition. For the experiments, we use the OGI speech toolkit from the U.S.A. The results show that the recognition rate when a diphthong is treated as a single unit is superior to that when it is treated as two units, i.e., a glide plus a vowel. A recognizer trained on manually aligned data is slightly superior to one trained on automatically aligned data. Also, the recognition rate when the biphone is used as the recognition unit is better than when the mono-phoneme is used.

Inverse Document Frequency-Based Word Embedding of Unseen Words for Question Answering Systems (질의응답 시스템에서 처음 보는 단어의 역문헌빈도 기반 단어 임베딩 기법)

  • Lee, Wooin;Song, Gwangho;Shim, Kyuseok
    • Journal of KIISE / v.43 no.8 / pp.902-909 / 2016
  • A question answering (QA) system finds an actual answer to the question posed by a user, whereas a typical search engine only finds links to the relevant documents. Recent work on open-domain QA systems is receiving much attention in the fields of natural language processing, artificial intelligence, and data mining. However, prior work on QA systems simply replaces all words that are not in the training data with a single token, even though such unseen words are likely to play crucial roles in differentiating the candidate answers from the actual answers. In this paper, we propose a method to compute vectors of such unseen words by taking into account the context in which the words have occurred. Next, we also propose a model which utilizes inverse document frequencies (IDF) to efficiently process unseen words by expanding the system's vocabulary. Finally, we validate through experiments that the proposed method and model improve the performance of a QA system.
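
The two ingredients named in this abstract, context-based vectors for unseen words and IDF, can be illustrated with a small sketch. It is not the authors' model: combining the two as an IDF-weighted mean of context-word vectors is an assumption made here for illustration, and the function names, the smoothed IDF variant, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return a smoothed IDF score for every word in a tokenized corpus."""
    n_docs = len(documents)
    # Count in how many documents each word appears (not how often).
    doc_freq = Counter(word for doc in documents for word in set(doc))
    # Smoothed variant (an assumption): log((1 + N) / (1 + df)) + 1.
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def unseen_word_vector(context_words, embeddings, idf, dim):
    """Estimate a vector for an unseen word as the IDF-weighted mean of its known context words."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word in context_words:
        if word in embeddings and word in idf:
            weight = idf[word]
            total = [t + weight * x for t, x in zip(total, embeddings[word])]
            weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known context words: fall back to the zero vector
    return [t / weight_sum for t in total]


# Toy corpus and embeddings (hypothetical data).
docs = [
    ["who", "wrote", "hamlet"],
    ["who", "painted", "guernica"],
    ["who", "wrote", "ulysses"],
]
idf = inverse_document_frequency(docs)
embeddings = {"who": [1.0, 0.0], "wrote": [0.0, 1.0]}

# "hamlet" is treated as unseen; its vector is built from "who" and "wrote".
vec = unseen_word_vector(["who", "wrote"], embeddings, idf, dim=2)
print(vec)
```

Because "wrote" occurs in fewer documents than "who", it receives the larger IDF weight and pulls the estimated vector toward its own embedding.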

Reliability in longitudinal study (종단적 연구의 신뢰도)

  • Jinuk Kim
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.61-72 / 2024
  • The purpose of this study is to investigate retest reliabilities in longitudinal studies, in which the same test is administered repeatedly over time. Linear mixed models were used to represent various testing situations that occur in longitudinal studies. Combinations of two types of true value and three types of systematic error were considered. In order to apply the models to real longitudinal data, height data from the Berkeley growth study and vocabulary score data from the University of Chicago experimental school were used. Using the mixed model has the advantage that the reliability can be determined by selecting the covariance structures of the true value and the error separately. However, in order to properly analyze the reliability, researchers need to consider variations that can occur in measurement, such as characteristics of the subject, the test, and the treatment applied in the study. The proper model should be selected, and the quality of the measurement should be evaluated for each trial.
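
As a point of reference for how reliability follows from a mixed model, the simplest case is a random-intercept model, in which reliability reduces to the familiar ratio of true-score variance to total variance. This is a textbook sketch only; the symbols below are chosen here for illustration, and the models in the paper allow richer covariance structures for the true value and the error.

```latex
% Random-intercept model: subject i, occasion j
y_{ij} = \mu + b_i + e_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% Retest reliability as the share of variance due to the true value
\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}
```

Here $b_i$ plays the role of the true value and $e_{ij}$ the measurement error, so $\rho$ approaches 1 as the error variance shrinks relative to the between-subject variance.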

Improvements of an English Pronunciation Dictionary Generator Using DP-based Lexicon Pre-processing and Context-dependent Grapheme-to-phoneme MLP (DP 알고리즘에 의한 발음사전 전처리와 문맥종속 자소별 MLP를 이용한 영어 발음사전 생성기의 개선)

  • 김회린;문광식;이영직;정재호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.21-27 / 1999
  • In this paper, we propose an improved MLP-based English pronunciation dictionary generator for use with a variable vocabulary word recognizer. The variable vocabulary word recognizer can process any words specified in a Korean word lexicon that is dynamically determined according to the current recognition task. To extend the system to tasks involving English words, it is necessary to build a pronunciation dictionary generator that can process words not included in a predefined lexicon, such as proper nouns. To build the English pronunciation dictionary generator, we use a context-dependent grapheme-to-phoneme multi-layer perceptron (MLP) architecture for each grapheme. To train each MLP, grapheme-to-phoneme training data must be obtained from a general pronunciation dictionary. To automate this process, we use a dynamic programming (DP) algorithm with some distance metrics. For training and testing the grapheme-to-phoneme MLPs, we use a general English pronunciation dictionary with about 110 thousand words. With 26 MLPs, each having 30 to 50 hidden nodes, and the exception grapheme lexicon, we obtained a word accuracy of 72.8% for the 110 thousand words, superior to the rule-based method, which showed a word accuracy of 24.0%.
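
The DP-based pre-processing step described in this abstract, aligning each letter of a dictionary entry with a phoneme so that per-grapheme training pairs can be extracted, might look roughly like the following. The abstract does not specify the distance metrics, so the `toy_distance` table, the `skip_cost` value, and the use of `_` for a silent letter are hypothetical choices; the sketch also assumes a word has at least as many letters as phonemes.

```python
def align_graphemes(word, phonemes, distance, skip_cost=0.6):
    """Align each letter to one phoneme or to '_' (silent) with minimal total cost."""
    n, m = len(word), len(phonemes)
    if m > n:
        raise ValueError("this sketch assumes no more phonemes than letters")
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i letters with the first j phonemes.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(0, min(i, m) + 1):
            # Option 1: letter i is silent and consumes no phoneme.
            if cost[i - 1][j] + skip_cost < cost[i][j]:
                cost[i][j] = cost[i - 1][j] + skip_cost
                back[i][j] = "skip"
            # Option 2: letter i is pronounced as phoneme j.
            if j > 0:
                candidate = cost[i - 1][j - 1] + distance(word[i - 1], phonemes[j - 1])
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    back[i][j] = "match"
    # Trace the best path back from the full word and full phoneme string.
    pairs = []
    i, j = n, m
    while i > 0:
        if back[i][j] == "match":
            pairs.append((word[i - 1], phonemes[j - 1]))
            j -= 1
        else:
            pairs.append((word[i - 1], "_"))
        i -= 1
    return pairs[::-1]


# Hypothetical distance: 0 for letter/phoneme pairs listed as plausible, 1 otherwise.
PLAUSIBLE = {("n", "N"), ("i", "AY"), ("t", "T")}


def toy_distance(letter, phoneme):
    return 0.0 if (letter, phoneme) in PLAUSIBLE else 1.0


alignment = align_graphemes("knight", ["N", "AY", "T"], toy_distance)
print(alignment)
```

Each aligned pair would then serve as one training example for the MLP of that grapheme, with the neighboring letters supplying the context.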

Comparative Study on Public Health Facility Color Image Vocabulary among Countries -Focusing on Korea and Romania- (공공보건시설 색채이미지에 대한 국가간 인식 비교 -한국과 루마니아 중심으로-)

  • Park, Heykyung;Adelean, Ioana;Kim, Hyeyeong;Oh, Jiyoung
    • The Journal of the Convergence on Culture Technology / v.6 no.3 / pp.185-191 / 2020
  • This study aims to understand the differences in cultural and emotional perceptions about the color image of public healthcare facilities in Romania, an Eastern European country that is relatively lacking in recognition but is gradually expanding trade. For this, color images were selected through a review of previous studies, and a questionnaire survey was constructed based on the colorimetric data by visiting 8 public healthcare facilities such as medical facilities, 4 social sports facilities, and 8 nursing facilities. An online survey on the color image of public facilities was conducted with 89 Koreans and 86 Romanians, and frequency analysis and cross-analysis were conducted using the SPSS statistical analysis program to examine how Koreans and Romanians perceive the color images of public healthcare facilities and to identify the difference in perception. As a result, it was found that there was a statistically significant difference between countries in the perception of color images of public healthcare facilities in both vocabulary evaluation and image evaluation, and this was interpreted as reflecting different meanings for groups residing in different cultures. This implies that cultural differences in perception should be considered when establishing such environments in the future.

Effects of low-dose topiramate on language function in children with migraine

  • Han, Seung-A;Yang, Eu Jeen;Kong, Younghwa;Joo, Chan-Uhng;Kim, Sun Jun
    • Clinical and Experimental Pediatrics / v.60 no.7 / pp.227-231 / 2017
  • Purpose: This study aimed to verify the safety of low-dose topiramate on language development in pediatric patients with migraine. Methods: Thirty newly diagnosed pediatric patients with migraine who needed topiramate were enrolled and assessed twice with standard language tests, including the Test of Language Problem Solving Abilities (TOPs), Receptive and Expressive Vocabulary Test, Urimal Test of Articulation and Phonology, and computerized speech laboratory analysis. Data were collected before treatment, and topiramate as monotherapy was sustained for at least 3 months. The mean follow-up period was $4.3 \pm 2.7$ months. The mean topiramate dosage was 0.9 mg/kg/day. Results: The patients' mean age was $144.1 \pm 42.3$ months (male-to-female ratio, 9:21). The values of all the language parameters of the TOPs did not change significantly after the topiramate treatment, as follows: determining cause, from $15.0 \pm 4.4$ to $15.4 \pm 4.8$ (P>0.05); making inference, from $17.6 \pm 5.6$ to $17.5 \pm 6.6$ (P>0.05); predicting, from $11.5 \pm 4.5$ to $12.3 \pm 4.0$ (P>0.05); and total TOPs score, from $44.1 \pm 13.4$ to $45.3 \pm 13.6$ (P>0.05). The total mean length of utterance in words during the test decreased from $44.1 \pm 13.4$ to $45.3 \pm 13.6$ (P<0.05). The Receptive and Expressive Vocabulary Test results changed from $97.7 \pm 22.1$ to $96.3 \pm 19.9$ months and from $81.8 \pm 23.4$ to $82.3 \pm 25.4$ months, respectively (P>0.05). In the articulation and phonology validation in both groups, speech pitch and energy were not significant, and all the vowel test results showed no other significant values. Conclusion: No significant difference was found in the language-speaking ability between the patients; however, the number of vocabulary words used decreased. Therefore, topiramate should be used cautiously for children with migraine.

Analysis of Research Trends in Korean English Education Journals Using Topic Modeling (토픽 모델링을 활용한 한국 영어교육 학술지에 나타난 연구동향 분석)

  • Won, Yongkook;Kim, Youngwoo
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.50-59 / 2021
  • To understand the research trends of English education in Korea over the 20 years from 2000 to 2019, 12 major academic journals in Korea in the field of English education were selected, and bibliographic information on 7,329 articles published in these journals was collected and analyzed. The total number of articles increased from the 2000s to the first half of the 2010s, but decreased somewhat in the late 2010s, and the number of publications per journal has become similar. These results show that the overall influence of English education journals has decreased and then leveled off in terms of quantity. Next, 34 topics were extracted by applying latent Dirichlet allocation (LDA) topic modeling to the English abstracts of the articles. Teacher, word, culture/media, and grammar appeared as topics that were highly studied. Topics such as word, vocabulary, and testing and evaluation appeared through unique keywords, and various topics related to learner factors emerged, becoming topics of interest in English education research. Then, topics were analyzed to determine which ones were rising or falling in frequency. As a result of this analysis, qualitative research, vocabulary, learner factor, and testing were found to be rising topics, while falling topics included CALL, language, teaching, and grammar. This change in research topics shows that research interests in the field of English education are shifting from static research topics to data-driven and dynamic research topics.

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.127-146 / 2022
  • Recently, as word embedding has shown excellent performance in various tasks of deep learning-based natural language processing, research on the advancement and application of word, sentence, and document embedding is being actively conducted. Among these areas, cross-language transfer, which enables semantic exchange between different languages, is growing alongside the development of embedding models. Academic interest in vector alignment is growing with the expectation that it can be applied to various embedding-based analyses. In particular, vector alignment is expected to be applied to mapping between specialized domains and generalized domains. In other words, it is expected that it will be possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of a pre-trained language model trained on a huge volume of general-purpose documents, or to provide a clue for mapping vocabulary between mutually different specialized fields. However, since the linear vector alignment that has mainly been studied in academia assumes statistical linearity, it tends to simplify the vector space. This essentially assumes that different types of vector spaces are geometrically similar, which causes inevitable distortion in the alignment process. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of data. The proposed methodology consists of sequential learning of a skip-connected autoencoder and a regression model to align the specialized word embedding expressed in each space to the general embedding space. Finally, through the inference of the two trained models, the specialized vocabulary can be aligned in the general space. To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of 'health care' among national R&D tasks performed from 2011 to 2020. As a result, it was confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to the existing linear vector alignment.

A Preliminary Study on Extending OAK Metadata for Research Data (연구데이터 관리를 위한 OAK 메타데이터 확장 방안 연구)

  • Lee, Mihwa;Lee, Eun-Ju;Rho, Jee-Hyun
    • Journal of Korean Library and Information Science Society / v.51 no.3 / pp.27-51 / 2020
  • This study aims to propose extended OAK metadata for research data to be described in OAK, an open access repository of the National Library of Korea. As research methods, a literature review, case studies, and interviews with related parties were conducted. The method of extending the existing OAK metadata for research data was derived as follows. First, in modeling research data, the collection > item > file structure is maintained: the collection is placed as a higher-level group under which research data can be grouped, and the item combines metadata with files or digital objects of various formats. Second, by mapping metadata standards and case organizations against the existing OAK metadata, elements judged necessary for describing research data were selected and added to the existing OAK. Third, a controlled vocabulary and syntax are also proposed so that the structured data can be used for search or later statistics. By expanding the OAK metadata to describe research data, research data produced in Korea can be officially stored and used, which provides a basis for preventing duplication of research and for sharing and reusing research results nationally.