• Title/Summary/Keyword: Domain Dictionary


Sparse reconstruction of guided wavefield from limited measurements using compressed sensing

  • Qiao, Baijie;Mao, Zhu;Sun, Hao;Chen, Songmao;Chen, Xuefeng
    • Smart Structures and Systems / v.25 no.3 / pp.369-384 / 2020
  • A wavefield sparse reconstruction technique based on compressed sensing is developed in this work to dramatically reduce the number of measurements. First, a severely underdetermined representation of the guided wavefield at a snapshot is established in the spatial domain. Second, an optimal compressed sensing model of guided wavefield sparse reconstruction is established based on an l1-norm penalty, where a suite of discrete cosine functions is selected as the dictionary to promote sparsity. Regular, random, and jittered undersampling schemes are compared as candidates for the undersampling matrix of compressed sensing. Third, a gradient projection method is employed to solve the compressed sensing model of wavefield sparse reconstruction from highly incomplete measurements. Finally, experiments with different excitation frequencies are conducted on an aluminum plate to verify the effectiveness of the proposed sparse reconstruction method, where a scanning laser Doppler vibrometer is used as the benchmark to measure the original wavefield in a given inspection region. The experiments demonstrate that (i) the missing wavefield data can be accurately reconstructed from less than 12% of the original measurements; (ii) the reconstruction accuracy of the jittered undersampling scheme is, with high probability, slightly higher than that of the random undersampling scheme, whereas the regular undersampling scheme fails to reconstruct the wavefield image; and (iii) a quantified mapping relationship between the sparsity ratio and the recovery error over a special interval is established through statistical modeling and analysis.
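
The three undersampling schemes compared in this abstract can be sketched as index generators on a one-dimensional grid. This is a minimal illustration, not the authors' implementation: the function names, the grid of 100 points, and the budget of 12 samples are assumptions chosen for the example, and jittered sampling is taken here in its common form of one random sample per equal-width cell.

```python
import random


def regular_indices(n, m):
    """Evenly spaced positions: the first point of each of m equal cells."""
    return [i * n // m for i in range(m)]


def random_indices(n, m, rng):
    """m distinct positions drawn uniformly at random from the whole grid."""
    return sorted(rng.sample(range(n), m))


def jittered_indices(n, m, rng):
    """One uniformly random position inside each of m equal cells (needs n >= m)."""
    edges = [i * n // m for i in range(m + 1)]
    return [rng.randrange(edges[i], edges[i + 1]) for i in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so the run is repeatable
    n_points, n_samples = 100, 12    # hypothetical grid size and sample budget
    print("regular :", regular_indices(n_points, n_samples))
    print("random  :", random_indices(n_points, n_samples, rng))
    print("jittered:", jittered_indices(n_points, n_samples, rng))
```

Jittered sampling keeps the randomness that compressed sensing relies on while bounding the largest gap between samples, which is one plausible reading of why the abstract reports it as slightly more accurate than purely random sampling.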

A Study on Named Entity Recognition for Effective Dialogue Information Prediction (효율적 대화 정보 예측을 위한 개체명 인식 연구)

  • Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.58-66 / 2019
  • Recognizing named entities, such as proper nouns, in conversational sentences is the most fundamental and important area of study for efficient conversational information prediction. The most important part of a task-oriented dialogue system is recognizing what attributes an object in a conversation has. The named entity recognition model recognizes named entities in a dialogue sentence through preprocessing, word embedding, and prediction steps. This study aims to use a user-defined dictionary in the preprocessing stage and to find optimal parameters in the word embedding stage for efficient dialogue information prediction. To test the designed named entity recognition model, we selected the field of daily chemical products and constructed a named entity recognition model that can be applied to a task-oriented dialogue system in that domain.

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in the consumer, business, and social sectors. Text analysis requires a corpus that reflects specific domain knowledge. However, in the field of fashion, professional corpora are insufficient. This study aims to develop a taxonomy and thesaurus that consider the specialty of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WSGN, Pantone, and online platforms; text was subsequently extracted through preprocessing with Python. The taxonomy was composed of items, silhouettes, details, styles, colors, textiles, and patterns/prints, which are seven attributes of clothes. The corpus was completed by processing synonyms of terms from fashion books such as dictionaries. Finally, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy. All data were then developed into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing the results of text mining analysis against those obtained with the previously developed corpus. This study contributes to achieving a text data standard and enables meaningful text mining results in the fashion field.

Recommending Core and Connecting Keywords of Research Area Using Social Network and Data Mining Techniques (소셜 네트워크와 데이터 마이닝 기법을 활용한 학문 분야 중심 및 융합 키워드 추천 서비스)

  • Cho, In-Dong;Kim, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.127-138 / 2011
  • The core service of most research portal sites is providing researchers with relevant research papers that match their research interests. This kind of service may only be effective and easy to use when a user can provide correct and concrete information about a paper, such as the title, authors, and keywords. Unfortunately, most users of this service are not acquainted with concrete bibliographic information. This implies that most users inevitably go through repeated trial and error in keyword-based search. Retrieving a relevant research paper is especially difficult when a user is a novice in the research domain and does not know appropriate keywords. In this case, a user has to perform iterative searches as follows: i) perform an initial search with an arbitrary keyword, ii) acquire related keywords from the retrieved papers, and iii) perform another search with the acquired keywords. This usage pattern implies that the service quality and user satisfaction of a portal site are strongly affected by its level of keyword management and its search mechanism. To overcome this kind of inefficiency, some leading research portal sites adopt an association rule mining-based keyword recommendation service similar to the product recommendation of online shopping malls. However, keyword recommendation based only on association analysis has the limitation that it can show only a simple and direct relationship between two keywords. In other words, association analysis by itself is unable to present the complex relationships among many keywords in adjacent research areas. To overcome this limitation, we propose a hybrid approach for establishing an association network among the keywords used in research papers.
The keyword association network can be established in the following phases: i) regard the set of keywords specified in a paper as co-purchased items, ii) perform association analysis on the keywords and extract frequent keyword patterns that satisfy predefined thresholds of confidence, support, and lift, and iii) schematize the frequent keyword patterns as a network to show the core keywords of each research area and the connecting keywords among two or more research areas. To estimate the practical applicability of our approach, we performed a simple experiment with 600 keywords. The keywords were extracted from 131 research papers published in five prominent Korean journals in 2009. In the experiment, we used SAS Enterprise Miner for association analysis and the R software for social network analysis. As the final outcome, we presented a network diagram and a cluster dendrogram for the keyword association network. We summarized the results in Section 4 of this paper. The main contributions of our proposed approach are as follows: i) the keyword network can provide an initial roadmap of a research area to researchers who are novices in the domain, ii) a researcher can grasp the distribution of the many keywords neighboring a certain keyword, and iii) researchers can get ideas for converging different research areas by observing connecting keywords in the keyword association network. Further studies should address the following. First, the current version of our approach does not implement a standard meta-dictionary. For practical use, homonyms, synonyms, and multilingual problems should be resolved with a standard meta-dictionary. Additionally, clearer guidelines for clustering research areas and defining core and connecting keywords should be provided. Finally, intensive experiments on international papers as well as Korean research papers should be performed in further studies.
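
The association-analysis phase described above relies on three standard measures: support, confidence, and lift. The following is a minimal sketch of how they could be computed for keyword pairs, treating each paper's keyword list as one transaction. It is not the authors' procedure (the abstract reports using SAS Enterprise Miner); the function names, thresholds, and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_pair_rules(papers):
    """Return {(antecedent, consequent): (support, confidence, lift)} for keyword pairs."""
    n = len(papers)
    keyword_count = Counter()   # number of papers containing each keyword
    pair_count = Counter()      # number of papers containing both keywords of a pair
    for keywords in papers:
        unique = sorted(set(keywords))
        keyword_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = {}
    for (a, b), together in pair_count.items():
        support = together / n
        for x, y in ((a, b), (b, a)):
            confidence = together / keyword_count[x]      # estimate of P(y | x)
            lift = confidence / (keyword_count[y] / n)    # P(y | x) / P(y)
            rules[(x, y)] = (support, confidence, lift)
    return rules


def filter_rules(rules, min_support, min_confidence, min_lift):
    """Keep only the rules that meet all three thresholds."""
    return {
        rule: (s, c, l)
        for rule, (s, c, l) in rules.items()
        if s >= min_support and c >= min_confidence and l >= min_lift
    }


if __name__ == "__main__":
    # Hypothetical keyword lists, one per paper.
    papers = [
        ["ontology", "text mining"],
        ["ontology", "text mining", "svm"],
        ["svm", "clustering"],
        ["ontology", "clustering"],
    ]
    kept = filter_rules(keyword_pair_rules(papers), 0.5, 0.6, 1.0)
    for (x, y), (s, c, l) in sorted(kept.items()):
        print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

The surviving rules can then serve as the edges of a keyword network, with keywords as nodes.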

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min;Kim, Yongsung
    • Journal of Platform Technology / v.9 no.3 / pp.52-62 / 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defects before manufacturing. However, the reliability of manual quality control methods is affected by human and physical limitations because manufacturing processes vary across industries. These limitations become particularly obvious in domains with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposes a novel method for predicting the risk of quality defects by using natural language processing and machine learning. The study used production data collected over 6 years at a factory that manufactures main equipment installed in nuclear power plants. In the preprocessing stage of the text data, a mapping method was applied to the word dictionary so that domain knowledge could be appropriately reflected, and a hybrid algorithm combining n-gram, Term Frequency-Inverse Document Frequency, and Singular Value Decomposition was constructed for sentence vectorization. Next, in the experiment to classify the risky processes resulting in poor quality, k-fold cross-validation was applied to categorize cases from Unigram to cumulative Trigram. Furthermore, to obtain objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms, and a maximum accuracy and F1-score of 0.7685 and 0.8641, respectively, were achieved. Thus, the proposed method is effective. The performance of the proposed method was also compared with the votes of field engineers, and the results revealed that the proposed method outperformed the field engineers. Thus, the method can be implemented for quality control at manufacturing sites.
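
The n-gram and TF-IDF part of the sentence vectorization described above can be sketched in plain Python. This is an illustrative assumption rather than the paper's algorithm: it covers cumulative n-grams (unigrams up to trigrams) and one common TF-IDF variant, with tf as relative frequency and idf as log(N / df), and it omits the Singular Value Decomposition and classification steps. The function names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def cumulative_ngrams(tokens, max_n):
    """All n-grams for n = 1..max_n, each joined with single spaces."""
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(documents, max_n=3):
    """Return one {term: tf * idf} dict per document."""
    term_lists = [cumulative_ngrams(doc.lower().split(), max_n) for doc in documents]

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))

    n_docs = len(documents)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Hypothetical process notes; real input would be the factory's text records.
    notes = ["weld crack found", "weld inspection passed"]
    for vector in tfidf(notes, max_n=2):
        print(sorted(vector.items()))
```

Terms that occur in every document receive a weight of zero under this idf variant, so only terms that distinguish one record from another contribute to the vector.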

Movie Recommended System base on Analysis for the User Review utilizing Ontology Visualization (온톨로지 시각화를 활용한 사용자 리뷰 분석 기반 영화 추천 시스템)

  • Mun, Seong Min;Kim, Gi Nam;Choi, Gyeong cheol;Lee, Kyung Won
    • Design Convergence Study / v.15 no.2 / pp.347-368 / 2016
  • Recent research on word of mouth (WOM) implies that consumers use WOM information about products in their purchase process. This study proposes methods that use opinion mining and visualization to understand consumers' opinions of individual goods and markets. We develop a domain ontology based on reviews confined to the "movie" category, because people who want to watch a movie now often refer to others' movie reviews, and we analyze it through opinion mining and visualization. The study differs from other research in that it classifies the attributes of evaluation factors and builds a verbal dictionary of evaluation factors as part of the ontology process. We aim to show through the results whether the research method is valid. The results of this study fall into three parts. First, this research explains methods of developing a domain ontology using keyword extraction and topic modeling. Second, we visualize the reviews of each movie to understand audiences' overall opinion of specific movies. Third, we find clusters of products that received similar assessments in the evaluation results. The case study shows three main clusters containing 130 movies, grouped according to audiences' opinions.

Analysis of Terms on Panel Descriptions of the Domain for Astronomy at the Gwacheon National Science Museum (국립과천과학관의 천문영역 패널 설명의 용어 분석)

  • Yun, Hye-Ryun;Sohn, Jungjoo
    • Journal of Science Education / v.36 no.2 / pp.329-340 / 2012
  • The purpose of this study is to analyze the terms used in the panels for astronomy exhibits at the Gwacheon National Science Museum and to determine whether the terms are appropriate and easily understandable. In total, 965 terms were collected from 52 panels (14 panels in the planetarium, 17 panels in the national history part, and 21 panels in the traditional science part). All terms were categorized into four types, (1) standard/scientific terms, (2) non-standard/scientific terms, (3) standard/non-scientific terms, and (4) non-standard/non-scientific casual words, based on the 'Dictionary of Standard Korean' and the 'Terminology of Astronomy'. A questionnaire survey was administered to 24 in-service elementary, middle, and high school teachers to determine whether the level of the terms is appropriate for students. The results show that accurate scientific terms accounted for 68.5% of the terms, and that many students had difficulty understanding those scientific terms in the panels because of unfamiliarity. Therefore, to increase students' interest and understanding, it is proposed to minimize scientific terms and replace them with casual terms related to everyday life.


A Study on the Integration of Information Extraction Technology for Detecting Scientific Core Entities based on Large Resources (대용량 자원 기반 과학기술 핵심개체 탐지를 위한 정보추출기술 통합에 관한 연구)

  • Choi, Yun-Soo;Cheong, Chang-Hoo;Choi, Sung-Pil;You, Beom-Jong;Kim, Jae-Hoon
    • Journal of Information Management / v.40 no.4 / pp.1-22 / 2009
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information, and it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Because these elementary technologies have so far been studied independently, integrating all the necessary information extraction processes is not trivial, owing to the diversity of their input/output formats and operating environments. As a result, it is difficult to process scientific documents so as to extract both named entities and technical terms at once. In this study, we define scientific core entities as a set of 10 types of named entities and technical terminologies in a biomedical domain. To extract these entities from scientific documents automatically and at once, we develop a framework for scientific core entity extraction that embraces all the pivotal language processors: a named-entity recognizer, a co-reference resolver, and a terminology extractor. Each module of the integrated system has been evaluated with various corpora as well as KEEC 2009. The system will be used in various information service areas such as information retrieval, question answering (Q&A), document indexing, and dictionary construction.

A Study on the Understanding of the Base Area of Solid Figures in the Elementary Mathematics (초등수학에서 입체도형의 밑넓이 이해에 대한 연구)

  • Kim, Sung Joon
    • Journal of the Korean School Mathematics Society / v.17 no.2 / pp.167-191 / 2014
  • In this study, we investigate the term set of 'base' or 'bottom': 'the bottom side of a polygon' and 'the base side (of a geometrical figure)'. We also study the concept of 'the base area' in solid figures and the formula of 'the bottom dimensions'. We start from a 6th grade math problem: 'Find the bottom dimension of the rectangular.' The primary answer is that the term 'the bottom dimensions' is not used in elementary mathematics. However, in middle school mathematics, 'the base area' is used to mean 'the area of one bottom side', which is not explained anywhere from elementary to middle school mathematics. In addition, the base is defined and 'the surface area' and 'the side area' are taught in elementary mathematics, so we naturally think of 'the base area'. Therefore, we first investigate the term set of 'base' or 'bottom', which has two elements: 'the bottom side of a polygon' and 'the base side (of a geometrical figure)'. Next, we discuss 'the base area' through the curriculum, textbooks, dictionary definitions, and so on. In addition, we survey pre-service and in-service teachers about solid figures and comparatively analyze their understanding of 'the base side' and 'the base area'. In particular, we compare the changes and tendencies in correct answers from the first question to the last. As a result, we verify 'the cognitive gap' between elementary and middle school mathematics, and we suggest teaching 'the base area' and follow-up topics for teaching the figure domain in elementary mathematics.
