• Title/Summary/Keyword: Text-search

A Study on Educational Data Mining for Public Data Portal through Topic Modeling Method with Latent Dirichlet Allocation (LDA기반 토픽모델링을 활용한 공공데이터 기반의 교육용 데이터마이닝 연구)

  • Seungki Shin
    • Journal of The Korean Association of Information Education / v.26 no.5 / pp.439-448 / 2022
  • This study searches for education-related datasets provided by public data portals and uses topic modeling to classify them and examine what types of data they contain. From the public data portal, 3,072 file datasets in the education field were collected based on the portal's classification system. Text mining analysis was performed using LDA-based topic modeling, with stopword removal and data pre-processing applied to each dataset. The datasets pre-classified under education by the portal mostly provided program information and student-support notifications. In contrast, the datasets collected by searching for "education" generally described educational programs and support information for the disabled, parents, the elderly, and children from a lifelong-education perspective. The results suggest that providing sufficient educational information through the public data portal would help support students' data science-based decision-making and problem-solving skills.

An Automatic Cosmetic Ingredient Analysis System based on Text Recognition Techniques (텍스트 인식 기법에 기반한 화장품 성분 자동 분석 시스템)

  • Ye-Won Kim;Sun-Mi Hong;Seong-Yong Ohm
    • The Journal of the Convergence on Culture Technology / v.9 no.1 / pp.565-570 / 2023
  • Some people, such as pregnant women and patients with skin diseases, are sensitive to cosmetic ingredients, and others experience side effects from cosmetics. To avoid these problems, shoppers must look up harmful ingredients one by one, which is cumbersome. Knowing and remembering the functional ingredients that suit you is also helpful when purchasing new cosmetics. There is therefore a need for a system that lets users identify cosmetic ingredients on the spot by taking a photograph. In this paper, we introduce a smartphone application, <Hwa Ahn>, that identifies cosmetic ingredients immediately from a photograph of the ingredient list printed on the product. The system is more effective and convenient than existing systems in that it automatically recognizes and classifies the ingredients when the camera is pointed at the ingredient list or when a photo of the list is loaded from the album. If the system is widely used, it is expected to help prevent skin diseases caused by cosmetics in daily life and to reduce purchases of unsuitable cosmetics.

A study on the current status of DIY clothing products related to fabric using text mining (텍스트마이닝을 활용한 패브릭 관련 DIY 의류 상품 현황 연구)

  • Eun-Hye Lee;Ha-Eun Lee;Jeong-Wook Choi
    • Journal of the Korea Fashion and Costume Design Association / v.25 no.2 / pp.111-122 / 2023
  • This study collects Big Data related to DIY clothing and analyzes it on a year-by-year basis to understand consumers' perceptions and the current status of DIY clothing. The reference period for evaluating DIY clothing trends was set from 2012 to 2022. The data was collected and analyzed using Textom, a Big Data solution program certified as Good Software by the Telecommunications Technology Association (TTA). For the analysis of fabric-related DIY products, the keyword was set to "DIY clothing", and the "Espresso K" module was used for data cleansing after collection. Collecting data year by year produced a total of 11 lists, and the collected data was analyzed by period. The findings are as follows. A total of 16,315 keywords were collected from the search engines "Naver" and "Google" between January 1, 2012 and December 31, 2022, and the data by period show a continuous upward trend. In addition, a keyword analysis was conducted using TF-IDF (Term Frequency-Inverse Document Frequency), a statistical measure that reflects the importance of a word within the data, together with N-gram analysis, which examines the relationships between words. Using these results, it was possible to evaluate the popularity and growth of DIY clothing products in conjunction with the evolving social environment, as well as consumers' desire to explore DIY trends. This study is therefore valuable in that it provides preliminary data for DIY clothing research by analyzing the current status of DIY products and, furthermore, contributes to the development and production of DIY clothing.
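
The two measures named in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the plain tf × log(N/df) variant of TF-IDF and word-level bigrams, which may differ from what Textom computes, and the toy documents and the function names `tf_idf` and `ngrams` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def ngrams(tokens, n=2):
    """Return the consecutive n-token tuples of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical toy documents, not data from the study.
docs = [
    "diy clothing fabric kit".split(),
    "diy clothing pattern".split(),
    "fabric dye diy".split(),
]
scores = tf_idf(docs)
bigram_counts = Counter(bg for doc in docs for bg in ngrams(doc))
```

A term that appears in every document (here `diy`) scores zero under this variant, while a term confined to one document (here `kit`) scores highest, which is the sense in which TF-IDF reflects a word's importance within the data.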

A Keyphrase Extraction Model for Each Conference or Journal (학술대회 및 저널별 기술 핵심구 추출 모델)

  • Jeong, Hyun Ji;Jang, Gwangseon;Kim, Tae Hyun;Sin, Donggu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.81-83 / 2022
  • Understanding research trends is necessary for selecting research topics and exploring related work. Most researchers search for representative keywords of the domains or technologies that interest them to understand research trends. However, some conferences in the artificial intelligence and data mining fields now publish hundreds to thousands of papers each year, which makes it difficult for researchers to follow the research trends of those domains. In this paper, we propose an automatic technology keyphrase extraction method that helps researchers understand research trends for each conference or journal. Keyphrase extraction, which extracts important terms or phrases from a text, is a fundamental technology for natural language processing tasks such as summarization and search. Previous keyphrase extraction methods based on pretrained language models extract keyphrases from long texts, so their performance degrades on short texts such as paper titles. We propose a technology keyphrase extraction model that is robust on short texts and considers the importance of each word.

A Study on Speech Synthesizer Using Distributed System (분산형 시스템을 적용한 음성합성에 관한 연구)

  • Kim, Jin-Woo;Min, So-Yeon;Na, Deok-Su;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.209-215 / 2010
  • Portable terminals have recently received attention owing to wireless networks and high-capacity ROM. As a result, TTS (Text-to-Speech) systems are being embedded in portable terminals. Although high-quality synthesis is difficult on a portable terminal, users still demand it. In this paper, we propose a Distributed TTS (DTTS) system composed of a server and a terminal. Because DTTS is built on corpus-based speech synthesis, it can produce high-quality speech. The synthesis system on the server generates optimized speech concatenation information after a database search and transmits it to the terminal. The synthesis system on the terminal then produces high-quality speech with low computation, using the speech concatenation information received from the server. The proposed method reduces complexity and power consumption and makes maintenance more efficient.

A method for metadata extraction from a collection of records using Named Entity Recognition in Natural Language Processing (자연어 처리의 개체명 인식을 통한 기록집합체의 메타데이터 추출 방안)

  • Chiho Song
    • Journal of Korean Society of Archives and Records Management / v.24 no.2 / pp.65-88 / 2024
  • This pilot study explores a method of extracting metadata values and descriptions from records using named entity recognition (NER), a technique in natural language processing (NLP), a subfield of artificial intelligence. The study focuses on handwritten records from the Guro Industrial Complex, produced during the 1960s and 1970s, comprising approximately 1,200 pages and 80,000 words. After the preprocessing process of the records, which included digitization, the study employed a publicly available language API based on Google's Bidirectional Encoder Representations from Transformers (BERT) language model to recognize entity names within the text. As a result, 173 names of people and 314 of organizations and institutions were extracted from the Guro Industrial Complex's past records. These extracted entities are expected to serve as direct search terms for accessing the contents of the records. Furthermore, the study identified challenges that arose when applying the theoretical methodology of NLP to real-world records consisting of semistructured text. It also presents potential solutions and implications to consider when addressing these issues.

A Study on the Direction of Reading and Information Service through Analysis of Digital Reading and Information Literacy Competencies Evaluation Items: Focusing on PIAAC and PISA (디지털 독서 및 정보 리터러시 평가 문항 분석을 통한 독서 및 정보 서비스의 방향 탐색 - PIAAC와 PISA를 중심으로 -)

  • Park, Juhyeon
    • Journal of the Korean Society for Library and Information Science / v.52 no.3 / pp.61-89 / 2018
  • The purpose of this study is to analyze the items related to digital reading and information literacy measured by PIAAC and PISA, to examine the content and methods of measurement for these literacy items, and to derive implications for the reading and information services provided by librarians at public libraries and by teacher librarians. To answer the questions measuring digital reading literacy and digital information literacy, respondents commonly needed ICT skills as well as cognitive strategies. However, the digital reading literacy items emphasized the ability to comprehend and think critically about texts, whereas the digital information literacy items emphasized the ability to use ICT skills, navigate, and evaluate whether or not to read the retrieved text. Librarians and teacher librarians need to encourage readers to read and to provide customized competency improvement programs that reflect the performance results and characteristics of a particular group. It is also necessary to improve and develop the library environment so that library users can understand and use the library search system and the Korean Decimal Classification.

Resistance to sliding in orthodontics: misconception or method error? A systematic review and a proposal of a test protocol

  • Savoldi, Fabio;Papoutsi, Aggeliki;Dianiskova, Simona;Dalessandri, Domenico;Bonetti, Stefano;Tsoi, James K.H.;Matinlinna, Jukka P.;Paganelli, Corrado
    • The Korean Journal of Orthodontics / v.48 no.4 / pp.268-280 / 2018
  • Resistance to sliding (RS) between the bracket, wire, and ligature has been largely debated in orthodontics. Despite the extensive number of published studies, the lack of discussion of the methods used has led to little understanding of this phenomenon. The aim of this study was to discuss variables affecting RS in orthodontics and to suggest an operative protocol. The search included PubMed©, Medline©, and the Cochrane Library©. References of full-text articles were manually analyzed. English-language articles published between January 2007 and January 2017 that performed an in vitro analysis of RS between the bracket, wire, and ligature were included. Study methods were analyzed based on the study design, description of materials, and experimental setup, and a protocol to standardize the testing methods was proposed. From 404 articles identified from the database search and 242 records selected from published references, 101 were eligible for the qualitative analysis, and six for the quantitative synthesis. One or more experimental parameters were incompatible, so a meta-analysis was not performed. Major factors regarding the study design, materials, and experimental setup were not clearly described by most studies. The normal force, that is, the force perpendicular to the sliding of the wire and one of the most relevant variables in RS, was not considered by most studies. Different variables were introduced, often acting as confounding factors. A protocol was suggested to standardize testing procedures and enhance the understanding of in vitro findings.

A Study on Contents-based Retrieval using Wavelet (Wavelet을 이용한 내용기반 검색에 관한 연구)

  • 강진석;박재필;나인호;최연성;김장형
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.5 / pp.1051-1066 / 2000
  • With recent advances in digital encoding technologies and computing power, large amounts of multimedia information such as images, graphics, audio, and video are widely used in multimedia systems over the Internet. As a result, diverse retrieval mechanisms are required for users to search for specific information stored in multimedia systems, and content-based retrieval is preferred over text-based keyword retrieval. In this paper, we propose a new content-based indexing and searching algorithm that aims to achieve both high efficiency and high retrieval performance. To achieve these objectives, the proposed algorithm first classifies images through a pre-processing step of edge extraction, range division, and multiple filtering, and then searches for the target images using the spatial and textural characteristics of colors in an image, which are extracted in the previous step. In addition, we describe simulation results of search requests and retrieval outputs for several company trademark images using the proposed wavelet-based content-based retrieval algorithm.

Analysis of the influence of food-related social issues on corporate management performance using a portal search index

  • Yoon, Chaebeen;Hong, Seungjee;Kim, Sounghun
    • Korean Journal of Agricultural Science / v.46 no.4 / pp.955-969 / 2019
  • Analyzing on-line consumer responses is directly related to the management performance of food companies. Therefore, this study collected and analyzed consumer-created data from an on-line portal site about food companies involved in social issues and examined the relationships between the data and management performance. Through this process, we identified consumers' awareness of these companies from big data analysis and analyzed the relationship between the results and the companies' sales and stock prices through time-series graphs and correlation analysis. The results of this study were as follows. First, the text mining analysis suggests that consumers respond more sensitively to negative issues than to positive issues. Second, the emotional analysis showed that companies' ethics issues (Enterprise 3 and 4) have a higher level of emotional continuity than food safety issues. This can be interpreted to mean that problems of ethical management have a great influence on consumers' purchasing behavior. Finally, in all cases of negative food issues, word frequency and emotional scores showed opposite trends. The correlation analysis found a correlation between word frequency and stock price in all cases of negative food issues, and also between emotional scores and stock price. Recently, studies using big data analytics have been conducted in various fields. Based on this research, it is expected that studies using big data analytics will also be carried out in the agricultural field.
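
The correlation analysis mentioned in this abstract can be illustrated with a short Python sketch. It assumes the Pearson correlation coefficient, which the abstract does not name, and the two series below are hypothetical values chosen for illustration, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly series: mentions of a company and its closing stock price.
word_frequency = [120, 95, 60, 40, 35]
stock_price = [100.0, 104.0, 110.0, 113.0, 112.0]
r = pearson(word_frequency, stock_price)
```

The coefficient lies between -1 and 1; its sign indicates whether the two series move together or in opposite directions, and its magnitude indicates how closely they follow a linear relationship.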