• Title/Summary/Keyword: Text data

Construction of Korean Speech DB for Common Use and Implementation of Workbench for Spoken Language Data Acquisition (공동이용을 위한 음성DB의 구축 및 음성 자료 수집을 위한 Workbench의 구현)

  • Kim Bong-wan;Lee Yong-Ju
    • MALSORI / no.35_36 / pp.189-209 / 1998
  • This study discusses a Korean speech database designed and constructed for common use, focusing on the design of word and sentence lists that cover various phonological environments. As a result, PBW (Phonetically Balanced Words) and PBS (Phonetically Balanced Sentences) sets were selected from a balanced text corpus using the maximum entropy method. The paper also presents a workbench implemented for spoken language data acquisition. The workbench consists of a grapheme-to-phoneme converter, an utterance list selection module, a speech data editing module, a multi-layer labelling module, and a phoneme context search module.
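The abstract above names a maximum entropy method for choosing phonetically balanced items but gives no details. The following is a minimal sketch of one plausible reading, not the authors' procedure: a greedy loop that repeatedly adds the candidate whose phonemes make the pooled phoneme distribution most uniform (highest Shannon entropy). The function names, the greedy strategy, and the toy phoneme strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a phoneme count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def select_balanced(candidates, k):
    """Greedily pick up to k items whose pooled phoneme counts have high entropy.

    candidates maps each word (or sentence) to its list of phonemes.
    This greedy strategy is an illustrative assumption, not the paper's method.
    """
    remaining = dict(candidates)
    pooled = Counter()
    selected = []
    for _ in range(min(k, len(remaining))):
        # Sorting the keys makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda w: entropy(pooled + Counter(remaining[w])),
        )
        pooled += Counter(remaining.pop(best))
        selected.append(best)
    return selected, entropy(pooled)


# Hypothetical toy input: letters stand in for phoneme symbols.
toy = {"aaa": ["a", "a", "a"], "ab": ["a", "b"], "cd": ["c", "d"]}
chosen, h = select_balanced(toy, 2)
```

With this toy input the sketch prefers "ab" and "cd" over "aaa", because the repeated phoneme in "aaa" skews the pooled distribution.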

Detection of Damages in Concrete Structures Using Non-Contact Air-Coupled Sensing Methods

  • Shin, Sung-Woo
    • Journal of the Korean Society for Nondestructive Testing / v.30 no.3 / pp.282-289 / 2010
  • Most nondestructive testing techniques require good contact between the sensor and the tested concrete surface to obtain reliable data. However, surface preparation is often very time-consuming and labor-intensive because of the rough surface or limited accessibility of concrete structures. One approach to speeding up data collection is to eliminate the need for physical contact between the sensor and the tested structure. The non-contact air-coupled sensing technique can be a good solution to this problem. An obvious advantage of this technique is that it can greatly speed up data collection in the field, so the damage detection process can be completed very rapidly. In this article, recent developments in non-contact air-coupled sensing for rapid detection of damage in concrete structures are summarized to stimulate interest, discussion, and further development of this technique in the NDT research community in Korea. It is worth noting that the work described in this article has previously been published in theses, proceedings, and journals. All published sources are cited in the text and listed in the references.

Storing Digital Information in Long-Read DNA

  • Ahn, TaeJin;Ban, Hamin;Park, Hyunsoo
    • Genomics & Informatics / v.16 no.4 / pp.30.1-30.6 / 2018
  • There is an urgent need for effective and cost-efficient data storage, as the worldwide requirement for data storage is growing rapidly. DNA has been introduced as a new tool for storing digital information. Recent studies have successfully stored digital information such as text and GIF animations. Previous studies tackled technical hurdles caused by errors in DNA synthesis and sequencing, and focused on a strategy that uses read sizes of 100-150 bp in both synthesis and sequencing. In this paper, we suggest a novel data encoding/decoding scheme that makes use of long-read DNA (~1,000 bp). This enables accurate recovery of the stored digital information with a smaller number of reads than the previous approach, and it also reduces sequencing time.
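The abstract above does not describe the encoding scheme itself. As a generic illustration of how digital data can be mapped onto nucleotides, here is a minimal sketch that packs two bits into each base, so one byte becomes four bases. The base-to-bits mapping and the function names are assumptions; the sketch has no error correction and is not the scheme proposed in the paper.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(); raises ValueError on a malformed strand."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError for any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"text")
restored = decode(strand)
```

Under this mapping a 1,000-base strand carries 250 bytes of raw payload, before any addressing or redundancy that a practical scheme would add.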

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies the types of the extracted names. Entity name recognition technologies are widely used in natural language processing applications such as information retrieval, machine translation, and query response systems. Various deep learning-based models exist to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we compare and analyze the performance on Korean data of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used for entity name recognition. We also evaluate whether embedding models, which are widely used in recent natural language processing tasks, improve the performance of the entity name recognition models. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with FastText embeddings showed the highest performance.

Implementation of Framework for Efficient and Scalable Disaster Response Services

  • Seokjin Im
    • International Journal of Advanced Culture Technology / v.11 no.1 / pp.290-295 / 2023
  • Global warming caused by greenhouse gases brings about climate change and frequent disasters such as earthquakes and tsunamis, leading to great damage. It is important to build efficient and scalable disaster response services to minimize the damage. Existing disaster warning services based on mobile text messages are limited in scalability and in the size of the data that can be delivered. In this paper, we propose a framework for disaster response services that is efficient and flexible, because it allows various indexing schemes to be adopted, and scalable, because it supports any number of clients in disaster situations anytime and anywhere. In addition, because the framework uses wireless data broadcast, it is free from limits on the size of the data to be delivered. We design, implement, and evaluate the proposed framework. For the evaluation, we simulate the implemented framework with various indexing schemes such as HCI, DSI, and TTSI and compare the access times of the clients. The evaluation shows that the proposed framework can provide efficient, scalable, and flexible disaster response services.

Study on Promotion of ESG Tourism in Bhutan through Big Data Analysis - Focusing on Comparison with ESG Tourism Status in Korea -

  • Min Kyeong Kim
    • International Journal of Internet, Broadcasting and Communication / v.15 no.2 / pp.39-48 / 2023
  • The purpose of this study is to revitalize ESG tourism in Bhutan by comparing and analyzing the status of ESG tourism in Bhutan and in Korea. Big data analysis using text mining was performed with "Bhutan ESG Tourism" and "Korea ESG Tourism" as keywords. The top 30 keywords were extracted after word refinement, and on this basis the data were visualized through network analysis and CONCOR analysis of the keywords. The analysis confirmed that Bhutan, unlike Korea, has not made use of ESG in tourism even though the country itself has the elements needed to integrate ESG with the tourism industry. Because the ESG elements that Bhutan possesses need to be combined with the tourism industry, this study suggests a direction for combining the two.
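The keyword extraction and network analysis steps mentioned above can be illustrated with a small sketch. It assumes whitespace tokenization, a hand-made stopword list standing in for word refinement, and document-level co-occurrence counts as the edge weights of the keyword network. The function names and sample documents are hypothetical, and the CONCOR step is not shown.

```python
from collections import Counter
from itertools import combinations


def top_keywords(docs, stopwords, n):
    """Count tokens across documents, drop stopwords, return the n most frequent."""
    counts = Counter(
        tok for doc in docs for tok in doc.lower().split() if tok not in stopwords
    )
    return counts.most_common(n)


def cooccurrence(docs, keywords):
    """Count how many documents contain each pair of keywords together."""
    wanted = set(keywords)
    pairs = Counter()
    for doc in docs:
        present = sorted(wanted & set(doc.lower().split()))
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical sample documents, not data from the study.
docs = [
    "bhutan tourism esg",
    "korea tourism esg policy",
    "bhutan tourism nature",
]
top = top_keywords(docs, stopwords={"policy"}, n=1)
edges = cooccurrence(docs, ["tourism", "esg", "bhutan"])
```

The resulting pair counts can serve as a weighted edge list for a graph library when drawing the keyword network.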

The Effect of Dessert Cafe's Servicescape on Customer Engagement through Big Data Analysis (빅데이터 분석을 통한 디저트 카페의 서비스스케이프가 고객인게이지먼트에 미치는 영향)

  • DAYOUNG NO;GI-HWAN RYU
    • The Journal of the Convergence on Culture Technology / v.9 no.4 / pp.693-697 / 2023
  • As of 2022, dessert cafe trends are changing faster, customers' needs are becoming more demanding, and Koreans' consumption tendencies are changing rapidly. This study therefore uses big data to identify servicescape and customer engagement factors for dessert cafes.

Topic Modeling on Patent and Article Big Data Using BERTopic and Analyzing Technological Trends of AI Semiconductor Industry (BERTopic을 활용한 텍스트마이닝 기반 인공지능 반도체 기술 및 연구동향 분석)

  • Hyeonkyeong Kim;Junghoon Lee;Sunku Kang
    • Journal of Information Technology Applications and Management / v.31 no.1 / pp.139-161 / 2024
  • The Fourth Industrial Revolution has spurred widespread adoption of AI-based services, driving global interest in AI semiconductors for efficient large-scale computation. Text mining research, historically using LDA, has evolved with machine learning integration, exemplified by the 2021 BERTopic technology. This study employs BERTopic to analyze AI semiconductor-related patents and research data, generating 48 topics from 2,256 patents and 40 topics from 1,112 publications. While providing valuable insights into technology trends, the study acknowledges limitations in taking a macro approach to the entire AI semiconductor industry. Future research may explore specific technologies for more nuanced insights as the industry matures.

Emotion Classification from Text based on Natural Language Processing (자연어 처리 기반 텍스트 감정 분류 모델)

  • Minju Kim;Hyojeong Jin;Junghoon Lee
    • Annual Conference of KIPS / 2024.05a / pp.690-691 / 2024
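  • In this paper, we develop a classification model that uses natural-language text such as diary entries in order to quickly identify the needs of consumers of a particular service group. After defining a set of emotional states suited to this purpose and consolidating them into the essential emotions, we extract the corresponding emotion column from the given dataset and unify the text format. Using Python's Keras library, we build a training network consisting of an embedding layer, an LSTM layer, a dense layer, and other layers; training on the extracted text reaches 98% accuracy after 15 epochs.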

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.2 / pp.57-66 / 2015
  • Considerable research efforts are being directed towards analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide successful information services that can identify R&D documents on specific national pending issues. While users may specify certain keywords relating to national pending issues, they usually fail to retrieve appropriate R&D information, primarily because of discrepancies between these terms and the corresponding terms actually used in the R&D documents. Thus, an intermediate logic is needed to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, three methodologies are proposed in this study: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rule mining are used to establish these methodologies. In the experiment, the keyword enhancement rate achieved by the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to give tangible results in the future.
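Association rules between issue keywords and R&D keywords, as mentioned in the abstract above, are conventionally scored by support and confidence. The sketch below computes both measures over keyword sets. The function names and the sample keyword sets are hypothetical, and the abstract does not state which measures or thresholds the authors used.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical keyword sets, one per document; not data from the study.
keyword_sets = [
    {"fine dust", "air filter"},
    {"fine dust", "air filter", "sensor"},
    {"fine dust", "battery"},
    {"aging", "robot"},
]
s = support(keyword_sets, {"fine dust", "air filter"})
c = confidence(keyword_sets, {"fine dust"}, {"air filter"})
```

Here the rule "fine dust -> air filter" holds in two of the three documents that mention "fine dust", so its confidence is two thirds and its support is one half.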