• Title/Summary/Keyword: Text data


Analysis on the Trend of The Journal of Information Systems Using TLS Mining (TLS 마이닝을 이용한 '정보시스템연구' 동향 분석)

  • Yun, Ji Hye;Oh, Chang Gyu;Lee, Jong Hwa
    • The Journal of Information Systems
    • /
    • v.31 no.1
    • /
    • pp.289-304
    • /
    • 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading a new industrial revolution. The Journal of Information Systems (JIS), which developed the information system field into a theoretical and practical discipline in the 1990s, has a 30-year history in information systems research. This study aims to identify the academic value and research trends of JIS by analyzing its publications. Design/methodology/approach: This study analyzes JIS trends by combining several methods, collectively named TLS mining analysis. TLS mining analysis consists of a series of analyses: the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with Semantic Network Analysis. First, keywords are extracted from the research data using the TF-IDF weight model. Topic modeling is then performed using the LDA algorithm to identify issue keywords. Findings: To analyze JIS, this study used the summary service for published research papers provided by the Korea Citation Index. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), research in the information system field focused on e-business strategies as most companies adopted online business models. In the second period (2012-2021), the main research issues were data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile. In addition, keywords for improving the JIS citation index were presented.
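The sketch below shows the TF-IDF step that opens TLS mining, in plain Python. It is a minimal illustration, not the authors' implementation. It assumes the abstracts are already tokenized and uses the common variant tf(t, d) * log(N / df(t)) with length-normalized term counts. The helper name `top_keywords` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = term count / document length; idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(docs, k=3):
    """Hypothetical helper: rank terms by TF-IDF weight summed over all documents."""
    totals = Counter()
    for doc_weights in tfidf(docs):
        totals.update(doc_weights)
    return [term for term, _ in totals.most_common(k)]


# Toy, pre-tokenized abstracts (illustrative only).
docs = [
    ["ebusiness", "strategy", "system"],
    ["ebusiness", "adoption", "system"],
    ["ai", "data", "system"],
]
print(top_keywords(docs, k=4))
```

Because idf is log(N / df), a term that appears in every abstract (here `system`) receives a weight of zero. This is why TF-IDF favors terms that distinguish documents over terms that are merely frequent.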

A Study on the Quantitative Evaluation of Initial Coin Offering (ICO) Using Unstructured Data (비정형 데이터를 이용한 ICO(Initial Coin Offering) 정량적 평가 방법에 대한 연구)

  • Lee, Han Sol;Ahn, Sangho;Kang, Juyoung
    • Smart Media Journal
    • /
    • v.11 no.5
    • /
    • pp.63-74
    • /
    • 2022
  • Initial public offerings (IPOs) operate within a legal framework for investor protection. Because various quantitative evaluation factors are available, objective analysis is possible, and many studies have been conducted. Crowdfunding likewise has several legal safeguards for investor protection that prevent indiscriminate funding. By contrast, the blockchain-based initial coin offering (ICO), which has recently been in the spotlight and is launched on the basis of a cryptocurrency white paper, has ambiguous legal means and standards for protecting investors. It also lacks quantitative methods for evaluating ICOs objectively. To detect ICO fraud, this study therefore collects ICO white papers published online and performs fraud prediction based on BERT, a text embedding technique. It compares the results with the existing Random Forest machine learning technique and demonstrates the feasibility of fraud detection. This study is expected to contribute to quantitative ICO fraud detection research by showing that a quantitative approach using unstructured data can help identify fraudulent ICOs.
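The abstract compares BERT-based predictions against a Random Forest baseline but does not name its evaluation metrics. The sketch below assumes precision, recall, and F1 on the fraud class, computed from two hypothetical prediction lists. It does not reproduce the BERT or Random Forest models themselves.

```python
def fraud_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = fraudulent ICO, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0]
bert_pred = [1, 0, 1, 0, 0, 0]
rf_pred = [1, 1, 0, 1, 0, 1]

for name, pred in [("BERT", bert_pred), ("Random Forest", rf_pred)]:
    print(name, fraud_metrics(y_true, pred))
```

On an imbalanced fraud dataset, F1 on the positive class is usually more informative than plain accuracy. A model that labels every ICO as legitimate can still score high accuracy.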

An Intelligent Recommendation System by Integrating the Attributes of Product and Customer in the Movie Reviews (영화 리뷰의 상품 속성과 고객 속성을 통합한 지능형 추천시스템)

  • Hong, Taeho;Hong, Junwoo;Kim, Eunmi;Kim, Minsu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.1-18
    • /
    • 2022
  • As digital technology converges with the e-commerce market across industries, online transactions have grown and online channels are used more widely. The recent spread of infectious diseases such as COVID-19 is accelerating this trend, and a wide range of product information can now be provided to customers online. This variety gives customers more options but also makes decision-making harder. Recommendation systems can help customers make decisions more effectively. However, previous research on recommendation systems has been limited to quantitative data and does not reflect detailed factors of products and customers. In this study, we propose an intelligent recommendation system that quantifies product and customer attributes by applying text mining techniques to qualitative data from online reviews. It integrates these attributes with existing objective indicators: the overall star rating, sentiment, and emotion. The proposed integrated recommendation model outperformed the overall-rating-oriented recommendation model. Recommendations that reflect detailed factors of products and customers are expected to create new business value.
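The abstract does not specify how the text-mined attributes are combined with the star rating, sentiment, and emotion indicators. Below is a minimal sketch that assumes a weighted linear combination of features already scaled to [0, 1]. The feature names, weights, and movie values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MovieSignals:
    """Hypothetical per-movie features, each already scaled to [0, 1]."""
    title: str
    star_rating: float      # overall star rating, normalized
    sentiment: float        # mean review sentiment
    emotion: float          # emotion score derived from reviews
    attribute_match: float  # hypothetical match between text-mined product and customer attributes


# Illustrative weights only; the paper's actual integration is not given in the abstract.
WEIGHTS = {"star_rating": 0.4, "sentiment": 0.2, "emotion": 0.1, "attribute_match": 0.3}


def integrated_score(m: MovieSignals) -> float:
    """Weighted sum of the normalized features."""
    return sum(w * getattr(m, name) for name, w in WEIGHTS.items())


def recommend(movies, k=2):
    """Integrated model: rank by the combined score, highest first."""
    return sorted(movies, key=integrated_score, reverse=True)[:k]


def rating_only(movies, k=2):
    """Baseline: rank by overall star rating alone."""
    return sorted(movies, key=lambda m: m.star_rating, reverse=True)[:k]


movies = [
    MovieSignals("A", 0.9, 0.5, 0.4, 0.1),
    MovieSignals("B", 0.8, 0.9, 0.8, 0.9),
    MovieSignals("C", 0.6, 0.7, 0.6, 0.8),
]
print([m.title for m in rating_only(movies)], [m.title for m in recommend(movies)])
```

With these illustrative values, the rating-only baseline ranks `A` first. The integrated score instead promotes `B` and `C`, whose review-derived signals are stronger.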

Comparative co-expression analysis of RNA-Seq transcriptome revealing key genes, miRNA and transcription factor in distinct metabolic pathways in diabetic nerve, eye, and kidney disease

  • Asmy, Veerankutty Subaida Shafna;Natarajan, Jeyakumar
    • Genomics & Informatics
    • /
    • v.20 no.3
    • /
    • pp.26.1-26.19
    • /
    • 2022
  • Diabetes and its related complications are associated with long-term damage to, and failure of, various organ systems. The microvascular complications of diabetes considered in this study are diabetic retinopathy, diabetic neuropathy, and diabetic nephropathy. The aim is to identify the weighted co-expressed genes and differentially expressed genes (DEGs), the major pathways, and the miRNAs, transcription factors (TFs), and drugs that interact with them in all three conditions. The primary goal is to identify vital DEGs common to all three conditions. Five genes (AKT1, NFKB1, MAPK3, PDPK1, and TNF) overlapped between the DEGs and the co-expressed genes and were differentially expressed in all three conditions; these were defined as key genes. A protein-protein interaction network analysis and a gene set linkage analysis (GSLA) of the key genes were then performed. GSLA, gene ontology, and pathway enrichment analysis of the key genes identified nine major pathways in diabetes. Subsequently, miRNA-gene and transcription factor-gene regulatory networks of the five genes of interest in the nine major pathways were constructed and studied. hsa-mir-34a-5p was identified as a major miRNA that interacted with all five genes. RELA, FOXO3, PDX1, and SREBF1 were the TFs interacting with the five genes of interest. Finally, a drug-gene interaction network identified five potential drugs targeting the genes of interest. This research reveals biomarker genes, miRNAs, TFs, and therapeutic drugs in the key signaling pathways. These findings may help us understand the processes underlying all three microvascular complications and aid in disease detection and management.
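The key-gene definition (DEGs shared by all three conditions that also belong to the co-expressed set) reduces to set intersection. The sketch below assumes DEGs are called with |log2 fold change| >= 1 and adjusted p < 0.05. These are common thresholds but are not stated in the abstract. The gene names and values are placeholders, not data from the study.

```python
def degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Genes passing the assumed |log2FC| and adjusted-p thresholds.

    `results` maps gene -> (log2_fold_change, adjusted_p).
    """
    return {
        gene for gene, (lfc, padj) in results.items()
        if abs(lfc) >= lfc_cutoff and padj < p_cutoff
    }


def key_genes(condition_results, coexpressed):
    """Genes that are DEGs in every condition and also in the co-expressed set."""
    deg_sets = [degs(r) for r in condition_results]
    return set.intersection(*deg_sets) & set(coexpressed)


# Placeholder results for three hypothetical conditions.
cond1 = {"GENE_A": (1.5, 0.01), "GENE_B": (-2.0, 0.001), "GENE_C": (0.1, 0.9), "GENE_D": (1.2, 0.01)}
cond2 = {"GENE_A": (1.2, 0.03), "GENE_B": (-1.1, 0.02), "GENE_C": (1.3, 0.2), "GENE_D": (1.4, 0.02)}
cond3 = {"GENE_A": (2.1, 0.004), "GENE_B": (-1.4, 0.01), "GENE_C": (1.8, 0.01), "GENE_D": (1.1, 0.04)}
coexpressed = {"GENE_A", "GENE_B", "GENE_C"}

print(sorted(key_genes([cond1, cond2, cond3], coexpressed)))
```

`GENE_D` is differentially expressed everywhere but is not co-expressed, and `GENE_C` fails the thresholds in one condition. So only `GENE_A` and `GENE_B` survive both filters.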

Images of Nurses Appeared in Media Reports Before and After Outbreak of COVID-19: Text Network Analysis and Topic Modeling (COVID-19 발생 전·후 언론보도에 나타난 간호사 이미지에 대한 텍스트 네트워크 분석 및 토픽 모델링)

  • Park, Min Young;Jeong, Seok Hee;Kim, Hee Sun;Lee, Eun Jee
    • Journal of Korean Academy of Nursing
    • /
    • v.52 no.3
    • /
    • pp.291-307
    • /
    • 2022
  • Purpose: The aims of this study were to identify the main keywords, network structure, and main topics of press articles about nurses that appeared in media reports. Methods: The data were media articles on the topic "nurse" reported by 16 national media outlets over a one-year period from July 1, 2019 to June 30, 2020. Data were collected from the Big Kinds database. A total of 7,800 articles were retrieved, and 1,038 were used in the final analysis. Text network analysis and topic modeling were performed using NetMiner 4.4. Results: The number of media reports related to nurses increased 3.86-fold after the novel coronavirus (COVID-19) outbreak. The pre- and post-COVID-19 networks had densities of 0.002 and 0.001, average degrees of 4.63 and 4.92, and average distances of 4.25 and 4.01, respectively. Four topics were derived for each period, before and after the COVID-19 outbreak. Example pre-COVID-19 topics are "a nurse who committed suicide because she could not withstand the Taewoom at work" and "a nurse as a perpetrator of a newborn abuse case." Taewoom is a Korean term for the bullying of junior nurses by senior nurses. Post-COVID-19 examples are "a nurse as a victim of COVID-19," "a nurse working with the support of the people," and "a nurse as a top contributor and a warrior to protect from COVID-19." Conclusion: Topic modeling shows that topics became more positive after the COVID-19 outbreak. Individual nurses and nursing organizations should continuously monitor nurses' image and conduct further research on it.
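For an undirected network with n nodes and m edges, density is 2m / (n(n - 1)) and average degree is 2m / n. The two reported figures are therefore linked by average degree = density * (n - 1). The sketch below builds a keyword co-occurrence network from hypothetical tokenized articles and computes density, average degree, and average distance. It assumes co-occurrence within a whole article and skips unreachable pairs when averaging distances. NetMiner's exact conventions may differ.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected keyword graph: two keywords are linked if they appear in the same document."""
    adj = {}
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            adj.setdefault(t, set())
        for a, b in combinations(terms, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_stats(adj):
    """Return (density, average degree, average shortest-path distance)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    # Breadth-first search from every node; average over reachable ordered pairs.
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    avg_distance = total / pairs if pairs else 0.0
    return density, avg_degree, avg_distance


# Hypothetical keyword lists extracted from three articles.
articles = [["nurse", "covid19", "hospital"], ["nurse", "support"], ["support", "people"]]
print(network_stats(cooccurrence_graph(articles)))
```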

Analysis of the ESG Research Trend: Focusing on SCOPUS DB (ESG 주요 연구 동향 분석: SCOPUS DB를 중심으로)

  • Kyoo-Sung Noh
    • Journal of Digital Convergence
    • /
    • v.21 no.2
    • /
    • pp.9-16
    • /
    • 2023
  • The purpose of this study is to analyze research trends on ESG (Environmental, Social, and Governance) and to suggest directions for companies and investors in using ESG information. To this end, text mining, an unstructured data mining technique, was used for the analysis. Paper abstracts from January 2014 to February 2023 were collected from the SCOPUS database, and Economics, Econometrics and Finance was the most common subject area. The United States and China published the most ESG papers, and Korea ranked sixth in the world. This study is meaningful in that it analyzed the main research trends of ESG using text mining techniques such as LDA topic modeling. It confirmed that ESG research is being conducted across various fields rather than in one specific field. It also differs from previous studies in that it analyzed various influencing factors and ripple effects of ESG.
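LDA topic modeling, mentioned above, can be fitted with collapsed Gibbs sampling, one standard inference method for LDA. The plain-Python sketch below assumes tokenized abstracts, a fixed number of topics, and symmetric priors `alpha` and `beta`. These settings and the toy documents are illustrative, since the abstract does not report the tool or hyperparameters used.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc-topic counts, topic-word counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic assignment of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)  # random initial assignment
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


def top_words(nkw, k, n=3):
    """Most frequent words assigned to topic k."""
    return [w for w, _ in sorted(nkw[k].items(), key=lambda kv: kv[1], reverse=True)[:n]]


# Toy tokenized abstracts (illustrative only).
docs = [
    ["esg", "investment", "return", "investment"],
    ["esg", "disclosure", "governance"],
    ["carbon", "emission", "climate", "carbon"],
    ["climate", "emission", "policy"],
]
ndk, nkw = lda_gibbs(docs, n_topics=2)
for k in range(2):
    print(k, top_words(nkw, k))
```

Topic indices are arbitrary, and results vary with `seed` and `iters`. In practice the number of topics is usually chosen by comparing coherence or perplexity across candidate values.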

Analysis of Work-Related Musculoskeletal Disorders Research Trends Using Keyword Frequency Analysis and CONCOR Technique

  • Geon-Hui Lee;Seo-Yeon Choi
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.8
    • /
    • pp.137-144
    • /
    • 2023
  • Big data analysis techniques have been suggested as one way to address social issues. In this study, we used keyword network analysis and CONCOR analysis to analyze research trends on work-related musculoskeletal disorders. The findings are as follows. First, the number of papers on work-related musculoskeletal disorders has increased steadily. More than 33 articles on average have been published per year since the introduction of the musculoskeletal risk factor investigation in 2003, and publication rose notably from 2007 to 2009. Second, the most frequent keywords identified through text mining were work (4,940), musculoskeletal disorders (2,197), symptoms (1,836), related (1,769), and musculoskeletal system (1,421). Third, the CONCOR analysis produced four clusters: 'Musculoskeletal disorder treatment', 'Occupational health and safety management', 'Work environment assessment', and 'Workplace environment measurement'. This study is expected to contribute to the development of research on musculoskeletal disorders and to suggest various directions for future studies.
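CONCOR (CONvergence of iterated CORrelations) repeatedly computes Pearson correlations between the columns of a matrix until the entries approach +1 or -1. It then splits the items into two blocks by sign. The sketch below performs one split on a hypothetical document-by-keyword matrix. The input form, tolerance, and iteration cap are assumptions, not settings reported in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlate_columns(matrix):
    """Column-by-column Pearson correlation matrix."""
    cols = list(zip(*matrix))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def concor_split(matrix, max_iter=50, tol=1e-6):
    """One CONCOR split of the matrix's columns into two blocks of column indices."""
    corr = correlate_columns(matrix)
    for _ in range(max_iter):
        corr = correlate_columns(corr)
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
    # Items positively correlated with item 0 form one block; the rest form the other.
    block_a = [j for j, v in enumerate(corr[0]) if v > 0]
    block_b = [j for j, v in enumerate(corr[0]) if v <= 0]
    return block_a, block_b


# Hypothetical document-by-keyword matrix (rows: papers, columns: keywords).
keywords = ["treatment", "therapy", "assessment", "measurement"]
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
a, b = concor_split(matrix)
print([keywords[j] for j in a], [keywords[j] for j in b])
```

Applying the same split again within each block yields four clusters, the number reported in the abstract.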

Developing a Machine Learning-Based Defect Data Management System for Multi-Family Housing Units (기계학습 알고리즘 기반 하자 정보 관리 시스템 개발 - 공동주택 전용부분을 중심으로 -)

  • Park, Da-seul;Cha, Hee-sung
    • Korean Journal of Construction Engineering and Management
    • /
    • v.24 no.5
    • /
    • pp.35-43
    • /
    • 2023
  • As defect disputes in multi-unit housing increase, defect management is becoming more important. However, previous studies have mostly focused on the 'common part' of multi-unit housing. In addition, research is lacking on systems for the 'management office', one of the parties responsible for defect management. These gaps have led to insufficient defect management capability in management offices and a decline in management quality. Therefore, this paper proposes a machine learning-based defect data management system for management offices. The goal is to reduce the inconvenience of defect management by using Optical Character Recognition (OCR) and Natural Language Processing (NLP) modules. The system converts handwritten defect information into digital text via OCR. A language model then regenerates the defect information in a form specified by the user. Finally, the generated text is stored in a database, and statistical analysis is performed. Through this pipeline, management offices are expected to improve their defect management capabilities and to receive support for decision-making.
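The storage and statistics stage of the pipeline can be sketched with Python's built-in `sqlite3` module. The OCR and language-model steps are not reproduced here. The sketch assumes the language model has already regenerated each record in a hypothetical `key=value; ...` form. The table schema and field names are illustrative.

```python
import sqlite3

# Hypothetical fields of the user-specified form.
FIELDS = ("unit", "room", "type", "note")


def parse_defect_form(text):
    """Parse a record in the assumed 'key=value; key=value' form into a dict."""
    fields = {}
    for part in text.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    # Missing fields become NULL in the database.
    return {name: fields.get(name) for name in FIELDS}


def store_and_summarize(records):
    """Store parsed records in an in-memory SQLite table and count defects by type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE defects (unit TEXT, room TEXT, type TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO defects VALUES (:unit, :room, :type, :note)",
        [parse_defect_form(r) for r in records],
    )
    summary = conn.execute(
        "SELECT type, COUNT(*) AS n FROM defects GROUP BY type ORDER BY n DESC, type"
    ).fetchall()
    conn.close()
    return summary


records = [
    "unit=101; room=bathroom; type=leak; note=water under sink",
    "unit=203; room=kitchen; type=crack; note=tile crack",
    "unit=305; room=bathroom; type=leak; note=shower drain",
]
print(store_and_summarize(records))
```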

Perceived Characteristics of Grains during the Choseon Dynasty - A Study Applying Text Frequency Analysis Using the Choseonwangjoshilrok Data - (조선왕조실록 텍스트 빈도 분석을 통한 조선시대 곡물에 관한 인식 특성 고찰)

  • Kim, Mi-Hye
    • Journal of the Korean Society of Food Culture
    • /
    • v.38 no.1
    • /
    • pp.26-37
    • /
    • 2023
  • This study applied text frequency analysis to the Choseonwangjoshilrok (Annals of the Choseon Dynasty) to analyze the crops recorded during the Choseon dynasty, and categorized the results by king. Contemporary perceptions of grains were examined through the types of staple crops. Staple species were examined using word clouds and semantic network analysis. In total, 101,842 cases of crop consumption were recorded in the Choseonwangjoshilrok. Of these, 51,337 (50.4%) were grains, 50,407 (49.5%) were beans, and 98 (0.1%) were seeds. Rice was the most frequently consumed grain (37.1%), followed by pii (11.9%), millet (11.3%), barley (4.5%), proso (0.8%), wheat (0.6%), buckwheat (0.1%), and adlay (0.05%). By century, grain records numbered 15,520 cases in the 15th century (30.2%), 11,201 in the 18th century (21.8%), 9,421 in the 17th century (18.4%), 9,113 in the 16th century (17.8%), and 6,082 in the 19th century (11.8%). Interest in grain among the 27 kings of Choseon was evaluated based on the frequency of records. King Sejong (15th century) showed the greatest interest with 13,363 cases (13.1%). He was followed by King Jungjo (18th century; 8,501 cases, 8.4%) and King Sungjong (15th century; 7,776 cases, 7.6%).
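The reported percentages can be cross-checked from the raw counts in the abstract. The sketch below recomputes the century shares of the 51,337 grain records. It also recomputes the grain, bean, and seed shares of the 101,842 crop records, rounding to one decimal place as the abstract does.

```python
# Counts as reported in the abstract.
century_counts = {"15th": 15_520, "16th": 9_113, "17th": 9_421, "18th": 11_201, "19th": 6_082}
crop_counts = {"grains": 51_337, "beans": 50_407, "seeds": 98}


def shares(counts):
    """Percentage of the total for each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(sum(century_counts.values()), shares(century_counts))
print(sum(crop_counts.values()), shares(crop_counts))
```

Both totals match the abstract: the century counts sum to 51,337 grain records, and the crop categories sum to 101,842 records. The recomputed shares also agree with the reported percentages.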

BackTranScription (BTS)-based Jeju Automatic Speech Recognition Post-processor Research (BackTranScription (BTS)기반 제주어 음성인식 후처리기 연구)

  • Park, Chanjun;Seo, Jaehyung;Lee, Seolhwa;Moon, Heonseok;Eo, Sugyeong;Jang, Yoonna;Lim, Heuiseok
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.178-185
    • /
    • 2021
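  • Training a sequence-to-sequence (S2S) speech recognition post-processor requires a parallel corpus of pairs: a speech recognition sentence and the human post-edit sentence corrected by a phonetic transcriptor. Building such a corpus takes considerable human labor. BackTranScription (BTS) is a data construction methodology proposed to alleviate this limitation of existing S2S-based speech recognition post-processors. It combines Text-To-Speech (TTS) and Speech-To-Text (STT) technologies to generate a pseudo-parallel corpus. Because the methodology removes the transcriptor's role and can automatically generate large amounts of training data, it reduces the time and cost of data construction. To improve the performance of a BTS-based speech recognition post-processor specialized for the Jeju language domain, this paper compares two methodologies. The model-centric approach improves performance through model modification. The data-centric approach improves performance by considering the quantity and quality of data without modifying the model. The experimental results show that applying the data-centric approach without model modification was more helpful for improving performance. The negative results of the model-centric approach were also analyzed.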
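The BTS data-construction loop can be sketched as follows. Each clean sentence is synthesized with TTS and then recognized with STT. The pair (STT output, original sentence) becomes one pseudo-parallel training example, with the original text serving as the target in place of a transcriptor's edit. The `fake_tts` and `fake_stt` functions are hypothetical stand-ins for real engines, included only to make the sketch runnable.

```python
from typing import Callable, List, Tuple


def backtranscribe(
    sentences: List[str],
    tts: Callable[[str], bytes],
    stt: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Build (STT output, original sentence) pairs for training an S2S post-processor."""
    pairs = []
    for sentence in sentences:
        audio = tts(sentence)       # text -> synthetic speech
        recognized = stt(audio)     # synthetic speech -> possibly erroneous text
        pairs.append((recognized, sentence))  # source: noisy text, target: clean text
    return pairs


# Hypothetical stand-ins for real TTS/STT engines.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_stt(audio: bytes) -> str:
    # Simulate a recognition error by dropping word spacing.
    return audio.decode("utf-8").replace(" ", "")


# "혼저 옵서예" is a common Jeju greeting.
print(backtranscribe(["혼저 옵서예"], fake_tts, fake_stt))
```

In a real pipeline the stand-ins would be replaced by actual TTS and STT engines. The data-centric approach described above then adjusts the quantity and quality of these generated pairs rather than the model.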
