• Title/Summary/Keyword: Text data


Sentence Similarity Measurement Method Using a Set-based POI Data Search (집합 기반 POI 검색을 이용한 문장 유사도 측정 기법)

  • Ko, EunByul;Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.711-716 / 2014
  • With growing interest in plagiarism detection and intelligent file-content search, the demand for measuring the similarity between two sentences is increasing. Sentence similarity measurement has been studied from various directions, such as n-gram, edit distance, and LSA, but each of these methods has its own advantages and disadvantages. In this paper, we propose a new sentence similarity measurement method that approaches the problem from another direction. The proposed method uses the set-based POI data search, which improves search performance over the existing hard-matching method when the data contain inverted, omitted, inserted, or revised characters. Using this method, we are able to measure the similarity between two sentences more accurately and more quickly. We modified the data loading and text search algorithms of the set-based POI data search, and we added a word operation algorithm and a similarity measure between two sentences expressed as a percentage. The experimental results show that our sentence similarity measurement method performs better than n-gram and the set-based POI data search.
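The abstract benchmarks the proposed method against n-gram similarity and reports similarity as a percentage. The sketch below illustrates only that kind of n-gram baseline, not the set-based POI method itself, whose details the abstract does not give: it computes a character-bigram Dice coefficient scaled to 0-100. The Dice formula, the removal of spaces, and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Multiset of overlapping character n-grams; spaces are dropped (an assumption)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_similarity_percent(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient of the two n-gram multisets, scaled to 0-100."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:  # both strings are shorter than n
        return 100.0 if a == b else 0.0
    shared = sum((ga & gb).values())  # multiset intersection of n-grams
    return 100.0 * 2 * shared / total


if __name__ == "__main__":
    print(ngram_similarity_percent("night", "nacht"))  # prints 25.0
```

Because a transposed or omitted character only breaks the few n-grams around it, a score like this degrades gradually on the kinds of character errors the abstract mentions, which is one reason n-gram measures are a common baseline.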

An Analysis of the Discourse Topics of Users who Exhibit Symptoms of Depression on Social Media (소셜미디어를 통한 우울 경향 이용자 담론 주제 분석)

  • Seo, Harim;Song, Min
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.207-226 / 2019
  • Depression is a serious psychological disease that is expected to afflict an increasing number of people. Studies on depression have been conducted in the context of social media because it is a platform on which users often frankly express their emotions and reveal their mental states. In this study, a large amount of Korean text was collected and analyzed to determine whether such data could be used to detect depression in users. We analyzed data collected between January 2016 and February 2019 from Twitter users with and without depressive tendencies. Each user's data were analyzed separately for the periods before and after the appearance of depressive tendencies, to see how their expression changed. The data were analyzed through co-occurrence word analysis, topic modeling, and sentiment analysis. The study's automated data collection method enabled the analysis of data collected over a relatively long period. The study also compared the textual characteristics of users with depressive tendencies to those of users without them.
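Co-occurrence word analysis, one of the three techniques listed, starts from counts of how often two words appear in the same post. A minimal sketch, assuming each tweet has already been tokenized (real Korean text would normally go through a morphological analyzer first) and that a pair is counted at most once per tweet; the sample tweets are invented toy data:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs: list[list[str]]) -> Counter:
    """Count unordered word pairs that appear together in the same document.

    A pair is counted at most once per document, so each count is the
    number of documents that contain both words.
    """
    counts: Counter = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))            # dedupe and fix pair order
        counts.update(combinations(vocab, 2))  # all unordered pairs
    return counts


if __name__ == "__main__":
    tweets = [
        "tired sleep alone".split(),
        "sleep tired today".split(),
        "happy today friends".split(),
    ]
    for pair, n in cooccurrence_counts(tweets).most_common(1):
        print(pair, n)  # prints ('sleep', 'tired') 2
```

The resulting pair counts are what a co-occurrence network or a before/after comparison for the same user would be built from.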

Implementation and Design of WISD(Web Interface System based DICOM) for Efficient Sharing of Medical Information between Clinics (의료기관간 효과적인 의료정보 공유를 위한 WISD의 설계 및 구현)

  • Cho, Ik-Sung;Kwon, Hyeog-Soong
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.3 / pp.500-508 / 2008
  • For an efficient, compatible system between medical clinics, medical information must be built on standardized protocols such as HL7 for text data and DICOM for image data. However, it is difficult to exchange information between clinics because their systems and software differ, as do their data structures and code types. We therefore analyze the structure of the DICOM file and design an integrated database for effective information sharing and exchange. The WISD system proposed in this paper separates DICOM files transmitted by medical clinics into text data and image data and stores each in the integrated DB (database) according to the standardized protocol. With the proposed system, each medical clinic can efficiently search and exchange information through a web browser. Through the integrated database and the Internet, the WISD system can not only search and manage image data and patient information but also share medical information without extra costs such as building a new system.
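The core step of WISD is separating a DICOM file into text data and image data. As an illustration only, the sketch below walks the data elements of a DICOM Part 10 byte string and puts the Pixel Data element (tag 7FE0,0010) on the image side and everything else on the text side. It assumes Explicit VR Little Endian encoding throughout, the default ASCII character set, and no undefined-length elements (sequences or encapsulated pixel data); real files call for a full parser such as pydicom. The function name `split_dicom` is hypothetical.

```python
import struct

# VRs encoded with 2 reserved bytes and a 4-byte length (Explicit VR Little Endian).
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}
# Character-string VRs decoded as text; all other values stay as raw bytes.
STRING_VRS = {b"AE", b"AS", b"CS", b"DA", b"DS", b"DT", b"IS", b"LO", b"LT",
              b"PN", b"SH", b"ST", b"TM", b"UC", b"UI", b"UR", b"UT"}
PIXEL_DATA = (0x7FE0, 0x0010)  # tag that carries the image payload


def split_dicom(data: bytes) -> tuple[dict, bytes]:
    """Split a DICOM Part 10 byte string into (text elements, pixel bytes).

    Sketch only: Explicit VR Little Endian, ASCII text, no undefined lengths.
    """
    if data[128:132] != b"DICM":  # 128-byte preamble, then the "DICM" prefix
        raise ValueError("missing DICM prefix")
    pos, text, pixels = 132, {}, b""
    while pos < len(data):
        group, elem = struct.unpack_from("<HH", data, pos)
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", data, pos + 8)
            pos += 12
        else:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            pos += 8
        if length == 0xFFFFFFFF:
            raise ValueError("undefined length is not handled in this sketch")
        value = data[pos:pos + length]
        pos += length
        if (group, elem) == PIXEL_DATA:
            pixels = value  # image side: would go to image storage
        elif vr in STRING_VRS:
            # text side: strip DICOM space/NUL padding
            text[(group, elem)] = value.decode("ascii", "replace").rstrip("\x00 ")
        else:
            text[(group, elem)] = value  # binary non-image value, kept raw
    return text, pixels
```

In a system like the one described, the `text` dictionary would feed the patient/study tables of the integrated database and `pixels` would go to image storage; how WISD actually maps tags to tables is not stated in the abstract.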

Using Data Mining Techniques for Analysis of the Impacts of COVID-19 Pandemic on the Domestic Stock Prices: Focusing on Healthcare Industry (데이터 마이닝 기법을 통한 COVID-19 팬데믹의 국내 주가 영향 분석: 헬스케어산업을 중심으로)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.30 no.3 / pp.21-45 / 2021
  • Purpose: This paper analyzes the impact of a global pandemic such as COVID-19 on the domestic stock market. We investigated how the overall pattern of the stock market changed due to the COVID-19 pandemic. In particular, we analyzed the pattern of stock prices in depth and tried to identify which factors affected the stock market index (KOSPI) in the healthcare industry due to the pandemic. Design/methodology/approach: We built a data warehouse from databases in various industrial and economic fields to analyze the changes in the KOSPI due to COVID-19, particularly the changes in the healthcare industry centered on bio-medicine. We collected daily KOSPI stock price data, centered on the KOSPI 200, for about two years before and one year after the outbreak of COVID-19. In addition, we collected various COVID-19-related news from the stock market by applying text mining techniques. We designed four experimental data sets to develop decision tree-based prediction models. Findings: All prediction models built from the four data sets showed significant predictive power with explainable decision tree models. In addition, we derived 10 to 14 significant decision rules for each prediction model. The experimental results showed that the decision rules were sufficient to explain the patterns of the domestic healthcare stock market before and after COVID-19.
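Decision tree models derive their rules by repeatedly choosing the split that best separates the target classes. The sketch below shows that core step for a single numeric feature using Gini impurity, the CART criterion; the abstract does not say which tree algorithm or impurity measure the authors used, and the feature (a hypothetical daily count of COVID-19 news items) and the up/down labels are invented toy data.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs: list[float], ys: list[str]) -> tuple[float, float]:
    """Return (threshold, weighted Gini) of the best split `x <= threshold`.

    Returns (nan, inf) if all x values are equal and no split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = (float("nan"), float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best


if __name__ == "__main__":
    news_count = [2, 3, 5, 8, 12, 15]  # hypothetical daily feature
    movement = ["up", "up", "up", "down", "down", "down"]
    print(best_threshold(news_count, movement))  # prints (6.5, 0.0)
```

A full tree applies this search to every feature, picks the best one, and recurses on each side; each root-to-leaf path then reads as a decision rule such as "news_count <= 6.5 then up".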

Text Big Data Analysis and Summary for Free Semester Operational Plan Document (자유학기제 운영계획서에 대한 텍스트 빅데이터 분석 및 요약)

  • Lee, Suan;Park, Beomjun;Kim, Minkyu;Shin, Hye Sook;Kim, Jinho
    • The Journal of Korean Association of Computer Education / v.22 no.3 / pp.135-146 / 2019
  • Big data analysis is actively used to collect and analyze direct information on related topics in each field of society. Interest in applying big data analysis technology to education is growing in Korea, because it helps identify the effectiveness of educational methods and policies and apply the findings to policy formulation. In this paper, we propose an approach to utilizing big data analysis technology in education. We focus on the free semester program, one of the current core education policies, and analyze the main points of interest and differences in the free semester by analyzing and visualizing the texts of the operation reports prepared by each school. Specifically, we compare regional differences in key characteristics and interests based on the free semester operation reports of middle schools in the Seoul and Gangwon-do regions. In conclusion, applying and utilizing big data analysis technology according to the needs and requirements of the education field is of great significance.
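One simple way to compare regional points of interest, in the spirit of the Seoul versus Gangwon-do comparison above, is to contrast the relative frequency of each word in the two sets of reports. The sketch below is an illustration under assumptions (whitespace tokenization, English toy words standing in for Korean report text, a plain frequency difference); the abstract does not specify the authors' exact measures.

```python
from collections import Counter


def relative_freq(docs: list[str]) -> dict[str, float]:
    """Share of all tokens that each word accounts for (whitespace tokens)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def regional_gap(a_docs: list[str], b_docs: list[str], top: int = 3):
    """Words whose relative frequency differs most between regions A and B.

    A positive gap means more prominent in A, a negative gap more prominent in B.
    Ties are broken alphabetically so the output is deterministic.
    """
    fa, fb = relative_freq(a_docs), relative_freq(b_docs)
    gaps = {w: fa.get(w, 0.0) - fb.get(w, 0.0) for w in fa.keys() | fb.keys()}
    return sorted(gaps.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]


if __name__ == "__main__":
    region_a = ["career experience career", "club activity"]   # hypothetical
    region_b = ["nature experience", "club activity nature"]   # hypothetical
    print(regional_gap(region_a, region_b, top=2))
```

Normalizing by each region's total token count keeps a region with longer or more numerous reports from dominating the comparison.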

Development of MSDS Map for Visual Safety Management of Hazardous and Chemical Materials (유해화학물질의 시각적 안전관리를 위한 MSDS 지도 개발)

  • Shin, Myungwoo;Suh, Yongyoon
    • Journal of the Korean Society of Safety / v.34 no.2 / pp.48-55 / 2019
  • To prevent accidents caused by chemical materials, MSDS (Material Safety Data Sheet) data have thus far been produced to explain how to use and manage hazardous and chemical materials safely. However, users who handle these materials find the MSDS data difficult to understand, because the sheets are listed only in alphabetical order rather than by specific factors such as similarity of characteristics, which limits how well the types of chemical materials can be represented with respect to their characteristics. Thus, in this study, a large amount of MSDS data is visualized based on the relationships of characteristics among chemical materials to support safety managers. For this, we used a text mining algorithm that extracts keywords from documents and the Self-Organizing Map (SOM) algorithm, which represents textual information visually. The target data are the guide texts in the MSDS documents of the Occupational Safety and Health Administration (OSHA) in the United States, which include usage information such as the reactivity and potential risks of materials. First, the text mining algorithm extracts chemical information from these guide texts. Next, the MSDS map is developed with SOM according to the similarity of the text information of the chemical materials. By mapping prohibited and hazardous substances onto the developed SOM, the MSDS map helps classify chemical materials effectively. As a result, safety managers can use the MSDS map to easily detect prohibited and hazardous substances with respect to the Industrial Safety and Health Act standards.
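The SOM step can be sketched as follows. This is a minimal, generic self-organizing map on a small grid, assuming each chemical has already been turned into a numeric keyword vector by the text-mining step; the feature names, chemical labels, grid size, and decay schedules are hypothetical choices, not those of the study.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda p: sum((w - v) ** 2 for w, v in zip(nodes[p], x)))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM on equal-length vectors; return {(r, c): weights}."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.3)  # shrinking neighborhood radius
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            bmu = best_matching_unit(nodes, x)
            for pos, w in nodes.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull node toward x
    return nodes


def quantization_error(nodes, data):
    """Mean Euclidean distance from each vector to its best-matching node."""
    return sum(math.dist(nodes[best_matching_unit(nodes, x)], x) for x in data) / len(data)


if __name__ == "__main__":
    # Hypothetical keyword features: [flammable, toxic, corrosive, oxidizer].
    chemicals = {
        "chem_A": [1.0, 0.1, 0.0, 0.0],
        "chem_B": [0.9, 0.0, 0.0, 0.1],
        "chem_C": [0.0, 1.0, 1.0, 0.0],
        "chem_D": [0.1, 0.9, 1.0, 0.0],
    }
    som = train_som(list(chemicals.values()))
    for name, vec in chemicals.items():
        print(name, "->", best_matching_unit(som, vec))
```

Because neighboring grid cells are updated together early in training, chemicals with similar keyword profiles tend to land on nearby cells, which is what makes the resulting map readable as a similarity layout.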

Analysis of Public Perception and Policy Implications of Foreign Workers through Social Big Data analysis (소셜 빅데이터분석을 통한 외국인근로자에 관한 국민 인식 분석과 정책적 함의)

  • Ha, Jae-Been;Lee, Do-Eun
    • Journal of Digital Convergence / v.19 no.11 / pp.1-10 / 2021
  • This paper aims to examine awareness of foreign workers on social platforms using text mining, one of the big data techniques, and to draw suggestions regarding foreign workers. To this end, data were collected with the search keyword 'Foreign Worker' from Jan. 1 to Dec. 31, 2020; frequency analysis, TF-IDF analysis, and degree centrality analysis were performed, and the top 100 keywords were drawn for comparison. Furthermore, Ucinet 6.0 and NetDraw were used to analyze semantic networks, and CONCOR analysis clustered the data into the following eight groups: foreigner policy issues, regional community issues, business owner's perspective issues, employment issues, working environment issues, legal issues, immigration issues, and human rights issues. Based on these results, the study identified national awareness of foreign workers and the main issues, and it provided basic data for policy proposals and related research on foreign workers.
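TF-IDF and degree centrality can be sketched in a few lines. This is a generic illustration, not the Ucinet 6.0 procedure used in the study: it assumes pre-tokenized documents, raw term counts with idf = log(N / df), and an undirected network in which two words are linked if they appear in the same document. The toy tokens are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Per-document TF-IDF with raw term counts and idf = log(N / df)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def degree_centrality(docs: list[list[str]]) -> dict[str, float]:
    """Normalized degree (degree / (n - 1)) in the same-document word network."""
    edges = {pair for doc in docs for pair in combinations(sorted(set(doc)), 2)}
    nodes = {t for doc in docs for t in doc}
    deg = Counter(w for edge in edges for w in edge)
    denom = max(len(nodes) - 1, 1)
    return {w: deg[w] / denom for w in nodes}


if __name__ == "__main__":
    posts = [["wage", "visa", "wage"], ["visa", "covid"], ["wage", "covid", "factory"]]
    print(tf_idf(posts)[2])
    print(degree_centrality(posts))
```

Note that a word appearing in every document gets an idf of zero under this variant; other weightings (smoothed idf, log-scaled tf) are common and may rank keywords differently.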

A Study on the Perception of Artificial Intelligence Literacy and Artificial Intelligence Convergence Education Using Text Mining Analysis Techniques (텍스트 마이닝 분석기법을 활용한 인공지능 리터러시 및 인공지능 융합 교육에 관한 인식 연구)

  • Hyeok Yun;Jeongrang Kim
    • Journal of The Korean Association of Information Education / v.26 no.6 / pp.553-566 / 2022
  • This study collects social data and academic research data from portal sites and RISS and applies TF-IDF, N-gram, semantic network, and CONCOR analyses to examine the social awareness and current state of 'AI Literacy' and 'AI Convergence Education'. Through this, we tried to understand social awareness and the current situation and to suggest implications and directions. In the social data, the amount collected for 'AI Convergence Education' was more than twice that for 'AI Literacy', indicating that awareness of 'AI Literacy' was relatively low. In 'AI Literacy', the keyword 'human' in the social data did not belong to any cluster, indicating a lack of philosophical interest in, and awareness of, the humanities and AI. In addition, the keyword 'Ministry of Education' showed high frequency, importance, and connection centrality only in the social data for 'AI Convergence Education', confirming that 'AI Convergence Education' is closely related to government policy.
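CONCOR (CONvergence of iterated CORrelations) repeatedly replaces a matrix with the Pearson correlations among its columns until every entry approaches +1 or -1, which splits the nodes into two blocks; applying it again within each block yields finer clusters. The sketch below performs one such split on a symmetric word co-occurrence adjacency matrix. It is a plain-Python illustration on a hypothetical six-word network, and Ucinet's implementation may differ in details such as diagonal handling and convergence criteria.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du, dv = [a - mu for a in u], [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0


def concor_split(matrix, max_iter=50, tol=1e-6):
    """One CONCOR split: iterate column correlations until entries reach +/-1."""
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(max_iter):
        cols = [[m[i][j] for i in range(n)] for j in range(n)]
        m = [[pearson(cols[a], cols[b]) for b in range(n)] for a in range(n)]
        if all(abs(abs(x) - 1) < tol for row in m for x in row):
            break  # converged to a +/-1 block pattern
    first = [j for j in range(n) if m[0][j] > 0]    # same block as node 0
    second = [j for j in range(n) if m[0][j] <= 0]  # opposite block
    return first, second


if __name__ == "__main__":
    words = ["human", "ethics", "philosophy", "ministry", "policy", "curriculum"]
    adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
    a, b = concor_split(adj)
    print([words[i] for i in a], [words[i] for i in b])
```

In this toy network the two densely connected word groups end up in separate blocks; real co-occurrence networks are weighted and overlapping, so the splits are less clean.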

Implementation of AESA Radar Integration Analysis System by using Heterogeneous Media

  • Min-Jung Kang
    • Journal of the Korea Society of Computer and Information / v.29 no.3 / pp.117-125 / 2024
  • In this paper, we implement and propose an Active Electronically Scanned Array (AESA) radar integration analysis system, specialized for radar development, that uses heterogeneous media. Most analysis systems are used to analyze and correct the causes of defects, which makes testing easier. However, previous log analysis systems that operate only on text are not intuitive, and when there is a large amount of log information it is difficult to find the information the user wants at once; as a result, analyzing the cause of an equipment defect when it occurs is limited. The analysis system in this paper therefore utilizes heterogeneous media. The media defined in this paper refer to recording text-based data, displaying data as images or video, and visualizing data. The proposed analysis system classifies and stores the data transmitted and received between radar devices, the radar target detection and tracking algorithm data, and so on, and it also displays and visualizes radar operation results and equipment defect information in real time. This analysis system can quickly provide the information the user wants and assist in developing high-quality radar.

A Morphological Analysis Method of Predicting Place-Event Performance by Online News Titles (온라인 뉴스 제목 분석을 통한 특정 장소 이벤트 성과 예측을 위한 형태소 분석 방법)

  • Choi, Sukjae;Lee, Jaewoong;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.21 no.1 / pp.15-32 / 2016
  • Online news on the Internet, as published open data, contains facts or opinions about specific affairs and hence considerably influences the decisions of members of the public who are interested in a particular issue. Therefore, people's choices related to an issue can be predicted by analyzing a large number of related Internet news articles. This study proposes a text analysis method to predict the outcomes of events that take place at a specific place. We used the titles of news articles because titles contain more essential text than the article bodies. Moreover, in mobile environments, people tend to rely on news titles before clicking through to the articles. We collected news article titles and divided them into learning and evaluation data sets. Morphemes were extracted, and their polarity values were identified using the learning data. We then analyzed the sentiment of all the articles. As a result, the prediction success rate was 70.6%, a clear difference from the other analytical methods compared. The derived prediction information will be helpful in estimating the expected demand for goods when preparing an event.
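The polarity step can be sketched as follows, with heavy assumptions: titles are already split into morphemes (whitespace tokens stand in for a Korean morphological analyzer here), each morpheme's polarity is taken as (count in successful-event titles minus count in unsuccessful-event titles) divided by its total count, and a title is classified by the average polarity of the morphemes it contains. The abstract does not give the authors' exact formula, the toy titles are invented, and this sketch does not reproduce the reported 70.6%.

```python
from collections import Counter


def learn_polarity(titles: list[list[str]], labels: list[int]) -> dict[str, float]:
    """Polarity in [-1, 1]: (positive count - negative count) / total count."""
    pos, neg = Counter(), Counter()
    for tokens, label in zip(titles, labels):
        (pos if label == 1 else neg).update(tokens)
    return {m: (pos[m] - neg[m]) / (pos[m] + neg[m]) for m in pos.keys() | neg.keys()}


def predict(tokens: list[str], polarity: dict[str, float]) -> int:
    """Average polarity of known morphemes; 1 (success) if positive, else 0."""
    scores = [polarity[m] for m in tokens if m in polarity]
    return 1 if scores and sum(scores) / len(scores) > 0 else 0


def accuracy(titles: list[list[str]], labels: list[int], polarity: dict[str, float]) -> float:
    """Share of titles whose predicted label matches the true label."""
    hits = sum(predict(t, polarity) == y for t, y in zip(titles, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train = [["sold", "out", "crowd"], ["record", "crowd"],
             ["cancel", "rain"], ["empty", "seats", "rain"]]
    y = [1, 1, 0, 0]  # 1 = successful event, 0 = unsuccessful (hypothetical)
    pol = learn_polarity(train, y)
    print(predict(["crowd", "expected"], pol))  # prints 1
```

Morphemes never seen in the learning set are simply ignored, so a title made up entirely of unseen morphemes falls back to the negative class; a real system would need a deliberate policy for that case.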