
An Improved Text Classification (향상된 텍스트 분류)

  • Wang, Guangxing;Shin, Seong-Yoon;Shin, Kwang-Weong;Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.125-126 / 2019
  • In this paper, we propose an improved kNN classification method. By improving the method and normalizing the data, we improve classification accuracy. We then compare three classification algorithms with the improved algorithm on experimental data.

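The abstract does not say how the kNN method was improved, so the following is only a minimal sketch of plain kNN combined with min-max normalization, which is one common way to normalize data before distance-based classification. The function names, the toy data, and the choice of min-max scaling are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; also return the scaling function."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        # Constant columns carry no information, so map them to 0.0.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]

    return [scale(r) for r in rows], scale


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the second feature has a much larger scale.
train = [[1.0, 1000], [1.2, 5000], [9.0, 1200], [9.5, 5200]]
labels = ["a", "a", "b", "b"]
query = [8.8, 1100]

scaled_train, scale = min_max_normalize(train)
print(knn_predict(train, labels, query, k=3))                # raw features
print(knn_predict(scaled_train, labels, scale(query), k=3))  # normalized
```

On this toy data the large-scale second feature dominates the raw distances, so the two calls disagree; after normalization both features contribute comparably.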

Analysis of Weather News using Big Data Analytics Tools R (빅데이터 분석도구 R을 활용한 기상뉴스 데이터분석)

  • Kim, YongSu;Ban, ChaeHoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.448-450 / 2016
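  • With the spread of information technology and the digital economy, the information age produces data on a massive scale; the importance of big data is being emphasized, and it is applied in a variety of fields. R, a big data analysis tool, is a language and environment that enables statistics-based information analysis. In this paper, we use R to analyze weather-related big data that appears in weather news. We collect weather-related data from a variety of news sources and perform a frequency analysis to see how the text is distributed.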

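The frequency analysis described in the abstract can be illustrated with a short sketch. The paper uses R; this example is in Python, and the headlines, the stopword list, and the `word_frequencies` helper are hypothetical. The regular-expression tokenizer only handles lowercase ASCII words, so Korean news text would need a proper morphological tokenizer instead.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"and", "the", "a", "of", "in", "for"}


def word_frequencies(documents, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs across all documents."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Hypothetical weather headlines standing in for collected news data.
headlines = [
    "Heavy rain and strong wind expected",
    "Rain warning issued for the coast",
    "Wind and rain ease in the south",
]

freq = word_frequencies(headlines)
print(freq.most_common(2))  # [('rain', 3), ('wind', 2)]
```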

Preparation of Soil Input Files to a Crop Model Using the Korean Soil Information System (흙토람 데이터베이스를 활용한 작물 모델의 토양입력자료 생성)

  • Yoo, Byoung Hyun;Kim, Kwang Soo
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.3 / pp.174-179 / 2017
  • Soil parameters are required inputs to crop models, which estimate crop yield under given environmental conditions. The Korean Soil Information System (KSIS), which provides detailed soil profile records for 390 soil series in HTML (HyperText Markup Language) format, would be useful for preparing soil input files. The Korean Soil Information System Processing Tool (KSISPT) was developed to aid the generation of soil input data from the KSIS database. The tool was implemented in Java and consists of a set of modules for parsing the HTML documents of the KSIS, storing the data required to prepare a soil input file, calculating additional soil parameters, and writing the soil input file to a local disk. Using this automated soil data preparation tool, about 940 soil input data were created for the DSSAT model and the ORYZA 2000 model, respectively. In combination with a soil series distribution map at 30 m resolution, crop yield could be projected spatially under climate change, which would help the development of adaptation strategies.

Obscene Material Searching Method in WWW (WWW상에서 음란물 검색기법)

  • 노경택;김경우;이기영;김규호
    • Journal of the Korea Society of Computer and Information / v.4 no.2 / pp.1-7 / 1999
  • The World-Wide Web (WWW) is a protocol that shifts information exchange on existing networks from text-centered documents to multimedia data. Because data are stored as hypertext, even a beginner can search for and access the data he or she wants. This ease of searching for and accessing multimedia data on the WWW has played an important role in making obscene materials widespread and multimedia-based, and their commercialization has caused social problems; researchers have therefore actively studied ways to effectively block sites that provide obscene materials. This paper presents and implements a method that effectively searches for sites containing obscene material so that they can be blocked. The proposed model is based on a link-based information retrieval method, and it retrieved relevant documents more efficiently than the probabilistic model, which is known to generate the most accurate results. Average recall and precision improved by 12% and 8%, respectively. In particular, retrieval of relevant documents that include non-text data and have few links improved greatly.


Study of the Activation Plan for Rural Tourism of the Jeollabuk-do Using Big Data Analysis (빅데이터 분석을 통한 농촌관광 실태와 활성화 방안 연구: 전라북도를 중심으로)

  • Park, Ro Un;Lee, Ki Hoon
    • The Korean Journal of Community Living Science / v.27 no.spc / pp.665-679 / 2016
  • This study examined the main factors for activating rural tourism in Jeollabuk-do using big data analysis. Tourism big data were gathered from public open data sources and social network services (SNS), and three analysis tools were used: opinion mining, text mining, and social network analysis (SNA). Opinion mining and text mining identified the key local contents of the 14 areas of Jeollabuk-do and customers' evaluations of rural tourism. Social network analysis detected the relationships among these contents and determined their importance. The results showed that each location in Jeollabuk-do had specific contents that attracted visitors, and that the number of contents affected the number of tourists. In addition, visitor numbers tended to be large when a location's tourism contents were strongly correlated with other contents. Hence, strong connections among contents are key to activating rural tourism. Social network analysis divided the contents into several clusters and derived the eigenvector centralities of the content nodes, which indicate their importance in the network. Tourism was active when nodes with high eigenvector centrality were distributed evenly across the clusters; the opposite held when such nodes were concentrated in a few clusters. This study suggests an action plan for extending rural tourism: develop valuable contents and connect the content clusters properly.
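To make the eigenvector centrality mentioned in the abstract concrete, here is a minimal sketch that computes it by power iteration on a small undirected graph. The node names and edges are hypothetical and do not come from the study; iterating with the adjacency matrix plus the identity is a common stabilizing choice assumed here, not something the abstract specifies.

```python
def eigenvector_centrality(adjacency, iterations=100, tol=1e-9):
    """Power iteration on (A + I); scores are scaled so the maximum is 1.0."""
    nodes = list(adjacency)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbours' scores.
        updated = {
            n: score[n] + sum(score[m] for m in adjacency[n]) for n in nodes
        }
        largest = max(updated.values())
        updated = {n: v / largest for n, v in updated.items()}
        if max(abs(updated[n] - score[n]) for n in nodes) < tol:
            return updated
        score = updated
    return score


# Hypothetical tourism-content network (undirected, so edges are listed twice).
graph = {
    "hanok_village": ["bibimbap", "festival", "trail"],
    "bibimbap": ["hanok_village"],
    "festival": ["hanok_village", "trail"],
    "trail": ["hanok_village", "festival"],
}

centrality = eigenvector_centrality(graph)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

A node scores highly when it is linked to other high-scoring nodes, which is why the hub of this toy graph ranks first and the node with a single link ranks last.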

The Comparative Analysis of Inquiry Activity in Primary Science Curricular Materials of Korea and SCIIS (한국의 국민학교 자연 교과서와 SCIIS의 탐구 활동 비교 분석)

  • Kim, Jin-Yong;Chun, Wan-Ho;Hur, Myung
    • Journal of The Korean Association For Science Education / v.13 no.1 / pp.56-65 / 1993
  • The purpose of this study is to analyze the inquiry activities of SCIIS and Korean primary school science curricular materials and to make suggestions for improving inquiry learning based on the analysis. The Scientific Inquiry Evaluation Inventory (SIEI: Myung Hur, 1984) was used to evaluate the inquiry activity content of the primary school "Science, Level-6" and "SCIIS, Level-6" textbooks. The results are as follows: 1) The inquiry activities of Korean science textbooks stress gathering and organizing data, but rarely require students to formulate a hypothesis or design an experiment. 2) The SCIIS textbooks tended to put relatively more weight on interpreting/analysing data and hypothesizing/designing experiments. 3) The Korean science textbooks paid little attention to establishing hypotheses, designing experiments, and interpreting/analysing data. 4) The SCIIS textbooks require students to perform a greater variety of inquiry skills than the Korean science textbooks. 5) The Competition/Cooperation Scale checks the level of competition and cooperation among student teams inherent in science curricular materials. The result from each team is incorporated into a class result, and the communication required to formulate a synthesized class response enhances cooperation among teams. SCIIS (84%) scores higher than Korea (50%) on the cooperation scale. 6) Korean science textbooks rarely require students to discuss experiments, compared with SCIIS textbooks. 7) Korean science textbooks provide students with both inquiry problems and experimental procedures, sometimes including answers, whereas SCIIS textbooks provide either both inquiry problems and experimental procedures or problems only. 8) On the inquiry scope scale, the Korean textbooks emphasize demonstrating or verifying the content of the text, while the SCIIS textbooks emphasize extending it.
The inquiry pyramid, which helps analyze the inquiry activity curriculum as a whole, is of type 1: the course is centered on gathering and organizing data. The SCIIS textbooks are better than the Korean science textbooks in terms of the proportion of interpreting/analysing data and hypothesizing/designing experiments.


A Study on the Smart Tourism Awareness through Bigdata Analysis

  • LEE, Song-Yi;LEE, Hwan-Soo
    • The Journal of Industrial Distribution & Business / v.11 no.5 / pp.45-52 / 2020
  • Purpose: In the 4th industrial revolution, services that incorporate various smart technologies in the tourism sector have begun to gain popularity. Accordingly, academic discussions on smart tourism have also started to become active in various fields. Despite recent research, the definition of smart tourism is still ambiguous, and it is not easy to differentiate its scope or characteristics from traditional tourism concepts. Thus, this study aims to analyze the perception of smart tourism exposed online to identify the current point of smart tourism in Korea and present the research direction for conceptualizing smart tourism suitable for the domestic situation. Research design, data, and methodology: This study analyzes the perception of smart tourism exposed online based on 20,198 news data from portal sites over the past six years. Data on words used with smart tourism were collected from the leading portal sites Naver, Daum, and Google. Text mining techniques were applied to identify the social awareness status of smart tourism. Network analysis was used to visualize the results between words related to smart tourism, and CONCOR analysis was conducted to derive clusters formed by words having similarity. Results: As a result of keyword analysis, the frequency of words related to the development and construction of smart tourism areas was high. The analysis of the centrality of the connection between words showed that the frequency of keywords was similar, and that the words "smartphones" and "China" had relatively high connection centrality. The results of network analysis and CONCOR indicated that words were formed into eight groups including related technologies, promotion, globalization, service introduction, innovation, regional society, activation, and utilization guide. The overall results of data analysis showed that the development of smart tourism cities was a noticeable issue. 
Conclusions: This study is meaningful in that it clearly reflects the differences in the perception of smart tourism between online and research trends despite various efforts to develop smart tourism in Korea. In addition, this study highlights the need to understand smart tourism concepts and enhance academic discussions. It is expected that such academic discussions will contribute to improving the competitiveness of smart tourism research in Korea.

The Implementation of the Solar Inverter Monitoring System using an AJAX (AJAX를 이용한 태양광 인버터의 모니터링 시스템 구현)

  • Kwon, Hyo-Sang;Yang, Oh
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.9 / pp.1915-1922 / 2012
  • In this paper, a solar inverter is monitored using AJAX (Asynchronous JavaScript and XML). AJAX is one of the technologies that, together with DHTML (Dynamic HyperText Markup Language) and other JavaScript technologies, can be used to build an RIA (Rich Internet Application). With it, a powerful application comparable to a general application program can be built. With existing data-processing techniques, data requests and responses cannot be processed dynamically on the same page. With AJAX, an asynchronous communication method, data and operating status can be monitored in real time. Moreover, because AJAX transmits only small amounts of data without changing the page, we built a solar inverter monitoring system that handles management and monitoring efficiently, with all functions operating within one page.

Pan-Genomics of Lactobacillus plantarum Revealed Group-Specific Genomic Profiles without Habitat Association

  • Choi, Sukjung;Jin, Gwi-Deuk;Park, Jongbin;You, Inhwan;Kim, Eun Bae
    • Journal of Microbiology and Biotechnology / v.28 no.8 / pp.1352-1359 / 2018
  • Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructooligosaccharides) and that G3, G4, and G5 possessed more genes for the restriction-modification system and MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.

Compression of Medical Examination Data Based on Modified Gamma-Coding (수정된 감마 코딩 기반 의료 검진 데이터 압축)

  • Ku, Dong Youn;Park, Jae Wook;Lee, Yong Kyu
    • Journal of the Korea Society of Computer and Information / v.19 no.2 / pp.133-142 / 2014
  • With the development of medical information systems, shorter examination times per patient can increase the number of treatments, so the amount of medical data is growing rapidly. Studies on how to efficiently compress and store the medical text data of a growing number of patients are in progress. However, previous methods compress medical text data as is, which results in a low compression rate. This research tries to overcome the problem by using the gamma coding method, which enables compression at the bit level. We propose a new compression scheme that encodes the deviations between measured values and normal range values. Furthermore, we suggest using the previous value with the least deviation from the measurement as the standard value for encoding that deviation. Although the suggested methods are simple, they achieve high compression rates. Through performance evaluation, we show that the suggested methods are more efficient than the previous methods.
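The idea of gamma-coding deviations from a normal range can be sketched as follows. The abstract does not give the details of the modified scheme, so this example uses standard Elias gamma coding; the deviation rule, the mapping of signed deviations to positive integers, and the assumption of integer measurements are all illustrative choices, not the authors' method.

```python
def elias_gamma(n):
    """Standard Elias gamma code: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma coding is defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def deviation(value, low, high):
    """Signed distance from the normal range [low, high]; 0 inside it."""
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0


def to_positive(d):
    """Assumed mapping of signed deviations to integers >= 1:
    0 -> 1, -1 -> 2, 1 -> 3, -2 -> 4, 2 -> 5, ..."""
    return 2 * d + 1 if d >= 0 else -2 * d


def encode(values, low, high):
    """Concatenate the gamma codes of each value's deviation from the range."""
    return "".join(
        elias_gamma(to_positive(deviation(v, low, high))) for v in values
    )


# Hypothetical integer measurements with an assumed normal range of 70 to 100.
readings = [95, 100, 70, 102]
bits = encode(readings, low=70, high=100)
print(bits, len(bits))  # 11100101 8
```

Values inside the normal range cost a single bit each, which is where the saving comes from when most measurements are normal.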