• Title/Abstract/Keywords: Text data

Search Result 2,959, Processing Time 0.029 seconds

An Analysis of School Life Sensibility of Students at Korea National College of Agriculture and Fisheries Using Unstructured Data Mining(1) (비정형 데이터 마이닝을 활용한 한국농수산대학 재학생의 학교생활 감성 분석(1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Song, C.Y.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.21 no.1 / pp.99-114 / 2019
  • In this study, we examined the preferences of students at Korea National College of Agriculture and Fisheries (KNCAF) regarding eight college life factors. The unstructured data were analyzed with opinion mining and text mining techniques, and the text mining results were visualized as word clouds. The college life factors comprised eight topics closely related to students: 'my present', 'my 10 years later', 'friendship', 'college festival', 'student restaurant', 'college dormitory', 'KNCAF', and 'long-term field practice'. We built a dictionary of positive and negative words and used it to classify the emotions in the text submitted by the students as positive or negative, thereby evaluating their preferences. As a result, KNCAF students showed more than 85% positive emotions about the topics 'student restaurant' and 'friendship'. However, positive feelings about 'long-term field practice' and 'college dormitory' were the lowest, not exceeding 60%. The remaining topics showed satisfaction of 69.3~74.2%. By gender, male students showed higher positive emotions on the topics 'my present', 'my 10 years later', 'friendship', 'college dormitory', and 'long-term field practice', while female students showed higher positive emotions on 'college festival', 'student restaurant', and 'KNCAF'. In addition, the main positive and negative words were extracted with text mining, and word clouds were created to visualize the results.
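The dictionary-based classification described in this abstract can be illustrated with a minimal Python sketch. The word lists, the English tokenizer, and the function names `classify` and `positive_rate` are assumptions made for illustration; the study built its own dictionary and its exact scoring rule is not given in the abstract.

```python
import re

# Hypothetical mini-dictionaries; the study compiled its own word lists.
POSITIVE = {"good", "delicious", "fun", "kind", "happy"}
NEGATIVE = {"bad", "noisy", "tired", "hard", "old"}


def classify(text):
    """Label a response by comparing counts of positive and negative dictionary words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def positive_rate(responses):
    """Share of positive responses among those labelled positive or negative."""
    labels = [classify(r) for r in responses]
    pos = labels.count("positive")
    neg = labels.count("negative")
    return pos / (pos + neg) if pos + neg else 0.0
```

Calling `positive_rate` on the responses for one topic gives a per-topic figure comparable in spirit to the percentages reported above, under the assumption that neutral responses are excluded from the denominator.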

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results from a variety of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to take a description of a person and generate a montage image. While existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper trains a model on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text input specific to Korean. Since the developed montage generation app can be used to produce draft montages, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we used the persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the artificial intelligence training data necessary for the development of AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and the KoDALLE model, a Korean-based image generation model. The trained AI model was confirmed to create a montage image of a face very similar to the one described by voice and text. To verify the practicality of the developed montage generation app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal detection, to describe facial features and render them as images.

The Design and Implementation of Two-Way Search Algorithm using Mobile Instant Messenger (모바일 인스턴스 메신저를 이용한 양방향 검색 알고리즘의 설계 및 구현)

  • Lee, Daesik;Jang, Chungryong;Lee, Yongkwon
    • Journal of Korea Society of Digital Industry and Information Management / v.11 no.2 / pp.55-66 / 2015
  • In this paper, we design and implement a two-way search algorithm that provides customized service through real-time two-way communication with users over a mobile instant messaging service. To this end, we design and implement an automated search system that delivers messages to each user's mobile terminal from a plurality of relay mobile terminals via the mobile instant messenger, rather than delivering messages directly from the main server to the mobile instant messenger user. Because the two-way search system can collect user responses immediately, it is easy to identify each user's preferences, which helps establish a differentiated service plan. It also provides a range of content (text, photos, videos, etc.) to the user in real time through the mobile instant messenger service, without the need to install a separate application. In the experiments, in which a user mobile terminal searched data on the DB server, the category processing method had a data processing speed of about 7.06 sec and handled about 13 data processing operations per minute. The instruction processing method had a data processing speed of about 3.10 sec and handled about 10 operations per minute. The natural language processing method had a data processing speed of about 5.13 sec and handled about 7 operations per minute. Therefore, among the category, instruction, and natural language processing methods, the instruction processing method is the best in terms of data processing speed, whereas the category processing method is the best in terms of the number of data processing operations per minute.

Image Features Based Secure Access Control for Data Content Protection (데이터 내용 보호를 위한 이미지 특징 기반의 보안 접근 제어)

  • Ha, Sunju;Park, Daechul
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.1 / pp.171-180 / 2013
  • Data security is always an important issue, and the emergence of cloud computing systems inevitably raises it again. Simple methods are no longer sufficient to keep data safe; more advanced methods are required. In this paper, instead of the existing text-based cryptography approach, we study image-based access control of data content to provide a higher level of data security. A color key chain is generated from the histogram values of the original image together with the location information and feature color information extracted by geometric transformation, and it forms the security key used to access secure data content. Finally, the paper addresses the interface design and implementation of data content access control for evaluating the proposed scheme.

Informal Quality Data Analysis via Sentimental analysis and Word2vec method (감성분석과 Word2vec을 이용한 비정형 품질 데이터 분석)

  • Lee, Chinuk;Yoo, Kook Hyun;Mun, Byeong Min;Bae, Suk Joo
    • Journal of Korean Society for Quality Management / v.45 no.1 / pp.117-128 / 2017
  • Purpose: This study analyzes automobile quality review data to develop an alternative method for analyzing informal data. Existing methods are based mainly on the frequency of informal data; this research instead tries to use the correlation information among informal data. Method: After sentiment analysis to acquire user information on automobile products, three classification methods (naïve Bayes, random forest, and support vector machine) were employed to accurately classify informal user opinions with respect to automobile quality. Additionally, Word2vec was applied to discover correlated information in the informal data. Result: Of the three classification methods, random forest showed the most effective results. Word2vec managed to discover the data most closely related to automobile components. Conclusion: The proposed method is effective in terms of accuracy and sensitivity in analyzing informal quality data; however, only two sentiments (positive or negative) can be categorized due to human errors. Further studies are required to derive more sentiments to accurately classify informal quality data. The Word2vec method also shows comparable results in discovering the relevance of components precisely.

A Production Traceability Information Gathering System based on Handwritten Data Digitalization Technology in Agro-livestock Products (수기정보 전자화 기술 기반의 농축산물 생산이력정보 수집 시스템)

  • Son, Bong-Ki
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.10 / pp.4632-4641 / 2011
  • Detailed production traceability information is a fundamental element in the successful introduction and revitalization of a traceability system. In this paper, we propose a production traceability information gathering system for agro-livestock products based on handwritten data digitalization technology. With the proposed system, detailed production traceability information can be gathered effectively simply by writing in a paper management ledger with a digital pen. The server of the system generates a digital image identical to the ledger and converts the handwritten data into digital text to insert it into the database. Because the system is superior to data gathering systems based on PCs, PDAs, and touch screens in mobility, usability, data input speed, and suitability for the agro-livestock environment, users can effectively gather high-quality traceability information even if they have low information literacy and insufficient time to input data. We expect that handwritten data digitalization technology can be used to gather document-based information in the manufacturing, distribution, and marketing stages. In addition, this technology can be applied to implementing advanced traceability systems together with RFID/USN-based systems.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.521-526 / 2016
  • Because online informal text data are massive in volume and unstructured in nature, there are limitations in applying traditional relational data model technologies to data storage and data analysis jobs. Moreover, it is hard to analyze social users' real-time reactions using dynamically generated massive social data. In this paper, to easily capture the semantics of massive and informal online documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system based on the collection of words that make up a document. The input data set for the proposed system is first generated using an N-gram algorithm, which builds multi-word units to capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiments, TB-level input data are preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well in extracting meaningful topics in a timely manner, because intermediate results come directly from main memory instead of being read from an HDD.
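The N-gram step that prepares multi-word input for the topic model can be sketched as follows. This is a single-machine illustration in plain Python, not the Hadoop/Spark pipeline the paper describes; whitespace tokenization and the function names `ngrams` and `ngram_counts` are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(documents, n=2):
    """Count n-grams over lowercased, whitespace-tokenized documents."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc.lower().split(), n))
    return counts
```

In a distributed setting the per-document `ngrams` call would typically be the map step and the counting the reduce step, but how the paper partitions this work is not stated in the abstract.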

A Study of Path-based Retrieval for JSON Data Using Suffix Arrays (접미사 배열을 이용한 JSON 데이터의 경로 기반 검색에 대한 연구)

  • Kim, Sung Wan
    • Journal of Creative Information Culture / v.7 no.3 / pp.157-165 / 2021
  • As the use of application services built on the Web and IoT expands, and with it the need to manage large amounts of data, efficient data representation and exchange schemes and data query processing are becoming increasingly important. JSON, characterized by its simplicity, is being used in various fields as a format for data exchange and data storage instead of XML, which is a standard data representation and exchange language on the Web. It is therefore important to develop indexing and query processing techniques to effectively access and search large amounts of data expressed in JSON. In this paper, we model hierarchically structured JSON data as a tree and propose indexing and query processing based on the path concept. In particular, we design an index structure using a suffix array, which is widely used in text search, and introduce simple and complex path-based JSON data query processing methods.
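One way to picture path-based retrieval with a suffix-array-style index is the sketch below. It is a simplified illustration, not the index structure proposed in the paper: each leaf's root-to-leaf path is split into labels, every suffix of that label sequence is stored in one sorted list, and a binary search locates the suffixes that begin with the query. The function names `flatten`, `build_index`, and `search` are hypothetical.

```python
import bisect


def flatten(node, prefix=()):
    """Yield (path, value) for each leaf; path is a tuple of keys/indices as strings."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, prefix + (key,))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from flatten(child, prefix + (str(i),))
    else:
        yield prefix, node


def build_index(doc):
    """Return the leaf list and sorted (suffix, leaf_id) entries over path labels."""
    leaves = list(flatten(doc))
    entries = sorted(
        (path[start:], leaf_id)
        for leaf_id, (path, _) in enumerate(leaves)
        for start in range(len(path))
    )
    return leaves, entries


def search(index, query):
    """Find leaves whose path contains `query` (a tuple of labels) as a contiguous sub-path."""
    leaves, entries = index
    # (query,) sorts before every (query + ..., leaf_id) entry, so this is the first candidate.
    lo = bisect.bisect_left(entries, (query,))
    hits = set()
    for suffix, leaf_id in entries[lo:]:
        if suffix[:len(query)] != query:
            break
        hits.add(leaf_id)
    return [leaves[i] for i in sorted(hits)]
```

For example, with `index = build_index({"store": {"book": [{"title": "A"}, {"title": "B"}]}})`, the call `search(index, ("title",))` returns both title leaves with their full paths. Storing suffixes explicitly costs more memory than an integer-offset suffix array, which is a trade-off accepted here for readability.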

Visualizing Article Material using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 논문 데이터 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.326-327 / 2021
  • Recently, big data utilization has attracted wide interest across a variety of industrial fields. Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data stored in data stores and creating new value. Most big data analysis methods therefore include data mining, machine learning, natural language processing, and pattern recognition, which are used in existing statistics and computer science. Using the R language, a big data tool, analysis results can be expressed through various visualization functions applied to pre-processed text data. The data used in this study were drawn from 29 papers in a specific journal. In the final analysis results, the most frequently mentioned keyword was "Research", which ranked first with 743 occurrences. Based on the results of the analysis, the limitations of the study and theoretical implications are suggested.


Analysis of Research Trends in Data Curation Using Text Mining Techniques (텍스트 마이닝을 활용한 국외 데이터 큐레이션 연구 동향 분석)

  • Jaeeun Choi
    • Journal of the Korean Society for Information Management / v.41 no.3 / pp.85-107 / 2024
  • This study analyzes trends in data curation research. A total of 1,849 scholarly records were extracted from Scopus and WoS, with 1,797 papers selected after removing duplicates. Titles, keywords, and abstracts were analyzed through keyword frequency analysis, LDA topic modeling, and network analysis. Frequent keywords like 'research' and 'information' suggest that data curation is widely applied in medical research, biomedical research, data management, and infrastructure. LDA modeling identified five main topics: improving medical data quality, enhancing big data management, managing scientific data and repositories, annotating and modeling medical data, and gene/protein database research. Network analysis showed that 'analysis' was central in global discussions, while 'gene' and 'system' were locally central. These findings highlight the importance of data curation in various research areas.
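Keyword frequency analysis and a basic network measure of the kind mentioned in this abstract can be sketched in plain Python. Degree centrality on a keyword co-occurrence network is used here as one common choice; the abstract does not say which centrality measures the study applied, and the function names and input shape (one keyword list per record) are assumptions.

```python
from collections import Counter
from itertools import combinations


def keyword_frequency(records):
    """Count in how many records each keyword appears."""
    counts = Counter()
    for keywords in records:
        counts.update(set(keywords))
    return counts


def degree_centrality(records):
    """Degree centrality in the keyword co-occurrence network.

    Two keywords are linked if they appear in the same record; a keyword's
    score is its number of distinct neighbours divided by (n - 1).
    """
    neighbours = {}
    for keywords in records:
        unique = set(keywords)
        for kw in unique:
            neighbours.setdefault(kw, set())
        for a, b in combinations(sorted(unique), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {kw: 0.0 for kw in neighbours}
    return {kw: len(nbrs) / (n - 1) for kw, nbrs in neighbours.items()}
```

A keyword that co-occurs with every other keyword scores 1.0, which is one way a term such as 'analysis' could come out as central across the whole network.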