• Title/Summary/Keyword: semantic mining

Online Clustering Algorithms for Semantic-Rich Network Trajectories

  • Roh, Gook-Pil;Hwang, Seung-Won
    • Journal of Computing Science and Engineering / v.5 no.4 / pp.346-353 / 2011
  • With the advent of ubiquitous computing, a massive amount of trajectory data has been published and shared on many websites. This type of computing also motivates online mining of trajectory data to fit user-specific preferences or context (e.g., time of day). While many trajectory clustering algorithms have been proposed, they have typically focused on offline mining and do not consider the restrictions of the underlying road network or the selection conditions that represent user contexts. In clear contrast, we study an efficient clustering algorithm for Boolean + Clustering queries using a pre-materialized and summarized data structure. Our experimental results on real-life trajectory data demonstrate the efficiency and effectiveness of the proposed method.

Development of Semantic-Based XML Mining for Intelligent Knowledge Services (지능형 지식서비스를 위한 의미기반 XML 마이닝 시스템 연구)

  • Paik, Juryon;Kim, Jinyeong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.59-62 / 2018
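  • Research on XML has grown steadily over the past five to six years, but most studies have been based on statistical models of the elements that make up XML documents. In such approaches, the semantics carried by the text, sentences, and sentence constituents within XML's inherent tree structure are not explicitly analyzed, represented, or used; instead, the occurrence of data is computed purely statistically to answer user queries, that is, to provide the corresponding information and knowledge. To suit an environment for intelligent knowledge services, information extraction should analyze the components of text and sentences and represent document content in a form that carries richer meaning than a simple set of words, so that more precise knowledge and information can be extracted. This study seeks a method that grasps even the meaning of user requests and extracts accurate and diverse knowledge from the flood of XML data. Its ultimate goal is to support a variety of user-centered services by advancing efficient mining techniques that can extract semantics from tree-structured, rather than record-structured, data.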

Systematic Review on Chatbot Techniques and Applications

  • Park, Dong-Min;Jeong, Seong-Soo;Seo, Yeong-Seok
    • Journal of Information Processing Systems / v.18 no.1 / pp.26-47 / 2022
  • Chatbots were an important research subject in the past. A chatbot is a computer program or an artificial intelligence program that participates in a conversation via auditory or textual methods. As research on chatbots progressed, some of the important issues surrounding them changed over time. Therefore, it is necessary to review the technology with a focus on recent advancements and core research technologies. In this paper, we introduce five different chatbot technologies: natural language processing, pattern matching, semantic web, data mining, and context-aware computing. We also introduce the latest technology so that chatbot researchers can recognize the present situation and steer it in the right direction.

Text Mining Analysis of the Online Counseling Contents of Nursery School Teachers (텍스트 마이닝을 활용한 어린이집교사 온라인 상담의 내용분석)

  • Jeon, Ji Won;Lim, Sun Ah;Jung, Yunhee
    • Korean Journal of Childcare and Education / v.16 no.6 / pp.253-272 / 2020
  • Objective: This study aimed to analyze the counseling contents of daycare center teachers using text mining and semantic network analysis, in order to identify the support that daycare teachers need and to improve the quality of child care. Methods: Five hundred thirteen counseling cases recorded on an open online counseling bulletin board (Naver Bands for Nursery Teacher Counseling) were collected, and frequency analysis, centrality solidarity analysis, and machine learning-based topic analysis were conducted using the NetMiner 4.3 program. Results: First, 'teacher-to-child ratio' had the highest frequency. Second, 'colleagues' ranked high in every centrality analysis. Third, machine learning-based topic analysis categorized the topics as 'childcare and education', 'working environment that supports professional development', and 'working condition'; among them, 'first-time teacher concerns' accounted for 44% of the total counseling content. Conclusion/Implications: This study implies that high-quality child care and education should be provided to infants by lowering the 'teacher-to-child ratio', and that a systematic program is needed to help improve communication skills in interpersonal relationships, such as those with parents, fellow teachers, and principals. In addition, self-development and efforts to improve teachers' expertise should be prioritized in order to improve the quality of infant care and of teachers.

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

  • Liu, Jingxin;Cheng, Jieren;Peng, Xin;Zhao, Zeli;Tang, Xiangyan;Sheng, Victor S.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1833-1848 / 2022
  • Named entity recognition (NER) is an important basic task in the field of Natural Language Processing (NLP). Recently, deep learning approaches that extract word segmentation or character features have proved effective for Chinese Named Entity Recognition (CNER). However, because these approaches extract only some of the available features, they do not mine textual information from multiple perspectives and dimensions, so the model cannot fully capture semantic features. To tackle this problem, we propose a novel Multi-view Semantic Feature Fusion Model (MSFM). The proposed model consists mainly of two core components: the Multi-view Semantic Feature Fusion Embedding Module (MFEM) and the Multi-head Self-Attention Mechanism Module (MSAM). Specifically, the MFEM extracts character features, word boundary features, radical features, and pinyin features of Chinese characters. The acquired font shape, font sound, and font meaning features are fused to enhance the semantic information of Chinese characters at different granularities. Moreover, the MSAM captures the dependencies between characters in a multi-dimensional subspace to better understand the semantic features of the context. Extensive experimental results on four benchmark datasets show that our method improves the overall performance of the CNER model.

Ontology and Text Mining-based Advanced Historical People Finding Service (온톨로지와 텍스트 마이닝 기반 지능형 역사인물 검색 서비스)

  • Jeong, Do-Heon;Hwang, Myunggwon;Cho, Minhee;Jung, Hanmin;Yoon, Soyoung;Kim, Kyungsun;Kim, Pyung
    • Journal of Internet Computing and Services / v.13 no.5 / pp.33-43 / 2012
  • The semantic web is used to build advanced information services by exploiting semantic relationships between entities. Text mining can be applied to generate semantic relationships from unstructured data resources. This study proposes an ontology schema guideline, ontology instance generation, disambiguation of identical names by text mining, and an advanced historical people finding service based on reasoning. Various relationships between historical events, organizations, and people, created by domain experts, are linked to the literature of the National Institute of Korean History (NIKH). This improves the effectiveness of user access and enables an advanced people finding service based on relationships. To distinguish between people with the same name, we compare the structure, edges, and nodes of their personal social networks. To provide additional information, external resources, including a thesaurus and the web, are also linked to all related internal resources.
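
The same-name disambiguation step described above, comparing the nodes and edges of personal social networks, can be illustrated with a minimal sketch. The abstract does not state the actual comparison measure, so the Jaccard overlap used here, the equal weighting of node and edge overlap, the function names, and the toy data are all assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def network_similarity(edges_a, edges_b):
    """Compare two personal networks given as lists of (node, node) pairs.

    Hypothetical measure: the mean of node-set overlap and edge-set overlap.
    Edges are treated as undirected.
    """
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    na = set().union(*ea)  # nodes are derived from the edges
    nb = set().union(*eb)
    return (jaccard(na, nb) + jaccard(ea, eb)) / 2


# Toy data (hypothetical): two mentions of a person named "Kim".
mention_1 = [("Kim", "Org1"), ("Kim", "EventX")]
mention_2 = [("Kim", "Org1"), ("Kim", "EventY")]
similarity = network_similarity(mention_1, mention_2)
same_person_score = network_similarity(mention_1, mention_1)
```

A higher score suggests that two mentions refer to the same person; any decision threshold would have to be tuned on labeled data.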

A Study of Consumer Perception on Fashion Show Using Big Data Analysis (빅데이터를 활용한 패션쇼에 대한 소비자 인식 연구)

  • Kim, Da Jeong;Lee, Seunghee
    • Journal of Fashion Business / v.23 no.3 / pp.85-100 / 2019
  • This study examines changes in consumer perceptions of fashion shows, which are critical elements in the apparel industry and a means to represent a brand's image and originality. For this purpose, big data in clothing marketing, text mining, and semantic network analysis techniques were applied. This study aims to verify the effectiveness and significance of fashion shows in an effort to give directions for their future utilization. The study was conducted in two major stages. First, data were collected with the keyword "fashion shows" across websites, including Naver and Daum, between 2015 and 2018. The data collection period was divided into first-half and second-half periods. Next, Textom 3.0 was used for data refinement, text mining, and word clouds. Ucinet 6.0 and NetDraw were used for semantic network analysis, degree centrality, CONCOR analysis, and visualization. The level of interest in "models" was found to be the highest among the perception factors related to fashion shows in both periods. In the first-half period, consumer interest focused on detailed visual stimulants such as models and clothing, while in the second-half period, perceptions changed as the value of designers and brands was increasingly recognized over time. The findings of this study can be used as a tool to evaluate fashion shows, apparel industry sectors, and marketing methods. Additionally, they can be used as a theoretical framework for big data analysis and as a basis for strategies and research in industrial development.
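
Degree centrality in a keyword co-occurrence network, as mentioned above, can be sketched as follows. This is a minimal illustration and not the Ucinet procedure used in the study: it assumes that two keywords are linked when they appear in the same document and that centrality is normalized by the number of other nodes. The function name and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(docs):
    """Normalized degree centrality on a keyword co-occurrence network.

    docs: list of keyword lists. Two keywords are linked if they
    co-occur in at least one document (an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    # Divide each node's degree by the maximum possible degree (n - 1).
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}


# Toy documents (hypothetical), already tokenized into keywords.
docs = [
    ["fashion show", "model", "clothing"],
    ["fashion show", "designer", "brand"],
    ["fashion show", "model", "brand"],
]
centrality = degree_centrality(docs)
```

In this toy network the search keyword itself co-occurs with every other keyword, so it receives the maximum centrality of 1.0; studies of this kind often interpret the remaining keywords instead.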

Social perception of the Arduino lecture as seen in big data (빅데이터 분석을 통한 아두이노 강의에 대한 사회적 인식)

  • Lee, Eunsang
    • Journal of The Korean Association of Information Education / v.25 no.6 / pp.935-945 / 2021
  • The purpose of this study is to analyze the social perception of Arduino lectures using big data analysis. For this purpose, data from January 2012 to May 2021 were collected through the Textom website by searching for the keyword 'arduino + lecture' in blogs, cafes, and news channels of the NAVER website. The collected data were refined using the Textom website, and text mining analysis and semantic network analysis were performed using the Textom website, Ucinet 6, and NetDraw. Text mining analysis, including frequency analysis, TF-IDF analysis, and degree centrality, confirmed that 'education' and 'coding' were the top keywords. CONCOR analysis for the semantic network analysis identified four clusters: 'Arduino-related education', 'Physical computing-related lecture', 'Arduino special lecture', and 'GUI programming'. Through this study, it was possible to confirm various meaningful social perceptions of the general public regarding Arduino lectures on the Internet. The results of this study will be used as data that provide meaningful implications for instructors preparing Arduino lectures, researchers studying the subject, and policy makers who establish software education or coding education and related policies.

Big Data Analysis on the Perception of Home Training According to the Implementation of COVID-19 Social Distancing

  • Hyun-Chang Keum;Kyung-Won Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.211-218 / 2023
  • Due to the implementation of COVID-19 distancing, interest in and the number of users of 'home training' are rapidly increasing. Therefore, the purpose of this study is to identify the perception of 'home training' through big data analysis of social media channels and to provide basic data to the related business sector. Big data were collected from various news and social content provided on the Naver and Google sites. Data for three years from March 22, 2020 were collected, based on the time when COVID-19 distancing was implemented in Korea. The collected data included 4,000 Naver blogs, 2,673 news, 4,000 cafes, 3,989 knowledge IN, and 953 Google channel news. TF and TF-IDF were analyzed through text mining, and semantic network analysis was then conducted on 70 keywords. Big data analysis programs such as Textom and Ucinet were used for social big data analysis, and NetDraw was used for visualization. In the text mining analysis, 'home training' had the highest TF, appearing 4,045 times, followed in order by 'exercise', 'Homt', 'house', 'apparatus', 'recommendation', and 'diet'. Regarding TF-IDF, the main keywords were 'exercise', 'apparatus', 'home', 'house', 'diet', 'recommendation', and 'mat'. Based on these results, 70 high-frequency keywords were extracted, and semantic indicators and centrality analysis were then conducted. Finally, CONCOR analysis clustered the keywords into a 'purchase cluster', 'equipment cluster', 'diet cluster', and 'execute method cluster'. For these four clusters, basic data on the 'home training' business sector were presented based on consumers' main perception of 'home training' and analysis of the semantic network.
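
The difference between the TF and TF-IDF rankings reported above can be illustrated with a short sketch. The abstract does not give the exact formula used by Textom, so this assumes a common variant (corpus-wide term count multiplied by the natural log of the number of documents divided by the document frequency). The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_and_tfidf(docs):
    """Corpus-level TF and TF-IDF for tokenized documents.

    Assumed variant: tfidf(t) = tf(t) * ln(N / df(t)), where tf is the
    total count of t in the corpus and df the number of documents
    containing t.
    """
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each term
    df = Counter()  # number of documents containing each term
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    tfidf = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return dict(tf), tfidf


# Toy documents (hypothetical), already tokenized into keywords.
docs = [
    ["home", "training", "exercise", "mat"],
    ["home", "training", "diet"],
    ["home", "training", "exercise", "apparatus"],
]
tf, tfidf = tf_and_tfidf(docs)
```

Terms that occur in every document, such as the search keyword itself, get a TF-IDF of zero under this variant, which is one reason the top TF keyword need not lead the TF-IDF ranking.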

A Study on the User Experience at Unmanned Cafe Using Big Data Analysis: Focus on text mining and semantic network analysis (빅데이터를 활용한 무인카페 소비자 인식에 관한 연구: 텍스트 마이닝과 의미연결망 분석을 중심으로)

  • Seung-Yeop Lee;Byeong-Hyeon Park;Jang-Hyeon Nam
    • Asia-Pacific Journal of Business / v.14 no.3 / pp.241-250 / 2023
  • Purpose - The purpose of this study was to investigate the perception of 'unmanned cafes' on the network through big data analysis, and to identify the latest trends in rapidly changing consumer perception. Based on this, we suggest that the results can be used as basic data for the revitalization of unmanned cafes and for differentiated marketing strategies. Design/methodology/approach - This study collected documents containing unmanned cafe keywords over about three years, and the data collected using text mining techniques were analyzed using methods such as keyword frequency analysis, centrality analysis, and keyword network analysis. Findings - First, the top 10 words with a high frequency of appearance were identified in the order of unmanned cafes, unmanned cafes, start-up, operation, coffee, time, coffee machine, franchise, and robot cafes. Second, visualization of the semantic network confirmed that the key keyword "unmanned cafe" was at the center of the keyword cluster. Research implications or Originality - Using big data to collect and analyze keywords with high web visibility, we tried to identify new issues and trends in the perception of unmanned cafes. The keywords are mainly related to start-ups, indicating that mentions of unmanned cafes on the network mainly concern start-up topics.