• Title/Summary/Keyword: online mining (온라인 마이닝)


Development and Application of a Big Data Platform for Education Longitudinal Study Analysis (교육종단연구 분석을 위한 빅데이터 플랫폼 개발 및 적용)

  • Park, Jung;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.1 / pp.11-27 / 2020
  • In this paper, we developed a big data platform to effectively store, process, and analyze education longitudinal study data, and applied it to the Seoul Education Longitudinal Study (SELS) to confirm its usefulness. The developed platform consists of a data preprocessing unit and a data analysis unit. The data preprocessing unit performs 1) masking, 2) conversion of each item into a factor, 3) normalization and dummy-variable creation, 4) data derivation, and 5) data warehousing. The data analysis unit consists of OLAP and data mining (DM). In the multidimensional analysis, OLAP is performed after selecting a measure and designing a schema. The DM process involves variable selection, research model selection, data modification, parameter tuning, model training, model evaluation, and interpretation of the results. The data warehouse created through preprocessing on this platform can be shared by various researchers, and the continuous accumulation of data sets makes further analysis easier for subsequent researchers. In addition, policy-makers can access the SELS data warehouse directly and analyze it online through multidimensional analysis, enabling scientific decision making. To demonstrate the usefulness of the developed platform, SELS data was loaded onto the platform, OLAP and DM were performed with mathematics academic achievement selected as the measure, and various factors affecting that measure were analyzed using DM techniques. This enabled us to quickly and effectively derive implications for data-based education policies.
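As a rough illustration of preprocessing step 3 (normalization and dummy-variable creation), the following is a minimal sketch in plain Python. The abstract does not say which normalization the platform uses, so min-max scaling is an assumption here, and the function names and sample values are hypothetical rather than taken from SELS.

```python
def min_max_normalize(values):
    """Scale numeric values to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dummy_encode(values):
    """Turn a categorical column into 0/1 dummy columns, one per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical sample records, not actual SELS data.
math_scores = [40, 70, 100]
school_type = ["public", "private", "public"]

normalized = min_max_normalize(math_scores)
levels, dummies = dummy_encode(school_type)
```

Here `normalized` holds the rescaled scores and `dummies` holds one row of indicator values per record, with columns ordered as in `levels`.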

Analysis of Use Behavior of Urban Park Users Expressing Depression on Social Media Using Text Mining Technique (텍스트 마이닝 기법을 활용한 SNS 상에서 우울감을 언급한 도시공원 이용자의 이용행태 분석)

  • Oh, Jiyeon;Nam, Seongwoo;Lee, Peter Sang-Hoon
    • The Journal of the Korea Contents Association / v.22 no.6 / pp.319-328 / 2022
  • The purpose of this study was to investigate the relationship between depression due to the COVID-19 pandemic and park use behaviors using online posts. Text data containing both 'park' and 'depression', posted during the period of pandemic prevention activities, were collected from blogs and cafes on the Naver and Daum search engines and then analyzed using text mining and social network techniques. The main usage behaviors of park users who mentioned depression were 'look', 'stroll (walk)', and 'eat'. Other types of behaviors were connected around 'look', one of the communication behaviors. In the CONCOR analysis, communication behaviors and dynamic behaviors formed a single cluster, suggesting that park users with depression perceived the park as a space for communication and physical activity. Because the spread of COVID-19 restricted communication activities, users may have regarded parks as one of the solutions. In addition, passive usage behaviors appear to have prevailed over active ones because of depression. The results should be useful for planning urban parks that help citizens. Further analysis is needed of park use behavior before and after the COVID-19 pandemic and among users with and without depression.

Analysis of trends in domestic research on addiction using text mining and CONCOR (텍스트마이닝과 CONCOR을 활용한 중독 관련 국내 연구 동향 분석)

  • Sol-Ji Lee;Ki-Hyok Youn
    • Journal of Internet of Things and Convergence / v.9 no.6 / pp.99-110 / 2023
  • This study analyzed 817 articles published in Korean professional journals over three years, from 2020 to 2022, using text mining techniques to identify trends in addiction research in Korea and explore directions for its development. The results are as follows. First, in the analysis of top keywords, online addiction topics such as smartphones, games, the Internet, gambling, and relationship addiction were prominent. Second, the TF-IDF analysis showed that many studies related to behavioral addiction, such as smartphone, game, Internet, and work addiction, were conducted over the three years; in particular, there were many studies on problems such as smartphone, game, and Internet addiction that have not yet been clinically diagnosed as addiction. This is consistent with the word frequency analysis and can be interpreted to mean that recent studies have addressed notably more diverse addiction problems. Third, the 2-gram analysis showed that words corresponding mainly to behavioral addiction, such as smartphones, games, and the Internet, appear side by side with the keyword addiction, and among them, words paired with smartphones are mentioned frequently in research papers. Fourth, the CONCOR analysis yielded five clusters: studies on universal addiction issues such as alcohol use disorders and the Internet, studies on recovery from drug and gambling addiction, studies on mobile device and media addiction, studies on the latest trends related to behavioral addiction, and other addiction issues. Finally, based on these results, a direction for future addiction-related research is suggested.
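The TF-IDF and 2-gram steps named in this abstract can be sketched in plain Python as follows. This uses one common TF-IDF formulation (relative term frequency times log(N/df)), which is an assumption; the abstract does not state which variant or tool the authors used. The function names and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs (2-grams) in order of appearance."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Score each term per document as (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized titles, not data from the study.
docs = [["smartphone", "addiction"], ["game", "addiction"]]
scores = tf_idf(docs)
pairs = bigrams(["smartphone", "addiction", "study"])
```

Under this formulation a term that occurs in every document, such as "addiction" in the toy input, receives a score of zero, so the terms that distinguish individual documents stand out.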

XML Document Analysis based on Similarity (유사성 기반 XML 문서 분석 기법)

  • Lee, Jung-Won;Lee, Ki-Ho
    • Journal of KIISE:Software and Applications / v.29 no.6 / pp.367-376 / 2002
  • XML allows users to define elements using arbitrary words and organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval and document management. In this paper, we propose a new methodology for computing similarity that considers XML semantics, that is, the meanings of elements and the nested structures of XML documents. Using a thesaurus, we generate extended-element vectors that normalize synonyms, compound words, and abbreviations, build a similarity matrix from them, and then compute the similarity between XML elements. We also discover and minimize XML structures using automata (NFA, nondeterministic finite automata, and DFA, deterministic finite automata). We compute the similarity between XML structures using the similarity matrix between elements and the minimized XML structures. Our methodology, which considers XML semantics, shows 100% accuracy in identifying the category of real documents from an online bookstore.

A Korean Product Review Analysis System Using a Semi-Automatically Constructed Semantic Dictionary (반자동으로 구축된 의미 사전을 이용한 한국어 상품평 분석 시스템)

  • Myung, Jae-Seok;Lee, Dong-Joo;Lee, Sang-Goo
    • Journal of KIISE:Software and Applications / v.35 no.6 / pp.392-403 / 2008
  • User reviews are valuable information that can be used for various purposes. In particular, product reviews on online shopping sites are important because they can directly affect customers' purchasing decisions. In this paper, we present the design and implementation of a system that summarizes customers' opinions and the features of each product by analyzing reviews on a commercial shopping site. The analysis uses several natural language processing (NLP) techniques and a semantic dictionary. The semantic dictionary contains the vocabulary used to express product features and customers' opinions, and it was constructed semi-automatically with the help of a tool we implemented. Furthermore, we discuss how to handle vocabulary whose meaning differs according to context. We analyzed 1,796 reviews of 20 products in 2 categories collected from an actual shopping site and implemented a novel ranking system. We obtained 88.94% precision and 47.92% recall in extracting opinion expressions, which suggests that our system is applicable to real use.

A Customer Profile Model for Collaborative Recommendation in e-Commerce (전자상거래에서의 협업 추천을 위한 고객 프로필 모델)

  • Lee, Seok-Kee;Jo, Hyeon;Chun, Sung-Yong
    • The Journal of the Korea Contents Association / v.11 no.5 / pp.67-74 / 2011
  • Collaborative recommendation is one of the most widely used methods of automated product recommendation in e-commerce. For analyzing customers' preferences, traditional explicit ratings are less desirable than implicit ratings because they may impose an additional burden on the customers of e-commerce companies that deal with many products. Cardinal scales, generally used to represent preference intensity, are also ineffective owing to their increasing estimation errors. In this paper, we propose a new way of constructing an ordinal scale-based customer profile for collaborative recommendation. A Web usage mining technique and lexicographic consensus are employed. An experiment shows that the proposed method performs better than existing CF methodologies.

Analysis of User Requirements Prioritization Using Text Mining : Focused on Online Game (텍스트마이닝을 활용한 사용자 요구사항 우선순위 도출 방법론 : 온라인 게임을 중심으로)

  • Jeong, Mi Yeon;Heo, Sun-Woo;Baek, Dong Hyun
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.3 / pp.112-121 / 2020
  • As Internet usage increases, the volume of text data it generates is also increasing. Because this text data includes users' comments, it can help capture users' opinions more efficiently and effectively. Text mining has been actively studied recently, but the work focuses primarily on either content analysis or techniques for improving the performance of the mining algorithms themselves. The objective of this study is to propose a novel method of analyzing users' requirements using text mining. To complement existing survey techniques, this study seeks to extract customer requirements efficiently from text data and to present their priorities. Specifically, it seeks to identify users' requirements, derive the priorities of those requirements, and identify the detailed causes of high-priority requirements. The implications of this study are as follows. First, it attempts to overcome the limitations of traditional investigations such as surveys and VOCs through text mining of online text data. Second, decision makers can derive and prioritize users' requirements without having to analyze large volumes of text data manually. Third, user priorities can be derived on a quantitative basis.

An Analysis of Online Black Market: Using Data Mining and Social Network Analysis (온라인 해킹 불법 시장 분석: 데이터 마이닝과 소셜 네트워크 분석 활용)

  • Kim, Minsu;Kim, Hee-Woong
    • The Journal of Information Systems / v.29 no.2 / pp.221-242 / 2020
  • Purpose: This study collects and analyzes data from the recently active online black market to present a specific method for preparing for hacking attacks. It aims to help individuals and businesses stay safe from cyber attacks, including hacking, by closely analyzing hacking methods and tools in a situation where they are easily shared. Design/methodology/approach: To prepare for hacking attacks that originate in the online black market, this study uses routine activity theory to identify the opportunity factors of a hacking attack. On that basis, text mining and social network techniques are applied to reveal the most vulnerable areas of security. Suitable targets, in the sense of routine activity theory, are identified through text mining, and motivated offenders through social network analysis. Lastly, the absence of guardians and the areas where guardians are required are extracted using both analysis techniques together. Findings: Text mining showed a large supply of hacked gift cards and very high demand for attacks on sites such as Amazon and Netflix. In addition, both demand for and supply of accounts and combos were high. Social network analysis identified users who actively share hacking information and tools. When the two analyses were combined, it was found that specialized managers are required in the areas of proxy and maker, that many managers are required for the buyer network, and that skilled managers are required for the seller network.

Implementation of Customer Behavior Evaluation System Using Real-time Web Log Stream Data (실시간 웹로그 스트림데이터를 이용한 고객행동평가시스템 구현)

  • Lee, Hanjoo;Park, Hongkyu;Lee, Wonsuk
    • The Journal of Korean Institute of Information Technology / v.16 no.12 / pp.1-11 / 2018
  • As the online shopping market continues to grow rapidly, it is important to provide customized services based on customer behavior evaluation and analysis. Existing systems provide only analysis data on the profiles and behaviors of consumers, and their real-time processing is limited because they rely on disk-based mining. Applying them to web services that require real-time processing and analysis raises problems of accuracy and system performance. The system proposed in this paper therefore analyzes web click log streams generated in real time to calculate customers' concentration level on specific products, finds interested customers who are likely to purchase those products, and provides intensive promotions to them. We also verify the efficiency and accuracy of the proposed system.

Keyword Analysis of Data Technology Using Big Data Technique (빅데이터 기법을 활용한 Data Technology의 키워드 분석)

  • Park, Sung-Uk
    • Journal of Korea Technology Innovation Society / v.22 no.2 / pp.265-281 / 2019
  • With the advent of the Internet-based economy, dramatic changes in consumption patterns have been witnessed over the last decades. This seminal change has been led by Data Technology, the integrated platform of mobile, online, offline, and artificial intelligence, which has remained unchallenged. In this paper, I use a data analysis tool (TexTom) to articulate a definitive notion of Data Technology from Internet sources. The data were collected from Google and Naver over three years (November 2015 to November 2018), and several key keywords related to 'Data Technology' were derived. As a result, the key technologies of Big Data, O2O (Offline-to-Online), AI, IoT (Internet of Things), and cloud computing were found to be related to Data Technology. The results of this study can serve as useful reference information when the Data Technology age comes.