• Title/Summary/Keyword: 의미 연관성 기반 추출 (semantic relevance-based extraction)

Development of Extracting System for Meaning·Subject Related Social Topic using Deep Learning (딥러닝을 통한 의미·주제 연관성 기반의 소셜 토픽 추출 시스템 개발)

  • Cho, Eunsook;Min, Soyeon;Kim, Sehoon;Kim, Bonggil
    • Journal of Korea Society of Digital Industry and Information Management / v.14 no.4 / pp.35-45 / 2018
  • Users share a great deal of content, such as text, images, and video, on SNS. Social media content carries various kinds of information, such as personal interests, opinions, and relationships. Therefore, many recommendation and search systems are being developed through the analysis of social media content. To extract subject-related topics from the social context collected from social media channels when developing such systems, it is necessary to develop ontologies for semantic analysis. However, it is difficult to develop a formal ontology because social media content is informal data. Therefore, we develop a social topic extraction system based on semantic and subject correlation. First, a social topic extraction system based on semantic relationships analyzes semantic correlation and then extracts topics that express the semantic information of the corresponding social context. Because the possibility of developing a formal ontology that fully expresses the semantic information of various areas is limited, we develop a self-extensible ontology architecture for semantic correlation. Next, a classifier of social content and feedback groups content and feedback on the same subject so that social topics can be extracted according to semantic correlation. The analysis of social content and feedback yields subject keywords, which are indexed by measuring the degree of association based on the semantic correlation of social topics. Deep learning is applied to the indexing process to improve the accuracy and performance of subject extraction and of the mapping analysis of semantic correlation. We expect the proposed system to provide customized content for users as well as optimized search results because it analyzes semantic and subject correlation.

Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

  • Kim, Jin-Hwa;Nam, Ki-Chan;Lee, Sang-Jong
    • Information Systems Review / v.10 no.2 / pp.137-158 / 2008
  • The rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have various demands for new products and services. Therefore, their power and influence on the markets grow stronger each year. Companies have paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on a customer's private information or in-store purchasing behavior, are an important asset to most companies. CRM is one of the important business processes in which reliable information is mined from customer databases. Data mining techniques such as artificial intelligence are popular tools for extracting useful information and knowledge from these customer databases. In this research, we propose a recommendation system that predicts a customer's purchase intention. The customer's intention to purchase a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the suggested method is compared with that of other data mining technologies.

A Study on the Expansion of Workflow for the Collection of Surface Web-based OSINT(Open Source Intelligence) (표면 웹기반 공개정보 수집을 위한 워크플로우 확장 연구)

  • Lee, SuGyeong;Choi, Eunjung;Kim, Jiyeon;Lee, Insoo;Lee, Seunghoon;Kim, Myuhngjoo
    • Journal of Digital Convergence / v.20 no.4 / pp.367-376 / 2022
  • In traditional criminal cases, information collection is limited because information on the subject of an investigation is drawn only from personal information held by national legal authorities. Surface web-based OSINT (Open Source Intelligence), including SNS and portal sites that can be searched by general search engines, can be used for meaningful profiling in criminal investigations. The Korean-style OSINT workflow can profile effectively based on OSINT, but in the case of individuals, the OSINT that can be collected is limited because the workflow begins with a "name", and its reliability is limited, for example because information on other persons with the same name is also collected. To overcome these limitations, this paper defines information related to an individual, i.e., equivalent information, and enables efficient and accurate information collection based on it. We therefore present an improved workflow that can extract information related to a specific person, i.e., equivalent information, from OSINT. For this purpose, different workflows are presented according to the person's profile. This makes effective profiling of an individual possible, thereby increasing the reliability of the investigation information collected. Building on this study, a future system that automates the analysis of the collected information using artificial intelligence technology could lay the foundation for the use of OSINT in criminal investigations and contribute to the diversification of investigation methods.

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method of integrating the national R&D data and helping users navigate the integrated data through a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data is integrated with the research project at its center, i.e., other R&D data such as research papers, patents, and reports are connected to the research project as its outputs. The lightweight ontology is used to represent the simple relationships between the integrated data, such as project-output relationships, document-author relationships, and document-topic relationships. The knowledge map enables us to infer further relationships such as co-author and co-topic relationships. To extract the relationships between the integrated data, a Relational Data-to-Triples transformer is implemented. In addition, a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: one is used in the area of knowledge management to store, manage, and process an organization's data as knowledge; the other is used for analyzing and representing knowledge extracted from science & technology documents. This research focuses on the latter. In this research, a knowledge map service is introduced for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), which are two major repositories and services for national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map. Using the lightweight ontology enables us to represent and process knowledge as a simple network, which fits the knowledge navigation and visualization characteristics of the knowledge map. The lightweight ontology is used to represent the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected by the national R&D data through author relationships and performer relationships. A knowledge map for displaying the researchers' network is created from the co-authoring relationships of the national R&D documents and the co-participation relationships of the national R&D projects. To sum up, a knowledge map service system based on topic modeling and ontology is introduced for processing knowledge about national R&D data such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system has three goals: 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based information search on the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. S&T information such as research papers, research reports, patents, and GTB is updated daily from NDSL, and R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are obtained and merged into an integrated database. The knowledge base is constructed by transforming the relational data into triples that reference the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keywords representing them. The topic modeling approach enables us to extract the relationships and topic keywords based on semantics rather than on simple keyword matching. Lastly, we present an experiment on the construction of the integrated knowledge base using the lightweight ontology and topic modeling, and we introduce the knowledge map services built on that knowledge base.
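
The transformation and inference steps described in this abstract can be illustrated with a minimal sketch. It is not the paper's Relational Data-to-Triples transformer: the row layout, the predicate names (`hasAuthor`, `hasOutput`, `coAuthorOf`), and the sample data are all hypothetical, and a plain Python set stands in for a triple store.

```python
from itertools import combinations

# Hypothetical relational rows: (document_id, author, project_id).
ROWS = [
    ("paper1", "Kim", "projA"),
    ("paper1", "Lee", "projA"),
    ("patent1", "Lee", "projA"),
    ("report1", "Park", "projB"),
]


def rows_to_triples(rows):
    """Turn relational rows into (subject, predicate, object) triples."""
    triples = set()
    for doc, author, project in rows:
        triples.add((doc, "hasAuthor", author))      # document-author relationship
        triples.add((project, "hasOutput", doc))     # project-output relationship
    return triples


def infer_coauthors(triples):
    """Infer co-author triples from documents that share authors."""
    authors_by_doc = {}
    for subject, predicate, obj in triples:
        if predicate == "hasAuthor":
            authors_by_doc.setdefault(subject, set()).add(obj)
    inferred = set()
    for authors in authors_by_doc.values():
        for a, b in combinations(sorted(authors), 2):
            inferred.add((a, "coAuthorOf", b))
            inferred.add((b, "coAuthorOf", a))
    return inferred


if __name__ == "__main__":
    triples = rows_to_triples(ROWS)
    for triple in sorted(infer_coauthors(triples)):
        print(triple)
```

Keeping every fact as a triple means that inferred relationships such as co-authorship are just additional triples, so they can be stored and queried alongside the source data.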

A Study of Relationship Derivation Technique using object extraction Technique (개체추출기법을 이용한 관계성 도출기법)

  • Kim, Jong-hee;Lee, Eun-seok;Kim, Jeong-su;Park, Jong-kook;Kim, Jong-bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.309-311 / 2014
  • Despite increasing demand for big data applications based on the analysis of scattered unstructured data, few relevant studies have been reported. Accordingly, the present study suggests a technique that enables sentence-based semantic analysis by extracting objects from collected web information and automatically analyzing the relationships between those objects with collective intelligence and language processing technology. Specifically, the collected information is stored in a DBMS in structured form, and then morpheme and feature information is analyzed. The obtained morphemes are classified into objects of interest, marginal objects, and objects of non-interest. Then, with an inter-object attribute recognition technique, the relationships between objects are analyzed in terms of their degree, scope, and nature. As a result, the relevance analysis between pieces of information was based on certain keywords and used an inter-object relationship extraction technique that can determine positivity and negativity. The present study also suggests a method for designing a system that is suited to real-time, large-capacity processing and applicable to high value-added services.

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.163-176 / 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. When measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model that captures this time-sensitive relevance is required to estimate TV ratings. Therefore, modeling the time-related characteristics of features should be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features is vital for improving the accuracy of TV ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that there is a time-dependency in the correlation of features between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectations for the program or their disappointment over not being able to watch it. The highly correlated features before the broadcast are different from those after the broadcast. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite carrying a negative meaning. Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research provides a basis for estimating the response to or satisfaction with broadcast programs using the time dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.
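
The before/after comparison described in this abstract can be sketched as a per-window correlation between a word's frequency and the ratings. This is only an illustration under stated assumptions: the abstract does not say which correlation measure was used, so Pearson correlation is assumed here, and the ratings and word counts below are made-up numbers, not the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Return 0.0 when either series is constant and correlation is undefined.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlation_by_window(counts_by_window, ratings):
    """Correlate one word's per-episode counts with ratings, per time window."""
    return {window: pearson(counts, ratings)
            for window, counts in counts_by_window.items()}


# Made-up ratings for four episodes and made-up counts of a single word,
# split by whether the tweet was posted before or after the broadcast.
RATINGS = [10.0, 12.0, 9.0, 15.0]
WORD_COUNTS = {"before": [3, 5, 2, 8], "after": [4, 4, 5, 4]}

if __name__ == "__main__":
    result = correlation_by_window(WORD_COUNTS, RATINGS)
    best_window = max(result, key=result.get)
    print(result, best_window)
```

A word would then be assigned to the window in which its correlation is highest, which mirrors the abstract's split of candidate words into those that peak before and those that peak after the broadcast.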

Representation of Population Distribution based on Residential Building Types by using the Dasymetric Mapping in Seoul (대시메트릭 매핑 기법을 이용한 서울시 건축물별 주거인구밀도의 재현)

  • Lee, Sukjoon;Lee, Sang Wook;Hong, Bo Yeong;Eom, Hongmin;Shin, Hyu-Seok;Kim, Kyung-Min
    • Spatial Information Research / v.22 no.3 / pp.89-99 / 2014
  • The aim of this study is to represent the residential population distribution in Seoul, Korea more precisely through the dasymetric mapping method. Dasymetric mapping can be defined as a mapping method that calculates detail from the truncated spatial distribution of main statistical data by using ancillary data, that is, spatial data related to the main data. In this research, two types of data are used for dasymetric mapping: population data (2010) based on an output area survey in Seoul as the main data, and building footprint data including register information as the ancillary spatial data. Using the binary method, residential buildings are extracted as the areas where residents actually live. After that, the regression method is used to calculate the weights on population density by considering the building types and their gross floor areas. Finally, the three-dimensional density of the residential population is reproduced and a detailed dasymetric map is drawn. As a result, this makes it possible to derive a more realistic model for calculating population distribution and to draw a more accurate map of population distribution in Seoul. Therefore, this study is meaningful as a source that can be applied in various future studies of regional population.
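
The two steps named in this abstract, the binary method followed by weighting by building type and gross floor area, can be sketched as follows. This is a simplified illustration, not the study's model: the per-type weights are assumed to be given (the study estimates them by regression), and the building records, field names, and numbers are hypothetical.

```python
def dasymetric_allocate(zone_population, buildings, type_weights):
    """Distribute one zone's population over its residential buildings.

    Each building's share is proportional to
    (weight of its type) * (gross floor area).
    """
    # Binary method: only residential buildings can receive population.
    residential = [b for b in buildings if b["residential"]]
    shares = [type_weights[b["type"]] * b["floor_area"] for b in residential]
    total = sum(shares)
    if total == 0:
        return {}
    return {b["id"]: zone_population * share / total
            for b, share in zip(residential, shares)}


# Hypothetical zone with three buildings; weights are illustrative only.
BUILDINGS = [
    {"id": "A", "type": "apartment", "floor_area": 3000.0, "residential": True},
    {"id": "B", "type": "detached", "floor_area": 1000.0, "residential": True},
    {"id": "C", "type": "office", "floor_area": 5000.0, "residential": False},
]
TYPE_WEIGHTS = {"apartment": 1.0, "detached": 0.5}

if __name__ == "__main__":
    print(dasymetric_allocate(1000, BUILDINGS, TYPE_WEIGHTS))
```

Because the shares are normalized within each zone, the allocated values always sum back to the zone's census population, so the building-level estimates stay consistent with the main data.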

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.2 / pp.57-66 / 2015
  • Considerable research efforts are being directed towards analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide successful information services that can identify R&D documents on specific national pending issues. While users may specify certain keywords relating to national pending issues, they usually fail to retrieve appropriate R&D information, primarily because of discrepancies between these terms and the corresponding terms actually used in the R&D documents. Thus, we need an intermediate logic to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, three methodologies are proposed in this study: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rule mining are used to establish these methodologies. In the experiment, the keyword enhancement rate achieved by the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to give tangible results in the future.
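
The association rules between issue keywords and R&D keywords mentioned in this abstract are usually scored with support and confidence. The sketch below shows those two standard measures only; it is not the paper's methodology, and the keyword sets are invented examples.

```python
def rule_metrics(documents, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    documents: list of keyword sets, one per document.
    support    = share of documents containing both keywords
    confidence = share of antecedent documents that also contain the consequent
    """
    n = len(documents)
    with_antecedent = sum(1 for doc in documents if antecedent in doc)
    with_both = sum(1 for doc in documents
                    if antecedent in doc and consequent in doc)
    support = with_both / n if n else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Invented keyword sets: each mixes issue keywords and R&D keywords.
DOCUMENTS = [
    {"fine dust", "air filter", "sensor"},
    {"fine dust", "air filter"},
    {"fine dust", "battery"},
    {"aging society", "robot"},
]

if __name__ == "__main__":
    print(rule_metrics(DOCUMENTS, "fine dust", "air filter"))
```

A rule with high confidence suggests that R&D documents about the consequent keyword are good candidates to package under the issue keyword, even when the issue keyword itself rarely appears in R&D text.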

Study on the Policy of Supporting University Students in the Beauty Field through Social Big Data Analysis: Based on exploratory data analytics (소셜 빅 데이터 분석을 통한 미용분야 대학생 창업지원 정책에 관한 연구 -탐색적 데이터 분석법을 기반으로-)

  • Mi-Yun Yoon;Nam-hoon Park
    • Journal of the Korean Applied Science and Technology / v.39 no.6 / pp.853-863 / 2022
  • In order to revitalize start-ups in the beauty field, this study used exploratory data analysis (EDA) to derive characteristic patterns of changes in demand and differences in emotions and meaning for 'beauty start-ups', dividing the period by year from 2019 to 2021. Most of the search terms related to the keyword "beauty start-up" showed more interest in institutions or certificates for learning beauty skills than in professional start-up education, which suggests that the importance of start-up education is still not recognized; as an alternative, it is necessary to develop customized start-up education programs for each major. We establish hypotheses through exploratory data analysis and verify them by combining it with traditional confirmatory data analysis (CDA). Exploratory data analysis has not previously been applied to beauty start-ups, and rather than simply mentioning the need for formal start-up education, analyzing changes in interest in beauty start-ups and the requirements of prospective founders with exploratory data will help develop customized start-up programs.

The Effects of Cognitive Bias on Entrepreneurial Opportunity Evaluations through Perceived Risks in Entrepreneurial Self-Efficacy (창업가의 인지편향이 지각된 위험과 조절된 창업효능감에 따라 창업기회평가에 미치는 영향)

  • Kim, Daeyop;Park, Jaehwan
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.15 no.1 / pp.95-112 / 2020
  • This paper investigates how the cognitive bias of college students and entrepreneurs relates to perceived risk and to the evaluation of entrepreneurial opportunities, which represent uncertainty, and how various cognitive biases and entrepreneurial efficacy relate to them in the same way. The purpose of this study is to find points for improvement in entrepreneurship education for college students and to suggest problems and possible improvements in the decision-making process of current entrepreneurs. This empirical study is necessary to improve the decision-making of individuals who want to start a business at a time when various attempts are being made to stimulate start-ups and increase the sustainability of existing SME management. It also aids understanding of differences in opportunity evaluation and suggests that it is necessary to provide good opportunities along with the nurturing of entrepreneurs. To achieve the purpose of the study, questionnaires were administered to college students and entrepreneurs. A total of 363 questionnaire responses were obtained and analyzed through structural equation modeling. First, this study confirms that there is some relationship between perceived risk and cognitive bias. Among cognitive biases, overconfidence and the illusion of control have a significant relationship between perceived risk and wealth. In particular, it is confirmed that the illusion of control among college students has a significant relationship with perceived risk. Second, cognitive bias showed some significant relationship with opportunity evaluation. Although we did not find evidence that overconfidence is related to opportunity evaluation, we verified that the illusion of control and status quo bias are related to opportunity evaluation. The illusion of control was significant for both college students and entrepreneurs. Third, perceived risk has a negative relationship with opportunity evaluation. All respondents, whether college students or entrepreneurs, judge opportunities positively if they perceive low risk. Fourth, in the college students' group, entrepreneurial efficacy has a moderating effect between perceived risk and opportunity evaluation, but no significant results were found in the entrepreneurs' group. Fifth, college students and entrepreneurs have different cognitive biases, and the relationship between entrepreneurial opportunity evaluation and perceived risk was shown to differ between them. On the whole, college students and entrepreneurs who have to make judgments about uncertain opportunities are subject to various cognitive biases caused by time pressure or stress, and in this respect they can improve their judgment in the future. At the same time, university students can have a positive view of new opportunities based on high entrepreneurial efficacy, but if they fully understand the intrinsic risks of entrepreneurship through entrepreneurial education and fully understand the cognitive bias present in direct entrepreneurial experience, they will make better opportunity assessments. This study has limitations in that university students and entrepreneurs were analyzed together and the survey respondents were selected by a limited random sampling method. More systematic research based on more reliable data is needed, given the lack of accumulated entrepreneurial research data. Second, the measurement tools used in previous studies were translated, and their meaning might not be fully conveyed because of language differences. Therefore, it is necessary to construct a more precise scale to improve the accuracy of the study. Finally, complementary research should be done to identify what competitive opportunities are and what opportunities are appropriate for entrepreneurs.