• Title/Summary/Keyword: Intelligent Data Analysis

Search Result 1,456, Processing Time 0.03 seconds

What factors drive AI project success? (무엇이 AI 프로젝트를 성공적으로 이끄는가?)

  • KyeSook Kim;Hyunchul Ahn
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.327-351
    • /
    • 2023
  • This paper aims to derive success factors that successfully lead an artificial intelligence (AI) project and prioritize importance. To this end, we first reviewed prior related studies to select success factors and finally derived 17 factors through expert interviews. Then, we developed a hierarchical model based on the TOE framework. With a hierarchical model, a survey was conducted on experts from AI-using companies and experts from supplier companies that support AI advice and technologies, platforms, and applications and analyzed using AHP methods. As a result of the analysis, organizational and technical factors are more important than environmental factors, but organizational factors are a little more critical. Among the organizational factors, strategic/clear business needs, AI implementation/utilization capabilities, and collaboration/communication between departments were the most important. Among the technical factors, sufficient amount and quality of data for AI learning were derived as the most important factors, followed by IT infrastructure/compatibility. Regarding environmental factors, customer preparation and support for the direct use of AI were essential. Looking at the importance of each 17 individual factors, data availability and quality (0.2245) were the most important, followed by strategy/clear business needs (0.1076) and customer readiness/support (0.0763). These results can guide successful implementation and development for companies considering or implementing AI adoption, service providers supporting AI adoption, and government policymakers seeking to foster the AI industry. In addition, they are expected to contribute to researchers who aim to study AI success models.

A Reflection of Aging Society in Online Communities: An Exploratory Study on Changes in Conversation Style and Language Usage (온라인 커뮤니티에서 보여지는 노령화 사회의 단면: 대화 방식과 사용 언어의 변화에 대한 탐색적 연구)

  • Jung Lee;Jinyoung Han;Juyeon Ham
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.51-68
    • /
    • 2023
  • With the emergence of the internet and the increasing use of online communities for over 20 years, the age range of users has also been rising. This study explores the linguistic changes that have occurred as the user age in online communities has increased. To do this, data was collected and analyzed from an online community that has been actively operating, despite new member registrations being closed nine years ago. By comparing the posts over an 11-year period from 2012 to 2022, changes such as an increase in average comments, a decrease in interrogative sentences, and a decrease in imperative statements were observed. The study also proposed loneliness due to aging and a decline in curiosity and confidence as potential causes of these changes. In South Korea, which is rapidly entering an aging society unprecedentedly fast on a global scale, the increase in single-person households has evolved loneliness from a personal issue to a social problem, manifested in an increase in solitary deaths and reclusive individuals. This research sheds light on one aspect of these social phenomena through the analysis of data from a large online community.

Analysis of biodiversity change trend on urban development project - Focusing on terrestrial species in Environmental Impact Assessment - (도시의 개발 사업에 따른 생물다양성 변화 추세 분석 - 환경영향평가의 육상 동물종을 중심으로 -)

  • Kim, Eun-Sub;Lee, Dong-Kun;Jeon, Yoon-Ho;Choi, Ji-Young;Kim, Shin-Woo;Hwang, Hye-Mi;Kim, Da-Seul;Moon, Hyun-Bin;Bae, Ji-Ho
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.26 no.6
    • /
    • pp.21-32
    • /
    • 2023
  • The Environmental Impact Assessment (EIA) plays a pivotal role in predicting the potential environmental impacts of proposed developments and planning appropriate mitigation measures to minimize effects on species. However, as concerns over biodiversity loss rise, there's ongoing debate about the efficacy of these mitigation plans. In this study, we utilized data from EIAs and post-environmental impact surveys to understand the trends in biodiversity during construction and operation phases. By examining 30 urban development projects, we categorized species richness indices of mammals, birds, amphibians, and reptiles into pre-construction, during construction, and post-construction operational stages. The biodiversity trends were analyzed based on the rate of change in these indices. The results revealed three distinct biodiversity change patterns: (A) An initial increase in biodiversity indices post-development, followed by a gradual decline over time; (B) a sustained increase in biodiversity as a result of mitigation measures; and (C) a continuous decline in biodiversity post-development. Furthermore, all species exhibited a higher rate of biodiversity decline during the construction phase compared to the operational phase, with mammals showing the most significant rate of change. Notably, the biodiversity change rate during operation was generally lower than during construction. In particular, mammals seemed to be most influenced by mitigation measures, displaying the smallest rate of change. This study provides empirical evidence on the efficacy of mitigation measures and deliberates on ways to enhance their effectiveness in minimizing the adverse impacts of urban development on biodiversity. These findings can serve as foundational data for addressing terrestrial biodiversity reduction.

Word-of-Mouth Effect for Online Sales of K-Beauty Products: Centered on China SINA Weibo and Meipai (K-Beauty 구전효과가 온라인 매출액에 미치는 영향: 중국 SINA Weibo와 Meipai 중심으로)

  • Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.197-218
    • /
    • 2019
  • In addition to economic growth and national income increase, China is also experiencing rapid growth in consumption of cosmetics. About 67% of the total trade volume of Chinese cosmetics is made by e-commerce and especially K-Beauty products, which are Korean cosmetics are very popular. According to previous studies, 80% of consumer goods such as cosmetics are affected by the word of mouth information, searching the product information before purchase. Mostly, consumers acquire information related to cosmetics through comments made by other consumers on SNS such as SINA Weibo and Wechat, and recently they also use information about beauty related video channels. Most of the previous online word-of-mouth researches were mainly focused on media itself such as Facebook, Twitter, and blogs. However, the informational characteristics and the expression forms are also diverse. Typical types are text, picture, and video. This study focused on these types. We analyze the unstructured data of SINA Weibo, the SNS representative platform of China, and Meipai, the video platform, and analyze the impact of K-Beauty brand sales by dividing online word-of-mouth information with quantity and direction information. We analyzed about 330,000 data from Meipai, and 110,000 data from SINA Weibo and analyzed the basic properties of cosmetics. As a result of analysis, the amount of online word-of-mouth information has a positive effect on the sales of cosmetics irrespective of the type of media. However, the online videos showed higher impacts than the pictures and texts. Therefore, it is more effective for companies to carry out advertising and promotional activities in parallel with the existing SNS as well as video related information. It is understood that it is important to generate the frequency of exposure irrespective of media type. The positiveness of the video media was significant but the positiveness of the picture and text media was not significant. Due to the nature of information types, the amount of information in video media is more than that in text-oriented media, and video-related channels are emerging all over the world. In particular, China has made a number of video platforms in recent years and has enjoyed popularity among teenagers and thirties. As a result, existing SNS users are being dispersed to video media. We also analyzed the effect of online type of information on the online cosmetics sales by dividing the product type of cosmetics into basic cosmetics and color cosmetics. As a result, basic cosmetics had a positive effect on the sales according to the number of online videos and it was affected by the negative information of the videos. In the case of basic cosmetics, effects or characteristics do not appear immediately like color cosmetics, so information such as changes after use is often transmitted over a period of time. Therefore, it is important for companies to move more quickly to issues generated from video media. Color cosmetics are largely influenced by negative oral statements and sensitive to picture and text-oriented media. Information such as picture and text has the advantage and disadvantage that the process of making it can be made easier than video. Therefore, complaints and opinions are generally expressed in SNS quickly and immediately. Finally, we analyzed how product diversity affects sales according to online word of mouth information type. As a result of the analysis, it can be confirmed that when a variety of products are introduced in a video channel, they have a positive effect on online cosmetics sales. The significance of this study in the theoretical aspect is that, as in the previous studies, online sales have basically proved that K-Beauty cosmetics are also influenced by word-of-mouth. However this study focused on media types and both media have a positive impact on sales, as in previous studies, but it has been proven that video is more informative and influencing than text, depending on media abundance. In addition, according to the existing research on information direction, it is said that the negative influence has more influence, but in the basic study, the correlation is not significant, but the effect of negation in the case of color cosmetics is large. In the case of temporal fashion products such as color cosmetics, fast oral effect is influenced. In practical terms, it is expected that it will be helpful to use advertising strategies on the sales and advertising strategy of K-Beauty cosmetics in China by distinguishing basic and color cosmetics. In addition, it can be said that it recognized the importance of a video advertising strategy such as YouTube and one-person media. The results of this study can be used as basic data for analyzing the big data in understanding the Chinese cosmetics market and establishing appropriate strategies and marketing utilization of related companies.

A study on the prediction of korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.123-139
    • /
    • 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, this market is short-lived, as the bad debt has started to increase after the global financial crisis in 2009 due to the real economic recession. NPL has become a major investment in the market in recent years when the domestic capital market's investment capital began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention due to the overheating of the NPL market in recent years, research on the NPL market has been abrupt since the history of capital market investment in the domestic NPL market is short. In addition, decision-making through more scientific and systematic analysis is required due to the decline in profitability and the price fluctuation due to the fluctuation of the real estate business. In this study, we propose a prediction model that can determine the achievement of the benchmark yield by using the NPL market related data in accordance with the market demand. In order to build the model, we used Korean NPL data from December 2013 to December 2017 for about 4 years. The total number of things data was 2291. As independent variables, only the variables related to the dependent variable were selected for the 11 variables that indicate the characteristics of the real estate. In order to select the variables, one to one t-test and logistic regression stepwise and decision tree were performed. Seven independent variables (purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principle Balance), HP (Holding Period)). The dependent variable is a bivariate variable that indicates whether the benchmark rate is reached. This is because the accuracy of the model predicting the binomial variables is higher than the model predicting the continuous variables, and the accuracy of these models is directly related to the effectiveness of the model. In addition, in the case of a special purpose company, whether or not to purchase the property is the main concern. Therefore, whether or not to achieve a certain level of return is enough to make a decision. For the dependent variable, we constructed and compared the predictive model by calculating the dependent variable by adjusting the numerical value to ascertain whether 12%, which is the standard rate of return used in the industry, is a meaningful reference value. As a result, it was found that the hit ratio average of the predictive model constructed using the dependent variable calculated by the 12% standard rate of return was the best at 64.60%. In order to propose an optimal prediction model based on the determined dependent variables and 7 independent variables, we construct a prediction model by applying the five methodologies of discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model we tried to compare them. To do this, 10 sets of training data and testing data were extracted using 10 fold validation method. After building the model using this data, the hit ratio of each set was averaged and the performance was compared. As a result, the hit ratio average of prediction models constructed by using discriminant analysis, logistic regression model, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively. It was confirmed that the model using the artificial neural network is the best. Through this study, it is proved that it is effective to utilize 7 independent variables and artificial neural network prediction model in the future NPL market. The proposed model predicts that the 12% return of new things will be achieved beforehand, which will help the special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will be liquidated as the transaction proceeds at an appropriate price.

Major Class Recommendation System based on Deep learning using Network Analysis (네트워크 분석을 활용한 딥러닝 기반 전공과목 추천 시스템)

  • Lee, Jae Kyu;Park, Heesung;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.95-112
    • /
    • 2021
  • In university education, the choice of major class plays an important role in students' careers. However, in line with the changes in the industry, the fields of major subjects by department are diversifying and increasing in number in university education. As a result, students have difficulty to choose and take classes according to their career paths. In general, students choose classes based on experiences such as choices of peers or advice from seniors. This has the advantage of being able to take into account the general situation, but it does not reflect individual tendencies and considerations of existing courses, and has a problem that leads to information inequality that is shared only among specific students. In addition, as non-face-to-face classes have recently been conducted and exchanges between students have decreased, even experience-based decisions have not been made as well. Therefore, this study proposes a recommendation system model that can recommend college major classes suitable for individual characteristics based on data rather than experience. The recommendation system recommends information and content (music, movies, books, images, etc.) that a specific user may be interested in. It is already widely used in services where it is important to consider individual tendencies such as YouTube and Facebook, and you can experience it familiarly in providing personalized services in content services such as over-the-top media services (OTT). Classes are also a kind of content consumption in terms of selecting classes suitable for individuals from a set content list. However, unlike other content consumption, it is characterized by a large influence of selection results. For example, in the case of music and movies, it is usually consumed once and the time required to consume content is short. Therefore, the importance of each item is relatively low, and there is no deep concern in selecting. Major classes usually have a long consumption time because they have to be taken for one semester, and each item has a high importance and requires greater caution in choice because it affects many things such as career and graduation requirements depending on the composition of the selected classes. Depending on the unique characteristics of these major classes, the recommendation system in the education field supports decision-making that reflects individual characteristics that are meaningful and cannot be reflected in experience-based decision-making, even though it has a relatively small number of item ranges. This study aims to realize personalized education and enhance students' educational satisfaction by presenting a recommendation model for university major class. In the model study, class history data of undergraduate students at University from 2015 to 2017 were used, and students and their major names were used as metadata. The class history data is implicit feedback data that only indicates whether content is consumed, not reflecting preferences for classes. Therefore, when we derive embedding vectors that characterize students and classes, their expressive power is low. With these issues in mind, this study proposes a Net-NeuMF model that generates vectors of students, classes through network analysis and utilizes them as input values of the model. The model was based on the structure of NeuMF using one-hot vectors, a representative model using data with implicit feedback. The input vectors of the model are generated to represent the characteristic of students and classes through network analysis. To generate a vector representing a student, each student is set to a node and the edge is designed to connect with a weight if the two students take the same class. Similarly, to generate a vector representing the class, each class was set as a node, and the edge connected if any students had taken the classes in common. Thus, we utilize Node2Vec, a representation learning methodology that quantifies the characteristics of each node. For the evaluation of the model, we used four indicators that are mainly utilized by recommendation systems, and experiments were conducted on three different dimensions to analyze the impact of embedding dimensions on the model. The results show better performance on evaluation metrics regardless of dimension than when using one-hot vectors in existing NeuMF structures. Thus, this work contributes to a network of students (users) and classes (items) to increase expressiveness over existing one-hot embeddings, to match the characteristics of each structure that constitutes the model, and to show better performance on various kinds of evaluation metrics compared to existing methodologies.

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.97-117
    • /
    • 2020
  • Product evaluation criteria is an indicator describing attributes or values of products, which enable users or manufacturers measure and understand the products. When companies analyze their products or compare them with competitors, appropriate criteria must be selected for objective evaluation. The criteria should show the features of products that consumers considered when they purchased, used and evaluated the products. However, current evaluation criteria do not reflect different consumers' opinion from product to product. Previous studies tried to used online reviews from e-commerce sites that reflect consumer opinions to extract the features and topics of products and use them as evaluation criteria. However, there is still a limit that they produce irrelevant criteria to products due to extracted or improper words are not refined. To overcome this limitation, this research suggests LDA-k-NN model which extracts possible criteria words from online reviews by using LDA and refines them with k-nearest neighbor. Proposed approach starts with preparation phase, which is constructed with 6 steps. At first, it collects review data from e-commerce websites. Most e-commerce websites classify their selling items by high-level, middle-level, and low-level categories. Review data for preparation phase are gathered from each middle-level category and collapsed later, which is to present single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from reviews by getting part of speech information using morpheme analysis module. After preprocessing, words per each topic from review are shown with LDA and only nouns in topic words are chosen as potential words for criteria. Then, words are tagged based on possibility of criteria for each middle-level category. Next, every tagged word is vectorized by pre-trained word embedding model. Finally, k-nearest neighbor case-based approach is used to classify each word with tags. After setting up preparation phase, criteria extraction phase is conducted with low-level categories. This phase starts with crawling reviews in the corresponding low-level category. Same preprocessing as preparation phase is conducted using morpheme analysis module and LDA. Possible criteria words are extracted by getting nouns from the data and vectorized by pre-trained word embedding model. Finally, evaluation criteria are extracted by refining possible criteria words using k-nearest neighbor approach and reference proportion of each word in the words set. To evaluate the performance of the proposed model, an experiment was conducted with review on '11st', one of the biggest e-commerce companies in Korea. Review data were from 'Electronics/Digital' section, one of high-level categories in 11st. For performance evaluation of suggested model, three other models were used for comparing with the suggested model; actual criteria of 11st, a model that extracts nouns by morpheme analysis module and refines them according to word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The performance evaluation was set to predict evaluation criteria of 10 low-level categories with the suggested model and 3 models above. Criteria words extracted from each model were combined into a single words set and it was used for survey questionnaires. In the survey, respondents chose every item they consider as appropriate criteria for each category. Each model got its score when chosen words were extracted from that model. The suggested model had higher scores than other models in 8 out of 10 low-level categories. By conducting paired t-tests on scores of each model, we confirmed that the suggested model shows better performance in 26 tests out of 30. In addition, the suggested model was the best model in terms of accuracy. This research proposes evaluation criteria extracting method that combines topic extraction using LDA and refinement with k-nearest neighbor approach. This method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improve review analysis for deriving business insights in e-commerce market.

The Framework of Research Network and Performance Evaluation on Personal Information Security: Social Network Analysis Perspective (개인정보보호 분야의 연구자 네트워크와 성과 평가 프레임워크: 소셜 네트워크 분석을 중심으로)

  • Kim, Minsu;Choi, Jaewon;Kim, Hyun Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.177-193
    • /
    • 2014
  • Over the past decade, there has been a rapid diffusion of electronic commerce and a rising number of interconnected networks, resulting in an escalation of security threats and privacy concerns. Electronic commerce has a built-in trade-off between the necessity of providing at least some personal information to consummate an online transaction, and the risk of negative consequences from providing such information. More recently, the frequent disclosure of private information has raised concerns about privacy and its impacts. This has motivated researchers in various fields to explore information privacy issues to address these concerns. Accordingly, the necessity for information privacy policies and technologies for collecting and storing data, and information privacy research in various fields such as medicine, computer science, business, and statistics has increased. The occurrence of various information security accidents have made finding experts in the information security field an important issue. Objective measures for finding such experts are required, as it is currently rather subjective. Based on social network analysis, this paper focused on a framework to evaluate the process of finding experts in the information security field. We collected data from the National Discovery for Science Leaders (NDSL) database, initially collecting about 2000 papers covering the period between 2005 and 2013. Outliers and the data of irrelevant papers were dropped, leaving 784 papers to test the suggested hypotheses. The co-authorship network data for co-author relationship, publisher, affiliation, and so on were analyzed using social network measures including centrality and structural hole. The results of our model estimation are as follows. With the exception of Hypothesis 3, which deals with the relationship between eigenvector centrality and performance, all of our hypotheses were supported. In line with our hypothesis, degree centrality (H1) was supported with its positive influence on the researchers' publishing performance (p<0.001). This finding indicates that as the degree of cooperation increased, the more the publishing performance of researchers increased. In addition, closeness centrality (H2) was also positively associated with researchers' publishing performance (p<0.001), suggesting that, as the efficiency of information acquisition increased, the more the researchers' publishing performance increased. This paper identified the difference in publishing performance among researchers. The analysis can be used to identify core experts and evaluate their performance in the information privacy research field. The co-authorship network for information privacy can aid in understanding the deep relationships among researchers. In addition, extracting characteristics of publishers and affiliations, this paper suggested an understanding of the social network measures and their potential for finding experts in the information privacy field. Social concerns about securing the objectivity of experts have increased, because experts in the information privacy field frequently participate in political consultation, and business education support and evaluation. In terms of practical implications, this research suggests an objective framework for experts in the information privacy field, and is useful for people who are in charge of managing research human resources. This study has some limitations, providing opportunities and suggestions for future research. Presenting the difference in information diffusion according to media and proximity presents difficulties for the generalization of the theory due to the small sample size. Therefore, further studies could consider an increased sample size and media diversity, the difference in information diffusion according to the media type, and information proximity could be explored in more detail. Moreover, previous network research has commonly observed a causal relationship between the independent and dependent variable (Kadushin, 2012). In this study, degree centrality as an independent variable might have causal relationship with performance as a dependent variable. However, in the case of network analysis research, network indices could be computed after the network relationship is created. An annual analysis could help mitigate this limitation.

School Experiences and the Next Gate Path : An analysis of Univ. Student activity log (대학생의 학창경험이 사회 진출에 미치는 영향: 대학생활 활동 로그분석을 중심으로)

  • YI, EUNJU;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.149-171
    • /
    • 2020
  • The period at university is to make decision about getting an actual job. As our society develops rapidly and highly, jobs are diversified, subdivided, and specialized, and students' job preparation period is also getting longer and longer. This study analyzed the log data of college students to see how the various activities that college students experience inside and outside of school might have influences on employment. For this experiment, students' various activities were systematically classified, recorded as an activity data and were divided into six core competencies (Job reinforcement competency, Leadership & teamwork competency, Globalization competency, Organizational commitment competency, Job exploration competency, and Autonomous implementation competency). The effect of the six competency levels on the employment status (employed group, unemployed group) was analyzed. As a result of the analysis, it was confirmed that the difference in level between the employed group and the unemployed group was significant for all of the six competencies, so it was possible to infer that the activities at the school are significant for employment. Next, in order to analyze the impact of the six competencies on the qualitative performance of employment, we had ANOVA analysis after dividing the each competency level into 2 groups (low and high group), and creating 6 groups by the range of first annual salary. Students with high levels of globalization capability, job search capability, and autonomous implementation capability were also found to belong to a higher annual salary group. The theoretical contributions of this study are as follows. First, it connects the competencies that can be extracted from the school experience with the competencies in the Human Resource Management field and adds job search competencies and autonomous implementation competencies which are required for university students to have their own successful career & life. Second, we have conducted this analysis with the competency data measured form actual activity and result data collected from the interview and research. Third, it analyzed not only quantitative performance (employment rate) but also qualitative performance (annual salary level). The practical use of this study is as follows. First, it can be a guide when establishing career development plans for college students. It is necessary to prepare for a job that can express one's strengths based on an analysis of the world of work and job, rather than having a no-strategy, unbalanced, or accumulating excessive specifications competition. Second, the person in charge of experience design for college students, at an organizations such as schools, businesses, local governments, and governments, can refer to the six competencies suggested in this study to for the user-useful experiences design that may motivate more participation. By doing so, one event may bring mutual benefits for both event designers and students. Third, in the era of digital transformation, the government's policy manager who envisions the balanced development of the country can make a policy in the direction of achieving the curiosity and energy of college students together with the balanced development of the country. A lot of manpower is required to start up novel platform services that have not existed before or to digitize existing analog products, services and corporate culture. The activities of current digital-generation-college-students are not only catalysts in all industries, but also for very benefit and necessary for college students by themselves for their own successful career development.

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.79-92
    • /
    • 2015
  • Knowledge map is widely used to represent knowledge in many domains. This paper presents a method of integrating the national R&D data and assists of users to navigate the integrated data via using a knowledge map service. The knowledge map service is built by using a lightweight ontology and a topic modeling method. The national R&D data is integrated with the research project as its center, i.e., the other R&D data such as research papers, patents, and reports are connected with the research project as its outputs. The lightweight ontology is used to represent the simple relationships between the integrated data such as project-outputs relationships, document-author relationships, and document-topic relationships. Knowledge map enables us to infer further relationships such as co-author and co-topic relationships. To extract the relationships between the integrated data, a Relational Data-to-Triples transformer is implemented. Also, a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of knowledge map service. Knowledge map can be divided into two types: one is a knowledge map used in the area of knowledge management to store, manage and process the organizations' data as knowledge, the other is a knowledge map for analyzing and representing knowledge extracted from the science & technology documents. This research focuses on the latter one. In this research, a knowledge map service is introduced for integrating the national R&D data obtained from National Digital Science Library (NDSL) and National Science & Technology Information Service (NTIS), which are two major repository and service of national R&D data servicing in Korea. A lightweight ontology is used to design and build a knowledge map. Using the lightweight ontology enables us to represent and process knowledge as a simple network and it fits in with the knowledge navigation and visualization characteristics of the knowledge map. The lightweight ontology is used to represent the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected by the national R&D data as the author relationships and the performer relationships. A knowledge map for displaying researchers' network is created, and the researchers' network is created by the co-authoring relationships of the national R&D documents and the co-participation relationships of the national R&D projects. To sum up, a knowledge map-service system based on topic modeling and ontology is introduced for processing knowledge about the national R&D data such as research projects, papers, patent, project reports, and Global Trends Briefing (GTB) data. The system has goals 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide a semantic & topic based information search on the integrated data, and 3) to provide a knowledge map services based on the semantic analysis and knowledge processing. The S&T information such as research papers, research reports, patents and GTB are daily updated from NDSL, and the R&D projects information including their participants and output information are updated from the NTIS. The S&T information and the national R&D information are obtained and integrated to the integrated database. Knowledge base is constructed by transforming the relational data into triples referencing R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and topic keyword/s representing the documents. The topic modeling approach enables us to extract the relationships and topic keyword/s based on the semantics, not based on the simple keyword/s. Lastly, we show an experiment on the construction of the integrated knowledge base using the lightweight ontology and topic modeling, and the knowledge map services created based on the knowledge base are also introduced.