• Title/Summary/Keyword: Intelligence community

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.227-240 / 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies, which in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former cannot analyze the technical content of patents in detail, while the latter cannot identify the relationships between technologies. To overcome the limitations of both, this study blends the two methods and proposes a keyword network based analysis methodology. We collected the significant technology information in each patent related to Light Emitting Diodes (LED) through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, density varies with the size of the network; increasing the size of a network generally decreases its density. The clustering coefficient is a network-level measure that captures the tendency of nodes to cluster in densely interconnected modules. This measure reveals the small-world property, in which a network can be highly clustered while keeping a small average distance between nodes despite having a large number of nodes.
Therefore, the low density of the patent keyword network means that its nodes are connected sporadically, and the high clustering coefficient shows that nodes in the network are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like that of other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is more likely to attain further new links as the network evolves. Unlike a normal distribution, a power-law distribution has no representative scale. This means that one cannot pick a representative or average value, because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed scale-free distribution is the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution suggests that preferential attachment explains the origin of heavy-tailed distributions in growing patent keyword networks. Third, we found that among the keywords that flowed into a particular field, the vast majority of keywords with new links joined existing keywords in the associated community to form the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses.
Furthermore, using the keyword combination information that was derived from the methodology suggested by our study enables one to forecast which concepts combine to form a new patent dimension and refer to those concepts when developing a new patent.
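The two measures described in this abstract can be illustrated with a minimal sketch. It assumes an undirected network stored as a node list and a set of ties; the keyword names and the toy network are hypothetical and are not taken from the paper's LED patent data. Density follows the definition given above (ties divided by all possible ties), and the clustering coefficient is computed as the average of the local coefficients, which is one common convention and may differ from the one the authors used.

```python
from itertools import combinations

def density(nodes, edges):
    """Number of ties divided by the number of all possible ties (undirected)."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    return len(edges) / possible if possible else 0.0

def average_clustering(nodes, edges):
    """Mean local clustering coefficient over all nodes."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = 0.0
    for v in nodes:
        k = len(neighbors[v])
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count pairs of neighbors of v that are themselves tied
        linked = sum(1 for a, b in combinations(neighbors[v], 2)
                     if b in neighbors[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(nodes)

# Hypothetical toy keyword network: two triangles joined by a single tie
nodes = ["led", "chip", "phosphor", "lens", "driver", "heat"]
edges = {("led", "chip"), ("chip", "phosphor"), ("led", "phosphor"),
         ("lens", "driver"), ("driver", "heat"), ("lens", "heat"),
         ("phosphor", "lens")}

print(density(nodes, edges))             # 7 ties out of 15 possible
print(average_clustering(nodes, edges))
```

In a large sparse network the first value shrinks toward 0 while the second can stay high, which is the combination the abstract reports.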

Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

  • Park, Hyun-Jung;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.19-43 / 2014
  • The Pareto principle, also known as the 80-20 rule, states that for many events, including natural phenomena, roughly 80% of the effects come from 20% of the causes. It has been recognized as a golden rule in business, with wide application of findings such as 20 percent of customers accounting for 80 percent of total sales. On the other hand, the Long Tail theory, which points out that "the trivial many" produce more value than "the vital few," has recently gained popularity as the development of ICT (Information and Communication Technology) has tremendously reduced distribution and inventory costs. This study set out to illuminate how these two primary business paradigms (the Pareto principle and the Long Tail theory) relate to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization, which transcends geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors that affect it, seeking to boost individual knowledge sharing and to resolve the social dilemma arising from the fact that rational individuals are more likely to consume than to contribute knowledge. Knowledge collaboration can be defined as the creation of knowledge not only by sharing knowledge but also by transforming and integrating it. From this perspective, the relative distribution of knowledge sharing among participants can count as much as the absolute amount of individual knowledge sharing. In particular, whether greater contribution by the upper 20 percent of participants enhances the efficiency of overall knowledge collaboration is an issue of interest. This study deals with the effect of this kind of knowledge sharing distribution on the efficiency of knowledge collaboration and is extended to reflect work characteristics.
All analyses were based on actual data rather than self-reported questionnaire surveys. More specifically, we analyzed the collaborative behaviors of the editors of 2,978 English Wikipedia featured articles, the highest quality grade of article in English Wikipedia. To examine the effect of the Pareto principle, we adopted the Pareto ratio: the number of knowledge contributions made by the upper 20 percent of participants divided by the total number of knowledge contributions made by all participants of an article group. In addition, the Gini coefficient, which represents the inequality of income among a group of people, was applied to reveal the effect of inequality of knowledge contribution. Hypotheses were set up on the assumption that a higher ratio of knowledge contribution by more highly motivated participants leads to higher collaboration efficiency, but that if the ratio gets too high, collaboration efficiency deteriorates because overall informational diversity is threatened and the knowledge contribution of less motivated participants is discouraged. Cox regression models were formulated for each of the focal variables (Pareto ratio and Gini coefficient) with seven control variables, such as the number of editors involved in an article, the average time between successive edits of an article, and the number of sections in a featured article. The dependent variable of the Cox models is the time from article initiation to promotion to featured article status, which indicates the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary with the characteristics of a group task, we classified the 2,978 featured articles into two categories: Academic and Non-academic. Academic articles refer to at least one paper published in an SCI, SSCI, A&HCI, or SCIE journal.
We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The analysis results indicate the following. First, the Pareto ratio and the inequality of knowledge sharing relate in a curvilinear fashion to collaboration efficiency in an online community, promoting it up to an optimal point and undermining it thereafter. Second, this curvilinear effect on collaboration efficiency is more sensitive for more academic tasks in an online community.
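As a rough illustration of the two focal variables, the sketch below computes a Pareto ratio and a Gini coefficient from per-editor edit counts. The edit counts are invented for the example, and the rounding rule used to pick the "upper 20 percent" is an assumption; the abstract does not say how the authors handled group sizes that are not multiples of five.

```python
def pareto_ratio(contributions, top_share=0.2):
    """Share of all contributions made by the top `top_share` of participants."""
    ordered = sorted(contributions, reverse=True)
    # Assumption: round to the nearest whole participant, at least one
    top_n = max(1, round(len(ordered) * top_share))
    return sum(ordered[:top_n]) / sum(ordered)

def gini(contributions):
    """Gini coefficient: 0 means perfect equality, values near 1 mean
    contributions are concentrated in very few participants."""
    xs = sorted(contributions)
    n = len(xs)
    total = sum(xs)
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical edit counts for the ten editors of one article
edits = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]

print(pareto_ratio(edits))  # top 2 of 10 editors made 70 of 100 edits
print(gini(edits))
```

Both values rise as contributions concentrate in fewer editors, which is why the study can test each of them for the same curvilinear relationship with collaboration efficiency.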

A Study on Design of Agent based Nursing Records System in Attending System (에이전트기반 개방병원 간호기록시스템 설계에 관한 연구)

  • Kim, Kyoung-Hwan
    • Journal of Intelligence and Information Systems / v.16 no.2 / pp.73-94 / 2010
  • The attending system is a medical system that allows doctors in clinics to use the extra equipment in hospitals-beds, laboratory, operating room, etc-for their patient's care under a contract between the doctors and hospitals. Therefore, the system is very beneficial in terms of the efficiency of the usage of medical resources. However, it is necessary to develop a strong support system to strengthen its weaknesses and supplement its merits. If doctors use hospital beds under the attending system of hospitals, they would be able to check a patient's condition often and provide them with nursing care services. However, the current attending system lacks delivery and assistance support. Thus, for the successful performance of the attending system, a networking system should be developed to facilitate communication between the doctors and nurses. In particular, the nursing records in the attending system could help doctors monitor the patient's condition and provision of nursing care services. A nursing record is the formal documentation associated with nursing care. It is merely a data repository that helps nurses to track their activities; nursing records thus represent a resource of primary information that can be reused. In order to maximize their usefulness, nursing records have been introduced as part of computerized patient records. However, nursing records are internal data that are not disclosed by hospitals. Moreover, the lack of standardization of the record list makes it difficult to share nursing records. Under the attending system, nurses would want to minimize the amount of effort they have to put in for the maintenance of additional records. Hence, they would try to maintain the current level of nursing records in the form of record lists and record attributes, while doctors would require more detailed and real-time information about their patients in order to monitor their condition. 
Therefore, this study developed a system for assisting in the maintenance and sharing of the nursing records under the attending system. In contrast to previous research on the functionality of computer-based nursing records, we have emphasized the practical usefulness of nursing records from the viewpoint of the actual implementation of the attending system. We suggested that nurses could design a nursing record dictionary for their convenience, and that doctors and nurses could confirm the definitions that they looked up in the dictionary through negotiations with intelligent agents. Such an agent-based system could facilitate networking among medical institutes. Multi-agent systems are a widely accepted paradigm for the distribution and sharing of computation workloads in the scientific community. Agent-based systems have been developed with differences in functional cooperation, coordination, and negotiation. To increase such communication, a framework for a multi-agent based system is proposed in this study. The agent-based approach is useful for developing a system that promotes trade-offs between transactions involving multiple attributes. A brief summary of our contributions follows. First, we propose an efficient and accurate utility representation and acquisition mechanism based on a preference scale while minimizing user interactions with the agent. Trade-offs between various transaction attributes can also be easily computed. Second, by providing a multi-attribute negotiation framework based on the attribute utility evaluation mechanism, we allow both the doctors in charge and nurses to negotiate over various transaction attributes in the nursing record lists that are defined by the latter. Third, we have designed the architecture of the nursing record management server and a system of agents that provides support to the doctors and nurses with regard to the framework and mechanisms proposed above. 
A formal protocol has also been developed to create and control the communication required for negotiations. We verified the realization of the system by developing a web-based prototype. The system was implemented using ASP and IIS5.1.

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.55-75 / 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database. It is one of the essential data mining tasks and is widely used in application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. General sequential pattern mining considers only the generation order of the data elements in a sequence, so it can easily find simple sequential patterns but is limited in finding the more interesting sequential patterns needed in real-world applications. One of the essential research topics that compensates for this limit is weighted sequential pattern mining, which considers not only the generation order of data elements but also their weights to obtain more interesting sequential patterns. Recently, data in various application fields has increasingly taken the form of continuous data streams rather than finite stored data sets, and the database research community has begun focusing its attention on processing data streams. A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once, and memory usage should remain bounded even though new data elements are continuously generated. Moreover, newly generated data elements should be processed as fast as possible to keep the analysis result up to date, so that it can be used instantly upon request. To satisfy these requirements, data stream processing sacrifices some correctness of its analysis result by allowing some error. Considering these changes in the form of data generated in real-world application fields, much research has been actively performed to find various kinds of knowledge embedded in data streams.
This research mainly focuses on efficient mining of frequent itemsets and sequential patterns over data streams, which have proven useful in conventional data mining over finite data sets. In addition, mining algorithms have been proposed that efficiently reflect the changes of a data stream over time in their mining results. However, they have targeted naively interesting patterns, such as frequent patterns and simple sequential patterns, which are found intuitively, and have taken no interest in mining novel interesting patterns that better express the characteristics of the target data streams. Therefore, defining novel interesting patterns and developing a method to mine them can be a valuable research topic in data stream mining, and such patterns can be used effectively to analyze recent data streams. This paper proposes a gap-based weighting approach for sequential patterns and a method for mining weighted sequential patterns over sequence data streams via that approach. A gap-based weight of a sequential pattern can be computed from the gaps between data elements in the sequential pattern, without any pre-defined weight information. That is, the approach uses the gaps between data elements in each sequential pattern, as well as their generation order, to obtain the weight of the pattern, which helps to find more interesting and useful sequential patterns. Because most computer application fields now generate data as data streams rather than finite data sets, the proposed method mainly focuses on sequence data streams.

Object Tracking Based on Exactly Reweighted Online Total-Error-Rate Minimization (정확히 재가중되는 온라인 전체 에러율 최소화 기반의 객체 추적)

  • JANG, Se-In;PARK, Choong-Shik
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.53-65 / 2019
  • Object tracking is one of the important steps in building video-based surveillance systems and is considered an essential task, as are object detection and recognition. To perform object tracking, various machine learning methods (e.g., least-squares, perceptron, and support vector machine) can be applied, depending on the design of the tracking system. In general, generative methods (e.g., principal component analysis) have been used for their simplicity and effectiveness. However, generative methods focus only on modeling the target object. Because of this limitation, discriminative methods (e.g., binary classification) were adopted to distinguish the target object from the background. Among machine learning methods for binary classification, total error rate minimization is one of the successful ones. Total error rate minimization can achieve a global minimum thanks to a quadratic approximation to a step function, while other methods (e.g., support vector machine) seek local minima using nonlinear functions (e.g., hinge loss function). This quadratic approximation gives total error rate minimization suitable properties for solving the optimization problems of binary classification. However, total error rate minimization was originally formulated in a batch mode setting, which restricts it to applications that allow offline learning. With limited computing resources, offline learning cannot handle large-scale data sets. In contrast, online learning can update its solution without storing all training samples during the learning process. As large-scale data sets grow, online learning has become an essential property for various applications.
Since object tracking needs to handle data samples in real time, online learning based total error rate minimization methods are necessary to address object tracking problems efficiently. To meet this need, an online learning based total error rate minimization method was previously developed; however, it relies on an approximately reweighted technique. Despite the approximation, this online version of total error rate minimization achieved good performance in biometric applications. However, the method assumes that total error rate minimization is achieved asymptotically, only when the number of training samples is infinite. Even under this assumption, the approximation can continuously accumulate learning errors as the number of training samples increases. As a result, the approximated online learning solution can lead to a wrong solution, which can cause significant errors when applied to surveillance systems. In this paper, we propose an exactly reweighted technique that recursively updates the solution of total error rate minimization in an online learning manner. In contrast to the approximately reweighted online total error rate minimization, an exactly reweighted online total error rate minimization is achieved. The proposed exact online learning method is then applied to object tracking problems. Our object tracking system adopts particle filtering, in which the observation model consists of both generative and discriminative methods to leverage the advantages of both. In our experiments, the proposed object tracking system achieves promising performance on 8 public video sequences compared with competing object tracking systems. A paired t-test is also reported to evaluate the quality of the results.
Our proposed online learning method can be extended to deep learning architectures, covering both shallow and deep networks. Moreover, other online learning methods that need an exact reweighting process can use our proposed reweighting technique. In addition to object tracking, the proposed online learning method can easily be applied to object detection and recognition. Therefore, our proposed methods can contribute to the online learning community and to the object tracking, detection, and recognition communities.

A study on factors affecting high school students of school violence - Focusing on personality factors - (고등학생의 학교폭력에 영향을 미치는 요인에 관한 연구 : 인성요인을 중심으로)

  • Lee, Jung-Duk;Chang, Jeong-Hyeon
    • Korean Security Journal / no.42 / pp.393-422 / 2015
  • The school is committed to the role of public education for the effective development of students' academic achievement and intellectual ability. It is also a space in which students, as members of the academic community, expand their care and understanding for others. However, schooling in our country has often been criticized for its lack of character education related to an attitude of caring for others and a sense of responsibility. An individual is not necessarily emotional intelligence There are side out, as well as grow into adults, schools are obliged to teach a variety of methods associated with it. Nevertheless, character education in the country has not gone beyond formality and has been neglected because of teachers' excessive administrative work and other activities. As a result, acts that students should not commit have spread, eventually leading to serious social issues that directly harm others, such as school violence. This study is based on an act of juvenile delinquency and criminology education was to refine the concept of toughness and validate the relationship between school violence through empirical research. Accordingly, from July 1, 2013 September 31, 2013 to 277 high schools across the country are attending the third year of the schools available for non-response and analysis of the students who participated in the admission and simulated typical presentation of K University in Gyeonggi-do not judge students in the final analysis, except for the data and data from a total of 1045 students were utilized. As a result, many schools have experienced violence, male student work can be applied to a lot of rock school violence was experienced.
Also, a lot of experience can be applied to a healthy student rock school violence, anger-control and empathy, this is considered a low student showed consciousness experienced school violence exerted.

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.179-200 / 2019
  • Word-of-mouth (WoM) is defined as consumer activity that shares information concerning consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, with the development of the Internet, the ways in which people exchange information through online news and online communities have expanded, and WoM has diversified into forms such as word of mouth, scores, ratings, and likes. Social media gives online users easy access to information, and online WoM is considered a key source of information. Although various studies on WoM have been conducted in response to this phenomenon, no meta-analysis has comprehensively analyzed them. This study proposes a method that applies text mining techniques to extract major studies and to grasp their main issues, in order to find the trend of WoM research using scholarly big data. To this end, a total of 4389 documents published from 1941 to 2018 were collected from Scopus (www.scopus.com), a citation database, using the keyword 'Word-of-mouth', and the data were refined through preprocessing such as English morphological analysis, stopword removal, and noun extraction. We adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key studies, is used to track the development trajectory of an academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network from the collected data, in which a node represents a document and a link represents a citation relation. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, a main path composed of 30 documents was extracted from the citation network.
The main path confirmed how the academic field changed over time, reflecting industrial changes across various industry groups. The results of MPA revealed that WoM research can be divided into five periods: (1) establishment of aspects and critical elements of WoM, (2) relationship analysis between WoM variables, (3) beginning of research on online WoM, (4) relationship analysis between WoM and purchase, and (5) broadening of topics. Changes within industry, such as online development and social media, were reflected in the results. Very recent studies showed that the topics and approaches related to WoM were diversifying in response to circumstantial changes. However, the results showed that even though WoM was studied in diverse fields, the mainstream of WoM research, from start to end, was related to marketing and to identifying the influential factors that proliferate WoM. Word co-occurrence network analysis presents the research trend from a microscopic point of view. A word co-occurrence network was constructed to analyze the relationships between keywords, and social network analysis (SNA) was applied. We divided the data into three periods to investigate periodic changes and trends in the discussion of WoM. SNA showed that Period 1 (1941~2008) consisted of clusters regarding relationship, source, and consumers. Period 2 (2009~2013) contained clusters of satisfaction, community, social networks, review, and internet. The clusters of Period 3 (2014~2018) involved satisfaction, medium, review, and interview. The periodic changes of clusters showed a transition from offline to online WoM. The media of WoM have become an important factor in spreading the word. This study conducted a quantitative meta-analysis of WoM based on scholarly big data.
The main contribution of this study is that it provides a micro perspective on the research trend of WoM as well as the macro perspective. The limitation of this study is that the citation network constructed in this study is a network based on the direct citation relation of the collected documents for MPA.
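The SPC (Search Path Count) weighting mentioned in this abstract can be sketched as follows. The sketch assumes the citation network is acyclic and that each link points from the cited document to the citing one; the five-document network is hypothetical. It computes only the SPC weight of each link (the number of source-to-sink paths passing through it) and does not implement the key-route main path search itself.

```python
from collections import defaultdict

def spc_weights(edges):
    """Search Path Count for every link of an acyclic citation network.

    SPC(u, v) = (paths from any source to u) * (paths from v to any sink).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order (Kahn's algorithm); assumes there are no cycles
    indegree = {n: len(pred[n]) for n in nodes}
    stack = [n for n in nodes if indegree[n] == 0]
    order = []
    while stack:
        n = stack.pop()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                stack.append(m)

    # Number of paths reaching each node from the sources
    from_source = {}
    for n in order:
        from_source[n] = 1 if not pred[n] else sum(from_source[p] for p in pred[n])
    # Number of paths leaving each node toward the sinks
    to_sink = {}
    for n in reversed(order):
        to_sink[n] = 1 if not succ[n] else sum(to_sink[s] for s in succ[n])

    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}

# Hypothetical network: A is cited by B and C, both are cited by D, D by E
citations = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
weights = spc_weights(citations)
print(weights[("D", "E")])  # both source-to-sink paths pass through D -> E
```

Links with the highest weights carry the most citation paths, which is why main path methods use them to trace a field's development trajectory.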

An Exploration of MIS Quarterly Research Trends: Applying Topic Modeling and Keyword Network Analysis (MIS Quarterly 연구동향 탐색: 토픽모델링 및 키워드 네트워크 분석 활용)

  • Kang, Eunkyung;Jung, Yeonsik;Yang, Seonuk;Kwon, Jiyoon;Yang, Sung-Byung
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.207-235 / 2022
  • In a knowledge-based society where knowledge and information industries are the main pillars of the economy, knowledge sharing and diffusion and its systematic management are recognized as essential strategies for improving national competitiveness and sustainable social development. In the field of Information Systems (IS) research, where the convergence of information technology and management takes place in various ways, the evolution of knowledge occurs only when researchers cooperate in turning old knowledge into new knowledge from the perspective of the scientific knowledge network. In particular, it is possible to derive new insights by identifying topics of interest in the relevant research field, applied methodologies, and research trends through network-based interdisciplinary graftings such as citations, co-authorships, and keywords. In previous studies, various attempts have been made to understand the structure of the knowledge system and the research trends of the relevant community by revealing the relationship between research topics, methodologies, and co-authors. However, most studies have compared two or more journals and been limited to a certain period; hence, there is a lack of research that looked at research trends covering the entire history of IS research. Therefore, this study was conducted in the following order for all the papers (from its first issue in 1977 to the first quarter of 2022) published in the MIS Quarterly (MISQ) Journal, which plays a leading role in revealing knowledge in the IS research field: (1) After extracting keywords, (2) classifying the extracted keywords into research topics, methodologies, and theories, and (3) using topic modeling and keyword network analysis in order to identify the changes from the beginning to the present of the IS research in a chronological manner. 
Through this study, it is expected that by examining the changes in IS research published in MISQ, the developing patterns of IS research can be revealed, and a new research direction can be presented to IS researchers, nurturing the sustainability of future research.

Improved Social Network Analysis Method in SNS (SNS에서의 개선된 소셜 네트워크 분석 방법)

  • Sohn, Jong-Soo;Cho, Soo-Whan;Kwon, Kyung-Lag;Chung, In-Jeong
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.117-127 / 2012
  • With the recent expansion of Web 2.0-based services and the widespread adoption of smartphones, online social network services have become popular among users. Online social network services are online community services that enable users to communicate with each other, share information, and expand human relationships. In social network services, the relations between users are represented by a graph consisting of nodes and links. As the number of users grows rapidly, SNS are actively used in enterprise marketing, analysis of social phenomena, and so on. Social Network Analysis (SNA) is a systematic way to analyze social relationships among the members of a social network using network theory. In general, a social network consists of nodes and arcs and is often depicted in a social network diagram, in which nodes represent individual actors within the network and arcs represent relationships between the nodes. With SNA, we can measure relationships among people, such as degree of intimacy, intensity of connection, and classification into groups. Since Social Networking Services (SNS) began drawing attention from millions of users, numerous studies have analyzed their user relationships and messages. Typical SNA methods are degree centrality, betweenness centrality, and closeness centrality. Degree centrality analysis does not consider the shortest path between nodes. However, the shortest path is a crucial factor in betweenness centrality, closeness centrality, and other SNA methods. In previous SNA research, computation time was not a major concern because the social networks were small. Unfortunately, most SNA methods require significant time to process the relevant data, which makes it difficult to apply them to the ever-increasing SNS data.
For instance, if the number of nodes in an online social network is n, the maximum number of links is n(n-1)/2. This makes analyzing the social network very expensive: for example, if the number of nodes is 10,000, the maximum number of links is 49,995,000. Therefore, we propose a heuristic-based method for finding the shortest path among users in the SNS user graph. Using this shortest path finding method, we show how efficient the proposed approach is by conducting betweenness centrality analysis and closeness centrality analysis, both of which are widely used in social network studies. Moreover, we devised an enhanced method that adds a best-first-search method and a preprocessing step to reduce computation time and to search rapidly for shortest paths in a very large online social network. The best-first-search method finds the shortest path heuristically, generalizing human experience. Because a large number of links are shared by only a few nodes in online social networks, most nodes have relatively few connections. As a result, a node with many connections functions as a hub node. When searching for a particular node, looking at users with numerous links first, instead of searching all users indiscriminately, has a better chance of finding the desired node quickly. In this paper, we employ the degree of a user node vn as the heuristic evaluation function in a graph G = (N, E), where N is a set of vertices and E is a set of links between two different nodes. When the heuristic evaluation function is used, the worst case can occur when the target node is situated at the bottom of a skewed tree. To remove such target nodes, a preprocessing step is conducted. Next, we find the shortest path between two nodes in the social network efficiently and then analyze the social network. To verify the proposed method, we crawled data on 160,000 people online and constructed a social network. 
We then compared the proposed method with previous methods, namely best-first-search and breadth-first-search, in terms of search and analysis time. The suggested method takes 240 seconds to search nodes, whereas the breadth-first-search-based method takes 1,781 seconds, making the suggested method 7.4 times faster. Moreover, for social network analysis, the suggested method is 6.8 times and 1.8 times faster in betweenness centrality analysis and closeness centrality analysis, respectively. The proposed method shows that it is possible to analyze a large social network with better time performance. As a result, our method would improve the efficiency of social network analysis, making it particularly useful in studying social trends or phenomena.
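
The degree-based heuristic described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `max_links` and `degree_best_first_path`, the adjacency-dictionary representation, and the toy graph in the usage note are all assumptions made for illustration, and the preprocessing step is omitted. The sketch expands the highest-degree node first, so it returns a path through hubs that is not guaranteed to be the shortest one.

```python
import heapq


def max_links(n):
    """Maximum number of undirected links among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def degree_best_first_path(graph, source, target):
    """Greedy best-first search that always expands the highest-degree node.

    `graph` is assumed to be a dict mapping each node to a list of neighbors.
    Returns a list of nodes from source to target, or None if unreachable.
    """
    counter = 0  # tie-breaker so the heap never has to compare node objects
    # heapq is a min-heap, so the degree is negated to pop hubs first.
    frontier = [(-len(graph[source]), counter, source)]
    parent = {source: None}  # also serves as the visited set
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                counter += 1
                heapq.heappush(
                    frontier, (-len(graph[neighbor]), counter, neighbor)
                )
    return None
```

For example, on a hypothetical graph where `"hub"` links to `"a"`, `"c"`, `"d"`, and `"e"`, a search from `"a"` to `"f"` (a neighbor of `"e"`) visits the hub before the low-degree node `"b"`. Calling `max_links(10000)` reproduces the 49,995,000 figure quoted in the abstract.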

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, thereby greatly influencing society. This phenomenon is unprecedented in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the conditions of data amount (volume), data input and output speed (velocity), and variety of data types (variety). Discovering the trend of an issue in SNS Big Data can serve as an important new source of value creation because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) it provides the topic keyword set that corresponds to the daily ranking; (2) it visualizes the daily time-series graph of a topic over the duration of a month; (3) it presents the importance of a topic through a treemap based on the score system and frequency; and (4) it visualizes the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single-node computing to thousands of machines. 
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; interaction with the data is easy, and the library is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of pre-configured style sheets and JavaScript plug-in libraries, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets collected in Korea during March 2013.
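
The daily keyword ranking and keyword time-series functions listed in this abstract can be approximated with a small sketch. It deliberately swaps in plain word-frequency counting for the paper's topic modeling, and it uses neither Hadoop nor MongoDB; the stop-word list, the `(day, text)` tweet format, and the function names `tokenize`, `daily_keyword_ranking`, and `keyword_time_series` are all hypothetical choices for illustration, not part of TITS.

```python
from collections import Counter, defaultdict

# Illustrative stop-word list only; a real system would use a fuller,
# language-specific list and a proper noun extractor.
STOP_WORDS = {"the", "a", "is", "in", "on", "of", "and", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words and non-words."""
    return [
        word
        for word in text.lower().split()
        if word.isalpha() and word not in STOP_WORDS
    ]


def daily_keyword_ranking(tweets, top_k=3):
    """Return the top_k most frequent keywords for each day.

    `tweets` is assumed to be an iterable of (day, text) pairs.
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        counts[day].update(tokenize(text))
    return {
        day: [word for word, _ in counter.most_common(top_k)]
        for day, counter in counts.items()
    }


def keyword_time_series(tweets, keyword):
    """Return a day -> occurrence count mapping for one keyword."""
    series = Counter()
    for day, text in tweets:
        series[day] += tokenize(text).count(keyword)
    return dict(sorted(series.items()))
```

Given a handful of `(day, text)` pairs, `daily_keyword_ranking(tweets, top_k=1)` yields the single most frequent keyword per day, and `keyword_time_series(tweets, "election")` yields the per-day counts that a time-series chart would plot.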