Search results for "web Graph" (220 results)

An Improved Automatic Text Summarization Based on Lexical Chaining Using Semantical Word Relatedness (단어 간 의미적 연관성을 고려한 어휘 체인 기반의 개선된 자동 문서요약 방법)

  • Cha, Jun Seok;Kim, Jeong In;Kim, Jung Min
    • Smart Media Journal
    • /
    • v.6 no.1
    • /
    • pp.22-29
    • /
    • 2017
  • Due to the rapid advancement and spread of smart devices, document data on the Internet is increasing sharply. This growth of Web information, including a massive number of documents, makes it increasingly difficult for users to digest the relevant data. To summarize documents efficiently, various studies on automatic summarization are under way. This study uses the TextRank algorithm, which represents sentences or keywords as a graph and estimates the importance of sentences from its vertices and edges, capturing semantic relations between words and sentences. It extracts high-ranking keywords and, based on those keywords, extracts important sentences. To do so, the algorithm first groups vocabulary using a specific weighting scale, selects the sentences that score highest on that scale, and from those extracts the important sentences that summarize the document. Experiments confirmed improved performance over the summarization methods reported in previous research, showing that the algorithm can summarize documents more effectively.
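
The graph-based scoring the abstract describes can be sketched as a minimal TextRank-style extractive summarizer: sentences are vertices, edge weights are normalized word overlap, and importance scores come from power iteration. This is an illustrative sketch, not the authors' implementation; the similarity function and damping factor are standard TextRank defaults, not values from the paper.

```python
import math

def similarity(s1, s2):
    """Normalized word overlap between two tokenized sentences (TextRank-style)."""
    w1, w2 = set(s1), set(s2)
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank_scores(sentences, d=0.85, iters=50):
    """Score sentences by power iteration over the sentence-similarity graph."""
    toks = [s.lower().split() for s in sentences]
    n = len(sentences)
    W = [[similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in W]  # total outgoing weight per vertex
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(W[j][i] / out_sum[j] * scores[j]
                                    for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return scores

def summarize(sentences, k=2):
    """Return the k highest-scoring sentences, kept in original order."""
    scores = textrank_scores(sentences)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]
```

Sentences sharing many words reinforce each other's scores, so well-connected sentences surface as the summary.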

Allocation Techniques for NVM-Based Fast Storage Considering Application Characteristics (응용의 특성을 고려한 NVM 기반 고속 스토리지의 배치 방안)

  • Kim, Jisun;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.4
    • /
    • pp.65-69
    • /
    • 2019
  • This paper presents an optimized adoption of NVM in the storage system that takes application characteristics into account. To do so, we first characterize the storage access patterns of different application types and make two prominent observations that can be exploited to allocate NVM storage efficiently. The first is that the bulk of I/O does not fall on a single storage partition; rather, it varies significantly across application categories. The second is that a large proportion of storage data is accessed only once. Based on these observations, we show that storage performance with NVM is maximized not by fixing it to a specific storage partition but by allocating it adaptively per application. Specifically, for graph, database, and web applications, using NVM as the swap, journal, and file system partition, respectively, performs well.
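
The adaptive policy described above maps each application category to the partition that NVM should back. A minimal sketch of that mapping, where the category names follow the abstract but the function itself and the default fallback are assumptions, not the authors' implementation:

```python
# Partition role that NVM should back, per application category, following
# the paper's observations (graph -> swap, database -> journal, web -> file
# system). The default for unlisted categories is an assumption.
NVM_ROLE_BY_CATEGORY = {
    "graph": "swap",        # graph workloads generate heavy swap I/O
    "database": "journal",  # databases are journal-write intensive
    "web": "filesystem",    # web apps mostly serve file reads
}

def nvm_partition_for(app_category, default="filesystem"):
    """Return the storage partition that should be placed on NVM."""
    return NVM_ROLE_BY_CATEGORY.get(app_category, default)
```

The point of the table is exactly the paper's claim: a fixed NVM partition assignment leaves performance on the table, while a per-category choice does not.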

Employee Discontent Text Analysis on an Anonymous Company Review Website and Suggestions for Resolving Discontent (기업 리뷰 웹 사이트 텍스트 분석을 통한 직원 불만 표현 추출과 불만 원인 도출 및 해소 방안)

  • Baek, HyeYeon;Park, Yongsuk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.4
    • /
    • pp.357-364
    • /
    • 2019
  • With insiders accounting for roughly 80% of industrial information leaks, most related studies briefly attribute the causes to dissatisfaction with salary or the human resources system. This paper scrapes text from Jobplanet, an anonymous company review website, and analyzes discontent keywords in seven related areas, along with their contexts, to uncover more detail behind the broad causes noted above. After drawing an LGG (Local Grammar Graph) for each area with a related dictionary list, the paper presents concordance examples as evidence and suggests several ways to prevent human resources leakage. Finally, the text analysis results are compared with previous survey-based research limited by fixed questions and answers. This study is meaningful in that it expands the scope of employee discontent analysis to company review text and yields more specific, granular, and honest discontent vocabulary.
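
The concordance evidence mentioned above is keyword-in-context extraction: each occurrence of a discontent keyword is shown with a few words of surrounding text. A minimal sketch, where the whitespace tokenization and window size are simplifying assumptions (the paper works on Korean text with dictionary lists):

```python
def concordance(text, keyword, window=3):
    """Return each occurrence of keyword with `window` words of left/right context."""
    tokens = text.split()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]    # words before the hit
            right = tokens[i + 1:i + 1 + window]   # words after the hit
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits
```

Lining hits up this way lets an analyst read the contexts in which a discontent word appears, which is how the paper derives causes from raw review text.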

Web based Fault Tolerance 3D Visualization of IoT Sensor Information (웹 기반 IoT 센서 수집 정보의 결함 허용 3D 시각화)

  • Min, Kyoung-Ju;Jin, Byeong-Chan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.1
    • /
    • pp.146-152
    • /
    • 2022
  • Information collected from temperature, humidity, inclination, and pressure sensors via a Raspberry Pi or Arduino is used in automated constant-temperature and constant-humidity systems. In the agricultural and livestock industries, such systems let workers control equipment remotely with only a smartphone. Typically, temperature and humidity are plotted as line graphs and their changes are monitored in real time. Intuitive visual representation of temperature has recently become familiar through the infrared devices used for COVID-19 fever screening. In this paper, information collected from a Raspberry Pi and a DHT11 sensor is used to predict temperature changes across a space through intuitive visualization, enabling an immediate response. To this end, an algorithm was created to visualize temperature and humidity effectively, and data representation remains possible even if some sensors are defective.
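
One way fault tolerance of the kind described above can work is to estimate a defective sensor's reading from its neighbors so the visualization stays continuous. The sketch below assumes the sensors form a 2D grid and uses a simple neighbor mean; the paper's actual interpolation for its 3D view may differ.

```python
def fill_defects(grid):
    """grid: 2D list of sensor readings; None marks a defective sensor.
    Returns a copy where each None is replaced by the mean of its
    up/down/left/right neighbors (left as None if no neighbor has data)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None:
                neigh = [grid[nr][nc]
                         for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= nr < rows and 0 <= nc < cols
                         and grid[nr][nc] is not None]
                out[r][c] = sum(neigh) / len(neigh) if neigh else None
    return out
```

Running this once per sampling cycle keeps the heat-map continuous even when a DHT11 unit drops out.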

Generating Pairwise Comparison Sets for Crowd Sourcing-based Deep Learning (크라우드 소싱 기반 딥러닝 선호 학습을 위한 쌍체 비교 셋 생성)

  • Yoo, Kihyun;Lee, Donggi;Lee, Chang Woo;Nam, Kwang Woo
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.27 no.5
    • /
    • pp.1-11
    • /
    • 2022
  • With the development of deep learning, research on estimating preference rankings through learning is under way, with applications in fields such as web search, gene classification, recommendation systems, and image search. Approximation algorithms are used for deep learning-based preference ranking; they build k or more comparisons over all comparison targets to ensure adequate accuracy, and how the comparison sets are built affects learning. In this paper, we propose two novel algorithms for generating pairwise comparison sets for crowdsourcing-based deep learning preference measurement: a k-disjoint comparison set generation algorithm and a k-chain comparison set generation algorithm. In particular, experiments confirmed that the k-chain algorithm, like the conventional circular generation algorithm, has a random character that supports stable preference evaluation while guaranteeing connectivity between data items.
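
The abstract does not spell out the k-chain construction, so the following is only one plausible reading of the idea: each of k passes shuffles the items and links consecutive ones, so every pass forms a connected chain (guaranteeing connectivity) while the shuffling supplies the random character, and each item ends up in roughly k comparisons. The paper's exact algorithm may differ.

```python
import random

def chain_pairs(items, k, seed=0):
    """Generate pairwise comparisons via k randomized chains over `items`.
    Each pass contributes len(items) - 1 pairs forming one connected chain."""
    rng = random.Random(seed)  # fixed seed for reproducible crowdsourcing batches
    pairs = []
    for _ in range(k):
        order = items[:]
        rng.shuffle(order)
        pairs.extend(zip(order, order[1:]))  # consecutive items form a chain
    return pairs
```

Because every pass touches every item, the union of the pairs is always a connected comparison graph, which is what makes downstream ranking estimation stable.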

A Study on the Semantic Modeling of Manufacturing Facilities based on Status Definition and Diagnostic Algorithms (상태 정의 및 진단 알고리즘 기반 제조설비 시멘틱 모델링에 대한 연구)

  • Kwang-Jin, Kwak;Jeong-Min, Park
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.1
    • /
    • pp.163-170
    • /
    • 2023
  • This paper introduces semantic modeling technology and a status definition algorithm for the autonomous control of manufacturing facilities. With the development of digital twin technology and the various ICT technologies of the smart factory, a new production management model is being built in the manufacturing industry. Building on advanced smart manufacturing technology, a status determination algorithm was previously presented as a methodology for quickly identifying and responding to problems with autonomous control and facilities in the factory; it reports error information to the user or administrator through a grid map and serves as a model for coping with faults. However, smart manufacturing is diversifying toward flexible production and production tailored to consumer needs. Accordingly, this paper introduces a technology that designs and builds a factory using a semantic, linked-list-based data structure and provides only the necessary information to users or managers through graph-based information, improving management efficiency. This methodology is suitable for flexible production and small-volume production of various product types.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS data satisfies the defining conditions of Big Data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as an important new source for the creation of new value, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over the duration of a month; (3) convey the importance of a topic through a treemap based on the scoring system and frequency; (4) visualize the daily time-series graph of keywords matching a searched keyword. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process the many unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data, because Hadoop is designed to scale up from single-node computing to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is especially attractive to the Big Data community because it helps analysts examine data easily and clearly. Therefore, TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; the resulting interaction is useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and detects issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on that basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets from Korea during March 2013.
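
The daily topic-keyword ranking behind function (1) can be illustrated with a much-reduced sketch: count keyword frequencies per day and keep the top-ranked words. The tokenization, stop-word list, and tweet format here are simplifying assumptions; TITS itself runs full Korean NLP preprocessing and topic modeling on Hadoop.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "is", "to", "rt"}  # placeholder stop-word list

def daily_rankings(tweets, top_n=3):
    """tweets: iterable of (date_str, text) pairs.
    Returns {date: [(keyword, count), ...]} with the top_n keywords per day."""
    per_day = defaultdict(Counter)
    for date, text in tweets:
        for word in text.lower().split():
            if word not in STOPWORDS:
                per_day[date][word] += 1
    return {d: c.most_common(top_n) for d, c in per_day.items()}
```

Feeding the per-day counts for one keyword into a plotting library is essentially what the d3.js time-series view renders.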

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia pacific journal of information systems
    • /
    • v.18 no.1
    • /
    • pp.79-96
    • /
    • 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, retrieval of similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results of an exact matching engine that queries the OWL (Web Ontology Language) MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes, intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. To use the MIT Process Handbook for process retrieval experiments, we exported it into an OWL-based format: we model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we needed a sizable number of queries and their corresponding correct answers from the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and similarity values between processes. To generate a semantics-preserving test data set, we create 20 variants of each target process that are syntactically different but semantically equivalent, using mutation operators. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on the values of process attributes and the structures of business processes.
We use simple text retrieval similarity algorithms such as TF-IDF and Levenshtein edit distance to devise our approaches, and utilize the tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider structural similarity between processes, such as part processes, goals, and exceptions. Since we can identify relationships between a semantic process and its subcomponents, this information can be utilized in calculating similarities between processes. Dice's coefficient and the Jaccard similarity measure are utilized to calculate the portion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms, measuring retrieval performance in terms of precision, recall, and the F-measure, the harmonic mean of precision and recall. Tree edit distance shows the poorest performance on all measures. TF-IDF and the method incorporating the TF-IDF measure and Levenshtein edit distance perform better than the other devised methods; these two measures focus on similarity between the names and descriptions of processes. In addition, we calculate the rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values within the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and their derivatives, show a greater coefficient than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, shows reasonably better performance across both experiments. For retrieving semantic processes, it therefore seems better to consider diverse aspects of process similarity, such as process structure and the values of process attributes.
We generate semantic process data and a dataset for retrieval experiments from the MIT Process Handbook repository. We suggest imprecise query algorithms that expand the retrieval results of an exact matching engine such as SPARQL, and we compare the retrieval performance of the similarity algorithms. As for limitations and future work, we need to perform experiments with datasets from other domains. And since there are many similarity values from diverse measures, we may find better ways to identify relevant processes by applying these values simultaneously.
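
Three of the measures named above have compact textbook forms: Jaccard and Dice coefficients over token sets, and Levenshtein edit distance over strings. The sketches below are those standard definitions, not the paper's exact weighting or combination (e.g. Lev-TFIDF-JaccardAll).

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def dice(a, b):
    """2|A ∩ B| / (|A| + |B|) over token sets."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0

def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]
```

Set-based measures like Jaccard and Dice compare which components two processes share, while edit distance compares how their names and descriptions are written, which is why the paper combines both families.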

Improved Social Network Analysis Method in SNS (SNS에서의 개선된 소셜 네트워크 분석 방법)

  • Sohn, Jong-Soo;Cho, Soo-Whan;Kwon, Kyung-Lag;Chung, In-Jeong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.117-127
    • /
    • 2012
  • Due to the recent expansion of Web 2.0-based services, along with the widespread adoption of smartphones, online social network services are being popularized among users. Online social network services are online community services that enable users to communicate with each other, share information, and expand human relationships. In social network services, each relation between users is represented by a graph consisting of nodes and links. As the users of online social network services increase rapidly, SNS are actively utilized in enterprise marketing, the analysis of social phenomena, and so on. Social Network Analysis (SNA) is the systematic way to analyze social relationships among the members of a social network using network theory. In general, a social network consists of nodes and arcs and is often depicted in a social network diagram, where nodes represent individual actors within the network and arcs represent relationships between the nodes. With SNA, we can measure relationships among people, such as degree of intimacy, intensity of connection, and classification into groups. Ever since Social Networking Services (SNS) drew the attention of millions of users, numerous studies have been made to analyze their user relationships and messages. The typical representative SNA methods are degree centrality, betweenness centrality, and closeness centrality. In degree centrality analysis, the shortest path between nodes is not considered; however, it is a crucial factor in betweenness centrality, closeness centrality, and other SNA methods. In previous SNA research, computation time was not prohibitive since the networks studied were small. Unfortunately, most SNA methods require significant time to process the relevant data, which makes it difficult to apply them to the ever-increasing SNS data in social network studies.
For instance, if the number of nodes in an online social network is n, the maximum number of links in the network is n(n-1)/2. This makes the analysis expensive: if the number of nodes is 10,000, the number of links can be 49,995,000. Therefore, we propose a heuristic-based method for finding the shortest path among users in the SNS user graph. Using this shortest-path finding method, we show how efficient our proposed approach can be by conducting betweenness centrality analysis and closeness centrality analysis, both of which are widely used in social network studies. Moreover, we devised an enhanced method that adds a best-first search and a preprocessing step to reduce computation time and rapidly find shortest paths in a huge online social network. The best-first search method finds the shortest path heuristically, generalizing human experience. As a large number of links is concentrated on only a few nodes in online social networks, most nodes have relatively few connections; as a result, a node with many connections functions as a hub. When searching for a particular node, examining users with numerous links instead of searching all users indiscriminately has a better chance of finding the desired node quickly. In this paper, we employ the degree of a user node vn as the heuristic evaluation function in a graph G = (N, E), where N is a set of vertices and E is a set of links between pairs of distinct nodes. With this heuristic evaluation function, the worst case occurs when the target node sits at the bottom of a skewed tree; a preprocessing step is conducted to handle such target nodes. We then find the shortest path between two nodes in the social network efficiently and analyze the network. For verification of the proposed method, we crawled 160,000 people online and constructed their social network.
We then compared the proposed method with previous methods, best-first search and breadth-first search, in search and analysis time. The suggested method takes 240 seconds to search for nodes, whereas the breadth-first-search-based method takes 1,781 seconds, making the suggested method 7.4 times faster. Moreover, for social network analysis, the suggested method is 6.8 times and 1.8 times faster in betweenness centrality analysis and closeness centrality analysis, respectively. The method proposed in this paper shows the possibility of analyzing a large social network with better time performance. As a result, our method would improve the efficiency of social network analysis, making it particularly useful in studying social trends or phenomena.
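
The degree-based heuristic described above can be sketched as a best-first search that always expands the highest-degree (hub) node on the frontier, on the assumption that hubs reach the target faster in a scale-free social graph. This illustrates only the heuristic itself, not the authors' full method, which also includes the preprocessing step for skewed-tree worst cases.

```python
import heapq

def best_first_path(graph, start, goal):
    """graph: {node: set(neighbor_nodes)}. Returns a path list or None.
    The frontier is a priority queue ordered by negative degree, so
    high-degree hub nodes are expanded first."""
    frontier = [(-len(graph[start]), start, [start])]
    seen = {start}
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier,
                               (-len(graph[nxt]), nxt, path + [nxt]))
    return None
```

Note the trade-off the abstract implies: unlike breadth-first search, this search is not guaranteed to return the true shortest path, but on hub-dominated social graphs it reaches the target after expanding far fewer nodes.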

Marketing Standardization and Firm Performance in International E-Commerce (国际电子商务中的营销标准化和公司表现)

  • Fritz, Wolfgang;Dees, Heiko
    • Journal of Global Scholars of Marketing Science
    • /
    • v.19 no.3
    • /
    • pp.37-48
    • /
    • 2009
  • The standardization of marketing has been one of the most discussed research topics in international marketing. The term "global marketing" was often used to mean an internationally standardized marketing strategy based on similarities between foreign markets. Marketing standardization was originally discussed only within the context of traditional physical marketplaces. Since then, the digital "marketspace" of the Internet emerged in the 1990s and became one of the most important drivers of the globalization process, opening new opportunities for the standardization of global marketing activities. On the other hand, the opinion that greater adoption of the Internet by customers may lead to a higher degree of customization and differentiation of products, rather than standardization, is also quite popular. Given this disagreement, it is notable that comprehensive studies focusing on marketing standardization in the context of global e-commerce are largely missing. Against this background, the two basic research questions addressed in this study are: (1) To what extent do companies standardize their marketing in international e-commerce? (2) Does marketing standardization affect the performance (or success) of these companies? The following research hypotheses were generated based on a literature review: H1: Internationally engaged e-commerce firms show a growing readiness for marketing standardization. H2: Marketing standardization exerts positive effects on the success of companies in international e-commerce. H3: In international e-commerce, marketing mix standardization exerts a stronger positive effect on the economic as well as the non-economic success of companies than marketing process standardization. H4: The higher the non-economic success of international e-commerce firms, the higher the economic success.
The data for this research were obtained from a questionnaire survey conducted from February to April 2005. The survey included international e-commerce companies from various industries in Germany and all subsidiaries or headquarters of foreign e-commerce companies based in Germany; 118 of 801 companies responded. For structural equation modelling (SEM), the Partial Least Squares (PLS) approach in the version PLS-Graph 3.0 was applied (Chin 1998a; 2001). All four research hypotheses were supported by the data analysis. The results show that companies engaged in international e-commerce standardize, in particular, brand name, web page design, product positioning, and the product program to a high degree. The companies intend to intensify their efforts toward marketing mix standardization in the future, and they also want to standardize their marketing processes to a higher degree, especially in information systems, corporate language, and online marketing control procedures. In this study, marketing standardization exerts a positive overall impact on company performance in international e-commerce. Standardization of the marketing mix exerts a stronger positive impact on non-economic success than standardization of marketing processes, which in turn contributes slightly more strongly to economic success. Furthermore, our findings clearly support the assumption that non-economic success is highly relevant to the economic success of the firm in international e-commerce. The empirical findings indicate that marketing standardization is relevant to companies' success in international e-commerce, but marketing mix and marketing process standardization contribute to firms' economic and non-economic success in different ways. The findings indicate that companies do standardize numerous elements of their marketing mix on the Internet.
This practice is in part contrary to the popular concept of "differentiated standardization", which argues that some elements of the marketing mix should be adapted locally and others standardized internationally. Furthermore, the findings suggest that the overall standardization of marketing, rather than the standardization of any one particular marketing mix element, is what brings about a positive overall impact on success.
