• Title/Summary/Keyword: graph searching

91 search results

Effect of Carrot Intake in the Prevention of Gastric Cancer: A Meta-Analysis

  • Fallahzadeh, Hossein; Jalali, Ali; Momayyezi, Mahdieh; Bazm, Soheila
    • Journal of Gastric Cancer / v.15 no.4 / pp.256-261 / 2015
  • Purpose: Gastric cancer is the third leading cause of cancer-related mortality, with both incidence and mortality being higher in men than in women. Various studies have suggested that eating carrots may play a major role in the prevention of gastric cancer. We conducted a meta-analysis to determine the relationship between carrot consumption and gastric cancer. Materials and Methods: We searched multiple databases, including PubMed, the Cochrane Library, Scopus, ScienceDirect, and the Persian databases Scientific Information Database (SID) and IranMedex. The following search terms were used: stomach or gastric, neoplasm or cancer, carcinoma or tumor, and carrot. Statistical analyses were performed using Comprehensive Meta-Analysis 2.0 software. Results: We retrieved 81 articles by searching the databases. After applying the inclusion and exclusion criteria, 5 articles were included in this study. The odds ratio (OR) obtained by a fixed-effects model showed that consumption of carrots was associated with a 26% reduction in the risk of gastric cancer (OR=0.74; 95% confidence interval=0.68~0.81; P<0.0001). The funnel plot showed no evidence of publication bias in this study. Conclusions: The findings of this study showed an inverse relationship between the consumption of carrots and the risk of gastric cancer.
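The fixed-effects pooling described in this abstract combines per-study log odds ratios weighted by inverse variance. A minimal sketch of that computation in Python, using made-up illustrative study values rather than the five studies actually pooled in the paper:

```python
import math

# Hypothetical per-study odds ratios with 95% CIs (illustrative only,
# not the five studies included in the meta-analysis).
studies = [
    (0.70, 0.55, 0.89),
    (0.80, 0.62, 1.03),
    (0.72, 0.60, 0.86),
]

num = den = 0.0
for or_, lo, hi in studies:
    log_or = math.log(or_)
    # Standard error recovered from the CI width: (ln(hi) - ln(lo)) / (2 * 1.96)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    w = 1.0 / se**2          # inverse-variance weight
    num += w * log_or
    den += w

pooled = num / den
pooled_se = math.sqrt(1.0 / den)
print(f"pooled OR = {math.exp(pooled):.2f}")
print(f"95% CI = {math.exp(pooled - 1.96 * pooled_se):.2f}"
      f"~{math.exp(pooled + 1.96 * pooled_se):.2f}")
```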

RDBMS Based Efficient Method for Shortest Path Searching Over Large Graphs Using K-degree Index Table (대용량 그래프에서 k-차수 인덱스 테이블을 이용한 RDBMS 기반의 효율적인 최단 경로 탐색 기법)

  • Hong, Jihye; Han, Yongkoo; Lee, Young-Koo
    • KIPS Transactions on Software and Data Engineering / v.3 no.5 / pp.179-186 / 2014
  • Networks such as social networks, web page links, and traffic networks are big data with large numbers of nodes and edges, and many applications such as social network services and navigation systems use them. Since big networks do not fit into memory, existing in-memory analysis techniques cannot provide high performance. The Frontier-Expansion-Merge (FEM) framework performs graph search operations using three corresponding operators in the relational database (RDB) context. FEM exploits an index table that stores pre-computed partial paths for efficient shortest path discovery. However, the index table of FEM has a low hit ratio because the indexed nodes are chosen by their distances rather than by their likelihood of lying on a shortest path. In this paper, we propose a method that constructs the index table from high-degree nodes, which have a high hit ratio, for efficient shortest path discovery. We experimentally verify that our index technique can support shortest path discovery efficiently on real-world datasets.
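The key idea above is that a partial-path index keyed on high-degree nodes is hit far more often than one keyed on distance. A minimal in-memory sketch of that idea, using the top-k degree nodes as landmarks with precomputed BFS distances; the names and structure here are illustrative and stand in for the paper's RDB operators:

```python
from collections import deque
import networkx as nx

def bfs_distances(graph, source):
    """Plain BFS hop distances from one node."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def build_hub_index(graph, k):
    """Precompute BFS distances from the k highest-degree nodes,
    mirroring the paper's choice of high-degree nodes for the index."""
    hubs = sorted(graph.nodes, key=graph.degree, reverse=True)[:k]
    return {h: bfs_distances(graph, h) for h in hubs}

def hub_path_bound(index, s, t):
    """Upper bound on d(s, t) through any indexed hub; it is exact
    whenever some shortest path passes through a hub, which is the
    high-hit-ratio case the paper targets."""
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best

g = nx.barabasi_albert_graph(1000, 3, seed=1)   # hub-heavy test graph
index = build_hub_index(g, k=10)
print(hub_path_bound(index, 0, 999), nx.shortest_path_length(g, 0, 999))
```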

A Method of Test Case Generation using BPMN-based Model Reduction for Service System (BPMN기반의 모델 단축을 이용한 서비스 시스템의 테스트 케이스 생성 기법)

  • Lee, Seung-Hoon; Kang, Dong-Su; Song, Chee-Yang; Baik, Doo-Kwon
    • The KIPS Transactions: Part D / v.16D no.4 / pp.595-612 / 2009
  • Early testing can greatly reduce the cost of error correction in system development, and it remains important in SOA-based service systems. However, the existing methods of test case generation for SOA are limited to web services using XML. Therefore, this paper proposes a method of test case generation using BPMN-based model reduction for service systems. To minimize test effort, an existing BPM is transformed into an S-BPM composed of the basic elements of a workflow. The process of test case generation starts by building the S-BPM for the target service system and transforming it into a directed graph. We then generate several service scenarios by applying a scenario-search algorithm and extract message flow information. With this method, we can obtain effective test cases that are not limited to web services, and the generated test cases reflect the business-driven nature of SOA.
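The pipeline described above (service model, then directed graph, then scenario search) can be illustrated with a toy path-enumeration step. A hedged sketch with a hypothetical mini-workflow standing in for a real reduced S-BPM model:

```python
import networkx as nx

# Hypothetical S-BPM reduced to a directed graph: nodes are workflow
# elements, edges are control/message flow (illustrative, not the
# paper's actual reduction rules).
g = nx.DiGraph()
g.add_edges_from([
    ("start", "receive_order"),
    ("receive_order", "check_stock"),
    ("check_stock", "reject"),          # out-of-stock branch
    ("check_stock", "charge_payment"),  # in-stock branch
    ("charge_payment", "ship"),
    ("ship", "end"),
    ("reject", "end"),
])

# Each simple path from start to end is one service scenario,
# i.e., one test case skeleton.
for i, path in enumerate(nx.all_simple_paths(g, "start", "end"), 1):
    print(f"test case {i}: " + " -> ".join(path))
```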

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong; Jun, Seung-Pyo; Seo, Jinny
    • Journal of Intelligence and Information Systems / v.27 no.4 / pp.73-95 / 2021
  • This study uses the Node2vec graph embedding method and LightGBM link prediction to explore untapped export candidate countries for Korea's food and beverage industry. Node2vec improves on the representation of structural equivalence in a network, which is known to be relatively weak in existing link prediction methods based on the number of common neighbors; it is therefore known to perform well at both community detection and capturing structural equivalence. The embedding vectors are obtained from fixed-length walks starting at designated nodes, so node sequences are easy to feed as input to downstream models such as Logistic Regression, Support Vector Machines, and Random Forests. Based on these features of Node2vec, this study applies the method to international trade information for the Korean food and beverage industry, aiming to contribute to extensive-margin diversification for Korea within the industry's global value chain. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86, outperforming the Logistic Regression binary classifier set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. The LightGBM-based optimal model also outperformed the link prediction model from previous studies used here as a benchmark: the previous model recorded a recall of only 0.75, whereas the proposed model achieved a recall of 0.79. This performance difference comes from the model training strategy. In this study, trades were grouped by trade value scale, and prediction models were trained differently for these groups. Specifically, we compared (1) randomly masking trades across all transactions without any condition on trade value, (2) randomly masking some trades with an above-average trade value, and (3) randomly masking some trades in the top 25% by trade value. The experiments confirmed that the model trained by randomly masking some above-average-value trades performed best and most stably. Additional investigation showed that most of the potential export candidates for Korea identified by this model were appropriate. Taken together, this study demonstrates the practical utility of link prediction with Node2vec and LightGBM, and it yields useful implications for weight-update strategies that improve link prediction during model training.
On the other hand, this study also has policy utility, because link prediction based on graph embedding has rarely been applied to trade transactions. The results support a rapid response to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and we believe the approach is sufficiently useful as a tool for policy decision-making.
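A minimal sketch of the embedding-plus-classifier pipeline this abstract describes, assuming the `node2vec` and `lightgbm` Python packages are available; the toy random graph, the value attribute, and the masking of above-average-value edges are all illustrative stand-ins, since the study's trade data is not reproduced here:

```python
import random
import networkx as nx
import numpy as np
from node2vec import Node2Vec
from lightgbm import LGBMClassifier

# Toy trade network: nodes are countries, edges carry a trade value
# (an illustrative stand-in for the real bilateral trade data).
g = nx.gnm_random_graph(60, 300, seed=7)
for u, v in g.edges:
    g[u][v]["value"] = random.lognormvariate(0, 1)

# Loosely following masking strategy (2)/(3) above: hide some
# above-average-value edges and treat them as positive test links.
mean_val = np.mean([d["value"] for _, _, d in g.edges(data=True)])
big = [(u, v) for u, v, d in g.edges(data=True) if d["value"] >= mean_val]
pos = random.sample(big, 30)
g.remove_edges_from(pos)
neg = []
while len(neg) < 30:                    # equal number of non-edges
    u, v = random.sample(list(g.nodes), 2)
    if u != v and not g.has_edge(u, v):
        neg.append((u, v))

# Node2vec embedding of the masked graph.
n2v = Node2Vec(g, dimensions=32, walk_length=20, num_walks=50, quiet=True)
model = n2v.fit(window=5, min_count=1)
emb = lambda n: model.wv[str(n)]

# Hadamard edge features, then a LightGBM binary classifier.
X = np.array([emb(u) * emb(v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))
clf = LGBMClassifier(n_estimators=200).fit(X, y)
print("train accuracy:", clf.score(X, y))
```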

Finding Rectilinear (L1), Link Metric, and Combined Shortest Paths with an Intelligent Search Method (지능형 최단 경로, 최소 꺾임 경로 및 혼합형 최단 경로 찾기)

  • Im, Jun-Sik
    • The Transactions of the Korea Information Processing Society / v.3 no.1 / pp.43-54 / 1996
  • This paper presents new heuristic search algorithms for finding rectilinear (L1), link metric, and combined shortest paths in the presence of orthogonal obstacles. The GMD (Guided Minimum Detour) algorithm combines the best features of maze-running algorithms and line-search algorithms. The LGMD (Line-by-line Guided Minimum Detour) algorithm is a modification of the GMD algorithm that improves efficiency using line-by-line extensions. Our GMD and LGMD algorithms always find a rectilinear shortest path using the guided A* search method, without constructing a connection graph that contains a shortest path. The GMD and LGMD algorithms can be implemented in O(m + e log e + N log N) and O(e log e + N log N) time, respectively, and O(e + N) space, where m is the total number of searched nodes, e is the number of boundary sides of obstacles, and N is the total number of searched line segments. Based on the LGMD algorithm, we consider not only the problem of finding a link metric shortest path in terms of the number of bends, but also the combined L1 metric and link metric shortest path in terms of both length and number of bends.
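The combined L1/link-metric objective above can be illustrated with a small A* search on a grid whose cost mixes path length and bend count, guided by the Manhattan (L1) heuristic. This is a generic sketch of the objective, not the GMD/LGMD line-search machinery:

```python
import heapq
from itertools import count

def combined_shortest_path(grid, start, goal, bend_cost=1.0):
    """A* on a 4-connected grid with cost = steps + bend_cost * bends.
    The Manhattan (L1) distance guides the search. grid[r][c] == 1
    marks an obstacle. Generic sketch, not the paper's algorithm."""
    rows, cols = len(grid), len(grid[0])
    h = lambda r, c: abs(r - goal[0]) + abs(c - goal[1])
    tie = count()  # tiebreaker so the heap never compares direction states
    # A state is (cell, incoming direction) so bends can be charged.
    frontier = [(h(*start), 0.0, next(tie), start, None)]
    best = {}
    while frontier:
        _, g, _, (r, c), d = heapq.heappop(frontier)
        if (r, c) == goal:
            return g
        if best.get(((r, c), d), float("inf")) <= g:
            continue
        best[((r, c), d)] = g
        for nd in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + nd[0], c + nd[1]
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                ng = g + 1 + (bend_cost if d not in (None, nd) else 0)
                heapq.heappush(frontier, (ng + h(nr, nc), ng, next(tie), (nr, nc), nd))
    return None

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(combined_shortest_path(grid, (0, 0), (2, 3)))  # 5 steps + 1 bend -> 6.0
```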


Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong; Son, Min-Young; Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems / v.7 no.9 / pp.219-226 / 2018
  • One systematic way to generate all possible cases of a combination is to construct a combination tree, whose time complexity is O(2^n). Combination trees are used for various purposes, such as the graph homogeneity problem and the initial model for computing frequent item sets. However, algorithms that must search all cases of a combination are impractical because of this high time complexity. Nevertheless, as data grows larger and more studies seek to exploit it, the need to search all possible cases keeps increasing. Recently, as GPU environments have become popular and easily accessible, various attempts have been made to reduce running time by parallelizing algorithms whose time complexity is high in a serial environment. Because generating all cases of a combination is inherently sequential and the sub-task sizes are skewed, it is not directly suitable for parallel implementation; the efficiency of parallel algorithms is maximized when all threads have tasks of similar size. In this paper, we propose a method for efficient collaboration between the CPU and GPU to parallelize the problem of generating all cases. To evaluate the performance of the proposed algorithm, we analyze its theoretical time complexity and compare its running time with other algorithms in CPU and GPU environments. Experimental results show that the proposed CPU-GPU collaboration algorithm maintains a balance between the execution times of the CPU and GPU compared to previous algorithms, and that its execution time improves remarkably as the number of elements increases.
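The load-balancing concern above (every thread should receive a similar-size slice of the C(n, k) combinations, rather than a skewed subtree) can be sketched by unranking: each worker converts its contiguous range of lexicographic indices directly into combinations. A CPU-only illustration of the partitioning, with the GPU side out of scope here:

```python
from math import comb

def unrank_combination(n, k, rank):
    """Return the k-combination of range(n) at lexicographic
    position `rank` (0-based), without enumerating predecessors."""
    combo, start = [], 0
    for slots in range(k, 0, -1):
        for x in range(start, n):
            block = comb(n - x - 1, slots - 1)  # combos starting with x
            if rank < block:
                combo.append(x)
                start = x + 1
                break
            rank -= block
    return combo

def worker_slices(n, k, workers):
    """Split the C(n, k) index space into near-equal contiguous ranges,
    one per worker (a CPU core or GPU block in the paper's setting)."""
    total = comb(n, k)
    step = -(-total // workers)  # ceiling division
    return [(w * step, min((w + 1) * step, total)) for w in range(workers)]

n, k = 10, 4
for lo, hi in worker_slices(n, k, workers=4):
    first = unrank_combination(n, k, lo)
    print(f"indices [{lo}, {hi}) start at {first}, {hi - lo} combos")
```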

Dynamic Priority Search Algorithm Of Multi-Agent (멀티에이전트의 동적우선순위 탐색 알고리즘)

  • Jin-Soo Kim
    • The Journal of Engineering Research / v.6 no.2 / pp.11-22 / 2004
  • A distributed constraint satisfaction problem (distributed CSP) is a constraint satisfaction problem (CSP) in which variables and constraints are distributed among multiple automated agents. A CSP is the problem of finding a consistent assignment of values to variables. Even though the definition of a CSP is very simple, a surprisingly wide variety of AI problems can be formalized as CSPs. Similarly, various application problems in DAI (Distributed AI) that involve finding a consistent combination of agent actions can be formalized as distributed CSPs. In recent years, many new backtracking algorithms for solving distributed CSPs have been proposed, but most share a common drawback: they assume that the priority order of the agents is static. In this paper, we establish a basic algorithm for solving distributed CSPs, called the dynamic priority search algorithm, that is more efficient than common backtracking algorithms in which the priority order is static. In this algorithm, agents act asynchronously and concurrently based on their local knowledge without any global control, and they form a flexible organization in which the hierarchical order changes dynamically, while the completeness of the algorithm is guaranteed. We show that the dynamic priority search algorithm can solve problems such as the distributed 200-queens problem and the distributed graph-coloring problem, which common backtracking algorithms fail to solve within a reasonable amount of time. The experimental results on example problems show that this algorithm is far more efficient than backtracking with a static priority order. The priority order represents a hierarchy of agent authority, i.e., the priority of decision-making, so these results imply that a flexible agent organization, in which the hierarchical order changes dynamically, actually performs better than one in which the hierarchical order is static and rigid. Furthermore, we describe how an agent can hold multiple variables in the search scheme.
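The core idea above (reorder agent priority dynamically instead of fixing it) can be illustrated with a sequential stand-in: backtracking graph coloring in which the next variable chosen is always the currently most-constrained one, so the ordering shifts as the search proceeds. This sketches dynamic ordering only, not the paper's asynchronous multi-agent protocol:

```python
import networkx as nx

def color_dynamic_priority(graph, n_colors):
    """Backtracking coloring; at each step the uncolored node with the
    fewest remaining consistent colors is tried first, so the priority
    order changes dynamically as assignments are made."""
    assignment = {}

    def options(node):
        used = {assignment[v] for v in graph[node] if v in assignment}
        return [c for c in range(n_colors) if c not in used]

    def solve():
        unassigned = [n for n in graph if n not in assignment]
        if not unassigned:
            return True
        node = min(unassigned, key=lambda n: len(options(n)))  # dynamic priority
        for c in options(node):
            assignment[node] = c
            if solve():
                return True
            del assignment[node]
        return False

    return assignment if solve() else None

g = nx.petersen_graph()
print(color_dynamic_priority(g, 3))  # the Petersen graph is 3-colorable
```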


Personalized Session-based Recommendation for Set-Top Box Audience Targeting (셋톱박스 오디언스 타겟팅을 위한 세션 기반 개인화 추천 시스템 개발)

  • Jisoo Cha; Koosup Jeong; Wooyoung Kim; Jaewon Yang; Sangduk Baek; Wonjun Lee; Seoho Jang; Taejoon Park; Chanwoo Jeong; Wooju Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.323-338 / 2023
  • TV advertising based on deep analysis of audience watching patterns is important for set-top box audience targeting. Previous studies have shown the effectiveness of applying session-based recommendation models (SBR) to Internet commerce and of recommendation based on users' search histories, but applying SBR to TV advertising has been difficult in South Korea due to data unavailability. Traditional SBR also has limitations in handling user preferences, especially for data that carries user identification information. To tackle these problems, we first obtained set-top box data from three major broadcasting companies in South Korea (SKB, KT, LGU+) through collaboration with the Korea Broadcast Advertising Corporation (KOBACO); the data contains the viewing sequences of 4,847 anonymized users over six months. Second, we developed a personalized session-based recommendation model to handle the hierarchical user-session-item structure of the data. Experiments were conducted on the set-top box audience dataset and on two other public datasets for validation. As a result, our proposed model outperformed the baseline model on several criteria.
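A minimal sketch of a personalized SBR that combines a user embedding with a GRU over the session's item sequence, covering the user-session-item hierarchy mentioned above. The architecture and dimensions are hypothetical, not the authors' exact model:

```python
import torch
import torch.nn as nn

class PersonalizedSBR(nn.Module):
    """User embedding + GRU over session items -> next-item scores.
    A hedged sketch of the user-session-item hierarchy, not the
    authors' published architecture."""
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, n_items)

    def forward(self, user_ids, session_items):
        # session_items: (batch, seq_len) item ids watched in order
        seq = self.item_emb(session_items)
        _, h = self.gru(seq)                       # session summary
        z = torch.cat([h[-1], self.user_emb(user_ids)], dim=-1)
        return self.out(z)                         # scores over all items

model = PersonalizedSBR(n_users=4847, n_items=5000)
users = torch.tensor([0, 1])
sessions = torch.randint(0, 5000, (2, 10))         # two sessions of 10 items
print(model(users, sessions).shape)                # torch.Size([2, 5000])
```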

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan; Jeon, Ho-Cheol; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck; in fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user's query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order, and current search tools cannot retrieve documents related to an already retrieved document from the gigantic mass of documents. The most important problem for many current search systems is thus to increase the quality of search: to provide related documents and to keep the number of unrelated documents in the results as low as possible. To address this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A citation index indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. In this scheme, the references contained in academic articles give credit to previous work in the literature and provide a link between the "citing" and "cited" articles; a citation index indexes the citations that an article makes, linking the articles with the cited works. Citation indexes were originally designed mainly for information retrieval, and the citation links allow navigating the literature in unique ways: papers can be located independently of language and of the words in the title, keywords, or document text, and the index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article). However, CiteSeer cannot index links that researchers do not make explicitly, because it only indexes the links researchers create when they cite other articles, and for the same reason it does not scale easily. All these problems motivate the design of a more effective search system. This paper presents a method that extracts a subject and predicate from each sentence in a document. A document is converted into a tabular form in which each extracted predicate is checked against its possible subjects and objects. We build a hierarchical graph of a document using this table and then integrate the graphs of multiple documents. From the graph of the entire document set, the area of each document is computed relative to the integrated documents, and the relations among documents are marked by comparing these areas. We also propose a method for the structural integration of documents that retrieves documents from the graph, making it easier for users to find information. We compared the performance of the proposed approach with the Lucene search engine using ranking formulas.
As a result, the F-measure is about 60%, an improvement of about 15%.
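The extraction-to-hierarchy pipeline above can be illustrated with a tiny formal-context sketch in the spirit of FCA: documents described by extracted (subject, predicate) pairs, concepts computed by closure, and a hierarchy given by extent inclusion. The toy context and attribute pairs are hypothetical, not the paper's parser output:

```python
from itertools import combinations

# Hypothetical formal context: documents x (subject, predicate)
# attributes extracted from their sentences.
context = {
    "doc1": {("web", "grow"), ("index", "search")},
    "doc2": {("index", "search"), ("citation", "link")},
    "doc3": {("citation", "link")},
}
docs = list(context)

def common_attrs(subset):
    """Attributes shared by every document in the subset."""
    sets = [context[d] for d in subset]
    return set.intersection(*sets)

# Enumerate formal concepts (extent, intent) by closing every subset.
concepts = set()
for r in range(1, len(docs) + 1):
    for subset in combinations(docs, r):
        intent = common_attrs(subset)
        extent = frozenset(d for d in docs if intent <= context[d])
        concepts.add((extent, frozenset(intent)))

# The concept lattice orders concepts by extent inclusion; this
# hierarchy is what lets related documents be retrieved together.
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), "share", sorted(intent))
```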

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan; Han, Nam-Gi; Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, greatly influencing society. This is an unmatched phenomenon in history, and we now live in the Age of Big Data. SNS data satisfies the conditions of Big Data in the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). If the trend of an issue can be discovered in SNS Big Data, that information can serve as an important new source for creating value, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over the duration of a month; (3) indicate the importance of a topic through a treemap based on a scoring system and frequency; (4) visualize the daily time-series graph of keywords found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires up-to-date big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data, because Hadoop is designed to scale from single-node computing to thousands of machines. Furthermore, we use MongoDB, an open-source, document-oriented NoSQL database that provides high performance, high availability, and automatic scaling; unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly, so TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to data; interaction with the data is easy, and it is useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed with these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS); based on this, we confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
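The topic-extraction step TITS performs can be sketched with gensim's LDA on a handful of toy pre-tokenized tweets; the real system's Korean-specific preprocessing, Hadoop-scale ingestion, and MongoDB storage are out of scope here:

```python
from gensim import corpora, models

# Toy pre-tokenized tweets; the real pipeline does stop-word removal
# and noun extraction on Korean text before this step.
tweets = [
    ["election", "vote", "candidate"],
    ["vote", "poll", "election"],
    ["movie", "release", "actor"],
    ["actor", "award", "movie"],
]

dictionary = corpora.Dictionary(tweets)
corpus = [dictionary.doc2bow(t) for t in tweets]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)

# Daily topic keyword sets, as in TITS function (1).
for topic_id in range(2):
    print(topic_id, lda.show_topic(topic_id, topn=3))
```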