• Title/Summary/Keyword: Indexing Process

Search Results: 174

A Distributed Spatial Indexing Technique based on Hilbert Curve and MBR for k-NN Query Processing in a Single Broadcast Channel Environment (단일방송채널환경에서 k-최근접질의 처리를 위한 힐버트 곡선과 최소영역 사각형 기반의 분산 공간 인덱싱 기법)

  • Yi, Jung-Hyung;Jung, Sung-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.4
    • /
    • pp.203-208
    • /
    • 2010
  • This paper presents an efficient index scheduling technique based on the Hilbert curve and MBRs for k-NN queries in a single wireless broadcast channel environment. Previous approaches suffer from two major problems: they take a long time to process queries due to back-tracking, and they download too much spatial data because they cannot reduce the search space quickly. Our proposed method broadcasts spatial data in Hilbert curve order, and a distributed index table is broadcast with each spatial data item. Each entry of the index table represents an MBR that groups spatial data. By predicting the unknown locations of spatial data, the proposed index scheme allows mobile clients to prune unnecessary data and reduce the search space rapidly. As a result, our method decreases both tuning time and access latency.
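The broadcast order above relies on mapping 2-D points to positions along a Hilbert curve. A minimal sketch of that mapping (a standard xy-to-index computation, not the authors' implementation; the sample points are illustrative):

```python
# Map a point to its distance along the Hilbert curve, then sort spatial
# objects by that distance to obtain a broadcast schedule.

def hilbert_index(n, x, y):
    """Distance of point (x, y) along the Hilbert curve filling an
    n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                     # rotate the quadrant
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# broadcast spatial objects in Hilbert-curve order
points = [(1, 1), (2, 3), (0, 3), (3, 0)]
schedule = sorted(points, key=lambda p: hilbert_index(4, *p))
```

Clients listening to the channel then see nearby objects close together in time, which is what makes MBR-based pruning of the remaining broadcast effective.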

An Automatic LOINC Mapping Framework for Standardization of Laboratory Codes in Medical Informatics (의료 정보 검사코드 표준화를 위한 LOINC 자동 매핑 프레임웍)

  • Ahn, Hoo-Young;Park, Young-Ho
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.8
    • /
    • pp.1172-1181
    • /
    • 2009
  • An electronic medical record (EMR) is a medical system in which all tests are recorded as text data. However, domestic EMR systems store medical records in various forms. Many studies have attempted to standardize laboratory codes to LOINC (Logical Observation Identifiers Names and Codes), but existing approaches resolve the mapping manually, which does not work when the amount of data is enormous. Moreover, they use file systems, which are not suited to large volumes of medical data. This paper proposes a novel automatic LOINC mapping algorithm that uses indexing techniques and semantic similarity analysis of medical information. Unlike previous studies, which only proposed algorithms, we designed and implemented the mapping algorithm for the standardization of laboratory codes in medical informatics. Search terms are generated automatically. Moreover, we implemented a medical search framework based on a database system designed for large volumes of medical data.
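The core matching step can be illustrated with a toy sketch of automatic code mapping (the candidate names below are abbreviated illustrations, not authoritative LOINC records, and Jaccard token similarity stands in for the paper's semantic similarity analysis):

```python
# Map a free-text local lab test name to the most similar candidate code.

LOINC_CANDIDATES = {
    "2345-7": "glucose mass volume in serum or plasma",
    "718-7": "hemoglobin mass volume in blood",
    "2160-0": "creatinine mass volume in serum or plasma",
}

def tokens(name):
    return set(name.lower().split())

def map_to_loinc(local_name, candidates=LOINC_CANDIDATES):
    """Return (best_matching_code, similarity) for a local lab test name."""
    t = tokens(local_name)
    best_code, best_score = None, -1.0
    for code, name in candidates.items():
        c = tokens(name)
        score = len(t & c) / len(t | c)   # Jaccard similarity
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score

code, score = map_to_loinc("serum glucose")
```

An index over candidate tokens (rather than the linear scan shown) is what makes this viable at the data volumes the paper targets.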


Development of Extracting System for Meaning·Subject Related Social Topic using Deep Learning (딥러닝을 통한 의미·주제 연관성 기반의 소셜 토픽 추출 시스템 개발)

  • Cho, Eunsook;Min, Soyeon;Kim, Sehoon;Kim, Bonggil
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.14 no.4
    • /
    • pp.35-45
    • /
    • 2018
  • Users share many kinds of content, such as text, images, and video, on SNS. Social media content carries various information, such as personal interests, opinions, and relationships, so many recommendation and search systems are being developed through analysis of social media content. To extract subject-related topics from the social context collected from social media channels when developing such systems, ontologies for semantic analysis are needed. However, it is difficult to build a formal ontology because social media content is informal. Therefore, we develop a social topic system based on semantic and subject correlation. First, an extraction system for social topics based on semantic relationships analyzes semantic correlation and extracts topics expressing the semantic information of the corresponding social context. Because a formal ontology that fully expresses the semantic information of diverse areas is infeasible, we develop a self-extensible ontology architecture for semantic correlation. A classifier of social content and feedback then groups content and feedback on the same subject for extracting social topics according to semantic correlation. Analyzing the social content and feedback yields subject keywords and an index by measuring the degree of association based on the social topic's semantic correlation. Deep learning is applied to the indexing process to improve the accuracy and performance of subject extraction and semantic-correlation mapping analysis. We expect the proposed system to provide customized content for users as well as optimized search results by analyzing semantic and subject correlation.
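The "degree of association" indexing step can be illustrated with a plain cosine-similarity sketch over term frequencies; the paper's actual deep learning model and ontology are not reproduced here, and all posts and topic keywords below are invented for illustration:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def index_by_topic(posts, topic_keywords):
    """Assign each post to the topic whose keyword vector it is closest to."""
    topic_vecs = {t: Counter(kws) for t, kws in topic_keywords.items()}
    index = {}
    for pid, text in posts.items():
        vec = Counter(text.lower().split())
        index[pid] = max(topic_vecs, key=lambda t: cosine(vec, topic_vecs[t]))
    return index

posts = {"p1": "new phone camera review", "p2": "election vote results"}
topics = {"tech": ["phone", "camera", "laptop"],
          "politics": ["election", "vote", "policy"]}
topic_index = index_by_topic(posts, topics)
```

In the paper, a learned embedding replaces the raw term-frequency vectors, which is what lets semantically related but lexically different posts land in the same topic.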

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.107-120
    • /
    • 2014
  • This study proposes a novel recommender system that uses structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is the most popular recommendation algorithm, it has limitations, including scalability and sparsity problems. The scalability problem arises when the numbers of users and items become very large: CF cannot scale up because finding neighbors in the user-item matrix takes too long as users and items grow in real-world e-commerce sites. Sparsity is a common problem of most recommender systems because users generally rate only a small portion of the items. The cold-start problem is a special case of sparsity, occurring when users or items newly added to the system have no ratings at all. When preference data is sparse, two users or items are unlikely to have common ratings, so CF predicts ratings using a very limited number of similar users and may produce biased recommendations, because similarity weights are estimated from only a small portion of the rating data. In this study, we point out a further limitation of conventional CF: it does not consider qualitative and emotional information about users, because it only uses the preference scores of the user-item matrix. To address this limitation, we propose a cluster-indexing CF model with structural hole analysis. In general, a structural hole is a location that connects two separate actors in a network without any redundant connections. The actor who occupies a structural hole can easily access non-redundant, varied, and fresh information.
Therefore, that actor may be an important person in the focal network and may be the representative person of the focal subgroup, so his or her characteristics may represent the general characteristics of the users in that subgroup. In this sense, we can distinguish friends and strangers of the focal user using structural hole analysis. This study uses structural hole analysis to select structural holes in subgroups as initial seeds for a cluster analysis. First, we gather users' preference ratings for items and their social network information, using a data collection system we developed. Then, we perform structural hole analysis to find the structural holes of the social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, we make recommendations using CF within each user's cluster and compare the recommendation performance of the comparative models. The experiments consist of two parts. The first is the structural hole analysis, performed with UCINET version 6, a software package for analyzing social network data. The second performs the modified clustering and CF using the clustering result, in an experimental system developed with VBA (Visual Basic for Applications) in Microsoft Excel 2007. The modified clustering experiment uses Pearson correlation between user preference rating vectors as the similarity measure, and the CF experiment uses the 'all-but-one' approach. To validate the effectiveness of the proposed model, we apply three comparative types of CF models to the same dataset.
The experimental results show that the proposed model outperforms the comparative models. In particular, the proposed model performs significantly better than the two comparative models with cluster analysis according to the statistical significance test, although the difference between the proposed model and the naive model is not statistically significant.
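The seeding step above can be sketched as follows: structural-hole actors are identified with Burt's "effective size," a common structural-hole measure (the paper used UCINET for this analysis; the toy network and measure choice here are ours), and then used as cluster seeds for the CF stage:

```python
def effective_size(graph, node):
    """Burt's effective size: n - 2t/n, where n = #neighbors of `node`
    and t = #ties among those neighbors."""
    nbrs = graph[node]
    n = len(nbrs)
    if n == 0:
        return 0.0
    t = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
    return n - 2.0 * t / n

graph = {  # toy friendship network as adjacency sets
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"d"},
}
# the actor spanning the most non-redundant contacts becomes a cluster seed
seed = max(graph, key=lambda u: effective_size(graph, u))
```

Seeding the clustering with such actors, rather than random centroids, is what injects the social-network information into the otherwise purely rating-based CF pipeline.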

Proposed Methodological Framework of Assessing LID (Low Impact Development) Impact on Soil-Groundwater Environmental Quality (저영향개발(Low Impact Development) 기법 적용 지역 토양·지하수 환경 영향 평가 방법론 제안 연구)

  • Kim, Jongmo;Kim, Seonghoon;Lee, Yunkyu;Choi, Hanna;Park, Joonhong
    • Journal of the Korean GEO-environmental Society
    • /
    • v.15 no.7
    • /
    • pp.39-50
    • /
    • 2014
  • The goal of this work is to develop a framework of methods to comprehensively evaluate the effects of LID (Low Impact Development) on soil-groundwater environmental quality as well as landscape and ecological factors. For this study, we conducted an extensive literature review. As an outcome, soil-groundwater environmental quality is newly conceptualized as a comprehensive index reflecting (i) groundwater pollution sensitivity (a hydrogeological factor), (ii) biochemical contamination, and (iii) biodegradability. Methods of classifying and indexing are presented by selecting the items to be measured for soil-groundwater environmental quality and integrating the resulting items comprehensively. In addition, from soil-groundwater environmental quality and the landscape and ecological factors of existing environmental impact assessment, a method was developed to derive an overall index that can evaluate environmental effects using GIS (Geographic Information System) and AHP (Analytic Hierarchy Process). For optimizing LID planning, design, and post-evaluation, LCIA (Life Cycle Impact Assessment) was regarded as an appropriate method.
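The overall index combines factor scores with AHP weights. A minimal AHP sketch: weights are approximated by averaging the normalized columns of a pairwise-comparison matrix (the 3x3 matrix and factor scores below are invented for illustration, not the paper's data):

```python
def ahp_weights(M):
    """Approximate the AHP priority vector by column normalization."""
    n = len(M)
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [sum(M[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]

# factors: pollution sensitivity, biochemical contamination, biodegradability
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
w = ahp_weights(pairwise)                 # weights sum to 1
scores = [0.8, 0.5, 0.6]                  # normalized factor scores per site
overall = sum(wi * si for wi, si in zip(w, scores))
```

In the framework, the per-factor scores would come from GIS layers, and the weighted sum yields the site's overall soil-groundwater environmental quality index.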

An Enhancing Technique for Scan Performance of a Skip List with MVCC (MVCC 지원 스킵 리스트의 범위 탐색 향상 기법)

  • Kim, Leeju;Lee, Eunji
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.5
    • /
    • pp.107-112
    • /
    • 2020
  • Recently, unstructured data has been produced rapidly by web-based services. NoSQL systems and key-value stores, which process unstructured data as key-value pairs, are widely used in various applications. This paper studies the skip list used for in-memory data management in an LSM-tree based key-value store. The skip list used in the key-value store is insertion-based: it does not allow overwriting and handles all changes only by insertion. This behavior supports Multi-Version Concurrency Control (MVCC), which can process multiple read/write requests simultaneously through snapshot isolation. However, because duplicate keys exist in the skip list, performance degrades significantly due to unnecessary node visits during list traversal. The overhead is especially serious for range queries or scan operations that search a range of data at once. This paper proposes a newly designed Stride SkipList to reduce this overhead. The stride skip list additionally maintains an indexing pointer to the last node of the same key, avoiding unnecessary node visits. The proposed scheme is implemented on RocksDB's in-memory component, and the performance evaluation shows that SCAN performance improves by up to 350 times over the existing skip list for various workloads.
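A toy model of the stride idea (names and layout are ours, sketching the paper's design): in an insertion-only MVCC list, every update adds a new (key, version) node, so a scan must skip older duplicates; a "stride" pointer at each node jumps directly to the last node with the same key:

```python
def build_stride(entries):
    """entries: (key, version, value) tuples sorted by (key, version).
    Returns stride[i] = index of the last entry sharing entries[i]'s key."""
    stride = [0] * len(entries)
    last = len(entries) - 1
    for i in range(len(entries) - 1, -1, -1):
        if entries[i][0] != entries[last][0]:
            last = i               # a new key run starts (scanning backwards)
        stride[i] = last
    return stride

def scan_newest(entries, stride):
    """Range scan returning only the newest version of each key,
    visiting one node per key instead of every duplicate."""
    out, i = [], 0
    while i < len(entries):
        i = stride[i]              # jump over the older versions of this key
        out.append(entries[i])
        i += 1
    return out

entries = [("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z"),
           ("c", 1, "p"), ("c", 2, "q"), ("c", 3, "r")]
newest = scan_newest(entries, build_stride(entries))
```

Without the stride pointer, the scan would visit all six nodes; with it, the scan visits one node per distinct key, which is the source of the reported speedup.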

Range Stabbing Technique for Continuous Queries on RFID Streaming Data (RFID 스트리밍 데이타의 연속질의를 위한 영역 스태빙 기법)

  • Park, Jae-Kwan;Hong, Bong-Hee;Lee, Ki-Han
    • Journal of KIISE:Databases
    • /
    • v.36 no.2
    • /
    • pp.112-122
    • /
    • 2009
  • EPCglobal, which leads the development of RFID standards, proposed the Event Cycle Specification (ECSpec) and Event Cycle Reports (ECReports) as the standard RFID middleware interface. An ECSpec is a specification for filtering and collecting RFID tag data and is treated as a continuous query (CQ) processed repeatedly over fixed time intervals. ECReports describe the results after an ECSpec is processed. It is therefore efficient to apply the query indexing technique designed for continuous query processing, which treats ECSpecs as data and tag events as queries. In logistics environments, similar or identical products are transported together, and when the RFID tags attached to the products are read, acquisition events occur massively within a short period. Given these properties, it is inefficient to process such events one by one. In this paper, we propose a technique that reduces redundant searches by treating the tag events collected during the report period of an ECSpec as a range query. For this group processing, we suggest a queuing method that collects tag events efficiently and a structure for generating range queries from the queues. Experiments show that the proposed methods enhance performance.
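A minimal sketch of the range-stabbing idea (the interval layout and IDs are ours): each ECSpec is stored as an interval over the tag-ID space, and a batch of tags read in one report period is collapsed into a single range query that stabs all overlapping ECSpec intervals, instead of probing the query index once per tag event:

```python
def stab_range(intervals, lo, hi):
    """Return the ids of stored intervals overlapping [lo, hi]."""
    return [qid for qid, (a, b) in intervals.items() if a <= hi and lo <= b]

ecspecs = {"q1": (0, 10), "q2": (5, 20), "q3": (30, 40)}
batch = [7, 8, 9, 12]                 # tag IDs read in one report period
hits = stab_range(ecspecs, min(batch), max(batch))
```

Because co-transported products carry nearby tag IDs, one [min, max] probe covers the whole batch, replacing len(batch) index lookups with a single one.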

Rule Discovery and Matching for Forecasting Stock Prices (주가 예측을 위한 규칙 탐사 및 매칭)

  • Ha, You-Min;Kim, Sang-Wook;Won, Jung-Im;Park, Sang-Hyun;Yoon, Jee-Hee
    • Journal of KIISE:Databases
    • /
    • v.34 no.3
    • /
    • pp.179-192
    • /
    • 2007
  • This paper addresses an approach that recommends investment types to stock investors by discovering useful rules from past patterns of stock price changes in databases. First, we define a new rule model for recommending stock investment types. For a frequent pattern of stock prices, if its subsequent prices match a condition of an investor, the model recommends a corresponding investment type for the stock. The frequent pattern is regarded as the rule head and the subsequent part as the rule body. We observed that the conditions on rule bodies differ considerably depending on the dispositions of investors, while rule heads are independent of the characteristics of investors in most cases. Based on this observation, we propose a new method that discovers and stores only the rule heads rather than whole rules in the rule discovery process. This allows investors to define various conditions on rule bodies flexibly and also improves the performance of rule discovery by reducing the number of rules. For efficient discovery and matching of rules, we propose methods for discovering frequent patterns, constructing a frequent pattern base, and indexing them. We also suggest a method that finds the rules matching a query issued by an investor from the frequent pattern base, and a method that recommends an investment type using the rules. Finally, we verify the superiority of our approach via various experiments using real-life stock data.
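The head/body split can be sketched in a few lines (symbols and thresholds are illustrative, not the paper's encoding): frequent patterns of up/down price moves are mined and stored as rule heads, and the investor supplies the rule-body condition only at query time:

```python
from collections import Counter

def frequent_heads(prices, length, min_count):
    """Mine patterns of U/D price moves of a given length that occur
    at least min_count times (these become the stored rule heads)."""
    moves = "".join("U" if b > a else "D" for a, b in zip(prices, prices[1:]))
    counts = Counter(moves[i:i + length] for i in range(len(moves) - length + 1))
    return {p for p, c in counts.items() if c >= min_count}

def recommend(recent_moves, heads, body_condition):
    """Apply the investor's own body condition only when the recent
    move pattern matches a stored rule head."""
    return body_condition() if recent_moves in heads else "no-rule"

prices = [10, 11, 12, 11, 12, 13, 12, 13, 14]
heads = frequent_heads(prices, 2, 2)
advice = recommend("UU", heads, lambda: "buy")
```

Storing only heads keeps the pattern base small and investor-independent; each investor plugs in a different `body_condition` without re-running discovery.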

A correlation analysis between state variables of rainfall-runoff model and hydrometeorological variables (강우-유출 모형의 상태변수와 수문기상변량과의 상관성 분석)

  • Shim, Eunjeung;Uranchimeg, Sumiya;Lee, Yearin;Moon, Young-Il;Lee, Joo-Heon;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.12
    • /
    • pp.1295-1304
    • /
    • 2021
  • For the efficient use and management of water resources, reliable rainfall-runoff analysis is necessary, but continuous hydrological and rainfall-runoff data are difficult to secure through measurements and models. In particular, for ungauged watersheds, regionalization is used to transfer the parameters required for model application to the ungauged watershed. In this study, the GR4J model was selected and the SCEM-UA method was used to optimize its parameters. Based on the correlation between watershed characteristics and the parameters obtained through the model, the rainfall-runoff model was regionalized with a Copula function, and rainfall-runoff analysis with the regionalized parameters was performed on the ungauged watershed. In the process, the intermediate state variables of the rainfall-runoff model were extracted, and their correlation with water level and groundwater level was analyzed. Furthermore, the Standardized State variable Drought Index (SSDI) was calculated by computing and indexing the state variables of the GR4J model. The SSDI was then compared with the Standardized Precipitation Index (SPI), and a hydrological suitability evaluation of the drought index was performed to confirm the feasibility of drought monitoring and its application in ungauged watersheds.
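The SSDI follows the same recipe as the SPI: a model state variable is standardized so that strongly negative values indicate drought. Here a plain z-score stands in for the distribution fitting of the real SPI/SSDI, and the monthly store values are invented for illustration:

```python
import math

def standardized_index(series):
    """Standardize a series to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]

store = [95, 100, 105, 98, 60, 102]    # hypothetical GR4J state variable
ssdi = standardized_index(store)
drought_months = [i for i, z in enumerate(ssdi) if z < -1]   # SSDI < -1
```

Because the state variable comes from the (regionalized) model rather than from gauges, the index can be computed for ungauged watersheds, which is the point of the comparison against the precipitation-based SPI.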

Establishment of Database System for Radiation Oncology (방사선 종양 자료관리 시스템 구축)

  • Kim, Dae-Sup;Lee, Chang-Ju;Yoo, Soon-Mi;Kim, Jong-Min;Lee, Woo-Seok;Kang, Tae-Young;Back, Geum-Mun;Hong, Dong-Ki;Kwon, Kyung-Tae
    • The Journal of Korean Society for Radiation Therapy
    • /
    • v.20 no.2
    • /
    • pp.91-102
    • /
    • 2008
  • Purpose: To increase operational efficiency and establish a foundation for the development of new radiotherapy treatments through a database built by arranging and indexing radiotherapy-related records in a well-organized manner for easy user access. Materials and Methods: In this study, Microsoft Access (MS Office Access) was used to operate the database. Radiation oncology data were divided into business logs, maintenance expenditure, and stock management of accessories with respect to administrative and machinery management. Data for education and research were divided into educational material for department duties, user manuals, and related theses according to their properties. Data registration was designed with input forms according to subject, and data were designed to be inspected through reports. The number of machine failures and the corresponding repair hours recorded in the machine maintenance records for January 2008 to April 2009 were analyzed, comparing initial system usage with usage one year later. Results: The radiation oncology database system was built by distinguishing work-related and research-related criteria. The data are arranged and collected according to subjects and classes and can be accessed by searching through the descriptions in each criterion. Analysis of the number and type of machine failures in the maintenance records for January 2008 to April 2009 showed that the average repair time was reduced by 32.3%.
Conclusion: By classifying and indexing present and past data according to subject through the database system for radiation oncology, information can be accessed easily, increasing operational efficiency. Furthermore, the system can serve as a foundation for improving work processes by providing, in real time, the various information required for new radiotherapy treatments.
