• Title/Summary/Keyword: data set


Discovering and Maintaining Semantic Mappings between XML Schemas and Ontologies

  • An, Yuan; Borgida, Alex; Mylopoulos, John
    • Journal of Computing Science and Engineering, v.2 no.1, pp.44-73, 2008
  • There is general agreement that the problem of data semantics has to be addressed for XML data to become machine-processable. This problem can be tackled by defining a semantic mapping between an XML schema and an ontology. Unfortunately, creating such mappings is a tedious, time-consuming, and error-prone task. To alleviate this problem, we present a solution that heuristically discovers semantic mappings between XML schemas and ontologies. The solution takes as input an initial set of simple correspondences between element attributes in an XML schema and class attributes in an ontology, and then generates a set of mapping formulas. Once such a mapping is created, it is important and necessary to maintain the consistency of the mapping when the associated XML schema and ontology evolve. In this paper, we first offer a mapping formalism to represent semantic mappings. Second, we present our heuristic mapping discovery algorithm. Third, we show through an empirical study that considerable effort can be saved when discovering complex mappings by using our prototype tool. Finally, we propose a mapping maintenance plan dealing with schema evolution. Our study provides a set of effective solutions for building sustainable semantic integration systems for XML data.
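
As a rough illustration of the input and output shapes the abstract describes, the Python sketch below groups simple correspondences into per-class candidate formulas. The `Correspondence` class and the grouping step are hypothetical stand-ins for the paper's heuristic tree-based discovery algorithm, not a reproduction of it.

```python
# Hypothetical shapes only: simple correspondences in, grouped candidates out.
from dataclasses import dataclass

@dataclass
class Correspondence:
    xml_path: str        # e.g. "/order/item/@sku"
    onto_attr: str       # e.g. "Product.sku"

def discover_mappings(correspondences):
    """Group correspondences by ontology class; each group is a crude
    stand-in for one mapping formula over that class. The actual paper's
    heuristic traverses schema trees and ontology graphs instead."""
    formulas = {}
    for c in correspondences:
        cls, attr = c.onto_attr.split(".")
        formulas.setdefault(cls, []).append((c.xml_path, attr))
    return formulas

print(discover_mappings([Correspondence("/order/item/@sku", "Product.sku"),
                         Correspondence("/order/item/@price", "Product.price")]))
```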

Estimating Real-time Inundation Vulnerability Index at Point-unit Farmland Scale using Fuzzy set (Fuzzy set을 이용한 실시간 지점단위 농경지 침수위험 지수 산정)

  • Eun, Sangkyu; Kim, Taegon; Lee, Jimin; Jang, Min-Won; Suh, Kyo
    • Journal of Korean Society of Rural Planning, v.20 no.2, pp.1-10, 2014
  • Smartphones have changed the picture of data and information sharing, making it possible to share various real-time flooding data. Vulnerability indicators of farmland inundation are needed to calculate the risk of farmland flooding from hydro-meteorological data that change over time, combined with the morphologic characteristics of flood-damaged areas. Variables related to farmland inundation vulnerability were identified using a binary-logit model and correlation analysis, and vulnerability indicators were then estimated by a fuzzy set method. The resulting vulnerability indicators were compared with the results of Monte Carlo simulation (MCS) for verification. The results show that the vulnerability indicators are applicable to a mobile-based information system for farmland inundation.
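
As a loose illustration of the fuzzy-set step, the sketch below combines triangular membership functions into a single index. The variables, breakpoints, and max-aggregation are illustrative assumptions, not the paper's calibrated model.

```python
# A minimal fuzzy-index sketch; variable names and breakpoints are hypothetical.
def triangular(x, a, b, c):
    """Membership rises linearly from a to b, then falls from b to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def vulnerability_index(rainfall_mm, drainage_ratio):
    mu_rain = triangular(rainfall_mm, 20, 80, 140)        # heavy-rain membership
    mu_drain = triangular(drainage_ratio, 0.0, 0.3, 0.6)  # poor-drainage membership
    return max(mu_rain, mu_drain)                         # simple fuzzy-OR aggregation

print(vulnerability_index(rainfall_mm=95, drainage_ratio=0.25))
```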

Huffman Code Design and PSIP Structure of Hangul Data for Digital Broadcasting (디지털 방송용 한글 허프만 부호 설계 및 PSIP 구조)

  • 황재정; 진경식; 한학수; 최준영; 이진환
    • Journal of Broadcast Engineering, v.6 no.1, pp.98-107, 2001
  • In this paper we derive an optimal Huffman code set with escape coding that maximizes coding efficiency for Hangul text data. Hangul can be represented in the standard Wansung format or in Unicode, and we can generate a set of Huffman codes for both. The current Korean DT standard has not defined a Hangul compression algorithm, which may leave the digital data broadcasting system confronted with a serious data rate; generating the optimal Huffman code set solves this data transmission problem. A relevant PSIP structure for the DTB standard is also proposed. As a result, characters that occur with probability less than 0.0043 are escape coded, giving an optimum compression efficiency of 46%.
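
The escape-coding idea is easy to make concrete: symbols below a probability threshold share one ESC code and are transmitted verbatim after it. The sketch below uses toy frequencies and a hypothetical threshold; only the mechanism mirrors the abstract.

```python
# Huffman coding with an escape symbol for rare characters; toy probabilities.
import heapq
from itertools import count

def huffman_codes(probs):
    """probs: {symbol: probability} -> {symbol: bitstring}."""
    tiebreak = count()  # unique counter so dicts are never compared in the heap
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

def with_escape(probs, threshold=0.0043):
    """Lump all symbols rarer than the threshold into one ESC code."""
    rare = {s for s, p in probs.items() if p < threshold}
    kept = {s: p for s, p in probs.items() if s not in rare}
    kept["ESC"] = sum(probs[s] for s in rare)
    return huffman_codes(kept), rare

codes, rare = with_escape({"가": 0.4, "나": 0.3, "다": 0.29, "걝": 0.01},
                          threshold=0.02)
print(codes, rare)
```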


A Query Randomizing Technique for breaking 'Filter Bubble'

  • Joo, Sangdon; Seo, Sukyung; Yoon, Youngmi
    • Journal of the Korea Society of Computer and Information, v.22 no.12, pp.117-123, 2017
  • The personalized search algorithm is a search system that analyzes the user's IP, cookies, log data, and search history to recommend desired information. As a result, users become isolated in the information frame recommended by the algorithm, a phenomenon called the 'filter bubble'. Most personalized data can be deleted or changed by the user, but data stored on the service provider's server is difficult to access. This study suggests a way to neutralize personalization by continually sending random query words, confusing the data accumulated on the server with words that are unrelated to the user's actual interests. Using a personalized account as the experimental group, we analyzed the rank changes of URLs while conducting searches with 500 random query words. To prove the effect, we set up a new account as a control group. We then searched the same set of queries with both accounts, stored the URL data, and scored the rank variation, weighting URLs ranked on the upper page more heavily than lower-ranked URLs. At the beginning of the experiment, the difference between the scores of the two accounts was insignificant; as the experiment continued and the number of random query words accumulated on the server grew, the results showed a meaningful difference.
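
A weighted rank-variation score of the kind described can be sketched as follows; the reciprocal-rank weighting and the page size are assumptions, since the abstract does not give the paper's exact scoring formula.

```python
# Weighted rank-divergence between two accounts' result lists for one query.
def rank_score(ranks_a, ranks_b, page_size=10):
    """ranks_*: {url: rank (1 = top)}. Upper-page movements count more."""
    score = 0.0
    for url, before in ranks_a.items():
        after = ranks_b.get(url, page_size + 1)  # vanished URLs fall off the page
        weight = 1.0 / min(before, after)        # reciprocal-rank weighting
        score += weight * abs(after - before)
    return score

personalized = {"a.com": 1, "b.com": 2, "c.com": 3}
control      = {"b.com": 1, "a.com": 4, "c.com": 3}
print(rank_score(personalized, control))
```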

Efficient Computation of a Skyline under Location Restrictions (위치 제약 조건을 고려한 효율적인 스카이라인 계산)

  • Kim, Ji-Hyun; Kim, Myung
    • The KIPS Transactions: Part D, v.18D no.5, pp.313-316, 2011
  • The skyline of a multi-dimensional data set is the subset consisting of the data points that are not dominated by other members of the set. Skyline computation can be very useful for decision making over multi-dimensional data. However, when the skyline is very large, it may not be of much use. In this paper, we propose an algorithm for computing the part of the skyline that satisfies location restrictions provided by the user, such as an origin movement, degree ranges, and/or distances from the origin. The algorithm rapidly eliminates non-candidate data and returns, in order, the skyline points that satisfy the user's requests. We show through experiments that the algorithm is efficient.
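
The core dominance test is straightforward to sketch. The code below filters by one kind of location restriction (a maximum distance from the origin) and then applies a naive O(n^2) skyline pass; the paper's pruning strategy is considerably more efficient than this.

```python
# Skyline under a distance restriction; lower values are better in each dimension.
import math

def dominates(p, q):
    """p dominates q if p <= q in every dimension and p < q in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def restricted_skyline(points, max_dist):
    candidates = [p for p in points
                  if math.dist(p, (0,) * len(p)) <= max_dist]  # location filter
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates if q != p)]

pts = [(1, 5), (2, 2), (4, 1), (6, 6)]
print(restricted_skyline(pts, max_dist=5.0))
```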

Simultaneous Equations and Endogeneity in Corporate Finance: The Linkage between Institutional Ownership and Corporate Financial Performance

  • MALIK, Qaisar Ali; HUSSAIN, Shahzad; ULLAH, Naeem; WAHEED, Abdul; NAEEM, Muhammad; MANSOOR, Muhammad
    • The Journal of Asian Finance, Economics and Business, v.8 no.3, pp.69-77, 2021
  • The objective of this research is to explore the inconclusive theoretical and empirical association between institutional ownership and firm performance in the context of the emerging Pakistani economy. The data set consists of all non-financial firms listed on the Pakistan Stock Exchange (PSX). The annual data cover the period from 2010 to 2015; firms with incomplete data were excluded from the econometric analysis. Thus the final data set comprised an unbalanced panel of 276 firms with 1,231 firm-year observations. Data on institutional ownership and the other study variables were extracted from the firms' annual financial reports. The research used Tobin's Q as a market-based proxy of firm performance and tested its endogenous relation with institutional ownership through OLS and 2SLS approaches. The study also applied the Durbin-Wu-Hausman (DWH) test to establish endogeneity before analyzing the 2SLS model. The DWH test confirms the endogenous link between institutional ownership and performance and vice versa. The results derived from 2SLS also confirm a highly significant, two-way, directly proportional relationship between institutional investment and corporate performance in the studied companies.
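
The two-stage logic of 2SLS can be illustrated on synthetic data: stage 1 projects the endogenous regressor onto an instrument, and stage 2 regresses the outcome on that projection. The variable names and the single instrument below are illustrative assumptions, not the paper's specification.

```python
# Manual 2SLS on synthetic data with a deliberately endogenous regressor.
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # confounder -> endogeneity
ownership = 0.8 * z + u + rng.normal(size=n) # endogenous regressor
tobins_q = 0.5 * ownership + u + rng.normal(size=n)  # true effect is 0.5

def ols(y, x):
    """OLS with intercept; returns [intercept, slope]."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: project the endogenous regressor onto the instrument.
a0, a1 = ols(ownership, z)
ownership_hat = a0 + a1 * z
# Stage 2: use the projection in place of the raw regressor.
print("OLS slope (biased by u):", ols(tobins_q, ownership)[1])
print("2SLS slope (near 0.5): ", ols(tobins_q, ownership_hat)[1])
```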

Design and Implementation of an Efficient Buffer Replacement Method for Real-time Multimedia Databases Environments (실시간 멀티미디어 데이터베이스 환경을 위한 효율적인 버퍼교체 기법 설계 및 구현)

  • 신재룡; 피준일; 유재수; 조기형
    • Journal of Korea Multimedia Society, v.5 no.4, pp.372-385, 2002
  • In this paper, we propose an efficient buffer replacement method for real-time multimedia data. The proposed method uses multi-level priorities to account for real-time characteristics. Each priority level is divided into a cold data set, containing media likely to be referenced for the first time, and a hot data set, containing media likely to be re-referenced. Victim selection proceeds sequentially from the cold set at the minimum priority level up to the hot set at the maximum priority level, and a victim is chosen only at levels lower than or equal to the priority of the transaction requesting buffer allocation. In the cold set, our method first selects the medium with the maximum size at that level as the victim; in the hot set, it first selects the medium with the maximum reference interval. Since the method keeps many popular media in the limited buffer space, the buffer hit ratio increases and more service requests can be handled, improving the overall performance of the system. We compare the proposed method with the Priority-Hints method in terms of buffer hit ratio and transaction deadline miss ratio, and the performance evaluation shows that our method outperforms the existing methods.
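
The victim-selection order described above (cold sets from the lowest eligible level up, then hot sets, capped by the requesting transaction's priority) can be sketched as follows; the field names and data layout are hypothetical.

```python
# Victim selection over per-priority cold/hot sets; lower priority evicts first.
def select_victim(levels, txn_priority):
    """levels: {priority: {"cold": [media], "hot": [media]}}; each medium is a
    dict with "size" and "last_ref_gap" keys. Returns the medium to evict."""
    eligible = sorted(p for p in levels if p <= txn_priority)
    for p in eligible:                       # pass 1: cold sets, lowest level up
        cold = levels[p]["cold"]
        if cold:
            return max(cold, key=lambda m: m["size"])         # largest medium first
    for p in eligible:                       # pass 2: hot sets, lowest level up
        hot = levels[p]["hot"]
        if hot:
            return max(hot, key=lambda m: m["last_ref_gap"])  # longest-idle first
    return None  # nothing evictable at or below this priority

levels = {1: {"cold": [{"id": "v1", "size": 40, "last_ref_gap": 3}], "hot": []},
          2: {"cold": [], "hot": [{"id": "v2", "size": 10, "last_ref_gap": 9}]}}
print(select_victim(levels, txn_priority=2)["id"])   # -> v1 (cold set wins)
```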


Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon; Solji Han; Hwa Young Lee; Hye Seon Kang; Sung Kyoung Kim; Ju Sang Kim; Chan Kwon Park; Sang Haak Lee; Seung Joon Kim; Jin Woo Kim; Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases, v.86 no.3, pp.203-215, 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio; in set III, imputation was done after dataset splitting. Predictive performance was evaluated by R2 and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R2 value was 0.27; in set II, LightGBM was the best model, with the highest R2 value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was again the best model, with the highest R2 value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
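
A set-III-style pipeline (70:30 split, imputation fitted on the training fold only, LightGBM tuned by 5-fold CV, scored by R2 and MSE) might look like the sketch below, with synthetic data standing in for the Clinical Data Warehouse and a deliberately small hyperparameter grid.

```python
# LightGBM regression with post-split imputation and 5-fold CV tuning.
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X_full = rng.normal(size=(1487, 8))                  # stand-in clinical features
y = X_full @ rng.normal(size=8) + rng.normal(size=1487)
X = X_full.copy()
X[rng.random(X.shape) < 0.1] = np.nan                # inject missingness

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
imputer = SimpleImputer(strategy="median").fit(X_tr)  # fit imputer on train only
X_tr, X_te = imputer.transform(X_tr), imputer.transform(X_te)

search = GridSearchCV(LGBMRegressor(random_state=42),
                      {"num_leaves": [15, 31], "n_estimators": [100, 300]},
                      cv=5, scoring="r2").fit(X_tr, y_tr)
pred = search.best_estimator_.predict(X_te)
print("R2:", r2_score(y_te, pred), "MSE:", mean_squared_error(y_te, pred))
```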

A Feature Set Selection Approach Based on Pearson Correlation Coefficient for Real Time Attack Detection (실시간 공격 탐지를 위한 Pearson 상관계수 기반 특징 집합 선택 방법)

  • Kang, Seung-Ho; Jeong, In-Seon; Lim, Hyeong-Seok
    • Convergence Security Journal, v.18 no.5_1, pp.59-66, 2018
  • The performance of a network intrusion detection system based on machine learning depends heavily on the composition and the size of its feature set. Detection accuracy, such as the detection rate or the false positive rate, relies on the feature composition, while training and detection time depend on the size of the feature set. Therefore, for the system to detect intrusions in real time, the feature set should be small as well as appropriately composed. In this paper, we show that the size of the feature set can be reduced further, without decreasing the detection rate, by using the Pearson correlation coefficient between features together with the multi-objective genetic algorithm that was used to shrink the feature set in previous work. To evaluate the proposed method, experiments classifying 10 kinds of attacks and benign traffic were performed on the NSL-KDD data set.
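
The correlation-based pruning step can be sketched directly: among feature pairs whose absolute Pearson correlation exceeds a threshold, one member is dropped before the (separate) genetic-algorithm search. The threshold and feature names below are illustrative.

```python
# Greedy removal of near-duplicate features by absolute Pearson correlation.
import numpy as np

def prune_correlated(X, feature_names, threshold=0.9):
    """Keep a feature only if it is not too correlated with any kept feature."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(len(feature_names)):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
        # else: feature j is nearly redundant with an already-kept feature
    return [feature_names[j] for j in keep]

rng = np.random.default_rng(1)
a = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), rng.normal(size=200)])
print(prune_correlated(X, ["dur", "dur_ms", "src_bytes"]))  # drops "dur_ms"
```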


Development of an Editor for Reference Data Library Based on ISO 15926 (ISO 15926 기반의 참조 데이터 라이브러리 편집기의 개발)

  • Jeon, Youngjun; Byon, Su-Jin; Mun, Duhwan
    • Korean Journal of Computational Design and Engineering, v.19 no.4, pp.390-401, 2014
  • ISO 15926 is an international standard for the integration of lifecycle data for process plants, including oil and gas facilities. From the viewpoint of information modeling, ISO 15926 Part 2 provides the general data model, which is designed to be used in conjunction with reference data. Reference data are standard instances that represent classes, objects, properties, and templates common to a number of users, process plants, or both. ISO 15926 Parts 4 and 7 provide the initial set of classes, objects, and properties and the initial set of templates, respectively. User-defined reference data specific to companies or organizations are defined by inheriting from the initial reference data and the initial set of templates. To support this extension of reference data and templates, an editor is needed that provides creation, deletion, and modification functions for user-defined reference data. In this study, such an editor for ISO 15926-based reference data was developed. Sample reference data were encoded in OWL (Web Ontology Language) according to the specification of ISO 15926 Part 8. iRINGTools and dot15926Editor were benchmarked for the design of the GUI (graphical user interface). Reference data search, creation, modification, and deletion functions were implemented with the XML (Extensible Markup Language) DOM (Document Object Model) and SPARQL (SPARQL Protocol and RDF Query Language).
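
The editor's search function over Part 8-style OWL files can be approximated with rdflib and a SPARQL query, as sketched below; the file name and the subclass-listing query are hypothetical placeholders rather than actual ISO 15926 reference data.

```python
# Querying an OWL reference-data file with rdflib and SPARQL.
from rdflib import Graph

g = Graph()
g.parse("sample_rdl.owl", format="xml")   # hypothetical Part 8-style RDF/XML file

# List reference data items together with their superclasses.
results = g.query("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?item ?parent
    WHERE { ?item rdfs:subClassOf ?parent }
""")
for item, parent in results:
    print(item, "->", parent)
```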