• Title/Summary/Keyword: Donor file


A Statistical Matching Method with k-NN and Regression

  • Chung, Sung-S.;Kim, Soon-Y.;Lee, Seung-S.;Lee, Ki-H.
    • Journal of the Korean Data and Information Science Society / v.18 no.4 / pp.879-890 / 2007
  • Statistical matching is a method of data integration for data sources that do not share the same units. It can rapidly produce a large amount of new information at low cost and reduce the response burden that affects data quality. This paper proposes a statistical matching technique that combines k-NN (k-nearest neighbor) and regression methods. For a given value of the common variable observed in the recipient file, we select the k donor-file records whose values are most similar, and we estimate the imputed value for the recipient file using regression modeling in the donor file. An empirical comparison study is conducted to show the properties of the proposed method.
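The abstract does not specify exactly how the regression is fitted, so the sketch below assumes one plausible reading: a single continuous common variable `x`, a fusion variable `z` observed only in the donor file, and a simple linear regression of `z` on `x` fitted to the k donors nearest (by absolute distance on `x`) to each recipient record. The function names `fit_line` and `knn_regression_impute` are hypothetical, not taken from the paper.

```python
from statistics import mean

def fit_line(xs, zs):
    """Ordinary least squares for z = a + b*x with one common variable."""
    x_bar, z_bar = mean(xs), mean(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # all neighbours share one x value: fall back to their mean z
        return z_bar, 0.0
    b = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / sxx
    return z_bar - b * x_bar, b

def knn_regression_impute(donor, recipient_x, k=3):
    """Impute z for each recipient x from a regression on its k nearest donors.

    donor is a list of (x, z) pairs; assumption: distance is |x_donor - x_recipient|.
    """
    imputed = []
    for x0 in recipient_x:
        neighbours = sorted(donor, key=lambda d: abs(d[0] - x0))[:k]
        a, b = fit_line([d[0] for d in neighbours], [d[1] for d in neighbours])
        imputed.append(a + b * x0)
    return imputed
```

For example, `knn_regression_impute([(x, 2 * x + 1) for x in range(10)], [2.5, 7.2], k=3)` recovers values on the line z = 2x + 1. Fitting the regression locally, rather than on the whole donor file, is what lets the k-NN step adapt to nonlinearity; with k equal to the donor-file size the sketch reduces to ordinary regression imputation.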


A Robust Approach of Regression-Based Statistical Matching for Continuous Data

  • Sohn, Soon-Cheol;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics / v.25 no.2 / pp.331-339 / 2012
  • Statistical matching is a methodology for merging microdata from two (or more) files into a single matched file, and many variants of it have been studied extensively. Among existing studies, we focused on Moriarity and Scheuren's (2001) method, a representative statistical matching method for continuous data. We examined this method and proposed a revision that uses a robust approach in the regression step of the procedure. We evaluated the efficiency of the revised method through simulation studies using both simulated and real data; the results showed that the proposed method has distinct advantages over existing alternatives.
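The abstract does not name the robust estimator used. Purely as an illustration, the sketch below replaces an ordinary least squares fit in a single-predictor regression step with a Huber M-estimator computed by iteratively reweighted least squares. The tuning constant `c = 1.345` is a conventional default, not a value from the paper; the other steps of Moriarity and Scheuren's procedure are not shown, and `wls_line` and `huber_line` are hypothetical names.

```python
from statistics import median

def wls_line(xs, ys, ws):
    """Weighted least squares for y = a + b*x."""
    sw = sum(ws)
    xb = sum(w * x for w, x in zip(ws, xs)) / sw
    yb = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xb) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xb) * (y - yb) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return yb - b * xb, b

def huber_line(xs, ys, c=1.345, iters=50, tol=1e-9):
    """Huber M-estimate of (a, b) via IRLS, starting from the OLS fit."""
    a, b = wls_line(xs, ys, [1.0] * len(xs))
    for _ in range(iters):
        r = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Residual scale: median absolute residual / 0.6745; use 1.0 if it is exactly 0.
        s = median(abs(ri) for ri in r) / 0.6745 or 1.0
        # Huber weights: full weight inside c*s, down-weighted beyond it.
        ws = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a_new, b_new = wls_line(xs, ys, ws)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b
```

On toy data lying on y = 1 + 2x with one gross outlier, the OLS slope is pulled far from 2 while `huber_line` stays close to it, which is the kind of sensitivity a robust regression step is meant to remove before residuals are used for matching.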

A Study on the Data Fusion Method using Decision Rule for Data Enrichment (의사결정 규칙을 이용한 데이터 통합에 관한 연구)

  • Kim S.Y.;Chung S.S.
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.291-303 / 2006
  • Data mining is the process of extracting information from existing data files, so one of the most important factors in the data mining process is the quality of the data used. In this paper, we propose a data fusion technique that uses decision rules for data enrichment, one phase of improving data quality in the KDD process. Simulations were performed to compare the proposed data fusion technique with existing techniques. The results show that the proposed decision-rule-based technique yields a low MSE or misclassification rate for the fusion variables.
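The abstract does not describe the form of the decision rules, so the sketch below uses the simplest possible one as a stand-in: a single-split "decision stump" on one common variable, learned in the donor file to predict a categorical fusion variable and then applied to the recipient file. This is an assumption for illustration, not the paper's method; `fit_stump` and `fuse` are hypothetical names.

```python
from collections import Counter

def majority(labels):
    """Most frequent label."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(xs, labels):
    """Learn one rule: if x <= t predict left_class, else predict right_class."""
    pairs = list(zip(xs, labels))
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [z for x, z in pairs if x <= t]
        right = [z for x, z in pairs if x > t]
        lc, rc = majority(left), majority(right)
        errors = sum(z != lc for z in left) + sum(z != rc for z in right)
        if best is None or errors < best[0]:
            best = (errors, t, lc, rc)
    if best is None:  # only one distinct x value: predict the overall majority
        c = majority(labels)
        return float("inf"), c, c
    return best[1:]

def fuse(donor_x, donor_z, recipient_x):
    """Impute the fusion variable for recipient records with the learned rule."""
    t, lc, rc = fit_stump(donor_x, donor_z)
    return [lc if x <= t else rc for x in recipient_x]
```

A full decision-tree learner would apply the same split search recursively and over several common variables; the misclassification count minimized here matches the evaluation criterion the abstract mentions for categorical fusion variables.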

A Study on the Data Fusion for Data Enrichment (데이터 보강을 위한 데이터 통합기법에 관한 연구)

  • Chung S.S.;Kim S.Y.;Kim H.J. (정성석;김순영;김현진)
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.605-617 / 2004
  • One of the most important factors in the data mining process is the quality of the data used: mining high-quality data increases the potential value of data mining. In this paper, we propose a data fusion technique for data enrichment, one phase of improving data quality in the KDD process. We added the k-NN technique to the regression technique to improve the performance of data fusion by reducing the loss of information. Simulations were performed to compare the proposed data fusion technique with the regression technique. The results show that the proposed technique yields a low MSE for continuous fusion variables.
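The abstract does not say how the k-NN step is attached to the regression. One plausible hybrid, shown here only as an assumption, keeps a regression fitted on the whole donor file and adds the mean residual of the k donors nearest to the recipient record, so that imputed values keep some of the donor-file variability that pure regression imputation smooths away. `ols`, `regression_knn_impute`, and `mse` are hypothetical names.

```python
from statistics import mean

def ols(xs, zs):
    """Simple linear regression z = a + b*x fitted to the whole donor file."""
    xb, zb = mean(xs), mean(zs)
    sxx = sum((x - xb) ** 2 for x in xs)
    b = sum((x - xb) * (z - zb) for x, z in zip(xs, zs)) / sxx
    return zb - b * xb, b

def regression_knn_impute(donor_x, donor_z, recipient_x, k=3):
    """Regression prediction plus the mean residual of the k nearest donors."""
    a, b = ols(donor_x, donor_z)
    residuals = [z - (a + b * x) for x, z in zip(donor_x, donor_z)]
    imputed = []
    for x0 in recipient_x:
        # k donors closest to x0 on the common variable
        nearest = sorted(range(len(donor_x)), key=lambda i: abs(donor_x[i] - x0))[:k]
        imputed.append(a + b * x0 + mean(residuals[i] for i in nearest))
    return imputed

def mse(truth, imputed):
    """Mean squared error, the comparison criterion named in the abstract."""
    return mean((t - v) ** 2 for t, v in zip(truth, imputed))
```

When the donor relationship is exactly linear, the residuals are all zero and the hybrid returns the plain regression prediction; when it is curved (for example z = x²), the borrowed residuals correct the straight-line fit, which is where an MSE comparison against regression-only imputation would show a difference.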

A Method for Identifying Splice Sites and Translation Start Sites in Human Genomic Sequences

  • Kim, Ki-Bong;Park, Kie-Jung;Kong, Eun-Bae
    • BMB Reports / v.35 no.5 / pp.513-517 / 2002
  • We describe a new method for identifying the sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in human mRNA. Using the mandatory keyword ORGANISM and the feature key CDS, a large standard data set for each signal site was extracted from the ASCII flat file gbpri.seq in GenBank release 108.0. This data set was used to generate scoring matrices that summarize the sequence information for each signal site. The scoring matrices take into account the independent nucleotide frequencies between adjacent bases at each position within the signal-site regions, and the relative weight of each nucleotide in proportion to its probability in the known signal sites. With a scoring scheme based on these nucleotide scoring matrices, the method achieves high sensitivity and specificity in locating signals in uncharacterized human genomic DNA. The matrices are especially effective at distinguishing true sites from false ones.
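As a rough illustration of scoring-matrix site detection, the sketch below builds a plain position-specific log-odds matrix from aligned known sites and scans a sequence with it. It is a simplification under stated assumptions: positions are treated as independent (the paper's matrices also account for adjacent bases), the background is uniform at 0.25, a pseudocount of 1 is used, and the example sites are made-up toy sequences, not GenBank data. `build_pwm`, `score`, and `scan` are hypothetical names.

```python
from math import log2

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Position-specific log-odds matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos].upper() for s in sites]
        total = len(column) + len(BASES) * pseudocount
        pwm.append({b: log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def score(pwm, window):
    """Sum of per-position log-odds; assumes positions are independent."""
    return sum(col[base] for col, base in zip(pwm, window))

def scan(pwm, sequence, threshold=0.0):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        if set(window) <= set(BASES):  # skip windows containing N or other symbols
            s = score(pwm, window)
            if s >= threshold:
                hits.append((i, s))
    return hits
```

For example, a matrix built from the toy sites `["GTAAG", "GTGAG", "GTAAG"]` flags only the window starting at index 2 of `"TTGTAAGTT"`. Raising `threshold` trades sensitivity for specificity, which is the balance the abstract reports on.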

Study on the Development of Guidelines for Thesaurus Construction at University Archives: Case Study of Myongji University Archives Center (대학기록관 시소러스 구축 지침의 개발 연구 - 명지대학교 대학사료실의 사례를 중심으로 -)

  • Rieh, Hae-Young;Lee, Mi-Yeong;Lee, Eun-Yeong;Lee, Hyeok-Jun;Lee, Hyeon-Jeong;Choe, Yeong-Sil;Park, Mi-Ja
    • Journal of Korean Society of Archives and Records Management / v.8 no.1 / pp.189-210 / 2008
  • This paper describes the issues we encountered, and the solutions we considered, while developing guidelines for thesaurus construction. Many of the terms considered were proper names and proper nouns, so the thesaurus needed to serve the function of an authority file. Preferred terms were selected based on the usage in the university's official records. Proper names were included only for people who held official positions in the university and for people who were the subjects of the materials. However, where the system supports combined retrieval across the creator and donor fields, including too many names was considered unnecessary.

A Study on Factors Affecting Social Welfare Centers and Facilities' Resource Mobilization (사회복지시설의 민간자원 동원에 영향을 주는 요인 연구: 후원을 중심으로)

  • Kim, Mee-Sook;Kim, Eun-Jeong
    • Korean Journal of Social Welfare / v.57 no.2 / pp.5-40 / 2005
  • Social welfare centers and residential care facilities, which provide the socially disadvantaged with appropriate social services, face financial difficulties. This is due not only to a lack of governmental support but also to these institutions' lack of skills in developing resources from the private sector. In this context, this study sought to identify factors affecting the resource mobilization of social welfare facilities in order to inform resource development policies. A mail survey was conducted using a structured questionnaire, which employees in charge of community resource development were asked to complete. The study population consisted of welfare centers and residential care facilities. A total of 293 community welfare centers and 632 residential care facilities responded, a response rate of about 62%. The dependent variables were the amounts of resources mobilized in 2001, measured as the number of donors, the total amount of donations, and the estimated value of gifts-in-kind. Three types of models were constructed for each type of welfare institution. Independent variables were selected based on previous research findings: community environment factors, structural factors, and resource development factors. Multiple regression was used to analyze the data. The resource development factors turned out to be significant in various models. In the models for the number of donors, the amount of donations, and the amount of gifts-in-kind (except for the welfare center model), at least one of the six resource development variables was significant. Welfare centers that have a resource development department or employees dedicated to resource development, use computer software to keep donor records, and run donor management programs have more donors and/or donations than their counterparts.
In addition, residential care facilities located in urban areas have more donors and donations, and among residential facilities, those serving people with disabilities and those with a longer history and more employees receive more donations than their counterparts. In the gift-in-kind model, welfare centers located in high-income areas and residential care facilities for the elderly, children, and people with intellectual disabilities receive fewer gifts-in-kind than their counterparts. Based on these findings, this study suggests that, to mobilize resources, both welfare centers and residential care facilities need a community resource development department or dedicated resource development staff, computer software to organize donor records systematically, and programs for recruiting and retaining donors.
