• Title/Abstract/Keywords: biased distribution

159 search results (processing time: 0.023 s)

FASIM: Fragments Assembly Simulation using Biased-Sampling Model and Assembly Simulation for Microbial Genome Shotgun Sequencing

  • Hur Cheol-Goo;Kim Sunny;Kim Chang-Hoon;Yoon Sung-Ho;In Yong-Ho;Kim Cheol-Min;Cho Hwan-Gue
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 16, No. 5
    • /
    • pp.683-688
    • /
    • 2006
  • We have developed a program for generating shotgun data sets from known genome sequences. Generation of synthetic data sets by computer program is a useful alternative to real data, to which students and researchers have limited access. The uniformly distributed clone sampling adopted by previous programs cannot account for the real situation, in which sampled reads tend to come from particular regions of the target genome. To reflect this situation, a probabilistic model for biased sampling distribution was developed using an experimental data set derived from a microbial genome project. Among the experimental parameters tested (varied fragment or read lengths, chimerism, and sequencing error), the extent of sequencing error was the most critical factor that hampered sequence assembly. We propose that an optimum sequencing strategy employing different insert lengths and redundancy can be established by performing a variety of simulations.
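The contrast between uniform and biased read sampling described in this abstract can be illustrated with a minimal sketch. It is not the FASIM model: the function names, the fixed read length, and the 5:1 weighting of start positions are all hypothetical choices made for illustration, whereas the paper derives its sampling distribution from experimental data.

```python
import random

def sample_reads(genome, n_reads, read_len, weights=None, seed=0):
    """Sample fixed-length reads from `genome`.

    `weights` holds one relative sampling weight per start position;
    None means uniform sampling (the assumption of earlier simulators).
    """
    rng = random.Random(seed)
    n_starts = len(genome) - read_len + 1
    if n_starts <= 0:
        raise ValueError("read_len exceeds genome length")
    if weights is None:
        weights = [1.0] * n_starts
    if len(weights) != n_starts:
        raise ValueError("need one weight per start position")
    starts = rng.choices(range(n_starts), weights=weights, k=n_reads)
    return [(s, genome[s:s + read_len]) for s in starts]

def coverage(genome_len, reads):
    """Per-base read depth implied by a list of (start, sequence) reads."""
    depth = [0] * genome_len
    for start, seq in reads:
        for i in range(start, start + len(seq)):
            depth[i] += 1
    return depth

# Toy genome; a real run would load a known genome sequence instead.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(1000))
read_len = 50
n_starts = len(genome) - read_len + 1

# Hypothetical bias: start positions in the first half are 5x as likely.
biased_weights = [5.0 if s < n_starts // 2 else 1.0 for s in range(n_starts)]

uniform_reads = sample_reads(genome, 200, read_len)
biased_reads = sample_reads(genome, 200, read_len, biased_weights)
```

Comparing `coverage(len(genome), uniform_reads)` with `coverage(len(genome), biased_reads)` shows how a biased start-position distribution leaves some regions thinly covered even at the same nominal redundancy.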

The Earnings Effect of Inter-Industry Technology Differences: A Comparison of the Self-Employed and Wage Earners

  • 최강식;정진화
    • 노동경제논집
    • /
    • Vol. 33, No. 2
    • /
    • pp.135-164
    • /
    • 2010
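  • This study compares the effect of inter-industry technology differences on earnings between the self-employed and wage earners, on the premise that the earnings-raising effect of technological progress differs by workers' job (skill) characteristics. The empirical analysis uses data from the Korean Labor and Income Panel Study (KLIPS) and applies a two-stage estimation method and quantile regression. The results show that for wage earners (occupation-specific skills) the earnings gains from technological progress are concentrated among the highly educated, whereas for the self-employed (entrepreneurial skills) the gains appear at all education levels. These sectoral differences also appear in the quantile regression analysis.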

A Study on Selection of Split Variable in Constructing Classification Tree

  • 정성석;김순영;임한필
    • 응용통계연구
    • /
    • Vol. 17, No. 2
    • /
    • pp.347-357
    • /
    • 2004
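  • Selecting the split variable is a very important step in constructing a decision tree. C4.5 has a serious variable selection bias toward continuous variables, and QUEST loses variable selection power for continuous variables when the normality assumption is violated. In this paper, we propose a statistically robust test algorithm and compare the efficiency of C4.5, QUEST, and the proposed algorithm through simulation. The results show that the proposed algorithm is robust in terms of both variable selection bias and variable selection power.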

Monitoring Ion Energy Distribution in Capacitively Coupled Plasmas Using Non-invasive Radio-Frequency Voltage Measurements

  • Choi, Myung-Sun;Lee, Seok-Hwan;Jang, Yunchang;Ryu, Sangwon;Kim, Gon-Ho
    • Applied Science and Convergence Technology
    • /
    • Vol. 23, No. 6
    • /
    • pp.357-365
    • /
    • 2014
  • A non-invasive method for measuring the ion energy distribution at an RF-biased surface is proposed for monitoring ion bombardment in capacitively coupled plasma sources. To obtain the ion energy distribution, the measured electrode voltage is analyzed using a circuit model developed with a linearized sheath capacitance, on the assumption that the RF-driven sheath behaves like a simple diode for a bias power whose frequency is much lower than the ion plasma frequency. The method is verified by comparing the ion energy distribution function obtained from the proposed model with the experimental result from an ion energy analyzer in a dual cathode capacitively coupled plasma source driven by a 100 MHz source power and a 400 kHz bias power.

Distribution Characteristics of Dust and Heavy Metals in the Atmosphere Around the Steel Industrial Complex

  • Hye-jin Jo;Jong-Ho Kim;Byung-Hyun Shon
    • International Journal of Advanced Culture Technology
    • /
    • Vol. 12, No. 2
    • /
    • pp.334-344
    • /
    • 2024
  • In Dangjin, Chungcheongnam-do, there are not only power plants and large steel complexes, but also small and medium-sized air pollutant emission facilities. The dust generated by these facilities has a very small particle size and a large surface area due to condensation and physical and chemical reactions, and is discharged containing various harmful substances. Therefore, this study analyzed the distribution of particulate matter and heavy metal concentrations by particle size in the vicinity of the steel complex, residential area, and reference point using an eight-stage Cascade Impactor. Overall, the direct impact sites with a short distance from the steel complex had the highest concentration, followed by the indirect impact sites, and the non-impact sites had the lowest concentration, indicating that they are directly affected by the steel complex. The atmospheric dust concentration distribution showed a bimodal distribution with a minimum value around the 1.1 to 2.1 ㎛ particle diameter. However, during the yellow dust event, the maximum concentration was biased toward coarse particles. The proportion of PM2.5 in the dust tended to be higher in winter, while the ratio between PM2.5 and PM10 was relatively higher in spring. Regardless of the location of the impact point, heavy metals in the dust were dominated by iron and aluminum, followed by zinc, lead, and manganese.

Modified Ranked Ordering Set Samples for Estimating the Population Mean

  • Kim, Hyun-Gee;Kim, Dong-Hee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 14, No. 3
    • /
    • pp.641-648
    • /
    • 2007
  • We propose a new sampling method, called modified ranked ordering set sampling (MROSS). Kim and Kim (2003) suggested the sign test using ranked ordering set sampling (ROSS) and showed that the asymptotic relative efficiency (ARE) of ROSS against RSS for the sign test increases as the sample size does. We propose an estimator for the population mean using MROSS. The relative precision (RP) of the estimator of the population mean using the MROSS method with respect to the usual estimator using modified RSS is higher, and when the underlying distribution is skewed, the bias of the proposed estimator is smaller than that of several ranked set sampling estimators.
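For background, the baseline that this abstract compares against, ordinary ranked set sampling (RSS), can be sketched as follows. This is the textbook balanced RSS scheme with perfect ranking, not the MROSS procedure, which the abstract does not define; the function name and the exponential test population are assumptions made for illustration.

```python
import random
import statistics

def rss_sample(draw, set_size, cycles, rng):
    """Balanced ranked set sample with perfect ranking.

    In each cycle, draw `set_size` independent sets of `set_size` units,
    rank each set, and keep the i-th smallest unit from the i-th set.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            ranked = sorted(draw(rng) for _ in range(set_size))
            sample.append(ranked[rank])
    return sample

rng = random.Random(42)

def draw(r):
    # Skewed population (exponential with rate 1), so the true mean is 1.
    return r.expovariate(1.0)

rss = rss_sample(draw, set_size=4, cycles=250, rng=rng)
rss_mean = statistics.fmean(rss)  # RSS estimator of the population mean
```

The sample mean of a balanced RSS is unbiased for the population mean, which makes it a natural reference point for the bias comparison mentioned in the abstract.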

Length-biased Rayleigh distribution: reliability analysis, estimation of the parameter, and applications

  • Kayid, M.;Alshingiti, Arwa M.;Aldossary, H.
    • International Journal of Reliability and Applications
    • /
    • Vol. 14, No. 1
    • /
    • pp.27-39
    • /
    • 2013
  • In this article, a new model based on the Rayleigh distribution is introduced. This model is useful and practical in physics, reliability, and life testing. The statistical and reliability properties of this model are presented, including moments, the hazard rate, the reversed hazard rate, and mean residual life functions, among others. In addition, it is shown that the distributions of the new model are ordered regarding the strongest likelihood ratio ordering. Four estimating methods, namely, method of moment, maximum likelihood method, Bayes estimation, and uniformly minimum variance unbiased, are used to estimate the parameters of this model. Simulation is used to calculate the estimates and to study their properties. Finally, the appropriateness of this model for real data sets is shown by using the chi-square goodness of fit test and the Kolmogorov-Smirnov statistic.
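As a worked illustration of the construction named in the title, the standard length-biased transform can be applied to a Rayleigh density. The abstract does not give the article's parameterization, so the scale parameter $\sigma$ and the notation below are assumptions.

```latex
% Rayleigh density with scale \sigma (assumed parameterization)
f(x;\sigma) = \frac{x}{\sigma^{2}} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right),
\qquad x \ge 0, \qquad
\mathbb{E}[X] = \sigma\sqrt{\frac{\pi}{2}} .

% Length-biased version: weight the density by x and renormalize
g(x;\sigma) = \frac{x\, f(x;\sigma)}{\mathbb{E}[X]}
            = \sqrt{\frac{2}{\pi}}\; \frac{x^{2}}{\sigma^{3}}
              \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right),
\qquad x \ge 0 .
```

Under this parameterization, $g$ has the same form as the Maxwell-Boltzmann density with scale $\sigma$.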

Design of RFID Authentication Protocol Using 2D Tent-map

  • 임거수
    • 한국정보전자통신기술학회논문지
    • /
    • Vol. 13, No. 5
    • /
    • pp.425-431
    • /
    • 2020
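  • As industry and technology have advanced, the transport, management, and distribution of goods have grown in scale, and RFID (Radio-Frequency Identification) technology was developed to manage this large volume of logistics information efficiently. RFID, which is intended for management, is now applied across industry, not only in logistics but also in power transmission and energy management. However, RFID devices have limited program capacity, which constrains development; because of this constraint, existing strong encryption methods cannot be used, leaving the devices vulnerable in terms of security. We designed a chaotic system for security that can be implemented with simple operations and is easy to apply in this constrained RFID environment. The designed system is a 2D Tent-map chaotic system that, to solve the problem of biased signal distribution depending on the parameters of the chaotic system, takes as parameters an encryption parameter (𝜇1), a distribution parameter (𝜇2), and a store ID value (𝜃) that can be used as a key. The designed RFID authentication system has the characteristics of a chaotic signal, which resembles random numbers and can be reproduced from its initial value, and it resolves the biased-distribution problem with respect to the parameters, so it can be considered more effective than existing encryption methods based on chaotic systems.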
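The biased-distribution problem mentioned in this abstract can be seen in the ordinary one-dimensional tent map, sketched below. This is not the paper's 2D system, and the parameters 𝜇1, 𝜇2, and 𝜃 are not reproduced; the function names, the initial value, and the two values of `mu` are assumptions chosen for illustration.

```python
def tent_orbit(mu, x0=0.1234, n=10000, burn_in=1000):
    """Iterate x -> mu * min(x, 1 - x) and return the orbit after burn-in."""
    x = x0
    orbit = []
    for i in range(burn_in + n):
        x = mu * min(x, 1.0 - x)
        if i >= burn_in:
            orbit.append(x)
    return orbit

def occupied_bins(orbit, bins=10):
    """Return the set of equal-width bins on [0, 1] that the orbit visits."""
    return {min(int(x * bins), bins - 1) for x in orbit}

# Small mu: the orbit is trapped in [mu * (1 - mu / 2), mu / 2],
# so the signal only ever takes values in a narrow band.
narrow = occupied_bins(tent_orbit(1.2))

# mu close to 2: the orbit spreads over almost the whole unit interval.
# (mu = 2.0 exactly is avoided because binary floating point makes that
# orbit collapse to 0 after a few dozen iterations.)
wide = occupied_bins(tent_orbit(1.99))

print(sorted(narrow), sorted(wide))
```

For `mu = 1.2` the invariant interval is roughly [0.48, 0.6], so the reproducibility from the initial value is preserved but the values are far from uniformly distributed, which is the kind of parameter-dependent bias a separate distribution parameter would need to correct.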

Estimation of Usual Meat Intake Distribution Considering Meat Content in Processed Foods: Based on the KNHANES 2009

  • 신윤정;김애정;김동우
    • 대한지역사회영양학회지
    • /
    • Vol. 25, No. 2
    • /
    • pp.150-158
    • /
    • 2020
  • Objectives: This study was conducted to estimate usual meat intake distribution, which may have been over/underestimated when estimations were made using only the third food codes of the Korea National Health and Nutrition Examination Survey (KNHANES). Methods: For this purpose, 24-hour recall data from the 2009 Korea National Health and Nutrition Examination Survey, which conducted a partial 2-day survey of food intake, were used. The Multiple Source Method (MSM) was used to estimate the distribution of the usual intake of red and processed meats. Results: The results of this study show that the mean intake of red meat was 45.07 g while that of processed meat was 4.33 g. These results are slightly higher than the consumption calculated using only tertiary food code, and the difference was statistically significant. Furthermore, characteristics of the estimated usual intake distribution were a smaller standard deviation, increased lower percentiles, and decreased upper percentiles compared to the 2-day mean intake distribution for both red and processed meats. The proportion of individuals not consuming red meat decreased substantially from approximately 37% to 0.7%. The proportion of consumption that exceeded 90 g, which is the upper limit of red meat intake recommended by the National Health Service (NHS), was only approximately 10% in the distribution of usual intake. Conclusions: As the consumption of processed foods is expected to continuously increase, caution is needed regarding the processes used to calculate food (group) intake to avoid over/underestimation. Moreover, use of KNHANES data to calculate the proportion of the population at risk of insufficiency or excess intake of certain nutrients or food (group), based on one day intake that does not address within-individual variation, may lead to biased estimates.

Chewing Lice of Swan Geese (Anser cygnoides): New Host-Parasite Associations

  • Choi, Chang-Yong;Takekawa, John Y.;Prosser, Diann J.;Smith, Lacy M.;Ely, Craig R.;Fox, Anthony D.;Cao, Lei;Wang, Xin;Batbayar, Nyambayar;Natsagdorj, Tseveenmayadag;Xiao, Xiangming
    • Parasites, Hosts and Diseases
    • /
    • Vol. 54, No. 5
    • /
    • pp.685-691
    • /
    • 2016
  • Chewing lice (Phthiraptera) that parasitize the globally threatened swan goose Anser cygnoides have been recognized since the early 19th century, but those records were probably biased towards sampling of captive or domestic geese due to the small population size and limited distribution of its wild hosts. To better understand the lice species parasitizing swan geese that are endemic to East Asia, we collected chewing lice from 14 wild geese caught at 3 lakes in northeastern Mongolia. The lice were morphologically identified as 16 Trinoton anserinum (Fabricius, 1805), 11 Ornithobius domesticus Arnold, 2005, and 1 Anaticola anseris (Linnaeus, 1758). These species are known from other geese and swans, but all of them were new to the swan goose. This result also indicates no overlap in lice species between older records and our findings from wild birds. Thus, ectoparasites collected from domestic or captive animals may provide biased information on the occurrence, prevalence, host selection, and host-ectoparasite interactions compared with those on wild hosts.