The Health Examinees (HEXA) Study: Rationale, Study Design and Baseline Characteristics

Abstract
Background: Korea has experienced rapid economic development in a very short period of time. A mixture of traditional and modern risk factors coexists, and the rapid change in non-genetic factors interacts with genetic constituents. In consideration of these unique aspects of Korean society, a large-scale genomic cohort study, the Health Examinees (HEXA) Study, has been conducted to investigate the epidemiologic characteristics, genomic features, and gene-environment interactions of major chronic diseases, including cancer, in the Korean population.
Materials and Methods: Following a standardized study protocol, subjects were prospectively recruited from 38 health examination centers and training hospitals throughout the country. An interview-based questionnaire survey was conducted to collect information on socio-demographic characteristics, medical history, medication usage, family history, lifestyle factors, diet, physical activity, and, for women, reproductive factors. Various biological specimens (i.e., plasma, serum, buffy coat, blood cells, genomic DNA, and urine) were collected for the biorepository according to the standardized protocol. Skilled medical staff also performed physical examinations.
Results: Between 2004 and 2013, a total of 167,169 subjects aged 40–69 years were recruited for the HEXA study. Participants are being followed up using active and passive methods. The first wave of active follow-up began in 2012 and will continue until 2015. Passive follow-up is based principally on data linkage with the National Death Certificate, the National Cancer Registry, and the National Health Insurance Claim data.
Conclusions: The HEXA study will provide an opportunity to investigate biomarkers of early health indices and the chronological changes associated with chronic diseases.


Introduction
The spectrum of disease occurrence varies substantially worldwide. The shift from infectious diseases to chronic diseases as the leading causes of death has been attributed to improvements in medical care, population aging, and public health interventions (WHO, 2009; WHO, 2011). Alongside this global trend in medicine and public health, Korea offers a distinct perspective on the transition: it has experienced rapid economic development in a very short period of time. A mixture of traditional and modern risk factors coexists, and rapid changes in non-genetic factors inevitably interact with genetic constituents. These features have added to the burden of disease, in particular various types of cancer and cancer-related disorders, and have posed significant challenges in Korea (Han et al., 2011; Cho et al., 2013; You et al., 2013). On the other hand, they provide an opportunity to efficiently examine genetic and environmental factors, their interactions, and how these influence the development of diseases including cancers.

Health Examinees (HEXA) Study Group* (*Details of the study group members are provided in the Acknowledgments. For correspondence: dhkang@snu.ac.kr)
There are several unique attributes of Korea that create an opportunity for a new genomic cohort study. First, over the past 50 years, Korea has experienced rapid changes in dietary patterns and energy imbalances. For example, carbohydrate intake decreased from 80.3% of total energy intake in 1969 to 64.9% in 2012, whereas fat intake increased from 7.2% to 20.4% over the same period (Ministry of Health and Welfare and the Korean Centers for Disease Control and Prevention, 2013). Although protein intake remained relatively constant (12.5% vs. 14.7%, respectively), the source of protein has shifted from rice and soy products in the past to primarily animal sources in the present day (Ministry of Health and Welfare and the Korean Centers for Disease Control and Prevention, 2013). Specifically, the share of per capita daily protein intake from animal sources rose from less than 10% in 1948 to almost 50% in 1995 (Kim et al., 2000). Moreover, consumption of cereals and grain products decreased from 53% of total grams of food intake in 1969 to only 28% in 1995, while the intake of all animal food products increased 10-fold during the same period (Kim et al., 2000). These transitions in food intake may have led to the substantial contemporary presence of energy imbalances and the associated increases in obesity. Indeed, the prevalence of obesity among adults grew from 26% (25.1% among males and 26.2% among females) in 1998 to 32% (36.3% among males and 28.0% among females) in 2012 (Ministry of Health and Welfare and the Korean Centers for Disease Control and Prevention, 2013).
The increase in obesity was more prominent in the male population, with a rise of more than 10 percentage points during the same period (Ministry of Health and Welfare and the Korean Centers for Disease Control and Prevention, 2013). Of greatest concern is the dramatic growth in obesity among children, which increased from 1.7% (among boys) and 2.6% (among girls) in 1979 to 17.9% (among boys) and 10.9% (among girls) in 2002 (Park et al., 2004). These changes in dietary patterns and energy imbalances may be linked to a dramatic increase in cancer incidence and mortality (Woo and Kim, 2011).
Second, Korea has undergone rapid changes in the age at menarche and in total fertility rates. Earlier menarche is a worldwide phenomenon (Okasha et al., 2001). Whereas menarcheal age fell by around 2-3 months every 10 years during the last 150 years in Western countries, age at menarche decreased by approximately two years between the 1980 and 1990 birth cohorts in Korea (Cho et al., 1999; Park et al., 2006). At the same time, the total fertility rate decreased from 4.5 children born per woman in 1970 to 1.1 in 2005, accompanied by an increase in women's age at first marriage from 24.8 in 1990 to 28.3 in 2008 (KNSO, 2009). These changes in female reproductive factors (e.g., early menarche, lower fertility rates, later age at first full-term pregnancy, and shorter lactation) appear to be intimately associated with the etiology of women's diseases such as breast cancer and cardiovascular diseases in Korea (Yoo et al., 2006).
Third, in terms of infrastructure, Korea has an efficient and unique medical system with regard to national health checkup services. With the aim of national health promotion and disease prevention, the National Health Insurance Corporation (NHIC) provides biennial health examinations to all Korean adults over the age of 40 (Langenberg et al., 2006). The health examinations performed by the NHIC offer the advantages of automatic follow-up and case ascertainment, longitudinal repeated measurements, and a pool of subjects that is representative of the majority of the Korean population. Furthermore, Korea has a unique national identification system, in which a 13-digit resident registration number is assigned to each individual. This facilitates the implementation of a nationwide cohort study by avoiding duplication of subjects and by simplifying both active and passive participant follow-up.
Given these potential strengths for establishing a large-scale genomic cohort, the Health Examinees (HEXA) Study has the following aims (Figure 1). First, it aims to establish a large-scale genomic epidemiologic cohort and set up a biorepository system. Second, it aims to facilitate close examination of genomic risk factors for major chronic diseases in the Korean population, and of the interplay between the lifestyle factors affecting disease occurrence and genomic features. Third, it can be used to develop more comprehensive, practicable, and individualized preventive guidelines for the common diseases occurring among Koreans. Finally, it provides a foundation that will enable lifetime health monitoring and help to establish cancer prevention strategies in Korea (Yoo et al., 2005). In the present paper, we describe the rationale and study design of the HEXA study with reference to the methodological literature from international cohort studies (Alavanja et al., 1996; Riboli and Kaaks, 1997; Konishi et al., 2001; Tsugane and Sobue, 2001; Watanabe et al., 2001; Slimani et al., 2002; Zheng et al., 2005).

Subject recruitment
The subjects were adult males and females aged 40-69 who were recruited between 2004 and 2013 at 38 health examination centers and training hospitals located in 8 regions of Korea. The first phase was initiated at 10 hospitals and was then extended to other regions until a total of 38 sites were participating (Figure 2). The selection criteria for participating centers were as follows: 1) has experience constructing a cohort, 2) has the capacity to recruit over 2,000 participants per year, 3) has a health examinee system that represents the community, 4) has a built-in infrastructure for repeated surveys and is likely to have a follow-up rate of 50% or more, and 5) has experience in multicenter network research.
At baseline, participants were prospectively recruited following a standardized study protocol that was approved by the Ethics Committee of the Korean Health and Genomic Study of the Korean National Institute of Health and by the institutional review boards of all participating hospitals. All study participants voluntarily signed a consent form before entering the study. A standardized questionnaire survey was conducted by well-trained research staff, and data collection included information on socio-demographic characteristics, medical history, medication usage, family medical history, lifetime consumption of alcohol and tobacco, diet, physical activity, and, for women, reproductive factors (Table 1). Biological specimens such as plasma, serum, buffy coat, blood cells, genomic DNA, and urine were also collected. Physical examinations (i.e., height, weight, waist circumference, body composition, blood pressure, and pulse) and laboratory analyses (i.e., clinical blood and urine tests) were performed by skilled medical staff.
The baseline survey was performed in two phases: phase I between 2004 and 2008, and phase II between 2009 and 2013. To maximize study quality and protocol standardization, a centralized system for all procedures other than sample collection was applied from the beginning of phase II. Moreover, in phase II a computer-assisted personal interviewing system was instituted and an integrated web database was established. These efforts to enhance the precision and standardization of the HEXA study were intended to meet international standards for genomic cohort studies.

Figure 1. Study schemes of Health Examinees (HEXA) Study, 2004-2013

Questionnaire development
The questionnaire was developed based on an extensive literature review of National Institute of Health (NIH)-funded studies and well-known questionnaires such as the Korea National Health and Nutrition Examination Survey (KNHANES), Taylor's Minnesota Leisure Time Physical Activity Questionnaire (MLTPAQ) for leisure time activity, and the Psychosocial Well-being Index-Short Form (PWI-SF) for measuring depression. All the questionnaires were revised twice using the back-translation process established by the National Cancer Institute (NCI). In addition, an iterative process of draft, review, comment, and redraft was used. Prior to the study, a pilot study was conducted to check the feasibility and efficacy of the questionnaire. For dietary assessment, we used the Semi-Quantitative Food Frequency Questionnaire (SQFFQ) of Ahn et al., which consists of 106 food items plus supplement intake and was developed and validated using 12 days of diet record data from 124 subjects (Ahn et al., 2007).

Biospecimen collection, processing, and storage
For each participant, a total of at least 19 cc of blood is drawn into one serum separator tube (SST) and two ethylene-diamine-tetra-acetic acid (EDTA) tubes, and a urine sample of more than 12 ml is placed into a conical tube for laboratory tests and storage (Figure 3). The biospecimens are given study IDs that are matched to the participants' questionnaires and are labeled with 2D bar code stickers. Biospecimens are kept in refrigerators at each medical institution until collected within 24 hours by a courier from the commercial laboratory responsible for all

Data editing and quality control procedures
Interviewers were selected from a pool of nurses and other medical professionals. They receive standardized training more than twice a year. All interviewers participated in fieldwork for the baseline survey and sample collection and entered data using a standardized method.
The data entry and management system is a client/server-based database accessible via the Internet, and it is used to enter the questionnaire, blood test results, and body composition analysis results. Thus, work activities such as entering, reading, searching, and analyzing data can be done at any time regardless of location. Entry errors are controlled at the entry stage: when a survey respondent provides a contradictory response or the person entering data makes a mistake, a computer program developed to detect such errors identifies it at that moment. In addition, data are entered a second time at the verification screen after the initial entry is finished. After confirming entry errors and other errors, the data manager sends the relevant data back to the interviewer in each area. The interviewers then phone the subjects to confirm the data and re-enter it.
The study is organized so that it continuously maintains mutual communication between community-based medical institutions and cohort members. In addition, it is designed to facilitate mutual online interaction. It also uses a flexible operating system that adapts to the various situations of each community-based medical institution. Participant follow-up is performed using both active and passive methods.
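The entry-stage checks and the second-pass verification described in this section could be sketched roughly as follows. This is a minimal illustration, not the HEXA software: the field names, the `Entry` class, and the three consistency rules are hypothetical and were chosen only to mirror variables mentioned in the text (age range, smoking, reproductive items for women).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Entry:
    """One keyed-in questionnaire record (hypothetical field names)."""
    study_id: str
    sex: str                      # "M" or "F"
    age: int
    smoking_status: str           # "never", "former", or "current"
    cigarettes_per_day: int
    age_at_menarche: Optional[int] = None


def consistency_errors(entry: Entry) -> List[str]:
    """Return contradictions found within a single entry (illustrative rules)."""
    errors = []
    if not 40 <= entry.age <= 69:
        errors.append("age outside the 40-69 recruitment range")
    if entry.smoking_status == "never" and entry.cigarettes_per_day > 0:
        errors.append("never smoker reports cigarettes per day")
    if entry.sex == "M" and entry.age_at_menarche is not None:
        errors.append("reproductive item answered for a male participant")
    return errors


def double_entry_mismatches(first: Entry, second: Entry) -> List[str]:
    """Return the names of fields that differ between the two keyings."""
    return [
        f.name for f in fields(first)
        if getattr(first, f.name) != getattr(second, f.name)
    ]
```

Records that return a non-empty list from either function would be the ones a data manager sends back to the interviewer for confirmation by phone.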

Follow-up and outcome ascertainment
Medical institutions conduct active follow-up by re-contacting existing cohort members at 2-year intervals. To do so, they send an information leaflet by mail and make phone calls about the follow-up study. At follow-up, cohort members complete a questionnaire, undergo laboratory tests, and provide blood and urine samples.
The principal purpose of passive follow-up is to ascertain cases using Korean health-related databases. HEXA researchers can identify changes in the health status of cohort members using data from the NHIC and the Health Insurance Review and Assessment Service (HIRA). In addition, the occurrence of cancer can be verified through National Cancer Registration data from the National Cancer Center (NCC). For deaths during follow-up, the cause of death is confirmed through the National Statistical Office.
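Passive case ascertainment of this kind amounts to linking cohort records to external registry records on a shared identifier and keeping events dated after enrollment. The sketch below illustrates that idea only; the record layout, the `person_id` key, and the function name are assumptions made for the example and do not describe the actual linkage procedure or the format of any national database.

```python
from datetime import date


def link_first_events(cohort, registry):
    """Return {person_id: earliest registry record after enrollment, or None}.

    cohort:   iterable of dicts with "person_id" and "enrolled" (date)
    registry: iterable of dicts with "person_id", "diagnosis", "event_date" (date)
    All keys are hypothetical names used for illustration.
    """
    # Index registry records by the shared identifier.
    by_person = {}
    for record in registry:
        by_person.setdefault(record["person_id"], []).append(record)

    linked = {}
    for member in cohort:
        # Keep only events after enrollment (incident, not prevalent, cases).
        incident = [
            r for r in by_person.get(member["person_id"], [])
            if r["event_date"] > member["enrolled"]
        ]
        linked[member["person_id"]] = (
            min(incident, key=lambda r: r["event_date"]) if incident else None
        )
    return linked


# Made-up example records, not study data.
example_cohort = [
    {"person_id": "A", "enrolled": date(2005, 3, 1)},
    {"person_id": "B", "enrolled": date(2006, 1, 1)},
]
example_registry = [
    {"person_id": "A", "diagnosis": "cancer", "event_date": date(2004, 1, 1)},
    {"person_id": "A", "diagnosis": "cancer", "event_date": date(2010, 5, 1)},
]
example_linked = link_first_events(example_cohort, example_registry)
```

In the example, the 2004 record for person "A" predates enrollment and is ignored, so the 2010 record is the linked outcome, while person "B" has no registry event.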

Results
At the end of the HEXA baseline survey in 2013, a total of 167,169 participants aged 40-69 years had been recruited. However, data editing and quality control for new enrollments in 2013 have not yet been completed. Consequently, analysis of the subjects' basic characteristics was restricted to those recruited between 2004 and 2012, whose data were fully cleaned according to the standardized protocol (N = 162,412).
The distribution of selected characteristics of the 162,142 participants is shown in Table 2. Females were recruited almost twice as frequently as males. The mean age was 53.7 for males and 52.6 for females. Males were more highly educated than females. Marital status did not differ between the groups. Seventy percent of the males were either ex-smokers or current smokers, while 95.5% of the females were never smokers. The proportion of participants who never consumed alcohol was 19.7% among males and 66.0% among females. Males engaged in regular exercise more frequently than females. Females had lower BMIs than males (mean±SD, 24.4±2.7 among males, 23.7±2.9 among females). Finally, females more often reported a family history of major diseases (diabetes, hypertension, myocardial infarction, stroke, and cancer at any site) than males.
Table 3 shows the estimated number of incident cases of selected major diseases during the future follow-up period between 2018 and 2023, based on cancer statistics from 2011 and statistics for other diseases derived from the literature (Center for Genome Science and Korea National Institute of Health, 2008; Hong et al., 2013). Incident cases of cancer at all sites are estimated to increase by 9,172 (5,597 among males and 3,575 among females) by 2018 and by 13,390 (8,162 among males and 5,228 among females) by 2023. For ischemic heart disease, cerebrovascular accident, and diabetes mellitus, incident cases are estimated to increase by more than 5,000 by 2018 and more than 9,000 by 2023 (Table 3).

Discussion
The HEXA study is built on the existing health examination system of the NHIC in Korea and benefits from the advantages of the nation's healthcare system in addition to its own strengths. The subjects were health examination participants from 38 health examination centers and training hospitals throughout the country. HEXA has several advantages. First, a large number of individuals voluntarily participate in biennial health examinations at health examination centers. Thus, a study conducted in conjunction with the existing system enables researchers to build up a large cohort in a relatively short period of time. Second, information from the interviewer-administered questionnaires can be obtained every 2 years because this coincides with the biennial cycle of the health examinations implemented by the government. Third, since the medical institutions participating in HEXA are primarily training hospitals or large general hospitals located in major cities in Korea, study participants will most likely visit the same hospitals for diagnosis and treatment during the follow-up period, which will make it easier to confirm their medical records. Finally, since Korea has a nationwide health insurance system and a good-quality central cancer registry (Seo et al., 2012), active and passive follow-up can proceed more efficiently. Moreover, passive follow-up based on record linkage with population-based cancer registries is a reliable and efficient way to ascertain disease outcomes (Cho et al., 2009).
Careful standardization of survey methodology and quality control are essential for multicenter recruitment designs. By centralizing all procedures other than sample collection in phase II, human errors at various stages were minimized, the length of the process was reduced, and the quality of specimens was improved. HEXA has put special effort into educating and training the staff of each participating medical institution, including interviewers and nurses, by holding training workshops every 6 months. In addition to the intensive workshops, trainees' field performance is evaluated by selected examiners from the coordinating centers or by questionnaire. The participant questionnaire used during phase I was also validated before the study began. Furthermore, to fulfill international standards and to prepare for pooled analyses with other studies and consortia, a few changes were made for phase II of the study, and regular inspections and revisions by expert groups are conducted.
In summary, we have successfully established a community-based cohort of 167,169 subjects for long-term epidemiologic studies of various chronic diseases. With longer follow-up, the HEXA study should provide a valuable opportunity to investigate biomarkers and their changes in association with the major chronic diseases prevalent in the Korean population.