• Title/Summary/Keyword: Cross - Validation


Determination of Survival of Gastric Cancer Patients With Distant Lymph Node Metastasis Using Prealbumin Level and Prothrombin Time: Contour Plots Based on Random Survival Forest Algorithm on High-Dimensionality Clinical and Laboratory Datasets

  • Zhang, Cheng;Xie, Minmin;Zhang, Yi;Zhang, Xiaopeng;Feng, Chong;Wu, Zhijun;Feng, Ying;Yang, Yahui;Xu, Hui;Ma, Tai
    • Journal of Gastric Cancer / v.22 no.2 / pp.120-134 / 2022
  • Purpose: This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration. Materials and Methods: This study employed 79 features of clinical pathology, laboratory tests, and therapeutic details from 289 GC patients whose distant lymphadenopathy was presented as the first episode of recurrence or metastasis. Outcomes were measured as any-cause death events and survival months after distant lymph node metastasis. A prediction model was built based on possible outcome predictors using a random survival forest algorithm and confirmed by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visually represent survival prediction based on 2 predictive features. Results: The median survival time of patients with GC with distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT), and yielded a prediction error of 0.353. The inclusion of other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect of the prealbumin level and the PT. Conclusions: Machine learning is useful for identifying the important determinants of cancer survival using high-dimensional datasets. The prealbumin level and the PT at the time of distant lymph node metastasis are the 2 most crucial factors in predicting the subsequent survival time of advanced GC.
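
The nested cross-validation mentioned in this abstract can be illustrated with a minimal sketch of how the index splits might be generated. It reads "5×5" as five outer folds with five inner folds each, which is an assumption (the abstract does not say whether it means this or five repeats of 5-fold cross-validation). The function names `k_fold_indices` and `nested_cv_splits` are hypothetical, and only the sample size of 289 comes from the abstract.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_test, inner_splits) pairs for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    the outer training portion, so the outer test fold is never touched
    during model selection.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + 1 + i)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, inner_splits


# 289 is the number of patients reported in the abstract.
splits = list(nested_cv_splits(289))
```

In such a scheme the inner folds would be used to choose variables or hyperparameters, and the outer test fold only to estimate the prediction error of the chosen model.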

A modified U-net for crack segmentation by Self-Attention-Self-Adaption neuron and random elastic deformation

  • Zhao, Jin;Hu, Fangqiao;Qiao, Weidong;Zhai, Weida;Xu, Yang;Bao, Yuequan;Li, Hui
    • Smart Structures and Systems / v.29 no.1 / pp.1-16 / 2022
  • Despite recent breakthroughs in deep learning and computer vision fields, the pixel-wise identification of tiny objects in high-resolution images with complex disturbances remains challenging. This study proposes a modified U-net for tiny crack segmentation in real-world steel-box-girder bridges. The modified U-net adopts the common U-net framework and a novel Self-Attention-Self-Adaption (SASA) neuron as the fundamental computing element. The Self-Attention module applies softmax and gate operations to obtain the attention vector. It enables the neuron to focus on the most significant receptive fields when processing large-scale feature maps. The Self-Adaption module consists of a multilayer perceptron subnet and achieves deeper feature extraction inside a single neuron. For data augmentation, a grid-based crack random elastic deformation (CRED) algorithm is designed to enrich the diversity and irregular shapes of distributed cracks. Grid-based uniform control nodes are first set on both input images and binary labels, random offsets are then applied to these control nodes, and bilinear interpolation is performed for the remaining pixels. The proposed SASA neuron and CRED algorithm are simultaneously deployed to train the modified U-net. 200 raw images with a high resolution of 4928 × 3264 are collected, 160 for training and the remaining 40 for testing. 512 × 512 patches are generated as inputs from the original images by a sliding window with an overlap of 256. Results show that the average IoU between the recognized and ground-truth cracks reaches 0.409, which is 29.8% higher than that of the regular U-net. A five-fold cross-validation study is performed to verify that the proposed method is robust to different training and test images. Ablation experiments further demonstrate the effectiveness of the proposed SASA neuron and CRED algorithm. The gains in average IoU from using the SASA and CRED modules individually add up to the gain of the full model, indicating that the SASA and CRED modules contribute to different stages (model and data) of the training process.
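
The patch generation and the IoU metric described in this abstract can be sketched as follows. The image size, patch size, and overlap come from the abstract. Dropping partial windows at the right and bottom edges, and returning 1.0 for two empty masks, are assumptions made for illustration; the abstract does not say how the authors handled either case. The function names are hypothetical.

```python
def patch_origins(width, height, patch=512, overlap=256):
    """Top-left corners of sliding-window patches that fit fully in the image.

    Assumption: windows that would extend past the image edge are dropped.
    """
    stride = patch - overlap
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]


def iou(pred, truth):
    """Intersection over union of two equally sized flat binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


origins = patch_origins(4928, 3264)
```

Under these assumptions a 4928 × 3264 image gives 18 window positions horizontally and 11 vertically.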

3D Digital Restoration of Koguryo Ceremonial Flag "Jeol" (고구려 의장기 절(節)의 3D 디지털 복원)

  • KONG, Jeonyoung;KONG, Seokkoo
    • Korean Journal of Heritage: History & Science / v.55 no.3 / pp.6-20 / 2022
  • The restoration of cultural heritage materials is an important research theme. This study improved the existing cultural heritage restoration method and attempted to establish a restoration system for cultural heritage data based on historical documents and visual materials. Recognizing the limitations of existing studies, this paper attempted to restore cultural heritage data through interdisciplinary research. In addition, 3D restoration was carried out after restoration in 2D form based on literature documents rather than existing visual sources. The object of restoration that was selected was "Jeol," which represents the power of the king of Koguryo. Koguryo's Jeol is a type of flag. Jeol appears in the mural in Anak Tomb No. 3. Rather than using only photographic materials of murals, the restoration was carried out through cross-validation of literature data and materials on archaeological art history. This is important in that the restoration carried out in this study is an accurate restoration with a historical understanding based on the literature of the relevant cultural heritage. In this study, a restoration process based on historical records was established. A 3D restoration process was performed by adding and applying visual materials after the object was first shaped based on the literature data. Restoration based on literature and visual materials was carried out based on interdisciplinary research. Therefore, this study aims to build a digital restoration system for cultural heritages and to contribute to spreading the 3D digital restoration research of cultural heritages that can be applied to various platforms.

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.3 / pp.74-99 / 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) emitted from vehicles and industrial sites, and it adversely affects vegetation and the human body. In South Korea, ozone is monitored in real time at stations (i.e., point measurements), but it is difficult to monitor and analyze its continuous spatial distribution. In this study, surface ozone concentrations were interpolated hourly at a spatial resolution of 1.5 km using the stacking ensemble technique, followed by a 5-fold cross-validation. Base models for the stacking ensemble were cokriging, multi-linear regression (MLR), random forest (RF), and support vector regression (SVR), while MLR was used as the meta model, having all base model results as additional input variables. The results showed that the stacking ensemble model yielded better performance than the individual base models, with an average R of 0.76 and an RMSE of 0.0065 ppm during the study period of 2020. Compared to those generated by the base models, the surface ozone concentration distribution generated by the stacking ensemble model had a wider range and a spatial pattern similar to that of the terrain and urbanization variables. The proposed model is expected not only to produce the hourly spatial distribution of ozone but also to be highly applicable for calculating the daily maximum 8-hour ozone concentrations.
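
A stacking ensemble of this general shape can be sketched with scikit-learn, as a rough illustration rather than the authors' implementation. Several things here are assumptions: the data are synthetic, cokriging is omitted because scikit-learn has no cokriging estimator, all hyperparameters are arbitrary, and `passthrough=True` is one reading of "having all base model results as additional input variables" (the meta model sees the original inputs plus the base predictions).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in data; the study's station measurements are not used here.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Base models named in the abstract, minus cokriging (not available in scikit-learn).
base_models = [
    ("mlr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
    ("svr", SVR()),
]

# Linear regression as the meta model; passthrough=True feeds it the original
# features together with the base-model predictions.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,
    passthrough=True,
)

# 5-fold cross-validation of the whole stacked model.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(stack, X, y, cv=outer_cv, scoring="r2")
mean_r2 = scores.mean()
```

Note that the `cv=5` inside `StackingRegressor` controls how the out-of-fold base predictions are produced for the meta model, which is separate from the outer 5-fold evaluation.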

Analyzing Korean Math Word Problem Data Classification Difficulty Level Using the KoEPT Model (KoEPT 기반 한국어 수학 문장제 문제 데이터 분류 난도 분석)

  • Rhim, Sangkyu;Ki, Kyung Seo;Kim, Bugeun;Gweon, Gahgene
    • KIPS Transactions on Software and Data Engineering / v.11 no.8 / pp.315-324 / 2022
  • In this paper, we propose KoEPT, a Transformer-based generative model for automatically solving math word problems. A math word problem is a problem written in human language that describes an everyday situation in mathematical form. Solving math word problems requires an artificial intelligence model to understand the logic implied in the problem, so the task is widely studied around the world as a way to improve the language understanding ability of artificial intelligence. For the Korean language, studies so far have mainly attempted to solve problems by classifying them into templates, but these techniques are difficult to apply to datasets with high classification difficulty. To address this limitation, this paper uses the KoEPT model, which uses 'expression' tokens and pointer networks. To measure the performance of this model, the classification difficulty scores of IL, CC, and ALG514, which are existing Korean math word problem datasets, were measured, and the performance of KoEPT was then evaluated using 5-fold cross-validation. On the Korean datasets used for evaluation, KoEPT obtained state-of-the-art (SOTA) performance with 99.1% on CC, which is comparable to the existing SOTA performance, and 89.3% and 80.5% on IL and ALG514, respectively. In addition, KoEPT showed relatively improved performance on datasets with high classification difficulty. Through an ablation study, we found that the use of 'expression' tokens and pointer networks made KoEPT less affected by classification difficulty while still obtaining good performance.

Estimating soils properties using NIRS to assess amendments in intensive horticultural production

  • Pena, Francisco;Gallardo, Natalia;Campillo, Carmen Del;Garrido, Ana;Cabanas, Victor Fernandez;Delgado, Antonio
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1615-1615 / 2001
  • During the past ten years, Near Infrared Spectroscopy (NIRS) has been successfully applied to the analysis of a great variety of agricultural products. Previous works (Morra et al., 1991; Salgo et al., 1998) have shown the potential of this technology for soil analysis, estimating different parameters with a single scan. The main advantages of NIR applications in soils are the speed of response, which allows more samples to be analysed to characterize a particular soil, and the immediate elaboration of recommendations for fertilization and soil amendment. Another advantage is that it avoids the use of chemical reagents altogether, making it an environmentally safe technique. In this paper, we have studied a set of 129 soil samples selected from representative glasshouse soils from Southern Spain. The samples were dried, milled, and sieved to pass a 2 mm sieve and then analysed for organic carbon, total nitrogen, inorganic nitrogen (nitrate ammonium), hygroscopic humidity, pH and electrical conductivity in the 1:1 extract. NIR spectra of all samples were obtained in reflectance mode using a Foss NIR Systems 6500 spectrophotometer equipped with a spinning module. Calibration equations were developed for seven analytical parameters (pH, total nitrogen, organic nitrogen, organic carbon, C/N ratio and electrical conductivity). Preliminary results show good correlation coefficients and standard errors of cross-validation (SECV) in the equations obtained for organic carbon, organic nitrogen, total nitrogen and C/N ratio. Calibrations for nitrates and nitrites, ammonia and electrical conductivity were not acceptable. The calibration obtained for pH had an acceptable SECV, but the determination coefficient was very poor, probably due to the reduced range in reference values. Since the estimates of organic carbon and C/N ratio are acceptable, NIRS could be used as a fast method to assess the need for organic amendments in soils from Mediterranean regions, where the low level of organic matter in soils constitutes an important agronomic problem. Furthermore, the possibility of a single and fast estimation of total nitrogen (a tedious determination by modifications of the Kjeldahl procedure) could provide interesting data to use in the estimation of nitrogen fertilizer rates by means of nitrogen balances.


A Study on Consumer Type Data Analysis Methodology - Focusing on www.ethno-mining.com data - (소비자유형 데이터 분석방법론 연구 - www.ethno-mining.com 데이터를 중심으로 -)

  • Wookwhan, Jung;Jinho, Ahn;Joseph, Na
    • Journal of Service Research and Studies / v.12 no.2 / pp.80-93 / 2022
  • This study examines a methodology that draws on previous studies to extract the various factors that affect the purchase and use of products and services from the consumer's point of view, and that analyzes the types and tendencies of consumers according to age and gender. To this end, we quantified factors in terms of general personal propensity, consumption influence, consumption decision, etc. to check the consistency of the data, and on this basis we proposed and tested data analysis methodologies for consumer types that are meaningful from the perspectives of startups and SMEs. As a result, cross-validation confirmed that there is a correlation between the three main factors assumed for data analysis from the consumer's point of view: the general tendency, the general consumption tendency, and the factors influencing the consumption decision. This study presented a data analysis methodology and a framework for consumer data analysis from the consumer's point of view. In the current data analysis trend, where digital infrastructure develops exponentially and ways are sought to reflect individual preferences, this data analysis perspective can offer valid insight.

A Nomogram for Predicting Extraperigastric Lymph Node Metastasis in Patients With Early Gastric Cancer

  • Hyun Joo Yoo;Hayemin Lee;Han Hong Lee;Jun Hyun Lee;Kyong-Hwa Jun;Jin-jo Kim;Kyo-young Song;Dong Jin Kim
    • Journal of Gastric Cancer / v.23 no.2 / pp.355-364 / 2023
  • Background: There are no clear guidelines to determine whether to perform D1 or D1+ lymph node dissection in early gastric cancer (EGC). This study aimed to develop a nomogram for estimating the risk of extraperigastric lymph node metastasis (LNM). Materials and Methods: Between 2009 and 2019, a total of 4,482 patients with pathologically confirmed T1 disease at 6 affiliated hospitals were included in this study. The basic clinicopathological characteristics of the positive and negative extraperigastric LNM groups were compared. The possible risk factors were evaluated using univariate and multivariate analyses. Based on these results, a risk prediction model was developed. A nomogram predicting extraperigastric LNM was used for internal validation. Results: Multivariate analyses showed that tumor size (cut-off value 3.0 cm, odds ratio [OR]=1.886, P=0.030), tumor depth (OR=1.853 for tumors with sm2 and sm3 invasion, P=0.010), cross-sectional location (OR=0.490 for tumors located on the greater curvature, P=0.0303), differentiation (OR=0.584 for differentiated tumors, P=0.0070), and lymphovascular invasion (OR=11.125, P<0.001) are possible risk factors for extraperigastric LNM. An equation for estimating the risk of extraperigastric LNM was derived from these risk factors. The equation was internally validated by comparing the actual metastatic rate with the predicted rate, which showed good agreement. Conclusions: A nomogram for estimating the risk of extraperigastric LNM in EGC was successfully developed. Although there are some limitations to applying this model because it was developed based on pathological data, it can be optimally adapted for patients who require curative gastrectomy after endoscopic submucosal dissection.

Backpack- and UAV-based Laser Scanning Application for Estimating Overstory and Understory Biomass of Forest Stands (임분 상하층의 바이오매스 조사를 위한 백팩형 라이다와 드론 라이다의 적용성 평가)

  • Heejae Lee;Seunguk Kim;Hyeyeong Choe
    • Journal of Korean Society of Forest Science / v.112 no.3 / pp.363-373 / 2023
  • Forest biomass surveys are regularly conducted to assess and manage forests as carbon sinks. LiDAR (Light Detection and Ranging), a remote sensing technology, has attracted considerable attention, as it allows for objective acquisition of forest structure information with minimal labor. In this study, we propose a method for estimating overstory and understory biomass in forest stands using backpack laser scanning (BPLS) and unmanned aerial vehicle laser scanning (UAV-LS), and assess its accuracy. For overstory biomass, we analyzed the accuracy of BPLS and UAV-LS in estimating diameter at breast height (DBH) and tree height. For understory biomass, we developed a multiple regression model using the best combination of vertical structure metrics extracted from the BPLS data. The results indicated that BPLS provided accurate estimations of DBH (R² = 0.92), but underestimated tree height (R² = 0.63, bias = -5.56 m), whereas UAV-LS showed strong performance in estimating tree height (R² = 0.91). For understory biomass, metrics representing the mean height of the points and the point density of the fourth layer were selected to develop the model. The cross-validation result of the understory biomass estimation model showed a coefficient of determination of 0.68. The study findings suggest that the proposed overstory and understory biomass survey methods using BPLS and UAV-LS can effectively replace traditional biomass survey methods.
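
The two accuracy measures quoted in this abstract can be computed as in the short sketch below. It assumes that R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and that bias is the mean of predicted minus observed values; the abstract does not state either definition. The sample values are made up for illustration and are not the study's data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def bias(observed, predicted):
    """Mean signed error; negative values indicate underestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical tree heights in metres (field-measured vs. scanner-derived).
field_height = [10.0, 12.0, 14.0, 16.0]
lidar_height = [9.0, 11.0, 13.0, 15.0]

height_r2 = r_squared(field_height, lidar_height)
height_bias = bias(field_height, lidar_height)
```

With this sign convention, a negative bias such as the -5.56 m reported for BPLS tree height corresponds to systematic underestimation.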

Development and Validation of Psychological Difficulties Scale of Working Moms (워킹맘 심리적 어려움 척도 개발 및 타당화: 대졸이상 고학력 워킹맘 중심으로)

  • Jung, Hyun;Tak, Jinkook
    • The Korean Journal of Coaching Psychology / v.4 no.2 / pp.1-26 / 2020
  • The purpose of this study was to develop and validate the Psychological Difficulties Scale of Working Moms (PDSWM). In the first study, 69 items and 17 factors of the inventory were obtained from interviews and open-ended questionnaires. In the second study, online survey data from 306 working moms were used to analyze the factor structure of the PDSWM. The final result showed that a 12-factor model with 64 items was appropriate. In the third study, data were collected from 638 working moms, and to confirm the cross-validity of the inventory, the sample was divided into two groups (each with 319 employees). The results of exploratory factor analysis using the data of group 1 showed that an 8-factor structure with 48 items was appropriate. The results of confirmatory factor analysis using the data of group 2 also showed that the 8-factor structure had a satisfactory fit. The final 8 factors were as follows: 1) feeling apologetic to family members; 2) discrimination at the workplace; 3) burnout of both body and mind; 4) unequal distribution of child-rearing and household chores; 5) conflict with the babysitter or grandparents; 6) limits on further strengthening work competency; 7) social prejudice; 8) difficulty being on time for work. The PDSWM was significantly correlated with various criteria such as organizational commitment, life satisfaction, and work engagement. Based on these findings, implications, limitations, and suggestions for future study are discussed.
