• Title/Summary/Keyword: data set

An Filtering Automatic Technique of LiDAR Data by Multiple Linear Regression Analysis (다중선형 회귀분석에 의한 LiDAR 자료의 필터링 자동화 기법)

  • Choi, Seung-Pil;Cho, Ji-Hyun;Kim, Jun-Seong
    • Journal of Korean Society for Geospatial Information Science / v.19 no.4 / pp.109-118 / 2011
  • This study estimated the accuracy of two approaches: whole-area filtering, in which a single plane equation is fitted to the whole data set, and regional filtering, in which a plane equation is fitted for each virtual grid cell. Both were evaluated against a reference whole-area filtering whose plane equation was derived by multiple linear regression analysis on the ground data set. Relative to that reference, the average accuracy of whole-area filtering using the whole data set dropped by about 2~3%, while the accuracy of regional filtering based on the virtual grid dropped by 2~4%. Moreover, with the virtual grid set to 3~4 cm, accuracy differed from the standard data by about 2%. This leads to the conclusion that, for virtual grid filtering, a grid size 3~4 times the LiDAR scan gap is more appropriate. Because the average accuracy differed among the approaches applied, further research on real topography is needed to improve filtering accuracy.
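
The plane-equation filtering described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it fits z = a + b*x + c*y to a set of ground points by ordinary least squares and keeps points whose vertical residual is within a tolerance. The function names (`fit_plane`, `solve3`, `filter_by_plane`) and the tolerance `tol` are assumptions made for the example; the abstract does not state how points are classified against the plane.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x


def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y to (x, y, z) points.

    Builds the normal equations (X^T X) beta = X^T z for design rows [1, x, y].
    Assumes the points are not all collinear in the x-y plane.
    """
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    A = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    return tuple(solve3(A, [sz, sxz, syz]))


def filter_by_plane(points, coeffs, tol):
    """Keep points whose vertical distance to the fitted plane is at most tol.

    tol is a hypothetical threshold chosen for illustration only.
    """
    a, b, c = coeffs
    return [p for p in points if abs(p[2] - (a + b * p[0] + c * p[1])) <= tol]
```

For the regional variant, the same two functions could be applied separately to the points falling in each virtual grid cell, which is again an assumption about how the per-cell plane equations are used.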

Design and Implementation of Cyber Warfare Training Data Set Generation Method based on Traffic Distribution Plan (트래픽 유통계획 기반 사이버전 훈련데이터셋 생성방법 설계 및 구현)

  • Kim, Yong Hyun;Ahn, Myung Kil
    • Convergence Security Journal / v.20 no.4 / pp.71-80 / 2020
  • To provide realistic traffic to a cyber warfare training system, a traffic distribution plan must be prepared in advance and a training data set created from normal and threat data sets. This paper presents the design and implementation of a method for creating a traffic distribution plan and a training data set that supply realistic background traffic to a cyber warfare training system. We propose a method for creating a traffic distribution plan that uses the network topology of the training environment and traffic attribute information collected in real and simulated environments. We also propose a method for generating a training data set according to the traffic distribution plan, using a unit traffic method and a mixed traffic method based on protocol ratios. Using the implemented tool, we created a traffic distribution plan and confirmed the training data set generated according to it.

Japanese Vowel Sound Classification Using Fuzzy Inference System

  • Phitakwinai, Suwannee;Sawada, Hideyuki;Auephanwiriyakul, Sansanee;Theera-Umpon, Nipon
    • Journal of the Korea Convergence Society / v.5 no.1 / pp.35-41 / 2014
  • Automatic speech recognition is a popular research problem, and many research groups work on it for different languages, including Japanese. Vowel recognition is an important part of a Japanese speech recognition system. In this research, we developed a vowel classification system based on the Mamdani fuzzy inference system. We tested the system on a blind test data set collected from one male native Japanese speaker and four male non-native Japanese speakers. None of the subjects in the blind test data set appeared in the training data set. The classification rate on the training data set is 95.0%. In the speaker-independent experiments, the classification rate for the native speaker is around 70.0%, whereas that for the non-native speakers is around 80.5%.

Nondestructive Quantification of Intact Ambroxol Tablet using Near-infrared Spectroscopy (근적외분광분석법을 사용한 암브록솔 정제의 비파괴적 정량분석)

  • 임현량;우영아;김도형;김효진;강신정;최현철;최한곤
    • YAKHAK HOEJI / v.48 no.1 / pp.60-64 / 2004
  • Near-infrared (NIR) spectroscopy was used to determine, rapidly and nondestructively, the ambroxol content of intact tablets containing 30 mg of ambroxol (12.5% m/m nominal concentration) by collecting NIR spectra in the range 1100-1750 nm. The laboratory-made samples had nominal ambroxol concentrations of 10.3~15.9% m/m. Measurements were made by reflection using a fiber-optic probe, and calibration was carried out by partial least squares regression (PLSR) with autoscaling. The model was validated by randomly splitting the data set into a calibration data set (7 samples) and a validation data set (5 samples). The developed NIR method gave results comparable to the known values of tablets from a laboratory manufacturing process, with a standard error of calibration (SEC) and a standard error of prediction (SEP) of 0.49% and 0.49% m/m, respectively. The method showed good accuracy and repeatability. NIR spectroscopic determination in intact tablets offers potential for real-time monitoring of a running production process.

A Study of Safety Accident Prediction Model (Focusing on Military Traffic Accident Cases) (안전사고 예측모형 개발 방안에 관한 연구(군 교통사고 사례를 중심으로))

  • Ki, Jae-Sug;Hong, Myeong-Gi
    • Journal of the Society of Disaster Information / v.17 no.3 / pp.427-441 / 2021
  • Purpose: This study proposes a method for developing a model that predicts the probability of traffic accidents in advance, in order to prevent the most frequent traffic accidents in the military. Method: CRISP-DM (Cross Industry Standard Process for Data Mining) was applied. The CRISP-DM process consists of 6 stages; rather than proceeding in one direction like the Waterfall Model, it improves completeness through feedback between stages. Results: When the same data set as the previously constructed accident investigation data was modeled for the entire group with a classification criterion of 0.5, significant results were obtained for the accuracy, specificity, sensitivity, and AUC of the traffic accident prediction model. Conclusion: In designing the prediction model, it was confirmed that the lack of data made it difficult to obtain meaningful prediction values. To address the data shortage, a methodology for designing a predictive model was proposed that reorganizes and expands the data set so that rational inference is possible.
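
The evaluation measures named in this abstract (accuracy, specificity, sensitivity at a classification criterion of 0.5) can be illustrated with a short sketch. The function name `threshold_metrics`, the sample inputs, and the choice to treat a probability equal to the threshold as positive are assumptions for the example; the abstract does not describe the model or its data.

```python
def threshold_metrics(probs, labels, threshold=0.5):
    """Accuracy, sensitivity and specificity for a binary classifier.

    probs  : predicted probabilities of the positive class (accident)
    labels : true classes, 1 for positive and 0 for negative
    A probability >= threshold is counted as a positive prediction
    (an assumption; the source only gives the criterion value 0.5).
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

Calling `threshold_metrics(probs, labels)` with lists of equal length returns the three measures as a dictionary; with very few positive cases, as in the data shortage the abstract mentions, sensitivity in particular becomes unstable.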

Set Menu Preferences of Middle and High School Students in School Foodservice (남녀 중,고등학생의 학교급식 세트메뉴에 대한 선호도)

  • Lee, Na-Yeong;Gwak, Dong-Gyeong;Lee, Gyeong-Eun
    • Journal of the Korean Dietetic Association / v.13 no.1 / pp.1-14 / 2007
  • The purpose of this study was to assess students' preferences for set menus served in school foodservice. Questionnaires were distributed to 4,050 students enrolled in 34 middle and high schools located in Seoul, Gyeonggi, and Gyeongnam provinces. The students were asked to rate their preferences for 78 set menus on a 5-point Likert-type scale (1: dislike very much, 5: like very much). After excluding responses with significant missing data, 3,433 usable responses remained. Data were analyzed with descriptive analysis, t-tests, and one-way analysis of variance. There was no difference between middle and high school students in set menu preferences. On the other hand, there was a significant difference between boys' and girls' set menu preferences. Among the seven set menu groups (rice and soup with side dishes; tangs; rice with toppings; fried rice; western foods; noodles, ddeokguk, and dumpling soups; and bibimbaps), boys had higher preference scores than girls for rice and soup with side dishes, tangs, rice with toppings, and fried rice. Fried rice set menus were boys' favorite, while western food set menus were most preferred by girls. Rice and soup with side dishes set menus were least preferred by both boys and girls.

Estimating pile setup parameter using XGBoost-based optimized models

  • Xigang Du;Ximeng Ma;Chenxi Dong;Mehrdad Sattari Nikkhoo
    • Geomechanics and Engineering / v.36 no.3 / pp.259-276 / 2024
  • The undrained shear strength is widely acknowledged as a fundamental mechanical property of soil and is considered a critical engineering parameter. In recent years, researchers have employed various methodologies to evaluate the shear strength of soil under undrained conditions. These methods encompass both numerical analyses and empirical techniques, such as the cone penetration test (CPT), to gain insights into the properties and behavior of soil. However, several of these methods rely on correlation assumptions, which can lead to inconsistent accuracy and precision. This study developed methods based on extreme gradient boosting (XGB) to predict the pile set-up component "A" from two distinct data sets. The first data set includes average modified cone point bearing capacity (qt), average wall friction (fs), and effective vertical stress (σvo), while the second data set comprises plasticity index (PI), soil undrained shear cohesion (Su), and the over consolidation ratio (OCR). To optimize the internal hyperparameters of the XGBoost model, four optimization algorithms were employed: Particle Swarm Optimization (PSO), Social Spider Optimization (SSO), Arithmetic Optimization Algorithm (AOA), and Sine Cosine Optimization Algorithm (SCOA). The results from the first data set indicate that the XGBoost model optimized using the Arithmetic Optimization Algorithm (XGB - AOA) achieved the highest accuracy, with R² values of 0.9962 for the training part and 0.9807 for the testing part. The performance of the developed models was further evaluated using the RMSE, MAE, and VAF indices. XGB - AOA outperformed the other models on these indices as well, with RMSE, MAE, and VAF values of 0.0078, 0.0015, and 99.6189 for the training part and 0.0141, 0.0112, and 98.0394 for the testing part, respectively. These findings suggest that XGB - AOA is the most accurate model for predicting the pile set-up component.
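
The four indices reported in this abstract can be computed as in the sketch below. The definitions are the commonly used ones and are assumptions here, since the abstract does not give formulas: R² as the coefficient of determination, 1 - SS_res/SS_tot, and VAF (variance accounted for) as (1 - var(y - ŷ)/var(y)) × 100. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, R^2 and VAF for paired observed and predicted values.

    Assumes y_true is not constant (otherwise R^2 and VAF are undefined).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "vaf": (1.0 - var_err / var_true) * 100.0,
    }
```

Under these definitions VAF is insensitive to a constant bias in the predictions while RMSE, MAE, and R² are not, which is one reason to report the indices together.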

Statistical Method of Ranking Candidate Genes for the Biomarker

  • Kim, Byung-Soo;Kim, In-Young;Lee, Sun-Ho;Rha, Sun-Young
    • Communications for Statistical Applications and Methods / v.14 no.1 / pp.169-182 / 2007
  • The receiver operating characteristic (ROC) approach can be employed to rank candidate genes from a microarray experiment, in particular for biomarker development aimed at population screening for a cancer. In a cancer microarray experiment based on n patients, the researcher often wants to compare the tumor tissue with the normal tissue of the same individual using a common reference RNA. Ideally, this experiment produces n pairs of microarray data. However, values are often missing in either the normal or the tumor tissue data. In practice, we have $n_1$ pairs of complete observations, $n_2$ "normal only" observations, and $n_3$ "tumor only" observations. We refer to this data set as a mixed data set. We develop a ROC approach on the mixed data set to rank candidate genes for biomarker development in colorectal cancer screening. It turns out that the correlation between the ranks based on the ROC and t statistics, computed on the top 50 genes by ROC rank, is less than 0.6. This result indicates that choosing the right approach to ranking candidate genes for biomarker development is important for the allocation of resources.
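
As a rough illustration of ranking genes by ROC, the sketch below computes the area under the ROC curve (AUC) for each gene from its tumor and normal expression values and orders genes by how far the AUC lies from 0.5. It treats the two groups as independent samples and ignores pairing, so it is a simplified stand-in and not the mixed-data-set method developed in the paper. The names `auc`, `rank_genes`, and the input layout are assumptions for the example.

```python
def auc(tumor, normal):
    """AUC as the probability that a tumor value exceeds a normal value.

    Computed by comparing every tumor/normal pair; ties count as one half.
    """
    wins = 0.0
    for t in tumor:
        for v in normal:
            if t > v:
                wins += 1.0
            elif t == v:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


def rank_genes(expression):
    """Order gene names from most to least discriminating.

    expression maps a gene name to (tumor_values, normal_values).
    Genes are scored by |AUC - 0.5| so that both over- and
    under-expression in tumor tissue rank highly.
    """
    scores = {g: auc(t, n) for g, (t, n) in expression.items()}
    return sorted(scores, key=lambda g: abs(scores[g] - 0.5), reverse=True)
```

Comparing the order returned by `rank_genes` with an order based on t statistics would show the kind of disagreement between ranking methods that the abstract reports.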

Analysis of Factors Affecting Mode Choice Behavior by Stated Preference(SP) Data in Secondary Cities (SP Data에 의한 지방도시의 교통수단선택 요인분석에 관한 연구)

  • ;山川仁;申運稙
    • Journal of Korean Society of Transportation / v.10 no.3 / pp.21-42 / 1992
  • In past travel demand analysis, forecasting has relied on revealed preference (RP) information about actual or observed choices made by individuals. Forecasting with RP data implicitly assumes that existing transport conditions will not change markedly. When existing conditions change greatly, or when a new choice set of hypothetical options is added, it is very difficult to predict future travel demand. In recent years, especially in mode choice analysis, the importance of individual performance data from stated preference (SP) experiments, alongside RP data, has been recognized. However, models estimated using SP data have not yet been reported sufficiently. Against this background, we analyze the factors affecting mode choice behavior as a fundamental study for the task of modelling with SP choice data. For this analysis, we assumed subway operations in secondary cities that currently have no subway lines, and set up a choice set of hypothetical options based on the experimental design method.
