• Title/Summary/Keyword: training sets


A Deep Learning-based Regression Model for Predicting Government Officer Education Satisfaction (공무원 직무 전문교육 만족도 예측을 위한 딥러닝 기반 회귀 모델 설계)

  • Sumin Oh;Sungyeon Yoon;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.10 no.5 / pp.667-671 / 2024
  • Professional job training for government officers emphasizes establishing desirable values as public officials and improving professionalism in public service. To provide customized education, some studies have analyzed the factors affecting education satisfaction. However, there is little research on predicting education satisfaction from educational contents. Therefore, we propose a deep learning-based regression model that predicts government officers' education satisfaction from educational contents. We use education information data for government officers. We use one-hot encoding to convert variables collected in text format, such as education targets, education classifications, and education types, into categorical features. We quantify the education contents stored in text format using TF-IDF. We train our deep learning-based regression model and validate its performance with 10-fold cross-validation. Our proposed model showed 99.87% accuracy on the test sets. We expect that customized education recommendations based on our model will help provide and improve optimized education content.
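The feature pipeline described above (one-hot encoding for categorical fields, TF-IDF for free-text contents, 10-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `one_hot`, `tf_idf`, and `k_fold_indices` are hypothetical, whitespace tokenization stands in for whatever Korean tokenizer was actually used, and the TF-IDF variant (length-normalized term frequency times log(N/df)) is one common choice among several.

```python
import math
from collections import Counter


def one_hot(values):
    """Map each categorical value (e.g. an education type) to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors


def tf_idf(documents):
    """Return (vocab, rows) with rows[d][t] = tf(t, d) * log(N / df(t)).

    tf is the term count divided by document length; whitespace tokenization
    is a placeholder (assumption) for a real Korean tokenizer.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    df = Counter(tok for doc in tokenized for tok in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty texts
        rows.append([counts[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size
```

The one-hot and TF-IDF vectors would then be concatenated per course record and fed to the regression network. The folds here are contiguous, so shuffling the records beforehand would usually be advisable.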

A Study on AI-Based Real Estate Rate of Return Decision Models of 5 Sectors for 5 Global Cities: Seoul, New York, London, Paris and Tokyo (인공지능 (AI) 기반 섹터별 부동산 수익률 결정 모델 연구- 글로벌 5개 도시를 중심으로 (서울, 뉴욕, 런던, 파리, 도쿄) -)

  • Wonboo Lee;Jisoo Lee;Minsang Kim
    • Journal of Korean Society for Quality Management / v.52 no.3 / pp.429-457 / 2024
  • Purpose: This study aims to provide useful information to real estate investors by developing a profit determination model using artificial intelligence. The model analyzes the real estate markets of five selected cities from multiple perspectives, incorporating real estate market characteristics, economic indicators, and policies to determine potential profits. Methods: Data on real estate markets, economic indicators, and policies for the five cities were collected and cleaned. The data were then normalized and split into training and testing sets. An AI model was developed using machine learning algorithms and trained on these data. The model was applied to the five cities, and its accuracy was evaluated by comparing predicted profits with actual outcomes using metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared. Results: The profit determination model was successfully applied to the real estate markets of the five cities, showing high accuracy and predictability in profit forecasts. The study provided valuable insights for real estate investors, demonstrating the model's utility for informed investment decisions. Conclusion: The study identified areas for future improvement, suggesting the integration of diverse data sources and advanced machine learning techniques to enhance predictive capabilities.
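The three evaluation metrics named in the Methods have standard definitions. The sketch below computes them for paired lists of actual and predicted returns; the helper names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, with actual returns `[3.0, 5.0, 7.0]` and predictions `[2.0, 5.0, 9.0]`, MAE is 1.0, RMSE is the square root of 5/3, and R-squared is 1 - 5/8 = 0.375.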

Automated Terrain Data Generation for Urban Flood Risk Mapping Using c-GAN and BBDM

  • Jonghyuk Lee;Sangik Lee;Byung-hun Seo;Dongsu Kim;Yejin Seo;Dongwoo Kim;Yerim Cho;Won Choi
    • International conference on construction engineering and project management / 2024.07a / pp.1294-1294 / 2024
  • Flood risk maps are used in urban flooding to understand the spatial extent and depth of inundation damage. Constructing these maps requires hydrodynamic modeling capable of simulating flood waves. Flood waves are typically fast, and inundation patterns can vary significantly with the terrain, so the terrain of the flood source must be represented accurately in flood wave analysis. Recently, UAV-based methods that build terrain data through Structure-from-Motion or LiDAR have been used. These methods depend on UAV operations, so they still require a lot of time and manpower and cannot be applied where UAV operations are not possible. Therefore, for efficient nationwide monitoring, this study developed a model that automatically generates terrain data by estimating depth information from a single image using c-GAN (Conditional Generative Adversarial Networks) and BBDM (Brownian Bridge Diffusion Model). The training, utilization, and validation datasets used images from ISPRS (2018) and aerial image sets photographed directly at five locations in the Republic of Korea. Compared with the ground truth of the test dataset, the generated terrain data are considered sufficiently usable for flood wave analysis, being highly accurate, precise, and reproducible.
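BBDM replaces the usual diffusion toward pure noise with a Brownian bridge between a target-domain sample and its conditioning image. The sketch below shows only the forward (noising) step for flattened vectors, assuming the schedule m_t = t/T with variance 2s(m_t - m_t^2) as we recall it from the BBDM paper. Treat the exact variance scaling, the name `brownian_bridge_sample`, and the pairing of x0 = depth map with y = aerial image as assumptions, since the abstract gives no implementation details. The learned reverse process and the c-GAN branch are not shown.

```python
import math
import random


def brownian_bridge_sample(x0, y, t, T, s=1.0, rng=random):
    """Sample x_t on a Brownian bridge from x0 (at t=0) to y (at t=T), elementwise.

    Assumed schedule: m_t = t / T and variance delta_t = 2 * s * (m_t - m_t**2),
    so the noise vanishes at both endpoints and peaks at t = T / 2.
    """
    m = t / T
    delta = 2.0 * s * (m - m * m)
    std = math.sqrt(delta)
    return [(1 - m) * a + m * b + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, y)]
```

At t = 0 the sample equals x0 exactly and at t = T it equals y, which is what lets a bridge model translate directly between the two domains instead of starting from pure noise.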

The Study of Metrics development for Entrepreneurial Program Effectiveness (청소년 창업교육프로그램 효과성 측정지표 개발 연구)

  • Byun, Youngjo;Kim, Myung Seuk;Yang, Young Seok
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.9 no.4 / pp.77-85 / 2014
  • The goal of Bizcool entrepreneurship education for youth is to help students understand the start-up process and to enhance their entrepreneurial will and business creativity, rather than to train narrow start-up skills such as writing a business plan. Effective education enables Bizcool students to correctly understand the concept of start-up training and spreads demand for entrepreneurship education. Checking how entrepreneurship education affects the students who complete it is necessary, and such feedback makes it possible to derive alternatives for improvement. Despite this, few tools and indexes for measuring the effectiveness of entrepreneurship education have been developed and studied to date. This research proposes optimal indexes for this purpose. Specifically, it first drafted a set of 49 questions for evaluating the effectiveness of entrepreneurship education, classified into 3 large categories and 11 subcategories, such as entrepreneurship orientation, creativity, and entrepreneurship preparation activities, that represent the effects embedded through entrepreneurship education. Using these question sets, the research conducted an empirical survey of 287 students sampled from 5 different Bizcools. Through exploratory and confirmatory factor analysis, the survey yielded the final 3 large categories and 8 subcategories (innovativeness, risk-taking, problem-solving ability, cooperative decision-making ability, efficient behavior capacity, data-collecting ability, career search, and start-up search and preparation) and 38 measuring indexes. Reliability tests were also performed on each index, and acceptable values were obtained. Finally, this research confirms differences between start-up club members and non-members in the effectiveness of entrepreneurship education on 9 indexes.
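The abstract does not say which reliability statistic was used. Cronbach's alpha is the usual choice for Likert-type index sets, so the sketch below assumes it. `cronbach_alpha` is a hypothetical helper: it takes a respondents-by-items score table and applies alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for responses[r][i] = score of respondent r on item i."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values around 0.7 or higher are conventionally read as acceptable internal consistency, though the threshold varies by field.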


Regeneration of a defective Railroad Surface for defect detection with Deep Convolution Neural Networks (Deep Convolution Neural Networks 이용하여 결함 검출을 위한 결함이 있는 철도선로표면 디지털영상 재 생성)

  • Kim, Hyeonho;Han, Seokmin
    • Journal of Internet Computing and Services / v.21 no.6 / pp.23-31 / 2020
  • This study was carried out to generate various images of railroad surfaces with random defects as training data for better defect detection. Defects on railroad surfaces are caused by various factors, such as friction between track binding devices and adjacent tracks, and can cause accidents such as broken rails, so railroad maintenance for defects is necessary. Therefore, various studies on defect detection and inspection using image processing or machine learning on railway surface images have been conducted to automate railroad inspection and reduce railroad maintenance costs. In general, the performance of image processing analysis methods and machine learning techniques is affected by the quantity and quality of data. For this reason, some studies require specific devices or vehicles to acquire images of the track surface at regular intervals to obtain a database of various railway surface images. In contrast, to reduce the operating cost of image acquisition, this study constructed a 'Defective Railroad Surface Regeneration Model' by applying methods presented in related studies of the Generative Adversarial Network (GAN). Thus, we aimed to detect defects on railroad surfaces even without a dedicated database. The constructed model learns to generate railroad surfaces by combining different railroad surface textures with the original surface, taking the ground truth of the railroad defects into account. The generated railroad surface images were used as training data for a defect detection network based on a Fully Convolutional Network (FCN). To validate its performance, we clustered the railroad data and divided it into three subsets: one subset of original railroad texture images and two subsets of other railroad surface texture images. In the first experiment, we used only the original texture images as the training set for the defect detection model.
In the second experiment, we trained on images generated by combining the original images with a few railroad textures from the other images. Each defect detection model was evaluated against the ground truths using intersection over union (IoU) and F1-score. As a result, the scores increased by about 10-15% when the generated images were used, compared with the case where only the original images were used. This shows that defects can be detected using existing data and a few different texture images, even for railroad surface images for which no dedicated training database has been constructed.
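Both evaluation measures compare a predicted defect mask with the ground-truth mask pixel by pixel. A minimal sketch, assuming binary masks stored as nested lists; the helper name `iou_and_f1` and the convention of returning 1.0 when both masks are empty are our own choices, not the paper's.

```python
def iou_and_f1(pred, truth):
    """IoU and F1 for two equally shaped binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # defect pixel found
            elif p:
                fp += 1  # predicted defect where there is none
            elif t:
                fn += 1  # missed defect pixel
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 1.0
    return iou, f1
```

Because F1 = 2·IoU / (1 + IoU) for the same masks, the two scores always rank models identically on a single image; they can diverge only when averaged over many images.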

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Given a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot efficiently model the correlation between input units because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between the objects that are fed sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to make errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras with the Theano backend.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters, with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in the proportion 70:15:15. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model. However, its validation loss and perplexity did not improve significantly, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although the completeness of the generated sentences differed slightly between models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and their use of postpositions and verb conjugation was almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
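The data preparation described above can be sketched as follows: decompose Hangul syllables into phonemes (jamo) with the standard Unicode syllable arithmetic, slide a 20-symbol window to pair each input with the 21st symbol, split 70:15:15, and compute perplexity as the exponential of the mean negative log-likelihood. The helper names are hypothetical. The paper's exact symbol inventory (74 symbols including punctuation) may differ from the Unicode conjoining jamo used here, and the abstract does not say whether the split was shuffled, so this version splits contiguously.

```python
import math

# Standard Unicode Hangul arithmetic: syllable = 0xAC00 + (L * 21 + V) * 28 + T
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT       # 11,172 precomposed syllables


def to_jamo(text):
    """Split each precomposed Hangul syllable into conjoining jamo; keep other characters."""
    out = []
    for ch in text:
        s = ord(ch) - S_BASE
        if 0 <= s < S_COUNT:
            lead, rest = divmod(s, N_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:  # tail == 0 means the syllable has no final consonant
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)  # spaces, punctuation, Latin letters, etc.
    return out


def make_windows(symbols, width=20):
    """Pair each run of `width` consecutive symbols with the symbol that follows it."""
    return [(symbols[i:i + width], symbols[i + width])
            for i in range(len(symbols) - width)]


def split_70_15_15(items):
    """Contiguous 70:15:15 split using integer arithmetic."""
    n = len(items)
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def perplexity(probs_of_next):
    """exp(mean negative log-likelihood) of the probabilities assigned to the true next symbols."""
    nll = -sum(math.log(p) for p in probs_of_next) / len(probs_of_next)
    return math.exp(nll)
```

For instance, `to_jamo("한")` yields the three jamo ᄒ, ᅡ, ᆫ. A model that assigns uniform probability 1/74 to every symbol has perplexity 74, which gives a rough ceiling against which the reported test perplexities can be read.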

Diagnostic Classification of Chest X-ray Pneumonia using Inception V3 Modeling (Inception V3를 이용한 흉부촬영 X선 영상의 폐렴 진단 분류)

  • Kim, Ji-Yul;Ye, Soo-Young
    • Journal of the Korean Society of Radiology / v.14 no.6 / pp.773-780 / 2020
  • With the development of the 4th industrial revolution, research is being conducted in various fields of science and technology, such as medicine, health, and bio, to prevent diseases and reduce damage. As a result, artificial intelligence technology has been introduced and researched for image analysis in radiological examinations. In this paper, we directly apply a deep learning model to the classification and detection of pneumonia in chest X-ray images and evaluate whether the Inception-series deep learning models are useful for detecting pneumonia. As experimental material, we used a chest X-ray image dataset provided and shared free of charge by Kaggle, and divided the total of 3,470 chest X-ray images into 1,870 training, 1,100 validation, and 500 test images. In the experiment, the metric evaluation of the Inception V3 deep learning model gave 94.80% accuracy, 97.24% precision, 94.00% recall, and a 95.59% F1 score. In addition, the accuracy at the final epoch of Inception V3 modeling for pneumonia detection and classification of chest X-ray images was 94.91% for training and 89.68% for validation. For the loss function, the training value was 1.127% and the validation value was 4.603%. As a result, the Inception V3 deep learning model was evaluated as an excellent model for extracting and classifying features of chest image data, with a very good learning state. In the confusion-matrix evaluation on the test set, accuracies of 96% for normal chest X-ray images and 97% for pneumonia chest X-ray images were obtained.
The Inception-series deep learning models are considered useful for classifying chest diseases and are expected to play an auxiliary role for medical staff, which may help address the shortage of medical personnel. This study is also expected to serve as baseline data for similar future studies on diagnosing pneumonia with deep learning.
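The reported metrics follow the usual confusion-matrix definitions, and the F1 score is the harmonic mean of precision and recall. The sketch below (hypothetical helper names) recomputes F1 from the reported precision and recall as a consistency check: 2 × 0.9724 × 0.9400 / (0.9724 + 0.9400) is about 0.9559, matching the reported 95.59%.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted pneumonia cases that are correct
    recall = tp / (tp + fn)     # share of actual pneumonia cases that are found
    return accuracy, precision, recall, f1_from(precision, recall)
```

The confusion-matrix counts for the 500 test images are not given in the abstract, so only the precision/recall-to-F1 step can be checked against the reported figures.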

Development an Artificial Neural Network to Predict Infectious Bronchitis Virus Infection in Laying Hen Flocks (산란계의 전염성 기관지염을 예측하기 위한 인공신경망 모형의 개발)

  • Pak Son-Il;Kwon Hyuk-Moo
    • Journal of Veterinary Clinics / v.23 no.2 / pp.105-110 / 2006
  • A three-layer, feed-forward artificial neural network (ANN) with sixteen input neurons, three hidden neurons, and one output neuron was developed to identify infectious bronchitis (IB) infection as early as possible in laying hen flocks. Retrospective data from flocks enrolled in an IB surveillance program between May 2003 and November 2005 were used to build the ANN. The dataset of 86 flocks was divided randomly into two sets: 77 cases for the training set and 9 cases for the testing set. The input factors were 16 epidemiological findings, including characteristics of the layer house, management practices, and flock size, and the output was the presence or absence of IB. The ANN was trained on the training set with a back-propagation algorithm, and the test set was used to determine the network's ability to predict outcomes it had never seen. The diagnostic performance of the trained network was evaluated by constructing a receiver operating characteristic (ROC) curve and computing the area under the curve (AUC), which were also used to determine the best positivity criterion for the model. Several ANNs with different structures were created. The best-fitted trained network, IBV_D1, correctly predicted IB status in 73 of 77 cases (diagnostic accuracy 94.8%) in the training set. The sensitivity and specificity of the trained neural network were 95.5% (42/44, 95% CI, 84.5-99.4) and 93.9% (31/33, 95% CI, 79.8-99.3), respectively. For the testing set, the AUC of the ROC curve for the IBV_D1 network was 0.948 (SE=0.086, 95% CI 0.592-0.961) for recognizing IB infection status. At a criterion of 0.7149, the diagnostic accuracy was highest at 88.9%, with the highest sensitivity of 100%. With this sensitivity and specificity, together with an assumed IB prevalence of 44%, the IBV_D1 network showed a PPV of 80% and an NPV of 100%.
Based on these findings, the authors conclude that neural networks can be successfully applied to the development of a screening model for identifying IB infection in laying hen flocks.
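PPV and NPV depend on prevalence as well as on sensitivity and specificity, via Bayes' rule. The sketch below uses a hypothetical helper, `predictive_values`. The abstract reports 100% sensitivity and an assumed 44% prevalence but does not state the test-set specificity; the value 0.8 used in the check is our assumption, inferred from 88.9% accuracy (8 of 9) with about 4 positive flocks out of 9.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from test characteristics and prevalence."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)
```

With sensitivity 1.0, the assumed specificity 0.8, and prevalence 0.44, this gives a PPV of about 0.797 and an NPV of exactly 1.0, consistent with the reported 80% and 100%.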

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents containing unstructured data and text have been created owing to the rapid increase in the use of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, manual categorization not only cannot guarantee accuracy but also requires a large amount of time and cost. Many studies have therefore pursued automatic categorization to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics, because they assume that a document belongs to only one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories. However, they are also limited in that their learning process requires a multi-categorized training document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To remove this requirement of traditional multi-categorization algorithms, we propose a new methodology that extends the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. First, we find the relationship between documents and topics using the results of topic analysis on single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores of each document for multiple categories.
A document is then classified into a category if and only if its matching score is higher than a predefined threshold. For example, a document can be classified into three categories whose matching scores exceed the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme and contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories, because readers have different levels of interest in each category and because events occur with different frequencies in each category. To minimize distortion of the results caused by differences in the number of articles per category, we extracted 3,000 articles from each of the eight categories, for a total of 24,000 articles used in our experiments. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the collected news articles, we calculated document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree to which each document corresponds to a given category. As a result, we could present two additional categories for each of the 23,089 documents.
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top-1 predicted category was evaluated, and 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. Interestingly, precision, recall, and F-score varied widely across the eight categories.
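The abstract does not give the formula for combining the two correspondence tables. One natural reading, assumed here, is a matrix product: a document's score for a category sums, over topics, the document-topic weight times the topic-category correspondence. Categories whose score exceeds the threshold are then assigned. `category_scores` and `assign_categories` are hypothetical names, and the toy numbers in the usage note are illustrative only.

```python
def category_scores(doc_topic, topic_category):
    """doc_category[d][c] = sum over k of doc_topic[d][k] * topic_category[k][c].

    doc_topic[d][k]: weight of topic k in document d (e.g. from topic modeling).
    topic_category[k][c]: correspondence between topic k and category c.
    """
    n_topics = len(topic_category)
    n_cats = len(topic_category[0])
    return [[sum(row[k] * topic_category[k][c] for k in range(n_topics))
             for c in range(n_cats)]
            for row in doc_topic]


def assign_categories(scores, threshold):
    """Indices of categories whose matching score is strictly above the threshold."""
    return [[c for c, s in enumerate(row) if s > threshold] for row in scores]
```

With two documents weighted `[0.7, 0.3]` and `[0.1, 0.9]` over two topics, and topic-category tables `[0.8, 0.2, 0.0]` and `[0.0, 0.5, 0.5]`, a threshold of 0.25 assigns categories 0 and 1 to the first document and categories 1 and 2 to the second.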

Radiomics Analysis of Gray-Scale Ultrasonographic Images of Papillary Thyroid Carcinoma > 1 cm: Potential Biomarker for the Prediction of Lymph Node Metastasis (Radiomics를 이용한 1 cm 이상의 갑상선 유두암의 초음파 영상 분석: 림프절 전이 예측을 위한 잠재적인 바이오마커)

  • Hyun Jung Chung;Kyunghwa Han;Eunjung Lee;Jung Hyun Yoon;Vivian Youngjean Park;Minah Lee;Eun Cho;Jin Young Kwak
    • Journal of the Korean Society of Radiology / v.84 no.1 / pp.185-196 / 2023
  • Purpose This study aimed to investigate radiomics analysis of ultrasonographic images to develop a potential biomarker for predicting lymph node metastasis in papillary thyroid carcinoma (PTC) patients. Materials and Methods This study included 431 PTC patients from August 2013 to May 2014 and divided them into training and validation sets. A total of 730 radiomics features were obtained, including texture features from the gray-level co-occurrence matrix and gray-level run-length matrix, single-level discrete two-dimensional wavelet transform features, and other functions. The least absolute shrinkage and selection operator method was used to select the most predictive features in the training dataset. Results Lymph node metastasis was associated with the radiomics score (p < 0.001). It was also associated with other clinical variables such as young age (p = 0.007) and large tumor size (p = 0.007). The area under the receiver operating characteristic curve was 0.687 (95% confidence interval: 0.616-0.759) for the training set and 0.650 (95% confidence interval: 0.575-0.726) for the validation set. Conclusion This study showed the potential of ultrasonography-based radiomics to predict cervical lymph node metastasis in patients with PTC; thus, ultrasonography-based radiomics can act as a biomarker for PTC.
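As an illustration of one radiomics texture family named above, the sketch below builds a gray-level co-occurrence matrix (GLCM) for a single pixel offset and derives its contrast feature. It is a simplified, non-symmetric count version on a tiny, already-quantized image. Radiomics toolkits typically aggregate several offsets and angles and normalize differently, and the helper names here are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i is followed by gray level j at offset (dy, dx)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip neighbors outside the image
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast = sum over (i, j) of p(i, j) * (i - j)^2, with p the normalized counts."""
    total = sum(sum(row) for row in matrix)
    return sum(count * (i - j) ** 2
               for i, row in enumerate(matrix)
               for j, count in enumerate(row)) / total
```

For the 2×3 image `[[0, 0, 1], [1, 2, 2]]` with three gray levels and a horizontal offset, the four neighbor pairs are (0,0), (0,1), (1,2), and (2,2), giving a contrast of 0.5. Features like this one form the candidate pool from which LASSO selects the predictors that make up a radiomics score.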