• Title/Abstract/Keywords: Generate Data


Generating Training Dataset of Machine Learning Model for Context-Awareness in a Health Status Notification Service (사용자 건강 상태알림 서비스의 상황인지를 위한 기계학습 모델의 학습 데이터 생성 방법)

  • Mun, Jong Hyeok;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.1
    • /
    • pp.25-32
    • /
    • 2020
  • In context-aware systems, rule-based AI technology has been used in the abstraction process for obtaining context information. However, the rules become complicated as user requirements for the service diversify, and data usage increases as well. There are therefore technical limitations in maintaining rule-based models and in processing unstructured data. To overcome these limitations, many studies have applied machine learning techniques to context-aware systems. To utilize such machine-learning-based models in a context-aware system, a management process that periodically injects training data is required. A previous study on machine-learning-based context-aware systems considered a series of management processes, such as the generation and provision of training data for operating several machine learning models, but the method was limited to the system it was applied to. In this paper, we propose a training-data generation method for machine learning models that extends the machine-learning-based context-aware system. The proposed method defines a training-data generation model that can reflect the requirements of the machine learning models and generates the training data for each model. In the experiment, the training-data generation model is defined based on the training-data generation schema of the cardiac status analysis model for the elderly in a health status notification service, and training data are generated by applying the model in a real software environment. In addition, we compare the accuracy obtained by training machine learning models on the generated data in order to verify its validity.
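Schema-driven training-data generation of the kind this abstract describes can be sketched in a few lines. The schema, field names, and records below are hypothetical illustrations, not the paper's actual cardiac-status model:

```python
# Hypothetical sketch: each machine learning model declares the context
# fields and label it needs, and the generator projects raw context
# records onto that model-specific schema.

def generate_training_data(records, schema):
    """Project raw context records onto a model-specific schema."""
    rows = []
    for rec in records:
        # Skip records missing any field the model requires.
        if not all(f in rec for f in schema["features"] + [schema["label"]]):
            continue
        features = [rec[f] for f in schema["features"]]
        rows.append((features, rec[schema["label"]]))
    return rows

# Hypothetical schema for a cardiac-status model.
cardiac_schema = {"features": ["heart_rate", "age"], "label": "cardiac_status"}

records = [
    {"heart_rate": 105, "age": 72, "cardiac_status": "abnormal"},
    {"heart_rate": 68, "age": 70, "cardiac_status": "normal"},
    {"heart_rate": 80, "age": 75},  # unlabeled record -> skipped
]
print(generate_training_data(records, cardiac_schema))
```

A new model can then be served by adding its own schema entry rather than by changing the generator, which is the extensibility the paper argues for.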

Multidimensional data generation of water distribution systems using adversarially trained autoencoder (적대적 학습 기반 오토인코더(ATAE)를 이용한 다차원 상수도관망 데이터 생성)

  • Kim, Sehyeong;Jun, Sanghoon;Jung, Donghwi
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.7
    • /
    • pp.439-449
    • /
    • 2023
  • Recent advancements in data measurement technology have facilitated the installation of various sensors, such as pressure meters and flow meters, to effectively assess the real-time conditions of water distribution systems (WDSs). However, as cities expand, the factors that affect the reliability of measurements have become increasingly diverse. In particular, demand data, one of the most significant hydraulic variables in a WDS, are challenging to measure directly and prone to missing values, making the development of accurate data generation models all the more important. This paper therefore proposes an adversarially trained autoencoder (ATAE) model based on generative deep learning techniques to accurately estimate demand data in WDSs. The proposed model utilizes two neural networks: a generative network and a discriminative network. The generative network generates demand data using the information provided by the measured pressure data, while the discriminative network evaluates the generated demand outputs and provides feedback to the generator so that it learns the distinctive features of the data. To validate its performance, the ATAE model is applied to a real distribution system in Austin, Texas, USA. The study analyzes the impact of data uncertainty by calculating the accuracy of the ATAE's predictions for varying levels of uncertainty in the demand and pressure time series. In addition, the model's performance is evaluated by comparing results for different data collection periods (low, average, and high demand hours) to assess its ability to generate demand data according to water consumption levels.

Comparison of Estimation Methods for the Density on Expressways Using Vehicular Trajectory Data from a Radar Detector (레이더검지기의 차량궤적 정보기반의 고속도로 밀도산출방법에 관한 비교)

  • Kim, Sang-Gu;Han, Eum;Lee, Hwan-Pil;Kim, Hae;Yun, Ilsoo
    • International Journal of Highway Engineering
    • /
    • v.18 no.5
    • /
    • pp.117-125
    • /
    • 2016
  • PURPOSES: Density in uninterrupted traffic flow facilities plays an important role in representing the current status of traffic flow; for example, it is the primary measure of effectiveness in capacity analysis for freeway facilities. Estimating density has therefore long been a demanding task for traffic engineers. This study evaluates the performance of density values estimated from VDS data with two traditional methods, one based on traffic flow theory and the other on occupancy, by comparing them against density values estimated from vehicular trajectory data generated by a radar detector. METHODS: A radar detector that can generate very accurate vehicular trajectories within a range of 250 m was installed on the Joongbu expressway near the Dongseoul tollgate, where two VDS were already in place. The first task was to estimate densities using the different data and methods: density values were estimated with the two traditional methods and the VDS data on the Joongbu expressway, then compared with those estimated from the vehicular trajectory data in order to evaluate the quality of the density estimation. The relationships between space mean speed and density were then drawn using two sets of densities and speeds based on the VDS data and one set based on the radar detector data. CONCLUSIONS: The three sets of density showed minor differences when density was under 20 vehicles per km per lane. However, as density grew beyond 20 vehicles per km per lane, the three methods diverged significantly, with the density from the vehicular trajectory data generally the lowest. The in-depth study found that the space mean speed plays a critical role in the calculation of density. The speed estimated from the VDS data was higher than that from the radar detector. To validate this difference in the speed data, the traffic flow models relating space mean speed and density were carefully examined; conclusively, the traffic flow model generated from the radar data appears more realistic.
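For reference, the two traditional density estimates compared in this abstract rest on standard traffic-flow relations: the fundamental relation k = q / u_s and the occupancy method, with space mean speed defined as the harmonic mean of individual speeds. A minimal sketch with illustrative numbers (not the study's measurements):

```python
def space_mean_speed(spot_speeds_kmh):
    """Space mean speed: the harmonic mean of individual vehicle speeds."""
    return len(spot_speeds_kmh) / sum(1.0 / v for v in spot_speeds_kmh)

def density_from_flow_theory(flow_veh_h, space_mean_speed_kmh):
    """Fundamental relation of traffic flow: k = q / u_s."""
    return flow_veh_h / space_mean_speed_kmh  # veh/km/lane

def density_from_occupancy(occupancy, effective_vehicle_length_km):
    """Occupancy method: k = occupancy / effective vehicle length."""
    return occupancy / effective_vehicle_length_km  # veh/km/lane

print(space_mean_speed([60.0, 120.0]))         # harmonic mean, ~80 km/h
print(density_from_flow_theory(1800.0, 90.0))  # ~20 veh/km/lane
print(density_from_occupancy(0.12, 0.006))     # ~20 veh/km/lane
```

Because u_s sits in the denominator of k = q / u_s, even a modest speed overestimate (as the abstract reports for the VDS) directly deflates the estimated density, which is the sensitivity the study highlights.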

Recurrent Neural Network Modeling of Etch Tool Data: a Preliminary for Fault Inference via Bayesian Networks

  • Nawaz, Javeria;Arshad, Muhammad Zeeshan;Park, Jin-Su;Shin, Sung-Won;Hong, Sang-Jeen
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2012.02a
    • /
    • pp.239-240
    • /
    • 2012
  • With advancements in semiconductor device technologies, manufacturing processes are becoming more complex, and it has become more difficult to maintain tight process control. As the number of processing steps for fabricating complex chip structures has increased, potential fault-inducing factors are prevalent and their allowable margins are continuously reduced. One of the keys to success in semiconductor manufacturing is therefore highly accurate and fast fault detection and classification at each stage, to reduce any undesired variation and identify the cause of a fault. Sensors in the equipment are used to monitor the state of the process; the idea is that whenever there is a fault in the process, it appears as some variation in the output of the sensors monitoring the process. These sensors may report pressure, RF power, gas flow, and so on. By relating the data from these sensors to the process condition, an abnormality in the process can be identified, although only with some degree of uncertainty. Our hypothesis in this research is to capture the features of equipment condition data from a library of healthy processes. The healthy data can then serve as a reference for upcoming processes, which is made possible by mathematically modeling the acquired data. In this work we demonstrate the use of a recurrent neural network (RNN), a dynamic neural network whose output is a function of previous inputs. Our etch equipment tool-set data consist of 22 parameters over 9 runs. The data were first synchronized using the Dynamic Time Warping (DTW) algorithm. The synchronized sensor time series were then provided to the RNN, which trains and restructures itself according to the input and predicts a value one step ahead in time, depending on the past values of the data. 
Eight runs of process data were used to train the network, while one run was held out as a test input to check the performance of the network. Next, a mean-squared-error-based probability-generating function was used to assign a probability of fault to each parameter by comparing the predicted and actual values of the data. In future work we will use Bayesian networks to classify the detected faults. Bayesian networks use directed acyclic graphs that relate different parameters through their conditional dependencies in order to draw inferences among them. The relationships between parameters in the data will be used to generate the structure of the Bayesian network, and the posterior probabilities of different faults will then be calculated using inference algorithms.
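The DTW synchronization step mentioned above follows the classic dynamic-programming recurrence; a generic sketch (not the authors' implementation):

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance
    between two 1-D time series."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # match step
    return D[n][m]

# A run that is merely a time-shifted copy of another aligns perfectly:
print(dtw_distance([0, 1, 2, 3, 3], [0, 0, 1, 2, 3]))  # 0.0
```

The zero distance for time-shifted copies is exactly why runs are warped onto a common time base before being fed to the one-step-ahead RNN predictor.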


Probe Vehicle Data Collecting Intervals for Completeness of Link-based Space Mean Speed Estimation (링크 공간평균속도 신뢰성 확보를 위한 프로브 차량 데이터 적정 수집주기 산정 연구)

  • Oh, Chang-hwan;Won, Minsu;Song, Tai-jin
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.19 no.5
    • /
    • pp.70-81
    • /
    • 2020
  • Point-by-point data, abundantly collected by vehicles with embedded GPS (Global Positioning System), generate useful information. These data facilitate decisions by transportation jurisdictions, and private vendors can monitor and investigate micro-scale driver behavior, traffic flow, and roadway movements. The information is applied to develop app-based route guidance and business models. Among these data, speed plays a vital role in developing key parameters and in applying agent-based information and services. Nevertheless, link speed values require different levels of physical storage and fidelity, depending on both the collection and reporting intervals. Given these circumstances, this study aimed to establish an appropriate collection interval for efficiently utilizing space mean speed information from vehicles with embedded GPS. We compared probe-vehicle data with image-based vehicle data to understand the percentage error (PE). According to the results, the PE of the probe-vehicle data stayed within the 95% confidence level at an 8-second interval, which was therefore chosen as the appropriate collection interval for probe-vehicle data. It is our hope that the developed guidelines will help C-ITS and autonomous-driving service providers use more reliable space mean speed data to develop better C-ITS and autonomous-driving services.
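For reference, the link-based space mean speed that the collection interval is tuned for is the link length divided by the mean travel time, and the comparison against image-based ground truth reduces to a percentage error. A sketch with made-up probe records (illustrative values, not the study's data):

```python
def space_mean_speed(link_length_km, travel_times_h):
    """Space mean speed of a link: length over mean travel time
    (equivalent to the harmonic mean of individual link speeds)."""
    return link_length_km / (sum(travel_times_h) / len(travel_times_h))

def percentage_error(estimate, ground_truth):
    """PE of a probe-based estimate against a reference value."""
    return abs(estimate - ground_truth) / ground_truth * 100.0

# Two probe vehicles traverse a 1 km link in 30 s and 60 s (as hours):
sms = space_mean_speed(1.0, [30 / 3600, 60 / 3600])
print(round(sms, 1))  # ~80.0 km/h

# Hypothetical image-based reference speed of 83 km/h:
print(percentage_error(sms, 83.0) < 5.0)
```

Shorter collection intervals add travel-time samples per link, tightening this PE, which is the trade-off the study quantifies when selecting the 8-second interval.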

Coastal Wave Hind-Casting Modelling Using ECMWF Wind Dataset (ECMWF 바람자료를 이용한 연안 파랑후측모델링)

  • Kang, Tae-Soon;Park, Jong-Jip;Eum, Ho-Sik
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.21 no.5
    • /
    • pp.599-607
    • /
    • 2015
  • The purpose of this study is to reproduce long-term wave fields in the coastal waters of Korea through wave hind-casting modelling and to discuss its applications. To validate the wind data (NCEP, ECMWF, JMA-MSM), each dataset was compared with wave buoy data. JMA-MSM predicted wind with high accuracy, but because the ECMWF record covers a relatively longer period than that of JMA-MSM, the ECMWF wind dataset (2001~2014) was used for the wave hind-casting modelling. Results from the numerical modelling were verified against observations from wave buoys installed in offshore waters by the Korea Meteorological Administration (KMA) and the Korea Hydrographic and Oceanographic Agency (KHOA). The results agree well with the buoy observations, especially during event periods such as typhoons. Consequently, the wave data reproduced by the hind-casting modelling were used to fill missing data in the wave observation buoys. The filled data indicated underestimation of the maximum wave height during event periods at some buoys; reasons for this underestimation may include the coarse time interval and resolution of the input wind data, the water depth, and the grid size. The methodology used in the present study can be applied to analyze coastal erosion data in conjunction with the wave characteristics of event periods in coastal areas, and in coastal disaster vulnerability assessment to generate wave points of interest.
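Validation of hindcast output against buoy observations, as described above, is commonly summarized with an error metric such as RMSE. A generic sketch with illustrative wave heights (not the study's data, and not necessarily its exact metric):

```python
import math

def rmse(modelled, observed):
    """Root-mean-square error between modelled and buoy-observed series."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed))
                     / len(observed))

buoy  = [1.2, 1.5, 2.1, 3.4]  # illustrative observed wave heights (m)
model = [1.0, 1.6, 2.0, 3.1]  # illustrative hindcast driven by ECMWF winds
print(round(rmse(model, buoy), 3))  # ~0.194 m
```

Computing the same metric separately for event windows (e.g. typhoon periods) versus calm periods is how the kind of event-time underestimation reported above becomes visible.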

Contract-based Access Control Method for NFT Use Rights

  • Jeong, Yoonsung;Ko, Deokyoon;Seo, Jungwon;Park, Sooyong;Kim, Seong-Jin;Kim, Bum-Soo;Kim, Do-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.11
    • /
    • pp.1-11
    • /
    • 2022
  • In this paper, we propose an NFT (Non-Fungible Token)-based access control method for safely sharing data between users in a blockchain environment. Since all data stored in a blockchain can be accessed by anyone due to the nature of the technology, access must be restricted to authorized users when sharing sensitive data. To that end, we generate each data item as an NFT and control access to it through a smart contract. In addition, to overcome the single-ownership limitation of existing NFTs, we separate an NFT into ownership and use rights so that data can be safely shared between users. Ownership is represented as an original NFT, use rights are represented as a copied NFT, and all data generated as NFTs are encrypted before being uploaded, so data can be shared only through the access-controlled smart contract. To verify this approach, we set up a hypothetical scenario of Building Information Modeling (BIM) data trading and deployed a smart contract satisfying 32 function-call scenarios that require access control. We also evaluated its stability considering the possibility of decryption through a brute-force attack. Through our approach, we confirmed that data can be safely shared between users in a blockchain environment.
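The ownership / use-rights split can be sketched in plain Python rather than an actual smart-contract language. This is a hypothetical model of the idea, not the authors' deployed contract or its 32 call scenarios: the owner holds the original NFT, grants copied use-rights tokens, and key material for the encrypted data is released only to holders of either.

```python
class UseRightsContract:
    """Toy in-memory stand-in for an access-controlled smart contract."""

    def __init__(self):
        self.owner_of = {}       # token_id -> owner address
        self.use_rights = {}     # token_id -> addresses granted use rights
        self.data_key = {}       # token_id -> key for the encrypted data

    def mint(self, token_id, owner, key):
        """Mint the original (ownership) NFT for a data item."""
        self.owner_of[token_id] = owner
        self.use_rights[token_id] = set()
        self.data_key[token_id] = key

    def grant_use(self, token_id, caller, grantee):
        """Only the owner may issue a copied use-rights NFT."""
        if self.owner_of[token_id] != caller:
            raise PermissionError("only the owner may grant use rights")
        self.use_rights[token_id].add(grantee)

    def access(self, token_id, caller):
        """Release the decryption key only to the owner or a rights holder."""
        if caller == self.owner_of[token_id] or caller in self.use_rights[token_id]:
            return self.data_key[token_id]
        raise PermissionError("no ownership or use rights")

c = UseRightsContract()
c.mint("bim-001", "alice", "k3y")
c.grant_use("bim-001", "alice", "bob")
print(c.access("bim-001", "bob"))  # bob holds use rights -> key released
```

An unauthorized caller gets a `PermissionError` instead of the key, mirroring how the on-chain contract would revert an unauthorized call.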

Efficient Mining of Frequent Subgraph with Connectivity Constraint

  • Moon, Hyun-S.;Lee, Kwang-H.;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.267-271
    • /
    • 2005
  • The goal of data mining is to extract new and useful knowledge from large-scale datasets. As the amount of available data grows explosively, it has become vitally important to develop faster data mining algorithms for various types of data. Recently, interest in data mining algorithms that operate on graphs has increased; in particular, mining frequent patterns from structured data such as graphs has drawn the attention of many research groups. A graph is a highly adaptable representation scheme used in many domains, including chemistry, bioinformatics, and physics. For example, the chemical structure of a given substance can be modelled by an undirected labelled graph in which each node corresponds to an atom and each edge to a chemical bond between atoms. The Internet can also be modelled as a directed graph in which each node corresponds to a web site and each edge to a hypertext link between web sites. Notably, in bioinformatics, newly discovered data such as gene regulation networks or protein interaction networks can be modelled as graphs. There have been a number of attempts to extract useful knowledge from such graph-structured data, and one of the most powerful analysis tools is frequent subgraph analysis: recurring patterns in graph data can provide invaluable insight into that data. However, finding recurring subgraphs is extremely expensive computationally. At the core of the problem lie two challenging subproblems: 1) subgraph isomorphism and 2) enumeration of subgraphs. The former includes the subgraph isomorphism problem (does graph A contain graph B?) and the graph isomorphism problem (are two graphs A and B the same?); even these simplified versions of the subgraph mining problem are known to be NP-complete or isomorphism-complete, and no polynomial-time algorithm is known. 
The latter is also difficult: without any constraint, all 2^n subgraphs must be generated, where n is the number of vertices of the input graph. To find frequent subgraphs in a large graph database, it is therefore essential to place appropriate constraints on the subgraphs to be mined. Most current approaches focus on the frequency of a subgraph: the higher the frequency of a graph, the more attention it should receive. Recently, several algorithms that use level-by-level approaches to find frequent subgraphs have been developed. Some emerging applications suggest that other constraints, such as connectivity, can also be useful in mining subgraphs: more strongly connected parts of a graph are more informative. If we restrict the set of subgraphs to mine to more strongly connected parts, the computational complexity can be decreased significantly. In this paper, we present an efficient algorithm to mine frequent subgraphs that are more strongly connected. An experimental study shows that the algorithm scales to graphs with more than ten thousand vertices.
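The benefit of a connectivity constraint can be illustrated by brute force (this is only an illustration of the pruning idea, not the paper's algorithm): enumerate all vertex subsets and keep only those whose induced subgraph is connected.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """BFS connectivity check on the subgraph induced by `vertices`."""
    vs = set(vertices)
    if not vs:
        return False
    start = next(iter(vs))
    seen, frontier = {start}, [start]
    while frontier:
        v = frontier.pop()
        for a, b in edges:
            if a in vs and b in vs:
                for u, w in ((a, b), (b, a)):
                    if u == v and w not in seen:
                        seen.add(w)
                        frontier.append(w)
    return seen == vs

def connected_induced_subgraphs(n, edges):
    """Non-empty vertex subsets of 0..n-1 inducing a connected subgraph."""
    out = []
    for k in range(1, n + 1):
        for sub in combinations(range(n), k):
            if is_connected(sub, edges):
                out.append(sub)
    return out

# Path graph 0-1-2: of the 2^3 - 1 = 7 non-empty vertex subsets,
# only 6 induce connected subgraphs ({0, 2} does not).
print(len(connected_induced_subgraphs(3, [(0, 1), (1, 2)])))  # 6
```

Even on a 3-vertex path the constraint already prunes the search space, and the gap between 2^n subsets and the connected ones widens rapidly on sparse graphs, which is where the complexity reduction claimed above comes from.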


Interactions between Soil Moisture and Weather Prediction in Rainfall-Runoff Application : Korea Land Data Assimilation System(KLDAS) (수리 모형을 이용한 Korea Land Data Assimilation System (KLDAS) 자료의 수문자료에 대한 영향력 분석)

  • Jung, Yong;Choi, Minha
    • Proceedings of the Korean Society of Hazard Mitigation Conference
    • /
    • 2011.02a
    • /
    • pp.172-172
    • /
    • 2011
  • The interaction between the land surface and the atmosphere is essentially affected by hydrometeorological variables, including soil moisture. Accurate estimation of soil moisture at spatial and temporal scales is crucial to better understand its role in weather systems. KLDAS (Korea Land Data Assimilation System) is a regional land-surface information system for the Korean Peninsula. Like prior land data assimilation systems, it can provide initial soil-field information for use in atmospheric simulations. In this study, the Weather Research and Forecasting (WRF-ARW) model is applied as an enabling high-resolution tool to produce precipitation data using the GFS (Global Forecast System), with either the GFS-embedded or the KLDAS soil moisture information as initialization data. WRF-ARW generates precipitation data for a specific region using different parameters in its physics options. The produced precipitation data are then used as predefined input for hydrological models such as HEC-HMS (Hydrologic Engineering Center - Hydrologic Modeling System) to simulate selected regional water responses. The purpose of this study is to show the impact of a hydrometeorological variable such as KLDAS soil moisture on hydrological outcomes on the Korean Peninsula. The study region, the Chongmi River Basin, is located in the center of the Korean Peninsula; the river is 60.8 km long with a 17.01% slope. The region consists mostly of farmland, although the chosen study area lies in mountainous terrain. The basin perimeter is 185 km, the average river width is 9.53 m, and the highest elevation in the region is 676 m. There are four observation locations: the Sulsung, Taepyung, Samjook, and Sangkeug observatories. This watershed was selected as a tentative research location and is being studied continuously to assess the hydrological effects of land-surface information. Simulations of a real regional storm case (June 17 ~ June 25, 2006) were executed. 
WRF-ARW for this case study used the WSM6 microphysics scheme, the Kain-Fritsch cumulus scheme, and the YSU planetary-boundary-layer scheme. The WRF simulations generate excellent precipitation data in terms of peak precipitation and date, and in the pattern of daily precipitation, for the four locations. For the Sangkeug observatory, WRF overestimated precipitation by approximately 100 mm/day on June 17, 2006. At Taepyung and Samjook, WRF produced higher precipitation amounts than observed, whether initialized with KLDAS or with GFS-embedded soil moisture. Detailed results and discussion of the prediction accuracy obtained with the aforementioned approaches will be presented at the 2011 Annual Conference of the Korean Society of Hazard Mitigation.


A Study on the Simulated Radar Terrain Scan Data Generated from Discrete Terrain (이산지형정보에서 생성된 레이다 모의 지형 스캔 정보에 관한 연구)

  • Kang, Seunghun;Hahn, Sunghyun;Jeon, Jiyeon;Lim, Dongju;Lee, Sangchul
    • Journal of Aerospace System Engineering
    • /
    • v.16 no.6
    • /
    • pp.1-7
    • /
    • 2022
  • A simulated radar terrain scan data generation method is employed for terrain following. The method scans the discrete terrain by sequentially radiating beams from the radar toward the desired scan area at the same azimuth but varying elevation angles. The terrain data collected from the beams are integrated to generate the simulated radar terrain scan data, which comprise radar-detected points. However, owing to beam divergence, these points can lie far from the beam centerline when the radar is far from them. This paper proposes a geometry-based terrain scan data generation method for analysing simulated radar terrain scan data. The method detects geometric points along the beam centerline, which form the geometry-based terrain scan data. Analysis of the simulated radar terrain scan data using this method confirms that the beam-width effects are accounted for in the results.
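The geometry-based idea of detecting points along the beam centerline can be sketched as a ray march over a discrete terrain profile. The geometry, parameter names, and terrain below are hypothetical illustrations, not the authors' implementation:

```python
import math

def centerline_scan(radar_pos, elevation_deg, terrain, step=1.0, max_range=100.0):
    """March along the beam centerline at a fixed elevation angle and
    return the first point whose height falls below the discrete terrain.
    `terrain` maps integer ground range -> terrain height."""
    x0, h0 = radar_pos
    dx = math.cos(math.radians(elevation_deg))
    dh = math.sin(math.radians(elevation_deg))
    r = 0.0
    while r <= max_range:
        x, h = x0 + r * dx, h0 + r * dh
        ground = terrain.get(int(round(x)), 0.0)
        if h <= ground:
            # Geometry-based detected point: exactly on the centerline,
            # free of the beam-width spread of the simulated radar points.
            return (round(x, 2), round(h, 2))
        r += step
    return None  # beam clears the terrain within max_range

# Flat terrain of height 0 with a 5-unit-high ridge at x = 10..12:
terrain = {x: (5.0 if 10 <= x <= 12 else 0.0) for x in range(0, 101)}
print(centerline_scan((0.0, 3.0), 0.0, terrain))  # level beam hits the ridge
```

Comparing such centerline hits with the simulated radar points at the same elevation angles isolates how far beam divergence displaces the radar-detected points, which is the comparison the paper's analysis performs.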