• Title/Summary/Keyword: Learning Machine

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu;Lee, Wan-Gon;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE / v.43 no.1 / pp.87-95 / 2016
  • Recently, the development of the Internet and electronic devices has brought an enormous increase in the amount of available knowledge and information, and studies on large-scale ontological reasoning have been actively carried out in response. In general, a machine learning program or knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contains uncertainty, and reasoning over such data can make the reasoning results vague. To address this uncertainty, we propose an RDFS reasoning approach that uses confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark and applies uncertainty estimation methods to compute confidence values for the data inferred through RDFS-based reasoning. The computed confidence values thus represent the uncertainty in the inferred data. To evaluate the approach, ontology reasoning was carried out over the LUBM standard benchmark data set with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system can process the largest data set, LUBM3000, in 1179 seconds, inferring 350K triples.
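
The idea of carrying confidence values through RDFS inference can be illustrated with a small single-machine sketch. It is not the authors' Spark implementation: it applies only the standard rdfs9 rule (x rdf:type C and C rdfs:subClassOf D entail x rdf:type D), and the choice of `min` to combine the confidences of the two premises is an assumption, since the abstract does not name the estimation method. The function name `infer_rdfs9` and the `ex:` triples are hypothetical.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]


def infer_rdfs9(triples: Dict[Triple, float]) -> Dict[Triple, float]:
    """Apply rule rdfs9 until no triple changes, propagating confidence values."""
    known = dict(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o, c) for (s, p, o), c in known.items() if p == "rdfs:subClassOf"]
        typed = [(s, o, c) for (s, p, o), c in known.items() if p == "rdf:type"]
        for instance, cls, c_type in typed:
            for child, parent, c_sub in subclass:
                if cls != child:
                    continue
                derived = (instance, "rdf:type", parent)
                # Assumption: an inferred triple is as trustworthy as its weakest premise.
                confidence = min(c_type, c_sub)
                # Keep the highest confidence when a triple is derivable several ways.
                if confidence > known.get(derived, 0.0):
                    known[derived] = confidence
                    changed = True
    return known


# Hypothetical input triples with confidence values.
data = {
    ("ex:alice", "rdf:type", "ex:GradStudent"): 0.9,
    ("ex:GradStudent", "rdfs:subClassOf", "ex:Student"): 0.8,
    ("ex:Student", "rdfs:subClassOf", "ex:Person"): 1.0,
}
closure = infer_rdfs9(data)
```

A product of the premise confidences would be an equally plausible combination rule; only the line marked as an assumption would change.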

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. The (key, value)-based distributed framework MapReduce is therefore increasingly used together with Locality Sensitive Hashing, which is efficient for high-dimensional, sparse data. Following a two-stage strategy, we use the locality sensitive hashing technique to divide users into small subsets, and then calculate the similarity between pairs within each subset using a brute-force method on MapReduce. The candidate group generation stage is particularly important because the brute-force calculation is performed in the step that follows. However, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
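
The two-stage strategy described above (hash users into candidate groups, then compare pairs only within a group) can be sketched on a single machine as follows. This is an illustration under assumptions, not the paper's algorithm: it uses MinHash over item sets with Jaccard similarity, runs without MapReduce, and does not reproduce the regrouping step, whose details the abstract does not give. All names and the sample data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # modulus for the hash family h(x) = (a*x + b) mod PRIME


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def lsh_knn_graph(users, k=2, num_hashes=4, seed=0):
    """Approximate k-NN graph: MinHash buckets, then brute force inside each bucket."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]

    # Stage 1: each hash function assigns every user to one candidate group.
    buckets = defaultdict(list)
    for user, items in users.items():
        for i, (a, b) in enumerate(params):
            min_hash = min((a * x + b) % PRIME for x in items)
            buckets[(i, min_hash)].append(user)

    # Stage 2: brute-force similarity only between users sharing a group.
    sims = defaultdict(dict)
    for group in buckets.values():
        for u, v in combinations(group, 2):
            score = jaccard(users[u], users[v])
            sims[u][v] = score
            sims[v][u] = score

    # Keep the k most similar candidates per user (ties broken by name).
    return {u: sorted(sims[u], key=lambda v: (-sims[u][v], v))[:k] for u in users}


# Hypothetical users described by the integer ids of the items they rated.
users = {
    "u1": {1, 2, 3},
    "u2": {1, 2, 3},
    "u3": {1, 2, 4},
    "u4": {100, 101, 102},
}
graph = lsh_knn_graph(users, k=2)
```

The cost of stage 2 grows quadratically with the size of each group, which is why oversized candidate groups are the bottleneck the abstract points to.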

Feature Extraction Algorithm for Distant Unmanned Aerial Vehicle Detection (원거리 무인기 신호 식별을 위한 특징추출 알고리즘)

  • Kim, Juho;Lee, Kibae;Bae, Jinho;Lee, Chong Hyun
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.3 / pp.114-123 / 2016
  • An effective feature extraction method for unmanned aerial vehicle (UAV) detection is proposed and verified in this paper. UAV engine sound is a harmonic complex tone whose frequency ratios are integers and whose variation is continuous in time. Using these characteristics, we propose a feature vector composed of the mean and standard deviation of the difference between the fundamental frequency and the first overtone, together with the mean variation of their frequencies. Simulation showed that the suggested feature vector discriminates the target signal well from various interfering signals, including signals whose frequency varies with time. A comparison of Fisher scores shows that the three frequency-based features discriminate measured UAV signals well at a low signal-to-noise ratio (SNR). Detection performance under simulated interference was compared against mel-frequency cepstral coefficients (MFCC) using an extreme learning machine (ELM) classifier, and the suggested feature vector showed a 37.6% performance improvement. As the SNR increases with time, the proposed feature detects the target signal earlier than MFCC, which needs 4.5 dB higher signal power to detect the target.
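
One possible reading of the feature vector described above is sketched below. The abstract does not define the quantities precisely, so several choices are assumptions: the difference is taken as first overtone minus fundamental per frame, the "mean variation" is the mean absolute frame-to-frame change, and the per-frame frequency tracks are assumed to have been estimated already. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def harmonic_features(f0_track, f1_track):
    """Features from per-frame fundamental (f0) and first-overtone (f1) frequencies in Hz."""
    # Assumption: "difference value" means f1 - f0 for each analysis frame.
    diffs = [f1 - f0 for f0, f1 in zip(f0_track, f1_track)]

    def mean_variation(track):
        # Assumption: variation is the mean absolute change between consecutive frames.
        return mean(abs(later - earlier) for earlier, later in zip(track, track[1:]))

    return {
        "diff_mean": mean(diffs),
        "diff_std": pstdev(diffs),
        "f0_variation": mean_variation(f0_track),
        "f1_variation": mean_variation(f1_track),
    }


# Hypothetical tracks for a harmonic tone whose pitch rises slowly (f1 = 2 * f0).
f0 = [100.0, 102.0, 104.0, 106.0]
f1 = [200.0, 204.0, 208.0, 212.0]
features = harmonic_features(f0, f1)
```

For a harmonic source the difference stays close to the fundamental and drifts smoothly, whereas unrelated interfering tones would give a scattered difference and a larger standard deviation.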

Counter Measures by using Execution Plan Analysis against SQL Injection Attacks (실행계획 분석을 이용한 SQL Injection 공격 대응방안)

  • Ha, Man-Seok;Namgung, Jung-Il;Park, Soo-Hyun
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.2 / pp.76-86 / 2016
  • SQL Injection attacks are the most widely used attacks and are also considered one of the oldest traditional hacking techniques. They are becoming quite sophisticated and account for a large share of web hacking. As big data environments become widespread, many devices and sensors will be connected to the Internet and the amount of data flowing among devices will increase greatly, so the scale of damage caused by SQL Injection attacks is likely to grow. Moreover, building security solutions against SQL Injection attacks is costly and time-consuming. Preventing these attacks requires a quick and accurate response based on data analysis techniques. We used data analytics and machine learning techniques to defend against SQL Injection attacks, analyzing the execution plan of the submitted SQL command when the web log files showed abnormal patterns. Herein, we propose a way to distinguish between normal and abnormal SQL commands. Using automated SQL Injection attack tools, we analyzed user-entered values in real time. The results show that analyzing the execution plan of the SQL command enables an effective defense.
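
The core idea of comparing execution plans can be shown with a minimal sketch. It is an illustration only, not the authors' system: it uses SQLite's `EXPLAIN QUERY PLAN` (the abstract does not name a database), omits the web-log and machine learning components, and treats any plan that differs from the plan of a known benign input as abnormal. The table, the deliberately vulnerable query template, and the function names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan steps SQLite reports for a statement, without running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return tuple(row[-1] for row in rows)  # last column holds the textual plan step


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, secret TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users(name)")

# Deliberately vulnerable string-built query, used only to show the effect.
TEMPLATE = "SELECT secret FROM users WHERE name = '{}'"

# Plan produced by a known benign input: an index lookup on name.
baseline = query_plan(conn, TEMPLATE.format("alice"))


def looks_abnormal(user_input):
    """Flag input whose query plan differs from the benign baseline."""
    try:
        return query_plan(conn, TEMPLATE.format(user_input)) != baseline
    except (sqlite3.Error, sqlite3.Warning):
        # Input that breaks the statement or stacks a second one is also suspicious.
        return True


benign_flag = looks_abnormal("bob")
attack_flag = looks_abnormal("' OR '1'='1")
```

A tautology such as `' OR '1'='1` prevents the index lookup on `name`, so the planner falls back to scanning the table and the plan no longer matches the baseline. In production code, parameterized queries remain the primary defense; plan comparison is a detection aid.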

Study for implementation of smart water management system on Cisangkuy river basin in Indonesia (인도네시아 찌상쿠이강 유역의 지능형 물관리 시스템 적용 연구)

  • Kim, Eugene;Ko, Ick Hwan;Park, Chan Ho
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.469-469 / 2017
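  • Climate change and environmental pollution are increasing the number of water-scarce countries worldwide, and as concentrated rainfall becomes more frequent, flood damage and drinking water supply have become major social issues. Over roughly the past 20 years of rapid economic growth and urbanization, the excessive concentration of population and industry in cities has left Indonesia facing environmental problems far more severe than those Korea experienced during its industrialization in the 1960s-80s, and water shortages, water pollution, and environmental problems in the greater metropolitan area including Jakarta and Bandung have already reached a very dangerous level. In particular, Bandung, one of Indonesia's three largest cities, located on the middle and upper reaches of the Citarum River, suffers from chronic water shortages. As of 2010 the city was short of about 15 CMS (cubic meters per second) of water on a daily average, and by 2030 continued population growth is projected to require about 23 CMS more. To address this supply problem, the city of Bandung and the Citarum River basin management agency are pursuing not only structural measures such as dam and groundwater development and inter-basin water transfer, but also non-structural measures that raise water-use efficiency through coordinated operation of existing and new reservoirs. Accordingly, as one such non-structural measure to relieve the basin's water supply shortage, this study proposes a way to apply a smart water management system that optimizes the operation of hydraulic facilities in the basin, including dams, weirs, small hydropower plants, and water intakes. Built on sensors, the Internet of Things (IoT), and network technology, the system can provide a closely linked framework through two-way communication among facilities, operators, and related agencies. By checking the basin's hydrological conditions, facility operating status, and water supply and demand in real time, it can adjust the water supply immediately in response to demand. Through big data analysis and machine learning, it can also update the optimal operating rules for individual water management facilities and select optimal water supply priorities in light of the basin's hydrological conditions and water demand. The system is being developed to monitor the hydrological status of the Cisangkuy basin in real time, analyze the operation of river facilities, and improve the efficiency of water resource use in the basin through optimal water supply and allocation. To this end, a hydrological data collection system and an inter-agency information sharing system will be established as the base infrastructure for analysis; on that basis, basin runoff, reservoir operation, and water balance analyses will be carried out, and the analysis and forecast results together with past operating data will support new facility operating rules, coordinated operation among facilities, and decisions on water supply priorities. Based on an integrated database, the system is expected to present rational criteria for operating river facilities through simulation and analysis of hydraulic and hydrological phenomena, thereby resolving disagreements and disputes among the various managing bodies over facility operation, and to serve as a decision-making tool for distributing limited water resources among diverse demands and operating facilities efficiently and rationally.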

Bhumipol Dam Operation Improvement via smart system for the Thor Tong Daeng Irrigation Project, Ping River Basin, Thailand

  • Koontanakulvong, Sucharit;Long, Tran Thanh;Van, Tuan Pham
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.164-175 / 2019
  • The Tor Tong Daeng Irrigation Project, with an irrigation area of 61,400 hectares, is located in the Ping Basin of the Upper Central Plain of Thailand, where farmers depend on both surface water and groundwater. In drought years, water storage in the Bhumipol Dam is inadequate for agricultural allocation, which causes water deficits in many irrigation projects. Farmers then need to find supplementary sources such as farm ponds or groundwater. The operation of Bhumipol Dam and the estimation of irrigation demand are vital for allocating irrigation water and easing the water shortage in the irrigation project. This study aims to develop a smart dam operation system that mitigates water shortage in the project by introducing machine learning to improve dam operation and by estimating irrigation demand from soil moisture derived from satellite images. Using an ANN technique, inflows to the dam are generated from upstream rain gauge stations using the past 10 years of daily rainfall data. The input vectors for the ANN model are identified based on regression and principal component analysis. The structure of the ANN (length of training data, type of activation functions, number of hidden nodes, and training methods) is determined from the statistical performance between measurements and ANN outputs. The irrigation demand, on the other hand, is estimated using LANDSAT satellite images. The Enhanced Vegetation Index (EVI) and Temperature Vegetation Dryness Index (TVDI) values are estimated from the plant growth stage and soil moisture. The values are calibrated and verified against field plant growth stage and soil moisture data from 2017-2018. The irrigation demand in the project is then estimated from the plant growth stage and soil moisture in the area. With the estimated dam inflow and irrigation demand, the dam operation can manage water release better than the past operational data indicate. The results show how the smart system concept was applied and how it improves dam operation by combining inflow estimation from the ANN technique with irrigation demand estimation from satellite images, compared with past operation data; this is an initial step toward developing a smart dam operation system in Thailand.
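
For readers unfamiliar with the Enhanced Vegetation Index mentioned above, the sketch below computes it from surface reflectance with the commonly used coefficients (gain 2.5, aerosol terms 6 and 7.5, canopy background 1). Whether the study used exactly these coefficients is an assumption, as the abstract does not say; the function name and reflectance values are hypothetical.

```python
def enhanced_vegetation_index(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """EVI from near-infrared, red, and blue surface reflectance (each in 0..1)."""
    # Standard form: G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L).
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)


# Hypothetical reflectance values for a vegetated pixel.
evi_value = enhanced_vegetation_index(nir=0.5, red=0.1, blue=0.05)
```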

Study on predicting the commercial parts discontinuance using unstructured data and artificial neural network (상용 부품 비정형 데이터와 인공 신경망을 이용한 부품 단종 예측 방안 연구)

  • Park, Yun-kyung;Lee, Ik-Do;Lee, Kang-Taek;Kim, Du-Jeoung
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.10 / pp.277-283 / 2019
  • Advances in technology have allowed the development and commercialization of various parts; however, this has shortened the discontinuation cycle of components. As a result, repair and logistic support become difficult for weapon systems, which use thousands of parts and are operated over the long term, and this is one of the main causes of reduced weapon system availability. To address the problem, the United States has created a dedicated organization, whereas in Korea commercial tools are used to predict and manage diminishing manufacturing sources and material shortages (DMSMS). However, there are few methods for predicting the life cycle of parts for which the commercial tools provide no DMSMS information. In this study, structured and unstructured part data from a commercial tool were gathered, preprocessed, and embedded using a neural network algorithm. A method is then suggested to predict the life cycle risk (LC Risk) and years to end of life (YTEOL). To validate the prediction performance for LC Risk and YTEOL, the predicted values are compared with descriptive statistics.

Convergence Analysis of Risk factors for Readmission in Cardiovascular Disease: A Machine Learning Approach (의사결정나무분석을 이용한 심혈관질환자의 재입원 위험 요인에 대한 융합적 분석)

  • Kim, Hyun-Su
    • Journal of Convergence for Information Technology / v.9 no.12 / pp.115-123 / 2019
  • This descriptive study is a secondary analysis of KNHANES IV-VI data on risk factors for readmission among patients with cardiovascular disease. Of a total of 65,973 adults, 1,037 with angina or myocardial infarction were analyzed. The analysis was conducted using SPSS for Windows 21, and a CHAID decision tree was used for the classification analysis. The root node was economic activity (χ2=12.063, p=.001); the child nodes were personal income (χ2=6.575, p=.031), weight change (χ2=12.758, p=.001), residential area (χ2=4.025, p=.045), direct smoking (χ2=3.884, p=.049), and level of education (χ2=9.630, p=.024). The terminal nodes were hypertension (χ2=3.854, p=.050), diabetes mellitus (χ2=6.056, p=.014), and occupation type (χ2=7.799, p=.037). We suggest that programs taking an integrated approach to these various factors need to be developed and operated to manage readmission among cardiovascular patients.
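
CHAID chooses each split by a chi-square test of independence between a predictor and the outcome. The sketch below computes that statistic and its p-value for a single 2x2 table. It is a simplified illustration: full CHAID also merges predictor categories and applies a Bonferroni adjustment, neither of which is reproduced here, and the counts are hypothetical rather than taken from the study.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, p_value_df1(statistic)


# Hypothetical counts: rows = economically active / inactive,
# columns = readmitted / not readmitted.
counts = [[30, 70], [50, 50]]
stat, p = chi_square_2x2(counts)
```

A candidate predictor with a smaller p-value separates the outcome classes more strongly, so it is preferred as the splitting variable at that node.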

Convergence of Artificial Intelligence Techniques and Domain Specific Knowledge for Generating Super-Resolution Meteorological Data (기상 자료 초해상화를 위한 인공지능 기술과 기상 전문 지식의 융합)

  • Ha, Ji-Hun;Park, Kun-Woo;Im, Hyo-Hyuk;Cho, Dong-Hee;Kim, Yong-Hyuk
    • Journal of the Korea Convergence Society / v.12 no.10 / pp.63-70 / 2021
  • Generating super-resolution meteorological data with a high-resolution deep neural network can enable precise research and useful real-life services. We propose a new technique for generating improved training data for super-resolution deep neural networks. To generate high-resolution meteorological data with domain-specific knowledge, Lambert conformal conic projection and objective analysis were applied to observation data and ERA5 reanalysis field data from specialized institutions. As a result, temperature and humidity analysis data based on domain-specific knowledge showed RMSE improvements of up to 42% and 46%, respectively. Next, a super-resolution generative adversarial network (SRGAN), one of the artificial intelligence techniques, was used to automate the manual data generation technique based on the domain-specific techniques described above. Experiments were conducted to generate data at 1 km resolution from global model data at 10 km resolution. Finally, the results generated with SRGAN had a higher resolution than the global model input data and showed an analysis pattern similar to the manually generated high-resolution analysis data, while also showing a smooth boundary.

Classification Modeling for Predicting Medical Subjects using Patients' Subjective Symptom Text (환자의 주관적 증상 텍스트에 대한 진료과목 분류 모델 구축)

  • Lee, Seohee;Kang, Juyoung
    • The Journal of Bigdata / v.6 no.1 / pp.51-62 / 2021
  • In the field of medical artificial intelligence, there has been much research on disease prediction and classification algorithms that help doctors make judgments, but relatively little interest in artificial intelligence that helps medical consumers acquire and judge information. The fact that more than 150,000 questions about which hospital to visit were asked on the NAVER portal over the past year attests to the need to provide medical information suited to medical consumers. In this study, we therefore set out to build a model that classifies symptom text written directly by patients, collected from the NAVER portal, into 8 medical subjects, to help consumers choose the appropriate medical subject for their symptoms. To ensure the validity of the data, which reflects patients' subjective descriptions, we measured the similarity between objective symptom text (typical symptoms by medical subject, organized by the Seoul Emergency Medical Information Center) and subjective symptom text (NAVER data). The similarity measurements showed that two texts describing symptoms of the same medical subject had relatively higher similarity than symptom texts from different medical subjects. Following this procedure, the classification model was built with a ridge regression model on the validated subjective symptom text, reaching an accuracy of 0.73.
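
The validity check described above, comparing patient-written text against reference symptom lists, can be sketched with a simple similarity measure. The abstract does not say which measure was used, so bag-of-words cosine similarity is an assumption here, and the English reference texts, the patient sentence, and the function name are hypothetical stand-ins for the Korean data.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference symptoms per medical subject (objective text).
reference = {
    "dermatology": "skin rash itching redness",
    "orthopedics": "knee joint pain swelling",
}

# Hypothetical patient-written description (subjective text).
patient_text = "itching red rash on my skin"

scores = {subject: cosine_similarity(patient_text, text) for subject, text in reference.items()}
best_subject = max(scores, key=scores.get)
```

If subjective texts score consistently higher against the reference text of their own subject than against other subjects, the labels can be treated as reasonably valid training data for a classifier.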