• Title/Summary/Keyword: data shortage problem (데이터 부족 문제)

Search Result 553, Processing Time 0.036 seconds

A Study on DDPM-based Molecular Generation and Semi-Supervised Learning for Improving the Performance of Optical Chemical Structure Recognition (광학 분자구조 인식 성능 향상을 위한 DDPM 기반의 분자구조 생성 및 준지도학습 연구)

  • Jin-Hyeok Kim;Tae-Woong Song;Jonghwan Choi
    • Annual Conference of KIPS / 2024.05a / pp.721-722 / 2024
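  • Recognizing molecular structure information that appears in the literature and converting it into a form that is easy to analyze is one of the key information-processing technologies that facilitate data collection in cheminformatics. Several deep learning-based molecular structure recognition techniques have been developed, but training on small molecular structure image datasets is often insufficient, so a training strategy is needed to improve recognition accuracy. To overcome the loss of training efficiency caused by data scarcity, this study investigates a semi-supervised learning algorithm that uses an image generation model. The proposed training algorithm improved performance by 5.4 percentage points over the control.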

Anomaly Detection System for Cloud Resources Using Representation Learning-Based Deep Learning Models (표현 학습 기반의 딥러닝 모델을 활용한 클라우드 자원 이상 감지 시스템)

  • Min-Yeong Lee;Heon-Chang Yu
    • Annual Conference of KIPS / 2024.05a / pp.658-661 / 2024
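  • As the public cloud market grows, large and complex IT systems built on computing resources hosted in public clouds are becoming increasingly common. Because the growth of such systems raises the probability of service failures, demand is also increasing for research on anomaly detection in public cloud resources for failure management and proactive detection. However, related research remains scarce because no benchmark dataset is available and because the data that can be extracted from real resources are unlabeled and imbalanced. To address these problems, this paper proposes an anomaly detection system that uses unsupervised, representation learning-based deep learning models. To maintain the system's detection performance, the authors devised a scheme that retrains and compares multiple deep learning models at regular intervals and updates the system to the best-performing model. The system was evaluated on metric data generated by real public cloud resources, and the results confirmed that it achieves solid anomaly detection performance.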

A Study on Development of Basic Data Science Education Contents for Artificial Intelligence Capability (인공지능 기반의 기초 데이터 과학 교육에 관한 연구)

  • Jo, Junghee
    • 한국정보교육학회:학술대회논문집 / 2021.08a / pp.393-400 / 2021
  • Data science is a scientific discipline that defines problems and finds meaningful information in collected data in order to solve them. Alongside artificial intelligence technology, the field of data utilization is gradually expanding, and awareness of the importance of data science education is also increasing. Despite the rapid growth of the domestic data industry market, an analysis of the current status of the data industry by the Korea Data Agency recently predicted that the shortfall of data experts will reach 31.4% within the next 5 years. In elementary education, various studies have been conducted on introducing data science to improve students' computational thinking and creativity. This paper presents data science lecture content developed for training elementary school teachers, most of whom are non-majors in computing. The content was applied to a group of elementary school teachers attending graduate school for artificial intelligence convergence education. Points for improvement were derived by identifying the content that learners found difficult to understand and analyzing the causes of the difficulty.

Utilization and Prospect of Big Data Analysis of Sports Contents (스포츠콘텐츠의 빅데이터 분석 활용과 전망)

  • Kang, Seungae
    • Convergence Security Journal / v.19 no.1 / pp.121-126 / 2019
  • Big data use in sports initially focused on analysis aimed at improving athletes' competence and performance. Since then, 'big data technology' that collects and analyzes more detailed and diverse data through ICT such as IoT and AI has been applied. Using big data from sports content has value and potential in a smart environment, but the shortage and limitations of platforms for managing and sharing sports content must be overcome. To solve these problems, it is important to change the perceptions of the companies and providers that supply sports content and to cultivate and secure professionals capable of providing it. Policies are also needed to systematically manage and use the big data generated by sports content.

A Prediction System of User Preferences for Newly Released Items Based on Words (새로 출시되는 품목들을 위한 단어 기반의 사용자 선호도 예측 기법)

  • Choi, Yoon-Seok;Moon, Byung-Ro
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.2 / pp.156-163 / 2006
  • Collaborative filtering (CF) systems are widely used for recommendation because they are easy to implement and perform well. However, they suffer from several problems, including sparsity, the first-rater problem, and the lack of explanations for recommendations. Many studies have sought to resolve these problems. The influence of the sparsity problem lessens as user data accumulate, but the first-rater problem is inherent to CF systems, and a number of studies have tried to overcome this disadvantage with content-based methods. CF systems are also black boxes that offer no explanation of how a recommendation was produced. In this paper we present a content-based prediction system built on preference words, which exposes the reasoning behind a recommendation. Our system predicts a user's rating of a new movie, and we suggest a semiotic network-based method to solve the mismatching problem between items. For experimental comparison, we used the EachMovie and IMDb datasets.

Development of Standard Data Flow for Building Integrated Logistics Digital Platform (통합 물류 연계 디지털 플랫폼 구축을 위한 표준 데이터 흐름도 개발)

  • Ji-Yeong Jang;Kang-Hyun Lee;Sun-Ho Bang;Hee-Yeon Jo;Kwang-Sup Shin
    • The Journal of Bigdata / v.7 no.2 / pp.205-215 / 2022
  • The ongoing development of digital platform-based online retail is driving rapid growth in freight demand and in service innovation. In response to the demand for converged logistics technologies, the government supports R&D projects to innovate logistics services. However, each digital platform is developed on its own standard process and data formats, which may cause interoperability and connectivity problems. In this research, a standard data flow for connecting data among systems is defined on the basis of the standard process. In particular, connectivity data are categorized into three groups: common data, connected data, and management data. This standard data flow may improve applicability to practical logistics business.

Influential Factor Based Hybrid Recommendation System with Deep Neural Network-Based Data Supplement (심층신경망 기반 데이터 보충과 영향요소 결합을 통한 하이브리드 추천시스템)

  • An, Hyeon-woo;Moon, Nammee
    • Journal of Broadcast Engineering / v.24 no.3 / pp.515-526 / 2019
  • In the real world, a user's preference for a particular product is determined by many factors besides the quality of the product. Reflecting these external factors has been very difficult because of various fundamental problems, including a lack of data. However, access to external factors has become easier as public data infrastructure has opened up and evaluation platforms with diverse and vast amounts of data have become available. In line with these changes, this paper proposes a recommendation system structure that can reflect the collectable factors that affect a user's preference, and observes the influence of actual factors on preference by applying the structure to a case. The proposed system can be divided into a process of selecting and extracting influencing factors, a process of supplementing insufficient data using sentence analysis, and finally a process of combining and merging the user's evaluation data with the influencing factors. We also propose a validation process that determines whether structural variables, such as the selection of influencing factors, are set appropriately by comparing the result group of the proposed system with the actual user preference group.

A Study on Forecasting Demand and Supply of Marine Officer for Korean Ocean-Going Merchant Vessels (외항 상선 해기사 인력 수요 및 공급 예측에 관한 연구)

  • Sang-hoon Shin;Yong-John Shin
    • Journal of Navigation and Port Research / v.48 no.1 / pp.7-16 / 2024
  • Although the number of ocean-going merchant ships is increasing, the number of Korean marine officers is decreasing, and this manpower shortage is becoming more serious. This study objectively measured the factors determining the demand for and supply of ocean-going merchant ship officers and forecast both. Demand was predicted by applying the number of officers required for each ship size to the forecast number of ships. Supply was predicted with a Markov model segmented by position and age, reflecting yearly increase and decrease factors such as promotion, turnover, retirement, and new entry. The demand for ocean-going merchant ship officers will increase from 11,638 in 2023 to 13,879 in 2030, while the supply will decrease from 7,006 in 2023 to 6,426 in 2030, with the shortage expected to exceed 10,000 in 2040. This study can serve as a reference for solving the manpower shortage among ocean-going merchant ship officers by improving the accuracy of predictions through objective data, scientific analysis methods, and logical reasoning.
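
The abstract does not give the model's states or transition rates, so the following is only a minimal sketch of how a Markov-style supply projection of this kind might be set up. The three ranks, the transition matrix, the yearly intake, and the function name `project_supply` are all hypothetical values chosen for illustration; none of them come from the paper.

```python
def project_supply(headcount, transition, new_entries, years):
    """Project headcount per state forward, one year per step.

    transition[i][j] is the share of people in state i who are in state j
    one year later. A row may sum to less than 1; the remainder stands for
    people leaving the workforce (turnover, retirement).
    """
    n = len(headcount)
    history = [list(headcount)]
    state = list(headcount)
    for _ in range(years):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += state[i] * transition[i][j]
        # New entrants are added after the internal moves.
        state = [nxt[j] + new_entries[j] for j in range(n)]
        history.append(state)
    return history


# Hypothetical three-rank example (junior, senior, master); not from the paper.
headcount = [3000.0, 2500.0, 1500.0]
transition = [
    [0.80, 0.10, 0.00],  # juniors: 80% stay, 10% promoted, 10% leave
    [0.00, 0.85, 0.05],  # seniors: 85% stay, 5% promoted, 10% leave
    [0.00, 0.00, 0.90],  # masters: 90% stay, 10% leave
]
new_entries = [200.0, 0.0, 0.0]

history = project_supply(headcount, transition, new_entries, years=7)
totals = [sum(year) for year in history]
```

Comparing `totals` with a separate demand forecast year by year gives the projected shortage. With the figures quoted in the abstract, the implied gap in 2030 is 13,879 - 6,426 = 7,453 officers.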

Sentimental Analysis of SW Education News Data (SW 교육 뉴스데이터의 감성분석)

  • Park, SunJu
    • Journal of The Korean Association of Information Education / v.21 no.1 / pp.89-96 / 2017
  • As smartphones and SNS have gained popularity, a growing body of research focuses on the content and sentiment of information distributed through SNS. In this paper, we collected online news data about SW education, extracted words after morphological analysis, and analyzed the sentiment of the collected news by calculating a sentiment score for each news item. The accuracy of the calculated sentiment scores was also examined. As a result, the number of news items related to 'SW education' in the collection period was about 189 per month, and the average sentiment score was 0.7, which indicates that the news related to 'SW education' was positive in sentiment. The news was positive about the importance of SW education and the implementation of policy, but there were negative views on the specific methods of realization. Specifically, the lack of an SW education environment and of teaching methods, problems related to the improvement of SW developers and of their labor conditions, and the increase in private coding education were the factors behind the negative views.
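
The abstract does not say how the sentiment score is computed or what scale it uses. As a rough illustration only, a lexicon-based score over the words extracted from one news item might look like the sketch below. The lexicon, its polarity values, and the function name `sentiment_score` are assumptions made for this example and are not taken from the paper.

```python
def sentiment_score(words, lexicon):
    """Average the polarity of the words that appear in the lexicon.

    Words missing from the lexicon are ignored. An item with no
    lexicon words is treated as neutral and scores 0.0.
    """
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 0.0
    return sum(hits) / len(hits)


# Hypothetical polarity lexicon: +1.0 positive, -1.0 negative.
lexicon = {"important": 1.0, "growth": 1.0, "lack": -1.0}

# Words as they might come out of morphological analysis of one news item.
words = ["important", "growth", "lack", "policy"]
score = sentiment_score(words, lexicon)
```

Averaging `score` over all collected items would give a corpus-level figure comparable in spirit to the average reported in the abstract, although the paper's actual scale may differ.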

An Analysis of the 3D Spatial Distribution of Flow rate and Water Quality Convergence Monitoring Results in Rivers (하천에서의 수리·수질 복합 모니터링 결과의 3차원 공간분포 해석연구)

  • Lee, Chang Hyun;Kim, Kyung Dong;Ryu, Si Wan;Kim, Dong Su;Kim, Young Do
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.18-18 / 2022
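  • Analyzing the mixing patterns of water bodies at river confluences requires high-resolution data. However, because water-quality surveys and existing monitoring systems rely on fixed-point measurement, information about the river as a whole is available only at low resolution. In addition, although many aquatic environmental problems span one to three dimensions, most observation systems remain one-dimensional. Solving these problems requires more advanced observation and measurement. Obtaining such high-resolution measurements, however, places a heavy burden on the surveyor and is limited in both measurable area and time. To acquire data over a wide area at reduced resolution, an appropriate interpolation method must be selected. A review of related papers showed that most addressed two-dimensional cross-sectional distributions derived from measurements, while studies on obtaining hydraulic information through 3D mapping and 3D analysis were lacking. In particular, research on 3D river water-quality concentrations was insufficient. Accordingly, this study represents the overall hydraulic and water-quality information of the river by visualizing predictions and interpolations made from low-resolution measurements. The IDW, Natural Neighbor, and Kriging techniques were applied to river mapping and compared, and the precision of the mapping was improved through visualized data and quantitative evaluation. The mixing patterns at the river confluence were then interpreted using the 3D spatially interpolated data. Such 3D data serve as important data for measurement and monitoring technologies and as baseline data for hazardous-substance reduction and for assessment and prediction technologies. They can also support various aquatic health diagnostic techniques, such as estimating hazardous chemicals and hierarchical analysis of high-risk algal groups in lakes.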
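
Of the three interpolation methods named in the abstract, IDW (inverse distance weighting) is the simplest to illustrate. The sketch below estimates a value at an unmeasured 3D point from scattered samples. The sample coordinates, the concentration values, the power of 2, and the function name `idw` are assumptions for illustration; the abstract does not report the settings the authors used.

```python
import math


def idw(samples, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    samples: list of ((x, y, z), value) pairs from measurements.
    Each sample is weighted by 1 / distance**power, so nearer
    samples influence the estimate more strongly.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in samples:
        distance = math.dist(point, query)
        if distance == 0.0:
            # The query coincides with a measured point: return it exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical water-quality samples: ((x, y, depth), concentration).
samples = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 3.0)]
midpoint_estimate = idw(samples, (1.0, 0.0, 0.0))
```

Evaluating `idw` over a regular 3D grid of query points yields the kind of volume that can then be visualized or compared against Natural Neighbor and Kriging results.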
