• Title/Summary/Keyword: Large-Scale Data


Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv (챗GPT 등장 이후 인공지능 환각 연구의 문헌 검토: 아카이브(arXiv)의 논문을 중심으로)

  • Park, Dae-Min;Lee, Han-Jong
    • Informatization Policy
    • /
    • v.31 no.2
    • /
    • pp.3-38
    • /
    • 2024
  • Hallucination is a significant barrier to the utilization of large-scale language models and multimodal models. In this study, we collected 654 computer science papers with "hallucination" in the abstract, posted to arXiv between December 2022 and January 2024 following the advent of ChatGPT, and conducted frequency analysis, knowledge network analysis, and a literature review to explore the latest trends in hallucination research. The results showed that research was most active in the fields of "Computation and Language," "Artificial Intelligence," "Computer Vision and Pattern Recognition," and "Machine Learning." We then analyzed the research trends in these four major fields by focusing on the main authors and dividing the work into data, hallucination detection, and hallucination mitigation. The main trends included hallucination mitigation through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), inference enhancement via chain of thought (CoT), and growing interest in hallucination mitigation within the domain of multimodal AI. This study provides insights into the latest developments in hallucination research through a technology-oriented literature review, and is expected to help subsequent research in both engineering and the humanities and social sciences by clarifying the latest trends in hallucination research.

Development of Estimation Models for Parking Units -Focused on Gwangju Metropolitan City Condominium Apartments- (주차원단위 산정 모형 개발에 관한 연구 -광주광역시 공동 주택 아파트를 대상으로-)

  • Kwon, Sung-Dae;Ko, Dong-Bong;Park, Je-Jin;Ha, Tae-Jun
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.34 no.2
    • /
    • pp.549-559
    • /
    • 2014
  • The rapid expansion of cities led to a housing shortage in urban areas, which the government addressed through large-scale residential developments that increased the housing supply. Condominium housing accounts for more than 83% of the total housing supply, and the proportion of apartments, at about 50%, continues to increase steadily. Because of this increase, illegal parking caused by the shortage of parking spaces within apartment complexes has become increasingly problematic: it blocks the passage of emergency vehicles and heightens tension among residents competing for spaces. Parking for future residents is planned on the basis of estimated parking demand; however, the parking unit method used to estimate that demand relies on exclusive-use floor area and is believed to diverge considerably from actual demand. The reason for this discrepancy is that planned parking increases as the number of households decreases and the exclusive-use area expands, and decreases as the number of households increases and the exclusive-use area shrinks; methods to recalculate parking units from estimated parking demand are therefore an urgent concern. To estimate parking units for condominium apartments, this study first examined the existing research literature and selected field sites for collecting the necessary data. Field data and surveys were then collected and analyzed to identify the problems underlying current parking units, and shortcomings of the parking unit calculation method in the current traffic impact assessment were deduced. By identifying the factors influencing parking demand estimates and performing a factor analysis of the collected data, variables related to parking demand were selected to develop the parking unit estimation model. Finally, by comparing and verifying the existing traffic impact assessment parking unit estimate against the newly developed model using the collected data, a far more realistic parking unit estimate reflecting the characteristics of the residents was suggested. The parking unit estimation model developed in this study is anticipated to serve as a guideline for future parking lot legislation, as well as a basis for more realistic estimates of parking demand based on the resident characteristics of an apartment complex.
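The parking-unit model described above can be sketched as a simple multiple regression. This is a minimal illustration, not the paper's actual model: the predictor variables (household count, exclusive-use area) and all numbers below are hypothetical.

```python
import numpy as np

# Hypothetical sample: per-complex [number of households, exclusive-use area (m^2)]
# and observed peak parking demand (vehicles). All values are illustrative only.
X = np.array([[300, 85], [450, 72], [600, 59], [250, 110], [520, 66], [700, 50]], dtype=float)
y = np.array([340, 430, 510, 320, 480, 560], dtype=float)

# Ordinary least squares with an intercept: demand ~ b0 + b1*households + b2*area
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_parking_units(households, area_m2):
    """Predict parking demand, expressed per household (the 'parking unit')."""
    demand = coef[0] + coef[1] * households + coef[2] * area_m2
    return demand / households

print(round(predict_parking_units(400, 80), 3))
```

A real parking-unit model would be fitted on surveyed peak-occupancy data and validated against the traffic impact assessment figures, as the paper does.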

Geochemistry of Total Gaseous Mercury in Nan-Ji-Do, Seoul, Korea (난지도 지역의 대기수은 지화학)

  • Kim, Min-Young;Lee, Gang-Woong;Shin, Jae-Young;Kim, Ki-Hyun
    • Journal of the Korean earth science society
    • /
    • v.21 no.5
    • /
    • pp.611-622
    • /
    • 2000
  • To investigate the exchange rates of mercury (Hg) across the soil-air boundary, we measured Hg flux using the gradient technique at a major waste reclamation site, Nan-Ji-Do. Based on these measurement data, we attempted to provide insights into various aspects of Hg exchange in a strongly polluted soil environment. According to our analysis, the study site turned out to be not only a major emission source area but also a major sink area. When these data were compared on an hourly basis over a full-day scale, large fluxes of emission and deposition centered on daytime periods relative to nighttime periods. However, when the frequencies with which emission and deposition occurred were compared, a very contrasting pattern emerged: emission was dominant during nighttime periods, while deposition was most favored during daytime periods. When a similar comparison was made as a function of wind direction, we noticed that a major Hg source in the easterly direction may bring about significant deposition of Hg in the study area. To account for the environmental conditions controlling the vertical direction of Hg exchange, we compared environmental conditions for the whole data group against those observed under the wind direction of strong deposition events. The results of this analysis indicated that the concentrations of pollutant species varied sensitively enough to reflect the environmental conditions for each direction of exchange. Correlation analysis indicated that wind speed and ozone concentrations best reflected changes in the magnitudes of emission and deposition fluxes. The results of factor analysis also suggested that Hg emission in the study area is a temperature-driven process, while deposition is affected by the mixed effects of various factors including temperature, ozone, and non-methane HCs. If the computed emission rate is extrapolated to the whole study area, we estimate that the annual emission of Hg from the study area can amount to approximately 6 kg.
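The gradient technique referenced above infers the surface-air flux from the vertical concentration difference between two measurement heights, F = -K · dC/dz. A minimal sketch with an assumed eddy diffusivity and illustrative concentrations (the paper's actual parameterization is not reproduced here):

```python
# Micrometeorological gradient method for surface-air Hg exchange:
# F = -K * dC/dz, where K is an eddy diffusivity (m^2 s^-1) and dC/dz the
# vertical concentration gradient. All numbers below are illustrative.

def hg_flux(c_lower, c_upper, z_lower, z_upper, eddy_diffusivity):
    """Return Hg flux (ng m^-2 s^-1); positive = emission (upward)."""
    gradient = (c_upper - c_lower) / (z_upper - z_lower)  # ng m^-3 per m
    return -eddy_diffusivity * gradient

# Concentration higher near the ground -> negative gradient -> emission (positive flux)
flux = hg_flux(c_lower=8.0, c_upper=5.0, z_lower=0.5, z_upper=1.5, eddy_diffusivity=0.1)
print(round(flux, 2))  # 0.3
```

The sign convention (positive upward) matches the emission/deposition distinction made in the abstract; in practice K is estimated from micrometeorological measurements rather than assumed.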


Design and Implementation of an Execution-Provenance Based Simulation Data Management Framework for Computational Science Engineering Simulation Platform (계산과학공학 플랫폼을 위한 실행-이력 기반의 시뮬레이션 데이터 관리 프레임워크 설계 및 구현)

  • Ma, Jin;Lee, Sik;Cho, Kum-won;Suh, Young-kyoon
    • Journal of Internet Computing and Services
    • /
    • v.19 no.1
    • /
    • pp.77-86
    • /
    • 2018
  • For the past few years, KISTI has operated an online simulation execution platform, called EDISON, which allows users to conduct simulations of various scientific applications supplied by diverse computational science and engineering disciplines. These simulations typically involve large-scale computation and accordingly produce a huge volume of output data. One critical issue with conducting such simulations on an online platform is that many users simultaneously submit simulation requests (or jobs) with the same (or almost unchanged) input parameters or files, placing a significant burden on the platform: identical computing jobs consume duplicate computing and storage resources at an undesirably fast pace. To overcome the excessive resource usage caused by such identical simulation requests, this paper introduces a novel framework, called IceSheet, that efficiently manages simulation data based on execution metadata, that is, provenance. The IceSheet framework captures and stores the provenance associated with each conducted simulation. The collected provenance records are used not only to detect duplicate simulation requests but also to search existing simulation results via an open-source search engine, ElasticSearch. In particular, this paper elaborates on the core components of the IceSheet framework that support search and reuse of the stored simulation results. We implemented a prototype of the proposed framework using ElasticSearch in conjunction with the online simulation execution platform, and evaluated it on real simulation execution-provenance records collected on the platform. Once the prototyped IceSheet framework is fully integrated with the platform, users will be able to quickly search for past parameter values entered into the desired simulation software and retrieve existing results for the same input parameter values, if any. We therefore expect the proposed framework to help eliminate duplicate resource consumption and significantly reduce execution time for requests identical to previously executed simulations.
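The duplicate-detection idea behind such a provenance store can be sketched as keying each simulation request by a canonical hash of its inputs. This is a hypothetical, simplified illustration: the class and method names are invented, and the real framework persists provenance and serves search via ElasticSearch rather than holding results in memory.

```python
import hashlib
import json

class ProvenanceStore:
    """Minimal sketch of execution-provenance deduplication: a simulation
    request is keyed by a canonical hash of (solver, input parameters); a
    repeated request returns the cached result instead of re-running."""

    def __init__(self):
        self._results = {}

    def _key(self, solver, params):
        # sort_keys makes the serialization canonical, so parameter order is irrelevant
        canonical = json.dumps({"solver": solver, "params": params}, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def run(self, solver, params, compute):
        key = self._key(solver, params)
        if key in self._results:          # duplicate request: reuse stored result
            return self._results[key], True
        result = compute(params)          # first request: execute and record provenance
        self._results[key] = result
        return result, False

store = ProvenanceStore()
r1, cached1 = store.run("cfd2d", {"re": 1000, "mesh": 64}, lambda p: p["re"] / p["mesh"])
r2, cached2 = store.run("cfd2d", {"mesh": 64, "re": 1000}, lambda p: p["re"] / p["mesh"])
print(cached1, cached2)  # False True
```

The second request differs only in parameter order, yet hashes to the same key and is served from the store, which is the resource saving the abstract describes.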

Sapflux Measurement Database Using Granier's Heat Dissipation Method and Heat Pulse Method (수액류 측정 데이터베이스: 그래니어(Granier) 센서 열손실탐침법(Heat Dissipation Method)과 열파동법(Heat Pulse Method)을 이용한 수액류 측정)

  • Lee, Minsu;Park, Juhan;Cho, Sungsik;Moon, Minkyu;Ryu, Daun;Lee, Hoontaek;Lee, Hojin;Kim, Sookyung;Kim, Taekyung;Byeon, Siyeon;Jeon, Jihyun;Bhusal, Narayan;Kim, Hyun Seok
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.22 no.4
    • /
    • pp.327-339
    • /
    • 2020
  • Transpiration is the movement of water into the atmosphere through the leaf stomata of plants, and it accounts for more than half of evapotranspiration from the land surface. Transpiration can be measured in various ways, including the eddy covariance and water balance methods. However, transpiration measurements of individual trees are necessary to quantify and compare the water use of each species and each individual within a stand. For measuring the transpiration of individual trees, thermometric methods such as the heat dissipation and heat pulse methods are widely used. It is nonetheless difficult and labor-intensive to maintain transpiration measurements of individual trees over a wide area, especially in long-term experiments. Sharing sap flow data through a database should therefore be useful for promoting studies of transpiration and water balance at large spatial scales. In this paper, we present a sap flow database containing Granier-type sap flux data from 18 Korean pines (Pinus koraiensis) since 2011 and 16 oriental white oaks (Quercus aliena) since 2013 in the Mt. Taehwa Seoul National University forest, and from 18 needle firs (Abies holophylla), seven Quercus serrata, and three each of Carpinus laxiflora and C. cordata since 2013 in Gwangneung. In addition, the database includes sapling transpiration of ten species (Prunus sargentii, Larix kaempferi, Quercus acutissima, Pinus densiflora, Fraxinus rhynchophylla, Chamaecyparis obtusa, P. koraiensis, Betula platyphylla, A. holophylla, and Pinus thunbergii), measured using the heat pulse method since 2018. We believe this is the first database to share sap flux data in the Republic of Korea, and we hope it will be used by other researchers and contribute to a variety of research in this field.
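The heat dissipation method referenced above conventionally converts the measured probe temperature difference into sap flux density with Granier's original empirical calibration, u = 119×10⁻⁶ · K^1.231, where K = (ΔTmax − ΔT)/ΔT and ΔTmax is the temperature difference at zero flow. A minimal sketch (whether the database applies exactly these coefficients is not stated in the abstract):

```python
def granier_sap_flux_density(dT, dT_max):
    """Granier's empirical calibration: sap flux density u in m^3 m^-2 s^-1,
    with K = (dT_max - dT) / dT, where dT is the measured temperature
    difference between heated and reference probes and dT_max the value at
    zero flow (typically taken pre-dawn)."""
    K = (dT_max - dT) / dT
    return 119e-6 * K ** 1.231

# At zero flow dT equals dT_max, so K = 0 and the flux density is 0.
print(granier_sap_flux_density(10.0, 10.0))  # 0.0
```

Higher flow cools the heated probe, shrinking ΔT and raising K, so the calibration is monotonically increasing in flow, e.g. `granier_sap_flux_density(8.0, 10.0)` exceeds `granier_sap_flux_density(9.0, 10.0)`.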

A Study on Foreign Exchange Rate Prediction Based on KTB, IRS and CCS Rates: Empirical Evidence from the Use of Artificial Intelligence (국고채, 금리 스왑 그리고 통화 스왑 가격에 기반한 외환시장 환율예측 연구: 인공지능 활용의 실증적 증거)

  • Lim, Hyun Wook;Jeong, Seung Hwan;Lee, Hee Soo;Oh, Kyong Joo
    • Knowledge Management Research
    • /
    • v.22 no.4
    • /
    • pp.71-85
    • /
    • 2021
  • The purpose of this study is to determine which artificial intelligence methodology is most suitable for building a foreign exchange rate prediction model from bond market and interest rate market indicators. KTBs and MSBs, representative products of the Korean bond market, are sold on a large scale when risk aversion occurs, and in such cases the USD/KRW exchange rate often rises. When USD liquidity problems occur in the onshore Korean market, the KRW cross-currency swap price in the interest rate market falls, which acts as a signal to buy USD/KRW in the foreign exchange market. Considering that the prices and movements of products traded in the bond and interest rate markets directly or indirectly affect the foreign exchange market, the three markets may be regarded as closely and complementarily related. Studies have revealed the relationships and correlations among the bond, interest rate, and foreign exchange markets, but past exchange rate prediction studies have mainly relied on macroeconomic indicators such as GDP, the current account surplus/deficit, and inflation; active research that predicts exchange rates using artificial intelligence based on bond and interest rate market indicators has not yet been conducted. Using bond market and interest rate market indicators, this study runs an artificial neural network suited to nonlinear data analysis, logistic regression suited to linear data analysis, and a decision tree suited to both, and shows that the artificial neural network is the most suitable methodology for predicting foreign exchange rates, which are nonlinear time series data. Beyond revealing simple correlations among the bond, interest rate, and foreign exchange markets, capturing the trading signals between the three markets to reveal their active correlation and demonstrate their mutually organic movement not only provides foreign exchange traders with a new trading model but is also expected to contribute to increasing the efficiency and knowledge management of the entire financial market.
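The three-model comparison described above can be sketched with scikit-learn on synthetic data. Everything here is hypothetical: the features merely stand in for KTB/IRS/CCS indicators, and the label-generating rule is invented, so the printed accuracies say nothing about the paper's actual results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical daily features: [KTB yield change, IRS spread change, CCS basis change]
X = rng.normal(size=(600, 3))
# Invented rule: USD/KRW rises (label 1) when bonds sell off or the CCS basis falls
y = ((X[:, 0] - 0.8 * X[:, 2] + 0.3 * np.tanh(X[:, 1])) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.2f}")
```

In a real replication the features would be actual KTB, IRS, and CCS series aligned with next-day USD/KRW direction, and the comparison would use a chronological (not random) train/test split to respect the time-series structure.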

The Dynamics of CO2 Budget in Gwangneung Deciduous Old-growth Forest: Lessons from the 15 years of Monitoring (광릉 낙엽활엽수 노령림의 CO2 수지 역학: 15년 관측으로부터의 교훈)

  • Yang, Hyunyoung;Kang, Minseok;Kim, Joon;Ryu, Daun;Kim, Su-Jin;Chun, Jung-Hwa;Lim, Jong-Hwan;Park, Chan Woo;Yun, Soon Jin
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.23 no.4
    • /
    • pp.198-221
    • /
    • 2021
  • After large-scale reforestation in the 1960s and 1970s, forests in Korea have gradually been aging. The net ecosystem CO2 exchange of old-growth forests is theoretically near zero; however, they can be a CO2 sink or source depending on the intervention of disturbance or management. In this study, we report the CO2 budget dynamics of the Gwangneung deciduous old-growth forest (GDK) in Korea and examine the following two questions: (1) is the preserved GDK indeed CO2 neutral, as theoretically expected? and (2) can we explain the dynamics of its CO2 budget by the common mechanisms reported in the literature? To answer these questions, we analyzed 15 years of CO2 flux data measured by the eddy covariance technique, along with other biometeorological data, at the KoFlux GDK site from 2006 to 2020. The results showed that (1) GDK switched back and forth between being a sink and a source of CO2 but was on average a weak CO2 source (turning into a moderate CO2 source over the recent five years), and (2) the interannual variability of solar radiation, growing season length, and leaf area index was positively correlated with that of gross primary production (GPP) (R2 = 0.32~0.45), whereas the interannual variability of air and surface temperature was not significantly correlated with that of ecosystem respiration (RE). Furthermore, a machine learning-based model trained on the dataset of the early monitoring period (the first 10 years) failed to reproduce the observed interannual variations of GPP and RE for the recent five years. Biomass data analysis suggests that carbon emissions from coarse woody debris may have contributed in part to the conversion to a moderate CO2 source. To properly understand and interpret the long-term CO2 budget dynamics of GDK, a new framework of analysis and modeling based on complex systems science is needed. It is also important to maintain the flux monitoring and data quality, along with monitoring of coarse woody debris and disturbances.
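The interannual correlation analysis described above amounts to computing the coefficient of determination between two annual series. A minimal sketch on synthetic 15-year anomalies (the values are illustrative, not the GDK data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 15-year anomalies: GPP loosely tracks solar radiation,
# mimicking the kind of relationship the study reports (R2 = 0.32~0.45).
radiation_anom = rng.normal(size=15)
gpp_anom = 0.6 * radiation_anom + rng.normal(scale=0.8, size=15)

# Pearson correlation between the interannual series, squared to get R2
r = np.corrcoef(radiation_anom, gpp_anom)[0, 1]
print(round(r ** 2, 2))
```

With only 15 annual values, such an R² carries wide uncertainty, which is one reason the study treats these correlations as suggestive rather than mechanistic explanations.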

Estimation of Monthly Dissolved Inorganic Carbon Inventory in the Southeastern Yellow Sea (황해 남동부 해역의 월별 용존무기탄소 재고 추정)

  • KIM, SO-YUN;LEE, TONGSUP
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.27 no.4
    • /
    • pp.194-210
    • /
    • 2022
  • The monthly inventory of dissolved inorganic carbon (CT) and its fluxes were simulated using a box model for the southeastern Yellow Sea, bordering the northern East China Sea. The monthly CT data were constructed by combining observed data representing the four seasons with data adopted from recent publications. A two-box model of the surface and deep layers was used, assuming that the annual CT inventory was at steady state and that its fluctuations due to advection in the surface box were negligible. The simulation results indicate that the monthly variation of the CT inventory between the surface and deep boxes was driven primarily by the mixing flux associated with changes in mixed layer depth, on a scale of -40 to 35 mol C m⁻² month⁻¹. The air-to-sea CO2 flux was about 2 mol C m⁻² yr⁻¹, less than 1/100 of the mixing flux. The estimated magnitude of the biological pump flux, in the range of 4-5 mol C m⁻² yr⁻¹, is about half the reported in situ measurement value. The water-column CT inventory was at its maximum in April, when mixing by cooling ceases, and decreased slightly throughout the stratified period; the total CT inventory is therefore larger in the stratified period than in the mixing period. To maintain a steady state, 18 mol C m⁻² yr⁻¹ (= 216 g C m⁻² yr⁻¹), the difference between the maximum and minimum monthly CT inventory, must be transported out to the East China Sea. Extrapolating this flux over the entire southern Yellow Sea boundary yields 4 × 10⁹ g C yr⁻¹. Conceptually, this flux is equivalent to the proposed continental shelf pump. Since it must pass through the vast shelf area of the East China Sea before joining the open Pacific waters, the actual contribution as a continental shelf pump would be significantly lower than the reported value. Although the errors of the simple box model simulation, imposed by the paucity of data and by its assumptions, are considerably large, it was nevertheless possible to constrain the relative contributions and ranges of the major fluxes driving the CT inventory variations, and to suggest recommendations for future studies.
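The dominant mixing flux in the two-box model described above can be sketched as entrainment driven by mixed-layer deepening. This is a conceptual illustration only; the units, values, and the treatment of shoaling are simplified assumptions, not the paper's model.

```python
# Minimal two-box (surface/deep) sketch of the CT mixing flux: when the mixed
# layer deepens, deep water richer in CT is entrained into the surface box.
# All numbers are illustrative, not those of the paper.

def entrainment_flux(ct_surface, ct_deep, mld_now, mld_next):
    """CT flux into the surface box (mol C m^-2 per time step) from a change
    in mixed-layer depth; positive = gain of CT by the surface layer.
    Concentrations in mol C m^-3, depths in m."""
    d_mld = mld_next - mld_now
    if d_mld > 0:                      # deepening entrains deep water
        return (ct_deep - ct_surface) * d_mld
    return 0.0                         # shoaling detrains; no CT gain in this sketch

# Deepening from 20 m to 50 m with deep water 0.1 mol C m^-3 richer in CT:
print(round(entrainment_flux(2.0, 2.1, 20.0, 50.0), 2))  # 3.0
```

Seasonal cycling of the mixed-layer depth alternates the sign of the surface-box CT change, which is how the model produces the -40 to 35 mol C m⁻² month⁻¹ range dominated by mixing rather than air-sea exchange.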

Rainfall image DB construction for rainfall intensity estimation from CCTV videos: focusing on experimental data in a climatic environment chamber (CCTV 영상 기반 강우강도 산정을 위한 실환경 실험 자료 중심 적정 강우 이미지 DB 구축 방법론 개발)

  • Byun, Jongyun;Jun, Changhyun;Kim, Hyeon-Joon;Lee, Jae Joon;Park, Hunil;Lee, Jinwook
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.6
    • /
    • pp.403-417
    • /
    • 2023
  • In this research, a methodology was developed for constructing an appropriate rainfall image database for estimating rainfall intensity from CCTV video. The database was constructed in the Large-Scale Climate Environment Chamber of the Korea Conformity Laboratories, which can control variables that are highly irregular and variable in real environments. 1,728 scenarios were designed under five different experimental conditions, from which 36 scenarios and a total of 97,200 frames were selected. Rain streaks were extracted using the k-nearest neighbor algorithm by calculating the difference between each image and the background. To prevent overfitting, data with pixel values greater than a set threshold, relative to the average pixel value of each image, were selected. The area with maximum pixel variability was located by shifting a window in 10-pixel steps and set as the representative area (180×180) of the original image. After resizing to 120×120 as input data for a convolutional neural network model, image augmentation was performed under unified shooting conditions. 92% of the data fell within a 10% absolute range of PBIAS. The final results of this study clearly have the potential to enhance the accuracy and efficacy of existing real-world CCTV systems through transfer learning.
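The background-difference step described above can be sketched with a simple thresholded frame difference. The paper uses a k-nearest-neighbor formulation; the threshold rule below is a simplified stand-in, and all values are synthetic.

```python
import numpy as np

def extract_rain_streaks(frame, background, k=1.5):
    """Subtract a background image from a frame and keep pixels whose
    brightening exceeds k times the frame's mean absolute difference.
    The factor k is an illustrative choice, not the paper's setting."""
    diff = frame.astype(float) - background.astype(float)
    threshold = k * np.abs(diff).mean()
    return diff > threshold            # bright streaks added by rain

rng = np.random.default_rng(0)
background = rng.integers(40, 60, size=(120, 120)).astype(np.uint8)
frame = background.copy()
frame[10:60, 30] = 200                 # a synthetic vertical rain streak
mask = extract_rain_streaks(frame, background)
print(int(mask.sum()))  # 50
```

The binary mask isolates candidate streak pixels; in the paper's pipeline the masked region then feeds the representative-area selection and the 120×120 CNN input.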

Bankruptcy Forecasting Model using AdaBoost: A Focus on Construction Companies (적응형 부스팅을 이용한 파산 예측 모형: 건설업을 중심으로)

  • Heo, Junyoung;Yang, Jin Yong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.35-48
    • /
    • 2014
  • According to the 2013 construction market outlook report, the liquidation of construction companies is expected to continue due to the ongoing residential construction recession. Bankruptcies of construction companies have a greater social impact than those in other industries, yet because of the different nature of their capital structure and debt-to-equity ratio, they are more difficult to forecast. The construction industry operates on greater leverage, with high debt-to-equity ratios and project cash flows concentrated in the second half of a project. The economic cycle greatly influences construction companies, so downturns tend to rapidly increase their bankruptcy rates. High leverage, coupled with increased bankruptcy rates, can place greater burdens on the banks providing loans to construction companies. Nevertheless, bankruptcy prediction models have concentrated mainly on financial institutions, and construction-specific studies are rare. Bankruptcy prediction models based on corporate financial data have been studied for many years in various ways; however, these models target companies in general and may not be appropriate for forecasting the bankruptcies of construction companies, which typically carry disproportionately high liquidity risks. The construction industry is capital-intensive, operates on long timelines with large-scale investment projects, and has comparatively longer payback periods than other industries. Given this unique capital structure, a model used to judge the financial risk of companies in general can be difficult to apply to the construction industry. The Altman Z-score, first published in 1968, is commonly used as a bankruptcy forecasting model. It forecasts the likelihood of a company going bankrupt using a simple formula, classifying the result into three categories that evaluate the corporate status as dangerous, moderate, or safe. A company in the "dangerous" category has a high likelihood of bankruptcy within two years, while those in the "safe" category have a low likelihood; for companies in the "moderate" category, the risk is difficult to forecast. Many of the construction firms in this study fell into the "moderate" category, which made their risk difficult to forecast. Along with the development of machine learning, recent studies of corporate bankruptcy forecasting have adopted this technology. Pattern recognition, a representative application area of machine learning, is applied to bankruptcy forecasting: patterns are analyzed from a company's financial information and then judged as belonging to the bankruptcy-risk group or the safe group. The representative machine learning models previously used in bankruptcy forecasting are artificial neural networks, adaptive boosting (AdaBoost), and the support vector machine (SVM), along with many hybrid studies combining these models. Existing studies, whether using the traditional Z-score technique or machine learning, focus on companies in non-specific industries and therefore do not consider industry-specific characteristics. In this paper, we confirm that adaptive boosting (AdaBoost) is the most appropriate forecasting model for construction companies by company size. We classified construction companies into three groups - large, medium, and small - based on the company's capital, and analyzed the predictive ability of AdaBoost for each group. The experimental results showed that AdaBoost has more predictive ability than the other models, especially for the group of large companies with capital of more than 50 billion won.
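The AdaBoost approach described above can be sketched with scikit-learn's AdaBoostClassifier on synthetic data. The financial-ratio features and the bankruptcy-label rule are invented for illustration; the result does not reflect the paper's dataset of Korean construction firms.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Hypothetical financial ratios: [debt ratio, current ratio, operating margin]
X = rng.normal(size=(800, 3))
# Invented bankruptcy label: high leverage plus thin margins raise the risk.
y = ((1.2 * X[:, 0] - 0.8 * X[:, 1] - X[:, 2]
      + rng.normal(scale=0.5, size=800)) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
# AdaBoost boosts an ensemble of shallow decision-tree (stump) base learners,
# reweighting misclassified firms at each round.
clf = AdaBoostClassifier(n_estimators=100, random_state=42)
acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
print(f"test accuracy: {acc:.2f}")
```

To mirror the paper's design, the same fit-and-score step would be repeated separately on the large, medium, and small capital groups and compared against the other classifiers.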