Title/Summary/Keyword: Standard Dataset

Evaluation of Urban Weather Forecast Using WRF-UCM (Urban Canopy Model) Over Seoul (WRF-UCM (Urban Canopy Model)을 이용한 서울 지역의 도시기상 예보 평가)

  • Byon, Jae-Young;Choi, Young-Jean;Seo, Bum-Geun
    • Atmosphere
    • /
    • v.20 no.1
    • /
    • pp.13-26
    • /
    • 2010
  • The Urban Canopy Model (UCM) implemented in the WRF model is applied to improve urban meteorological forecasts in fine-scale (about 1-km horizontal grid spacing) simulations over the city of Seoul. The surface air temperature and wind speed predicted by the WRF-UCM model are compared with those of the standard WRF model. The 2-m air temperature and wind speed of the standard WRF are found to be lower than observations, while the nocturnal urban canopy temperature from the WRF-UCM agrees better with observations than the surface air temperature from the standard WRF. Although the urban canopy temperature (TC) is lower at industrial sites, TC in high-intensity residential areas compares better with surface observations than the 2-m temperature does. The 10-m wind speed is overestimated in urban areas, while the urban canopy wind (UC) is weaker than observed because of the drag effect of buildings. The coupled WRF-UCM reproduces the increase of urban heat from urban effects such as anthropogenic heat and buildings. The study indicates that the WRF-UCM contributes to the improvement of urban weather forecasts, such as for the nocturnal heat island, especially when an accurate urban information dataset is provided.
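
As a rough illustration of the comparison performed here, the sketch below computes the bias and RMSE of modeled 2-m temperature against station observations; the arrays are hypothetical placeholders, not the paper's data.

```python
# Minimal verification sketch: bias and RMSE of predicted 2-m temperature
# against observations. All values are invented placeholders.
import numpy as np

t2m_wrf = np.array([292.1, 291.4, 290.8, 290.2])   # standard WRF 2-m temperature (K)
t2m_ucm = np.array([293.0, 292.6, 292.1, 291.7])   # WRF-UCM canopy temperature (K)
t2m_obs = np.array([293.4, 292.9, 292.3, 291.8])   # station observations (K)

def bias(model, obs):
    return float(np.mean(model - obs))

def rmse(model, obs):
    return float(np.sqrt(np.mean((model - obs) ** 2)))

for name, model in [("WRF", t2m_wrf), ("WRF-UCM", t2m_ucm)]:
    print(f"{name}: bias={bias(model, t2m_obs):+.2f} K, rmse={rmse(model, t2m_obs):.2f} K")
```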

Application of Dimensional Expansion and Reduction to Earthquake Catalog for Machine Learning Analysis (기계학습 분석을 위한 차원 확장과 차원 축소가 적용된 지진 카탈로그)

  • Jang, Jinsu;So, Byung-Dal
    • The Journal of Engineering Geology
    • /
    • v.32 no.3
    • /
    • pp.377-388
    • /
    • 2022
  • Recently, several studies have utilized machine learning to efficiently and accurately analyze seismic data, which are increasing exponentially. In this study, we expand earthquake information such as occurrence time, hypocentral location, and magnitude to produce a dataset for machine learning, then reduce the dimension of the expanded data to dominant features through principal component analysis. The dimensionally extended data comprise statistics of the earthquake information from the Global Centroid Moment Tensor catalog, which contains 36,699 seismic events. We preprocess the data using standard and max-min scaling and extract dominant features from the scaled dataset with principal component analysis. The scaling methods significantly reduce the deviation of feature values caused by differing units. Among them, standard scaling transforms the median of each feature with a smaller deviation than the other scaling methods. The six principal components extracted from the non-scaled dataset explain 99% of the original data. Sixteen principal components from the datasets with standardization or max-min scaling reconstruct 98% of the original datasets. These results indicate that more principal components are needed to preserve the original information when feature values are evenly distributed. We propose a data processing method for efficient and accurate machine learning models that analyze the relationship between seismic data and seismic behavior.
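
The preprocessing pipeline described above (scaling followed by principal component analysis keeping ~99% of the variance) can be sketched as follows; the feature matrix is a random stand-in for the expanded catalog statistics.

```python
# Sketch of the scaling + PCA pipeline; X is a random stand-in, not the
# Global CMT catalog features, so the component counts will differ.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(36699, 20))          # placeholder for expanded catalog features

for scaler in (StandardScaler(), MinMaxScaler()):
    X_scaled = scaler.fit_transform(X)
    pca = PCA(n_components=0.99)          # keep components explaining 99% of variance
    pca.fit(X_scaled)
    print(type(scaler).__name__, "->", pca.n_components_, "components")
```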

A Study on METS Design Using DDI Metadata (DDI 메타데이터를 활용한 METS 설계에 관한 연구)

  • Park, Jin Ho
    • Journal of the Korean Society for Information Management
    • /
    • v.38 no.4
    • /
    • pp.153-171
    • /
    • 2021
  • This study suggests a method of utilizing METS based on DDI metadata to manage, preserve, and service datasets. DDI is a standard for statistical data processing, and it currently exists in two versions: DDI Codebook (DDI-C) and DDI Lifecycle (DDI-L). This study mainly uses the principal elements of DDI-C. First, the structures and elements of METS and DDI-C were analyzed; then the major elements of METS and DDI-C were mapped to each other. METS was adopted as the standard format for expression. Since METS and DDI-C do not map perfectly 1:1, the DDI-C element that best matches each element of standard METS was selected. As a result, a new METS-based standard for dataset management and transmission using DDI-C metadata elements was designed and presented.
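
A minimal sketch of the general wrapping idea, not the paper's actual element mapping: METS allows embedded descriptive metadata inside a dmdSec through mdWrap/xmlData, and "DDI" is one of the registered METS MDTYPE values. The DDI-C content below is invented.

```python
# Sketch: embed (hypothetical) DDI-C study metadata in a METS dmdSec.
# Element names follow the METS and DDI Codebook schemas; content is invented.
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)

mets = ET.Element(f"{{{METS}}}mets")
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DDI")
xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")

# DDI-C study description (stdyDscr) carrying a title and an abstract
stdy = ET.SubElement(xml_data, "stdyDscr")
ET.SubElement(stdy, "titl").text = "Example survey dataset"
ET.SubElement(stdy, "abstract").text = "Hypothetical study-level metadata."

print(ET.tostring(mets, encoding="unicode"))
```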

Pre-Evaluation for Prediction Accuracy by Using the Customer's Ratings in Collaborative Filtering (협업필터링에서 고객의 평가치를 이용한 선호도 예측의 사전평가에 관한 연구)

  • Lee, Seok-Jun;Kim, Sun-Ok
    • Asia Pacific Journal of Information Systems
    • /
    • v.17 no.4
    • /
    • pp.187-206
    • /
    • 2007
  • The development of computer and information technology, combined with the internet infrastructure, has spread information widely, not only in specialized fields but also in people's daily lives. This ubiquity of information has changed the traditional way of transacting and has led to a new form of e-commerce, distinct from the old one, in which not only physical goods but also non-physical services are traded. As the scale of e-commerce grows, information overload keeps people from finding what they want, and recommender systems have become the main tools for mitigating it. Recommender systems can be defined as systems that suggest items (goods or services) in view of customers' interests or tastes; e-commerce websites use them to suggest products to customers and to provide information that helps customers decide what to purchase. Among the several approaches to recommendation, this study focuses on collaborative filtering. It presents a way to pre-evaluate the prediction performance of customer-preference prediction in collaborative filtering before the prediction step: customers likely to have low prediction performance are classified in advance using statistical features of their ratings. The MovieLens 100K dataset is used to analyze the accuracy of this classification. The classification criteria are set using a training set comprising 80% of the 100K dataset, and customers are divided into a classified group and a non-classified group. To compare the prediction performance of the two groups, predictions on the remaining 20% test set are produced with the Neighborhood-Based Collaborative Filtering Algorithm and the Correspondence Mean Algorithm, and the resulting prediction errors are allocated to each customer and compared. Two research hypotheses are formulated to test the accuracy of the classification criteria. Hypothesis 1: groups classified according to the standard deviation of each user's ratings differ significantly in estimation accuracy. To test it, the standard deviation of each user's ratings in the training set is calculated, users are divided into four groups by the quartiles of these standard deviations, and the estimation errors of the groups on the test set are compared for significant differences. Hypothesis 2: groups classified according to the distribution of each user's ratings differ significantly in estimation accuracy. To test it, the distribution of each user's ratings is compared with the distribution of all customers' ratings in the training set. On the assumption that customers whose rating distributions differ from the overall distribution would have low prediction performance, six types of divergent distributions are defined, and users are classified into a fit group or a non-fit group for each assumed distribution type. The goodness of fit between each assumed distribution type and each customer's distribution is tested with a $\chi^2$ goodness-of-fit test, and the two resulting groups are tested for a difference in mean errors. The degree of fit between each user's rating distribution and the average rating distribution of the training set also turns out to be closely related to the prediction errors of the algorithms. Through this study, customers with lower prediction performance than the rest are identified before the prediction process by these two criteria, both set from statistical features of customers' ratings in the training set.
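
The second criterion can be sketched as a per-user chi-square goodness-of-fit test against the overall rating distribution; the counts below are illustrative, not MovieLens values.

```python
# Sketch of the distribution-based pre-classification: compare each user's
# rating histogram (counts of ratings 1..5) with the overall distribution.
import numpy as np
from scipy.stats import chisquare

overall = np.array([0.06, 0.11, 0.27, 0.34, 0.22])   # P(rating = 1..5), hypothetical

def fits_overall(user_counts, alpha=0.05):
    expected = overall * user_counts.sum()           # scale to the user's rating count
    stat, p = chisquare(f_obs=user_counts, f_exp=expected)
    return p >= alpha                                # True -> "fit" group

user_a = np.array([3, 6, 14, 17, 10])                # roughly matches overall shape
user_b = np.array([30, 10, 5, 3, 2])                 # skewed toward low ratings
print(fits_overall(user_a), fits_overall(user_b))    # expect: True False
```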

Translation of 3D CAD Data to X3D Dataset Maintaining the Product Structure (3차원 CAD 데이터의 제품구조를 포함하는 X3D 기반 데이터로의 변환 기법)

  • Cho, Gui-Mok;Hwang, Jin-Sang;Kim, Young-Kuk
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.81-92
    • /
    • 2011
  • There have been a number of attempts to apply 3D CAD data created in the design stage of the product life cycle to applications in the other stages of related industries. However, 3D CAD data require a large amount of computing resources for processing, and they are not suitable for downstream applications such as distributed collaboration, marketing tools, or Interactive Electronic Technical Manuals because of design-information security problems and license costs. Various lightweight visualization formats and application systems have therefore been suggested to overcome these problems. Most of these lightweight formats, however, are proprietary to the companies or organizations that proposed them and cannot be shared with each other. In addition, product structure information is not represented along with the product's geometric information. In this paper, we define a dataset called prod-X3D (Enhanced X3D Dataset for Web-based Visualization of 3D CAD Product Models), based on the international standard graphics format X3D, which can represent the structure information as well as the geometry of a product, and we propose a method for translating 3D CAD data into prod-X3D.
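
The core idea, preserving the assembly tree in the scene graph, can be sketched by nesting X3D Transform nodes named after each part or assembly; the part tree and geometry below are invented placeholders, not prod-X3D itself.

```python
# Sketch: mirror a (hypothetical) CAD assembly tree as nested X3D Transform
# nodes, so structure survives alongside geometry. Box is placeholder geometry.
import xml.etree.ElementTree as ET

def to_x3d(node, parent):
    t = ET.SubElement(parent, "Transform", DEF=node["name"])  # structure node
    if node.get("children"):
        for child in node["children"]:
            to_x3d(child, t)                                  # recurse into subassemblies
    else:
        shape = ET.SubElement(t, "Shape")                     # leaf part geometry
        ET.SubElement(shape, "Box", size="1 1 1")

assembly = {"name": "Product",
            "children": [{"name": "SubAssembly",
                          "children": [{"name": "Part1"}, {"name": "Part2"}]},
                         {"name": "Part3"}]}

x3d = ET.Element("X3D", profile="Interchange")
scene = ET.SubElement(x3d, "Scene")
to_x3d(assembly, scene)
print(ET.tostring(x3d, encoding="unicode"))
```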

A Study on Database Design Model for Production System Record Management Module in DataSet Record Management (데이터세트 기록관리를 위한 생산시스템 기록관리 모듈의 DB 설계 모형연구)

  • Kim, Dongsu;Yim, Jinhee;Kang, Sung-hee
    • The Korean Journal of Archival Studies
    • /
    • no.78
    • /
    • pp.153-195
    • /
    • 2023
  • RDBMS is the most widely used type of database system worldwide, and the term "dataset" here refers to the vast amount of data produced in administrative information systems built on RDBMSs. Unlike business systems that mainly produce administrative documents, administrative information systems generate records centered on the unique tasks of organizations. These records differ from traditional approval documents and their metadata, which makes it difficult to transfer them seamlessly to standard record-management systems. With the 2022 revision of the Enforcement Decree of the Public Records Act, datasets were included among the types of records for which only management authority is transferred. The core implication of this revision is that the lifecycle of records must be managed within the administrative information systems themselves; however, there has been little exploration of how to manage datasets inside such systems. This research therefore designs a database for a record-management module to be integrated into administrative information systems to manage the lifecycle of records. By modifying and supplementing ISO 16175-1:2020, we design a "human resource management system" and identify and appraise its personnel-management datasets, providing a concrete example of record management within an administrative information system. The prototype system designed in this research is limited in data volume compared with systems currently in use within organizations, and it has not yet been validated by records researchers and IT developers in the field. Nevertheless, this work clarifies the nature of datasets and how they should be managed within administrative information systems, and it confirms the need for a record-management module's database in such systems. Once a complete record-management module is developed and standards are established by the National Archives, it is expected to become a necessary module for organizations to manage datasets effectively.
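
A minimal sketch of the kind of schema such a module might add, using a hypothetical table layout rather than the paper's design: one row of record-management metadata (capture time, retention, disposition) per captured business record.

```python
# Hypothetical record-management module schema alongside a host business table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personnel (                  -- business table of the host system
    emp_id      INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    appointed   DATE NOT NULL
);
CREATE TABLE record_mgmt (                -- record-management module table
    record_id       INTEGER PRIMARY KEY,
    source_table    TEXT NOT NULL,        -- which business table holds the record
    source_pk       INTEGER NOT NULL,     -- key of the captured row
    captured_at     TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    retention_years INTEGER NOT NULL,     -- lifecycle: retention period
    disposition     TEXT DEFAULT 'retain' -- lifecycle: destroy / transfer / retain
);
""")
conn.execute("INSERT INTO personnel (emp_id, name, appointed) VALUES (1, 'Hong', '2020-03-01')")
conn.execute("INSERT INTO record_mgmt (source_table, source_pk, retention_years) "
             "VALUES ('personnel', 1, 30)")
print(conn.execute("SELECT * FROM record_mgmt").fetchall())
```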

Empirical Analysis of the Effect of EU ETS on the CO2 Emission (유럽공동체 배출권거래제 도입 효과에 대한 실증분석)

  • Kim, Hyun;Lee, Gwanghoon
    • Environmental and Resource Economics Review
    • /
    • v.19 no.4
    • /
    • pp.875-896
    • /
    • 2010
  • Using the difference-in-differences (DID) estimation method, this paper analyzes the effect of the European Union's Emissions Trading Scheme (EU ETS) on the reduction of per capita $CO_2$ emissions among the twenty-five participating countries. For this, a panel dataset of forty-two European countries for the period 1990~2007 is constructed. Special attention is paid to the bias of the standard errors in DID estimation caused by serial correlation in the error terms. The results show a robust effect of the EU ETS on the reduction of per capita $CO_2$ emissions among the participating countries, regardless of how the standard errors are calculated. They also show that an increased implicit tax rate on energy robustly reduces per capita $CO_2$ emissions. In contrast, the estimated effects of per capita GDP and population density on per capita $CO_2$ emissions are inconsistent across specifications. In particular, the environmental Kuznets curve is not statistically supported when robust standard errors are used.
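
A DID regression with country-clustered standard errors, the correction for serial correlation discussed above, can be sketched as follows on a simulated panel (not the paper's dataset).

```python
# Sketch: DID estimate of a treatment effect with cluster-robust SEs by country.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame([(c, y) for c in range(42) for y in range(1990, 2008)],
                  columns=["country", "year"])
df["treated"] = (df["country"] < 25).astype(int)          # ETS participants
df["post"] = (df["year"] >= 2005).astype(int)             # ETS phase begins
df["co2_pc"] = (10 - 0.5 * df["treated"] * df["post"]     # built-in effect of -0.5
                + rng.normal(0, 1, len(df)))

m = smf.ols("co2_pc ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["country"]})
print(m.params["treated:post"], m.bse["treated:post"])    # DID estimate and its SE
```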

Clinical Validation of a Deep Learning-Based Hybrid (Greulich-Pyle and Modified Tanner-Whitehouse) Method for Bone Age Assessment

  • Kyu-Chong Lee;Kee-Hyoung Lee;Chang Ho Kang;Kyung-Sik Ahn;Lindsey Yoojin Chung;Jae-Joon Lee;Suk Joo Hong;Baek Hyun Kim;Euddeum Shim
    • Korean Journal of Radiology
    • /
    • v.22 no.12
    • /
    • pp.2017-2025
    • /
    • 2021
  • Objective: To evaluate the accuracy and clinical efficacy of a hybrid Greulich-Pyle (GP) and modified Tanner-Whitehouse (TW) artificial intelligence (AI) model for bone age assessment. Materials and Methods: A deep learning-based model was trained on an open dataset of multiple ethnicities. A total of 102 hand radiographs (51 male and 51 female; mean age ± standard deviation = 10.95 ± 2.37 years) from a single institution were selected for external validation. Three human experts performed bone age assessments based on the GP atlas to develop a reference standard. Two study radiologists performed bone age assessments with and without AI model assistance in two separate sessions, for which the reading time was recorded. The performance of the AI software was assessed by comparing the mean absolute difference between the AI-calculated bone age and the reference standard. The reading time was compared between reading with and without AI using a paired t test. Furthermore, the reliability between the two study radiologists' bone age assessments was assessed using intraclass correlation coefficients (ICCs), and the results were compared between reading with and without AI. Results: The bone ages assessed by the experts and the AI model were not significantly different (11.39 ± 2.74 years and 11.35 ± 2.76 years, respectively, p = 0.31). The mean absolute difference was 0.39 years (95% confidence interval, 0.33-0.45 years) between the automated AI assessment and the reference standard. The mean reading time of the two study radiologists was reduced from 54.29 to 35.37 seconds with AI model assistance (p < 0.001). The ICC of the two study radiologists slightly increased with AI model assistance (from 0.945 to 0.990). Conclusion: The proposed AI model was accurate for assessing bone age. Furthermore, this model appeared to enhance the clinical efficacy by reducing the reading time and improving the inter-observer reliability.
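
Two of the reported comparisons, the mean absolute difference between AI and reference bone ages and the paired t test on reading times, can be sketched as follows; all values are simulated placeholders.

```python
# Sketch of the accuracy and reading-time analyses on simulated data.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)
ref = rng.uniform(6, 16, 102)                      # reference-standard bone ages (years)
ai = ref + rng.normal(0, 0.5, 102)                 # AI-estimated bone ages (years)
print(f"mean absolute difference: {np.mean(np.abs(ai - ref)):.2f} years")

t_without = rng.normal(54.3, 10, 102)              # reading time without AI (s)
t_with = t_without - rng.normal(18.9, 5, 102)      # reading time with AI (s)
stat, p = ttest_rel(t_without, t_with)             # paired t test across the same cases
print(f"paired t test: t={stat:.2f}, p={p:.4f}")
```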

Artificial Intelligence-based Echocardiogram Video Classification by Aggregating Dynamic Information

  • Ye, Zi;Kumar, Yogan J.;Sing, Goh O.;Song, Fengyan;Ni, Xianda;Wang, Jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.2
    • /
    • pp.500-521
    • /
    • 2021
  • Echocardiography, an ultrasound scan of the heart, is regarded as the primary physiological test for diagnosing heart disease. Interpreting an echocardiogram relies heavily on determining the view; some views are designated standard views because they present the major cardiac structures clearly and make them easy to evaluate. However, finding valid cardiac views has traditionally been a time-consuming and laborious process because medical images are interpreted manually by specialists. This study therefore aims to speed up the diagnostic process and reduce diagnostic error by automatically identifying standard cardiac views with deep learning. More importantly, using a brand-new echocardiogram dataset of Asian patients, our research considers and assesses several new neural network architectures drawn from action recognition in video. The study concludes, and verifies, that methods aggregating dynamic information achieve a stronger classification effect.
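
One simple way to aggregate dynamic information, and only a sketch of the general idea rather than the paper's architectures, is to encode each frame with a 2D CNN and average the features over time before classifying the view.

```python
# Sketch: per-frame encoding + temporal average pooling for view classification.
# The toy backbone and shapes are placeholders, not the paper's models.
import torch
import torch.nn as nn

class TemporalAvgClassifier(nn.Module):
    def __init__(self, n_views=5):
        super().__init__()
        self.backbone = nn.Sequential(                 # toy per-frame encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16, n_views)

    def forward(self, clips):                          # clips: (B, T, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))     # (B*T, 16)
        feats = feats.view(b, t, -1).mean(dim=1)       # average over time -> (B, 16)
        return self.head(feats)

logits = TemporalAvgClassifier()(torch.randn(2, 8, 1, 112, 112))
print(logits.shape)                                    # torch.Size([2, 5])
```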

A Study on the Formant Comparison of Korean Monophthongs according to Age and Gender -A Survey on Patients in Oriental Hospitals- (연령 및 성별에 따른 한국인 단모음 포먼트 비교에 관한 연구 -한방병원 내원환자를 중심으로-)

  • Kim, Young-Su;Kim, Keun Ho;Kim, Jong Yeol;Jang, Jun-Su
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.73-80
    • /
    • 2013
  • Formants are among the essential vocal features for research on voice production, recognition, and synthesis. Numerous studies have been conducted on foreign languages, including English vowels, but studies of Korean have used only limited amounts of voice data. In this study, we compare four formants according to age and gender using a large number of Korean monophthongs. A total of 2,614 Korean speakers participated in our experiments. We summarize the statistical results by the mean and standard deviation of each formant of five monophthongs. The results show a notable difference across age and gender groups. A quantitative study based on a large dataset is suggested for future research on Korean speech sounds.
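
The summary statistics described above (per-group mean and standard deviation of each formant) reduce to a simple groupby; the measurements below are invented, not the 2,614-speaker dataset.

```python
# Sketch: mean and standard deviation of formants per gender/age/vowel group.
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "F", "M", "M"],
    "age_group": ["20s", "20s", "20s", "20s"],
    "vowel": ["a", "a", "a", "a"],
    "F1": [850, 870, 700, 720],          # first formant (Hz), invented
    "F2": [1400, 1450, 1250, 1300],      # second formant (Hz), invented
})

summary = df.groupby(["gender", "age_group", "vowel"])[["F1", "F2"]].agg(["mean", "std"])
print(summary)
```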