• Title/Summary/Keyword: Standard Dataset

Scaling Up Face Masks Classification Using a Deep Neural Network and Classical Method Inspired Hybrid Technique

  • Kumar, Akhil;Kalia, Arvind;Verma, Kinshuk;Sharma, Akashdeep;Kaushal, Manisha;Kalia, Aayushi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.11 / pp.3658-3679 / 2022
  • Classifying whether persons in images are wearing face masks emerged as a new computer vision problem during the COVID-19 pandemic. To address this problem and scale up research in this domain, this paper proposes a hybrid technique that combines ResNet-101 with a multi-layer perceptron (MLP) classifier. The proposed technique is tested and validated on a self-created face mask classification dataset and a standard dataset. On the self-created dataset, it achieved a classification accuracy of 97.3%. To benchmark the proposed technique, six other state-of-the-art CNN feature extractors paired with six other classical machine learning classifiers were tested and compared against it. The proposed technique achieved better classification accuracy and 1-6% higher precision, recall, and F1 score than the other tested deep feature extractors and machine learning classifiers.

TET2DICOM-GUI: Graphical User Interface Based TET2DICOM Program to Convert Tetrahedral-Mesh-Phantom to DICOM-RT Dataset

  • Se Hyung Lee;Bo-Wi Cheon;Chul Hee Min;Haegin Han;Chan Hyeong Kim;Min Cheol Han;Seonghoon Kim
    • Progress in Medical Physics / v.33 no.4 / pp.172-179 / 2022
  • Tetrahedral phantoms were recently adopted by the International Commission on Radiological Protection as the international standard mesh-type reference computational phantoms (MRCPs), and a program has been developed to convert them into computed tomography images and DICOM-RT structure files for radiotherapy applications. This program made the tetrahedral standard phantom available for clinical practice, but adoption has been difficult because its various library dependencies take considerable time and effort to install. To overcome this limitation, this study developed TET2DICOM-GUI, a graphical user interface (GUI) version of TET2DICOM written entirely in MATLAB so that it can be used without installing or configuring additional libraries. The program runs the same sequence of steps as TET2DICOM and has been optimized to run on a personal computer in a GUI environment. TET2DICOM-GUI was evaluated with MRCP-AM, the tetrahedron-based male international standard human phantom. Conversion into a DICOM-RT dataset applicable in clinical practice was confirmed to take about one hour on a personal computer. The generated DICOM-RT dataset was also confirmed to be effectively implemented in the radiotherapy planning system. The program developed in this study is expected to make it possible to use phantom data in place of actual patient data in future studies.

Statistical Blade Angular Velocity Information-based Wind Turbine Fault Diagnosis Monitoring System (블레이드 각속도 통계 정보 기반 풍력 발전기 고장 진단 모니터링 시스템)

  • Kim, Byoungjin;Kang, Suk-Ju;Park, Joon-Young
    • KEPCO Journal on Electric Power and Energy / v.2 no.4 / pp.619-625 / 2016
  • In this paper, we propose a new fault diagnosis monitoring system that calculates the angular velocity of wind turbine blades from gyro sensor measurements. First, the proposed system generates an angular velocity dataset for the rotation speed of a normal blade. Using this dataset, we estimate and evaluate the state of the wind turbine blades by comparing the current state with the pre-calculated normal state. In the experimental results, the angular velocity in the normal state was higher than $360^{\circ}/s$, while that of the damaged blades was lower than $360^{\circ}/s$ and its standard deviation increased significantly.
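
The comparison against a pre-calculated normal state might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `diagnose_blade`, the sample readings, and the `std_ratio_limit` parameter are hypothetical, and using 360°/s as a decision threshold is an assumption based only on the figure reported in the abstract.

```python
from statistics import mean, stdev

def diagnose_blade(normal_samples, current_samples,
                   speed_threshold=360.0, std_ratio_limit=2.0):
    """Compare current angular-velocity statistics with a normal-state baseline.

    speed_threshold (deg/s) echoes the 360 deg/s figure in the abstract;
    std_ratio_limit is a hypothetical tuning parameter.
    """
    baseline_std = stdev(normal_samples)
    current_mean = mean(current_samples)
    current_std = stdev(current_samples)
    too_slow = current_mean < speed_threshold
    too_noisy = current_std > std_ratio_limit * baseline_std
    return {
        "mean": current_mean,
        "std": current_std,
        "fault_suspected": too_slow or too_noisy,
    }

# Hypothetical gyro readings in deg/s (illustrative values only).
normal = [365.0, 366.0, 364.0, 365.5, 364.5]
damaged = [350.0, 340.0, 358.0, 332.0, 355.0]

normal_report = diagnose_blade(normal, normal)
damaged_report = diagnose_blade(normal, damaged)
print(normal_report["fault_suspected"], damaged_report["fault_suspected"])
```

Checking both the mean and the spread mirrors the two symptoms the abstract reports for damaged blades: lower angular velocity and a larger standard deviation.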

R2RML Based ShEx Schema

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.45-55 / 2018
  • R2RML is a W3C standard language that defines how to expose relational data as RDF triples. The output of an R2RML mapping is only an RDF dataset, which by definition has no schema. The lack of a schema makes datasets in linked data portals impractical for integrating and analyzing data. To address this issue, we propose an approach for automatically generating schemas for RDF graphs populated by R2RML mappings. More precisely, we represent the schema using ShEx, a language for validating and describing RDF. Our approach makes it possible to generate ShEx schemas as well as RDF datasets from R2RML mappings. The ShEx schema benefits both data providers and ordinary users: data providers can verify and guarantee the structural integrity of the dataset against the schema, and users can write SPARQL queries efficiently by referring to it. In this paper, we describe the data structures and algorithms of the system that derives ShEx documents from R2RML documents and present a brief demonstration of its proper use.
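
To give a feel for the idea of deriving a shape from a mapping, here is a toy sketch. It does not reproduce the paper's algorithm and does not parse real R2RML documents: the function `shape_from_mapping`, the tuple-based mapping description, and the `ex:` names are all hypothetical. It assumes that a nullable column corresponds to an optional property (`?` in ShEx compact syntax), and it omits the PREFIX declarations a complete ShEx document would need.

```python
def shape_from_mapping(shape_name, rdf_class, columns):
    """Build a ShExC shape from a simplified, R2RML-like mapping description.

    columns: list of (predicate, xsd_datatype, nullable) tuples.
    This input format is a hypothetical stand-in for an R2RML triples map.
    """
    # Every subject produced by the mapping is typed with rdf_class.
    lines = [f"  a [{rdf_class}]"]
    for predicate, datatype, nullable in columns:
        # Assumption: a nullable column becomes an optional triple constraint.
        cardinality = " ?" if nullable else ""
        lines.append(f"  {predicate} {datatype}{cardinality}")
    body = " ;\n".join(lines)
    return f"{shape_name} {{\n{body}\n}}"

schema = shape_from_mapping(
    "ex:EmployeeShape",
    "ex:Employee",
    [("ex:name", "xsd:string", False),
     ("ex:deptno", "xsd:integer", True)],
)
print(schema)
```

A shape like this is what lets a data provider validate the generated RDF and lets a user see which predicates are available before writing a SPARQL query.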

Development of Dataset Items for Commercial Space Design Applying AI

  • Jung Hwa SEO;Segeun CHUN;Ki-Pyeong, KIM
    • Korean Journal of Artificial Intelligence / v.11 no.1 / pp.25-29 / 2023
  • The purpose of this paper is to establish a standard AI training dataset type for commercial space design. As the space design market continues to grow and time spent indoors has increased since COVID-19, interest in space is expanding throughout society. In addition, more and more consumers are getting used to the digital environment. Therefore, identifying trends and preemptively proposing the atmosphere and specifications that customers require, quickly and easily, can increase customer trust and make sales more effective. For the dataset type, commercial districts were divided into a total of 8 categories, and processable images were derived by refining 4,009,30MB JPG format images collected through web crawling. Then, by performing bounding and labeling operations, we developed a 'Dataset for AI Training' of 3,356 commercial space image data in CSV format with a size of 2.08MB. Through this study, elements of spatial images such as place type, space classification, and furniture can be extracted and used when developing AI algorithms, and it is expected that images requested by clients can be collected easily and quickly from spatial image input information.

Empirical Verification of Conversion and Restoration of Preservation Format for Dataset: Application of Dataset with Disaster Safety Information to SIARD (데이터세트 보존포맷 검증방안에 관한 연구: 재난안전정보 데이터세트의 SIARD 적용을 통해)

  • Han, Hui-Jeong;Yoon, Sung-Ho;Oh, Hyo-Jung;Yang, Dongmin
    • Journal of the Korean Society for Information Management / v.37 no.2 / pp.251-284 / 2020
  • As the use of information has emerged as the core of national competitiveness, major developed countries and the Korean government have recognized the importance of data. They have pursued technical research and the establishment of standards for long-term preservation and have continuously worked toward systematic management and preservation of data. However, although the law specifies various types of data for the purpose of records management, there is no specific method for collecting, managing, and preserving them, except for standard electronic documents. In particular, the management and preservation of huge datasets from administrative information systems has been strongly demanded, yet no guidelines for datasets have been properly provided. A framework for selecting a preservation format must be prepared before the system can be supplemented and built. The framework should be specified more concretely in light of the characteristics of the dataset, and the conversion and restoration of the dataset preservation format derived from the selection criteria need empirical verification. Therefore, this study proposes a method for long-term preservation through empirical verification of the preservation format, after deriving an evaluation framework for the preservation format selection criteria that considers the characteristics of the dataset.

Construction of a Standard Dataset for Liver Tumors for Testing the Performance and Safety of Artificial Intelligence-Based Clinical Decision Support Systems (인공지능 기반 임상의학 결정 지원 시스템 의료기기의 성능 및 안전성 검증을 위한 간 종양 표준 데이터셋 구축)

  • Seung-seob Kim;Dong Ho Lee;Min Woo Lee;So Yeon Kim;Jaeseung Shin;Jin‑Young Choi;Byoung Wook Choi
    • Journal of the Korean Society of Radiology / v.82 no.5 / pp.1196-1206 / 2021
  • Purpose: To construct a standard dataset of contrast-enhanced CT images of liver tumors to test the performance and safety of artificial intelligence (AI)-based algorithms for clinical decision support systems (CDSSs). Materials and Methods: A consensus group of medical experts in gastrointestinal radiology from four national tertiary institutions discussed the conditions to be included in a standard dataset. Seventy-five cases of hepatocellular carcinoma, 75 cases of metastasis, and 30-50 cases of benign lesions were retrieved from each institution, and the final dataset consisted of 300 cases of hepatocellular carcinoma, 300 cases of metastasis, and 183 cases of benign lesions. Only pathologically confirmed cases of hepatocellular carcinomas and metastases were enrolled. The medical experts retrieved the medical records of the patients and manually labeled the CT images. The CT images were saved as Digital Imaging and Communications in Medicine (DICOM) files. Results: The medical experts in gastrointestinal radiology constructed the standard dataset of contrast-enhanced CT images for 783 cases of liver tumors. The performance and safety of the AI algorithm can be evaluated by calculating the sensitivity and specificity for detecting and characterizing the lesions. Conclusion: The constructed standard dataset can be utilized for evaluating the machine-learning-based AI algorithm for CDSS.
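
The evaluation mentioned in the Results can be illustrated with the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below is only an illustration of that arithmetic; the function name and the label vectors are hypothetical and are not taken from the dataset described above.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = lesion present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Guard against division by zero when a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical reference labels and algorithm outputs (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With these made-up labels there are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, so sensitivity is 3/4 and specificity is 5/6.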

A Study on Developing XML Marine GIS metadata (XML 형식의 해양GIS 메타데이터 작성에 관한 연구)

  • Oh, Se-Woong;Park, Jong-Min;Suh, Sang-Hyun
    • Journal of Navigation and Port Research / v.28 no.3 / pp.247-252 / 2004
  • Developing a metadata standard is important for managing large volumes of marine geospatial data, such as observations, ocean surveys, and satellite images, more effectively. Using metadata in marine GIS makes it possible to make sense of marine geospatial data and to make the most of marine datasets. The work of international standards organizations and the NGIS standard are good examples of the importance of metadata standards. However, there is no metadata standard for marine datasets, so geospatial data are difficult to search and use. In this paper, we presented common marine metadata elements and composed a metadata implementation schema. Finally, we constructed a marine GIS metadata editing tool.

A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance (크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가)

  • Lee, Seongwoon;Kim, Seongsoon;Park, Donghyeon;Kang, Jaewoo
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.338-343 / 2016
  • Today, opinion reviews on the Web are often used as a means of information exchange. As the importance of opinion reviews continues to grow, the number of opinion spam issues also increases. Even though many studies on detecting spam reviews have been conducted, limitations of the gold-standard datasets hinder research. Therefore, we introduce a new dataset called "Paraphrased Opinion Spam (POS)" that contains a new type of review spam that imitates truthful reviews. We have noticed that spammers refer to existing truthful reviews to fabricate spam reviews. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews to create new deceptive reviews. The experimental results show that classifying our POS dataset is more difficult than classifying the existing spam datasets because the reviews in our dataset are linguistically more similar to truthful reviews. Training volume was also found to be an important factor in classification model performance.

Design of Data-centroid Radial Basis Function Neural Network with Extended Polynomial Type and Its Optimization (데이터 중심 다항식 확장형 RBF 신경회로망의 설계 및 최적화)

  • Oh, Sung-Kwun;Kim, Young-Hoon;Park, Ho-Sung;Kim, Jeong-Tae
    • The Transactions of The Korean Institute of Electrical Engineers / v.60 no.3 / pp.639-647 / 2011
  • In this paper, we introduce a design methodology for data-centroid Radial Basis Function (RBF) neural networks with extended polynomial functions. The two underlying design mechanisms of such networks are the K-means clustering method and Particle Swarm Optimization (PSO). The proposed algorithm uses K-means clustering for efficient processing of data, and the model is optimized using PSO. As the connection weights of the RBF neural networks, we can use four types of polynomials: simplified, linear, quadratic, and modified quadratic. K-means clustering selects the center values of the Gaussian functions used as activation functions. The PSO-based RBF neural networks yield a structurally optimized network and offer a higher level of flexibility than conventional RBF neural networks. The PSO-based design procedure applied at each node of the RBF neural networks leads to the selection of preferred parameters with specific local characteristics (such as the number of input variables, a specific set of input variables, and the distribution constant value in the activation function) available within the RBF neural networks. To evaluate the performance of the proposed data-centroid RBF neural network with extended polynomial function, the model is tested on nonlinear process data (2-dimensional synthetic data and Mackey-Glass time series process data) and machine learning datasets (NOx emission process data from a gas turbine plant, Automobile Miles per Gallon (MPG) data, and Boston housing data). For the characteristic analysis of the given nonlinear dataset as well as the efficient construction and evaluation of the dynamic network model, the dataset is partitioned in two ways: Division I (training and testing datasets) and Division II (training, validation, and testing datasets). A comparative analysis shows that the proposed RBF neural networks produce models with higher accuracy and better predictive capability than other previously presented intelligent models.
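
The center-selection step can be sketched in a few lines. This is a simplified illustration and not the proposed model: it covers only K-means center selection and Gaussian activations, and it leaves out the PSO optimization and the polynomial connection weights. The function names, the seeding of centers with the first k points, the fixed `sigma`, and the sample points are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points seed the centers (a simplifying assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers

def rbf_activations(x, centers, sigma=1.0):
    """Gaussian activation of input x for each hidden node (one node per center)."""
    return [math.exp(-math.dist(x, c) ** 2 / (2 * sigma ** 2)) for c in centers]

# Hypothetical 2-dimensional synthetic data with two obvious groups.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2), (4.9, 5.2)]
centers = sorted(kmeans(points, k=2))
activations = rbf_activations((0.1, 0.1), centers)
print(centers)
print(activations)
```

An input close to one center activates that node strongly and the distant node only weakly, which is what gives each hidden node its local character.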