• Title/Summary/Keyword: Public dataset

Search Result 254, Processing Time 0.031 seconds

A Study on Insider Threat Dataset Sharing Using Blockchain (블록체인을 활용한 내부자 유출위협 데이터 공유 연구)

  • Wonseok Yoon;Hangbae Chang
    • Journal of Platform Technology / v.11 no.2 / pp.15-25 / 2023
  • This study analyzes the limitations of the datasets used in insider threat detection research and, to overcome them, compares public insider threat data with data collected by a security solution. Based on this comparison, we design a data format suited to insider threat detection and implement a system that uses blockchain technology to share insider threat information safely between different institutions and companies. Currently, none of the insider threat datasets available to researchers were collected from actual events. Public datasets are virtual, synthetic data randomly generated for research, and models trained on them face many limitations in real environments. To address these limitations, we designed a private blockchain to secure information sharing between institutions of different affiliations, and derived a method that increases reliability and maintains the integrity and consistency of information through agreement and verification among participants. The proposed method is expected to collect data through a leakage threat collector and to gather quality datasets of real threats, rather than synthetic data, through the blockchain-based sharing system, thereby solving the current leakage threat dataset problem and contributing to future insider threat detection models.
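The integrity property mentioned in the abstract above can be illustrated with a hash-linked record list. This is only a minimal sketch, not the authors' system: the abstract does not describe the block layout or the consensus protocol, so the field names (`index`, `prev_hash`, `record`, `hash`) and the sample records are hypothetical, and agreement among participants is not modeled at all.

```python
import hashlib
import json


def block_hash(index, prev_hash, record):
    """Hash a block's contents deterministically with SHA-256."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "record": record},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "record": record,
        "hash": block_hash(index, prev_hash, record),
    })
    return chain


def verify_chain(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["record"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical threat records shared by two organizations.
chain = []
append_block(chain, {"org": "A", "event": "usb_copy"})
append_block(chain, {"org": "B", "event": "mass_email"})
```

Because each block stores the hash of its predecessor, changing any shared record after the fact makes `verify_chain` return `False`, which is the kind of tamper evidence a sharing system of this type relies on.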

Comparison of Performance of Medical Image Semantic Segmentation Model in ATLAS V2.0 Data (ATLAS V2.0 데이터에서 의료영상 분할 모델 성능 비교)

  • So Yeon Woo;Yeong Hyeon Gu;Seong Joon Yoo
    • Journal of Broadcast Engineering / v.28 no.3 / pp.267-274 / 2023
  • Because public medical image data are difficult to collect, available datasets are small, and existing studies may therefore be overfitted to the public dataset. In this paper, we compare the performance of eight medical image semantic segmentation models (Unet, X-Net, HarDNet, SegNet, PSPNet, SwinUnet, 3D-ResU-Net, UNETR) to revalidate the superiority of existing models. We compare model performance on Anatomical Tracings of Lesions After Stroke (ATLAS) V1.2, a public dataset for stroke diagnosis, with performance on ATLAS V2.0. Experimental results show that most models perform similarly on V1.2 and V2.0, but X-Net and 3D-ResU-Net perform better on V1.2. These results suggest that those models may be overfitted to V1.2.
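A comparison of this kind can be sketched as follows. The abstract does not name the evaluation metric, so the Dice coefficient is an assumption here (it is a common choice for lesion segmentation), and `score_gap` is a hypothetical helper for comparing mean scores across two dataset versions.

```python
def dice_score(pred, truth):
    """Dice coefficient for two equal-length binary masks (flattened 0/1 lists)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def score_gap(scores_old, scores_new):
    """Mean score on the older dataset minus mean score on the newer one."""
    mean_old = sum(scores_old) / len(scores_old)
    mean_new = sum(scores_new) / len(scores_new)
    return mean_old - mean_new
```

A clearly positive gap for one model, while other models stay near zero, is the pattern the abstract reads as possible overfitting to the older version.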

A Case Study of Dataset Records in Information Management System (행정정보 데이터세트 사례 조사 연구)

  • Oh, Seh-La;Park, Seunghoon;Yim, Jin-Hee
    • Journal of Korean Society of Archives and Records Management / v.18 no.2 / pp.109-133 / 2018
  • Archivists broadly agree on the need for records management of administrative information datasets, and the topic has been studied continuously. In the meantime, information technology has advanced greatly, and information management systems are increasingly being developed and redeveloped. Nevertheless, dataset management in information management systems has not been practiced in public organizations, presumably because no practical management plan exists. On the view that practical dataset management methods should be based on the reality of the environment in which datasets are created and managed, this study investigates various active datasets in working administrative information systems. The examples and information drawn from the examination are expected to contribute to dataset management planning. Moreover, the research methods can be used in further studies.

Bark Identification Using a Deep Learning Model (심층 학습 모델을 이용한 수피 인식)

  • Kim, Min-Ki
    • Journal of Korea Multimedia Society / v.22 no.10 / pp.1133-1141 / 2019
  • Most previous studies on bark recognition have focused on extracting LBP-like statistical features. Deep learning approaches have not been well studied because large bark image datasets are difficult to acquire. To overcome this problem, this study uses MobileNet pretrained on the ImageNet dataset and proposes two approaches. One extracts features by pixel-wise convolution and classifies them with an SVM. The other tunes the weights of MobileNet by flexibly freezing layers. Experimental results on two public bark datasets, BarkTex and Trunk12, show that the proposed methods are effective for bark recognition. In particular, the flexible tuning method outperforms state-of-the-art methods. In addition, the approach can be applied to mobile devices because MobileNet is compact compared with other deep learning models.

Utilizing Artificial Neural Networks for Establishing Hearing-Loss Predicting Models Based on a Longitudinal Dataset and Their Implications for Managing the Hearing Conservation Program

  • Thanawat Khajonklin;Yih-Min Sun;Yue-Liang Leon Guo;Hsin-I Hsu;Chung Sik Yoon;Cheng-Yu Lin;Perng-Jy Tsai
    • Safety and Health at Work / v.15 no.2 / pp.220-227 / 2024
  • Background: Though the artificial neural network (ANN) technique has been used to predict noise-induced hearing loss (NIHL), the established prediction models have primarily relied on cross-sectional datasets, and hence, they may not comprehensively capture the chronic nature of NIHL as a disease linked to long-term noise exposure among workers. Methods: A comprehensive dataset was utilized, encompassing eight-year longitudinal personal hearing threshold levels (HTLs) as well as information on seven personal variables and two environmental variables to establish NIHL predicting models through the ANN technique. Three subdatasets were extracted from the aforementioned comprehensive dataset to assess the advantages of the present study in NIHL predictions. Results: The dataset was gathered from 170 workers employed in a steel-making industry, with a median cumulative noise exposure and HTL of 88.40 dBA-year and 19.58 dB, respectively. Utilizing the longitudinal dataset demonstrated superior prediction capabilities compared to cross-sectional datasets. Incorporating the more comprehensive dataset led to improved NIHL predictions, particularly when considering variables such as noise pattern and use of personal protective equipment. Despite fluctuations observed in the measured HTLs, the ANN predicting models consistently revealed a discernible trend. Conclusions: A consistent correlation was observed between the measured HTLs and the results obtained from the predicting models. However, it is essential to exercise caution when utilizing the model-predicted NIHLs for individual workers due to inherent personal fluctuations in HTLs. Nonetheless, these ANN models can serve as a valuable reference for the industry in effectively managing its hearing conservation program.

A Study on Record Selection Strategy and Procedure in Dataset for Administrative Information (행정정보 데이터세트 기록의 선별 기준 및 절차 연구)

  • Cho, Eun-Hee;Yim, Jin-Hee
    • The Korean Journal of Archival Studies / no.19 / pp.251-291 / 2009
  • With the computerization of business services in the public sector and the push for e-government, the volume of records produced in electronic systems is growing and the types of records are diversifying. Among these types, datasets are attracting attention because they are proliferating rapidly. Although the number of administrative information systems designated as electronic record production systems is increasing, they remain a blind spot for records management, so a system may become obsolete or its records may be lost when a new system is developed. In addition, because these systems were designed without considering records management, they do not meet the functional and quality requirements of a records management system and are managed unsatisfactorily. Advanced countries have recognized the importance of datasets, managed dataset archives, and carried out projects on management systems and preservation formats for keeping data. Korea is also conducting research on datasets and individual administrative information systems, but no official scheme has been established yet. This study proposes the records management items that should be reflected when an administrative information system is designed, in two respects: an identification method and quality requirements. The major directions are as follows. First, because a dataset is a kind of electronic record, this should be reflected from the design stage, prior to production. Second, the system should be built by integrating the records management strategy into the information strategy of the whole organization. Based on these two directions, this study suggests strategies for establishing dataset identification within the e-government framework. Issues in the archiving stage, including preservation formats and management procedures for dataset archives, are not covered in this study. Further research on these topics, as well as a wider variety of research on datasets, is expected.

Vehicle detection and tracking algorithm based on improved feature extraction

  • Xiaole Ge;Feng Zhou;Shuaiting Chen;Gan Gao;Rugang Wang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.9 / pp.2642-2664 / 2024
  • In modern traffic management, information technology has become an important part of intelligent traffic governance. Real-time monitoring can accurately and effectively track and record vehicles, which is of great significance to modern urban traffic management. Existing tracking algorithms are affected by the environment, viewpoint, and other factors, and often suffer from problems such as false detections, imprecise anchor boxes, and ID switches. Based on the YOLOv5 algorithm, we improve the loss function, propose a new feature extraction module that obtains receptive fields at different scales, and perform adaptive fusion with the SGE attention mechanism so that noise is effectively suppressed during feature extraction. The trained model improves mAP by 5.7% on the public dataset UA-DETRAC without increasing the amount of computation. Meanwhile, for vehicle feature recognition, we adaptively adjust the network structure of the DeepSort tracking algorithm. Finally, we test the tracking algorithm on the public dataset and in a realistic scenario. The results show that the improved algorithm increases MOTA, MT, and other metrics, which generally improves the reliability of vehicle tracking.
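For readers unfamiliar with the tracking metrics named above, the sketch below computes MOTA and MT from already-counted errors, following the standard CLEAR MOT definition of MOTA and the conventional 80% coverage threshold for "mostly tracked". The abstract does not describe the authors' evaluation code, so the function names, argument names, and sample numbers are hypothetical.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def mostly_tracked_ratio(coverages, threshold=0.8):
    """Fraction of ground-truth trajectories tracked for at least
    `threshold` of their lifespan (0.8 is the conventional cutoff)."""
    if not coverages:
        return 0.0
    tracked = sum(1 for c in coverages if c >= threshold)
    return tracked / len(coverages)


# Hypothetical counts: 100 ground-truth boxes, 20 errors in total.
example_mota = mota(false_negatives=10, false_positives=5,
                    id_switches=5, num_gt=100)
```

Since ID switches enter the MOTA numerator directly, reducing them raises MOTA even when detection quality is unchanged.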

Exploring Public Opinion to Analyze the Consequences of Social Media on Students' Behaviors

  • Asif Nawaz;Tariq Ali;Saif Ur Rehman;Yaser Hafeez
    • International Journal of Computer Science & Network Security / v.24 no.8 / pp.159-168 / 2024
  • With the advancement of technology, social media sites such as Twitter, Facebook, and Flickr are widely used, not only as a source for distributing information but also for communication. Nowadays, social networks are among the most frequently used communication methods. Various studies have analyzed the use of social networks in different fields and the effects of Facebook, chat sites, and blogs on students' behaviors. To obtain the basic data, a general scanning model was used: public opinion, parents' views, and comments openly available across social media sites were used to gauge the attitudes of graduate students, instead of traditional methods such as questionnaires and surveys. A dataset of nearly 20,000 parents' reviews about their children was collected from different social media networks; a second dataset comprises comments from 362 graduate school teachers who observed students using social media during classes, in labs, and on campus during free time. In this study, a detailed analysis based on various positive and negative factors shows the effect of social media on students' behavior.

Analysis of YouTube Trending Video Dataset by Country and Category (YouTube 인기 급상승 동영상 데이터셋의 국가별-카테고리별 분석)

  • Jung, Jimin;Kim, Seungjin;Jung, Sungwook;Lee, Dongyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.209-211 / 2022
  • YouTube, a video platform used by millions of people worldwide, provides a rapidly growing video service. This study aims to understand the characteristics and cultural differences of each country using a Kaggle dataset, one of many public datasets, and to show the usefulness of public datasets. For this purpose, we analyze data from 11 countries, 15 categories, and about 1.1 million trending videos. Using Python, we compute the number of videos by category, the period for which videos remain selected as trending, and the ratio of unique videos. In the future, we plan to use machine learning to predict the likelihood and duration of trending selection, in order to help diagnose individual videos and establish channel operation plans and strategies.
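The three statistics mentioned in the abstract can be sketched in plain Python. This is not the authors' code: the row layout and the field names `video_id`, `category`, and `trending_date` are assumptions, and counting each video once per category is one possible reading of "number of videos by category".

```python
from collections import Counter


def videos_by_category(rows):
    """Count distinct videos per category (a video is counted once per category)."""
    seen = set()
    counts = Counter()
    for row in rows:
        key = (row["category"], row["video_id"])
        if key not in seen:
            seen.add(key)
            counts[row["category"]] += 1
    return counts


def unique_video_ratio(rows):
    """Share of rows that correspond to distinct videos."""
    if not rows:
        return 0.0
    return len({row["video_id"] for row in rows}) / len(rows)


def trending_days(rows):
    """Number of distinct trending dates on which each video appears."""
    days = {}
    for row in rows:
        days.setdefault(row["video_id"], set()).add(row["trending_date"])
    return {video_id: len(dates) for video_id, dates in days.items()}


# Hypothetical sample rows; a real trending dataset has one row per video per day.
sample_rows = [
    {"video_id": "a", "category": "Music", "trending_date": "2022-01-01"},
    {"video_id": "a", "category": "Music", "trending_date": "2022-01-02"},
    {"video_id": "b", "category": "Gaming", "trending_date": "2022-01-01"},
    {"video_id": "c", "category": "Music", "trending_date": "2022-01-02"},
]
```

Running these functions per country and comparing the resulting category counts is one simple way to surface the cross-country differences the study describes.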

An Exploratory Study for Utilization of Copyrighted Public Records and Provision of Customer-Centered Services (공공저작물 활용 및 수요자 중심의 서비스 제공을 위한 탐색적 연구 : 공공저작물 제공사이트를 중심으로)

  • Ryu, Me Ae;Ahn, Tae Ho
    • Journal of Information Technology Services / v.15 no.3 / pp.223-245 / 2016
  • This study defines copyrighted public records in a broad sense, including open government data and the public domain but excluding some private records. It aims to investigate improvement plans for maximizing the utilization of copyrighted public records on websites from the customer's perspective, without considering the supplier's perspective. For this purpose, a qualitative method based on grounded theory was applied to problems identified through a literature review and a case study. The literature review concentrated on the definition of open data and on utilization indicators used abroad, whereas the case study analyzed the current situation of four websites providing copyrighted public records. Converged opinions from in-depth interviews and various statistical data were analyzed as a basis for grounded theory; a paradigm model was then constructed and future improvement plans were presented. The findings imply that opening copyrighted public records matters not just for quantitative results; it also requires qualitative improvement that provides the latest credible information consistent with customer demand. Thus, developing a service platform and business models for copyrighted public records is an urgent task.