• Title/Summary/Keyword: 데이터 분석론

Search Result 1,370, Processing Time 0.028 seconds

Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning (프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론)

  • Jinmyung Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.165-184 / 2023
  • Recently, deep learning analysis of unstructured text data using language models such as Google's BERT and OpenAI's GPT has shown remarkable results in various applications. Most language models learn generalized linguistic information from pre-training data and then update their weights for downstream tasks through fine-tuning. However, concerns have been raised that privacy may be violated in this process: data privacy may be violated when the data owner provides large amounts of data to the model owner for fine-tuning. Conversely, when the model owner discloses the entire model to the data owner, the structure and weights of the model are exposed, which may violate the privacy of the model. Offsite tuning has recently been proposed as a way to fine-tune language models while protecting privacy in such situations, but that work does not provide a concrete way to apply the methodology to text classification models. In this study, we propose a concrete method that applies offsite tuning with an additional classifier to protect the privacy of both the model and the data when fine-tuning for multi-class classification of Korean documents. To evaluate the proposed methodology, we conducted experiments on about 200,000 Korean documents from five major fields (ICT, electrical, electronic, mechanical, and medical) provided by AIHub, and found that the proposed plug-in model outperforms the zero-shot model and the offsite model in classification accuracy.

From the Geography of Physical Space to the Geography of Virtual Space: Current and Future Research of the Information and Communication Geography and Virtual Geography (물리공간의 지리학에서 가상공간의 지리학으로: 정보통신지리학과 가상지리학의 연구동향과 가능성)

  • Kim, Young-Long
    • Journal of the Economic Geographical Society of Korea / v.22 no.1 / pp.70-83 / 2019
  • This paper reviews how geographers have embraced information and communication technology and expanded their perspectives from real space to virtual space. Information and communication geography research on the wired internet infrastructure began in the late 1990s, but that tradition has not been carried over to wireless internet technology. While the relationships (expansion, reproduction, and constraint) between real and virtual spaces have been studied by virtual geography scholars, more empirical research is needed to reveal the extent to which the two spaces affect each other. To empirically investigate the physicality of the virtual, it will be useful to combine information and communication geography with virtual geography. However, it should be noted that empirical studies in these subfields can be criticized as data-deterministic or technologically deterministic.

A Study on Dataset Construction Technique for Intrusion Detection based on Pattern Recognition (패턴인식 기반 침입탐지를 위한 데이터셋 구성 기법에 대한 연구)

  • Gong, Seong-Hyeon;Cho, Min-Jeong;Cho, Jae-ik;Lee, Changhoon
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.343-345 / 2017
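  • As communication technology has advanced and network environments have diversified, the cyber threats facing communication users have diversified as well. Intrusion detection techniques based on pattern recognition and machine learning emerged to counter the many newly reported cyberattacks. A machine-learning-based IDS requires a low false-positive rate and high efficiency, and these characteristics are strongly influenced by the methodology used to construct the dataset. This paper discusses the key points to consider when constructing a dataset for pattern-recognition-based traffic analysis, and explores ways to derive a dataset that reflects real-world cyber threat conditions well.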

Performance Analysis of a PCI-Bus based RAID System (PCI-버스 기반 RAID 시스템의 버스 성능 분석)

  • 이찬수;성영락;오하령
    • Journal of KIISE:Computer Systems and Theory / v.30 no.7_8 / pp.370-380 / 2003
  • A large RAID system may consist of several PCI bus segments, since a single PCI bus segment can connect only a limited number of disks. In this paper, PCI bus transactions in a RAID system are classified by the initiator and the target of the transaction, and the data transfer time of each transaction type is analyzed. Using the analysis results, the read and write performance of two RAID system configurations is formulated. The RAID system is then simulated using the DEVS formalism, and the performance of the configurations is evaluated and compared with the analytical results while varying several system parameters.

Sentiment Analysis of Foot-and-mouth Disease using Tweet Keyword Network (트윗 키워드 네트워크를 이용한 구제역의 감성분석)

  • Chae, Heechan;Lee, Jonguk;Choi, Yoona;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.267-270 / 2018
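  • Foot-and-mouth disease (FMD) inflicts enormous damage on the domestic livestock industry and related industries every year. Although various academic studies on FMD are under way, engineering analyses of the social ripple effects of FMD outbreaks remain very limited. This study proposes a systematic methodology that uses text mining to analyze the general public's emotional responses to FMD. The proposed system first collects FMD-related tweets posted on Twitter and then performs polarity detection based on a sentiment lexicon. Second, it uses LDA, a representative topic modeling technique, to extract keywords from the tweets and builds a co-occurrence keyword network for each polarity from the extracted keywords. Third, it analyzes the social ripple effects of FMD in each period through the keyword networks. As a case study, we analyzed changes in public sentiment about the FMD outbreaks that occurred in Korea from July 2010 to December 2011.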
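The lexicon-based polarity step and the per-polarity co-occurrence network described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy lexicon, the function names, and the sample tweets are all hypothetical, and the LDA keyword extraction step is skipped (each tweet is assumed to arrive already reduced to a keyword list).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sentiment lexicon; a real study would use a curated dictionary.
LEXICON = {"fear": -1, "loss": -1, "outbreak": -1, "recovery": 1, "relief": 1}


def polarity(tokens):
    """Label a keyword list by the sign of its summed lexicon scores."""
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def cooccurrence_networks(tweets):
    """Return {polarity: Counter mapping keyword pairs to edge weights}."""
    networks = {}
    for tokens in tweets:
        label = polarity(tokens)
        # Each unordered pair of distinct keywords in a tweet adds 1 to that edge.
        pairs = combinations(sorted(set(tokens)), 2)
        networks.setdefault(label, Counter()).update(pairs)
    return networks


# Hypothetical keyword lists standing in for LDA-extracted tweet keywords.
tweets = [
    ["outbreak", "farm", "fear"],
    ["farm", "loss", "outbreak"],
    ["vaccine", "recovery", "farm"],
]
networks = cooccurrence_networks(tweets)
```

Sorting each tweet's keyword set before pairing keeps every edge key in a canonical order, so the same pair is never counted under two different tuples.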

A Methodology for Performance Modeling and Prediction of Large-Scale Cluster Servers (대규모 클러스터 서버의 성능 모델링 및 예측 방법론)

  • Jang, Hye-Churn;Jin, Hyun-Wook;Kim, Hag-Young
    • Journal of KIISE:Computing Practices and Letters / v.16 no.11 / pp.1041-1045 / 2010
  • Clusters can provide scalable and flexible architectures for parallel computing servers and data centers, but predicting their performance has been a very challenging issue. Existing performance measurement methodologies can only measure the performance of servers that have already been built, so they cannot predict overall system performance in advance, either when designing the system in the initial phase or when adding nodes for more capacity. A performance modeling and prediction methodology for large-scale clusters is therefore much needed. In this paper, we suggest a methodology for predicting the performance of large-scale clusters that consists of measurement, modeling, and prediction steps. We apply the methodology to a real cluster server and show its usefulness.

A Modeling Methodology for Analysis of Dynamic Systems Using Heuristic Search and Design of Interface for CRM (휴리스틱 탐색을 통한 동적시스템 분석을 위한 모델링 방법과 CRM 위한 인터페이스 설계)

  • Jeon, Jin-Ho;Lee, Gye-Sung
    • Journal of the Korea Society of Computer and Information / v.14 no.4 / pp.179-187 / 2009
  • Most real-world systems involve a series of dynamic and complex phenomena. One common way to understand such systems is to build a model and analyze their behavior. A two-step methodology consisting of clustering followed by model creation is proposed for the analysis of time series data. An interface is designed for CRM (Customer Relationship Management) that uses system modeling to provide users with 1:1 customized information. Experiments confirmed that a model-based approach yields better clustering than a similarity-based one. Clustering is followed by model creation over the clustered groups, from which the future direction of time series data can be predicted. The effectiveness of the method was validated by checking how closely the values predicted by the models track real data such as stock prices.

Change Detection Using Deep Learning Based Semantic Segmentation for Nuclear Activity Detection and Monitoring (핵 활동 탐지 및 감시를 위한 딥러닝 기반 의미론적 분할을 활용한 변화 탐지)

  • Song, Ahram;Lee, Changhui;Lee, Jinmin;Han, Youkyung
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.991-1005 / 2022
  • Satellite imaging is an effective supplementary data source for detecting and verifying nuclear activity. It is also highly beneficial in regions with limited access and information, such as nuclear installations. Time series analysis, in particular, can identify the process of preparing for a nuclear test, such as relocating equipment or changing facilities. In this work, differences between the semantic segmentation results of time-series images were used to detect changes in meaningful objects related to nuclear activity. Building, road, and small-object datasets made of KOMPSAT 3/3A images provided by AIHub were used to train deep learning models such as U-Net, PSPNet, and Attention U-Net. Various model parameters were adjusted to select a suitable model for each target. The final change detection was carried out by incorporating object information into the initial change detection, which was obtained as the difference between semantic segmentation results. The experimental results demonstrated that the proposed approach could effectively identify changed pixels. Although the proposed approach depends on the accuracy of the semantic segmentation results, its applicable scope is expected to expand as the dataset for the region of interest grows.
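The idea of taking an initial change map as the difference between two segmentation results and then refining it with object information can be sketched as below. This is only an illustration under stated assumptions: the class ids, the function names, and the refinement rule (keep a changed pixel only if a class of interest appears on either date) are hypothetical, since the abstract does not specify how object information is incorporated.

```python
def change_mask(seg_before, seg_after):
    """Mark pixels (1) whose predicted class label differs between two dates."""
    return [
        [int(a != b) for a, b in zip(row_before, row_after)]
        for row_before, row_after in zip(seg_before, seg_after)
    ]


def keep_target_changes(mask, seg_before, seg_after, target_classes):
    """Keep a changed pixel only if a target class appears before or after.

    This filtering rule is an assumption made for illustration.
    """
    return [
        [
            int(m and (a in target_classes or b in target_classes))
            for m, a, b in zip(mask_row, row_before, row_after)
        ]
        for mask_row, row_before, row_after in zip(mask, seg_before, seg_after)
    ]


# Hypothetical class ids: 0 = background, 1 = building, 2 = road.
before = [[0, 0, 2], [1, 1, 2]]
after = [[0, 1, 2], [1, 0, 0]]

initial = change_mask(before, after)
final = keep_target_changes(initial, before, after, target_classes={1})
```

In practice the label maps would come from a trained segmentation model and would usually be array-based; nested lists are used here only to keep the sketch dependency-free.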

Research on Development of Support Tools for Local Government Business Transaction Operation Using Big Data Analysis Methodology (빅데이터 분석 방법론을 활용한 지방자치단체 단위과제 운영 지원도구 개발 연구)

  • Kim, Dabeen;Lee, Eunjung;Ryu, Hanjo
    • The Korean Journal of Archival Studies / no.70 / pp.85-117 / 2021
  • The purpose of this study is to investigate and analyze the current status of the unit tasks used by local governments, their operation, and the associated record management problems, and to present improvement measures that use text-based big data technology, based on the implications derived from that process. Record management operation in local governments is in a serious state due to errors in preservation periods caused by misclassification of unit tasks, inability to identify types of overcommon and institutional affairs, errors in unit tasks, errors in names, referenceable standards, and tools. However, there are about 720,000 unit tasks, too many to control effectively, so strict and controllable tools and standards are needed. To solve these problems, this study developed a system that applies text-based analysis tools from big data analysis, such as corpus and tokenization technology, and applied it to the names and terms that make up the record management standard. These unit task operation support tools are expected to contribute significantly to record management work, as they can support the operation of standards in areas such as uniform preservation periods, identification of delegated office records, control of duplicate and similar unit task creation, and common tasks. Therefore, if the big data analysis methodology can be linked to BRM and RMS in the future, the quality of record management standard work is expected to improve.

Stability Analysis of Landslides using a Probabilistic Analysis Method in the Boeun Area (확률론적 해석기법을 이용한 보은지역의 사면재해 안정성분석)

  • Jeong, Nam-Soo;You, Kwang-ho;Park, Hyuck-Jin
    • The Journal of Engineering Geology / v.21 no.3 / pp.247-257 / 2011
  • In this study, the infinite slope model, one of the physical landslide models, has been suggested to evaluate landslide susceptibility. However, applying the infinite slope model to a regional study area can be difficult or impossible because of the difficulty of obtaining and processing large spatial data sets. With limited site investigation data, uncertainties are inevitably involved. Therefore, a probabilistic analysis method, Monte Carlo simulation, was combined with a GIS-based infinite slope stability model to evaluate the probability of failure. The proposed approach was applied to a practical example. The Boeun area was selected as the study area because it has experienced a large number of landslides. The geometric characteristics of the slopes and the mechanical properties of the soils, such as friction angle and cohesion, were obtained. In addition, the coefficient of variation (COV) of the uncertain parameters was varied from 10% to 30% in order to evaluate the effect of the uncertainty. The analysis results showed that the probabilistic analysis method can reduce the effect of the uncertainty involved in the input parameters.
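A Monte Carlo estimate of the probability of failure for an infinite slope can be sketched as follows. This is a minimal illustration, not the study's GIS-based model: it assumes a dry slope with no pore pressure, independent normally distributed cohesion and friction angle, and hypothetical parameter values and function names. The factor of safety used is the standard dry infinite-slope form, FS = (c + γ z cos²β tan φ) / (γ z sin β cos β).

```python
import math
import random


def factor_of_safety(c, phi_deg, gamma, z, beta_deg):
    """Dry infinite-slope factor of safety (no pore pressure assumed).

    c: cohesion (kPa), phi_deg: friction angle (deg), gamma: unit weight (kN/m^3),
    z: depth of the failure plane (m), beta_deg: slope angle (deg).
    """
    beta = math.radians(beta_deg)
    phi = math.radians(phi_deg)
    resisting = c + gamma * z * math.cos(beta) ** 2 * math.tan(phi)
    driving = gamma * z * math.sin(beta) * math.cos(beta)
    return resisting / driving


def failure_probability(c_mean, phi_mean, cov, gamma, z, beta_deg, n=10_000, seed=0):
    """Fraction of sampled parameter sets with FS < 1.

    Cohesion and friction angle are drawn from independent normal distributions
    with standard deviation cov * mean (an assumption for this sketch) and are
    clipped at zero to avoid non-physical negative values.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        c = max(0.0, rng.gauss(c_mean, cov * c_mean))
        phi = max(0.0, rng.gauss(phi_mean, cov * phi_mean))
        if factor_of_safety(c, phi, gamma, z, beta_deg) < 1.0:
            failures += 1
    return failures / n


# Hypothetical soil and slope parameters, evaluated at COV = 10%, 20%, 30%.
results = {
    cov: failure_probability(10.0, 30.0, cov, gamma=18.0, z=2.0, beta_deg=35.0)
    for cov in (0.1, 0.2, 0.3)
}
```

Fixing the seed makes repeated runs reproducible, which helps when comparing failure probabilities across COV levels.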