• Title/Summary/Keyword: 데이터 분석론 (data analysis methodology)

Topographic Non-negative Matrix Factorization for Topic Visualization from Text Documents (Topographic non-negative matrix factorization에 기반한 텍스트 문서로부터의 토픽 가시화)

  • Chang, Jeong-Ho;Eom, Jae-Hong;Zhang, Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.324-329 / 2006
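  • Non-negative matrix factorization (NMF) is a data analysis technique that decomposes data consisting of non-negative values into the product of two non-negative matrices, and it has been applied to text mining, bioinformatics, and multimedia data analysis. In this work, we propose Topographic NMF (TNMF), which builds on basic NMF to extract topics from text documents and visualize them at the same time. Topic visualization with TNMF can help readers grasp the data more intuitively from a global perspective. From a generative-model viewpoint, TNMF can be expressed as a hierarchical model with two hidden layers, where the connections from the upper hidden layer to the lower hidden layer define the transition probabilities, or neighborhood function, between topics in the topic space. Learning in TNMF proceeds through iterative parameter updates under a continuous scheduling of the transition probability values, and we show that the parameter updates can be derived in a form similar to the basic NMF learning procedure. In addition, we analyze and present the connections to a topic visualization technique based on Probabilistic LSA and to non-smooth NMF, which aims at sparse solutions. Experiments on NIPS conference paper data show that the proposed method can effectively visualize the topics latent in the documents.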
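
The basic NMF step that the abstract above builds on can be sketched as follows. This is a minimal illustration of the standard multiplicative-update rule for the squared-error objective, not the TNMF method of the paper: the transition probabilities and their scheduling are not reproduced here. The function names (`nmf`, `frobenius_error`), the toy matrix, and all parameter defaults are assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, which keep every entry non-negative
    because each step only multiplies by a non-negative ratio.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation (hypothetical choice).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy term-document matrix with two groups of co-occurring terms.
    v = [[5.0, 4.0, 0.0, 0.0],
         [4.0, 5.0, 0.0, 0.0],
         [0.0, 0.0, 3.0, 4.0],
         [0.0, 0.0, 4.0, 3.0]]
    w, h = nmf(v, rank=2)
    print("reconstruction error:", frobenius_error(v, w, h))
```

In a topic-modeling reading, the rows of `v` would be terms, the columns documents, the columns of `w` topics, and `h` the per-document topic weights.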

A Study on the Fraud Detection through Sequential Pattern Analysis: Focused on Transactions of Electronic Prepayment (순차패턴 분석을 통한 이상금융거래탐지 연구: 선불전자지급수단 거래를 중심으로)

  • Choi, Byung-Ho;Cho, Nam-Wook
    • The Journal of Society for e-Business Studies / v.26 no.3 / pp.21-32 / 2021
  • With the recent development of electronic financial services, transactions using electronic prepayment are increasing rapidly. This growth, however, has also led to more fraud attempts, mainly because electronic prepayment can easily be converted into cash. The objective of this paper is to develop a methodology that effectively detects fraudulent transactions in electronic prepayment by using sequential pattern mining techniques. To validate the approach, experiments were conducted on real transaction data, and the applicability of the proposed method was demonstrated. The proposed method achieved an accuracy of 95.6 percent, showing that it can effectively detect fraudulent transactions. The proposed method could be used to reduce the damage caused by fraud attempts involving electronic prepayment.

Knowledge Modeling and Database Construction for Human Biomonitoring Data (인체 바이오모니터링 지식 모델링 및 데이터베이스 구축)

  • Lee, Jangwoo;Yang, Sehee;Lee, Hunjoo
    • Journal of Food Hygiene and Safety / v.35 no.6 / pp.607-617 / 2020
  • Human bio-monitoring (HBM) data is a very important resource for tracking total exposure and concentrations of a parent chemical or its metabolites in human biomarkers. However, until now, it has been difficult to integrate different types of HBM data because of incompatibilities caused by differences in study design, chemical description, and coding system between different sources in Korea. In this study, we present a standardized code system and an HBM knowledge model (KM) based on relational database modeling methodology. For this purpose, we used 11 raw datasets collected from the Ministry of Food and Drug Safety (MFDS) between 2006 and 2018. We then constructed the HBM database (DB) using a total of 205,491 concentration-related data points for 18,870 participants and 86 chemicals. In addition, we developed a summary report-type statistical analysis program to verify the input HBM datasets. This study will contribute to promoting the sustainable creation and versatile utilization of big data for HBM results at the MFDS.

The Study of Power Quality measuring using Network (네트워크를 이용한 전력품질 계측에 관한 연구)

  • Seo, Yong-Won;Kim, Ki-Chul;Kim, Tae-Eung;Kim, Jea-Eon
    • Proceedings of the KIEE Conference / 2006.07d / pp.1793-1794 / 2006
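  • The network power measurement system consists broadly of terminals installed at customer sites nationwide that monitor power quality, a network environment using standardized Ethernet (wired and wireless), and a server that collects power quality data from across the country. In this paper, we implement a power quality measurement terminal that measures with 128-point sampling, devise a power quality analysis algorithm, collect power quality data over Ethernet, and analyze the collected data to assess power quality. We also study a series of algorithms that recognize harmonic signals affecting power quality, over/under voltage and current, frequency, sag, swell, and interruption events and store the corresponding data in memory, and we study a communication protocol that allows the stored information to be received over Ethernet at high speed with integrity and reliability. In addition, we study software-based filtering techniques and analysis algorithms in the application program so that the cause of abnormal signals can be determined. The main point of this paper is the effort to bring sensor network techniques from ubiquitous computing into the power industry, using methodologies that relate networks (Internet/Ethernet) to power quality measurement together with a server concept.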

A Study of Applying Bootstrap Method to Seasonal Data (계절성 데이터의 부트스트랩 적용에 관한 연구)

  • Park, Jin-Soo;Kim, Yun-Bae
    • Journal of the Korea Society for Simulation / v.19 no.3 / pp.119-125 / 2010
  • The moving block bootstrap, the stationary bootstrap, and the threshold bootstrap are simulation output analysis methods that are applicable to autocorrelated data. These bootstrap methods assume that the data are stationary, so they do not work when stationarity is not guaranteed because of seasonality or trends in the data. In simulation output analysis, the threshold bootstrap is the best of these methods at describing the autocorrelation structure of the original data set. The threshold bootstrap forms cycles based on a threshold value. If we apply the bootstrap to seasonal data, we can obtain results of similar accuracy. In this paper, we verify the possibility of applying the bootstrap to seasonal data.
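
As a rough illustration of the first method named in the abstract above, a moving block bootstrap can be sketched as follows. This is not the threshold bootstrap that the paper focuses on, and it makes no adjustment for seasonality. The function name `moving_block_bootstrap`, the block length, and the sample series are assumptions made for the example.

```python
import random


def moving_block_bootstrap(series, block_length, rng=None):
    """Resample a series by concatenating randomly chosen overlapping blocks.

    Keeping consecutive observations together inside each block preserves
    short-range autocorrelation, which resampling single points would destroy.
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    last_start = n - block_length  # last index at which a full block fits
    resampled = []
    while len(resampled) < n:
        start = rng.randint(0, last_start)  # inclusive on both ends
        resampled.extend(series[start:start + block_length])
    return resampled[:n]  # trim the final block to the original length


if __name__ == "__main__":
    data = [0.2, 0.5, 0.9, 1.1, 0.8, 0.4, 0.1, 0.3, 0.7, 1.0, 0.9, 0.5]
    replicate = moving_block_bootstrap(data, block_length=3, rng=random.Random(42))
    print(replicate)
```

Repeating the call many times and computing a statistic on each replicate gives a bootstrap distribution for that statistic. The block length controls how much of the dependence structure is retained.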

Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis (데이터 세트별 Post-Training을 통한 언어 모델 최적화 연구: 금융 감성 분석을 중심으로)

  • Hui Do Jung;Jae Heon Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.1 / pp.57-67 / 2024
  • This research investigates training methods for large language models to accurately identify sentiments and comprehend information about increasing and decreasing fluctuations in the financial domain. The main goal is to identify suitable datasets that enable these models to effectively understand expressions related to financial increases and decreases. For this purpose, we selected sentences from the Wall Street Journal that included relevant financial terms and sentences generated by GPT-3.5-turbo-1106 for post-training. We assessed the impact of these datasets on language model performance using Financial PhraseBank, a benchmark dataset for financial sentiment analysis. Our findings demonstrate that post-trained FinBERT, a model specialized in finance, outperformed the similarly post-trained BERT, a general-domain model. Moreover, post-training with actual financial news proved to be more effective than using generated sentences, though in scenarios requiring higher generalization, models trained on generated sentences performed better. This suggests that aligning the model's domain with the domain targeted for improvement and choosing the right dataset are crucial for enhancing a language model's understanding and sentiment prediction accuracy. These results offer a methodology for optimizing language model performance in financial sentiment analysis tasks and suggest future research directions for more nuanced language understanding and sentiment analysis in finance. This research provides valuable insights not only for the financial sector but also for language model training across various domains.

Analysis and Study for Appropriate Deep Neural Network Structures and Self-Supervised Learning-based Brain Signal Data Representation Methods (딥 뉴럴 네트워크의 적절한 구조 및 자가-지도 학습 방법에 따른 뇌신호 데이터 표현 기술 분석 및 고찰)

  • Won-Jun Ko
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.137-142 / 2024
  • Recently, deep learning has become the de facto standard for medical data representation. However, deep learning inherently requires a large amount of training data, which poses a challenge for its direct application in the medical field, where acquiring large-scale data is not straightforward. Brain signal modalities also suffer from these problems owing to their high variability. Research has focused on designing deep neural network structures capable of effectively extracting spectro-spatio-temporal characteristics of brain signals, or on employing self-supervised learning methods to pre-learn the neurophysiological features of brain signals. This paper analyzes methodologies used to handle small-scale data in emerging fields such as brain-computer interfaces and brain signal-based state prediction, and presents future directions for these technologies. First, the paper examines deep neural network structures for representing brain signals, then analyzes self-supervised learning methodologies aimed at efficiently learning the characteristics of brain signals. Finally, the paper discusses key insights and future directions for deep learning-based brain signal analysis.

Ontology Implementation and Methodology Revisited Using Topic Maps based Medical Information Retrieval System (토픽맵 기반 의학 정보 검색 시스템 구축을 통한 온톨로지 구축 및 방법론 연구)

  • Yi, Myong-Ho
    • Journal of the Korean Society for information Management / v.27 no.3 / pp.35-51 / 2010
  • Emerging Web 2.0 services such as Twitter, blogs, and wikis, alongside the poorly structured and immeasurable growth of information, require an enhanced approach to information organization. Ontology has received much attention over the last 10 years as an emerging approach for enhancing information organization. However, it has seen little penetration into current systems. The purpose of this study is to propose an ontology implementation and methodology. To achieve this goal, the limitations of traditional information organization approaches are addressed and emerging information organization approaches are presented. Two ontology data models, RDF/OWL and Topic Maps, are compared, and then the ontology development processes and methodology are addressed through a Topic Maps based medical information retrieval system. The comparison of the two data models allows users to choose the right model for ontology development.

A Study of Integration Modelling for Context-aware Service Based on Ontology (온톨로지 기반의 상황인지 서비스를 위한 통합 모델에 관한 연구)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.253-255 / 2015
  • In diverse network environments, context-aware services are difficult to integrate and share because of heterogeneity among distributed data. This paper proposes an ontology-based integration model as a way to solve this problem. The model uses an ontology to integrate the context-aware information that is collected. The ontology is generated through acquisition, semantic analysis, and inference over the metadata of the context-aware information, and it then serves as the basis for analysis in additional systems. Accordingly, this paper studies ways to create such an ontology and apply it. An advantage of the proposed scheme is that it can be used without modifying existing tools, which makes it easy to extend and consolidate the system.

A study on the development of an integrated water quality index combining water quality and flow (수질-유량을 연계한 통합수질지수 개발 연구)

  • Sang Ung Lee;Bu Geon Jo;Young Do Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.238-238 / 2023
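  • Recently, abnormal climate phenomena have increased the frequency of floods and droughts and caused shortages of river maintenance flow, leading to large flow fluctuations in rivers, while changes in the river environment have caused various problems such as water pollution and deteriorating water quality in the dry season. Water quality is assessed by comparing the measured concentration of each parameter with its reference concentration, but because the parameters include direct-reading measurement items, laboratory analysis items, and unmeasured items, it is difficult to represent the state of water quality accurately. Water quality is evaluated by analyzing parameters for physical, chemical, and biological characteristics, and water quality indices that combine the parameters into a single value were developed to summarize complex water quality data simply and logically. Water quality indices developed by different countries and agencies differ in methodology and in how the final value is computed, so applying each index to data measured at the same site and over the same period yields different scores and grades. It is therefore necessary to use a water quality index suited to the characteristics of the watershed, and flow variation must be taken into account. In this study, we modify the parameters used in existing water quality indices in light of watershed characteristics and management standards, recompute the weights according to the importance of each parameter, and add a flow factor in order to evaluate river water quality comprehensively. In addition, we aim to assess the effect of hydrological change on river water quality by evaluating water quality variation under climate change using physical models and data models.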
