• Title/Summary/Keyword: biased data (편향된 데이터)

A Study on Flaming Phenomena in Social Network: Content Analysis of Major Issues in Seoul Mayor Reelection in 2011 (소셜 네트워크 상에서의 플레밍(Flaming) 현상과 공론장의 가능성 - 2011년 서울시장 선거 이슈 분석 -)

  • Jho, Whasun;Kim, Jeongyeon
    • Informatization Policy / v.20 no.2 / pp.73-90 / 2013
  • Rational debate and public conversation in the public sphere of social networks are crucial conditions for realizing deliberative democracy. However, negative communication can occur online more frequently than in offline spaces, and mutually hostile messages are common. During elections, citizens rallying behind particular candidates have personally attacked, abused, and slandered opposing candidates. How, and to what degree, has flaming behavior appeared in elections? Are there influencers who propagate it? And how much do these influencers flame compared with ordinary internet users? To diagnose the role of social networks as an online public sphere, this research focuses on flaming behavior during the 2011 Seoul mayoral by-election. It analyzes how widely flaming messages spread for each issue and how the messages of influencers differ from those of ordinary users. Flaming was frequent, spreading biased information that criticized, ridiculed, and maliciously attacked individual candidates. Moreover, influencers who led public opinion showed a higher degree of flaming than ordinary users.

J-Tree: An Efficient Index using User Searching Patterns for Large Scale Data (J-tree : 사용자의 검색패턴을 이용한 대용량 데이타를 위한 효율적인 색인)

  • Jang, Su-Min;Seo, Kwang-Seok;Yoo, Jae-Soo
    • Journal of KIISE:Databases / v.36 no.1 / pp.44-49 / 2009
  • In recent years, with the development of portable terminals, various search services over large data sets have become available on them. To search large data, most information-retrieval applications use indexes such as B-trees or R-trees. However, users access only a small portion of the data set, and access frequencies are not uniform across data items. Existing indexes such as B-trees and R-trees do not take these skewed access patterns into account. A cache can keep frequently accessed data in memory for fast access, but the memory available to the cache is limited. In this paper, we propose J-tree, a new disk-based index that considers users' search patterns. The proposed index is a balanced tree that guarantees uniform search time for all data while also providing faster search for frequently accessed data. Our experiments show the effectiveness of the proposed index under various settings.

A Study on The Need for AI Literacy According to The Development of Artificial Intelligence Chatbot (인공지능 챗봇 발전에 따른 AI 리터러시 필요성 연구)

  • Cheol-Seung Lee;Hye-Jin Baek
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.3 / pp.421-426 / 2023
  • Among AI convergence technologies, a chatbot is an AI-based interactive system that can converse with humans. Chatbots are attracting renewed attention as NLP, NLU, and NLG technologies advance. However, AI chatbots can provide biased information derived from their training data and can cause serious harm, such as privacy infringement and cybersecurity problems, so understanding AI technology and fostering AI literacy is essential. As AI continues to evolve and become ubiquitous, AI literacy will also broaden in scope to cover new areas. This study is meaningful in raising awareness of AI technology and in proposing a human-centered use of technology, in which people are not subordinated to technology, by cultivating AI literacy.

Research on Utilization of AI in the Media Industry: Focusing on Social Consensus of Pros and Cons in the Journalism Sector (미디어 산업 AI 활용성에 관한 고찰 : 저널리즘 분야 적용의 주요 쟁점을 중심으로)

  • Jeonghyeon Han;Hajin Yoo;Minjun Kang;Hanjin Lee
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.713-722 / 2024
  • This study examines the impact of artificial intelligence (AI) technology on journalism, discussing its utility and addressing major ethical concerns. Broadcasters and media organizations worldwide, such as Bloomberg, The Guardian, WSJ, WP, and NYT, are using AI to innovate in news production, data analysis, and content generation. Accordingly, the AI journalism ecosystem is analyzed in terms of the scale, economic feasibility, diversity, and value enhancement of major media AI service types. Through a review of previous literature, this study also identifies key ethical and social issues in AI journalism. It aims to bridge societal and technological concerns by exploring directions for the mutual development of AI technology and the media industry. Additionally, it argues for integrated guidelines and advanced AI literacy, built through social consensus, to address these issues.

Cache Memory and Replacement Algorithm Implementation and Performance Comparison

  • Park, Na Eun;Kim, Jongwan;Jeong, Tae Seog
    • Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.11-17 / 2020
  • In this paper, we present practical results on cache replacement policies by measuring the hit rate and search time of each replacement algorithm through cache simulation. The cache memory structure and four replacement policies, FIFO, LFU, LRU, and Random, were implemented in software to analyze the characteristics of each technique. In the experiments, LRU achieved a hit rate of 36.044% and a search time of 577.936 ns under a uniform distribution, and 45.636% and 504.692 ns under a skewed distribution, while FIFO performed similarly to LRU, at 36.078% and 554.772 ns under the uniform distribution and 45.662% and 489.574 ns under the skewed distribution. LFU came next, and Random performed worst, at 30.042% and 622.866 ns under the uniform distribution and 36.36% and 553.878 ns under the skewed distribution. Although LRU, the replacement method commonly used in cache memory, is complex to implement, it was the most efficient of the conventional replacement algorithms, indicating that it is a reasonable choice because it takes data reference information into account.
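
The four policies compared above can be sketched as a small hit-rate simulator. This is a minimal Python sketch under stated assumptions, not the authors' simulator: the cache is fully associative with `capacity` entries, only hit rate is measured (no search time), LFU counts accesses only while a key is resident and breaks ties by evicting the oldest entry, and the Zipf-like `skewed_trace` generator is a hypothetical stand-in for the paper's skewed distribution.

```python
import random
from collections import Counter, OrderedDict

POLICIES = ("FIFO", "LRU", "LFU", "RANDOM")


def simulate(policy, trace, capacity, seed=0):
    """Return the hit rate of a fully associative cache under `policy`."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)   # only used by RANDOM
    cache = OrderedDict()       # insertion order (FIFO) or recency order (LRU)
    freq = Counter()            # access counts of resident keys (LFU)
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            freq[key] += 1
            if policy == "LRU":
                cache.move_to_end(key)  # most recently used goes last
            continue
        if len(cache) >= capacity:
            if policy in ("FIFO", "LRU"):
                victim = next(iter(cache))  # oldest insertion / least recent use
            elif policy == "LFU":
                # min() scans in insertion order, so ties evict the oldest key
                victim = min(cache, key=lambda k: freq[k])
            else:
                victim = rng.choice(list(cache))
            del cache[victim]
            del freq[victim]
        cache[key] = True
        freq[key] = 1
    return hits / len(trace)


def uniform_trace(n_keys, length, seed=0):
    rng = random.Random(seed)
    return [rng.randrange(n_keys) for _ in range(length)]


def skewed_trace(n_keys, length, seed=0):
    # hypothetical Zipf-like weights 1/(rank+1): a few keys get most accesses
    rng = random.Random(seed)
    weights = [1 / (rank + 1) for rank in range(n_keys)]
    return rng.choices(range(n_keys), weights=weights, k=length)


if __name__ == "__main__":
    for name, trace in (("uniform", uniform_trace(1000, 20000)),
                        ("skewed", skewed_trace(1000, 20000))):
        for policy in POLICIES:
            print(f"{name:8s} {policy:6s} {simulate(policy, trace, 100):.3f}")
```

The `OrderedDict` serves both FIFO and LRU: the only difference is whether a hit moves the key to the end, which is exactly the extra bookkeeping that makes LRU costlier to implement.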

A Comparative Study on Discrimination Issues in Large Language Models (거대언어모델의 차별문제 비교 연구)

  • Wei Li;Kyunghwa Hwang;Jiae Choi;Ohbyung Kwon
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.125-144 / 2023
  • Recently, the use of large language models (LLMs) such as ChatGPT has been increasing in fields such as conversational commerce and mobile financial services. However, LLMs, which are created mainly by learning from existing documents, can also learn the various human biases inherent in those documents. Nevertheless, there have been few comparative studies of bias and discrimination in LLMs. The purpose of this study is to examine the existence and extent of nine types of discrimination (age, disability status, gender identity, nationality, physical appearance, race/ethnicity, religion, socio-economic status, and sexual orientation) in LLMs and to suggest ways to reduce them. To this end, we used BBQ (Bias Benchmark for QA), a benchmark for identifying discrimination, to compare three LLMs: ChatGPT, GPT-3, and Bing Chat. The evaluation found a large number of discriminatory responses, with patterns that differed across models. In particular, problems emerged in age and disability discrimination, which, unlike sexism, racism, and economic inequality, are not traditional AI ethics issues, pointing to a new perspective on AI ethics. Based on the comparison, this paper describes how LLMs can be improved and developed in the future.
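
BBQ scores bias separately for ambiguous contexts (where the correct answer is the "unknown" option) and disambiguated ones. The sketch below computes the two bias scores as defined in the original BBQ paper, as we understand that definition; the abstract does not say which metric this study reported, and the `Record` format is a hypothetical one chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One BBQ item with a model's answer (hypothetical record format)."""
    ambiguous: bool     # True if the context does not determine the answer
    predicted: str      # model's answer label
    correct: str        # gold label (the "unknown" label in ambiguous contexts)
    stereotyped: str    # answer label that aligns with the targeted social bias
    unknown: str        # label of the "cannot be determined" option


def raw_bias(records):
    """2 * (biased answers / non-unknown answers) - 1, in [-1, 1]."""
    answered = [r for r in records if r.predicted != r.unknown]
    if not answered:
        return 0.0
    biased = sum(r.predicted == r.stereotyped for r in answered)
    return 2 * biased / len(answered) - 1


def bias_scores(records):
    """Return (s_dis, s_amb) over disambiguated and ambiguous items."""
    disamb = [r for r in records if not r.ambiguous]
    amb = [r for r in records if r.ambiguous]
    s_dis = raw_bias(disamb)
    # in ambiguous contexts the raw score is scaled by the error rate
    accuracy = sum(r.predicted == r.correct for r in amb) / len(amb) if amb else 1.0
    s_amb = (1 - accuracy) * raw_bias(amb)
    return s_dis, s_amb
```

Scaling the ambiguous-context score by the error rate means a model that correctly answers "unknown" most of the time gets a score near zero even if its few wrong answers are all stereotyped.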

Data Reusable Search Scan Methods for Low Power motion Estimation (저전력 움직임 추정을 위한 데이터 재사용 스캔 방법)

  • Kim, Tae Sun;SunWoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.9 / pp.85-91 / 2013
  • This paper proposes data-reusable search scan methods for full search and fast search to implement low-power motion estimation (ME). The proposed Optimized Sub-region Partitioning (OSP) method, which divides the search region into several sub-regions, halves the number of required Reconfigurable Register Arrays (RRAs) compared with the existing smart snake scan method at the same data reusability. In addition, the proposed Center Biased Search Scan (CBSS) method improves data reusability for various fast search algorithms. Performance comparisons show that the proposed search scan methods reduce redundant data loading on average by about 26.9% and 16.1% compared with the existing raster scan and snake scan methods, respectively. Because they reduce memory accesses, the proposed search scan methods are well suited to low-power, high-performance ME implementations.
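
The data reuse behind the raster/snake comparison can be illustrated by counting how many reference pixels must be loaded as a B×B candidate block moves through an R×R search range. This is a simplified Python sketch assuming that only the previous candidate block is kept on chip; it models the baseline raster and snake orders, not the proposed OSP or CBSS methods, and the function names are hypothetical.

```python
def scan_order(range_size, snake):
    """Candidate displacements (dx, dy) in raster or snake (boustrophedon) order."""
    order = []
    for dy in range(range_size):
        xs = range(range_size)
        if snake and dy % 2 == 1:
            xs = reversed(xs)  # odd rows run right-to-left in snake order
        for dx in xs:
            order.append((dx, dy))
    return order


def block_pixels(dx, dy, block):
    """Reference-frame pixel coordinates covered by a candidate block."""
    return {(dx + i, dy + j) for i in range(block) for j in range(block)}


def loaded_pixels(order, block):
    """Pixels fetched when only the previous candidate block is buffered."""
    total, prev = 0, set()
    for dx, dy in order:
        cur = block_pixels(dx, dy, block)
        total += len(cur - prev)  # pixels not reusable from the previous block
        prev = cur
    return total


if __name__ == "__main__":
    for snake in (False, True):
        name = "snake" if snake else "raster"
        print(name, loaded_pixels(scan_order(8, snake), 16))
```

With snake order every step moves by one pixel, so only one new row or column is fetched; raster order pays an extra cost at each row change, where the block jumps back to the left edge of the search range.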

Hybrid Learning for Vision-and-Language Navigation Agents (시각-언어 이동 에이전트를 위한 복합 학습)

  • Oh, Suntaek;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.9 no.9 / pp.281-290 / 2020
  • The Vision-and-Language Navigation (VLN) task is a complex intelligence problem that requires both visual and language comprehension. In this paper, we propose a new learning model for vision-and-language navigation agents. The model adopts hybrid learning that combines imitation learning based on demonstration data with reinforcement learning based on action rewards. This lets the model address both the tendency of imitation learning to be biased toward the demonstration data and the relatively low data efficiency of reinforcement learning. In addition, the proposed model uses a novel path-based reward function designed to overcome the problems of existing goal-based reward functions. We demonstrate the high performance of the proposed model through various experiments using the Matterport3D simulation environment and the R2R benchmark dataset.
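
A hybrid objective of this kind can be sketched as a weighted sum of an imitation (cross-entropy) loss on demonstration actions and a REINFORCE-style policy-gradient loss. This is a minimal sketch under stated assumptions: the mixing `weight`, the discount `gamma`, and the distance-to-path reward are hypothetical stand-ins rather than the paper's formulation, and per-step action probabilities are plain Python lists rather than a network's output.

```python
import math


def imitation_loss(action_probs, demo_actions):
    """Mean negative log-likelihood of the demonstrator's actions."""
    return -sum(math.log(p[a]) for p, a in zip(action_probs, demo_actions)) / len(demo_actions)


def discounted_returns(rewards, gamma=0.9):
    """G_t = r_t + gamma * G_{t+1}, computed backwards from the last step."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce_loss(action_probs, sampled_actions, rewards, gamma=0.9):
    """Policy-gradient loss -log pi(a_t) * G_t, averaged over steps."""
    returns = discounted_returns(rewards, gamma)
    terms = (math.log(p[a]) * g for p, a, g in zip(action_probs, sampled_actions, returns))
    return -sum(terms) / len(rewards)


def distance_to_path(position, path):
    """Distance from a position to the nearest node of the reference path."""
    return min(math.dist(position, node) for node in path)


def path_reward(prev_pos, pos, path):
    """Hypothetical path-based reward: positive when the agent moves closer to the path."""
    return distance_to_path(prev_pos, path) - distance_to_path(pos, path)


def hybrid_loss(il_loss, rl_loss, weight=0.5):
    """Convex combination of the two objectives (the weight is an assumption)."""
    return weight * il_loss + (1 - weight) * rl_loss
```

Rewarding progress toward the reference path, rather than only distance to the goal, penalizes shortcuts that reach the goal without following the instruction, which is one way to read the motivation for a path-based reward.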

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

  • Han, Young-Jin;Joe, In-Whee
    • KIPS Transactions on Computer and Communication Systems / v.11 no.12 / pp.445-452 / 2022
  • Class imbalance is common in digital data and is especially significant in cybersecurity, where abnormal activity hidden in imbalanced data must be detected and addressed. Although a system capable of tracking patterns in all transactions is needed, machine learning on imbalanced data, in which abnormal patterns are typically rare, can neglect the minority class and degrade performance on it, and predictive models can become biased. In this paper, as an approach to imbalanced datasets, we predict target variables and improve accuracy by combining the Synthetic Minority Oversampling Technique (SMOTE) with the Light GBM algorithm. The results were compared with logistic regression, decision tree, KNN, Random Forest, and XGBoost. Accuracy and recall were similar across algorithms, but in precision Random Forest reached 80.76% and Light GBM 97.16%, and in F1-score Random Forest reached 84.67% and Light GBM 91.96%. The experiment confirmed that Light GBM performed consistently on par with, or up to 16% better than, the five other algorithms.
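
SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal, dependency-free Python sketch of that idea for illustration only; the function names are hypothetical, and in practice one would typically use the `SMOTE` class from the imbalanced-learn package (its `fit_resample` method) before training a LightGBM classifier, a step this sketch does not show.

```python
import math
import random


def k_nearest(index, points, k):
    """Indices of the k nearest neighbours of points[index], excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature tuples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # uniform in [0, 1): position along the segment
        x, nn = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the minority class is densified inside its existing region instead of being duplicated, which is what reduces the classifier's bias toward the majority class.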

Comparing Byte Pair Encoding Methods for Korean (음절 단위 및 자모 단위의 Byte Pair Encoding 비교 연구)

  • Lee, Chanhee;Lee, Dongyub;Hur, YunA;Yang, Kisu;Lim, Heuiseok
    • Annual Conference on Human and Language Technology / 2018.10a / pp.291-295 / 2018
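  • Korean is a strongly agglutinative language. Unlike languages without agglutinative features, such as English, the number of possible eojeol (space-delimited word units) grows very large with the number of morphemes, which makes eojeol-level processing very difficult. A preprocessing step that splits eojeol into smaller units is therefore required, and morphological analysis has mainly been used for this. However, morphological analyzers built with supervised learning require large amounts of training data, while unsupervised morphological analysis suffers a large drop in performance. Byte Pair Encoding is a data compression algorithm; applied to natural language processing, it can split eojeol into smaller units in an unsupervised manner. In this study, we propose a method for quantitatively and qualitatively analyzing the performance and characteristics of two ways of applying Byte Pair Encoding to Korean: syllable-level processing and jamo-level processing. We also apply this method to the Sejong corpus to perform eojeol segmentation with each algorithm, and compare and analyze the results in terms of segmentation accuracy, bias, and deviation.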
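
The two BPE variants compared above differ only in the initial symbol inventory: whole Hangul syllables, or the jamo obtained by decomposing each syllable. The sketch below assumes the standard merge-learning loop (count adjacent symbol pairs, merge the most frequent) and uses Python's Unicode NFD normalization to split syllables into conjoining jamo; it is an illustration, not the authors' implementation or their evaluation on the Sejong corpus.

```python
import unicodedata
from collections import Counter


def to_jamo(word):
    """Split precomposed Hangul syllables into conjoining jamo (Unicode NFD)."""
    return unicodedata.normalize("NFD", word)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to num_merges merges; each word starts as a tuple of characters."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges, vocab


if __name__ == "__main__":
    words = ["학교", "학생", "학교에", "한국"]
    print(learn_bpe(words, 3)[0])                        # syllable-level merges
    print(learn_bpe([to_jamo(w) for w in words], 3)[0])  # jamo-level merges
```

On this toy word list, the first syllable-level merge is ('학', '교'), while the first jamo-level merge joins ᄒ and ᅡ, a unit shared by 학 and 한 that syllable-level BPE cannot see because it treats each syllable as an atomic symbol.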
