• Title/Summary/Keyword: discipline bias (분야 편향성)


Recommendations for the Construction of a Quality-Controlled Stress Measurement Dataset (품질이 관리된 스트레스 측정용 테이터셋 구축을 위한 제언)

  • Tai Hoon KIM;In Seop NA
    • Smart Media Journal / v.13 no.2 / pp.44-51 / 2024
  • The construction of a stress measurement dataset plays a crucial role in various modern applications. In particular, to train artificial intelligence models for stress measurement efficiently, it is essential to compare various biases and construct a quality-controlled dataset. In this paper, we propose constructing a quality-managed stress measurement dataset through the comparison of various biases. To this end, we introduce stress definitions and measurement tools, the process of building an artificial intelligence stress dataset, strategies for overcoming biases to improve quality, and considerations for stress data collection. Specifically, to manage dataset quality, we discuss biases that may arise during stress data collection, such as selection bias, measurement bias, causal bias, confirmation bias, and artificial intelligence bias. Through this paper, we aim to provide a systematic understanding of the considerations for stress data collection and of the biases that may occur while constructing a stress dataset, and thereby to contribute to building a dataset of guaranteed quality by overcoming these biases.

Analysis of Toxicity and Bias of ChatGPT within Korean Social Context (한국의 사회적 맥락에서의 ChatGPT의 독성 및 편향성 분석)

  • Seungyoon Lee;Chanjun Park;Gyeongmin Kim;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.539-545 / 2023
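  • Large language models are highly influential in many fields that require deep linguistic understanding, but concerns about the bias and ethical issues that accompany them have grown as well. In particular, biased language models can reinforce prejudice against individuals with various attributes, such as race and sexual orientation. However, most research on such bias is limited to English-speaking cultures, and studies on Korean also fail to reflect social problems that arise in Korea, such as regional and gender conflicts. In this study, to elicit the inherent bias of ChatGPT, we deliberately assigned it various personas, built a prompt set based on Korean social issues, and analyzed the toxicity of the generated sentences. The experiments showed a tendency to consistently generate harmful sentences for certain personas or prompts. We also confirmed that the biased views society holds about each persona-issue pair are reflected directly in the model, producing significant differences in the toxicity distribution of the generated sentences across combinations.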


Discipline Bias of Document Citation Impact Indicators: Analyzing Articles in Korean Citation Index (논문 인용 영향력 측정 지수의 편향성에 대한 연구: KCI 수록 논문을 대상으로)

  • Lee, Jae Yun;Choi, Sanghee
    • Journal of the Korean Society for Information Management / v.32 no.4 / pp.205-221 / 2015
  • The impact of a journal is commonly used as a proxy for the impact of an individual paper published in it. Because equating a journal's impact with that of a single paper is problematic, several studies have measured a paper's impact from its own citation counts. This study applied eight impact indicators to the Korean Citation Index (KCI) database and examined the discipline bias of each. The indicators analyzed are simple citation counts, PageRank, f-value, CCI, c-index, single publication h-index, single publication hs-index, and cl-index. PageRank showed the least discipline bias among highly ranked papers and the least journal bias within a discipline. In contrast, simple citation counts were strongly biased toward particular disciplines or journals. The KCI database currently provides only simple citation counts. It should also show PageRank (a global indicator) to help users discover influential papers in diverse areas. Furthermore, it should consider providing the best of the local indicators. Local indicators can be calculated using only the papers in a user's search results, because they rely on the citation counts of citing papers and the number of references. They are therefore more efficient than global indicators, which must explore the whole database. KCI should also consider providing the cl-index (a local indicator).
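
To illustrate why a local indicator needs only the citing papers, here is a minimal Python sketch of the single publication h-index. It assumes the usual definition (the h-index of the set of papers that cite the target paper), which the abstract itself does not spell out; the function name and the sample numbers are hypothetical.

```python
def single_publication_h_index(citing_paper_citations):
    """Return the largest h such that h citing papers have >= h citations each.

    Assumed definition: the h-index computed over the papers that cite
    one target paper. The input is the citation count of each citing paper.
    """
    ranked = sorted(citing_paper_citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical data: citation counts of the five papers citing a target paper.
citing = [10, 4, 3, 1, 0]
simple_citation_count = len(citing)          # what a plain count would report
sp_h = single_publication_h_index(citing)    # local indicator
print(simple_citation_count, sp_h)
```

Only the citing papers' own counts are read, so the value can be computed from a search result set without traversing the whole citation graph, unlike PageRank.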

Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization (부분 단어 토큰화 기법을 이용한 뉴스 기사 정치적 편향성 자동 분류 및 어휘 분석)

  • Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.1-8 / 2021
  • In the political domain, news articles show polarized and biased characteristics, such as conservative and liberal leanings, which is called political bias. We constructed a keyword-based dataset to classify the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expected that the number of unknown tokens would be reduced if sentences were composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with the document embedding model based on morphological analysis, the document embedding model with subwords showed the highest accuracy, at 78.22%. We also confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model in our bias classification task, we extracted keywords based on politicians. The bias of the keywords was verified by their average similarity with the vectors of politicians from each political tendency.
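
The claim that subwords reduce unknown tokens can be seen in a toy Python sketch. It uses greedy longest-match segmentation as a stand-in; the abstract does not say which subword algorithm the authors used, and the vocabularies and sentence below are invented for illustration.

```python
def greedy_subword_split(word, subword_vocab, unk="<unk>"):
    """Split a word into the longest matching subwords, left to right.

    Stand-in technique (greedy longest match), not the paper's tokenizer.
    Returns [unk] if some part of the word cannot be covered.
    """
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in subword_vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches at position i.
            return [unk]
    return pieces


# Hypothetical vocabularies and input.
word_vocab = {"tax", "reform", "policy"}
subword_vocab = {"tax", "reform", "policy", "ation", "s", "de", "regul"}
sentence = ["taxation", "reforms", "deregulation", "policy"]

# Whole-word lookup: anything outside the vocabulary is unknown.
unknown_words = sum(1 for w in sentence if w not in word_vocab)

# Subword lookup: a word is unknown only if it cannot be segmented.
segmented = [greedy_subword_split(w, subword_vocab) for w in sentence]
unknown_subword = sum(1 for pieces in segmented if pieces == ["<unk>"])

print(unknown_words, unknown_subword, segmented)
```

In this toy case, three of the four words are unknown at the word level, while all four can be covered by subword pieces.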

A New Trend in the Coffee Market: Whole-Bean Coffee Vending Machines Are Coming (커피시장의 새로운 흐름 원두커피자판기가 몰려온다)

  • 한국자동판매기공업협회
    • Vending industry / v.1 no.4 s.4 / pp.29-34 / 2002
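  • This year a new trend is taking shape in the domestic coffee vending machine market, which has stuck exclusively to instant coffee for more than 20 years. In an existing market that is struggling to create new demand because of saturation, whole-bean coffee vending machines are emerging as an alternative model for market growth. With the market under pressure to break away from its bias toward instant coffee vending machines and move to whole-bean coffee, it is certainly welcome that many companies are taking on this market. Some still hold the negative view that it is too early, but the arriving whole-bean coffee vending machines are now creating an irresistible trend. Can this year's challenges in the whole-bean coffee vending machine field mark a turning point in the development of the coffee vending machine market? In this issue's special feature, we hear about the business activities and stated ambitions of the companies actively entering the field, and assess the market's potential for growth.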


Research on the OASIS, a Web Archive in Korea (웹 아카이브 OASIS에 관한 고찰)

  • Yoon, Cheong-Ok
    • Journal of the Korean Society for Library and Information Science / v.44 no.2 / pp.5-27 / 2010
  • The purpose of this research is to examine the characteristics and problems of OASIS, a web archive developed and operated by the National Library of Korea, and then to propose how to improve the quality of its contents. An analysis of the contents in 'Literature' and 'Social Sciences' on the OASIS website reveals problems including the low quality of some contents, a lack of balance in subjects and creators/publishers, duplicate collections of web documents and web sites, and a lack of currency and of evidence of the unique or scholarly value of the collected web resources. We suggest identifying subject specialists by their real names to ensure their accountability in the selection process, and adding metadata elements that document the appropriate application of selection criteria.

Performance Comparison of Statistics-Based Machine Learning Model for Classification of Technical Documents (기술문서 분류를 위한 통계기반 기계학습 모델 성능비교 및 한계 연구)

  • Kim, Jin-gu;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.393-396 / 2022
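  • This study trains four statistics-based machine learning models on patent and paper records in the defense science and technology field, then analyzes the results of feeding in data from an actual target institution in order to identify the limits of their practical use. Previous studies classified documents by patent classification code for special purposes, or served detailed, micro-level research such as topic exploration and feature studies within a narrow research scope; this study instead aims to capture the overall flow and trends of research from a macro perspective. To this end, each model was trained on 30,965 patents and papers covering 138 ICT technologies and 23,406 patents and papers covering 192 defense science and technology categories. The models compared are Support Vector Machines, Decision Tree, Naive Bayes, and XGBoost. In the validation stage on the training data, performance reached up to 99.4%. However, when 12,824 patents and papers from the actual target institution were fed in, we identified model-specific bias, data preprocessing issues, and multi-class and multi-label problems; we present solutions to these problems and suggest directions for further research.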

Parametric and Non Parametric Measures for Text Similarity (텍스트 유사성을 위한 파라미터 및 비 파라미터 측정)

  • Mlyahilu, John;Kim, Jong-Nam
    • Journal of the Institute of Convergence Signal Processing / v.20 no.4 / pp.193-198 / 2019
  • The wide spread of genuine and fake information on the internet has led to various studies on text analysis. Copying and pasting others' work without acknowledgement and manipulating research results without proof have been on the rise in the era of data science. Various tools have been developed to reduce, combat, and possibly eradicate plagiarism in various research fields. Text similarity can be measured manually using both parametric and non-parametric methods; this study implements cosine similarity and Pearson correlation as parametric methods and Spearman correlation as a non-parametric method. The cosine similarity and Pearson correlation metrics achieved the highest similarity coefficients, while Spearman correlation showed low similarity coefficients. We recommend non-parametric methods for measuring text similarity because they do not assume normality, unlike parametric methods, which rely on normality assumptions and are subject to bias.
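
The three measures named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions: texts are turned into raw term-frequency vectors by whitespace splitting, which may differ from the authors' preprocessing, and all function names are hypothetical.

```python
import math
from collections import Counter


def term_vectors(text_a, text_b):
    """Build aligned term-frequency vectors over the shared vocabulary."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    return [ca[w] for w in vocab], [cb[w] for w in vocab]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pair of short texts.
x, y = term_vectors("the cat sat on the mat", "the cat lay on the rug")
print(cosine(x, y), pearson(x, y), spearman(x, y))
```

Spearman works on ranks rather than raw counts, which is why it makes no normality assumption; term-frequency vectors contain many ties, so tied values share an average rank here.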

Identification of Emerging Research at the national level: Scientometric Approach using Scopus (국가적 차원의 유망연구영역 탐색: Scopus 데이터베이스를 이용한 과학계량학적 접근)

  • Yeo, Woon-Dong;Sohn, Eun-Soo;Jung, Eui-Seob;Lee, Chang-Hoan
    • Journal of Information Management / v.39 no.3 / pp.95-113 / 2008
  • In today's environment, in which science and technology are changing faster than ever, companies have to monitor and search for emerging technologies to stay competitive, and many nations are trying to do the same. Most of them use the Delphi approach, which is based on expert review, as their search method. However, expert review has been criticized for its potential for bias and the problems that follow from it, since it depends solely on experts' subjective judgment. To overcome such problems, we used a scientometric method to identify emerging technologies, a task that has as a rule been done with Delphi. We made three particular efforts to improve the quality of the results. First, we chose Scopus as an alternative to SCI, hoping to obtain results distributed evenly across a wide range of fields of current interest. Second, we used fractional citation counting when counting citations in the linear regression analysis stage. Lastly, we verified the scientometric results against expert opinions to minimize possible errors. As a result, we derived 290 emerging technologies from the scientometric analysis of the Scopus database and visualized them on a two-dimensional map with KnowledgeMatrix, a data mining system developed by KISTI.
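
The second step can be sketched in Python as follows. The abstract does not define its weighting scheme, so this sketch assumes one common form of fractional citation counting, in which each citation is weighted by 1 divided by the number of references in the citing paper; the yearly data and function names are hypothetical.

```python
def fractional_citation_count(reference_counts_of_citing_papers):
    """Sum of 1/n over citing papers, where n is each paper's reference count.

    Assumed weighting: a citing paper with a long reference list contributes
    less per citation than one with a short list.
    """
    return sum(1.0 / n for n in reference_counts_of_citing_papers if n > 0)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: for each year, the reference-list lengths of the
# papers that cite one research area.
citing_by_year = {
    2004: [10, 20],
    2005: [10, 10, 20],
    2006: [5, 10, 10, 20],
}

years = sorted(citing_by_year)
counts = [fractional_citation_count(citing_by_year[y]) for y in years]
growth = least_squares_slope(years, counts)
print(counts, growth)
```

A positive slope of the fractional counts over time is one possible signal of an emerging area; the weighting dampens the advantage of fields whose papers habitually carry long reference lists.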

Analysis on elements of policy changes in character industry (캐릭터산업의 정책변인연구)

  • Han, Chang-Wan
    • Cartoon and Animation Studies / s.33 / pp.597-616 / 2013
  • The character industry is not only a knowledge-based industry built chiefly on copyrights but also a driving force of the creative economy that plays a functional role across industries, because characters act as complementary products that raise sales value and increase profits in the manufacturing and service industries. Since 2003, national policy on characters has aimed to maximize effects among connected industries, expand business abroad, strengthen copyright enforcement by improving the marketing system, and develop industrial infrastructure by raising the quality of character products. As a result of this policy, successful cases of connected content have taken shape, and the domestic character industry has grown steadily since 2007. The scale of the character industry and its industrial statistics need to be reassessed, because the industry has accumulated more know-how in self-promotion and more related characters through a market segmentation strategy that spans cartoons, animation, games, novels, movies, and musicals. In particular, the Korean government set the target of becoming a 'Global Top Five Character Power' in 2009 and began working to discover global star characters, support networking among connected industries, diversify promotion channels, and develop the licensing business. Since 2013 in particular, several markets have prospered: time-managed indoor character theme parks, such as character experience marketing venues and kids' cafes that use characters; the digital character market centered on SNS emoticons; and the performance market for character musicals. Moreover, domestic and foreign offline black markets have grown, so another policy alternative is needed.
To prepare for an era of exploding character demand and diversifying platforms, a solid strategy is needed that takes into account the elements of policy change in the character industry, in order to vitalize the industry and support new character design and connected content. This study presents those elements of policy change in relation to existing policy and the current state of the market. Today, the elements of policy change in the domestic character industry are the diversity of consumers in the digital character market that follows platform diversification, convergence content of character goods for the Korean Wave, legalization of the illegal content black market, and management of consumer tendencies in segmented markets. These can help identify policy issues entirely different from those of existing character powers such as the US, Japan, and Europe. In the final analysis, the alternatives are promoting models with contracted copyrights for domestic and foreign connected content, diversifying the profit models of the platform economy, further developing target markets tied to the expanding Korean Wave, and a character market strategy for age-specific tendencies as character demand develops.