• Title/Summary/Keyword: 데이터편향 (data bias)


Punitiveness Toward Defendants Accused of Same-Race Crimes Revisited: Replication in a Different Culture (동인종 범죄로 기소된 피고인에 대한 엄벌주의적 판단의 재고찰: 다른 문화에서의 적용)

  • Lee, Jungwon;Khogali, Mawia;Despodova, Nikoleta M.;Penrod, Steven D.
    • Korean Journal of Forensic Psychology
    • /
    • v.11 no.1
    • /
    • pp.37-61
    • /
    • 2020
  • Lee, Khogali, Despodova, and Penrod (2019) demonstrated that American participants whose race differed from that of both the defendant and the victim rendered more punitive judgments against the defendant in a same-race crime (e.g., White observer-Black defendant-Black victim) than in a cross-race crime (e.g., White observer-Black defendant-Hispanic victim). The aim of the current study was to test the replicability of their findings in a different country, South Korea. Study 1a failed to replicate the race-combination effect in South Korea with three new moderators: case strength, the defendant's use of violence, and race salience. Study 1b used the same design as Study 1a in the United States to examine whether the failure to replicate in Study 1a was due to cultural differences between South Korea and the United States. However, Study 1b also failed to replicate the race-combination effect. Study 2 was a meta-analytic review of the data from Lee et al.'s (2019) study along with the data from Studies 1a and 1b; it revealed that the race-salience manipulation in Studies 1a and 1b might have caused the null results. We conclude that when people's races differ from those of both the defendant and the victim, they are likely to render more punitive judgments against the defendant in a same-race crime than in a cross-race crime. However, the race-combination effect holds only when race-relevant issues are not salient in the crime.


Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.105-129
    • /
    • 2020
  • This study uses corporate data from 2012 to 2018, the period when K-IFRS was applied in earnest, to predict default risk. The data used in the analysis totaled 10,545 rows and 160 columns: 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial-ratio indices. Unlike most prior studies, which used the default event itself as the basis for learning about default risk, this study calculated default risk from each company's market capitalization and stock price volatility based on the Merton model. This solved the data-imbalance problem caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and also captured the differences in default risk that exist among ordinary companies. Because the model was trained using only corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock price information can be derived appropriately. The model can therefore provide stable default-risk assessment for unlisted companies, such as small and medium-sized enterprises and startups, whose default risk is difficult to determine with traditional credit rating models. Although machine-learning-based prediction of corporate default risk has recently been an active research area, model bias issues exist because most studies make predictions with a single model. Given that default-risk information is very widely used in the market and sensitivity to differences in default risk is high, a stable and reliable valuation methodology with strict calculation standards is required.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced individual models' bias by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information while retaining the advantage of machine-learning-based default risk prediction models: short calculation time. To produce the forecasts used as input to the stacking ensemble model, the training data were divided into seven pieces and each sub-model was trained on a divided set. To compare predictive power, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was verified on the test set. The analysis showed that the stacking ensemble model exceeded the predictive power of the Random Forest model, the best-performing single model. Next, to check for statistically significant differences between the stacking ensemble model's forecasts and those of each individual model, pairs between the stacking ensemble model and each individual model were constructed. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two models' forecasts in each pair differed significantly. The forecasts of the stacking ensemble model showed statistically significant differences from those of the MLP and CNN models.
In addition, this study provides a methodology by which existing credit rating agencies can adopt machine-learning-based default risk prediction, given that traditional credit rating models can also serve as sub-models in calculating the final default probability. The stacking ensemble technique proposed in this study can also help in designing models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope this research will be used to increase the practical adoption of machine-learning-based models by overcoming their existing limitations.
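The core stacking mechanism described above — out-of-fold predictions from sub-models becoming input features for a meta-model — can be sketched in a few lines. Everything below (the toy threshold sub-models, the two-feature data, the grid-searched averaging meta-model) is a hypothetical stand-in for illustration, not the paper's actual Random Forest/MLP/CNN setup:

```python
# Minimal illustration of stacking: sub-models are trained on k-1 folds and
# predict the held-out fold, so the meta-model never sees in-sample predictions.

def fold_indices(n, k):
    """Split range(n) into k roughly equal contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def train_threshold_model(xs, ys, feature):
    """Toy sub-model: pick the threshold on one feature that best separates classes."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted({x[feature] for x in xs}):
        acc = sum((x[feature] >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x, t=best_t: 1.0 if x[feature] >= t else 0.0

def stack_train(xs, ys, k=3):
    """Build out-of-fold meta-features, then fit a simple averaging meta-model."""
    n = len(xs)
    meta = [[0.0, 0.0] for _ in range(n)]
    for fold in fold_indices(n, k):
        train_idx = [i for i in range(n) if i not in set(fold)]
        tx, ty = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        models = [train_threshold_model(tx, ty, f) for f in (0, 1)]
        for i in fold:  # out-of-fold predictions only
            meta[i] = [m(xs[i]) for m in models]
    # Meta-model: grid-search a weight w for p = w*m0 + (1-w)*m1.
    best_w, best_acc = 0.5, -1.0
    for w in [i / 10 for i in range(11)]:
        acc = sum(((w * a + (1 - w) * b) >= 0.5) == bool(y)
                  for (a, b), y in zip(meta, ys)) / n
        if acc > best_acc:
            best_w, best_acc = w, acc
    # Refit sub-models on all data for deployment, keep the learned weight.
    final_models = [train_threshold_model(xs, ys, f) for f in (0, 1)]
    def predict(x):
        a, b = final_models[0](x), final_models[1](x)
        return 1 if (best_w * a + (1 - best_w) * b) >= 0.5 else 0
    return predict
```

In a real setting the sub-models would be heterogeneous learners and the meta-model a proper classifier, but the data flow — fold split, out-of-fold forecasts, meta-fit — is the same.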

A Study on the Medical Application and Personal Information Protection of Generative AI (생성형 AI의 의료적 활용과 개인정보보호)

  • Lee, Sookyoung
    • The Korean Society of Law and Medicine
    • /
    • v.24 no.4
    • /
    • pp.67-101
    • /
    • 2023
  • The utilization of generative AI in the medical field is also being researched rapidly. Access to vast data sets reduces the time and energy spent selecting information. However, as the effort put into content creation decreases, the likelihood of associated problems increases. For example, because generative AIs learn from data within a set period and generate outcomes, users must discern the accuracy of the results themselves. While the answers may appear plausible, their sources are often unclear, making it difficult to determine their veracity. In addition, the possibility that results are presented from a biased or distorted perspective is an ethical concern that cannot presently be discounted. Despite these concerns, the field of generative AI is continually advancing, with an increasing number of users leveraging it in various sectors, including the biomedical and life sciences. This raises important legal questions about who bears responsibility, and to what extent, for damages caused by these high-performance AI algorithms. A general overview of the issues with generative AI includes those discussed above, but another perspective arises from its fundamental nature as a large-scale language model ('LLM'). There is a civil-law concern regarding "the memorization of training data within artificial neural networks and its subsequent reproduction". Medical data, by nature, often reflect the personal characteristics of patients, potentially leading to issues such as the regeneration of personal information. The extensive application of generative AI in scenarios beyond traditional AI raises the possibility of legal challenges that cannot be ignored.
Upon examining the technical characteristics of generative AI and the legal issues they raise, especially concerning the protection of personal information, it is evident that current personal information protection laws, particularly in the context of health and medical data utilization, are inadequate. These laws provide processes for anonymizing and de-identifying specific personal information but fall short when generative AI is applied as software in medical devices. To address the functionalities of generative AI in clinical software, a reevaluation and adjustment of existing personal information protection laws are imperative.

A Study on the Estimation of the V2X Vehicle Ratio for the Collection of Highway Traffic Information (고속도로 교통정보 수집을 위한 V2X 차량비율 추정연구)

  • Na, Sungyong;Lee, Seungjae;Ahn, Sanghyun;Kim, Jooyoung
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.17 no.1
    • /
    • pp.71-78
    • /
    • 2018
  • Transportation is gradually moving into the era of V2X and autonomous cars. Accurate judgment of traffic conditions is an important input for route choice and autonomous driving. There are many ways to use probe cars, such as taxis, to identify accurate traffic conditions; however, these methods vary with the characteristics of the probe vehicle, and they have a cost problem. V2X vehicles can solve these problems and collect traffic information in real time. If all vehicles were V2X vehicles, these issues would be largely resolved. However, when only some vehicles are V2X-equipped, the question is what share of V2X vehicles is needed for their communication data to represent the traffic stream as a whole. To investigate this, a virtual network and traffic demand were created and various scenarios were run in SUMO simulations. The analysis showed that a 3-5% share of V2X vehicles is sufficient to represent the road traffic characteristics. Various follow-up studies are planned.

Similarity between the dispersion parameter in zero-altered model and the two goodness-of-fit statistics (영 변환 모형 산포형태모수와 두 적합도 검정통계량 사이의 유사성 비교)

  • Yun, Yujeong;Kim, Honggie
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.3
    • /
    • pp.493-504
    • /
    • 2017
  • We often observe count data that exhibit over-dispersion, originating from too many zeros, or under-dispersion, originating from too few zeros. To handle these types of problems, the zero-altered distribution model was designed by Ghosh and Kim in 2007. Their model can control both over-dispersion and under-dispersion with a single parameter, which had not previously been possible. The dispersion type depends on the sign of the parameter δ in the zero-altered distribution. In this study, we demonstrate the role of the dispersion-type parameter δ using data on the number of births in Korea. Employing both the chi-square statistic and the Kolmogorov statistic for goodness of fit, we also examine differences between the theoretical distribution and an observed one that exhibits either over-dispersion or under-dispersion. Finally, this study shows whether the goodness-of-fit test statistics behave similarly to the dispersion-type parameter δ.
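The two goodness-of-fit statistics named in the abstract are easy to compute for count data. The sketch below is a stdlib-only illustration on hypothetical counts with excess zeros against a Poisson baseline; it does not reproduce the paper's zero-altered distribution itself:

```python
import math

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def ks_stat(observed, expected):
    """Kolmogorov-type statistic: the maximum gap between the two cumulative
    proportion curves built from the category counts (an approximation for
    discrete data)."""
    n_o, n_e = sum(observed), sum(expected)
    cum_o = cum_e = gap = 0.0
    for o, e in zip(observed, expected):
        cum_o += o / n_o
        cum_e += e / n_e
        gap = max(gap, abs(cum_o - cum_e))
    return gap

def poisson_expected(n, lam, k_max):
    """Expected counts for 0..k_max events under a Poisson(lam) model."""
    return [n * math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(k_max + 1)]
```

With zero-inflated counts (e.g. 60 zeros where a Poisson(1) model expects about 37 out of 100), both statistics grow large, which is exactly the over-dispersion signal the dispersion parameter δ encodes.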

Unicorn Startups' Investment Duration, Government Policy, Foreign Investors, and Exit Valuation (유니콘 기업들의 투자 유치 지속 기간, 정부 정책, 해외 투자자가 Exit 가치평가에 미치는 영향에 대한 연구)

  • Lee, Minsun;Nam, Dae-il
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.15 no.5
    • /
    • pp.1-11
    • /
    • 2020
  • The increasing number of unicorn startups has recently received much attention. In this study, we investigate whether startups that achieve an extremely high valuation postpone their exit in order to raise more investment and receive more benefits. We tested the hypotheses using data from Crunchbase, the World Bank, the Global Competitiveness Report, and the Global Entrepreneurship Monitor. Using 140 unicorn startups that have already exited through an initial public offering (IPO) or mergers and acquisitions (M&A), we find that unicorn startups tend to obtain higher valuations as their investment duration increases. We also examined the moderating effects of governmental policy and institutional distance from foreign investors in order to account for the institutional context of startups. The results for the moderating variables show significant support. We expect to provide a better understanding of unicorn startups' exit decisions. Furthermore, managers and investors should take the institutional factors of startups into account when making funding decisions.

Strengths of Lap Splices Anchored by SD600 Headed Bars (겹침이음 실험을 통한 SD600 확대머리철근의 정착강도 평가)

  • Chun, Sung-Chul;Lee, Jin-Gon
    • Journal of the Korea Concrete Institute
    • /
    • v.25 no.2
    • /
    • pp.217-224
    • /
    • 2013
  • Design provisions for the development length of headed bars in ACI 318-08 include concrete compressive strength and the yield strength of headed bars as design parameters but do not consider the effects of transverse reinforcement. In addition, they impose very strict limits on clear spacing and material strengths because the provisions were developed from limited tests. In this study, splice tests using SD600 headed bars with $2d_b$ clear spacing and transverse reinforcement were conducted. Test results show that unconfined specimens failed due to prying action, and the bottom cover concrete spalled prematurely. The contribution of head bearing to the anchorage strength was only 15% on average, implying that unconfined specimens failed before the head bearing was sufficiently developed. Confined specimens with stirrups placed along the whole splice length had enhanced strengths in bearing as well as bond, because the stirrups prevented prying action and improved bond capacity. Bond failure occurred in locally confined specimens where stirrups were placed only at the ends of the splice length; these stirrups prevented prying action, but the bond capacity did not increase. From regression analysis of the test results, an equation to predict the anchorage strength of headed bars was developed. The proposed equation consists of bond and bearing contributions and includes a transverse reinforcement index. The average ratio of tests to predictions is 1.0, with a coefficient of variation of 6%.

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.9
    • /
    • pp.219-226
    • /
    • 2018
  • One systematic way to generate all possible cases of a combination is to construct a combination tree, whose time complexity is O($2^n$). A combination tree is used for various purposes, such as the graph homogeneity problem and as the initial model for calculating frequent item sets. However, algorithms that must search all cases of a combination are difficult to use in practice because of their high time complexity. Nevertheless, as the amount of data grows and studies increasingly seek to exploit it, the need to search all cases is increasing. Recently, as GPU environments have become popular and easy to access, various attempts have been made to reduce running time by parallelizing algorithms that have high time complexity in a serial environment. Because the method of generating all cases of a combination is sequential and the sizes of the sub-tasks are skewed, it is not directly suitable for parallel implementation; the efficiency of a parallel algorithm is maximized when all threads have tasks of similar size. In this paper, we propose a method for efficient collaboration between the CPU and GPU to parallelize the problem of generating all cases. To evaluate the performance of the proposed algorithm, we analyze its theoretical time complexity and compare its running time with that of other algorithms in CPU and GPU environments. Experimental results show that the proposed CPU-GPU collaboration algorithm maintains a balance between the execution times of the CPU and GPU compared with previous algorithms, and its execution time improves remarkably as the number of elements increases.
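The load-balancing concern raised in the abstract — sub-tasks of skewed size — can be addressed generically by ranking k-combinations lexicographically and handing each worker a contiguous range of ranks; an unranking routine lets a worker jump straight to its first combination without enumerating earlier ones. This is a sketch of that standard partitioning idea, not the paper's specific CPU-GPU algorithm:

```python
from math import comb

def unrank_combination(n, k, rank):
    """Return the rank-th k-combination of range(n) in lexicographic order."""
    combo, x = [], 0
    for slots in range(k, 0, -1):
        # Advance x while the combinations starting with x cannot cover `rank`.
        while comb(n - x - 1, slots - 1) <= rank:
            rank -= comb(n - x - 1, slots - 1)
            x += 1
        combo.append(x)
        x += 1
    return combo

def chunk_ranges(n, k, workers):
    """Split the C(n, k) ranks into `workers` near-equal contiguous ranges,
    so every worker (CPU or GPU thread) gets a task of similar size."""
    total = comb(n, k)
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each worker then iterates from `unrank_combination(n, k, start)` through its range; because the ranges are equal-sized by construction, no thread is left with a disproportionately large subtree.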

Using Text-mining Method to Identify Research Trends of Freshwater Exotic Species in Korea (텍스트마이닝 (text-mining) 기법을 이용한 국내 담수외래종 연구동향 파악)

  • Do, Yuno;Ko, Eui-Jeong;Kim, Young-Min;Kim, Hyo-Gyeom;Joo, Gea-Jae;Kim, Ji Yoon;Kim, Hyun-Woo
    • Korean Journal of Ecology and Environment
    • /
    • v.48 no.3
    • /
    • pp.195-202
    • /
    • 2015
  • We identified research trends for freshwater exotic species in South Korea using text-mining methods in conjunction with bibliometric analysis. We used the scientific and common names of freshwater exotic species as search keywords, covering 1 mammal species, 3 amphibian and reptile species, 11 fish species, and 2 aquatic plant species. A total of 245 articles, including research articles and abstracts of conference proceedings published by 56 academic societies and institutes, were collected from scientific article databases. The number of articles increased through the 20th century (1900s); however, during the early 21st century (2000s) the number of published articles decreased slowly. The number of articles focusing on physiological and embryological research was significantly greater than that of taxonomic and ecological studies. Rainbow trout and Nile tilapia were the main research topics, specifically in physiological and embryological research associated with the aquaculture of these species. Ecological studies were conducted only on the distribution and effects of large-mouth bass and nutria. Concern about the ecological risk of freshwater exotic species has been expressed, yet because of this bias in research topics, the scientific information may be insufficient to resolve the doubts of interested individuals and policy makers. Research topics on freshwater exotic species will have to diversify in order to manage these species effectively.
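The kind of bibliometric tallying described above — articles per species and per period — reduces to simple counting once keyword hits are extracted. A minimal sketch with made-up records (the years and species below are hypothetical, not the study's data):

```python
from collections import Counter

# Hypothetical (year, species keyword) records standing in for keyword hits
# extracted from a bibliographic database.
articles = [
    (1985, "rainbow trout"), (1992, "rainbow trout"), (1995, "Nile tilapia"),
    (1998, "rainbow trout"), (2003, "largemouth bass"), (2007, "nutria"),
    (2011, "nutria"), (2014, "largemouth bass"),
]

def counts_by_decade(records):
    """Tally articles per (decade, species) to expose topic trends over time."""
    return Counter((year // 10 * 10, species) for year, species in records)

def species_totals(records):
    """Total article count per species, highlighting which taxa dominate."""
    return Counter(species for _, species in records)
```

Skew in `species_totals` (a few species dominating the counts) is precisely the research-topic bias the abstract points to.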

A Comparative Study on Methods for Outlier Test of Rainfall in Korea (국내 강우의 이상치검정 방법의 비교 연구)

  • Lee, Jung Sik;Shin, Chang Dong
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2018.05a
    • /
    • pp.359-359
    • /
    • 2018
  • Outliers are data points that deviate markedly from the rest of a sample; they are defined as values with a very low probability of actually occurring. In the annual-maximum rainfall series used to estimate design floods, errors due to instrument malfunction and recording mistakes by engineers occur, and extreme values driven by climate change, such as mega-typhoons and localized torrential rain, are observed as outliers. Because outliers can distort the inherent characteristics of the data and bias statistical results, an outlier-analysis procedure should be performed during frequency analysis to confirm the adequacy of the data. Current practice documents, such as the Design Flood Estimation Guidelines and the Commentary on the River Design Standards, describe the relevant procedures, but because the record lengths of domestic rainfall data are short, outlier analysis is often omitted from frequency analysis; if data bias due to outliers occurs, the resulting probability rainfall can be distorted. This study therefore performed outlier tests on rainfall data from major Korean cities. The target stations were metropolitan cities with relatively long observation records (Seoul, Busan, Daejeon, Daegu, Incheon, Gwangju, and Ulsan), and 25 rainfall series with durations of 10 minutes and 1 to 24 hours were used. Two methods known to have higher power than alternatives were adopted: the modified z-Score method based on the median absolute deviation (MAD) (Hoaglin, 1993), an improvement of the z-Score approach that standardizes values with the sample mean and standard deviation and flags values exceeding upper and lower limits, and the Box-Plot method (Tukey, 1969). The Box-Plot method divides the ordered data into quartiles of 25% each, partitioning the series into the median, the box, the whiskers, and outliers; values between the 25th and 75th percentiles form the box, values beyond the whiskers are classified as outliers, and the quartile plot makes it easy to see the distribution of the data, the locations of outliers, and any asymmetry. With the modified z-Score method, no outliers were found at the Seoul and Daegu stations, while 13 were found at Busan, 7 at Daejeon, 5 at Incheon, 32 at Gwangju, and 26 at Ulsan. With the Box-Plot method, 35 outliers were identified at Seoul, 39 at Busan, 32 at Daejeon, 38 at Daegu, 51 at Incheon, 61 at Gwangju, and 65 at Ulsan. Overall, the Box-Plot method flagged more outliers than the modified z-Score method; for each method, the outliers were examined by duration and by year, and the number of occurrences was analyzed by station. The results are expected to become more useful as additional stations and data are incorporated.
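Both outlier tests named in the abstract have compact stdlib implementations. The 0.6745 constant and 3.5 cutoff below follow the usual modified z-score convention (Hoaglin, 1993), and the 1.5 multiplier is Tukey's whisker rule; the quartile computation uses the simple median-of-halves convention, which can differ slightly from other quartile definitions:

```python
from statistics import median

def modified_z_outliers(data, threshold=3.5):
    """Modified z-score via the median absolute deviation (MAD):
    M_i = 0.6745 * (x_i - median) / MAD; |M_i| > threshold flags an outlier."""
    med = median(data)
    mad = median(abs(x - med) for x in data)
    if mad == 0:  # degenerate sample: more than half the values are identical
        return []
    return [x for x in data if abs(0.6745 * (x - med) / mad) > threshold]

def boxplot_outliers(data, k=1.5):
    """Tukey box-plot rule: values beyond Q1 - k*IQR or Q3 + k*IQR are outliers."""
    s = sorted(data)
    n = len(s)
    q1 = median(s[: n // 2])          # lower half (median excluded for odd n)
    q3 = median(s[(n + 1) // 2 :])    # upper half
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lo or x > hi]
```

The box-plot rule typically flags more points than the modified z-score (its fences are tighter relative to the bulk of the data), consistent with the station-by-station counts reported above.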
