• Title/Summary/Keyword: 랜덤추출 (random sampling)

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has produced new kinds of data, making the era of big data and artificial intelligence possible and opening a new chapter in convergence technology. Demand is also growing for analysis of data that programs could not handle in the past. In this paper, an analysis model for classifying unstructured data, a task often required in the era of big data, was designed and verified. Thesis summaries, main words, and sub-keywords were crawled from DBPia, a database was built using KoNLP's data dictionary, and words were tokenized through morpheme analysis. Nouns were then extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining the training data with Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, and decision tree) to the resulting dataset. The proposed classification technique can be useful in fields beyond thesis classification, such as civil complaint classification and other text-related analysis.
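The TF-IDF step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the plain weighting tf × log(N/df); the abstract does not say which TF-IDF variant the authors used, and the token lists below are hypothetical stand-ins for the nouns that morpheme analysis would produce.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the unsmoothed form tf * log(N / df); this is an assumption,
    not necessarily the variant used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical noun lists standing in for morpheme-analysis output.
docs = [
    ["data", "classification", "model"],
    ["data", "cipher", "attack"],
    ["data", "classification", "forest"],
]
weights = tf_idf(docs)
```

A term that occurs in every document ("data" here) receives weight 0, while a term confined to one document receives the largest weight, which is what makes the values usable as classification features.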

Performance Improvement of Power Attacks with Truncated Differential Cryptanalysis (부정차분을 이용한 전력분석 공격의 효율 향상)

  • Kang, Tae-Sun;Kim, Hee-Seok;Kim, Tae-Hyun;Kim, Jong-Sung;Hong, Seok-Hie
    • Journal of the Korea Institute of Information Security & Cryptology / v.19 no.1 / pp.43-51 / 2009
  • In 1998, Kocher et al. introduced the Differential Power Attack on block ciphers. This attack makes it possible to extract the secret key used in cryptographic primitives even when they are executed inside tamper-resistant devices such as smart cards. At FSE 2003 and 2004, Akkar and Goubin presented several masking methods to protect iterated block ciphers such as DES against the Differential Power Attack. These methods randomize the first few and last few (3 to 4) rounds of the cipher with independent random masks at each round, thereby disabling power attacks on the subsequent inner rounds. Since then, Handschuh and Preneel have shown how to attack Akkar's masking method using Differential Cryptanalysis. This paper shows how to combine Truncated Differential Cryptanalysis and the Power Attack to extract the secret key from intermediate unmasked values. It also shows how much more efficient our attacks are than the Handschuh-Preneel method in terms of the number of required plaintexts, even when some Hamming weights are measured with errors.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity in processed food manufacturing and processing businesses in advance. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was set as the number of supervised inspection detections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting the nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenues, operating duration, and number of employees, but also inspection track records and external climate data. After the feature extraction method was applied, company risk, item risk, environmental risk, and past violation history were derived as feature variables that affect the determination of nonconformity, and the machine learning algorithms were applied to the data. The F1-score of the decision tree, listed among the ensemble models, was much higher than those of the other models. Based on these results, official food control for food safety management is expected to be strengthened and to shift toward data- and evidence-based management and a scientific administrative system.
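For reference, the F1-score used to compare the models is the harmonic mean of precision and recall. The sketch below shows the standard definition; the counts are hypothetical and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inspection outcome: 30 nonconforming businesses flagged
# correctly, 10 conforming ones flagged wrongly, 20 nonconforming ones missed.
score = f1_score(tp=30, fp=10, fn=20)
```

With these counts precision is 0.75 and recall is 0.60, so the F1-score is about 0.667.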

A Study on the Effect of Virtual Reality Intervention on Cognitive Function in Individuals With Stroke Through Meta-analysis (메타분석을 통한 뇌졸중 환자의 인지기능에 대한 가상현실 중재 효과 연구)

  • Kwon, Jae Sung
    • Therapeutic Science for Rehabilitation / v.10 no.3 / pp.7-22 / 2021
  • Objective: The purpose of this study was to verify the effect of virtual reality interventions (VRIs) on cognitive function in individuals with stroke through a systematic literature review and meta-analysis. Methods: We reviewed randomized controlled trials (RCTs) published over the last 10 years using academic databases. PubMed, MEDLINE, and CINAHL were used for international studies, and DBpia, KISS, Kyoboscholar, and e-article were used for Korean studies. For the quantitative meta-analysis, outcomes were classified into three subgroups: general cognitive function (G-CF), attention and memory (A&M), and executive function (EF). Results: Nine RCTs were analyzed. The total number of participants was 271 (140 in the experimental group). The effect size (Cohen's d) was estimated using a random effects model. The effect sizes of the outcome subgroups were as follows: small to medium for G-CF (d=0.422; 95% CI: 0.101~0.742; p=0.010), small for A&M (d=0.249; 95% CI: -0.107~0.605; p=0.170), and medium for EF (d=0.666; 95% CI: 0.136~1.195; p=0.014). Conclusion: Considering the various stimuli provided by the virtual environment and the results of the available research, virtual reality should be applied to interventions for integrated cognitive functions. It would also be appropriate as an adjunct to traditional cognitive rehabilitation for stroke.
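The reported effect sizes, confidence intervals, and p-values can be cross-checked with a short calculation. This is a sketch that assumes a symmetric, normal-approximation 95% interval, which the abstract does not state explicitly; the function name is illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def p_from_ci(d, lower, upper):
    """Recover the standard error, z statistic, and two-sided p-value
    from an effect size and its 95% CI (normal approximation assumed)."""
    se = (upper - lower) / (2 * Z_95)
    z = d / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values as reported in the abstract: (d, CI lower, CI upper).
reported = {
    "G-CF": (0.422, 0.101, 0.742),
    "A&M": (0.249, -0.107, 0.605),
    "EF": (0.666, 0.136, 1.195),
}
results = {name: p_from_ci(*vals) for name, vals in reported.items()}
```

Under that assumption the recovered p-values agree with the reported 0.010, 0.170, and 0.014 to within rounding, and the A&M interval crossing zero matches its non-significant result.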

Predicting Functional Outcomes of Patients With Stroke Using Machine Learning: A Systematic Review (머신러닝을 활용한 뇌졸중 환자의 기능적 결과 예측: 체계적 고찰)

  • Bae, Suyeong;Lee, Mi Jung;Nam, Sanghun;Hong, Ickpyo
    • Therapeutic Science for Rehabilitation / v.11 no.4 / pp.23-39 / 2022
  • Objective: To summarize the clinical and demographic variables and the machine learning methods used to predict functional outcomes of patients with stroke. Methods: We searched PubMed, CINAHL, and Web of Science to identify articles published from 2010 to 2021. The search terms were "machine learning OR data mining AND stroke AND function OR prediction OR/AND rehabilitation". Articles exclusively using brain imaging techniques or deep learning methods, and articles without available full text, were excluded. Results: Nine articles were selected for this study. Support vector machines (19.05%) and random forests (19.05%) were the two most frequently used machine learning models. Five articles (55.56%) demonstrated that patients' initial and/or discharge assessment scores, such as the modified Rankin Scale (mRS) or the Functional Independence Measure (FIM), had a greater impact on functional outcomes than their clinical characteristics. Conclusions: This study showed that initial and/or discharge assessment scores such as the mRS or FIM could influence functional outcomes more than clinical characteristics. Evaluating and reviewing the initial and/or discharge functional outcomes of patients with stroke might be required to develop optimal therapeutic interventions that enhance their functional outcomes.

An Iterative Digital Image Watermarking Technique using Encrypted Binary Phase Computer Generated Hologram in the DCT Domain (DCT 영역에서 암호화된 이진 위상 컴퓨터형성 홀로그램을 이용한 반복적 디지털 영상 워터마킹 기술)

  • Kim, Cheol-Su
    • Journal of Korea Society of Industrial Information Systems / v.14 no.3 / pp.15-21 / 2009
  • In this paper, we propose an iterative digital image watermarking technique using an encrypted binary phase computer generated hologram in the discrete cosine transform (DCT) domain. In the embedding process, instead of using the hidden image directly, we use a simulated annealing algorithm to generate a binary phase computer generated hologram (BPCGH) that can reconstruct the hidden image perfectly. We then repeat the hologram and encrypt it through an XOR operation with a key image composed of randomly generated binary phase components. We multiply the encrypted watermark by the weight function, embed it into the DC coefficients in the DCT domain of the host image, and perform an inverse DCT. In the extraction process, we compare the DC coefficients of the watermarked image and the original host image in the DCT domain, divide the result by the weight function, and decrypt it using an XOR operation with the key image. We recover the hidden image by applying an inverse Fourier transform to the decrypted watermark. Finally, we compute the correlation between the original and recovered hidden images to determine whether a watermark exists in the host image. Because the proposed technique uses hologram information of the hidden image, which consists of binary values, together with encryption, it is very secure and robust against external attacks such as compression, noise, and cropping. We confirmed these advantages through computer simulations.
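The XOR encryption and decryption of the binary hologram rely on the fact that applying the same key twice restores the original bits. The sketch below illustrates only that step; it assumes the two phase values are encoded as 0 and 1, and the short bit lists are hypothetical stand-ins for a flattened hologram and key image.

```python
import random


def xor_bits(bits, key):
    """Element-wise XOR of two equal-length binary sequences."""
    return [b ^ k for b, k in zip(bits, key)]


rng = random.Random(0)  # fixed seed so the example is reproducible
# Hypothetical flattened binary hologram and randomly generated key image.
hologram = [rng.randint(0, 1) for _ in range(16)]
key = [rng.randint(0, 1) for _ in range(16)]

encrypted = xor_bits(hologram, key)   # embedding side
decrypted = xor_bits(encrypted, key)  # extraction side, same key
```

Without the key image, the encrypted bits carry no usable information about the hologram, which is the source of the security the abstract claims for this step.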

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.1 / pp.229-249 / 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market. When a company in the Korean stock market is designated as an administrative issue, the market treats the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that detects the possibility of such a designation early from companies' financial ratios and helps investors manage portfolio risk. The independent variables were 21 financial ratios representing profitability, stability, activity, and growth. Financial data were sampled for administrative-issue and non-administrative-issue companies from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression analysis, decision tree, support vector machine, random forest, and LightGBM were used to predict the designation of administrative issues. According to the results, LightGBM was the best prediction model with 82.73% classification accuracy, and the decision tree had the lowest classification accuracy at 71.94%. An examination of the top three variables by importance in the decision tree-based learning models showed that the financial variables common to each model were ROE (Net profit) and Capital stock turnover ratio, which are relatively important variables in designating administrative issues. In general, the ensemble learning models showed higher predictive performance than the single learning models.

The Validity Test of Statistical Matching Simulation Using the Data of Korea Venture Firms and Korea Innovation Survey (벤처기업정밀실태조사와 한국기업혁신조사 데이터를 활용한 통계적 매칭의 타당성 검증)

  • An, Kyungmin;Lee, Young-Chan
    • Knowledge Management Research / v.24 no.1 / pp.245-271 / 2023
  • The shift to the data economy requires new kinds of analysis beyond conventional research in the management field. Data matching refers to a technique or processing method that combines data sets collected from different samples of the same population. In this study, statistical matching was performed using random hot deck and Mahalanobis distance functions on data from the 2020 Survey of Korea Venture Firms and the 2020 Korea Innovation Survey. Among the variables used for the statistical matching simulation, industry and number of workers were required to match exactly, and region, business power, listed market, and sales were set as common variables. The simulation was verified with a mean test and kernel density. The analysis showed that, although the mean test indicated a difference, the kernel density showed a similar pattern, so statistical matching was judged to be appropriate. This study attempted to broaden the range of research methods by experimenting with a data matching methodology that has not been sufficiently tried in the management field, and it suggests implications in terms of data utilization and diversity.
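A random hot deck with exact-match donation classes, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`industry`, `size`, `rnd_spend`, `sales`) and values are hypothetical, and the Mahalanobis-distance matching on the common variables is not covered.

```python
import random
from collections import defaultdict


def random_hot_deck(recipients, donors, class_keys, target, seed=0):
    """For each recipient, copy `target` from a randomly chosen donor
    that matches exactly on every key in `class_keys`."""
    rng = random.Random(seed)
    # Group donors into donation classes keyed by the exact-match variables.
    pools = defaultdict(list)
    for donor in donors:
        pools[tuple(donor[k] for k in class_keys)].append(donor)
    matched = []
    for rec in recipients:
        pool = pools.get(tuple(rec[k] for k in class_keys))
        # No donor in the class: leave the value missing.
        value = rng.choice(pool)[target] if pool else None
        matched.append({**rec, target: value})
    return matched


# Hypothetical records from two surveys of the same population.
donors = [
    {"industry": "software", "size": "small", "rnd_spend": 120},
    {"industry": "software", "size": "small", "rnd_spend": 95},
    {"industry": "bio", "size": "medium", "rnd_spend": 300},
]
recipients = [
    {"industry": "software", "size": "small", "sales": 10},
    {"industry": "bio", "size": "medium", "sales": 40},
    {"industry": "retail", "size": "small", "sales": 5},
]
matched = random_hot_deck(recipients, donors, ["industry", "size"], "rnd_spend")
```

Recipients whose class contains no donor stay unmatched, which is one reason a validity check on the fused data, such as the mean test and kernel density comparison in the study, is needed.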

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.121-142 / 2023
  • As the number and weight of imported food shipments steadily increase, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspections at the customs clearance stage. However, because time, cost, and resources are limited, a data-based safety management plan for imported food is needed. In this study, we tried to increase the efficiency of on-site inspection by building a machine learning prediction model that pre-selects the companies expected to fail before the on-site inspection. We collected basic information on 303,272 foreign food facilities and processing businesses from the Integrated Food Safety Information Network, together with 1,689 cases of on-site inspection data from 2019 to April 2022. After preprocessing the foreign food facility data, only the records subject to on-site inspection were extracted using the foreign food facility_code. The resulting dataset consisted of 1,689 records and 103 variables. Of the 103 variables, those whose Theil-U index was '0' were removed, and after reduction by Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We built eight different models, performed hyperparameter tuning through 5-fold cross validation, and evaluated the performance of the resulting models. Because the purpose of selecting companies for on-site inspection is to catch nonconforming companies, the goal was to maximize recall, which is the probability of judging nonconforming companies as nonconforming. Among the machine learning algorithms applied, the Random Forest model, which had the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy, was evaluated as the best model. Finally, we applied Kernel SHAP (SHapley Additive exPlanations) to present the reason each individual facility was selected as nonconforming, and we discuss applicability to the on-site inspection facility selection system. Based on these results, an imported food management system built on a data-based scientific risk management model is expected to contribute to the efficient use of limited resources such as manpower and budget.
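The macro-averaged recall used as a selection criterion can be sketched as follows. This shows the usual definition (per-class recall averaged with equal weight), which is assumed to be what "Recall_macro" denotes; the labels below are hypothetical, with 1 standing for nonconforming.

```python
def macro_recall(y_true, y_pred):
    """Average of per-class recall, each class weighted equally."""
    labels = sorted(set(y_true))
    recalls = []
    for label in labels:
        # Number of true instances of this class.
        support = sum(1 for t in y_true if t == label)
        # True instances of this class that were predicted correctly.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


# Hypothetical inspection labels: 1 = nonconforming, 0 = conforming.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
score = macro_recall(y_true, y_pred)
```

Here recall is 3/4 for the nonconforming class and 4/6 for the conforming class, giving a macro recall of about 0.708. Because each class counts equally, a model cannot score well by ignoring the rarer class.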

Evaluation of Applicability of Sea Ice Monitoring Using Random Forest Model Based on GOCI-II Images: A Study of Liaodong Bay 2021-2022 (GOCI-II 영상 기반 Random Forest 모델을 이용한 해빙 모니터링 적용 가능성 평가: 2021-2022년 랴오둥만을 대상으로)

  • Jinyeong Kim;Soyeong Jang;Jaeyeop Kwon;Tae-Ho Kim
    • Korean Journal of Remote Sensing / v.39 no.6_2 / pp.1651-1669 / 2023
  • Sea ice currently covers approximately 7% of the world's ocean area, is concentrated primarily in polar and high-latitude regions, and is subject to seasonal and annual variations. Because sea ice forms in various types over a large spatial scale, and because oil and gas exploration and other marine activities are rapidly increasing, it is very important to analyze the area and type of sea ice through time series monitoring. Research on the type and area of sea ice is currently based on high-resolution satellite images and field measurement data, but monitoring that relies on field measurements is limited. High-resolution optical satellite images can visually detect and identify types of sea ice over a wide area, and gaps in sea ice monitoring can be compensated for using the Geostationary Ocean Color Imager-II (GOCI-II), an ocean satellite with short temporal resolution. This study examined the feasibility of sea ice monitoring by training a rule-based machine learning model on training data produced from high-resolution optical satellite images and performing detection on GOCI-II images. Training data were extracted from Liaodong Bay in the Bohai Sea for 2021 to 2022, and a Random Forest (RF) model using GOCI-II was constructed. Its results were compared qualitatively and quantitatively with sea ice areas obtained from the existing normalized difference snow index (NDSI) and from high-resolution satellite images. Unlike the NDSI-based results, which underestimated the sea ice area, this study detected relatively detailed sea ice areas and confirmed that sea ice can be classified by type, enabling sea ice monitoring. If the accuracy of the detection model is improved in the future by continuing to build training data and by incorporating factors that influence sea ice formation, the model is expected to be usable for sea ice monitoring in high-latitude ocean areas.
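The NDSI baseline mentioned above is a normalized difference of two band reflectances. The sketch below uses the conventional green and shortwave-infrared pairing; the abstract does not state which bands the study used, so the band choice and the reflectance values here are assumptions.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    denom = band_a + band_b
    return (band_a - band_b) / denom if denom else 0.0


def ndsi(green, swir):
    """Conventional NDSI: (green - SWIR) / (green + SWIR)."""
    return normalized_difference(green, swir)


# Hypothetical reflectances for a bright, ice-like pixel.
value = ndsi(green=0.6, swir=0.1)
```

Snow and ice reflect strongly in the visible range and weakly in the shortwave infrared, so such pixels yield high index values (about 0.71 in this example), while open water yields low ones.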