Search | Korea Science

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

Shin, Hyun-Kyung
- Journal of Korea Multimedia Society / v.13 no.12 / pp.1786-1797 / 2010
Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been continual and varied efforts to improve the core parts of content retrieval technologies by separating out segmentation-related issues through the use of semi-structured documents, e.g., invoices. In this paper we propose classification models for text lines in invoice documents, in which text lines are clustered into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrates on the performance of machine learning based models, grouped into linear-discriminant-analysis (LDA) and non-LDA (logic based) approaches. In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM classifiers were used; in the non-LDA group, decision tree, random forest, and boosting were used. We describe the details of feature vector construction and of the model and parameter selection processes, including training and validation. We also present experimental results comparing the training and classification error levels of the models employed.

The Related Factors to Perceived gastritis or Perceived enteritis in High school seniors -the 2009 Korea Youth Risk Behavior Web-based Survey- (고등학교 3학년 학생들이 인지한 위염 및 장염 관련요인 -2009년 청소년 건강행태 온라인 조사 자료를 중심으로-)

Bea, Sang-Sook
- Journal of the Korea Academia-Industrial cooperation Society / v.13 no.2 / pp.668-677 / 2012
This study analyzed the factors related to perceived gastritis or perceived enteritis among 11,753 Korean high school seniors who participated in the 2009 Korea Youth Risk Behavior Web-based Survey (KYRBWS). Of the subjects, 5,685 (47.6%) were male and 6,068 (52.4%) were female; 8.7% of the students responded that they had suffered from gastritis or enteritis for a long time, and the females had a slightly higher attack rate of gastritis or enteritis. Survey logistic regression models and decision tree analysis were used to calculate odds ratios and 95% confidence intervals. The results showed that stress and health behaviors affected the risk of gastritis and enteritis: a lower level of perceived health, smoking, heavy drinking or starting to drink before the age of 13, and a higher level of perceived stress significantly affected the risk of gastritis or enteritis in the subjects (p<.001).
https://doi.org/10.5762/KAIS.2012.13.2.668
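
The abstract above reports odds ratios with 95% confidence intervals from logistic regression. As a minimal sketch of how such a figure is usually derived from one fitted coefficient, the function below exponentiates the coefficient and its Wald interval. The function name and the example numbers are hypothetical and are not taken from the paper; the standard error is assumed to be supplied already by the (survey-weighted) model fit.

```python
import math

def odds_ratio_ci(beta, std_err, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for one coefficient.

    beta    : logistic regression coefficient on the log-odds scale
    std_err : its standard error (assumed given by the fitted model)
    z       : normal quantile; 1.96 gives an approximate 95% interval
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * std_err)
    upper = math.exp(beta + z * std_err)
    return odds_ratio, lower, upper

# Hypothetical coefficient, for illustration only.
ratio, low, high = odds_ratio_ci(beta=0.5, std_err=0.1)
print(f"OR = {ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at roughly the 5% level.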

A study on integrating and discovery of semantic based knowledge model (의미 기반의 지식모델 통합과 탐색에 관한 연구)

Chun, Seung-Su
- Journal of Internet Computing and Services / v.15 no.6 / pp.99-106 / 2014
In recent years, methods for generating and analyzing knowledge models have been proposed that use natural language and formal language processing together with artificial intelligence algorithms. Semantic-based knowledge models have been used effectively for decision trees and for problem solving in specific contexts. They have been based on static generation and regression analysis, trend analysis with behavioral models, and simulation support for macroeconomic forecasting, especially in a variety of complex systems and in social network analysis. In this study, to integrate knowledge-based models, we propose formal methods and algorithms for integrating inter-topic models derived from text mining. First, we derive a method for automatically converting a keyword map obtained from text mining into a knowledge map and integrating it into the semantic knowledge model. We then propose an algorithm for projecting a significant topic map from the keyword map and the semantically equivalent model. The result is an integrated semantic-based knowledge model.
https://doi.org/10.7472/jksii.2014.15.6.99

A Case Study on Data Mining Techniques Using Association Analysis (연관분석을 이용한 데이터마이닝 기법에 관한 사례연구)

Ryu, Gwi-Yeol;Mun, Yeong-Su;Choi, Seung-Du
- Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2006.04a / pp.109-120 / 2006
The current computing environment produces more information than people can absorb. People want information that they can understand and accept easily, and they may want not only simple information but also knowledge. That is why data mining has become central to information processing. We use RFM analysis to create a customer score. Customers are classified into five groups (most excellent, excellent, common, lower, lowest) for various marketing activities. We can find significant patterns in each group and classify customers, from loyal customers to customers likely to leave in the near future, using what this study calls indirect data mining (e.g. association analysis) and direct data mining (e.g. decision tree, logistic regression analysis, etc.). Our research focuses on advanced models that apply association rules in data mining. Our results indicate that indirect and direct data mining seem to produce the same outputs, but the former shows a clearer pattern than the latter.
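
The abstract above scores customers with RFM (recency, frequency, monetary) analysis and sorts them into five groups. The sketch below shows one common way to do this, assuming rank-based scores from 1 to 5 on each dimension and an equal-width split of the summed score. The function names, the scoring rule, and the sample data are all assumptions made for illustration; the paper's actual scoring scheme is not given in the abstract.

```python
from datetime import date

def rank_scores(values, higher_is_better=True):
    """Score each value from 1 (worst) to 5 (best) by its rank.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, index in enumerate(order):
        scores[index] = 1 + (rank * 5) // len(values)
    return scores

def rfm_groups(purchases, today):
    """Map each customer to a group from 1 (lowest) to 5 (best).

    purchases: dict of customer id -> list of (purchase date, amount).
    """
    customers = sorted(purchases)
    recency = [(today - max(d for d, _ in purchases[c])).days for c in customers]
    frequency = [len(purchases[c]) for c in customers]
    monetary = [sum(a for _, a in purchases[c]) for c in customers]

    r = rank_scores(recency, higher_is_better=False)  # fewer days is better
    f = rank_scores(frequency)
    m = rank_scores(monetary)

    groups = {}
    for c, rs, fs, ms in zip(customers, r, f, m):
        total = rs + fs + ms                     # ranges from 3 to 15
        groups[c] = 1 + (total - 3) * 5 // 13    # five equal-width bins
    return groups

# Hypothetical purchase histories, for illustration only.
sample = {
    "a": [(date(2006, 3, 30), 100)] * 5,
    "b": [(date(2006, 3, 1), 100)] * 4,
    "c": [(date(2006, 2, 1), 100)] * 3,
    "d": [(date(2006, 1, 1), 100)] * 2,
    "e": [(date(2005, 12, 1), 100)] * 1,
}
print(rfm_groups(sample, today=date(2006, 4, 1)))
```

The resulting group labels could then serve as the target for the decision tree or logistic regression step, or as a segment key for association analysis.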

A Study on Predictors of Academic Achievement in College Students : Focused on J University (대학생의 학업성취도 예측요인 연구 : J 대학을 중심으로)

Son, Yo-Han;Kim, In-Gyu
- The Journal of the Korea Contents Association / v.20 no.1 / pp.519-529 / 2020
The purpose of this study is to establish a model for predicting the academic achievement of college students and to reveal the interrelationships and relative influence of each factor. To this end, we surveyed the personal factors and learning strategy factors of 1,310 learners at J University and analyzed the discriminant factors and patterns of the predictors of academic achievement through decision tree analysis, a data mining method. Binary logistic regression analysis was performed to examine the relative effects of each factor. As a result, the most important factor for predicting academic achievement was efficacy, and other factors such as motivation, time management, and depression were also predictive of academic achievement. The patterns predicting high academic achievement were high efficacy combined with high time management, and high learning motivation even when efficacy was moderate. Low efficacy, low learning motivation, and high depression were shown to decrease academic achievement. Based on these results, the study suggests enhancing efficacy and motivation, strengthening time management education, and managing negative emotions in order to improve the academic achievement of college students.
https://doi.org/10.5392/JKCA.2020.20.01.519

Particulate Matter Prediction using Quantile Boosting (분위수 부스팅을 이용한 미세먼지 농도 예측)

Kwon, Jun-Hyeon;Lim, Yaeji;Oh, Hee-Seok
- The Korean Journal of Applied Statistics / v.28 no.1 / pp.83-92 / 2015
For national health, it is important to develop an accurate method for predicting atmospheric particulate matter (PM), because exposure to such fine dust can trigger not only respiratory diseases but also dermatoses, ophthalmopathies and cardiovascular diseases. The National Institute of Environmental Research (NIER) employs a decision tree to predict bad weather days with a high PM concentration. However, the decision tree method, quite apart from its inherent instability, is not a suitable model for predicting bad weather days, which represent only 4% of the entire data. In this paper, while presenting the inaccuracy and inappropriateness of the method used by the NIER, we present the utility of a new prediction model that adopts boosting with quantile loss functions. We evaluate the performance of the new method over various values of $\tau$ and justify the proposed method through comparison.
https://doi.org/10.5351/KJAS.2015.28.1.083
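
The abstract above relies on boosting with quantile loss functions indexed by $\tau$. As a minimal sketch of that loss (often called the pinball loss), the code below computes it, gives the negative gradient that a gradient boosting step would fit, and checks numerically that the best constant prediction is the $\tau$-quantile of the data. The function names and sample values are hypothetical, and this is not the authors' implementation.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        u = y - f
        # Under-prediction (u >= 0) is weighted by tau,
        # over-prediction by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(y_true)

def negative_gradient(y_true, y_pred, tau):
    """Pseudo-residuals that a gradient boosting step would fit."""
    return [tau if y > f else tau - 1.0 for y, f in zip(y_true, y_pred)]

def best_constant(y_true, tau):
    """Observed value that minimizes the loss as a constant prediction."""
    n = len(y_true)
    return min(y_true, key=lambda c: pinball_loss(y_true, [c] * n, tau))

# Hypothetical concentrations, for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(best_constant(values, tau=0.75))
```

With a high $\tau$, missing a high-concentration day costs more than a false alarm, which is why this loss is a natural fit when the days of interest are a small share of the data.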

A Best Effort Classification Model For Sars-Cov-2 Carriers Using Random Forest

Mallick, Shrabani;Verma, Ashish Kumar;Kushwaha, Dharmender Singh
- International Journal of Computer Science & Network Security / v.21 no.1 / pp.27-33 / 2021
The whole world is now dealing with the coronavirus, which has turned out to be one of the most widespread and long-lived pandemics of our times. Reports reveal that the infectious disease has taken a toll on almost 80% of the world's population. While much research addresses the prediction of growth and transmission through symptomatic carriers of the virus, it cannot be ignored that pre-symptomatic and asymptomatic carriers also play a crucial role in spreading the virus. Classification algorithms, ranging from simple feature-based classification to Convolutional Neural Networks (CNNs), have been widely used to classify different types of COVID-19 carriers. This research paper aims to present a novel technique that uses a Random Forest machine learning algorithm with hyper-parameter tuning to classify different types of COVID-19 carriers, so that these carriers can be accurately characterized and dealt with in time to contain the spread of the virus. The main reason for selecting Random Forest is that it works on the powerful concept of "the wisdom of the crowd", which produces an ensemble prediction. The results are quite convincing, and the model records an accuracy score of 99.72%. The results have been compared with those of K-Nearest Neighbour, logistic regression, support vector machine (SVM), and Decision Tree algorithms applied to the same dataset, for which the accuracy scores were 78.58%, 70.11%, 70.38%, and 99% respectively, thus establishing the soundness and suitability of our approach.
https://doi.org/10.22937/IJCSNS.2021.21.1.5

A Study on the Prediction of the Surface Drifter Trajectories in the Korean Strait (대한해협에서 표층 뜰개 이동 예측 연구)

Ha, Seung Yun;Yoon, Han-Sam;Kim, Young-Taeg
- Journal of Korean Society of Coastal and Ocean Engineers / v.34 no.1 / pp.11-18 / 2022
In order to improve the accuracy of particle tracking prediction techniques near the Korea Strait, this study compared and analyzed a particle tracking model based on a numerical seawater flow model and a machine learning based particle tracking model that uses field observation data. The data used in the study were the surface drifter buoy trajectory data observed in the Korea Strait, prediction data from machine learning (linear regression, decision tree) using tide and wind data from three observation stations (Gageo Island, Geoje Island, Gyoboncho), and prediction data from numerical models (ROMS, MOHID). These three data sets were compared using three error evaluation methods: Correlation Coefficient (CC), Root Mean Square Error (RMSE), and Normalized Cumulative Lagrangian Separation (NCLS). As a final result, the decision tree model had the best prediction accuracy in CC and RMSE, and the MOHID model had the best prediction results in NCLS.
https://doi.org/10.9765/KSCOE.2022.34.1.11
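
Two of the error measures named in the abstract above, the correlation coefficient (CC) and the root mean square error (RMSE), can be sketched as follows for one coordinate of an observed and a predicted trajectory. The function names and the sample values are hypothetical, the CC is assumed to be the Pearson coefficient, and the third measure (NCLS) is not covered here.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def correlation(observed, predicted):
    """Pearson correlation coefficient between two sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical drifter longitudes (degrees), for illustration only.
obs = [128.90, 128.95, 129.02, 129.10]
pred = [128.91, 128.97, 129.00, 129.12]
print(f"CC = {correlation(obs, pred):.3f}, RMSE = {rmse(obs, pred):.3f}")
```

A high CC with a large RMSE indicates a prediction that follows the shape of the track but is offset from it, which is one reason to report both measures.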

Corporate Corruption Prediction Evidence From Emerging Markets

Kim, Yang Sok;Na, Kyunga;Kang, Young-Hee
- Asia-Pacific Journal of Business / v.12 no.4 / pp.13-40 / 2021
Purpose - The purpose of this study is to predict corporate corruption in emerging markets such as Brazil, Russia, India, and China (BRIC) using different machine learning techniques. Since corruption is a significant problem that can affect corporate performance, particularly in emerging markets, it is important to correctly identify whether a company engages in corrupt practices. Design/methodology/approach - To address the research question, we employ predictive analytic techniques (machine learning methods). Using the World Bank Enterprise Survey Data, this study evaluates predictive models generated by seven supervised learning algorithms: k-Nearest Neighbour (k-NN), Naïve Bayes (NB), Decision Tree (DT), Decision Rules (DR), Logistic Regression (LR), Support Vector Machines (SVM), and Artificial Neural Network (ANN). Findings - We find that DT, DR, SVM and ANN create highly accurate models (over 90% accuracy). Among the various factors, firm age is the most significant, while several other determinants, such as source of working capital, top manager experience, and the number of permanent full-time employees, also contribute to company corruption. Research implications or Originality - This research demonstrates how machine learning can be applied to predict corporate corruption and also identifies the major causes of corporate corruption.
https://doi.org/10.32599/apjb.12.4.202112.13

The Role of Data Technologies with Machine Learning Approaches in Makkah Religious Seasons

Waleed Al Shehri
- International Journal of Computer Science & Network Security / v.23 no.8 / pp.26-32 / 2023
Hajj is a fundamental pillar of Islam that all Muslims must perform at least once in their lives, whereas Umrah can be performed several times a year, depending on people's abilities. Every year, Muslims from all over the world travel to Saudi Arabia to perform Hajj. Hajj and Umrah pilgrims face multiple issues because of the large number of people gathered at the same time and place during the event. Therefore, a system is needed to help people carry out the Hajj and Umrah procedures smoothly. Multiple devices are already installed in Makkah, but it would be better to propose data architectures supported by machine learning approaches. The proposed system analyzes the services provided to the pilgrims with regard to gender, location, and foreign pilgrims. The proposed system addresses the research problem of analyzing the Hajj pilgrim dataset as effectively as possible. In addition, visualizations of the proposed method show the system's performance using data architectures. Machine learning algorithms are used to classify whether male pilgrims are more significant than female pilgrims. Several algorithms were used to classify the data, including logistic regression, Naive Bayes, K-nearest neighbors, decision trees, random forests, and XGBoost. The decision tree achieved an accuracy of 62.83% and K-nearest neighbors 62.86%; the other classifiers had lower accuracy. The open-source dataset was stored and analyzed using different data architectures, and machine learning approaches were then used to classify the dataset.
https://doi.org/10.22937/IJCSNS.2023.23.8.4
