Integrated Search | Korea Science

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

Shin, Hyun-Kyung
- Journal of Korea Multimedia Society, Vol. 13, No. 12, pp. 1786-1797, 2010
Automatic understanding of the contents of a document image is a very hard problem, because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been continual and varied efforts to improve the core parts of content retrieval technology by separating out segmentation-related issues through the use of semi-structured documents, e.g., invoices. In this paper we propose classification models for text lines in invoice documents, in which text lines are grouped into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrates on the performance of machine learning based models in two groups: linear-discriminant-analysis (LDA) and non-LDA (logic based). In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM were used; in the non-LDA group, decision tree, random forest, and boost were used. We describe the details of feature vector construction and of the model and parameter selection processes, including training and validation. We also present experimental results comparing training and classification error levels for the models employed.

Using Naïve Bayes Classifier and Confusion Matrix Spelling Correction in OCR

노경목;김창현;천민아;김재훈
- Korean Language Information Society: Conference Proceedings, 28th Conference on Hangul and Korean Language Information Processing (2016), pp. 310-312, 2016
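To reduce errors in OCR (Optical Character Recognition), this paper proposes a spelling correction system that uses a confusion matrix of correction word pairs and a naïve Bayes classifier. The system corrects only spelling errors in Hangul. The corpora used in the experiments are a raw Korean corpus, an OCR output corpus, and an OCR ground-truth corpus. From the raw Korean corpus we built a grapheme-level language model and a prefix corpus for retrieving correction candidates; from the OCR output corpus and the OCR ground-truth corpus we extracted correction word pairs, decomposed them into graphemes to build a confusion matrix, and used it to build an error model. Correction candidates are found using the prefix corpus, and the naïve Bayes classifier presents the n candidates with the highest probability. A correction was judged successful if the correct word (eojeol) was among the n candidates. As a result, for an OCR system with a recognition rate of about 97.73%, presenting three correction candidates raised the recognition rate by about 0.28 percentage points to 98.01%. This result covers only the correction of Hangul errors; we expect better results if special characters, digits, and the like are also handled in future work.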
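The combination of a language-model prior with a confusion-matrix error model described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it works on Latin characters rather than Hangul graphemes, uses word frequencies as the prior, considers only same-length candidates instead of a prefix lookup, and all function names, the toy data, and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter

def build_confusion(pairs):
    """Count how often each intended symbol was read as each observed symbol."""
    counts, totals = Counter(), Counter()
    for observed, intended in pairs:
        for o, i in zip(observed, intended):
            counts[(i, o)] += 1
            totals[i] += 1
    return counts, totals

def error_logprob(observed, candidate, counts, totals, alphabet_size, alpha=1.0):
    """log P(observed | candidate), assuming symbols are corrupted independently.

    Add-alpha smoothing keeps unseen confusions from having zero probability.
    """
    logp = 0.0
    for o, c in zip(observed, candidate):
        logp += math.log((counts[(c, o)] + alpha) / (totals[c] + alpha * alphabet_size))
    return logp

def rank_candidates(observed, lexicon, counts, totals, alphabet_size, n=3):
    """Return the n same-length lexicon words with the highest prior * error score."""
    total_words = sum(lexicon.values())
    scored = []
    for cand, freq in lexicon.items():
        if len(cand) != len(observed):
            continue  # simplification: substitutions only
        prior = math.log(freq / total_words)
        scored.append((prior + error_logprob(observed, cand, counts, totals, alphabet_size), cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:n]]

# Toy (OCR output, ground truth) pairs and word frequencies; purely illustrative.
pairs = [("c1ear", "clear"), ("he11o", "hello"), ("clean", "clean")]
lexicon = {"clear": 5, "clean": 5, "hello": 3}
counts, totals = build_confusion(pairs)
ranked = rank_candidates("c1ean", lexicon, counts, totals, alphabet_size=36)
print(ranked)
```

In this toy setup the confusion counts record that "l" is often read as "1", so the candidate that differs from the OCR output only in that position is ranked first.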

Developing on the Soil Moisture Index(SMI) for forecast by using AQUA AMSR-E

Park Seung-Hwan;Park Jong-Seo;Park Jeong-Hyun;Kim Kum-Lan;Kim Byung-Sun
- Korean Society of Remote Sensing: Conference Proceedings, Proceedings of ISRS 2004, pp. 415-418, 2004
This study aims to improve the precision of soil moisture information. We used AQUA AMSR-E data obtained by the Direct Receiving System at the Korea Meteorological Administration (KMA). Although Soil Moisture Information (SMI) is known to help numerical weather models produce realistic results, it could not be applied because the spatial resolution of the data is too low. We therefore tried to improve the spatial resolution by combining the AMSR-E data with a Digital Elevation Model (DEM) and the Normalized Difference Vegetation Index (NDVI) from AQUA MODIS, and statistically compared the differences between the resulting information. The result is more precise than the simple algorithm based on a polarization ratio, and a better result for practical use in forecasting could be obtained if more detailed vegetation temperature information is applied.

Using Naïve Bayes Classifier and Confusion Matrix Spelling Correction in OCR

노경목;김창현;천민아;김재훈
- Korean Institute of Information Scientists and Engineers, Language Engineering Research Group: Conference Proceedings (Hangul and Korean Language Information Processing), 28th Conference on Hangul and Korean Language Information Processing (2016), pp. 310-312, 2016

The Effect of Blog-Based Co-op Co-op Learning on Information Ethics for The Elementary Students

김길모;서승덕;김성식
- Journal of The Korean Association of Information Education, Vol. 14, No. 3, pp. 375-383, 2010
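This study developed and applied a blog-based Co-op Co-op (autonomous cooperative) learning model to improve elementary school students' information ethics awareness. To this end, the blog-based Co-op Co-op learning developed in this study was applied to fifth- and sixth-grade elementary school students and its effects were analyzed. The treatment ran for six class sessions over three weeks; the experimental group received the blog-based Co-op Co-op learning model, and the comparison group received a traditional Co-op Co-op learning model. To measure the improvement in information ethics awareness, an independent-samples t-test comparing the means of the two groups was conducted. The results showed that the blog-based model had a positive effect on learners' information ethics awareness. In addition, when the change in learners' information ethics awareness was analyzed by domain, the blog-based model showed significant results in every domain.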
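The independent-samples t-test mentioned above compares the means of two separate groups. The sketch below assumes the pooled-variance (Student) form, which the abstract does not specify; the function name and the scores are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical improvement scores for the two groups.
blog_group = [4, 5, 6, 5, 7]
traditional_group = [3, 4, 4, 5, 3]
t_stat, df = independent_t(blog_group, traditional_group)
print(round(t_stat, 3), df)
```

The resulting statistic would then be compared against the t distribution with `df` degrees of freedom to judge significance.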

Predictions of VO₂max Using Metabolical Responses in Submaximal Exercise and 1,200 m Running for Male, and the Validity of These Prediction Models

임재형;전유정;장혁기;김효중;김기홍;이병근
- Exercise Science, Vol. 21, No. 2, pp. 231-242, 2012
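The purpose of this study was to develop models that estimate maximal oxygen uptake (VO₂max) from metabolic responses during submaximal exercise under the Bruce protocol, which is widely used in graded exercise testing, from heart rate records at key time points, and from 1,200 m run times, and to analyze the validity of estimation across the models. The subjects were 255 adult men (133 for the 1,200 m run). A maximal graded exercise test was performed using the Bruce protocol, and metabolic responses were measured at the end of stage 1 (3 min) and stage 2 (6 min). The measured items were VO₂ (mL/kg/min), VCO₂ (mL/kg/min), VE (L/min), HR (bpm), the time for HR to reach 150 bpm and 170 bpm, the difference between heart rate at 6 min and 3 min of the Bruce protocol, and the 1,200 m run time. Multiple regression analysis was performed to develop models that compute VO₂max from physical data and metabolic responses during submaximal exercise. For the full model, in which all variables were entered simultaneously, R was 0.642 (p<.01), the standard error of estimate (SEE) was 4.38 mL/kg/min, and the coefficient of variation (CV) was 10.8% (p<.01), but multicollinearity was present. For the 3-min models 1 and 2, analyzed by stepwise regression, R was 0.341 and 0.461, SEE was 6.05 and 5.72 mL/kg/min, and CV was 14.9% and 14.1% (p<.01), with no multicollinearity. For the 6-min models 1 and 2, R was 0.350 and 0.456 (p<.01), SEE was 6.03 and 5.74 mL/kg/min, and CV was 14.9% and 14.2% (p<.01), with no multicollinearity. R was somewhat low for the 6-min HR minus 3-min HR model (0.150), the HR150 model (0.151), and the HR170 model (0.154); SEE was similar at 6.36 to 6.37 mL/kg/min, and CV was also similar at 15.7%. For the 1,200 m run model, R was 0.444, SEE was 4.82 mL/kg/min, and CV was 11.9%. In conclusion, among the VO₂max estimation methods based on the Bruce protocol, the 6-min and 3-min models using metabolic responses were found to be suitable when practical usefulness and simplicity are considered, while the heart rate models and the run model showed somewhat lower estimation accuracy.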

Developing an Investment Framework based on Markowitz's Portfolio Selection Model Integrated with EWMA: Case Study in Korea under Global Financial Crisis

박경찬;정종빈;김성문
- Journal of the Korean Operations Research and Management Science Society, Vol. 38, No. 2, pp. 75-93, 2013
In applying Markowitz's portfolio selection model to the stock market, we developed a comprehensive investment decision-making framework including key inputs for portfolio theory (i.e., individual stocks' expected rate of return and covariance) and minimum required expected return. For estimating the key inputs of our decision-making framework, we utilized an exponentially weighted moving average (EWMA), which places more emphasis on recent data than the conventional simple moving average (SMA). We empirically analyzed the investment results of the decision-making framework with the same 15 stocks in Samsung Group Funds found in the Korean stock market between 2007 and 2011. This five-year investment horizon is marked by global financial crises including the U.S. subprime mortgage crisis, the collapse of Lehman Brothers, and the European sovereign-debt crisis. We measure portfolio performance in terms of rate of return, standard deviation of returns, and Sharpe ratio. Results are compared with the following benchmarks: 1) KOSPI, 2) Samsung Group Funds, 3) Talmudic portfolio based on the naïve 1/N rule, and 4) Markowitz's model with SMA. We performed sensitivity analyses on all the input parameters that are necessary for designing an investment decision-making framework: smoothing constant for EWMA, minimum required expected return for the portfolio, and portfolio rebalancing period. In conclusion, appropriate use of the comprehensive investment decision-making framework based on Markowitz's model integrated with EWMA proves to achieve outstanding performance compared to the benchmarks.
https://doi.org/10.7737/JKORMS.2013.38.2.075
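The idea of estimating expected return and covariance with an EWMA, as described in the abstract above, can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: the decay factor `lam` and its convention, the normalized weights, the return series, and the two-asset minimum-variance formula, which is a simplification and not the paper's optimization with a minimum required expected return.

```python
def ewma_weights(n, lam):
    """Normalized weights for n observations (oldest first); newer data weigh more."""
    raw = [(1 - lam) * lam ** (n - 1 - t) for t in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def ewma_mean(xs, lam):
    """Exponentially weighted estimate of the expected return."""
    return sum(w * x for w, x in zip(ewma_weights(len(xs), lam), xs))

def ewma_cov(xs, ys, lam):
    """Exponentially weighted covariance of two return series of equal length."""
    weights = ewma_weights(len(xs), lam)
    mx, my = ewma_mean(xs, lam), ewma_mean(ys, lam)
    return sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))

def min_variance_two_assets(var_a, var_b, cov_ab):
    """Closed-form minimum-variance weights for a fully invested two-asset portfolio."""
    w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)
    return w_a, 1 - w_a

# Hypothetical monthly returns for two stocks, oldest first.
stock_a = [0.02, -0.01, 0.03, 0.01, -0.02]
stock_b = [0.01, 0.00, 0.02, -0.01, 0.01]
lam = 0.9
weights = min_variance_two_assets(
    ewma_cov(stock_a, stock_a, lam),
    ewma_cov(stock_b, stock_b, lam),
    ewma_cov(stock_a, stock_b, lam),
)
print(ewma_mean(stock_a, lam), ewma_mean(stock_b, lam), weights)
```

A smaller `lam` shifts more weight onto the most recent observations, which is the kind of sensitivity the smoothing constant controls.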

Recommendation using Service Ontology based Context Awareness Modeling

류중경;정경용;김종훈;임기욱;이정현
- The Journal of the Korea Contents Association, Vol. 11, No. 2, pp. 22-30, 2011
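In an IT convergence environment that is growing not only in quality but also in material affluence, grasping context information is becoming an important success factor for personalized recommendation service strategies. This paper proposes recommendation using service ontology based context awareness modeling. To support heterogeneous devices, a data acquisition module based on the OSGi framework is built and an ontology-based context information model is developed. For the context information model, the context information needed by the recommender system is extracted and classified. Using the context information, an ontology-based context awareness model is developed and reflected in collaborative filtering recommendations. The context awareness model uses a naïve Bayes classifier to reflect information about which services were selected in which context and provides it to the user. To evaluate the performance of the proposed method, a paired-samples t-test was conducted to verify its usefulness. The evaluation demonstrated that the difference in satisfaction with the service is statistically significant and confirmed a high level of satisfaction.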
https://doi.org/10.5392/JKCA.2011.11.2.022

Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

Dorjmaa, Tserendulam;Shin, Taeksoo
- Journal of the Korea Society of IT Services, Vol. 16, No. 3, pp. 167-183, 2017
The rapid growth of information technology and mobile service platforms, e.g., the Internet, Google, and Facebook, has led to an abundance of data. In this environment, the world is now facing a revolution in the way data is searched, collected, stored, and shared. The abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting available knowledge in databases have become more popular in e-commerce service fields, in particular movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of information about the movie, such as actor, director, rating score, language, and country. In this study, we propose a classification-based support vector machine (SVM) model for predicting movie popularity based on the movie's genre data and social network data. Social network analysis (SNA) is used to improve the classification accuracy. This study builds the movies' network (a one-mode network) from initial data that form a two-mode user-to-movie network. For the proposed method we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movies' network. Those four centrality values and the movies' genre data were used to classify movie popularity in this study. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree were also used as benchmarking models for movie popularity classification, for comparison with the performance of our proposed model. To assess the classifiers' accuracy, this study used MovieLens data as an open database. Our empirical results indicate that our proposed model with the movie's genre and centrality data has approximately 0% higher accuracy than other classification models with only the movie's genre data. The implications of our results show that our proposed model can be used for improving movie popularity classification accuracy.
https://doi.org/10.9716/KITS.2017.16.3.167
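The step from a two-mode user-to-movie network to a one-mode movie network, followed by a centrality measure, can be sketched as below. This is a minimal illustration, assuming two movies are linked when at least one user rated both; the abstract does not state the linking rule, only degree centrality is shown, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_movies(user_movies):
    """Build a one-mode movie network: link two movies rated by the same user."""
    neighbors = defaultdict(set)
    for movies in user_movies.values():
        unique = sorted(set(movies))
        for movie in unique:
            neighbors[movie]  # touch the key so isolated movies are kept
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)

def degree_centrality(neighbors):
    """Fraction of the other nodes each movie is directly connected to."""
    n = len(neighbors)
    if n <= 1:
        return {movie: 0.0 for movie in neighbors}
    return {movie: len(adjacent) / (n - 1) for movie, adjacent in neighbors.items()}

# Toy two-mode data: which user rated which movies.
user_movies = {"u1": ["A", "B"], "u2": ["B", "C"], "u3": ["D"]}
movie_network = project_movies(user_movies)
centrality = degree_centrality(movie_network)
print(centrality)
```

Values like these, computed per movie, are the kind of feature that could be placed next to genre indicators in a classifier's input vector.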

An empirical study on the roles of attitudes and attitude strength in stimulus-based decision-making

범상규;송균석
- Journal of the Korean Data and Information Science Society, Vol. 20, No. 3, pp. 563-575, 2009
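This study extends the "model of the influence of attitudes and attitude strength on the consideration set and choice", an integrated attitude-decision model proposed by Priester et al. (2004) for memory-based choice situations, to stimulus-based choice situations. Specifically, we examined whether the model can be extended by looking at the roles of attitudes and attitude strength with respect to the consideration set and choice variables (model validity) and the role of the strength of physical salience according to product display position, an external stimulus factor (model generalization). Instead of measuring attitude itself to infer behavior or behavioral intention, as in previous studies, this study empirically analyzed the model with logistic regression, based on an experimental survey method that can directly measure "attitude toward the behavior". The results confirm, through theoretical background and empirical analysis, that the characteristics of the model proposed as an integrated model of the attitude-decision relationship in memory-based situations also hold in stimulus-based situations, which shows that the model can be generalized.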
