• Title/Abstract/Keywords: Stemming algorithms

Search results: 7 items (processing time: 0.021 seconds)

Information Retrieval Systems: Between Morphological Analyzers and Stemming Algorithms

  • Mohamed, Afaf Abdel Rhman;Ouni, Chafika;Eljack, Sarah Mustafa;Alfayez, Fayez
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 3
    • /
    • pp.375-381
    • /
    • 2022
  • The main objective of an Information Retrieval System (IRS) is to obtain suitable information within a reasonable time to satisfy a user need. To achieve this, an IRS needs a good indexing system based on natural language processing. In this context, we focus on the Arabic language processing techniques available for an IRS, with the goal of improving retrieval performance. Our contribution consists of integrating morphological analysis into an IRS in order to compare its impact with that of stemming algorithms.

Comparative Study of Various Persian Stemmers in the Field of Information Retrieval

  • Moghadam, Fatemeh Momenipour;Keyvanpour, MohammadReza
    • Journal of Information Processing Systems
    • /
    • Vol. 11, No. 3
    • /
    • pp.450-464
    • /
    • 2015
  • In linguistics, stemming is the operation of reducing words to their more general form, which is called the 'stem'. Stemming is an important step in information retrieval systems, natural language processing, and text mining. Information retrieval systems are evaluated by metrics such as precision and recall, and the superiority of one information retrieval system over another is measured by them. Stemmers reduce the size of the index file, increase the speed of information retrieval systems, and improve the performance of these systems by boosting precision and recall. There are few Persian stemmers, and most of them are based on morphological rules. In this paper we study Persian stemmers, which are classified into three main classes: structural stemmers, lookup table stemmers, and statistical stemmers. We describe the algorithms of each class in detail and present the weaknesses and strengths of each Persian stemmer. We also propose some metrics for comparing and evaluating the stemmers.
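As a rough illustration of how stemming interacts with precision and recall, the sketch below uses a toy English suffix-stripping stemmer. It is not one of the Persian stemmers surveyed in the paper; the suffix list, the sample documents, the relevance judgments, and the names `toy_stem`, `search`, and `precision_recall` are all assumptions made for illustration.

```python
# Toy suffix list (an assumption for illustration; not taken from the paper).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query, docs, stem=lambda w: w):
    """Return ids of documents sharing at least one (optionally stemmed) term with the query."""
    terms = {stem(w) for w in query.lower().split()}
    return {
        doc_id
        for doc_id, text in docs.items()
        if terms & {stem(w) for w in text.lower().split()}
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant (0.0 when undefined)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mini-collection and relevance judgments for the query "indexed".
DOCS = {
    1: "indexing documents quickly",
    2: "the index stores terms",
    3: "cooking recipes",
}
RELEVANT = {1, 2}

without_stemming = search("indexed", DOCS)
with_stemming = search("indexed", DOCS, stem=toy_stem)
print(precision_recall(without_stemming, RELEVANT))
print(precision_recall(with_stemming, RELEVANT))
```

In this toy collection the unstemmed query matches nothing, while the stemmed query matches both relevant documents. In real collections, aggressive stemming can also conflate unrelated words, so the effect on precision depends on the stemmer and the language.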

Automated Essay Grading: An Application For Historical Malay Text

  • Syed Mustapha, S.M.F.D;Idris, N.
    • Korea Intelligent Information Systems Society: Conference Proceedings
    • /
    • Korea Intelligent Information Systems Society, 2001: The Pacific Asian Conference on Intelligent Systems 2001
    • /
    • pp.237-245
    • /
    • 2001
  • Automated essay grading has been proposed for over thirty years, but only recently have practical implementations been constructed and tested. This paper investigates the role of the nearest-neighbour algorithm within information retrieval as a way of grading essays automatically, in a system called the Automated Essay Grading System. It is intended to offer teachers individualized assistance in grading students' essays. The system involves several processes: indexing, structuring the model answer, and grade processing. The indexing process comprises document indexing and query processing, which are mainly used for representing the documents and the query. Structuring the model answer amounts to preparing the marking scheme, and grade processing is the process of assessing the essay. To test their effectiveness, the developed algorithms were tested against History texts in Malay. The results showed that information retrieval and the nearest-neighbour algorithm are a practical combination that offers acceptable performance for grading essays.
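The idea of grading by nearest neighbour in an information retrieval setting can be sketched as follows. This is a minimal sketch, not the system described in the paper: it assumes bag-of-words term counts, cosine similarity, and a single nearest neighbour, and the graded model answers and the names `vectorize`, `cosine`, and `grade` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def grade(essay, model_answers):
    """Return the grade of the model answer most similar to the essay.

    model_answers is a list of (text, grade) pairs.
    """
    query = vectorize(essay)
    _, best_grade = max(
        model_answers, key=lambda pair: cosine(query, vectorize(pair[0]))
    )
    return best_grade


# Hypothetical graded model answers (toy data, in English for readability).
MODEL_ANSWERS = [
    ("melaka was founded by parameswara around 1400", "A"),
    ("the weather was hot", "D"),
]

print(grade("parameswara founded melaka", MODEL_ANSWERS))
```

Here the essay plays the role of the query and the model answers play the role of indexed documents, which mirrors the indexing and query processing steps named in the abstract.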


Projecting the Potential Distribution of Abies koreana in Korea Under the Climate Change Based on RCP Scenarios

  • 구경아;김재욱;공우석;정휘철;김근한
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • Vol. 19, No. 6
    • /
    • pp.19-30
    • /
    • 2016
  • The projection of climate-related range shifts is critical information for conservation planning of Korean fir (Abies koreana E. H. Wilson). We first modeled the distribution of Korean fir under current climate conditions using five single-model species distribution models (SDMs) and the pre-evaluation weighted ensemble method, and then predicted the distributions under future climate conditions projected with HadGEM2-AO under four $CO_2$ emission scenarios, the Representative Concentration Pathways (RCP) 2.6, 4.5, 6.0 and 8.5. We also investigated the predictive uncertainty stemming from the five individual algorithms and the four $CO_2$ emission scenarios for better interpretation of SDM projections. The five individual algorithms were the generalized linear model (GLM), generalized additive model (GAM), multivariate adaptive regression splines (MARS), generalized boosted model (GBM) and random forest (RF). The results showed high variation in model performance among individual SDMs and a wide range of diverging predictions of the future distribution of Korean fir in response to the RCPs. The ensemble model had the highest predictive accuracy (TSS = 0.97, AUC = 0.99) and predicted that the climate habitat suitability of Korean fir would increase under climate change. Accordingly, the fir distribution could expand under future climate conditions. Increasing precipitation may account for the increase in the distribution of Korean fir, because it compensates for the negative effects of increasing temperature. However, the future distribution of Korean fir is also affected by other ecological processes, such as interactions with co-existing species, adaptation and dispersal limitation, and by other environmental factors, such as extreme weather events and land-use changes. Therefore, further ecological research and the development of mechanistic, process-based distribution models are needed to improve predictive accuracy.
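The TSS metric reported above is the true skill statistic, defined as sensitivity plus specificity minus one. The sketch below computes it from a confusion matrix and shows one plausible form of a score-weighted ensemble. The abstract does not spell out the exact weighting used by the pre-evaluation weighted ensemble method, so weighting each model by its evaluation score is an assumption, as are the function names and the sample numbers.

```python
def tss(tp, fn, tn, fp):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def weighted_ensemble(predictions, scores):
    """Average per-site suitability across models, weighted by evaluation score.

    predictions: {model_name: [suitability at each site]}
    scores: {model_name: evaluation score used as the weight} (assumed scheme)
    """
    total = sum(scores[m] for m in predictions)
    n_sites = len(next(iter(predictions.values())))
    return [
        sum(scores[m] * predictions[m][i] for m in predictions) / total
        for i in range(n_sites)
    ]


# Hypothetical suitability predictions at two sites and hypothetical weights.
PREDICTIONS = {"GLM": [0.2, 0.8], "RF": [0.4, 1.0]}
SCORES = {"GLM": 1.0, "RF": 3.0}

print(tss(tp=45, fn=5, tn=40, fp=10))
print(weighted_ensemble(PREDICTIONS, SCORES))
```

A TSS of 1 means perfect discrimination between presence and absence, while a value near 0 means the model does no better than chance.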

Meta-heuristic optimization algorithms for prediction of fly-rock in the blasting operation of open-pit mines

  • Mahmoodzadeh, Arsalan;Nejati, Hamid Reza;Mohammadi, Mokhtar;Ibrahim, Hawkar Hashim;Rashidi, Shima;Mohammed, Adil Hussein
    • Geomechanics and Engineering
    • /
    • Vol. 30, No. 6
    • /
    • pp.489-502
    • /
    • 2022
  • In this study, a Gaussian process regression (GPR) model and six GPR-based metaheuristic optimization models (GPR-PSO, GPR-GWO, GPR-MVO, GPR-MFO, GPR-SCA, and GPR-SSO) were developed to predict fly-rock distance in the blasting operations of open-pit mines. A total of 300 datasets obtained from the Soungun copper mine in Iran were used to build the models. Each dataset included six input parameters and one output parameter (fly-rock). Several statistical evaluation indices were used to assess the prediction outcomes. The ranking of the ML models by prediction performance, from high to low, is GPR-PSO, GPR-GWO, GPR-MVO, GPR-MFO, GPR-SCA, GPR-SSO, and GPR, with ranking scores of 66, 60, 54, 46, 43, 38, and 30 (for the 5-fold method), respectively. In conclusion, the GPR-PSO model generated the most accurate results, so it is suggested for forecasting fly-rock. In addition, the mutual information test (MIT) was used to investigate the influence of each input parameter on fly-rock, and the stemming (T) parameter was found to be the most influential.
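Mutual information between an input parameter and the output can be estimated in several ways, and the abstract does not say which estimator the authors used. The following is a minimal sketch assuming equal-width binning and the plug-in estimate for discrete variables; the names `discretize` and `mutual_information` and the sample values are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Map continuous values to equal-width bin indices 0 .. bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant inputs
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical measurements: one input parameter and the fly-rock distance.
stemming_length = [1.0, 2.0, 3.0, 4.0]
fly_rock = [80.0, 75.0, 40.0, 35.0]

mi = mutual_information(
    discretize(stemming_length, bins=2), discretize(fly_rock, bins=2)
)
print(mi)
```

Ranking the inputs by this score gives one way to compare their influence on the output: a higher value means that knowing the input reduces more of the uncertainty about fly-rock.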

Development of a Single Allocation Hub Network Design Model with Transportation Economies of Scale

  • 김동규;박창호;이진수
    • Journal of the Korean Society of Civil Engineers
    • /
    • Vol. 26, No. 6D
    • /
    • pp.917-926
    • /
    • 2006
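  • Transportation economies of scale are a distinctive phenomenon of hub networks. An important characteristic of hub networks is the quantification of the cost savings from economies of scale and of the costs related to hub facility operation and to delays caused by flow consolidation. Nevertheless, because the hub location problem is NP-complete, most previous researchers have focused on developing heuristic algorithms to obtain approximate solutions. The purpose of this study is to develop a hub network design model that reflects economies of scale. The model is designed to account for the particular features of hub networks and to determine the various cost components. A heuristic algorithm that reflects the developed model formulation is developed, and the model results are compared, using real data, with recently published results. The analysis confirmed that the proposed model adequately reflects the economies of scale that arise from flow consolidation. This study is expected to provide a theoretical basis for efficient and rational network design and to contribute to the evaluation of current and planned logistics systems.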
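For orientation, the classic single allocation hub cost structure can be sketched as below. This is not the model developed in the paper: it assumes the textbook simplification in which every node is assigned to exactly one hub and inter-hub links receive a constant discount factor `alpha`, whereas the abstract describes economies of scale that depend on consolidated flow. The names, distances, and flows are hypothetical.

```python
def total_cost(flows, dist, hub_of, alpha=0.6):
    """Total transport cost under single allocation with a fixed inter-hub discount.

    flows: {(origin, destination): flow volume}
    dist: distance matrix, dist[i][j]
    hub_of: {node: the single hub the node is assigned to}
    alpha: assumed constant discount on hub-to-hub links (0 < alpha <= 1)
    """
    cost = 0.0
    for (i, j), volume in flows.items():
        k, m = hub_of[i], hub_of[j]
        # Collection to the origin hub, discounted transfer, then distribution.
        cost += volume * (dist[i][k] + alpha * dist[k][m] + dist[m][j])
    return cost


# Hypothetical network: four nodes on a line, 10 distance units apart.
DIST = [[abs(i - j) * 10 for j in range(4)] for i in range(4)]
HUB_OF = {0: 0, 1: 0, 2: 3, 3: 3}  # nodes 0 and 3 act as hubs
FLOWS = {(1, 2): 5}

print(total_cost(FLOWS, DIST, HUB_OF))
```

A heuristic of the kind mentioned in the abstract would search over hub choices and assignments to minimize a cost of this general shape, with the discount replaced by the model's own economies-of-scale term.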

Latent topics-based product reputation mining

  • 박상민;온병원
    • Journal of Intelligence and Information Systems
    • /
    • Vol. 23, No. 2
    • /
    • pp.39-70
    • /
    • 2017
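  • Data-driven analysis techniques have recently come into wide use in public opinion research. To survey preferences for recently released products, companies need methods that go beyond simply aggregating conventional surveys or expert opinions: methods that collect and analyze the many kinds of data available online to accurately capture public taste for a product. The main existing approaches first build a sentiment lexicon for the domain. Experts compile high-frequency words from the collected text documents and judge each as positive, negative, or neutral. To determine the preference for a particular product, review posts about the product are collected and sentences are extracted; the sentiment lexicon is used to classify the sentences as positive, negative, or neutral, and preference for the product is finally measured by the numbers of positive and negative sentences. The positive and negative content about the product is also summarized automatically: sentiment scores are computed for the sentences, and the sentences with high positive and negative scores are extracted. In this study, we propose a product reputation mining algorithm that extracts the topics hidden in documents produced by the general public to survey preference for a given product, and that presents a summary of each topic's positive and negative content. Unlike existing methods, it uses topics to build a sentiment lexicon quickly and easily, and it refines the extracted topics to improve the accuracy of both the product preference and the summary results. In the experiments, we collected a large number of reviews of domestically produced cars such as the K5, SM5, and Avante, and demonstrated the effectiveness of the proposed method through the positive/negative ratios of the test cars, summaries of positive and negative content, and statistical tests.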
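The conventional lexicon-based pipeline described in the abstract (classify sentences with a sentiment lexicon, count positives and negatives, then pick the highest-scoring sentences as a summary) can be sketched as follows. This covers only that baseline, not the proposed topic-based algorithm; the lexicon entries, the sample reviews, and the function names are assumptions for illustration.

```python
import re

# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "quiet": 1, "comfortable": 1, "bad": -1, "noisy": -1}


def sentence_score(sentence):
    """Sum the lexicon polarity of every word in the sentence."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", sentence.lower()))


def preference(sentences):
    """Share of positive sentences among positive and negative ones (None if neither)."""
    scores = [sentence_score(s) for s in sentences]
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    total = positive + negative
    return positive / total if total else None


def summarize(sentences):
    """Return the most positive and the most negative sentence as a crude summary."""
    ranked = sorted(sentences, key=sentence_score)
    return ranked[-1], ranked[0]


# Hypothetical review sentences about a car.
REVIEWS = [
    "The ride is quiet and good.",
    "The engine is noisy.",
    "It has four doors.",
]

print(preference(REVIEWS))
print(summarize(REVIEWS))
```

The neutral sentence is ignored when the preference ratio is computed, which matches the abstract's description of measuring preference from the counts of positive and negative sentences only.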