• Title/Abstract/Keywords: Urdu

24 search results (processing time: 0.018 s)

A Methodology for Urdu Word Segmentation using Ligature and Word Probabilities

  • Khan, Yunus;Nagar, Chetan;Kaushal, Devendra S.
    • International Journal of Ocean System Engineering / Vol. 2, No. 1 / pp. 24-31 / 2012
  • This paper introduces a technique for word segmentation for handwritten recognition of Urdu script. Word segmentation, or word tokenization, is a primary step in understanding sentences written in Urdu. Several word segmentation techniques are available for other languages, but little work has been done on word segmentation for Urdu Optical Character Recognition (OCR) systems. The proposed method finds the boundaries of words in a sequence of ligatures using probabilistic formulas, utilizing knowledge of the collocation of ligatures and words in the corpus. The word identification rate using this technique is 97.10%, with a 66.63% identification rate for unknown words.
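
The idea of choosing word boundaries in a ligature sequence by probability can be illustrated with a small dynamic-programming sketch. This is not the authors' method: it assumes a simple unigram word model, a hypothetical `toy_word_prob` table with placeholder ligature strings, and a fixed fallback probability for unknown single ligatures, whereas the abstract also mentions ligature collocations.

```python
import math


def segment(ligatures, word_prob, max_len=4, unknown_prob=1e-6):
    """Split a ligature sequence into the most probable word sequence.

    Assumptions (illustrative only): a word is a tuple of ligatures,
    `word_prob` holds unigram word probabilities, and any single unseen
    ligature is accepted as an unknown word with `unknown_prob`.
    """
    n = len(ligatures)
    # best[i] = (best log-probability of ligatures[:i], start of last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = tuple(ligatures[start:end])
            p = word_prob.get(word)
            if p is None:
                if end - start > 1:
                    continue  # unseen multi-ligature words are not proposed
                p = unknown_prob
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(tuple(ligatures[start:end]))
        end = start
    return words[::-1]


# Hypothetical probabilities over placeholder ligatures.
toy_word_prob = {
    ("ab", "c"): 0.4,
    ("d",): 0.3,
    ("ab",): 0.05,
    ("c", "d"): 0.05,
}
print(segment(["ab", "c", "d"], toy_word_prob))  # [('ab', 'c'), ('d',)]
```

Working in log space avoids underflow on long sequences, and the single-ligature fallback guarantees that every input has at least one valid segmentation.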

Building a text collection for Urdu information retrieval

  • Rasheed, Imran;Banka, Haider;Khan, Hamaid M.
    • ETRI Journal / Vol. 43, No. 5 / pp. 856-868 / 2021
  • Urdu is a widely spoken language in the Indian subcontinent with over 300 million speakers worldwide. However, linguistic advancements in Urdu are rare compared to those in other European and Asian languages. Therefore, by following Text Retrieval Conference standards, we attempted to construct an extensive text collection of 85 304 documents from diverse categories covering over 52 topics with relevance judgment sets at 100 pool depth. We also present several applications to demonstrate the effectiveness of our collection. Although this collection is primarily intended for text retrieval, it can also be used for named entity recognition, text summarization, and other linguistic applications with suitable modifications. Ours is the most extensive existing collection for the Urdu language, and it will be freely available for future research and academic education.

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / Vol. 21, No. 2 / pp. 229-237 / 2021
  • Our modern 'information-hungry' age demands delivery of information at unprecedented rates. Timely delivery of noteworthy information about recent events can help people from different segments of life in a number of ways. As the world has become a global village, the volume and speed of news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. Urdu is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables these speakers to improve their understanding of the world and make their decisions. This paper uses available online Urdu news data to train machines to automatically categorize news items. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The maximum accuracy achieved for the dataset was 94.278% by the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for the text categorization process.
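
A multinomial Naïve Bayes headline classifier with stop-word removal can be sketched in a few lines of plain Python. The sketch assumes whitespace tokenization and add-one (Laplace) smoothing; the function names, the English placeholder headlines, and the labels are hypothetical and are not taken from the paper's dataset or pipeline.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(headlines, labels, stop_words=frozenset()):
    """Count per-class token frequencies after dropping stop words."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(headlines, labels):
        tokens = [t for t in text.split() if t not in stop_words]
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_docs": class_docs,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(labels),
    }


def predict(model, headline):
    """Return the class with the highest smoothed log-posterior."""
    # Tokens outside the training vocabulary (including stop words) are ignored.
    tokens = [t for t in headline.split() if t in model["vocab"]]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(doc_count / model["n_docs"])  # log prior
        for t in tokens:
            # Add-one smoothing keeps unseen (class, token) pairs non-zero.
            score += math.log((counts[t] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Placeholder data standing in for Urdu headlines.
toy_headlines = [
    "the team wins cricket match",
    "captain scores century",
    "the bank raises interest rate",
    "stock market falls",
]
toy_labels = ["sports", "sports", "business", "business"]
toy_model = train_multinomial_nb(toy_headlines, toy_labels, stop_words={"the"})
print(predict(toy_model, "cricket captain wins"))  # sports
```

Headlines are short, so each token carries a large share of the evidence; smoothing matters more here than it would for full articles.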

Syntactic Structured Framework for Resolving Reflexive Anaphora in Urdu Discourse Using Multilingual NLP

  • Nasir, Jamal A.;Din, Zia Ud.
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 4 / pp. 1409-1425 / 2021
  • In a wide-ranging information society, fast and easy access to information in the language of one's choice is indispensable, and it may be provided by various multilingual Natural Language Processing (NLP) applications. Natural language text contains references among different language elements, called anaphoric links. Resolving anaphoric links is a key problem in NLP, and anaphora resolution is an essential part of NLP applications. Anaphoric links need to be properly interpreted for a clear understanding of natural languages. For this purpose, a mechanism is desirable for the identification and resolution of these naturally occurring anaphoric links. In this paper, a framework based on Hobbs' syntactic approach and a system developed by Lappin & Leass is proposed for the resolution of reflexive anaphoric links present in Urdu text documents. Generally, the anaphora resolution process takes three main steps: identification of the anaphor, location of the candidate antecedent(s), and selection of the appropriate antecedent. The proposed framework explores the syntactic structure of reflexive anaphors to find features for constructing heuristic rules, which are used to develop an algorithm for resolving these anaphoric references. The system takes Urdu text containing reflexive anaphors as input and outputs Urdu text with resolved reflexive anaphoric links. Despite the scarcity of Urdu resources, our results are encouraging. The proposed framework can be utilized in multilingual NLP (m-NLP) applications.

Deep recurrent neural networks with word embeddings for Urdu named entity recognition

  • Khan, Wahab;Daud, Ali;Alotaibi, Fahd;Aljohani, Naif;Arafat, Sachi
    • ETRI Journal / Vol. 42, No. 1 / pp. 90-100 / 2020
  • Named entity recognition (NER) continues to be an important task in natural language processing because it is featured as a subtask and/or subproblem in information extraction and machine translation. In Urdu language processing, it is a very difficult task. This paper proposes various deep recurrent neural network (DRNN) learning models with word embedding. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and back propagation through time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embedding for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best f-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.

Recognize Handwritten Urdu Script Using Kohenen Som Algorithm

  • Khan, Yunus;Nagar, Chetan
    • International Journal of Ocean System Engineering / Vol. 2, No. 1 / pp. 57-61 / 2012
  • In this paper we use the Kohonen neural network based Self Organizing Map (SOM) algorithm for Urdu character recognition. Kohonen neural networks are more efficient in terms of performance compared to other approaches. Classification is used to recognize handwritten Urdu characters. The number of possible unknown characters is reduced by pre-classification with respect to a subset of the total character set, so the proposed algorithm attempts to group similar characters. Members of a pre-classified group are further analyzed using a statistical classifier for final recognition. A recognition rate of around 79.9% was achieved for the first choice and more than 98.5% for the top three choices. The results show that the proposed Kohonen SOM algorithm yields promising output and is feasible in comparison with other existing techniques.

Combining Different Distance Measurements Methods with Dempster-Shafer-Theory for Recognition of Urdu Character Script

  • Khan, Yunus;Nagar, Chetan;Kaushal, Devendra S.
    • International Journal of Ocean System Engineering / Vol. 2, No. 1 / pp. 16-23 / 2012
  • In this paper we discuss a new methodology for an Urdu character recognition system using Dempster-Shafer theory, which can effectively estimate the similarity ratings between a recognized character and sample characters in the character database. Character recognition is performed by five probability calculation methods (similarity, Hamming, linear correlation, cross-correlation, and nearest neighbor) together with the Dempster-Shafer theory of belief functions. The main objective of this paper is to recognize Urdu letters and numerals through five similarity and dissimilarity algorithms that measure the similarity between the given image and the standard template in the character recognition system. We develop a method to combine the results of the different distance measurement methods using Dempster-Shafer theory. This idea enables us to obtain a single precise result. It was observed that the combination of these results ultimately enhanced the success rate.
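
Combining the outputs of several matchers with Dempster-Shafer theory rests on Dempster's rule of combination. The sketch below shows the rule itself for two sources. The mass values, the letter labels, and the way a distance score would be turned into a mass function are all assumptions made for illustration; the abstract does not specify them.

```python
def combine(m1, m2):
    """Dempster's rule: fuse two mass functions over the same frame.

    Each mass function maps a frozenset of candidate labels to a mass,
    with masses summing to 1.
    """
    fused = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                fused[common] = fused.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass on incompatible hypotheses
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


# Hypothetical evidence from two matchers about one character image.
ALIF = frozenset({"alif"})
BE = frozenset({"be"})
EITHER = ALIF | BE  # mass left on "cannot tell"

from_correlation = {ALIF: 0.6, BE: 0.1, EITHER: 0.3}
from_hamming = {ALIF: 0.5, BE: 0.2, EITHER: 0.3}

fused = combine(from_correlation, from_hamming)
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 3))  # ['alif'] 0.759
```

Because the rule is associative, more than two sources can be fused by applying `combine` repeatedly, which is how five matchers could be reduced to one decision.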

Game Theoretic based Distributed Dynamic Power Allocation in Irregular Geometry Multicellular Network

  • Safdar, Hashim;Ullah, Rahat;Khalid, Zubair
    • International Journal of Computer Science & Network Security / Vol. 22, No. 7 / pp. 199-205 / 2022
  • Data rate demand from smart gadgets and mobile broadband application services in wireless cellular networks has grown extensively. Meeting this demand leads to aggressive frequency reuse, which improves network capacity at the price of Inter Cell Interference (ICI). Fractional Frequency Reuse (FFR) has been recognized as an effective scheme to achieve a higher data rate and mitigate ICI in perfect geometry network scenarios. In an irregular geometry multicellular network, ICI mitigation is a challenging issue. The purpose of this paper is to develop a distributed dynamic power allocation scheme for FFR based on game theory to mitigate ICI. In the proposed scheme, each cell region in an irregular multicellular scenario adopts selfless behavior instead of selfish behavior to improve the overall utility function. The proposed scheme improves the overall data rate and mitigates ICI.

Synthesis, Characterization and Biological Evaluations of Ciprofloxacin Carboxamide Analogues

  • Sultana, Najma;Arayne, Muhammad Saeed;Rizvi, Syeda Bushra Shakeb;Haroon, Urooj
    • Bulletin of the Korean Chemical Society / Vol. 32, No. 2 / pp. 483-488 / 2011
  • The present work comprises the synthesis of various analogues of ciprofloxacin by introducing new functionality at the carboxylic group position via an ester aminolysis reaction. For this purpose, the carboxylic group at C-3 was esterified and later subjected to nucleophilic attack at the carbonyl carbon by various aromatic amines. The structures of the analogues were confirmed by different techniques, i.e., IR, $^1$H NMR, and mass spectrometry. The antibacterial activity of the derivatives was also assessed alongside the parent compound against a series of Gram-positive and Gram-negative bacteria. The synthesized compounds showed diverse antimicrobial profiles, and most possessed activity comparable to or better than that of ciprofloxacin. Additionally, unlike ciprofloxacin, some of the derivatives were also found to show antifungal activity.

DERIVED FUNCTOR COHOMOLOGY GROUPS WITH YONEDA PRODUCT

  • Husain, Hafiz Syed;Sultana, Mariam
    • Journal of the Korean Society of Mathematical Education Series B: The Pure and Applied Mathematics / Vol. 28, No. 2 / pp. 187-198 / 2021
  • This work presents an exposition of both the internal structure of the derived category D*(𝓐) of an abelian category 𝓐 and its contribution to solving problems, particularly in algebraic geometry. Some morphisms between objects in D*(𝓐) will be calculated as elements of appropriate cohomology groups, along with their compositions, with the help of the Yoneda construction, under the assumption that the homological dimension of D*(𝓐) is greater than or equal to 2. These computational settings will then be considered in the sheaf cohomological context, with a particular case from projective geometry.
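
For orientation, the standard identification behind "morphisms as elements of cohomology groups" and the Yoneda product can be written as follows. The object names A, B, C in 𝓐 and the shift notation [i] are notation assumed here; the abstract does not fix them.

```latex
\[
  \operatorname{Ext}^{i}_{\mathcal{A}}(A, B)
  \;\cong\;
  \operatorname{Hom}_{D^{*}(\mathcal{A})}\bigl(A,\, B[i]\bigr),
\]
\[
  \operatorname{Ext}^{j}_{\mathcal{A}}(B, C) \times \operatorname{Ext}^{i}_{\mathcal{A}}(A, B)
  \longrightarrow
  \operatorname{Ext}^{i+j}_{\mathcal{A}}(A, C),
  \qquad
  (g, f) \longmapsto g[i] \circ f .
\]
```

Under this identification the Yoneda product is composition in the derived category after shifting, which is presumably why a homological dimension of at least 2 is assumed: it leaves room for a product of two degree-1 classes to be non-zero.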