• Title/Summary/Keyword: Arabic language

Automatic Transcription of Three Ambiguous Symbols Used with Arabic Numerals: Period, Colon and Slash. (아라비안 숫자를 동반한 중의적 기호의 자동전사: 온점, 쌍점, 빗금을 중심으로)

  • 윤애선;정영임;권혁철
    • Language and Information
    • /
    • v.8 no.1
    • /
    • pp.117-136
    • /
    • 2004
  • In this paper, we propose Auto-TSS, an automatic transcription module for three ambiguous symbols, period (.), colon (:) and slash (/), that uses their linguistic contexts. Few previous studies have discussed the ambiguities in reading those symbols into Korean alphabetic letters in order to improve current Korean TTS (Text-To-Speech) systems. We classified 9 different reading formulae of the three symbols, analyzed their left and right contexts, and investigated selection rules and distributions between the symbols and their contexts. Based on these linguistic features, 30 stereotyped patterns, 53 rules and 5 heuristics determining the types of reading formulae are investigated for Auto-TSS. The module works in 4 steps. The pilot test was conducted with three test suites, which contain 6,979, 3,491 and 2,450 morpheme clusters respectively, each containing at least one of the three ambiguous symbols and Arabic numeral(s). Encouraging results of 94.3%, 93.0% and 94.2% accuracy were obtained for the test suites. Our next phases are to develop a guessing routine for unknown contexts of the union symbols by using statistical information; to refine the proper noun and terminology detection module; and to apply Auto-TSS on a larger scale.
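The abstract does not list the nine reading formulae or the 53 rules, so the following is only a minimal sketch of the general idea of choosing a reading type from the token shape and its context. The category names, the cue-word sets, and the function `classify` are all hypothetical and are not taken from Auto-TSS.

```python
import re

# Illustrative cue words (assumptions); a real system would use
# morphological analysis of the left and right contexts instead.
SCORE_CUES = {"score", "win", "lead", "loss"}
DATE_CUES = {"on", "date", "dated"}


def classify(token, context=()):
    """Return a hypothetical reading type for a numeral token with . : or /"""
    cues = {word.lower() for word in context}

    if re.fullmatch(r"\d+\.\d+", token):
        return "decimal"              # e.g. 3.14
    if re.fullmatch(r"\d+(\.\d+){2,}", token):
        return "dotted_sequence"      # e.g. 2004.3.15 (date or section number)

    if re.fullmatch(r"\d+:\d+", token):
        left, right = token.split(":")
        if cues & SCORE_CUES:
            return "score"            # e.g. "a 3:1 win"
        if int(left) < 24 and int(right) < 60 and len(right) == 2:
            return "time"             # e.g. 12:30
        return "ratio"

    if re.fullmatch(r"\d+/\d+", token):
        left, right = (int(part) for part in token.split("/"))
        if cues & DATE_CUES and 1 <= left <= 12 and 1 <= right <= 31:
            return "date"             # e.g. "on 3/4"
        return "fraction"

    return "unknown"
```

Each reading type would then map to its own spoken form. The sketch stops at classification because the spoken forms depend on the Korean reading formulae defined in the paper.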

GMM-Based Maghreb Dialect Identification System

  • Nour-Eddine, Lachachi;Abdelkader, Adla
    • Journal of Information Processing Systems
    • /
    • v.11 no.1
    • /
    • pp.22-38
    • /
    • 2015
  • While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life. Therefore, identifying a speaker's dialect is critical in the Arabic-speaking world for speech processing tasks, such as automatic speech recognition or identification. In this paper, we examine two approaches that reduce the Universal Background Model (UBM) in the automatic dialect identification system across the following five Arabic Maghreb dialects: Moroccan, Tunisian, and 3 dialects of the western (Oranian), central (Algiersian), and eastern (Constantinian) regions of Algeria. We applied our approaches to the Maghreb dialect detection domain, which contains a collection of 10-second utterances, and we compared the precision obtained on the dialect samples by a baseline GMM-UBM system with that of our own improved GMM-UBM system, which uses a Reduced UBM algorithm. Our experiments show that our approaches significantly improve identification performance over purely acoustic features, with an identification rate of 80.49%.

Arabic Handwritten Manuscripts Text Recognition: A Systematic Review

  • Alghamdi, Arwa;Alluhaybi, Dareen;Almehmadi, Doaa;Alameer, Khadijah;Siddeq, Sundos Bin;Alsubait, Tahani
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.11
    • /
    • pp.319-323
    • /
    • 2022
  • Handwritten text recognition is an active research area, but progress differs from language to language. Progress in Arabic handwritten text recognition, for example, is still limited and needs more attention and effort. One of the most important subfields is Arabic handwritten manuscript text recognition, which focuses on extracting text from historical manuscripts. For centuries, people used manuscripts to record everything, and millions of manuscripts now exist around the world. There are two main challenges in dealing with these manuscripts. The first is that they are at risk of damage, since they are written on primitive materials. The second is the variation in writing styles, which leaves most people unable to read these manuscripts easily. Therefore, in this study we discuss different papers related to this important research field.

Islamization or Arabization? The Arab Cultural Influence on the South Sulawesi Muslim Community since the Islamization in the 17th Century

  • Halim, Wahyuddin
    • SUVANNABHUMI
    • /
    • v.10 no.1
    • /
    • pp.35-61
    • /
    • 2018
  • This paper explores the influence of Arab culture on the culture of Bugis-Makassar, the two major ethnic groups in South Sulawesi, Indonesia, particularly after their Islamization in the early 17th century. The paper argues that since then, the on-going process of Islamization in the region has also brought a continuous flow of ideas and cultural practices from Mecca to Indonesia by means of the hajj pilgrims, Arab traders, and the establishment of Islamic educational institutions that emphasized the teaching and use of Arabic language in education. These factors, among others, have facilitated a cultural inflow which enabled cultural practices borne of West Asia (Middle East) to be integrated into local customs and beliefs. The paper particularly depicts the most observable forms of Arabic cultural integration, acculturation, and assimilation into the Bugis-Makassar culture such as the use of Arabic in Islamic schools and religious sermons; the Arab-style dressing by religious scholars, teachers, and students; the wearing of the hijab (head cover) by women; and the change of people's names from local into Arabic. By utilizing the historical and anthropological approach, this paper investigates this dynamic process of adaptation and integration of a foreign culture that first came through the Islamization of a local culture, exploring the role of an Islamic missionary and educational institutions in mediating and maintaining such cultural integration processes.

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari;Mohamed Hamdy;Khaled H. Alyoubi;Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.113-123
    • /
    • 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitudes of the news of the stock on the official platforms toward companies' stocks. It supports making the right decision in investing or analysts' evaluation. However, the research on Arabic SA is limited compared to that on English SA due to the complexity and limited corpora of the Arabic language. This paper develops a model of sentiment classification to predict the polarity of Arabic stock news in microblogs. Also, it aims to extract the reasons which lead to polarity categorization as the main economic causes or aspects based on semantic unity. Therefore, this paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model is used to classify articles as positive, negative, or neutral. It was trained on the basis of data collected from an official Saudi stock market article platform that was later preprocessed and labeled. Moreover, the economic reasons for the articles based on semantic unit, divided into seven economic aspects to highlight the polarity of the articles, were investigated. The supervised BERT model obtained 88% article classification accuracy based on SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting polarity classification on the Arabic stock market news and their economic reasons would provide valuable benefits to the stock SA field.

Arabic-Numerals to Korean Transliteration Disambiguation using BERT (BERT를 이용한 숫자-한국어 음역 모호성 해소)

  • Park, Jeong Yeon;Yuk, Dae Bum;Lee, Jae Sung
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.42-44
    • /
    • 2020
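  • A TTS (Text-to-Speech) system needs to convert strings other than Hangul into Hangul. Such strings include numerals, special characters, and similar symbols. Numerals pose a particular problem because their pronunciation changes with the context in which they are used. Instead of existing methods, which are rule-based and can exploit only limited context information, this paper introduces a deep-learning method that resolves the ambiguity of numeral transliteration, where pronunciation varies with context.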

Mediated Public Diplomacy and the Contest Over International Agenda-Building in the Gulf Diplomatic Crisis

  • Albishri, Osama;Lan, Xiaomeng;Kiousis, Spiro
    • Journal of Public Diplomacy
    • /
    • v.1 no.1
    • /
    • pp.57-79
    • /
    • 2021
  • Drawing on the theories of mediated public diplomacy, intermedia agenda-building, and homophily, this study aims to compare the effectiveness of the public diplomacy efforts made by the Saudi and Qatari governments during the Gulf diplomatic crisis. The study examines the respective international agenda-building influence of the state-sponsored media from the two competing Gulf states on the regional and international media's coverage of the crisis. Results show that, compared to Saudi-sponsored Al Arabiya, Qatari-sponsored Al Jazeera was more effective in shaping the agendas of the regional and international media. Whereas Al Arabiya has a weak first-level agenda-building influence and a moderate-to-strong influence at the second and the third levels, Al Jazeera demonstrates a strong agenda-building influence on the foreign media outlets at all three levels. We also analyze the impact of political proximity and the language of the media content (English or Arabic) on the agenda-building relationships. Still, the results suggest that, compared to Al Arabiya, Al Jazeera was more successful in shaping the agendas of the regional and international news media, whether they are based in the allied or the opposing countries. Also, we observe a higher level of consistency between Arabic- and English-language content in Al Jazeera.

When 5004 is Said "Five Thousand Zero Hundred Remainder Four": The Influence of Language on Natural Number Transcoding: Cross-National Comparison

  • Nguyen, Hien Thi-Thu;Gregoire, Jacques
    • Research in Mathematical Education
    • /
    • v.18 no.2
    • /
    • pp.149-170
    • /
    • 2014
  • The Vietnamese language has a specific property related to the zero in the name-number system. This study was conducted to examine the impact of linguistic differences and of the zero's position in a number on a transcoding task (verbal number into Arabic number). Vietnamese children and French-speaking Belgian children, from grades 3 to 6, participated in the study. The success rate and the type of errors they made varied, depending on their grade and language. At Grade 4, Vietnamese children showed performances equivalent to Grade 6 Belgian children. Our results confirmed the support provided by language to the understanding and performances in a transcoding task. Results also showed that a syntactic zero is easier to manipulate than a lexical zero for Vietnamese children. The relative influence of language and the source of errors are discussed.

Global History: Understanding Islamic Astronomy

  • LOHLKER, RUDIGER
    • Acta Via Serica
    • /
    • v.4 no.2
    • /
    • pp.97-118
    • /
    • 2019
  • This study presents a new conceptualization of the history of Islamic astronomy. Islamic history is an embedded global cultural phenomenon and will be analyzed at different levels: a) the history of institutional aspects (observatories, including buildings), b) instruments, c) manuscripts, and d) scholars. This phenomenon will be analyzed as a multi-lingual phenomenon with Arabic as the language of sciences as a starting point. Although this is not a study of a geographical region in a narrow sense, it is a historical note on the entanglement of research written in Arabic, Persian and other languages and contextualized in a framework reaching geographically far beyond the confines of the Islamic world and being part of global history.

The Use of MSVM and HMM for Sentence Alignment

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems
    • /
    • v.8 no.2
    • /
    • pp.301-314
    • /
    • 2012
  • In this paper, two new approaches to aligning English-Arabic sentences in bilingual parallel corpora are presented, based on the Multi-Class Support Vector Machine (MSVM) and the Hidden Markov Model (HMM) classifiers. A feature vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the Multi-Class Support Vector Machine and the Hidden Markov Model. Another set of data was used for testing. The results of the MSVM and HMM outperform the results of the length-based approach. Moreover, these new approaches are valid for any language pair and are quite flexible, since the feature vector may contain fewer, more, or different features than the ones used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
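The abstract names the three features but not their formulas, so the sketch below only illustrates one plausible way to build such a feature vector for a candidate sentence pair. The min/max ratios, the digit-based notion of a cognate, and all function names are assumptions, not the paper's definitions.

```python
import re
import string

# ASCII punctuation plus a few common Arabic marks (an illustrative choice).
PUNCT = set(string.punctuation) | {"،", "؛", "؟"}


def _ratio(a, b):
    """Symmetric agreement in [0, 1]; two zero counts count as full agreement."""
    return min(a, b) / max(a, b) if max(a, b) else 1.0


def length_score(src, tgt):
    """Agreement between the character lengths of the two sentences."""
    return _ratio(len(src), len(tgt))


def punctuation_score(src, tgt):
    """Agreement between the punctuation-mark counts of the two sentences."""
    count_src = sum(ch in PUNCT for ch in src)
    count_tgt = sum(ch in PUNCT for ch in tgt)
    return _ratio(count_src, count_tgt)


def cognate_score(src, tgt):
    """Jaccard overlap of digit strings, used here as a stand-in for cognates."""
    cog_src = set(re.findall(r"\d+", src))
    cog_tgt = set(re.findall(r"\d+", tgt))
    union = cog_src | cog_tgt
    return len(cog_src & cog_tgt) / len(union) if union else 0.0


def feature_vector(src, tgt):
    """Feature vector for one candidate pair, to be passed to a classifier."""
    return [
        length_score(src, tgt),
        punctuation_score(src, tgt),
        cognate_score(src, tgt),
    ]


pair = ("The meeting starts at 10, on May 5.", "يبدأ الاجتماع في 10، يوم 5 مايو.")
features = feature_vector(*pair)
```

A classifier such as a multi-class SVM or an HMM would then be trained on vectors like `features`, labelled with the alignment type of each pair. That training step is omitted here.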