Text-Based and Speech-Based Automatic Dialect Identification for the Arabic Language


Fri, 09/02/2022 - 3:00pm


Part of the Fall 2022 Middle Eastern and North African Studies Colloquium Series

Elsayed Issa, PhD Candidate in Linguistics and MENAS, University of Arizona





Dialect Identification (DID) is a special case of Language Identification (LID) that presents specific challenges related to the linguistic similarity between dialects. Even though LID can be considered a well-understood problem, closely related dialects and language varieties still pose significant challenges for automatic recognition. Several workshops (WANLP) and challenges (VarDial, MGB) have helped improve identification results by attracting researchers to this topic. This talk presents two published papers on Arabic Dialect Identification (ADI). The first paper was published at the WANLP 2021 workshop. It investigates the value of augmenting recurrent neural networks with feature engineering for the Second Nuanced Arabic Dialect Identification (NADI) Subtask 1.2: Country-level DA identification. The performance of a simple word-level LSTM using pretrained embeddings is compared with that of one enhanced with feature embeddings for engineered linguistic features. Results show that adding explicit features to the LSTM is detrimental to performance. The second paper presents a full end-to-end pipeline for ADI using intonation patterns and acoustic representations. Results of the experiments show that intonation patterns for Arabic dialects provide sufficient information to achieve state-of-the-art results on the VarDial 17 ADI dataset, outperforming single-feature systems. The authors conjecture on the importance of sufficient information as a criterion for optimality in a deep learning ADI task.



Elsayed Issa is currently a Ph.D. candidate specializing in Arabic linguistics at the School of Middle Eastern and North African Studies at the University of Arizona. He obtained his M.A. degree in Machine Translation from Alexandria University in Egypt. His thesis involved designing software for translating simple English sentences into their Arabic equivalents. He also obtained an M.S. degree from the Human Language Technology (HLT) program in the Linguistics Department at the University of Arizona. His research interests include phonology, morphology, natural language processing, machine learning, blended learning, and education technology.

This event is hybrid and will be held over Zoom and in person in Marshall 490. Masks are not required but are strongly recommended.

Here is the Zoom meeting link:

Julie M Ellison-Speight is inviting you to a scheduled Zoom meeting.
This meeting was created in a non-BAA environment and is not intended for the discussion of healthcare, health education, or health data research. 
Topic: MENAS Colloquium Talk: Sept 2 Issa
Time: Sep 2, 2022 03:00 PM Arizona
Join Zoom Meeting
Password: 678672
One tap mobile
+16027530140,,87920996774# US (Phoenix)
+13462487799,,87920996774# US (Houston)
Dial by your location
        +1 602 753 0140 US (Phoenix)
        +1 346 248 7799 US (Houston)
        +1 669 900 6833 US (San Jose)
        +1 253 215 8782 US (Tacoma)
        +1 312 626 6799 US (Chicago)
        +1 646 876 9923 US (New York)
        +1 301 715 8592 US (Washington DC)
Meeting ID: 879 2099 6774
Find your local number: https://arizona.zoom.us/u/kJPvfLghz


To request disability-related accommodations that would ensure your full participation in this event, please contact: jellison@arizona.edu