School of Mathematical and Data Sciences faculty member Dr. Vito D'Orazio wins Best Paper at the SBP-BRiMS 2024 conference at Carnegie Mellon University.
His paper, titled "Extractive Question Answering for Spanish and Arabic Political Text", advances the use of domain-specific large language models (LLMs) for low-resource languages, with applications to question answering (QA).
Building on recent LLMs trained to extract events of political violence and conflict, the paper introduces ConfliBERT-Arabic and ConfliBERT-Spanish, both fine-tuned for extractive QA. Its contributions include QA fine-tuning techniques tailored to Arabic and Spanish, the curation of five datasets, and a comprehensive performance analysis. The new models provide language- and domain-specific improvements over existing models trained on general corpora. Substantively, these tools enable high-quality QA about conflict and violence in multiple world regions, in the native languages of those regions.