Article

Enabling Arabic Database Querying via Parameter-Efficient Fine-Tuning of Large Language Models

Recent advancements in Natural Language Processing (NLP) and Text-to-SQL systems have enabled easier interaction with relational databases. However, most solutions focus on English, leaving Arabic underrepresented. This study addresses that gap by fine-tuning the Llama-3-SQLCoder model to convert Arabic text into correct and executable SQL, enabling non-technical users to work with databases without learning SQL syntax. We enhanced the model using Low-Rank Adaptation (LoRA) and Unsloth, training it on Arabic questions paired with SQL queries from the Northwind database. To support low-resource environments, the model was converted to the GGUF format, reducing computational requirements while preserving performance. Evaluation results showed an execution accuracy of 90.24% and a validity rate of 97.56%, outperforming the zero-shot baseline (44% and 80%, respectively). The model also achieved an Exact Match score of 32% and an F1 score of 0.83, compared with 12% and 0.61 for the baseline. These findings demonstrate that LoRA and Unsloth are effective for adapting SQL-specialized models to Arabic. Despite these improvements, the system still struggles with complex nested queries and dialectal variations, indicating areas for future work. Overall, this study contributes to narrowing the gap between Arabic and other languages in Text-to-SQL research and improves database accessibility for non-technical users.
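The abstract reports a validity rate and an execution accuracy without defining them here. The sketch below shows one common way such metrics can be computed, and it is not the authors' evaluation code. It assumes that a prediction is "valid" if it executes without a database error, and "execution-accurate" if its result rows match those of the reference query (compared as a multiset, ignoring row order). The `Customers` table, its rows, the query pairs, and the helper names `run_query` and `evaluate` are all hypothetical, and SQLite stands in for whatever engine hosts the Northwind database.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows of `sql` as a multiset, or None if execution fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def evaluate(conn, pairs):
    """Compute (validity_rate, execution_accuracy) for (predicted_sql, gold_sql) pairs.

    Assumed definitions (not taken from the paper):
      - valid: the predicted query executes without raising a database error
      - execution-accurate: valid, and its rows equal the gold query's rows
    """
    valid = correct = 0
    for predicted, gold in pairs:
        predicted_rows = run_query(conn, predicted)
        if predicted_rows is None:
            continue  # invalid SQL counts against both metrics
        valid += 1
        if predicted_rows == run_query(conn, gold):
            correct += 1
    total = len(pairs)
    return valid / total, correct / total


# Hypothetical miniature table, loosely modelled on a Northwind-style schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT)")
conn.executemany(
    "INSERT INTO Customers (CustomerID, City) VALUES (?, ?)",
    [(1, "London"), (2, "Berlin"), (3, "London")],
)

# Hypothetical (predicted, gold) pairs covering the three possible outcomes.
pairs = [
    # Same query: valid and correct.
    ("SELECT COUNT(*) FROM Customers WHERE City = 'London'",
     "SELECT COUNT(*) FROM Customers WHERE City = 'London'"),
    # Different text, same rows: valid and correct.
    ("SELECT City FROM Customers WHERE CustomerID = 1",
     "SELECT City FROM Customers WHERE CustomerID IN (1)"),
    # Executes but returns the wrong rows: valid, not correct.
    ("SELECT CustomerID FROM Customers",
     "SELECT CustomerID FROM Customers WHERE City = 'Berlin'"),
    # Syntax error: neither valid nor correct.
    ("SELEC * FROM Customers",
     "SELECT * FROM Customers"),
]

validity, accuracy = evaluate(conn, pairs)
print(f"validity rate: {validity:.2%}, execution accuracy: {accuracy:.2%}")
```

Comparing result rows rather than query text is what separates execution accuracy from Exact Match: the second pair above would fail a string comparison yet still counts as correct on execution. This is consistent with the abstract reporting a much lower Exact Match score than execution accuracy.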

...
Mohammed Taj
Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen
...
Mohammed Zayed
Department of Computer Science, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen
...
Abdulwahid Alhetar
Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen
...
Mohammed Rajeh
Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen
...
Mohammed Abbas Al-Sharafi
Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen
...
Basem Abdulrhman Munassar
Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

How to Cite

Enabling Arabic Database Querying via Parameter-Efficient Fine-Tuning of Large Language Models. (2026). Sana’a University Journal of Applied Sciences and Technology, 4(1), 1507-1519. https://doi.org/10.59628/jast.v4i1.2248
