Enabling Arabic Database Querying via Parameter-Efficient Fine-Tuning of Large Language Models

Mohammed Taj; Mohammed Zayed; Abdulwahid Alhetar; Mohammed Rajeh; Mohammed Abbas Al-Sharafi; Basem Abdulrhman Munassar

doi:10.59628/jast.v4i1.2248

Published 2026-01-29

Issue Vol. 4 No. 1 (2026): Sana'a University Journal of Applied Sciences and Technology

Section Article

Keywords: Text-to-SQL; Large Language Models (LLMs); Natural Language Processing (NLP); Parameter-Efficient Fine-Tuning (PEFT); Low-Rank Adaptation (LoRA); GGUF Quantization; Unsloth

Recent advances in Natural Language Processing (NLP) and Text-to-SQL systems have made relational databases easier to query. However, most solutions focus on English, leaving Arabic underrepresented. This study addresses that gap by fine-tuning the Llama-3-SQLCoder model to convert Arabic text into correct, executable SQL, enabling non-technical users to work with databases without learning SQL syntax. We adapted the model using Low-Rank Adaptation (LoRA) and Unsloth, training it on Arabic questions paired with SQL queries from the Northwind database. To support low-resource environments, the model was converted to the GGUF format, reducing computational requirements while preserving performance. The fine-tuned model achieved an execution accuracy of 90.24% and a validity rate of 97.56%, compared with 44% and 80%, respectively, for the zero-shot baseline. It also achieved an Exact Match score of 32% and an F1 score of 0.83, compared with 12% and 0.61 for the baseline. These findings indicate that LoRA and Unsloth are effective for adapting SQL-specialized models to Arabic. Despite these improvements, the system still struggles with complex nested queries and dialectal variations, which remain areas for future work. Overall, this study helps narrow the gap between Arabic and other languages in Text-to-SQL research and improves database accessibility for non-technical users.
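The abstract does not define how execution accuracy and validity rate were computed. As a minimal sketch, the example below assumes one common reading: a predicted query counts as valid if it executes without error, and as correct if it returns the same rows as the reference query, ignoring row order. The `Customers` table, its rows, and the query pairs are hypothetical stand-ins loosely modeled on Northwind; they are not the study's data or evaluation code.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails to execute."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def evaluate(conn, pairs):
    """Score (predicted_sql, gold_sql) pairs.

    Returns (execution_accuracy, validity_rate) under the assumed definitions:
    valid   = the predicted query executes without error
    correct = valid and its rows match the gold query's rows (order ignored)
    """
    valid = 0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        predicted_rows = run_query(conn, predicted_sql)
        gold_rows = run_query(conn, gold_sql)
        if predicted_rows is not None:
            valid += 1
            if gold_rows is not None and predicted_rows == gold_rows:
                correct += 1
    total = len(pairs)
    return correct / total, valid / total


# Hypothetical toy table; the real Northwind schema is much larger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT, Country TEXT);
INSERT INTO Customers VALUES
    ('A', 'Alpha', 'Yemen'),
    ('B', 'Beta', 'Egypt'),
    ('C', 'Gamma', 'Yemen');
""")

# Hypothetical (predicted, gold) pairs standing in for model output.
pairs = [
    # Same rows, different ordering clause: valid and correct.
    ("SELECT CompanyName FROM Customers WHERE Country = 'Yemen'",
     "SELECT CompanyName FROM Customers WHERE Country = 'Yemen' ORDER BY CompanyName"),
    # Executes, but returns a different count: valid, not correct.
    ("SELECT COUNT(*) FROM Customers",
     "SELECT COUNT(*) FROM Customers WHERE Country = 'Egypt'"),
    # Syntax error: neither valid nor correct.
    ("SELEC * FROM Customers",
     "SELECT * FROM Customers"),
    # Identical queries: valid and correct.
    ("SELECT Country FROM Customers WHERE CustomerID = 'B'",
     "SELECT Country FROM Customers WHERE CustomerID = 'B'"),
]

execution_accuracy, validity_rate = evaluate(conn, pairs)
print(f"execution accuracy: {execution_accuracy:.2%}")  # 50.00%
print(f"validity rate: {validity_rate:.2%}")            # 75.00%
```

Under these assumptions validity is always at least as high as execution accuracy, which matches the ordering of the reported figures (97.56% versus 90.24%). Exact Match is typically stricter still, since two differently written queries can return identical rows.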

Mohammed Taj

Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

Mohammed Zayed

Department of Computer Science, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

Abdulwahid Alhetar

Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

Mohammed Rajeh

Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

Mohammed Abbas Al-Sharafi

Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

Basem Abdulrhman Munassar

Department of Information System, Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

How to Cite

Enabling Arabic Database Querying via Parameter-Efficient Fine-Tuning of Large Language Models. (2026). Sana’a University Journal of Applied Sciences and Technology, 4(1), 1507-1519. https://doi.org/10.59628/jast.v4i1.2248