Welcome to NLPAI 2024

2nd International Conference on NLP & AI (NLPAI 2024)

March 09 ~ 10, 2024, Virtual Conference



Accepted Papers
Semantic Question Generation: A Proposed Methodology and a Case Study for Generating an Algorithmic Question Pool

Sumayyah Alamoudi and Amany Alnahdi, Department of Computer Science, King Abdulaziz University, Jeddah, KSA

ABSTRACT

Assessing student performance is one of the most important tasks in the educational process, yet formulating questions and creating tests costs instructors considerable time and effort, time that could be better spent on teaching and learning. With advances in representing and linking data, ontologies have been adopted in academic fields to represent the terms of a domain by defining the concepts and categories that classify its subject matter. The emergence of such methods for representing data and linking it logically has also enabled the creation of methods and tools for generating questions. These tools can be integrated into existing learning systems to provide effective solutions that assist teachers in creating test questions. This paper proposes a semantic methodology for automating question generation, together with an application to the Algorithms domain. The approach is expected to help educators by providing pools of automatically generated questions on specific topics.

KEYWORDS

Ontology-based approach, Automatic question generation, Education, Algorithms, E-learning, Assessment.
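As a rough illustration of the idea described in this abstract, an ontology-based question generator can instantiate question templates from concept-property pairs. The following sketch is purely illustrative and is not the authors' system; the toy ontology, templates, and function names are all hypothetical.

```python
# Minimal sketch of ontology-driven question generation.
# The ontology and templates below are hypothetical toy data, not the
# authors' actual Algorithms ontology.
ONTOLOGY = {
    "Merge Sort": {"category": "sorting algorithm", "time_complexity": "O(n log n)"},
    "Binary Search": {"category": "search algorithm", "time_complexity": "O(log n)"},
}

TEMPLATES = {
    "category": "Which category of algorithm does {concept} belong to?",
    "time_complexity": "What is the worst-case time complexity of {concept}?",
}

def generate_questions(ontology, templates):
    """Instantiate each template for every concept that has the property."""
    pool = []
    for concept, props in ontology.items():
        for prop, answer in props.items():
            if prop in templates:
                pool.append((templates[prop].format(concept=concept), answer))
    return pool

for question, answer in generate_questions(ONTOLOGY, TEMPLATES):
    print(question, "->", answer)
```

Because the ontology encodes answers alongside concepts, each generated question carries its reference answer, which is what makes such pools usable for automated assessment.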


Character-based Pre-trained Language Model for Arabic Language Understanding

Abdulelah Alkesaiberi1, Ali Alkhathlan1, and Ahmed Abdelali2, 1Department of Computer Science, King Abdul Aziz University, Jeddah, Saudi Arabia. 2National Center for AI, SDAIA, Riyadh, Saudi Arabia

ABSTRACT

State-of-the-art advancements in Natural Language Processing have significantly improved machines' ability to understand natural language. However, as language models progress, they require continuous architectural enhancements and different approaches to text processing. One significant challenge stems from the rich diversity of languages, each characterized by its distinctive grammar, which reduces the accuracy of language models for specific languages, especially low-resource ones. This limitation is exacerbated by the reliance of existing NLP models on rigid tokenization methods, rendering them susceptible to issues with previously unseen or infrequent words. Additionally, models based on word and subword tokenization are vulnerable to minor typographical errors, whether they occur naturally or result from adversarial misspellings. To address these challenges, this paper utilizes a recently proposed tokenization-free method, Canine, to enhance natural language understanding. Specifically, we employ this method to develop a tokenization-free Arabic language model. We evaluate our model's performance across eight tasks from the Arabic Language Understanding Evaluation (ALUE) benchmark and conduct a comparative analysis pitting our tokenization-free model against existing Arabic language models that rely on subword tokenization. By making our pre-training and fine-tuning models accessible to the Arabic NLP community, we aim to facilitate the replication of our experiments and contribute to the advancement of Arabic language processing capabilities.

KEYWORDS

NLP, Free Tokenization, Large Language Model, Arabic Language.


Improving the Capabilities of Large Language Model Based Marketing Analytics Copilots With Semantic Search and Finetuning

Yilin Gao1, Sai Kumar Arava2, Yancheng Li2, and James W Snyder Jr2, 1Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA, 2Adobe Inc., San Jose, California, USA

ABSTRACT

Artificial intelligence (AI) is widely deployed to solve problems related to marketing attribution and budget optimization. However, AI models can be quite complex, and it can be difficult to understand model workings and insights without extensive implementation teams. In principle, recently developed large language models (LLMs), like GPT-4, can be deployed to provide marketing insights, reducing the time and effort required to make critical decisions. In practice, there are substantial challenges that need to be overcome to reliably use such models. We focus on domain-specific question-answering, SQL generation needed for data retrieval, and tabular analysis and show how a combination of semantic search, prompt engineering, and fine-tuning can be applied to dramatically improve the ability of LLMs to execute these tasks accurately. We compare both proprietary models, like GPT-4, and open-source models, like Llama-2-70b, as well as various embedding methods. These models are tested on sample use cases specific to marketing mix modeling and attribution.

KEYWORDS

Generative AI, Large Language Models, Semantic Search, Fine-tuning, Marketing.
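The semantic-search step described in this abstract can be sketched as embedding documents and a query, ranking by cosine similarity, and prepending the top hit to the LLM prompt. A real system would use a learned embedding model; the bag-of-words vectors, toy documents, and function names here are purely illustrative stand-ins.

```python
# Toy sketch of embedding-based retrieval for a marketing copilot.
# Bag-of-words counts stand in for learned embeddings; illustrative only.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical snippets the copilot could retrieve before answering.
docs = {
    "mmm_spend": "marketing mix model channel spend attribution",
    "sql_schema": "orders table columns customer_id order_date revenue",
}

def retrieve(query, docs):
    """Return the key of the document most similar to the query."""
    return max(docs, key=lambda k: cosine(embed(query), embed(docs[k])))

print(retrieve("which channel drove attribution", docs))  # -> mmm_spend
```

The retrieved snippet is then placed in the prompt so the model grounds its answer (or generated SQL) in the relevant schema or domain text rather than guessing.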


Direct Punjabi to English Speech Translation Using Discrete Units

Prabhjot Kaur1, L. Andrew M. Bush2, and Weisong Shi3, 1Wayne State University, USA, 2Utah State University, USA, 3University of Delaware, USA

ABSTRACT

Speech-to-speech translation has yet to reach the same level of coverage as text-to-text translation systems. Current speech technology covers only a small fraction of the more than 7,000 languages spoken worldwide, leaving more than half of the world's population deprived of such technology and the shared experiences it enables. With voice-assisted technology (such as social robots and speech-to-text apps) and auditory content (such as podcasts and lectures) on the rise, ensuring that the technology is available to all is more important than ever. Speech translation can play a vital role in mitigating technological disparity and creating a more inclusive society. Motivated to contribute to speech translation research for low-resource languages, our work presents a direct speech-to-speech translation model from Punjabi, an Indic language, to English. Additionally, we explore the use of a discrete representation of speech, called discrete acoustic units, as input to a Transformer-based translation model. The model, abbreviated Unit-to-Unit Translation (U2UT), takes a sequence of discrete units in the source language (the language being translated from) and outputs a sequence of discrete units in the target language (the language being translated to). Our results show that the U2UT model outperforms the Speech-to-Unit Translation (S2UT) model by 3.69 BLEU points.

KEYWORDS

Direct speech-to-speech translation, Natural Language Processing (NLP), Deep Learning, Transformer.
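To illustrate how continuous speech features become the "discrete units" this abstract refers to: each feature frame is assigned the index of its nearest codebook centroid, as in k-means quantization of self-supervised speech features. The codebook and frames below are toy 2-D values, not the paper's actual representation.

```python
# Sketch of feature quantization into discrete acoustic units.
# A real codebook would come from k-means over self-supervised features;
# these 2-D toy centroids are purely illustrative.
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # 3 "units"

def nearest_unit(frame, codebook):
    """Index of the centroid with the smallest squared distance to frame."""
    return min(range(len(codebook)),
               key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])))

def quantize(frames, codebook):
    """Map a sequence of feature frames to a sequence of discrete unit ids."""
    return [nearest_unit(f, codebook) for f in frames]

frames = [(0.1, 0.0), (0.9, 1.1), (0.1, 0.9)]
print(quantize(frames, CODEBOOK))  # -> [0, 1, 2]
```

A unit-to-unit model then translates one such integer sequence into another, and a separate vocoder turns the target units back into a waveform.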


Enhancing Stability and Performance: AI-driven Single Optimization of Solid Lipid Microparticles

Mohamed Kouider Amar1 and Mohamed Hentabli2, 1Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria, 2Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria

ABSTRACT

This study explores the complex behavior of solid lipid microparticle (SLM) formulations. It reveals significant findings on the transient flow behavior of SLMs. A crucial aspect of the investigation was to establish a model that connects fatty amphiphiles and shear stress. This was achieved through start-up flow experiments using a hybrid Artificial Neural Network-AntLion Optimizer (ANN-ALO) algorithm. The pretreatment step established a strong correlation between stress and shear rate. Cetyl alcohol was found to have a significant positive effect on stress levels. Cetyl palmitate, on the other hand, showed a slight negative correlation with stress. The application of ANN-ALO yielded promising results. The coefficient of determination for the tested dataset was exceptionally high, measuring 0.9972. Furthermore, key statistics were assessed, revealing a mean absolute error (MAE) of 1.75, a mean squared error (MSE) of 9.06, and a root mean squared error (RMSE) of 3.01 for the test dataset. The study then proceeded to a global optimization stage, in which the ALO algorithm was used in conjunction with the optimized ANN model to identify the optimal combinations of fatty amphiphiles. The results displayed a range of optimal parameter sets with diverse ratios of fatty alcohols in these formulations, highlighting the flexibility in design and offering multiple options to achieve the desired characteristics. In summary, these modeling and optimization processes contributed to the understanding of the transient flow behavior of SLMs and introduced an effective methodology for its optimization. The findings have practical implications for the development of SLMs, and the proposed approach enables the creation of innovative and effective formulations.

KEYWORDS

Solid lipid microparticles, Rheology, Transient shear flow, Artificial Neural Network-Antlion Optimizer, Modeling, Global optimization.
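The error metrics reported in this abstract are standard regression statistics and can be computed as below; note that RMSE is simply the square root of MSE, consistent with the reported MSE of 9.06 and RMSE of 3.01. The data here is toy data, not the study's dataset.

```python
# Standard regression metrics (MAE, MSE, RMSE, R^2) on toy data.
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n          # mean absolute error
    mse = sum(r * r for r in residuals) / n           # mean squared error
    rmse = math.sqrt(mse)                             # root mean squared error
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - sum(r * r for r in residuals) / ss_tot   # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

m = regression_metrics([10.0, 12.0, 14.0, 16.0], [10.5, 11.5, 14.5, 15.5])
print(m)  # MAE 0.5, MSE 0.25, RMSE 0.5, R2 0.95
```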