Pakistan to Develop Urdu LLM for Generative AI

The National University of Sciences and Technology (NUST), the National Information Technology Board (NITB) and telecom operator Jazz have signed a Memorandum of Understanding (MOU) to develop Pakistan’s first indigenous Large Language Model (LLM), with a focus on Urdu and additional datasets for Pashto and Punjabi. The initiative aims to empower individuals, businesses, and organizations with advanced AI tools in their native languages. The envisioned LLM is expected to drive innovation in Generative AI applications, boosting productivity and accessibility in critical sectors such as healthcare, education, and agriculture.

GPT-4 Accuracy Scores. Source: The Economist

Generative AI tools such as ChatGPT are powered by large language models, or LLMs. To be useful in a given language, these models need to be trained on vast amounts of text in that language. Unfortunately, Urdu accounts for less than 0.1% of the content on the Internet, which presents a challenge for the developers of Urdu LLMs.

Online Content of Various Languages. Source: W3Techs 

The scarcity of Urdu content available for training ChatGPT affects the accuracy of its results for Urdu-speaking users. For example, according to data from OpenAI, GPT-4 scores just over 70% accuracy on question-answering tests in Urdu, compared with 85% in English. Other South Asian languages, including Hindi, Bengali, Punjabi, Marathi and Telugu, suffer from the same problem.

It's not just a South Asian problem. Similar challenges exist across the developing world, where non-European languages are generally poorly represented online. This is a major obstacle for non-European nations seeking to develop their own generative artificial intelligence (AI) models, which rely on vast amounts of training data. Generative AI can also produce biased results due to a number of factors, including the data it's trained on, the algorithms used, and how it's deployed.

Without such models, the use of AI in developing nations such as Pakistan is likely to remain limited to the small number of people proficient in English. Broadening the adoption of AI applications will require LLMs trained on local-language content. If this development does not happen, Pakistan could miss the opportunity to take full advantage of the AI Revolution.


Comment by Riaz Haq on November 8, 2024 at 8:52am

VEON’s Jazz Launches FikrFree: An AI-Powered Digital Marketplace for Insurance and Healthcare

https://www.globenewswire.com/news-release/2024/10/24/2968536/0/en/...

VEON Ltd. (Nasdaq: VEON, Euronext Amsterdam: VEON), a global digital operator (“VEON” or the “Company”), today announces that Jazz, its digital operator in Pakistan, has launched FikrFree, a new AI-powered digital marketplace for insurance and healthcare. The platform aims to bridge a significant gap in Pakistan, where insurance sector penetration is less than 1% of GDP according to the Securities and Exchange Commission of Pakistan, and millions lack access to essential healthcare. In comparison, insurance penetration in other countries is significantly higher (over 7% of GDP in the US and more than 9% of GDP in the UK, according to the World Bank). FikrFree helps users find accessible and affordable coverage through personalized insurance plans and healthcare services.

FikrFree aims to reach the underserved healthcare market in Pakistan through an innovative platform that seamlessly integrates insurance, healthcare, and financial services all in one mobile app. FikrFree also leverages artificial intelligence to recommend personalized insurance plans for customers. The new digital service builds on VEON’s commitment to creating innovative digital solutions as part of its Digital Operator 1440 strategy, offering customers a portfolio of connected services that are relevant for each of the 1,440 minutes in a day. In 2Q24, direct digital revenues represented over 10% of VEON Group’s total revenues.

"Access to affordable healthcare is a fundamental need. In Pakistan, where millions struggle to find suitable insurance coverage and healthcare services, VEON is addressing this challenge with connected digital services. With the launch of FikrFree, we are empowering customers to access personalized insurance plans, specialist doctors, and on-demand medicine delivery—all in one seamless platform. Our digital operator strategy focuses on investing in services that enhance lives, and with FikrFree, we aim to make affordable healthcare accessible to all Pakistanis," says Kaan Terzioglu, CEO of VEON Group.

Comment by Riaz Haq on November 8, 2024 at 8:58am

UNODC Pakistan provided Law Enforcement with Cutting-Edge Training on Crime Analytics and AI Models to Counter Terrorism


https://www.unodc.org/copak/en/Stories/SP4/unodc-pakistan-provided-...


28 September 2024, Islamabad - UNODC Pakistan organized a comprehensive workshop aimed at building the capacity of National Counter Terrorism Authority analysts to use advanced crime analytics and artificial intelligence (AI) to combat terrorism. The workshop covered a wide range of critical topics, equipping participants with the skills and knowledge needed to analyze data and counter terrorism through innovative AI techniques. In total, 25 analysts, including 7 women, participated in the training session.

The participants were introduced to the fundamentals of intelligence gathering, the intelligence cycle, and the development of intelligence products. Practical discussions were held around strategic intelligence and its pivotal role in decision-making. Participants also reviewed products developed in earlier training sessions on i2 Analyst's Notebook and Power BI, enabling them to grasp how past learnings integrate with the current focus on terrorism prevention. The workshop covered data analysis, beginning with an introduction to various data forms and their relevance in crime intelligence. Sessions covered both qualitative and quantitative data, with participants learning how to distinguish between structured and unstructured data and their real-world applications in intelligence work.

The hands-on segment included Textalyser, an online tool for analyzing qualitative data, especially for conducting sentiment analysis, which allowed participants to experiment with real-world examples. Participants engaged with thought-provoking case studies, including analyses of social media sentiment and notable incidents such as the Al Qaeda network and the Sialkot lynching case. These examples highlighted the practical value of AI tools such as Voyant in unraveling criminal networks and understanding public sentiment related to terrorist activities.

Overall, the workshop emphasized hands-on sessions with low-code and no-code AI platforms, enabling participants to leverage AI without extensive programming knowledge. Practical exercises included case studies using Google's Teachable Machine for image classification and Google Cloud AutoML for predictive crime analytics; both are powerful tools for identifying criminal patterns and behaviors in complex datasets.

The workshop concluded with a closing session that recapped the key learnings and allowed participants to discuss the next steps in their professional development.

Comment by Riaz Haq on November 8, 2024 at 7:35pm

Generalists vs. Specialists: Evaluating Large Language Models for Urdu


https://arxiv.org/html/2407.04459v1

In this paper, we compare general-purpose pretrained models, (OpenAI's) GPT-4-Turbo and (Meta/Facebook) Llama-3-8b-Instruct, with special-purpose models fine-tuned on specific tasks: XLM-Roberta-large, mT5-large, and Llama-3-8b-Instruct. We focus on seven classification and six generation tasks to evaluate the performance of these models on the Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo and Llama-3-8b-Instruct. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by GPT-4-Turbo for generation tasks aligns more closely with human evaluation than the evaluation by Llama-3-8b-Instruct does. This paper contributes to the NLP community by providing insights into the effectiveness of general-purpose and special-purpose LLMs for low-resource languages.

© 2024 Created by Riaz Haq.