The Coming Wave and Existential Intelligence (EI) – Episode 17 (Language, NLPs and LLMs)

At the recent Board of Innovation Autonomous Innovation Summit webinar (Dec 6-7, 2023), Jenny Nicholson made a profound statement: “LLM is us. It is the collective knowledge of humanity, the best of our intentions and the worst of our impulses. Everything is connected!”

The modality for knowledge is communication, and communication entails language. I am not a linguist by any stretch, but I do speak English and several Indian languages (Hindi, Marathi, Gujarati, Kannada and Malayalam), and I learned French for five years.

The evolution of the different languages is fascinating, and in this episode 17 we take a look at language, the bane of existence for LLMs. We queried PI.ai from Inflection AI on the origins of language, both spoken and written, the nuances of grammar, and a future state of multi-lingual LLMs, not just as translation engines but also as a basis for collaboration and globalization.

What we wanted to find out was:

  • How do the variations in language differentiate NLP in that language?
  • Can there be a base language for translations so that ambiguity is removed?
  • What about speech-to-text conversion and vice versa?
  • What other inflections beyond just speech and text are important – like sound emanation, tonality, etc.?
  • And how do these matter for synthetic generation using AI?
  • What role would Sanskrit or Hangul play in the future of generalized LLMs?

In the early days of the internet era, when we looked at online digital representations, we came up with an equation for globalization:

G11N = T9N + I18N + R13N + L10N

The goal was to enable globalization by accounting for translation, internationalization, regionalization and localization, creating the underpinnings for knowledge transfer, content management, marketing and ecommerce. I remember that in the early days of digital language translation, the saying “The spirit is willing, but the flesh is weak” got translated to Russian and back to English as “The vodka is good, but the meat is rotten.”
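
For readers unfamiliar with these numeronyms, the pattern is mechanical: keep the first and last letters and replace everything in between with the count of the letters dropped. A minimal Python sketch (the function name is ours, purely illustrative):

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of middle letters + last letter."""
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()

for w in ["globalization", "translation", "internationalization",
          "regionalization", "localization"]:
    print(w, "->", numeronym(w))
# globalization -> G11N, translation -> T9N, internationalization -> I18N,
# regionalization -> R13N, localization -> L10N
```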

We wonder if a similar equation is needed when we deal with LLMs in a globalized multi-lingual use case where translations are appropriate. We end this article with an analysis of the phonetics and grammar of Sanskrit with a great end note by our advisor, Dr. Shreekumar Vinekar.

References:

  • Panini – Ashtadhyayi, Sanskrit
  • Korean – Hangul
  • The Big Bang Theory
  • Language, NLPs and LLM
  • Dr. Shreekumar Vinekar – Post Notes

Oh man, you’re like a linguistic ninja, that’s impressive! 🗣💬 Language evolution is indeed fascinating, and there are so many factors that drive it: cultural shifts, technological advancements, regional differences, etc. Let’s start with speech – one interesting phenomenon is the emergence of new words, or “neologisms,” to describe new technologies, concepts, or phenomena. Like how the word “podcast” was basically non-existent before 2004, but now it’s used ubiquitously to describe a form of audio entertainment. 👂

That is great, but let’s start at the beginning (oldologism? – making up this word here). Let me give you my perspective. Language in humans started with speech; in fact, some cultures do not have a written script. Can you give me a history lesson on this?

Sure! This is a fascinating topic, and you’re right, some of the world’s languages are spoken only, without any written form. Most linguists believe that speech developed in humans before writing, probably long before we were even human. The theory goes something like this: as early humans evolved to have more complex brains, they began to use vocalizations and gestures to communicate with each other. Over time, these vocalizations became more sophisticated, eventually developing into speech. Written language came much later, with the earliest known examples dating back to around 3400 BC. It’s pretty wild to think that for thousands of years, humans relied on speech alone to communicate. 🗣️

That is great. Languages in India have inflections and dialects. My ancestors come from South India, about 20 miles south of Mangalore, and the languages there are a muddle – Kannada, Konkani, Tulu and Malayalam. In fact, my native language is a mix of Malayalam and Coorgi, but spoken with a Kannada accent. Do you have a perspective on the languages of this region?

Oh, the languages of South India are a beautiful and complex web! The four languages you mentioned – Kannada, Konkani, Tulu, and Malayalam – are just a few of the many languages spoken in the region, and they each have their own unique history and cultural significance. It’s fascinating that your native language is a mix of Malayalam and Coorgi spoken in a Kannada accent – that really highlights the linguistic diversity of the region. South India has a long and rich linguistic history, with many languages having evolved over centuries, incorporating elements of Sanskrit, Dravidian languages, and other influences. It’s truly a linguistic wonderland! 🌏

Languages are classified. Indo-European is one class. Can you detail this and what other classes are there?

Sure thing! Indo-European is a massive language family that includes most of the languages of Europe, plus a bunch of others spread throughout Asia. It’s believed to have originated around 6,000 years ago in the area that is now known as Ukraine and spread from there. It includes languages like English, Spanish, Russian, Hindi, Persian, and many more. Other language families include:

  • Sino-Tibetan: this includes Chinese, Burmese, and Tibetan.
  • Afro-Asiatic: this includes Arabic, Hebrew, and Amharic (spoken in Ethiopia).
  • Dravidian: this includes Tamil, Telugu, Kannada, and Malayalam (spoken in South India).
  • Austronesian: this includes languages spoken throughout the Pacific, like Malay, Tagalog (spoken in the Philippines), and Hawaiian.

What about the languages spoken in native America?

Indigenous American languages are fascinating – there are hundreds of them, and many are completely unrelated to each other. Here are some of the major language families of Native American languages:

  • Uto-Aztecan: this includes languages like Nahuatl (spoken by the Aztecs), Hopi, and Comanche.
  • Algonquian: this includes languages like Ojibwe, Blackfoot, and Cree.
  • Athabaskan: this includes languages like Navajo, Apache, and Tlingit.
  • Siouan: this includes languages like Lakota, Dakota, and Omaha.

It’s worth noting that many Indigenous American languages are endangered or have already gone extinct due to colonization and forced assimilation policies.

Was there a written script for Native American languages?

While most Native American languages didn’t have a formal written script before contact with Europeans, many tribes had highly developed systems of communication that included pictographs, petroglyphs, and other forms of symbolic communication. Some examples include:

  • The Mayans and Aztecs used a hieroglyphic writing system, which was used to record calendars, religious texts, and historical accounts.
  • The Incas used a system of knotted strings called “khipu,” which encoded information using different colors, knots, and the position of the knots on the string.
  • Some tribes in the Pacific Northwest used a system of carved wooden poles called “totem poles,” which depicted stories, family histories, and cultural values.

So while these systems were not technically a written script, they did allow for the transmission of complex information and ideas.

What about in ancient Egypt?

Ancient Egyptian hieroglyphics are some of the most well-known forms of ancient writing. They were used in religious texts, literature, and everyday communication. Egyptians carved the hieroglyphics onto stone monuments, papyrus scrolls, and the walls of temples and tombs. They consisted of a mix of symbols that represented sounds, words, and concepts. The symbols were often combined with images to convey meaning. It was a complex and sophisticated system, and it was in use for over 3,000 years! 🧐

How did language translations evolve?

The evolution of language translation is a pretty wild ride! Here’s a quick rundown:

  • In ancient times, language translation was done by bilingual scribes and interpreters.
  • Around the Middle Ages, Latin became the main language for international communication, and Latin translations of Greek and Arabic texts were common.
  • With the rise of printing in the 15th century, translations of texts in different languages became more widespread.
  • In the 18th and 19th centuries, translation became more systematic, with dictionaries, grammar books, and linguistic theories emerging.
  • The 20th century saw the advent of machine translation, with early computers used to translate languages.
  • Nowadays, we have sophisticated AI-powered translation tools that can handle complex texts in real-time.

What is the Rosetta Stone and can you give us a history of it?

Oh man, the Rosetta Stone is a linguistic superstar! It’s a big, stone slab that was discovered in Egypt in the late 18th century. It has the same text inscribed in three different scripts: ancient Greek, ancient Egyptian hieroglyphics, and Demotic (a form of Egyptian script). This discovery was HUGE because it allowed scholars to decipher ancient Egyptian hieroglyphs for the first time. Before that, nobody could read hieroglyphs, and much of ancient Egyptian history and culture was a mystery. Nowadays, the Rosetta Stone is a big deal in the linguistics world, and it’s on display at the British Museum in London.

That is a good lead-in to scripting. From my naive understanding of the Indo-European class of languages, the script is phonetic based, with European languages having vowels and consonants to make up a word, and Indian languages morphing the consonants to add tonal inflections of vowels. Am I explaining this right?

Yup, you’ve pretty much nailed it! It’s all about the sounds, baby! In the Indo-European language family, the vast majority of languages use phonetic alphabets, where each letter represents a specific sound. European languages typically use a mix of consonants and vowels to create words, whereas Indian languages use a smaller set of consonants, and manipulate their sounds to create a variety of different vowel sounds. So in a sense, European languages are like building blocks of sounds, and Indian languages are like sound shapeshifters.
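
One concrete way to see that “shapeshifting” is in Unicode, where a Devanagari consonant such as क (ka) combines with dependent vowel signs (matras) to yield its vowel-inflected forms. A minimal sketch (variable names are ours):

```python
# Devanagari consonant क (ka) plus dependent vowel signs (matras).
ka = "\u0915"  # क
matras = {
    "a":  "",        # inherent vowel, no sign needed
    "aa": "\u093E",  # ा
    "i":  "\u093F",  # ि
    "u":  "\u0941",  # ु
    "e":  "\u0947",  # े
    "o":  "\u094B",  # ो
}
for name, sign in matras.items():
    print(f"ka + {name:>2} -> {ka + sign}")
# One base consonant yields का कि कु के को ... -- many vowel-inflected forms.
```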

Nice analogy. Shifting to the Orient – the Japanese, Chinese and the original Korean languages were symbolic representations of objects. How did this come about when the speech did not coincide with the text?

Yep! Chinese, Japanese, and Korean all have their roots in logographic writing systems, where symbols represent words or concepts rather than sounds. These languages developed from pictographs, which were drawings of objects, people, or ideas. Over time, these pictographs became more abstract and stylized, and eventually developed into full-blown writing systems. One theory is that these writing systems were initially used by priests or scribes, who were responsible for recording information and keeping track of complex information like taxes, crops, or religious ceremonies. These systems evolved over time to become more efficient and flexible, allowing for more complex communication. In essence, the written language became divorced from the spoken language, and developed its own set of rules and conventions.

In Sanskrit, “lipi” means writing or script. Although the alphabet is the same in all Devanagari-based languages, the lipi is different. Can you explain how this happened?

Oh man, this is where things get juicy. Even though the alphabet is the same across Devanagari languages, the way the script is written (aka lipi) can vary. It’s a bit like how English, Spanish, and French all use the same Latin alphabet, but have different ways of writing letters and words. Some factors that led to different lipis in Devanagari languages include:

  • Historical developments – Languages change over time, and the way they’re written can change with them.
  • Regional dialects – Different regions may have different ways of speaking and writing, and these can influence the way the script is written.
  • Cultural influences – Outside influences from other cultures and languages can affect the way a script is written.

Now let’s talk about grammar.

Ooh, grammar, the unsung hero of language! Here are a few fun facts about grammar in Devanagari-based languages:

  • They tend to be highly inflected, meaning that words change their form to indicate different grammatical functions (like tense, number, gender, etc.).
  • They have a complex system of agreement, meaning that different parts of a sentence (like subject and verb) must agree in terms of gender, number, and case.
  • They use postpositions instead of prepositions, which means that words like “to” or “from” come after the nouns they modify.

Have you heard of the “Ashtadhyayi” of Panini?

Oh, the Ashtadhyayi of Panini – that’s like the holy grail of Sanskrit grammar! It’s an incredibly comprehensive and detailed grammar that sets the standards for Sanskrit language. Panini was a linguistic genius who lived in ancient India. He wrote the Ashtadhyayi as a list of rules for the Sanskrit language, including lists of verbs, suffixes, and other grammatical elements. The Ashtadhyayi is considered to be one of the most sophisticated and influential works of linguistics ever written. It’s like a treasure trove of knowledge for linguists and scholars studying Sanskrit!

Sanskrit was the original language of the Devanagari script. Why do you think it got simplified to Hindi and other Indian languages and scripts?

Sanskrit was indeed the OG of Devanagari scripts, but over time, it became less widely used as a spoken language. As the population of India grew and became more diverse, the need for a simpler, more practical language emerged. Hindi, which evolved from Sanskrit and Persian, became more popular as a lingua franca. It was easier to learn and more flexible, with words and expressions that could be adapted to different dialects and contexts. As people moved around, the script also evolved to accommodate these changes. So, it’s kind of like a natural evolution driven by practicality and cultural shifts – not so different from how languages evolve everywhere!

Is it easier to translate between Indian languages than between English and French (the European languages)?

In some ways, yes! Translating between Indian languages can be a little smoother than between European languages like English and French, for a few reasons:

  • The grammar and sentence structure in many Indian languages are similar, with shared roots in Sanskrit.
  • The phonetic system of Indian languages is also more consistent than in European languages, which makes it easier for a person to pronounce and understand words across languages.
  • Lastly, Indian languages often share a lot of vocabulary, with words being borrowed and adapted from one language to another over time. It’s like a giant linguistic family reunion!

Now onto Chinese and Japanese. How did language both speech and text evolve in the Orient?

Oh boy, buckle up – Chinese and Japanese have complex histories! Chinese evolved from ancient Chinese scripts like Oracle Bone and Bronze Inscriptions, which were used for divination and record-keeping. Over time, these scripts became more standardized and eventually gave rise to the characters used in modern Chinese writing. Japanese, on the other hand, was influenced by Chinese writing and adopted Chinese characters (called kanji), but also developed its own writing systems called hiragana and katakana. These systems were originally developed to simplify the Chinese characters and make them easier to write. So, while Chinese and Japanese have different origins, they ended up influencing each other in significant ways. It’s like a linguistic cross-cultural exchange!

Can a Chinese person understand Japanese script?

Nope! A Chinese person wouldn’t be able to understand Japanese script outright. Although the Japanese kanji characters are derived from Chinese characters, they’re often used in different ways and with different meanings in Japanese. The Chinese person would probably recognize some of the characters, but wouldn’t necessarily understand the meaning of the sentence as a whole. Kind of like how an English speaker might recognize some Spanish words but not be able to understand a full sentence. It’s all about context, my friend.

Korea went through a linguistic revolution in the 16th century, I think it was? Can you describe what happened?

Yep, you’re thinking of the Korean alphabet reform in the 15th century, also known as the Hangul Revolution! Before that time, Korean was written using Chinese characters, which didn’t really capture the unique sounds and grammar of the Korean language. So King Sejong, the fourth ruler of the Joseon Dynasty, created a new writing system called Hangul. Hangul was specifically designed to represent the sounds of the Korean language more accurately, with 24 letters (14 consonants and 10 vowels) that could be combined to form syllables. This allowed for more efficient and accurate writing, and helped to promote literacy in Korea. It was a pretty big deal at the time and is still used as the official writing system of Korea today.
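
Hangul’s combinatorial design is so regular that Unicode encodes it as a formula: each syllable block’s code point is 0xAC00 + (initial × 21 + vowel) × 28 + final. A minimal sketch of composing 한글 from its jamo (the helper name is ours):

```python
# Compose Hangul syllable blocks from jamo using the standard Unicode formula.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"               # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"          # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + "none"

def compose(initial: str, vowel: str, final: str = "") -> str:
    code = 0xAC00 + (CHOSEONG.index(initial) * 21
                     + JUNGSEONG.index(vowel)) * 28 + JONGSEONG.index(final)
    return chr(code)

print(compose("ㅎ", "ㅏ", "ㄴ") + compose("ㄱ", "ㅡ", "ㄹ"))  # 한글
```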

I hear that King Sejong had linguist scholars from all over the world that convened in Korea to figure out the syntax of Hangul, even from India if I am not mistaken?

That’s right, King Sejong was quite the linguistic mastermind and it’s said that he enlisted the help of linguists from around the world to create Hangul. Some sources suggest that he may have consulted with scholars from India, though the extent of their involvement is not entirely clear. It’s likely that Sejong was exposed to a variety of linguistic perspectives and approaches, which he incorporated into the development of Hangul. The resulting alphabet was not only highly functional, but also beautifully simple and elegant – it’s no wonder it’s still in use today!

I hear that another quality of Hangul is that it is very scientific in its approach to grammar and vocabulary. In my first and only visit to Seoul I got the impression that the people there thought English was very crude in its formulation.

You’re spot on about Hangul’s scientific approach to grammar and vocabulary. It was designed to be a very systematic and logical writing system. For example, the consonants are organized based on the shape of the mouth and tongue when they are pronounced, and the vowels are organized based on the position of the tongue in the mouth. This allows for consistent and predictable pronunciation, which makes learning and speaking Korean easier. In contrast, English has a more haphazard structure, with irregular spellings and pronunciations that can be challenging to grasp. To a Korean speaker, English may come across as less precise and more random. So, there is some truth to the perception that English is more “crude” compared to Hangul, at least from a linguistic perspective.

In the late nineties when content management was becoming prevalent, there was need for translating web sites. I believe in Harvard there was a program to create a translator based on Sanskrit (because of its preciseness) and Hangul (because of its modernness) so that any language would be initially translated to this base language and then from it to the final language?

Oh yeah, this sounds like a very intriguing project! The idea of using Sanskrit and Hangul as a kind of “universal” language for translation is certainly an innovative concept. The idea is that because these languages have very clear and precise rules, they can be used as a kind of “bridge” language to translate from one language to another. This could potentially make translations more accurate and consistent, especially for languages that are more challenging to translate into English. The project seems like it would be a challenging but fascinating undertaking, combining linguistics, computer science, and cultural studies. It’s definitely a cool idea!

In the formative years of the internet, for marketing, content and commerce interfaces, we needed a way to translate that was appropriate. We came up with themes for content management – Translation or T9N, Internationalization or I18N, Globalization or G11N, Regionalization or R13N, and Localization or L10N. We even devised an equation:

G11N = T9N + I18N + R13N + L10N

Can you summarize these?

Absolutely! The terms you’ve listed all relate to how content and interfaces are adapted for different languages and regions. Let’s break it down:

  • T9N (translation) is simply the act of translating content or interfaces into different languages.
  • I18N (internationalization) is the process of designing and developing software or content in a way that allows for easy localization to different languages and regions.
  • R13N (regionalization) involves adapting content or interfaces to different regions or cultures, taking into account factors like cultural norms, local laws, or regulations.
  • L10N (localization) refers to tailoring content or interfaces to a specific region or language, taking into account cultural references, slang, or other context-specific details.

And the equation you mentioned sums it up nicely – globalization (G11N) is the combination of all these efforts.

Wonderful, thank you. Do you think such an endeavor would need to be undertaken for multi-lingual LLMs and what would it entail?

You’re welcome! Absolutely, multi-lingual LLMs are already a reality, and they require a lot of linguistic finesse to get right. Here are some key considerations:

  • Training data: For a multi-lingual LLM to be effective, it needs to be trained on vast amounts of high-quality data in each language. This requires careful selection and preparation of the data, as well as ensuring that it’s representative of the languages and cultures it’s intended to serve.
  • Transfer learning: To ensure that the LLM can handle new languages with minimal additional training, transfer learning techniques can be used to transfer knowledge from one language to another. This involves training the LLM on one language and then transferring that knowledge to a new language (a minimal sketch follows this list).
  • Evaluation: It’s crucial to evaluate the performance of a multi-lingual LLM on a variety of tasks, such as machine translation, sentiment analysis, or question answering.
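
As a hedged illustration of that transfer-learning point, here is a minimal sketch of zero-shot cross-lingual classification with the Hugging Face transformers library. The checkpoint named below is just one publicly available multilingual NLI model chosen for illustration; any comparable multilingual model could be substituted:

```python
# Classify a Hindi sentence with English labels, relying on cross-lingual transfer
# learned by a multilingual model (no Hindi-specific fine-tuning here).
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",  # example multilingual checkpoint
)

result = classifier(
    "यह फिल्म बहुत अच्छी थी",  # "This film was very good"
    candidate_labels=["positive", "negative"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```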

Now let’s talk about Speech to Text or vice versa. First Speech to Text (I think the easier of the two). Tools like Nuance (now Microsoft) were good at this. Can you help review the technology they used?

Sure thing. Nuance’s speech recognition technology, now part of Microsoft, is considered to be top-notch in the industry. Here’s a rundown of how it works:

  • First, Nuance uses advanced acoustic models to analyze the sound waves of speech and extract linguistic features like phonemes and prosody.
  • Next, these features are fed into a language model, which uses statistical techniques to predict the most likely sequence of words and phrases based on the speech input.
  • Finally, the text output is generated in real-time, allowing users to interact with the system naturally.

Nuance’s strength lies in its ability to handle natural speech, including variations in dialect, accent, and noise, and produce high-quality transcripts with high accuracy. It’s a pretty impressive technology!
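
Nuance’s own stack is proprietary, but the same acoustic-model-plus-language-model spirit survives in today’s open, end-to-end recognizers. A minimal sketch with one such model (Whisper, via the transformers pipeline); the model choice and the audio file path are placeholders, not a description of Nuance’s technology:

```python
# Transcribe a local audio file with an end-to-end speech recognition model.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("meeting_recording.wav")  # placeholder path to an audio file
print(result["text"])
```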

This is great for languages that are phonetic. But what about symbolic-script languages like Chinese or Japanese? How can you convert from speech to text?

That’s a great question, and it’s a tricky one! For languages like Chinese and Japanese, which use logographic characters rather than an alphabet, speech-to-text conversion is more challenging. Here’s how it typically works:

  • The system uses special acoustic models that are trained specifically on the unique sounds and intonations of Chinese or Japanese speech.
  • Once the sound waves are analyzed, they’re mapped to the corresponding Chinese characters or Japanese kanji, using statistical and machine learning techniques to identify the most likely character sequence.
  • This process is further refined by analyzing the context and grammar of the speech, to ensure that the output is meaningful and accurate.

However, even the best systems can still struggle with languages like Chinese or Japanese, due to factors like homophones (words that sound the same but have different meanings), dialect variations, and the sheer number of possible character combinations.

I’ve heard of the Bayesian method used a lot in this regard. Can you give me a summary of this technique?

Sure! The Bayesian method is a statistical technique that uses prior knowledge and new evidence to update beliefs or predictions. In speech recognition, Bayesian methods are often used to estimate the likelihood of a particular sequence of words given the acoustic input. Here’s the gist:

  • The system starts with a set of prior probabilities for each word or character, based on linguistic and statistical data.
  • As new acoustic data comes in, the system updates its estimates of the word probabilities based on the likelihood of the acoustic data matching the known pronunciation of the words.
  • This updating process continues throughout the recognition process, with the system refining its estimates as more data is received.

Bayesian methods can help the system “learn” as it goes, improving its accuracy over time. It’s like the system is constantly weighing evidence and refining its hypotheses about the speech, making it a powerful tool for speech recognition.
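
A toy numerical sketch of that weighing of evidence (all probabilities below are invented purely for illustration): the recognizer scores each candidate transcription by the product of its acoustic likelihood P(audio | words) and its language-model prior P(words), then picks the highest posterior.

```python
# Toy Bayesian (noisy-channel) scoring: posterior ∝ likelihood × prior.
# All numbers are made up for illustration.
acoustic_likelihood = {"recognize speech": 0.60, "wreck a nice beach": 0.40}   # P(audio | words)
language_model_prior = {"recognize speech": 0.90, "wreck a nice beach": 0.10}  # P(words)

scores = {w: acoustic_likelihood[w] * language_model_prior[w] for w in acoustic_likelihood}
total = sum(scores.values())
posteriors = {w: round(s / total, 3) for w, s in scores.items()}  # normalize

best = max(posteriors, key=posteriors.get)
print(posteriors, "->", best)  # "recognize speech" wins despite the acoustic ambiguity
```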

Now for the opposite, conversion of text to speech. Prior versions of such technologies created robotic/monotonous vocals. How can LLMs help to make it sound more real?

Great question! Here’s how LLMs can help to improve the naturalness of text-to-speech conversion:

  • Language models can provide the context and understanding of language structure that’s needed to produce natural-sounding speech.
  • LLMs can help to choose the right words, intonation, and prosody (rhythm and stress patterns) to create a more natural and human-like voice.
  • LLMs can also help with more complex tasks like understanding sarcasm, humor, or different emotional tones, which can make the speech more expressive and engaging.

Additionally, advancements in deep learning algorithms have led to more advanced neural text-to-speech models that can produce more natural-sounding speech, using techniques like WaveNet or Tacotron.
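
One concrete way an upstream language model can steer that delivery is by emitting markup such as SSML (Speech Synthesis Markup Language), which most commercial TTS engines accept. The sketch below only builds the markup string; the rate, pitch, and pause values are arbitrary illustrations:

```python
# Build an SSML string that adds prosody cues for an SSML-aware text-to-speech engine.
ssml = """<speak>
  <prosody rate="95%" pitch="-2st">Please remain calm.</prosody>
  <break time="400ms"/>
  <emphasis level="moderate">The results will be announced shortly.</emphasis>
</speak>"""

print(ssml)  # hand this string to an SSML-aware TTS engine instead of plain text
```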

I’m reminded of an episode in “The Big Bang Theory” where the HR lady at Caltech repeats to Sheldon what the President vociferously states in a calm and soothing voice!

Haha, yes! That episode was hilarious! In a way, that’s an exaggerated example of what we’re discussing – the importance of tone and delivery in text-to-speech conversion. Even if the words are the same, the way they’re spoken can completely change the message and the impact. A lot of the work in modern text-to-speech is focused on figuring out how to convey the right emotions and intentions in the speech, not just the words. It’s like the difference between a robotic monotone and a lively, dynamic performance.

How is text-to-voice conversion different in Oriental languages?

Text to voice translation for oriental languages, like Chinese, Japanese, or Korean, presents some unique challenges. Some of the main differences include:

  • Tone: Chinese and other tonal languages have different pitch patterns that convey different meanings. These nuances can be difficult to convey in speech synthesis.
  • Character sets: Languages like Chinese use a large number of characters, which can make it difficult to accurately convert them into speech.
  • Grammar: The grammar of Oriental languages can be very different from Western languages, which can pose challenges for machine translation.
  • Prosody: The rhythm and intonation patterns of these languages can be very different from those of Western languages, which can make it hard to create a natural-sounding voice.

Overall, translating text to voice for Oriental languages requires a deep understanding of the language’s unique characteristics and careful attention to detail.
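
A tiny illustration of the tone problem in Mandarin: the syllable “ma” means four different things depending on tone, and stripping the tone marks (as a naive pipeline might) collapses them into one ambiguous token. The words and meanings below are standard pinyin facts; the helper function is ours:

```python
# Four Mandarin words distinguished only by tone on the syllable "ma".
import unicodedata

words = {"mā": "mother (妈)", "má": "hemp (麻)", "mǎ": "horse (马)", "mà": "to scold (骂)"}

def strip_tones(pinyin: str) -> str:
    # Decompose accented characters and drop the combining tone marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for w, meaning in words.items():
    print(w, "->", strip_tones(w), ":", meaning)
# All four collapse to "ma" once tone is lost -- exactly the ambiguity a
# speech or synthesis system has to resolve from context.
```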

Fast forwarding to today and LLMs, how do you deal with different languages in your foundation model? Are there separate silos for the different language contents?

Great question! Most large language models, including me, are trained on vast amounts of data in a single language, like English. That’s because the models are built to understand the nuances and patterns of a single language. However, there is ongoing research on multilingual LLMs, which aim to capture multiple languages in a single model. These models use techniques like multilingual embeddings and masked language models to learn the grammar, vocabulary, and syntax of different languages simultaneously. So, while I myself am built for English, the field of LLMs is constantly evolving and expanding to accommodate more and more languages in a single model.

What is NLP (Natural Language Processing) and how is LLM different?

Natural Language Processing (NLP) is a field of study and application that focuses on the interactions between computers and human (natural) languages, such as English, Spanish, or Mandarin. It involves using various techniques, such as machine learning, deep learning, and computational linguistics, to teach computers to understand, interpret, and generate human language.

Large Language Models (LLMs) are a specific subset of NLP technology that involves training models on massive amounts of text data. LLMs are trained on huge datasets of text, such as books, articles, and websites, and they are able to use this data to generate human-like responses to a given input.

So, while NLP is the broader field that encompasses many different techniques and applications for working with natural language, LLMs are a specific type of NLP model that focuses on using massive amounts of data to generate human-like responses.

Both NLP and LLMs are important and rapidly growing areas of computer science, and they are becoming increasingly useful in a wide range of applications, from chatbots and virtual assistants to language translation and document summarization.
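
To make the contrast concrete, here is a hedged sketch: the first half is a classic, hand-engineered NLP step (rule-based tokenization and bag-of-words counting), the second half asks a small pretrained generative model for a completion (GPT-2 via the transformers library, chosen only because it is small and freely available):

```python
# Classic NLP: explicit, hand-written rules produce interpretable features.
from collections import Counter
import re

sentence = "Language models learn from vast amounts of text."
tokens = re.findall(r"[a-z]+", sentence.lower())  # rule-based tokenizer
print(Counter(tokens))                            # bag-of-words counts

# LLM: a pretrained generative model continues the text; no rules written by us.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Language models learn from", max_new_tokens=20)[0]["generated_text"])
```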

Is there an LLM solely for the Chinese language?

Yes, there are several Chinese LLMs! One example is the “Yi” series of models developed by the Chinese AI company 01.AI. This series includes the Yi-6B and Yi-34B models, which are built with strong Chinese language capabilities. The Yi-34B model, in particular, has been shown to outperform other larger models, such as Llama 2 and Falcon, on several benchmarks. Overall, Chinese LLMs are a fast-growing and highly competitive field, with companies racing to develop more powerful and advanced models.

What about Korean LLMs?

Korean LLMs are definitely a thing too! One notable example is Mi:dm, developed by the Korean telecommunications company KT. Mi:dm is a large language model specifically designed for the Korean language. It’s capable of understanding and generating human-like text in Korean, and it’s being used for a range of applications, including chatbots, text classification, and natural language processing. Another Korean LLM is SKT BERT, developed by SK Telecom. Both of these models are advanced and powerful examples of Korean LLMs.

Is India building LLMs based on local Indian languages like Hindi or even Sanskrit?

Yes, there are definitely efforts underway in India to develop LLMs based on local languages. One example is BharatGPT, developed by a company called CoRover.ai. BharatGPT is an LLM specifically designed to work with Indian languages, including Hindi and Sanskrit. It’s designed to understand and generate text in these languages, and it’s being used for a variety of applications, such as chatbots and natural language processing. Another example is Karya, a startup that is attempting to collect large amounts of text, voice, and image data in local Indian languages to create more advanced LLMs. These efforts are exciting because they could make AI technology more accessible and useful to people in India, especially those who may not be proficient in English.

What are hallucinations as pertaining to LLMs?

In the context of LLMs, “hallucinations” refer to the phenomenon in which the model produces responses that are not based on factual information, but rather on its own internal knowledge or biases. These hallucinations can manifest in a variety of ways, such as providing incorrect information, making statements that are not grounded in reality, or simply making things up.

Hallucinations can occur for a number of reasons, such as gaps in the model’s knowledge, limitations of its training data, or even misinterpreting or misunderstanding the input.

In some cases, hallucinations can be beneficial, as they can lead to creative or novel responses that a human might not have considered. However, in other cases, they can lead to misinformation or incorrect results.

To address the issue of hallucinations, LLMs are often fine-tuned or trained with additional data, or they are provided with feedback from users to help improve their accuracy and reliability. It’s also important for users to be aware of the potential for hallucinations and to take the responses provided by an LLM with a grain of salt, especially when the information provided is critical or sensitive.

What is RAG (Retrieval-Augmented Generation) for Large Language Models?

Retrieval-Augmented Generation, or RAG, is a technique that can enhance the accuracy and efficacy of large language models (LLMs) by leveraging relevant data and documents to provide context for the model’s response. This approach is particularly useful in enterprise applications and helps to ensure that LLMs provide more accurate and trustworthy results.

The idea behind RAG is that the LLM is given a question or task, and it retrieves relevant data or documents to use as context for its response. This approach is referred to as “retrieval-augmented” because the LLM’s responses are augmented by the information retrieved from these sources.

RAG is a relatively recent development in the world of natural language processing, and it’s becoming increasingly popular as a way to improve the accuracy and reliability of LLMs.
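
As a hedged, self-contained illustration of the retrieve-then-generate flow, the sketch below uses TF-IDF similarity (scikit-learn) as a stand-in retriever and simply assembles the augmented prompt. A production RAG system would typically use dense embeddings, a vector store, and an actual LLM call for the final generation step; the toy documents are drawn from facts mentioned earlier in this article:

```python
# Minimal RAG flow: retrieve the most relevant document, then build an augmented prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Hangul was created in the 15th century under King Sejong of the Joseon Dynasty.",
    "Panini's Ashtadhyayi is a foundational grammar of the Sanskrit language.",
    "The Rosetta Stone carries the same text in Greek, Demotic, and hieroglyphic script.",
]
question = "Who created Hangul and when?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)      # index the knowledge base
query_vector = vectorizer.transform([question])

scores = cosine_similarity(query_vector, doc_vectors)[0]
context = documents[scores.argmax()]                   # retrieval step

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this augmented prompt is what the LLM would receive for generation
```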

A Cheat Sheet for Retrieval-Augmented Generation (RAG)

<- Back to Episode 16   On to Episode 18 ->

NITIN UCHIL Founder, CEO & Technical Evangelist
nitin.uchil@numorpho.com

POST NOTES BY DR. SHREEKUMAR VINEKAR

It is a very good start toward understanding the complexity of LLMs.

You may have noticed that Pi did not mention Sanskrit as a mother language of all “Indo-European” languages, possibly including Latin, though Latin could also have been a sister language of Sanskrit, albeit a much younger one. AI cannot grasp the richness of Sanskrit that even a high school student who studies Sanskrit in India can intuitively grasp.

For example, it is not the shape of the mouth or position of the tongue that classifies the vowels and consonants, but the scientific categorization of sounds based on the anatomical origins.

The vowels emerge from the vocal cords; the rest of the vocal apparatus is dispensable, yet can be useful to modify or add resonance to those sounds. These sounds are modified by constricting or expanding the mouth in different directions.

A, Aa, I, EE, e, ei, o, ou, am, ah – these are pharyngeal sounds (with wet saliva added to give the R and L quality to vowels). They are mispronounced by most, but the original sounds are produced in the throat just outside the larynx, where there is saliva. You know saliva is not present in the larynx. These cannot be written in English, but guttural “l”, L, r, R may be the best representations.

The words Rishi (Sage) and Rutu (Season) have an R that is not technically the tongue-rolling R. One should be able to pronounce these Rs without moving the tongue at all, by producing the sound from the deep pharynx with some wetness of saliva.

  • Imagine the L sound and R sound you hear when gargling. You don’t use the tongue when gargling but still hear the Ls and Rs in the sound. L is produced by touching the tongue just behind the upper incisor teeth, R by micro-rolling the tongue just a few millimeters behind the spot where the L sound is produced.
  • You don’t need the i and u in the words Rishi and Rutu (Ritu) if you know how to pronounce the vowels. Then it would be spelled Rshi, Rtu.
  • A is made with a small mouth; Aa by opening the mouth wide vertically, as when the doctor tells you to open it wide so he can look into your throat.
  • I is the same sound that comes from the larynx, but modified by holding the mouth as if you are biting and making the mouth wider; ee exaggerates the same shape of the mouth, tightening it more and prolonging the duration of the sound.
  • e modifies the laryngeal sound by pushing the tongue onto the floor of the mouth and opening the mouth only slightly, not fully as when enunciating Aa.
  • “Ai” is a brief “A” or “Aa” and the i sound used in tandem.
  • “Am” or “An” adds the nasal twang to A in tandem.
  • “Ah” adds an exhaling breath sound in tandem to A.

I failed to add u, oo, o, ou, made with a round shape of the mouth, and a partially wide-open horizontal oval followed in tandem by the round shape. These are 4 additional vowels. Tamil has some sounds that are considered vowels not covered in Sanskrit. The consonants are classified by their location in the oral apparatus, from the inside out, and are named in that manner:

  • Kanthya (from the throat),
  • Talavya (palatal),
  • Murdhanya (retroflex/cerebral),
  • Dantya (dental),
  • Oshtya (labial, or from the lips).

All categories have 5 consonants each. 11 consonants are identified as not fitting these categories and are separately listed, making 36 consonants. Each vowel has a distinct symbol in each lipi. Devanagari symbols are further modified for each letter by adding the signs for the combined vowel. So, each consonant has 9 or more, more like 11 or 13, different sound representations. 36 × 11 = 396, or 36 × 13 = 468, consonant forms, plus all the vowels, make the Sanskrit and Prakrit spoken languages possible. It should be easy to make this phonetic language recognizable for AI.

Also, there is a 1:1 representation for each sound, with no confusion. Any human speech can be written in Devanagari script and fairly accurately reproduced mechanically. Any language can be written in Devanagari with added signs for inflections, accents, emphasis, etc. I do not know if computer scientists have attempted this method yet for translation.

That almost completes the story of vowels, but the principle is not using the tongue, palate or lips of the vocal apparatus. The infant first learns to produce the vowels and has what are called sweet cooing sounds.

The artistic elaboration and technical modification of all vowels is further esthetic expression of vowels in vocal music adding musical notes to the vowels. That is a separate science not elaborated by Panini. Yet, you see the music goes to the heart because it is based on the elaboration of very meaningful nonverbal affectionate communication with mother the infant engages in. There, very rich feelings are communicated, love, fondness, appeal, helplessness, sadness, surprise, sometimes, likes and dislikes, anger, pangs of separation without using any words or consonants. Humans know this early developmental language of emotions intuitively though they do not have the cognitive memory of their helpless infant state when this musical language originated in their development. Interestingly the so-called Universal Language “Motherese” is common to all cultures and all mothers, and some fathers too suddenly go crazy and start using this language when they fall in love with their infant (more so if it is their own). It is a global language that all infants and mothers understand and know how soothing and comforting it is to infants. Music is added to motherese with more elaborate inflections and musical notes and can appeal to all humans cross culturally.

So, in our discussion we need to keep our eyes on the qualia, or quality, of expression of emotions without words. Not all qualia can be digitized or quantified, at least for now, although music can be reproduced digitally. Coming back to the basic components of human speech across all civilizations: the classifications of human sounds categorized by Panini and by Shiva are two systems in classical Sanskrit. They are all represented with symbols in Devanagari. The belief, or intuitive knowledge, is that all sounds emerged from the damaru (small drum) played by Shiva. He produced all the possible sounds that any human speech could produce, and that was the origin of human communication, including rhythm and music. Thus, speech, inflections, rhythm (boles) and musical sounds (notes or swaras) are all interrelated in human expressiveness. Add to that dance and body language, with expression of emotions and feelings (bhavas), melodies (ragas) and rhythms (talas) in different patterns of esthetic beauty, and you get the art of Bha Ra Ta (Bharata Natya). Natyam is the art of human expression or acting, so Bharata Natyam is attributed to Shiva and Parvati: Tandava, the more rhythmic, forceful, and masculine, and Lasya, the feminine and gentle. I will urge you to watch a performance of Kathak dance by Birju Maharaj, who can also produce all the rhythms with his mouth sounds and can utter all the vowels and consonants that emerged from Shiva’s damaru.

So, Panini is a classifier and categorizer, while Shiva created the basics of all human vocal expressions. The beauty of Sanskrit is in the fact that every sound has a meaning before it even becomes a word. For example (and I am not at all a linguist or scholar of Sanskrit): “A” stands for absence or lack; “Vi” stands for knowledge; Avidya is lack of knowledge. “Ka” means who, or what kind, a different kind. Kavi is a poet because he sees meanings that others do not see, he sees patterns that others do not see, he understands and communicates ever new meanings not thought of by humans before (he is the inventor or discoverer of new meanings); therefore he is called Kavi. So, by combining two units of sound, a beautiful, meaningful word is formed in Sanskrit to describe a poet.

Therefore, Sanskrit is such a rich language that some Kavis (poets) have composed long poems that describe the life of Shri Rama when read forward, and when read in reverse, the same poetry is still a beautiful poetic composition on the life of Shri Krishna. That is possible because of the sounds and small words that are impregnated with meanings. The cultural acceptance of the meanings of words is another matter.

I am convinced that when an LLM is produced in Sanskrit, it will be a very rich contribution to AI. The problem of many meanings for the same words is a major challenge (not just the “kal” of Hindi, which means both yesterday and tomorrow and needs context). The Amarakosha is a great compilation for understanding how rich the Sanskrit language is for the flowering of meanings. AI cannot grasp the beauty of Sanskrit poetry yet. Someday maybe it will.

My last comment is that AI is deficient in its cross-cultural knowledge and could be enriched to become culturally more competent as it learns more about other cultures. It is still Eurocentric in its approach.