## Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy
## Priyanka Mandikal
Department of Computer Science, UT Austin
mandikal@utexas.edu
Figure 1: The dissemination of knowledge through the ages. Over time, methods of storing and transmitting knowledge have evolved from oral teachings to computers and the internet, significantly increasing the distribution and proliferation of human knowledge. The emerging LLM technology represents a new paradigm shift in this process.
<details>
<summary>Image 1 Details</summary>

### Visual Description
## Diagram: Evolution of Knowledge Distribution
### Overview
The image is a diagram illustrating the evolution of knowledge distribution methods over time, starting from oral teachings and progressing to Large Language Models (LLMs). It uses a series of icons within rounded rectangles connected by arrows to represent different stages of technological advancement and knowledge proliferation.
### Components/Axes
* **Stages of Knowledge Distribution (Left to Right):**
* Oral teachings
* Hand-written manuscripts
* Printing press
* Computers and the Internet
* LLMs
* **Horizontal Axis:** Represents time, increasing from left to right.
* **Vertical Arrows:** Indicate "Technological advancements" and "Knowledge distribution & proliferation".
* **Connecting Arrows:** Show the progression from one stage to the next.
### Detailed Analysis
The diagram presents a linear progression of knowledge distribution methods.
1. **Oral Teachings:** The leftmost rectangle contains an icon depicting people sitting around a tree, symbolizing oral transmission of knowledge.
2. **Hand-written Manuscripts:** The next rectangle shows a scroll, representing the era of hand-written documents.
3. **Printing Press:** The third rectangle features an icon of a printing press, marking a significant advancement in mass production of written material.
4. **Computers and the Internet:** The fourth rectangle displays a computer monitor with a globe on the screen, symbolizing the digital age and the interconnectedness of information.
5. **LLMs:** The rightmost rectangle contains a neural network diagram, representing Large Language Models.
6. **Arrows:** Solid black arrows connect each stage, indicating the flow of technological advancement.
7. **Dotted Arrow:** A dotted arrow runs along the bottom of the diagram, labeled with upward arrows indicating "Technological advancements" and "Knowledge distribution & proliferation". The arrow points to the right, indicating the direction of time.
### Key Observations
* The diagram highlights a clear progression from traditional methods of knowledge sharing to modern digital technologies.
* Each stage represents a significant leap in the accessibility and distribution of information.
* The use of icons provides a visual representation of each stage, making the diagram easy to understand.
### Interpretation
The diagram illustrates the evolution of knowledge distribution, emphasizing the increasing speed and scale of information dissemination over time. It suggests that technological advancements have played a crucial role in making knowledge more accessible and widespread. The progression from oral traditions to LLMs demonstrates a shift from localized, personal knowledge transfer to global, automated information processing. The diagram implies that LLMs represent the latest stage in this evolution, potentially revolutionizing how knowledge is accessed, processed, and shared.
</details>
## Abstract
LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. We explore the potential of retrieval-augmented generation (RAG) models for long-form question answering (LFQA) in a specialized knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. Human evaluations by computational linguists and domain experts show that the RAG model significantly outperforms the standard model in producing factual and comprehensive responses with fewer hallucinations. In addition, a keyword-based hybrid retriever that emphasizes unique low-frequency terms further improves results. Our study provides insights into effectively integrating modern large language models with ancient knowledge systems.
Proceedings of the 1st Machine Learning for Ancient Languages Workshop, Association for Computational Linguistics (ACL) 2024. Dataset, code, and evaluation are available at: https://sites.google.com/view/vedantany-10m
ॐ केनेषितं पतति प्रेषितं मनः || 1.1 || यन्मनसा न मनुते येनाहुर्मनो मतम् । तदेव ब्रह्म त्वं विद्धि नेदं यदिदमुपासते || 1.6 ||
- "By whom willed and directed, does the mind alight upon its objects?"
- "What one cannot comprehend with the mind, but by which they say the mind comprehends, know that alone to be Brahman, not this which people worship here."
- Kena Upanishad, >3000 B.C.E.
## 1 Introduction
Answer-seeking has been at the heart of human civilization. Humans have climbed mountains and crossed oceans in search of answers to the greatest questions concerning their own existence. Over time, ancient wisdom has travelled from the silent solitude of mountain caves and forest hermitages into the busy cities and plains of the world. Technology has played a major role in this transmission, significantly increasing the distribution and proliferation of human knowledge. In recent times, large language models (LLMs) trained on large swathes of the internet have emerged as de facto question-answering machines. Recent studies on the societal impact of LLMs (Malhotra, 2021; Yiu et al., 2023) highlight their growing significance as cultural technologies. Analogous to earlier technologies like writing, print, and the internet, the power of LLMs can be harnessed meaningfully to preserve and disseminate human knowledge (Fig. 1).
Generic LLMs have proven to be highly effective for broad knowledge domains. However, they often struggle in niche and less popular areas, encountering issues such as factual inaccuracies and hallucinations in long-tail knowledge distributions (Kandpal et al., 2023; Mallen et al., 2023). Moreover, their inability to verify responses against authentic sources is particularly problematic in these domains, where LLMs can generate highly inaccurate answers with unwarranted confidence (Kandpal et al., 2023; Menick et al., 2022). In response to these limitations, there has been growing interest in retrieval-augmented generation (RAG) models (Karpukhin et al., 2020; Lewis et al., 2020b; Izacard et al., 2022; Ram et al., 2023). These models integrate external datastores to retrieve relevant knowledge and incorporate it into LLMs, demonstrating higher factual accuracy and reduced hallucinations compared to conventional LLMs (Shuster et al., 2021; Borgeaud et al., 2022). Updating these external datastores with new information is also more efficient and cost-effective than retraining LLMs. In this vein, we argue that RAG models show immense potential for enhancing study in unconventional, niche knowledge domains that are often underrepresented in pre-training data. Their ability to provide verified, authentic sources when answering questions is particularly advantageous for end-users.
In this work, we develop and evaluate a RAG-based language model specialized in the ancient Indian philosophy of Advaita Vedanta (Upanishads, >3000 B.C.E.; Bhagavad Gita, 3000 B.C.E.; Shankaracharya, 700 C.E.). To ensure that the LLM has not been previously exposed to the source material, we construct VedantaNY-10M, a custom philosophy dataset comprising transcripts of over 750 hours of public discourses on YouTube from the Vedanta Society of New York. We evaluate standard non-RAG and RAG models on this domain and find that RAG models perform significantly better. However, they still encounter issues such as irrelevant retrievals, sub-optimal retrieval passage length, and retrieval-induced hallucinations. In early attempts to mitigate some of these issues, we find that traditional sparse retrievers have a unique advantage over dense retrievers in niche domains with specific terminology (Sanskrit terms in our case). Consequently, we propose a keyword-based hybrid retriever that effectively combines sparse and dense embeddings to upsample low-frequency or domain-specific terms.
We conduct an extensive evaluation comprising both automatic metrics and human evaluation by computational linguists and domain experts. The models are evaluated along three dimensions: transcription, retrieval, and generation. Our findings are twofold. First, RAG LLMs significantly outperform standard non-RAG LLMs along all axes, offering more factual, comprehensive, and specific responses while minimizing hallucinations, with an 81% preference rate. Second, the keyword-based hybrid RAG model further outperforms the standard deep-embedding based RAG model in both automatic and human evaluations. Our study also includes detailed long-form responses from the evaluators, with domain experts specifically indicating the likelihood of using such LLMs to supplement their daily studies. Our work contributes to the broader understanding of how emerging technologies can continue the legacy of knowledge preservation and dissemination in the digital age.
## 2 Related Work
Language models for ancient texts. Sommerschield et al. (2023) recently conducted a thorough survey of machine learning techniques applied to the study and restoration of ancient texts. Spanning digitization (Narang et al., 2019; Moustafa et al., 2022), restoration (Assael et al., 2022), attribution (Bogacz and Mara, 2020; Paparigopoulou et al., 2022) and representation learning (Bamman and Burns, 2020), a wide range of use cases have benefited from the application of machine learning to the study of ancient texts. Recently, Lugli et al. (2022) released a digital corpus of romanized Buddhist Sanskrit texts, training and evaluating embedding models such as BERT and GPT-2. However, the use of LLMs as a question-answering tool to enhance understanding of ancient esoteric knowledge systems has not yet been systematically studied. To the best of our knowledge, ours is the first work that studies the effects of RAG models in the niche knowledge domain of ancient Indian philosophy.
Retrieval-Augmented LMs. In current LLM research, retrieval-augmented generation (RAG) models are gaining popularity (Izacard et al., 2022; Ram et al., 2023; Khandelwal et al., 2020; Borgeaud et al., 2022; Menick et al., 2022). A key area of development in RAGs has been their architecture. Early approaches involved fine-tuning the language model on open-domain question-answering before deployment. MLM approaches such as REALM (Guu et al., 2020) introduced a two-stage process combining retrieval and reading, while DPR (Karpukhin et al., 2020) focused on pipeline training for question answering. RAG (Lewis et al., 2020b) used a generative approach with no explicit language modeling. Very recently, in-context RALM (Ram et al., 2023) showed that retrieved passages can be used to augment the input to the LLM in-context, without any fine-tuning as in prior work. In this work, we adopt an in-context retrieval-augmented methodology similar to Ram et al. (2023), where neither the retriever nor the generator is fine-tuned. This also enables us to use any combination of retrieval and generation models that best suits our application.
Applications of RAGs. The applications of RAGs are diverse and evolving. ATLAS (Izacard et al., 2022) and GopherCite (Menick et al., 2022) have shown how fine-tuning and reinforcement learning from human feedback can enhance RAGs' ability to generate verifiable answers from reliable sources. Prompting techniques have also seen innovation. kNN-Prompt (Shi et al., 2022) extended kNN-LM to zero- and few-shot classification tasks, and retrieval in-context approaches (Ram et al., 2023; Shi et al., 2023) have proven effective in utilizing retrieval at the input stage. Retrieval-LMs have been shown to be particularly valuable for handling long-tail or less frequent entities (Kandpal et al., 2023; Mallen et al., 2023), updating knowledge (Izacard et al., 2022), improving parameter efficiency (Izacard et al., 2022; Mallen et al., 2023), and enhancing verifiability (Bohnet et al., 2022), making them increasingly relevant in a wide range of applications. In our work, we examine the application of RAGs to long-tail knowledge, conducting an extensive study on the niche knowledge domain of ancient Indian philosophy.
Evaluation of LFQA. The field of long-form question answering (LFQA) is an emerging area of active research (Krishna et al., 2021; Nakano et al., 2021; Xu et al., 2023). Recently, Xu et al. (2023) conducted a thorough examination of various LFQA metrics, encompassing both human and automatic evaluation methods, and found that existing automatic metrics do not always align with human preferences. Following their suggestion, we place special emphasis on conducting an extensive human evaluation utilizing the expertise of experienced computational linguists and domain experts.
Figure 2: Sanskrit terms in VedantaNY-10M. Frequently occurring Sanskrit terms in the corpus.
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Word Cloud: Spiritual Terms
### Overview
The image is a word cloud featuring various terms related to spirituality, philosophy, and religion, primarily from Hindu traditions. The size of each word roughly corresponds to its frequency or importance.
### Components/Axes
There are no axes in a word cloud. The components are the individual words themselves. The color of the words varies, but there is no apparent pattern or meaning associated with the color.
### Detailed Analysis or Content Details
Here's a transcription of the words present in the word cloud, grouped by approximate location and size (largest to smallest):
* **Largest:**
* brahman (orange)
* **Large:**
* shanti (green)
* sarvam (orange)
* aham (green)
* **Medium:**
* yoga (green)
* karma (tan)
* tat (tan)
* satyam (yellow)
* jnana (yellow)
* atma (tan)
* **Smaller:**
* ramakrishna (tan)
* upanishad (tan)
* anantam (tan)
* bhakti (tan)
* krishna (tan)
* bhagavan (tan)
* veda (tan)
* swami (tan)
* gita (tan)
* sukham (tan)
* sri (white)
* namaste (white)
* sat (white)
* hari (white)
* mithya (white)
* sarva (yellow)
* pranam (yellow)
* iti (yellow)
* eva (yellow)
* dharma (yellow)
* vande (yellow)
* vedanta (yellow)
* jagat (yellow)
* shivoham (yellow)
* drisya (yellow)
* buddhi (yellow)
* jagadguru (yellow)
* maharaj (yellow)
* kosa (yellow)
* sukadeva (yellow)
* ishvar (yellow)
* vishwa (yellow)
* asi (yellow)
* rupa (yellow)
* indra (yellow)
* shankaracharya (white)
* tatra (white)
* shivam (green)
* advaita (green)
* nitya (green)
* sankhya (green)
* bhava (tan)
* vasudeva (tan)
* ananda (tan)
* mantra (tan)
* bhagavad (tan)
* second (tan)
* karan (tan)
* canmar (tan)
* sharas (tan)
* vastu (tan)
* chitta (tan)
* moksha (tan)
* vairagya (tan)
* roopam (tan)
* pragyam (orange)
* anubhuts (orange)
* tvam (orange)
* viveka (orange)
* swamiji (green)
* bhadram (green)
* jiva (green)
* bhakti (green)
* kasya (green)
* devaki (green)
* karan aap (orange)
* swam1 (orange)
* prakrits (white)
* vriti (yellow)
* gyana (yellow)
* drakosha (yellow)
### Key Observations
* The word "brahman" is the most prominent, suggesting its central importance.
* Words like "shanti," "sarvam," and "aham" are also emphasized.
* The cloud includes a mix of philosophical concepts, deities, and practices.
### Interpretation
The word cloud visually represents key concepts and terms within Hindu philosophy and spirituality. The prominence of "brahman" indicates its foundational role as the ultimate reality. The presence of terms like "shanti" (peace), "sarvam" (all), and "aham" (I) suggests a focus on inner peace, universal connection, and self-realization. The inclusion of deities like "krishna" and practices like "yoga" highlights the diverse aspects of Hindu tradition. The word cloud serves as a visual summary of core ideas and elements within this spiritual context.
</details>
## 3 The VedantaNY-10M Dataset
We first describe our niche-domain dataset creation process. The custom dataset for our study needs to satisfy the following requirements: (1) Niche: it must be a specialized knowledge domain within the LLM's long-tail distribution. (2) Novel: the LLM must not have previously encountered the source material. (3) Authentic: the dataset should be authentic and representative of the knowledge domain. (4) Domain experts: experts should be available to evaluate the model's effectiveness and utility.
Knowledge domain. To satisfy the first requirement, we choose our domain to be the niche knowledge system of Advaita Vedanta, a 1300-year-old Indian school of philosophy (Shankaracharya, 700 C.E.) based on the Upanishads (>3000 B.C.E.), Bhagavad Gita (3000 B.C.E.) and Brahmasutras (3000 B.C.E.)¹. It is a contemplative knowledge tradition that employs a host of diverse tools and techniques, including analytical reasoning, logic, linguistic paradoxes, metaphors and analogies, to enable the seeker to enquire into their real nature. Although a niche domain, this knowledge system has been continuously studied and rigorously developed over millennia, offering a rich and structured niche for the purposes of our study. Being a living tradition, it offers the additional advantage of providing experienced domain experts to evaluate the language models in this work.
Composition of the dataset. Considering the outlined criteria, we introduce VedantaNY-10M, a curated philosophy dataset of public discourses.
¹ Currently there exists no consensus on accurately dating these ancient scriptures. The Upanishads (which are a part of the Vedas) have been passed on orally for millennia and are traditionally not given a historic date. However, they seem to have been compiled and systematically organized sometime around 3000 B.C.E. by Vyasa. Likewise, the time period of Adi Shankaracharya also varies, and he is usually placed between 450 B.C.E. and 700 C.E.
To maintain authenticity while ensuring that the LLM hasn't previously been exposed to the source material, we curate our dataset from a collection of YouTube videos on Advaita Vedanta, sourced from the Vedanta Society of New York. It contains 10M tokens and encompasses over 750 hours of philosophical discourses by Swami Sarvapriyananda, a learned monk of the Ramakrishna Order. These discourses provide a rich and comprehensive exposition of the principles of Advaita Vedanta, making them an invaluable resource for our research.
Languages and scripts. The dataset primarily features content in English, accounting for approximately 97% of the total material. Sanskrit, the classical language of Indian philosophical literature, constitutes around 3% of the dataset. The Sanskrit terms are transliterated into the Roman script. To accommodate the linguistic diversity and the specific needs of the study, the dataset includes words in both English and Sanskrit, without substituting the Sanskrit terms with any English translations. Translating ancient Sanskrit technical terms having considerably nuanced definitions into English is a non-trivial problem (Malhotra and Babaji, 2020). Hence, our dual-language approach ensures that the Sanskrit terms and concepts are accurately represented and accessible, thereby enhancing the authenticity of our research material. Frequently occurring Sanskrit terms in the corpus are shown in Fig. 2. For excerpts from passages, please refer to Appendix Table 2.
## 4 In-context RAG for niche domains
We now discuss the methodology adopted to build an in-context retrieval augmented chatbot from the custom dataset described above.
We first define a generic chatbot C_g that does not use retrieval as follows: C_g : q → a_g, where q is the user query and a_g is the answer generated by the chatbot. Now, let D_t represent the textual data corpus from our knowledge domain and R be the retriever. Our goal is to build a retrieval-augmented generation chatbot C_r : q × R(D_t, q) → a_r that will generate answer a_r for the query by retrieving relevant context from D_t using R. An overview of our approach is illustrated in Fig. 3. We first build D_t from 765 hours of public discourses on Advaita Vedanta introduced in Sec. 3. When deployed, the system processes q by first using retriever R to identify the top-k most relevant passages P from D_t using a similarity metric. Subsequently, a large language model (LLM) is prompted with both the query and the retrieved passages in-context, following Ram et al. (2023), to generate a contextually relevant response.
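As a minimal sketch, the deployed retrieve-then-prompt loop can be written as follows. The cosine scorer, the in-memory passage store, and the prompt template are illustrative stand-ins rather than the paper's actual components:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, passage_vecs, passages, k=3):
    """R: rank all passages by sim(q, z) and return the top-k."""
    order = sorted(range(len(passages)),
                   key=lambda i: cosine(query_vec, passage_vecs[i]),
                   reverse=True)
    return [passages[i] for i in order[:k]]

def build_prompt(query, retrieved):
    """Place the retrieved passages in-context ahead of the query."""
    context = "\n\n".join(f"Passage {i + 1}: {p}"
                          for i, p in enumerate(retrieved))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The resulting prompt string would then be sent to the generator LLM; neither component is fine-tuned, in line with the in-context RALM setup.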
We now describe each of the components in detail. We follow a four-stage process as follows:
Transcription. We first need to create a dense textual corpus targeted at our niche domain. Since our dataset consists of YouTube videos, we first employ a transcription model to transcribe the audio into text. Our video corpus D v consists of 612 videos totaling 765 hours of content, with an average length of 1.25 hours per video. We extract audio content from D v and transcribe it using OpenAI's Whisper large-v2 model (Radford et al., 2023). This step converts the spoken discourses into a transcribed textual corpus D t consisting of 10M tokens in total. Since Whisper is a multi-lingual model, it has the capacity to support the dual-language nature of our dataset. We evaluate the transcription quality of Whisper in Sec. C.1.
Datastore creation. The transcribed text in D_t is then segmented into shorter chunks called passages P, consisting of 1500 characters each. These chunks are then processed by a deep embedder to produce deep embedding vectors z_dense. These embedded chunks are stored in a vector database D_z. Ultimately, we store approximately 25,000 passage embeddings z ∈ D_z, each representing a discrete chunk of the philosophical discourse in D_t.
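The chunking and embedding steps can be sketched as below. The fixed-size character windows follow the 1500-character passages described above, while the `embed` function and the list-based store are simplifying assumptions in place of the deep embedder and vector database:

```python
def chunk_text(text, size=1500):
    """Segment a transcript into consecutive passages of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_datastore(transcripts, embed, size=1500):
    """Embed every passage, forming the store D_z of (passage, z) pairs."""
    store = []
    for text in transcripts:
        for passage in chunk_text(text, size):
            store.append((passage, embed(passage)))
    return store
```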
Retrieval. To perform retrieval-augmented generation, we first need to build a retrieval system R : D_z × q → P that retrieves contextually relevant textual passages P from D_t given D_z and q. The retriever performs the operation P = D_t[argTop-k_{z ∈ D_z} sim(q, z)], where we use cosine similarity as the similarity metric. Standard RAG models employ state-of-the-art deep embedders to encode documents and retrieve them during inference. However, these semantic embeddings can struggle to disambiguate between specific niche terminology in custom domains (Mandikal and Mooney, 2024). This can be particularly problematic in datasets having long-tail distributions such as ours. In addition, retrieving fixed-length passages is sub-optimal: short, incomplete contexts can be particularly damaging for LFQA, while longer contexts can contain unnecessary information that confuses the generation model. To mitigate these two issues, we experiment with two key changes: (1) a keyword-based
Figure 3: Overview of the RAG model. We present VedantaNY-10M, a dataset derived from over 750 hours of public discourses on the ancient Indian philosophy of Advaita Vedanta, and build a retrieval-augmented generation (RAG) chatbot for this knowledge domain. At deployment, given a query q , the retriever R first retrieves the top-k most relevant passages P from the datastore using a hybrid keyword-based retriever. It then refines this retrieved context using a keyword-based context reshaper to adjust the passage length. Finally, an LLM is prompted with the query and the refined passages in-context. We conduct an extensive evaluation with computational linguists and domain experts to assess the model's real-world utility and identify challenges.
<details>
<summary>Image 3 Details</summary>

### Visual Description
## System Diagram: Chatbot Architecture
### Overview
The image presents a system diagram illustrating the architecture of a chatbot. It details the flow of information from a datastore through a retriever to the chatbot, highlighting key components such as hybrid retrieval, keyword extraction, context refinement, and the large language model (LLM).
### Components/Axes
The diagram is divided into three main sections:
1. **Datastore D**: A collection of video thumbnails featuring Swami Sarvapriyananda.
2. **Retriever R**: A process that retrieves relevant passages from the datastore.
3. **Chatbot C**: The core chatbot component that generates answers.
The diagram uses blue and green boxes to represent different stages of processing. Arrows indicate the direction of information flow.
### Detailed Analysis or ### Content Details
**1. Datastore D (Left Side):**
* The datastore is represented by a collection of video thumbnails. All thumbnails feature Swami Sarvapriyananda.
* Examples of video titles include:
* "MANDUKYA UPANISHAD CHAPTER 2 KARIKA-35"
* "VEDANTASARA of Sadananda PART 19: TEXTS 122-127"
* "Mundaka Upanishad PART 5 Mantras 1.1.8-9"
* "BHAGAVAD GITA Chapter 8 Verses 15-20"
* "QUESTIONS AND ANSWERS June 7th, 2020"
* "CONSCIOUSNESS AND SELF IN VEDANTA"
* "MODELS OF Consciousness IN INDIAN PHILOSOPHY"
* "ALL OF IT"
* "NONE OF IT"
* "SUNYAM THE VOID"
* "PURNAM THE FULL"
* "NO MIND"
**2. Retriever R (Middle Section):**
* **Hybrid Retrieval: Deep + Sparse**: This blue box represents the initial retrieval process. It takes input from the datastore.
* **Keyword extraction**: A blue box that is connected to the "Hybrid Retrieval" box.
* **Passage 1, Passage 2, ..., Passage k**: A stack of blue boxes representing the retrieved passages.
* **Context Refiner**: A blue box that refines the retrieved passages.
**3. Chatbot C (Right Side):**
* **Query (q)**: A green box representing the user's query. It receives input from the "Keyword extraction" box.
* **Prompt (p)**: A green box representing the prompt generated for the LLM. It receives input from the "Context Refiner" box and the "Query" box.
* **LLM**: A light green box representing the Large Language Model. It receives input from the "Prompt" box.
* **Answer (a)**: A green box representing the final answer generated by the chatbot. It is the output of the LLM.
### Key Observations
* The diagram illustrates a pipeline architecture where data flows sequentially from the datastore to the chatbot.
* The retriever uses a hybrid approach (Deep + Sparse) to retrieve relevant passages.
* Context refinement is used to improve the quality of the retrieved passages.
* The chatbot uses a Large Language Model (LLM) to generate answers.
### Interpretation
The diagram depicts a Retrieval-Augmented Generation (RAG) system for a chatbot. The datastore contains information, likely transcripts or summaries, from videos featuring Swami Sarvapriyananda. When a user poses a query, the system retrieves relevant passages from the datastore, refines the context, and uses an LLM to generate an answer based on the retrieved information. This approach allows the chatbot to provide informed and contextually relevant responses. The use of hybrid retrieval suggests an attempt to balance precision and recall in the retrieval process.
</details>
hybrid retriever to focus on unique low-frequency words, and (2) a context-refiner to meaningfully shorten or expand retrieved context.
1. Keyword-based retrieval. To emphasize the importance of key terminology, we first employ keyword extraction and named-entity recognition techniques on the query q to extract important keywords κ . During retrieval, we advocate for a hybrid model combining both deep embeddings as well as sparse vector space embeddings. We encode the full query in the deep embedder and assign a higher importance to keyphrases in the sparse embedder. The idea is to have the sparse model retrieve domainspecific specialized terms that might otherwise be missed by the deep model. Our hybrid model uses a simple weighted combination of the query-document similarities in the sparse and dense embedding spaces. Specifically, we score a document D for query q and keywords κ using the ranking function:
$$ s(D, q, \kappa) = \lambda \,\mathrm{Sim}\big(z_d(q),\, z_d(D)\big) + (1 - \lambda)\,\mathrm{Sim}\big(z_s(\kappa),\, z_s(D)\big) $$
where z_d and z_s denote the dense and sparse embedding functions and Sim is the cosine similarity between the embedding vectors. In our experiments, we set λ = 0.2. Amongst the top-n retrieved passages, we choose the k passages containing the maximum number of unique keywords.
2. Keyword-based context refinement. Furthermore, we refine the retrieved passages by leveraging the extracted keywords in a heuristic-based refinement operation that produces P′ = Ref(P, κ). We first expand each selected passage to include one preceding and one succeeding passage, then locate the first and last occurrences of the extracted keywords and trim the expanded context to that span. Depending on the placement of keywords, this can either expand or shorten the original passage, ensuring that the retrieved context contains the information relevant to the generation model.
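A rough sketch of the two heuristics is given below. The embedding functions are passed in as toy parameters, the exact placement of λ (on the dense term, with 1 − λ on the keyword-focused sparse term) is our reading of the ranking function, and all helper names are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity; zero vectors score 0."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(doc, query, keywords, z_dense, z_sparse, lam=0.2):
    """lam * Sim(z_d(q), z_d(D)) + (1 - lam) * Sim(z_s(kappa), z_s(D)).
    A small lam upweights the sparse, keyword-focused term."""
    dense = cosine(z_dense(query), z_dense(doc))
    sparse = cosine(z_sparse(" ".join(keywords)), z_sparse(doc))
    return lam * dense + (1 - lam) * sparse

def refine_context(passages, idx, keywords):
    """Ref(P, kappa): expand passage idx with one neighbor on each side,
    then trim from the first to the last keyword occurrence."""
    lo, hi = max(idx - 1, 0), min(idx + 1, len(passages) - 1)
    expanded = " ".join(passages[lo:hi + 1])
    lowered = expanded.lower()
    starts = [lowered.find(k.lower()) for k in keywords if k.lower() in lowered]
    ends = [lowered.rfind(k.lower()) + len(k) for k in keywords if k.lower() in lowered]
    if not starts:                 # no keyword found: keep the original passage
        return passages[idx]
    return expanded[min(starts):max(ends)]
```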
Generation. For answer generation, we construct a prompt p from the query q and the retrieved passages (P′_1, P′_2, ..., P′_k) ∈ P in context. Finally, we invoke the chatbot C_r to synthesize an answer a_r from the constructed prompt. For an example of the constructed RAG bot prompt, please refer to Fig. 5. This four-stage process produces a retrieval-augmented chatbot that can generate contextually relevant responses for queries in our niche domain.
Implementation Details. For embedding and generation, we experiment with both closed and open-source language models. For the RAG vs. non-RAG comparison, we use OpenAI's text-embedding-ada-002 model (Brown et al.) as the embedder and GPT-4-turbo (OpenAI, 2023) as the LLM for both C_r and C_g. For comparing RAG model variants, we use the open-source nomic-embed-text-v1 (Nussbaum et al., 2024) as our deep embedder and Mixtral-8x7B-Instruct-v0.1 (Jiang et al., 2024) as our generation model. For keyword extraction, we use an ensemble of models including OpenKP (Xiong et al., 2019), KeyBERT (Grootendorst, 2020) and SpanMarker (Aarsen, 2020). We experimented with using language models such as ChatGPT for keyword extraction, but the results were very poor, as also corroborated in Song et al. (2024). For further implementation details of the evaluation metrics, see Appendix Sec. A. The VedantaNY-10M dataset, code, and evaluation are publicly available at https://github.com/priyankamandikal/vedantany-10m.
## 5 Evaluation
We now evaluate the model along two axes: automatic evaluation metrics and a human evaluation survey. To ensure a broad and comprehensive evaluation, we categorize the questions into five distinct types, each designed to test different aspects of the model's capabilities:
1. Anecdotal: Generate responses based on stories and anecdotes narrated by the speaker in the discourses.
2. Comparative: Analyze and compare different concepts, philosophies, or texts. This category tests the model's analytical skills and its ability to draw parallels and distinctions.
3. Reasoning: Require logical reasoning, critical thinking, and the application of principles to new scenarios.
4. Scriptural: Test the model's ability to reference, interpret, and explain passages from religious or philosophical texts.
5. Terminology: Probe the model's understanding of specific technical terms and concepts.
For a sample set of questions across the above five categories, please refer to Appendix Table 4.
## 5.1 Automatic Evaluation
Inspired by Xu et al. (2023), we conduct an extensive automatic evaluation of the two RAG models on our evaluation set. We describe each metric type below and provide implementation details in Appendix Sec. A. Due to the lack of gold answers, we are unable to report reference-based metrics.
Answer-only metrics: We assess features such as fluency and coherence by analyzing responses with the following metrics: (1) Self-BLEU (Zhu et al., 2018) for text diversity, where higher scores indicate less diverse text, as commonly used in open-ended text generation; (2) GPT-2 perplexity for textual fluency, as used in prior work on constrained generation. We also report (3) word and (4) sentence counts as length-based metrics, owing to their significant influence on human preferences (Sun et al., 2019; Liu et al., 2022; Xu et al., 2023).
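The answer-only metrics are simple to compute. A minimal sketch, using a generic per-token log-probability list in place of an actual GPT-2 forward pass, and regex tokenization as a rough stand-in for the spaCy tokenization used in the paper:

```python
import math
import re

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log),
    e.g. as scored by a language model such as GPT-2."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def length_metrics(answer):
    """Word and sentence counts for a generated answer."""
    words = re.findall(r"\w+(?:'\w+)?", answer)
    sentences = [s for s in re.split(r"[.!?]+\s*", answer) if s]
    return len(words), len(sentences)

print(perplexity([-1.0, -2.0, -3.0]))  # exp(2.0) ≈ 7.389
print(length_metrics("Maya is appearance. Brahman alone is real."))  # (7, 2)
```

Lower perplexity corresponds to more fluent text under the scoring model.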
(Question, answer) metric: To ensure answers are relevant to the posed questions, we model p(q | a) for ranking responses with RankGen (Krishna et al., 2022). Leveraging the T5-XXL architecture, this encoder is specially trained via contrastive learning to evaluate model generations based on their relevance to a given prefix, in this context, the question. A higher RankGen score indicates a stronger alignment between the question and the answer, serving as a measure of relevance.
(Answer, evidence) metric: A key challenge in LFQA is assessing answer correctness without dedicated factuality metrics, akin to summarization's faithfulness. We apply QAFactEval (Fabbri et al., 2022), originally for summarization, to LFQA by considering the answer as a summary and evidence documents as the source. Answers deviating from source content, through hallucinations or external knowledge, will score lower on this metric.
## 5.2 Human Evaluation
We have three experienced domain experts evaluate the models across the five categories. Each expert is closely associated with the Vedanta Society of New York and has extensively studied the philosophy in question for up to a decade on average, being well-versed in domain-specific terminology and conceptual analysis. We conduct the human survey along two dimensions: retrieval and generation. For retrieval, we evaluate relevance and completeness; for generation, we evaluate factual correctness and completeness. In addition, we ask the reviewers to provide free-form justification for their choices, which proves to be very useful in analyzing the two models.
Figure 4: Human evaluation: RAG vs non-RAG. Both transcription and retrieval performance receive high scores from the evaluators. For generation, the RAG model outperforms the generic model across various metrics, particularly in factuality, completeness and specificity, while being marginally lower in ease of understanding.
<details>
<summary>Image 4 Details</summary>

### Visual Description
## Comparative Analysis of RAG vs. Non-RAG Performance
### Overview
The image presents a comparative analysis of Retrieval-Augmented Generation (RAG) models against Non-RAG models across several key metrics: Transcription Quality, Relevance, Factuality, Completeness, Specificity, Ease of Understanding, Faithfulness, Hallucinations, and Outside Knowledge. The data is visualized using a combination of bar charts and pie charts to illustrate the performance differences between the two model types.
### Components/Axes
* **Transcription:**
* Metric: Transcription Quality
* Visualization: Pie chart
* Value: 4.48 (Green portion of the pie chart)
* **Retrieval:**
* Metric: Relevance
* Visualization: Pie chart
* Value: 4.34 (Green portion of the pie chart)
* **Generation:**
* Metrics: Factuality, Completeness, Specificity, Ease of Understanding, Faithfulness
* Visualization: Bar chart
* Y-axis: Numerical scale from 0.00 to 5.00, incrementing by 1.00
* X-axis: Categories - Factuality, Completeness, Specificity, Ease of Understanding, Faithfulness
* Legend:
* Red: Non-RAG
* Green: RAG
* **Hallucinations:**
* Visualization: Pie charts (one for Non-RAG, one for RAG)
* Categories: No, Not sure, Yes
* Legend:
* Green: No
* Yellow: Not sure
* Red: Yes
* **Outside Knowledge:**
* Visualization: Pie chart (RAG only)
* Categories: No, Not sure, Yes
* Legend:
* Green: No
* Yellow: Not sure
* Red: Yes
### Detailed Analysis
**1. Transcription Quality (Pie Chart):**
* Transcription Quality is represented by a pie chart with a single value of 4.48, a score on the 1-5 rating scale.
**2. Retrieval Relevance (Pie Chart):**
* Relevance is represented by a pie chart with a single value of 4.34, a score on the 1-5 rating scale.
**3. Generation (Bar Chart):**
* **Factuality:**
* Non-RAG (Red): Approximately 3.2
* RAG (Green): Approximately 4.0
* **Completeness:**
* Non-RAG (Red): Approximately 3.6
* RAG (Green): Approximately 4.1
* **Specificity:**
* Non-RAG (Red): Approximately 3.4
* RAG (Green): Approximately 4.5
* **Ease of Understanding:**
* Non-RAG (Red): Approximately 4.3
* RAG (Green): Approximately 4.2
* **Faithfulness:**
* Non-RAG (Red): Not present
* RAG (Green): Approximately 3.9
**4. Hallucinations (Pie Charts):**
* **Non-RAG:**
* Yes (Red): 30.0%
* Not sure (Yellow): 30.0%
* No (Green): 40.0%
* **RAG:**
* Yes (Red): 27.5%
* Not sure (Yellow): 25.0%
* No (Green): 47.5%
**5. Outside Knowledge (Pie Chart):**
* **RAG:**
* Yes (Red): 43.6%
* Not sure (Yellow): 38.5%
* No (Green): The remaining percentage, calculated as 100% - (43.6% + 38.5%) = 17.9%
### Key Observations
* RAG models generally outperform Non-RAG models in Factuality, Completeness, Specificity, and Faithfulness.
* The "Ease of Understanding" metric is slightly lower for RAG compared to Non-RAG.
* RAG models show a lower percentage of "Yes" responses for Hallucinations (27.5%) compared to Non-RAG models (30.0%), and a higher percentage of "No" responses (47.5% vs 40.0%).
* For Outside Knowledge in RAG models, "Yes" responses are at 43.6%, "Not sure" at 38.5%, and "No" at 17.9%.
### Interpretation
The data suggests that incorporating Retrieval-Augmented Generation (RAG) generally improves the performance of language models, particularly in areas like Factuality, Completeness, Specificity, and reducing Hallucinations. While "Ease of Understanding" is slightly lower for RAG, the overall benefits in other critical areas appear to outweigh this drawback. The "Outside Knowledge" pie chart for RAG models indicates that a significant portion of the generated content relies on external information, with a considerable percentage of responses being affirmative. The lower hallucination rate in RAG models suggests that retrieving and incorporating external knowledge helps ground the generated content in reality, reducing the likelihood of generating false or nonsensical information.
</details>
Relevance: The relevance of the retrieved passages to the user query, scored on a scale from 1 to 5 (1 = Not at all relevant, 5 = Extremely relevant).
Correctness: Factual accuracy of the generated answer (1 = Inaccurate, 5 = No inaccuracies).
Completeness: Whether the retrieved passage and generated answer comprehensively cover all parts of the query (1 = Not at all comprehensive, misses crucial points; 5 = Very comprehensive).
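Table 1 reports these human ratings normalized between 0 and 1. Assuming a standard min-max mapping of the 1-5 Likert scale (the exact formula is not spelled out here), the normalization can be sketched as:

```python
def normalize_likert(scores, lo=1, hi=5):
    """Map 1-5 Likert ratings to [0, 1] and average over raters/questions.
    One plausible normalization; the exact formula is an assumption."""
    return sum((s - lo) / (hi - lo) for s in scores) / len(scores)

print(normalize_likert([5, 4, 4]))  # ≈ 0.83
```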
## 5.3 Results: RAG vs Non-RAG
We first conduct a human evaluation survey with 5 computational linguists and 3 domain experts on the RAG vs non-RAG models. In evaluating the generation capabilities of our models, we consider five metrics: factuality, completeness, specificity, ease of understanding, and faithfulness. The performance of the RAG model is compared against a baseline non-RAG model across these dimensions in Fig. 4. The RAG model substantially outperforms the non-RAG model across metrics, particularly in factuality, completeness, and specificity, while being marginally lower in ease of understanding. Sample responses are shown in Figs. 7-11.
## 5.4 Results: Standard RAG vs Keyword-based RAG
We report results in Table 1. The keyword-based RAG model shows strong improvements across all automatic metrics and significantly outperforms the standard model in the human evaluation. Amongst the answer-only metrics, the keyword model tends to produce longer, more comprehensive answers (indicated by higher word and sentence counts) that are also more fluent (lower perplexity). The question-answer RankGen metric (Krishna et al., 2022) evaluates the probability of the answer given the question; a higher score for the keyword model suggests answers that are more relevant to the question. Most notably, the keyword model does very well on QAFactEval (Fabbri et al., 2022), which evaluates faithfulness by comparing the summary (in our case, the answer) against the evidence (the retrieved passages). A higher score indicates greater faithfulness of the answer to the retrieved passages, i.e., fewer hallucinations and less reliance on outside knowledge.
For the human evaluation in Table 1, we report scores normalized between 0 to 1. A relevance rating of 0.82 for keyword-based RAG vs 0.59 for standard RAG indicates a strong alignment between the retrieved content and the users' queries for our model, demonstrating the efficacy of the retrieval process. Conversely, the standard model
Table 1: Automatic and human evaluation: standard RAG (M1) vs keyword-based RAG (M2). We report both automatic and human evaluation metrics calculated on 25 triplets of {question, answer, retrievals} across 5 different question categories. The keyword-based RAG model shows strong improvement across all automatic metrics while significantly outperforming the standard model in the human evaluation.
| Metric | Mean (M1/M2) | Anecdotal (M1/M2) | Comparative (M1/M2) | Reasoning (M1/M2) | Scriptural (M1/M2) | Terminology (M1/M2) |
|---|---|---|---|---|---|---|
| **Automatic metrics** | | | | | | |
| *Answer-only* | | | | | | |
| GPT2-PPL ↓ | 16.6 / 15.3 | 16.6 / 16.6 | 16.9 / 15.7 | 13.9 / 11.9 | 14.2 / 14.7 | 21.5 / 17.7 |
| Self-BLEU ↓ | 0.12 / 0.13 | 0.11 / 0.05 | 0.10 / 0.06 | 0.15 / 0.27 | 0.13 / 0.16 | 0.09 / 0.14 |
| # Words ↑ | 196 / 227 | 189 / 189 | 174 / 206 | 218 / 282 | 225 / 243 | 216 / 261 |
| # Sentences ↑ | 9.0 / 10.1 | 8.2 / 7.6 | 7.8 / 9.4 | 9.6 / 11.8 | 10.0 / 10.6 | 9.4 / 11.0 |
| *(Question, answer)* | | | | | | |
| RankGen ↑ | 0.46 / 0.48 | 0.42 / 0.52 | 0.44 / 0.47 | 0.41 / 0.43 | 0.51 / 0.52 | 0.52 / 0.46 |
| *(Answer, retrievals)* | | | | | | |
| QAFactEval ↑ | 1.36 / 1.60 | 1.01 / 1.14 | 1.53 / 1.94 | 1.18 / 1.61 | 1.52 / 1.36 | 1.56 / 1.95 |
| **Human evaluation** | | | | | | |
| *Retrieval* | | | | | | |
| Relevance ↑ | 0.59 / 0.82 | 0.41 / 0.88 | 0.79 / 0.85 | 0.73 / 0.83 | 0.48 / 0.73 | 0.55 / 0.81 |
| Completeness ↑ | 0.52 / 0.79 | 0.41 / 0.86 | 0.72 / 0.79 | 0.57 / 0.83 | 0.37 / 0.68 | 0.52 / 0.79 |
| *Answer* | | | | | | |
| Correctness ↑ | 0.61 / 0.86 | 0.40 / 0.89 | 0.81 / 0.88 | 0.71 / 0.85 | 0.52 / 0.81 | 0.63 / 0.89 |
| Completeness ↑ | 0.58 / 0.85 | 0.42 / 0.92 | 0.80 / 0.85 | 0.72 / 0.81 | 0.49 / 0.77 | 0.63 / 0.91 |
sometimes fails to disambiguate unique terminology and retrieves incorrect passages (see Fig. 12). In assessing the accuracy of the generated answer, the keyword-based RAG model significantly outperforms the standard model, indicating better alignment with verifiable facts. Fig. 13 shows a factually inaccurate response from the generic model. The keyword model achieves higher completeness scores for both the retrieval and generation. Sample responses are shown in Figs. 12-16.
## 6 Challenges
The evaluation in Sec. 5 shows that the RAG model provides responses that are not only more aligned with the source material but are also more comprehensive, specific, and user-friendly compared to the responses generated by the generic language model. In this section, we discuss the challenges we encountered while building the retrieval-augmented chatbot for the niche knowledge domain of ancient Indian philosophy introduced in this work.
Transcription. Our requirement of using a niche data domain with long-tail knowledge precludes the use of source material that the LLM has previously been exposed to. To ensure this, we construct a textual corpus derived from automated transcripts of YouTube discourses. These transcripts can sometimes contain errors such as missing punctuation, incorrect transcriptions, and faulty transliterations of Sanskrit terms. A sample of such errors is shown in Appendix Table 3. A proofreading mechanism and/or improved transcription models can help alleviate these issues to a large extent.
Spoken vs written language. Unlike traditional textual corpora that are compiled from written sources, our dataset is derived from spoken discourses. Spoken language is often more verbose and less structured than written text, with the speaker frequently jumping between concepts mid-sentence. This unstructured nature of the text can be unfamiliar for a language model trained extensively on written text. A peculiar failure case arising from this issue is shown in Appendix Fig. 6. This can be addressed by converting the spoken text into a more structured prose format with the help of well-crafted prompts to LLMs, followed by human proofreading.
Context length. The passages retrieved in the standard model are of a fixed length and can sometimes be too short for many queries, especially for long-form answering. For instance, the retrieved passage may include a snippet from the middle of the full context, leading to incomplete or incoherent chatbot responses (Fig. 11). This prompted us to employ a keyword-based context-expansion mechanism to provide a more complete context. While this results in much better answer generation, the retrieved passage may contain too much information, making it difficult for the generator to reason effectively. Moreover, the increase in the number of tokens increases processing time. Future work can explore more advanced retrieval models capable of handling longer contexts and summarizing them effectively before input to the LLM.
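The keyword-based context-expansion mechanism can be sketched as a heuristic that grows a window of transcript chunks around the retrieved hit while neighboring chunks still mention an extracted keyword. This is an illustrative simplification, not the exact refiner used in the system:

```python
def expand_context(chunks, hit_idx, keywords, max_chunks=5):
    """Grow a window of transcript chunks around the retrieved chunk at
    `hit_idx` while adjacent chunks still mention any extracted keyword."""
    lo = hi = hit_idx
    kws = [k.lower() for k in keywords]
    while hi - lo + 1 < max_chunks:
        if lo > 0 and any(k in chunks[lo - 1].lower() for k in kws):
            lo -= 1          # extend window to the left
        elif hi < len(chunks) - 1 and any(k in chunks[hi + 1].lower() for k in kws):
            hi += 1          # extend window to the right
        else:
            break            # no keyword match on either side
    return " ".join(chunks[lo:hi + 1])

chunks = ["intro talk", "Maya is explained here",
          "more on maya and Brahman", "unrelated Q&A"]
print(expand_context(chunks, hit_idx=1, keywords=["maya"]))
# "Maya is explained here more on maya and Brahman"
```

The `max_chunks` cap reflects the trade-off discussed above: a larger window gives more complete context but more tokens for the generator to process.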
Retrieval-induced hallucinations. There are scenarios where the RAG model latches onto a particular word or phrase in the retrieved passage and hallucinates a response that is not only irrelevant but also factually incorrect. A sample of such a hallucination is shown in Fig. 10. This is a more challenging problem to address. However, retrieval models that can extract the full context, summarize it, and remove irrelevant information should be able to mitigate this issue to a reasonable extent.
## 7 Conclusion
In this work, we integrate modern retrieval-augmented large language models with the ancient Indian philosophy of Advaita Vedanta. Toward this end, we present VedantaNY-10M, a large dataset curated from automatic transcriptions of extensive philosophical discourses on YouTube. Validating these models along various axes using both automatic and human evaluation provides two key insights. First, RAG models significantly outperform non-RAG models, with domain experts expressing a strong preference for using such RAG models to supplement their daily studies. Second, the keyword-based hybrid RAG model underscores the merits of integrating classical and contemporary deep learning techniques for retrieval in niche and specialized domains. While there is much work to be done, our study underscores the potential of integrating modern machine learning techniques to unravel ancient knowledge systems.
## Limitations and Future Work
While our study demonstrates the utility of integrating retrieval-augmented LLMs with ancient knowledge systems, there are limitations and scope for future work. First, this study focuses on a single niche domain of Advaita Vedanta as taught by one teacher. Expanding this study to include other ancient philosophical systems, such as the Vedantic schools of Vishishtadvaita and Dvaita, as well as various Buddhist and Jain traditions, would be a valuable extension. Second, incorporating primary scriptural sources, in addition to spoken discourses, would enhance the authenticity of the RAG model's outputs. Third, while we only experiment with RAG models in this study, finetuning the language models themselves on philosophy datasets is an interesting future direction. Fourth, the context refiner is currently heuristic-based and may not generalize well to all scenarios. Replacing it with a trained refiner using abstractive or extractive summarization techniques would considerably improve its utility and efficiency. Fifth, expanding the evaluation set and involving more subjects for evaluation will considerably strengthen the study's robustness. Finally, while the language models in this work are primarily in English and Latin script, building native LLMs capable of functioning in the original Sanskrit language of the scriptures using Devanagari script is essential future work.
## Acknowledgments
The author would like to thank Prof. Kyle Mahowald for his insightful course on form and functionality in LLMs, which guided the evaluation of the language models presented in this paper. Fangyuan Xu provided valuable information on automatic metrics for LFQA evaluation. The author extends their gratitude to all the human evaluators who took the survey and provided valuable feedback, with special thanks to Dr. Anandhi who coordinated the effort among domain experts. Finally, the author expresses deep gratitude to the Vedanta Society of New York and Swami Sarvapriyananda for the 750+ hours of public lectures that served as the dataset for this project.
## Ethics Statement
All data used in this project has been acquired from public lectures on YouTube delivered by Swami Sarvapriyananda of the Vedanta Society of New York. While our study explores integrating ancient knowledge systems with modern machine learning techniques, we recognize their inherent limitations. These knowledge traditions have always emphasized the importance of the teacher in transmitting knowledge. We do not see LLMs as replacements for monks and teachers of these ancient traditions, but only as tools to supplement analysis and study. Moreover, users of these tools need to be made well aware that these models can and do make errors, and should therefore seek guidance from qualified teachers to carefully progress on the path.
## References
- Tom Aarsen. 2020. SpanMarker for Named Entity Recognition.
- Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, and Nando de Freitas. 2022. Restoring and attributing ancient texts using deep neural networks. Nature , 603.
- David Bamman and Patrick J Burns. 2020. Latin bert: A contextual language model for classical philology. arXiv preprint arXiv:2009.10053 .
- Bhagavad Gita. 3000 B.C.E. The Bhagavad Gita.
- Bartosz Bogacz and Hubert Mara. 2020. Period classification of 3d cuneiform tablets with geometric neural networks. In ICFHR .
- Bernd Bohnet, Vinh Q Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, et al. 2022. Attributed question answering: Evaluation and modeling for attributed large language models. arXiv preprint arXiv:2212.08037 .
- Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In ICML .
- The Brahmasutras. 3000 B.C.E. The Brahmasutras.
- Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In NeurIPS .
- Anthony Chen, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2020. MOCHA: A dataset for training and evaluating generative reading comprehension metrics. In EMNLP .
- Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pretraining text encoders as discriminators rather than generators. In ICLR .
- Alexander Fabbri, Chien-Sheng Wu, Wenhao Liu, and Caiming Xiong. 2022. QAFactEval: Improved QAbased factual consistency evaluation for summarization. In NAACL .
- Maarten Grootendorst. 2020. Keybert: Minimal keyword extraction with bert.
- Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Realm: Retrieval augmented language model pre-training. In ICML .
- Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. spaCy: Industrialstrength Natural Language Processing in Python.
- Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane DwivediYu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2022. Atlas: Few-shot learning with retrieval augmented language models. JMLR .
- Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, MarieAnne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2024. Mixtral of experts. Preprint , arXiv:2401.04088.
- Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. 2023. Large language models struggle to learn long-tail knowledge. In ICML .
- Vladimir Karpukhin, Barlas O˘ guz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for opendomain question answering. EMNLP .
- Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Generalization through memorization: Nearest neighbor language models. ICLR .
- Kalpesh Krishna, Yapei Chang, John Wieting, and Mohit Iyyer. 2022. Rankgen: Improving text generation with large ranking models. arXiv preprint arXiv:2205.09726 .
- Kalpesh Krishna, Aurko Roy, and Mohit Iyyer. 2021. Hurdles to progress in long-form question answering. In NAACL .
- Philippe Laban, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. 2022. Summac: Re-visiting nlibased models for inconsistency detection in summarization. TACL .
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020a. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In ACL .
- Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020b. Retrieval-augmented generation for knowledge-intensive nlp tasks. NeurIPS .
- Yixin Liu, Alexander R. Fabbri, Pengfei Liu, Yilun Zhao, Linyong Nan, Ruilin Han, Simeng Han, Shafiq R. Joty, Chien-Sheng Wu, Caiming Xiong, and Dragomir R. Radev. 2022. Revisiting the gold standard: Grounding summarization evaluation with robust human evaluation. ArXiv , abs/2212.07981.
- Ligeia Lugli, Matej Martinc, Andraž Pelicon, and Senja Pollak. 2022. Embeddings models for buddhist Sanskrit. In LREC .
- Rajiv Malhotra. 2021. Artificial Intelligence and the Future of Power: 5 Battlegrounds . Rupa Publications.
- Rajiv Malhotra and Satyanarayana Dasa Babaji. 2020. Sanskrit Non-Translatables: The Importance of Sanskritizing English . Amaryllis.
- Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Hannaneh Hajishirzi, and Daniel Khashabi. 2023. When not to trust language models: Investigating effectiveness and limitations of parametric and nonparametric memories. ACL .
- Priyanka Mandikal and Raymond Mooney. 2024. Sparse meets dense: A hybrid approach to enhance scientific document retrieval. In The 4th CEUR Workshop on Scientific Document Understanding, AAAI .
- Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. 2020. On faithfulness and factuality in abstractive summarization. In ACL .
- Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy CampbellGillingham, Geoffrey Irving, et al. 2022. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147 .
- Ragaa Moustafa, Farida Hesham, Samiha Hussein, Badr Amr, Samira Refaat, Nada Shorim, and Taraggy M Ghanim. 2022. Hieroglyphs language translator using deep learning techniques (scriba). In International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC) .
- Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. 2021. Webgpt: Browser-assisted questionanswering with human feedback. arXiv preprint arXiv:2112.09332 .
- Sonika Narang, MK Jindal, and Munish Kumar. 2019. Devanagari ancient documents recognition using statistical feature extraction techniques. S¯ adhan¯ a , 44.
- Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. 2024. Nomic embed: Training a reproducible long context text embedder. Preprint , arXiv:2402.01613.
OpenAI. 2023. Gpt-4 technical report.
- Asimina Paparigopoulou, John Pavlopoulos, and Maria Konstantinidou. 2022. Dating Greek papyri images with machine learning. In ICDAR Workshop on Computational Paleography .
- Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2023. Robust speech recognition via large-scale weak supervision. In ICML .
- Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083 .
- Ramakrishna Order. Belur Math.
- Adi Shankaracharya. 700 C.E. Commentary on the Upanishads.
- Weijia Shi, Julian Michael, Suchin Gururangan, and Luke Zettlemoyer. 2022. Nearest neighbor zero-shot inference. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing .
- Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. Replug: Retrievalaugmented black-box language models. arXiv preprint arXiv:2301.12652 .
- Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval augmentation reduces hallucination in conversation. ACL .
- Thea Sommerschield, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, and Nando de Freitas. 2023. Machine Learning for Ancient Languages: A Survey. Computational Linguistics .
- Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, and Liping Jing. 2024. Large language models as zero-shot keyphrase extractors: A preliminary empirical study. Preprint , arXiv:2312.15156.
- Simeng Sun, Ori Shapira, Ido Dagan, and Ani Nenkova. 2019. How to compare summarizers without target length? pitfalls, solutions and re-examination of the neural summarization literature. Workshop on Methods for Optimizing and Evaluating Neural Language Generation .
Upanishads. >3000 B.C.E. The Upanishads.
- Vedanta Society of New York. Swami Sarvapriyananda's Vedanta discourses.
- Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, and Arnold Overwijk. 2019. Open domain web keyphrase extraction beyond language modeling. arXiv preprint arXiv:1911.02671 .
- Fangyuan Xu, Yixiao Song, Mohit Iyyer, and Eunsol Choi. 2023. A critical evaluation of evaluations for long-form question answering. In ACL .
- Eunice Yiu, Eliza Kosoy, and Alison Gopnik. 2023. Imitation versus innovation: What children can do that large language and language-and-vision models cannot (yet)? ArXiv , abs/2305.07666.
- Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval .
## A Implementation Details
## A.1 Automatic Metrics
Following Xu et al. (2023), we implement a number of automatic evaluation metrics for LFQA as described below.
Length We use the Spacy package (Honnibal et al., 2020) for word tokenization.
Self-BLEU We calculate Self-BLEU by regarding one sentence as the hypothesis and all other sentences in the same answer paragraph as references. We report Self-BLEU-5 as a measure of coherence.
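For illustration, a dependency-free version of this computation (a minimal BLEU without smoothing or brevity penalty, standing in for the Texygen implementation of Zhu et al., 2018) might look like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, references, max_n=5):
    """Minimal BLEU with uniform n-gram weights (no smoothing, no brevity
    penalty), enough to illustrate the Self-BLEU computation."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp = Counter(ngrams(hypothesis, n))
        if not hyp:
            return 0.0
        ref = Counter()
        for r in references:
            for g, c in Counter(ngrams(r, n)).items():
                ref[g] = max(ref[g], c)  # clip counts against best reference
        clipped = sum(min(c, ref[g]) for g, c in hyp.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / sum(hyp.values()))
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

def self_bleu(sentences, max_n=5):
    """Score each tokenized sentence against all other sentences in the
    answer; higher means more repetitive (less diverse) text."""
    scores = [bleu(s, sentences[:i] + sentences[i + 1:], max_n)
              for i, s in enumerate(sentences)]
    return sum(scores) / len(scores)

identical = [["maya", "is", "the", "power", "of", "brahman"]] * 2
print(self_bleu(identical))  # 1.0
```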
RankGen For a given question q and a model-generated answer a, we first transform them into fixed-size vectors (q, a) using the RankGen encoder (Krishna et al., 2022). To assess their relevance, we compute the dot product q · a. We utilize the T5-XXL (11B) encoder, which has been trained using both in-book negative instances and generative negatives.
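The scoring step reduces to a dot product between fixed-size vectors. A sketch with hypothetical two-dimensional embeddings standing in for the T5-XXL encoder outputs:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def rank_answers(q_vec, answer_vecs):
    """Rank candidate answers by dot product with the question vector,
    as RankGen does with its encoded prefix and suffixes. The vectors
    here are placeholders for real encoder outputs."""
    return sorted(enumerate(dot(q_vec, a) for a in answer_vecs),
                  key=lambda t: -t[1])

q = [0.6, 0.8]                     # hypothetical encoded question
answers = [[0.7, 0.7], [0.1, -0.9]]  # hypothetical encoded answers
print(rank_answers(q, answers))    # the first answer scores higher
```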
QAFactEval QAFactEval is a QA-based metric recently introduced by Fabbri et al. (2022). It has demonstrated exceptional performance across multiple factuality benchmarks for summarization (Laban et al., 2022; Maynez et al., 2020). The pipeline includes four key components: (1) noun-phrase (NP) extraction from a sentence S, denoted Ans(S); (2) BART-large (Lewis et al., 2020a) for question generation, denoted QG; (3) Electra-large (Clark et al., 2020) for question answering, denoted QA; and (4) the learned metric LERC (Chen et al., 2020) to measure similarity, denoted Sim(p_i, s_i). An additional answerability classification module is incorporated to assess whether a question can be answered with the information provided in document D. Following Xu et al. (2023), we report LERC, which uses the learned metric to compare Ans(S) and Ans(D).
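The end-to-end idea can be approximated as follows, with simple token-level F1 standing in for the learned LERC metric, and with hypothetical extracted spans in place of the real NP-extraction, QG, and QA components:

```python
def token_f1(pred, gold):
    """Token-level F1, a simple stand-in for the learned LERC similarity."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    prec, rec = common / len(p), common / len(g)
    return 2 * prec * rec / (prec + rec)

def qa_factuality(answer_spans, evidence_answers):
    """For each span extracted from the chatbot answer, compare it with the
    span a QA model recovers from the evidence for the generated question;
    the average similarity approximates answer faithfulness."""
    scores = [token_f1(a, e) for a, e in zip(answer_spans, evidence_answers)]
    return sum(scores) / len(scores)

# Hypothetical extracted spans and QA-model outputs
print(qa_factuality(["Swami Vivekananda", "non-duality"],
                    ["Swami Vivekananda", "duality"]))  # 0.5
```

An answer whose spans match what the evidence supports scores near 1; hallucinated spans pull the score toward 0.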
## A.2 Chat Prompt
For an example of the constructed RAG bot prompt, please refer to Fig. 5. In this scenario, the RAG bot C_r is presented with the top-k retrieved passages alongside the query for generating a response, whereas the generic bot C_g only receives the query without additional context.
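Following the templates in Fig. 5, prompt construction for the two bots can be sketched as:

```python
def build_prompt(query, passages=None):
    """Assemble the chat prompt of Fig. 5: the RAG bot receives the top-k
    retrieved passages alongside the query, the generic bot only the query."""
    system = ("You are a helpful assistant that accurately answers queries "
              "using Swami Sarvapriyananda's YouTube talks.")
    if passages:
        body = ("Use the following passages to provide a detailed answer "
                f"to the query: {query}\nPassages:\n" + "\n".join(passages))
    else:
        body = f"Provide a detailed answer to the query: {query}"
    return system + " " + body

print(build_prompt("What is maya?", ["Passage 1 text", "Passage 2 text"]))
```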
## B Sample Sanskrit terms
Table 2 contains excerpts from passages containing Sanskrit terms. The Sanskrit terms are italicized
<details>
<summary>Image 5 Details</summary>

### Visual Description
## Diagram: Bot Descriptions
### Overview
The image presents a comparison between two types of bots: a "RAG Bot" and a "Generic Bot." Each bot is described with a brief instruction set, highlighting their purpose and how they should respond to queries. The RAG Bot is distinguished by its use of specific passages to formulate its answers.
### Components/Axes
* **RAG Bot Box:** Located at the top, with a light green background.
* **Title:** "RAG Bot"
* **Description:** "You are a helpful assistant that accurately answers queries using Swami Sarvapriyananda's YouTube talks. Use the following passages to provide a detailed answer to the query: {query} Passages: {Passage 1} {Passage 2} ... {Passage k}"
* **Generic Bot Box:** Located at the bottom, with a light red background.
* **Title:** "Generic Bot"
* **Description:** "You are a helpful assistant that accurately answers queries using Swami Sarvapriyananda's YouTube talks. Provide a detailed answer to the query: {query}"
### Detailed Analysis or ### Content Details
The RAG Bot's description explicitly instructs it to use provided passages to answer queries, while the Generic Bot's description does not mention the use of passages. Both bots are designed to answer queries based on Swami Sarvapriyananda's YouTube talks. The RAG Bot is given specific passages labeled "{Passage 1}", "{Passage 2}", and so on, up to "{Passage k}".
### Key Observations
* The key difference between the two bots lies in the RAG Bot's utilization of specific passages to formulate its answers, suggesting a more targeted or context-aware approach.
* Both bots are designed to provide detailed answers to queries related to Swami Sarvapriyananda's YouTube talks.
### Interpretation
The diagram illustrates two different approaches to building a question-answering bot. The RAG Bot (Retrieval-Augmented Generation) uses a retrieval mechanism to identify relevant passages and then generates an answer based on those passages. This approach can lead to more accurate and contextually relevant answers. The Generic Bot, on the other hand, likely relies on a more general knowledge base or a different method for generating answers, without explicitly using provided passages. The diagram highlights the importance of context and information retrieval in building effective question-answering systems.
</details>
Figure 5: Prompts for the RAG and generic chatbots. RAG Bot receives the top-k retrieved relevant passages in the prompt along with the query, while the generic bot only receives the query.
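The two prompt formats in Fig. 5 can be assembled as follows. This is a minimal Python sketch; the helper names and string layout are ours, but the prompt wording follows the figure.

```python
# Shared system preamble from Fig. 5.
SYSTEM = ("You are a helpful assistant that accurately answers queries "
          "using Swami Sarvapriyananda's YouTube talks. ")

def rag_prompt(query: str, passages: list[str]) -> str:
    """RAG bot: the query plus the top-k retrieved passages (per Fig. 5)."""
    joined = "\n".join(passages)
    return (SYSTEM + "Use the following passages to provide a detailed "
            f"answer to the query: {query}\nPassages:\n{joined}")

def generic_prompt(query: str) -> str:
    """Generic bot: the query only, with no retrieved context (per Fig. 5)."""
    return SYSTEM + f"Provide a detailed answer to the query: {query}"
```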
and underlined. Notice that the passages contain detailed English explanations of these terms. To retain the linguistic diversity, authenticity, and comprehensiveness of the source material, we keep these Sanskrit terms as-is in our passages, as described in Sec. 3. Note that these are direct Whisper (Radford et al., 2023) transcriptions with no further post-processing or proofreading, so transcriptions may not always be accurate.
## C Transcription
We assess the transcript quality and list some common errors.
## C.1 Transcript Evaluation
Transcription quality is scored on a scale from 1 to 5 (where 1 = Poor, 5 = Perfect). On 10 randomly sampled transcripts, evaluators assign a high average score of 4.48, suggesting that the transcription of YouTube audio into text is highly accurate and clear, and that our constructed custom dataset D_t is of high quality.
## C.2 Transcript Errors
Table 3 contains a few sample transcription errors. The transcriptions are largely good for English words and sentences. However, errors often arise from incorrectly transcribing Sanskrit terms and verses. Other less common errors include missing
| Sl. No. | Excerpts from passages | Notes |
|---------|------------------------|-------|
| 1 | Om Bhadram Karne Bhishrinu Yamadevaha Bhadram Pashyam Akshabhirya Jatraaha Sthirai Rangai Stushta Vagam Sasthanubhi Vyase Madevahitaiyadayoh Swasthina Indro Vriddha Shravaha Swasthina Phusa Vishwa Vedaaha Swasthina Starksho Arishta Nemi Swasthino Brihas Patir Dadhatu Om Shanti Shanti Shanti. | This is a Sanskrit chant which is directly Romanized and processed. The automatic transcriptions often contain errors in word segmentation for Sanskrit verses. |
| 2 | Samsara is our present situation, the trouble that we are caught in, the mess that we are caught in. Samsara is this. In Sanskrit, normally when you use the word samsara, it really means this world of our life, you know, being born and struggling in life and afflicted by suffering and death and hopelessness and meaninglessness. | Samsara is a Sanskrit term. The excerpt contains an explanation of the concept in English. |
| 3 | The problem being ignorance, solution is knowledge and the method is Jnana Yoga, the path of knowledge. So what is mentioned here, Shravana Manana Nididhyasana, hearing, reflection, meditation, that is Jnana Yoga. So that's at the highest level of practice, way of knowledge. | The excerpt contains an explanation of Jnana Yoga, the path of knowledge. |
| 4 | In Sanskrit, ajnana and adhyasa, ignorance and superimposition. Now if you compare the four aspects of the self, the three appearances and the one reality, three appearances, waker, dreamer, deep sleeper, the one reality, turiyam, if you compare them with respect to ignorance and error, you will find the waker, that's us right now. We have both ignorance and error. | Ajnana, adhyasa and turiyam are Sanskrit terms. Notice that the passage implicitly contains rough English translations of these terms in the context of the overall discourse. For instance, ajnana is translated as ignorance and adhyasa is translated as superimposition. |
| 5 | Mandukya investigates this and points out there is an underlying reality, the Atman, pure consciousness, which has certain characteristics. This is causality, it is beyond causality. It is neither a cause nor an effect. The Atman is not produced like this, nor is it a producer of this. It is beyond change. No change is there in the Atman, nirvikara. And third, it is not dual, it is non-dual, advaitam. This is kadyakarana in Sanskrit, this is kadyakarana vilakshana Atma. In Sanskrit this is savikara, this is nirvikara Atma. This is dvaita, this is advaita Atma. So this is samsara and this is moksha, freedom. | The excerpt contains an explanation of different Sanskrit technical terms. |
Table 2: Excerpts from passages containing Sanskrit terms. These excerpts contain detailed English descriptions of technical terms in Sanskrit (italicized and underlined). To retain authenticity to the source material, we retain these Sanskrit terms as is in our passages. Note that these are direct Whisper (Radford et al., 2023) transcriptions with no further post-processing or proofreading, so transcriptions may not always be accurate. For more details, refer to Sec. B.
Figure 6: Subject confusion: Aldous Huxley vs Alan Watts. In the retrieved passage, the speaker jumps between speaking about Aldous Huxley and Alan Watts. Moreover, the transcription has poor punctuation. As a result, the RAG bot incorrectly attributes information about Huxley to Watts in the response. Due to the unstructured nature of the spoken text, such errors can occur. This can be addressed by converting the spoken text into a more structured prose format with the help of well-crafted prompts to LLMs, followed by human proofreading (see Sec. D).
<details>
<summary>Image 6 Details</summary>

### Visual Description
## Text Comparison: Alan Watts and Vedanta
### Overview
The image presents a comparison of two text passages related to Alan Watts' association with Vedanta. The left passage is labeled "Answer" and attributed to "RAG Bot," while the right passage is labeled "Passages" and titled "Swami Vivekananda: The Power and the Glory | Swami Sarvapriyananda." Both passages discuss Alan Watts' relationship with Vedanta, with certain phrases highlighted in green and red, presumably indicating agreement and disagreement or positive and negative sentiment, respectively.
### Components/Axes
* **Titles:** "Was Alan Watts' associated with Vedanta?" (overarching question), "Answer" (left passage), "RAG Bot" (attribution for left passage), "Passages" (right passage), "Swami Vivekananda: The Power and the Glory | Swami Sarvapriyananda" (title of right passage).
* **Text Content:** Two blocks of text, one providing an answer to the question and the other providing supporting passages.
* **Highlights:** Phrases within both passages are highlighted in green and red.
### Detailed Analysis
**Left Passage ("Answer" - RAG Bot):**
* "Yes, Alan Watts was associated with Vedanta."
* Green highlight: "Although he was not directly connected with the Vedanta Society, he taught Vedanta and Zen, which was a kind of eclectic mixture of both."
* "In the 1950s and 60s, when he wrote his book 'The Perennial Philosophy,' some reviews were critical, suggesting that he had become part of some Hindu cult."
* Red highlight: "However, he maintained his association with the Vedanta Society of Southern California until the very end."
* "Alan Watts was one of the leading intellectuals in Britain, along with other British intellectuals like Aldous Huxley, Christopher Isherwood, and Gerald Hurd. While Huxley and Isherwood were directly associated with Vedanta, Alan Watts taught Vedanta and Zen, which shows his connection to Vedantic ideas."
**Right Passage ("Passages" - Swami Vivekananda):**
* "50s and 60s and when he wrote that book some reviews were nasty some said that oh he's become part of some Hindu cult or something like that you know but"
* Red highlight: "he maintained his association with the Vedanta Society till the very end it's very closely connected with the Vedanta Society of Southern California Aldous Huxley one of the leading intellectuals in Britain and then they were all British by the way Huxley Christopher Isherwood Gerald Hurd, Alan Watts."
* "Alan Watts"
* Green highlight: "was not directly connected with Vedanta but he taught Vedanta and Zen a kind of eclectic mixture of both. Aldous Huxley his amazing book the perennial philosophy."
* "Houston Smith major figure in in the study of comparative religion and his book the world's religions he wrote that he was a disciple of Swami Satprakashananda in the Vedanta Society in St. Louis so this spread of Vedantic ideas, yogic ideas, ideas of and then later on Buddhism and so on that Vivekananda was the pioneer he opened the door here and he went back to India the other side of his work. Sister Nivedita writes when he stood up to speak here in Chicago his message for the West he said I have a message for the West as Buddha had the message for the East but his message his words Nivedita writes traveled back across the dark oceans to a land to his motherland asleep to awaken her to a sense of her greatness. When he went back to India, India which was colonized which was starving superstitious divided he was the first person historian says to"
### Key Observations
* Both passages address Alan Watts' relationship with Vedanta.
* The "Answer" passage provides a direct response to the question, while the "Passages" passage offers supporting context.
* The green highlights indicate agreement or positive associations, while the red highlights indicate disagreement or negative associations.
* Both passages mention Alan Watts' book "The Perennial Philosophy."
* Both passages mention Aldous Huxley.
### Interpretation
The image presents a comparison of two perspectives on Alan Watts' association with Vedanta. The "RAG Bot" answer acknowledges his association but clarifies that it wasn't a direct connection with the Vedanta Society. The supporting passage from Swami Vivekananda reinforces this nuanced view, highlighting both his connection and his independent approach to Vedanta and Zen. The highlighting suggests areas of agreement and disagreement or positive and negative sentiment regarding the nature and extent of his involvement. The inclusion of figures like Aldous Huxley suggests a broader intellectual context for understanding Watts' work.
</details>
or incorrect punctuation. Human proofreading will remove these errors to a large extent.
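Part of this cleanup could be automated before human proofreading with a lookup of known mis-transcriptions. The sketch below is illustrative: the mapping draws on the examples in Table 3, and the helper name is ours.

```python
# Known Whisper mis-transcriptions of Sanskrit terms and proper nouns
# (examples drawn from Table 3); a real correction list would be far longer
# and would still require human proofreading afterwards.
CORRECTIONS = {
    "Chachjipiti": "ChatGPT",
    "Vajamaka": "Madhyamaka",
    "the Garjuna": "Nagarjuna",
    "Brahmo Samad": "Brahmo Samaj",
}

def apply_corrections(text: str) -> str:
    """Replace known transcription errors with their corrected forms."""
    for wrong, right in CORRECTIONS.items():
        text = text.replace(wrong, right)
    return text
```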
## D Spoken vs written language
Unlike traditional textual corpora compiled from written sources, our dataset is derived from spoken discourses. Spoken language is often more verbose and less structured than written text, with the speaker frequently jumping between concepts mid-sentence. This unstructured input can be unfamiliar for a language model trained extensively on written text, which expects more coherent and structured prose. Fig. 6 shows a peculiar failure case arising from this issue: the speaker jumps between speaking about Aldous Huxley and Alan Watts, and as a result, the RAG bot incorrectly attributes information about Huxley to Watts while answering the query. Such errors can be addressed by converting the spoken text into a more structured prose format with the help of well-crafted prompts to LLMs, followed by human proofreading.
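One way to phrase such a restructuring prompt is sketched below. The wording of the template and the helper name are our illustration, not the exact prompt used in this work.

```python
# Hypothetical prompt template for converting a raw spoken-discourse
# transcript into well-punctuated written prose before indexing.
RESTRUCTURE_PROMPT = (
    "The following text is an automatic transcript of a spoken discourse. "
    "Rewrite it as well-punctuated, structured written prose. Keep every "
    "Sanskrit term exactly as-is, and do not add or remove any factual "
    "content.\n\nTranscript:\n{transcript}"
)

def build_restructure_prompt(transcript: str) -> str:
    """Fill the template with one raw transcript chunk."""
    return RESTRUCTURE_PROMPT.format(transcript=transcript)
```

The LLM output would then be spot-checked by human proofreaders, as noted above.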
## E Question Categories
To ensure a broad and comprehensive evaluation, we categorize the questions into five distinct types: anecdotal, terminology, scriptural, comparative, and reasoning. Questions from each of these categories are provided in Tab. 4. Each category is designed to test different aspects of the model's capabilities. For details, refer to Sec. 5.
## F Sample Responses: RAG vs Non-RAG
In Figs. 7-11, we present responses from the RAG and generic chatbots for five sample queries. The query is highlighted in yellow, the RAG response is in the green box, and the non-RAG response is in the red box. We use GPT-4-turbo (OpenAI, 2023) as the large language model for both.
## G Sample Responses: Standard RAG vs Keyword-based Hybrid RAG
In Figs. 12-16, we present responses from the standard RAG and keyword-based hybrid RAG models for one query from each of the five categories. We also show the human-annotated and model-generated keywords. The keyword-RAG response is in the green box, and the standard-RAG response is in the red box. We use Mixtral-8x7B-Instruct-v0.1 (Jiang et al., 2024) as the large language model for both.
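As an illustration of how keyword signals can be combined with a dense retrieval score, here is a minimal sketch. The linear interpolation, the `alpha` weight, and the function names are our assumptions for exposition, not necessarily the exact formulation used in the system.

```python
def keyword_overlap(query_keywords: list[str], passage: str) -> float:
    """Fraction of the query's keywords that appear in a passage."""
    passage_lower = passage.lower()
    hits = sum(1 for kw in query_keywords if kw.lower() in passage_lower)
    return hits / len(query_keywords) if query_keywords else 0.0

def hybrid_score(dense_score: float, query_keywords: list[str],
                 passage: str, alpha: float = 0.5) -> float:
    """Interpolate a dense-retrieval similarity with keyword overlap.

    Passages are then ranked by this combined score, so passages that
    mention rare Sanskrit terms from the query are boosted even when
    their dense embedding similarity is middling.
    """
    return alpha * dense_score + (1 - alpha) * keyword_overlap(query_keywords, passage)
```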
Table 3: Sample transcription errors. For constructing our text corpus, we directly use the transcripts obtained from Whisper (Radford et al., 2023) with no further post-processing or proofreading. The transcriptions are largely good (with a score of 4.5/5 from human evaluators). However, errors arise from incorrectly transcribing Sanskrit terms, missing punctuation, etc. Human proofreading will remove these errors to a large extent.
| Sl. No. | Transcription errors | Notes |
|---------|----------------------|-------|
| 1 | That's what Sam Altman, Chachjipiti, somebody asked him. | Should be ChatGPT |
| 2 | Last year, you studied extensively with Professor Garfield, I believe, studying Vajamaka and the teachings of the Garjuna. | Should be Madhyamaka and Nagarjuna, respectively |
| 3 | From attachment comes desire, raga, I want it and if that desire is satisfied then there is no end to it, greed, lobha. But if it is somehow thwarted, then anger, kama krodho vijayate. | Should be bhijayate |
| 4 | In fact, one of the terms which is used in Mandukya Upanishad, Brahman is abhyavaharyam. | Should be avyavaharam |
| 5 | So, one of them was the Brahmo Samad, which was quite popular in Calcutta in those days. | Should be Samaj |
| 6 | I am awareness I'm eternal consciousness Aldous Huxley Christopher Isherwood Gerald Hurd all of them were very close to Swami Prabhavananda in Southern California in Hollywood and look at the product of that Isherwood wrote that one of the most amazing biographies | The transcripts sometimes miss punctuation marks, making the passage difficult to comprehend for both humans and language models |
Table 4: Question categories. To ensure a broad and comprehensive evaluation, we categorize the questions into five distinct types, each designed to test different aspects of the model's capabilities. For details, refer to Sec. 5.
| Category | Description | Questions |
|-------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Anecdotal | Stories and anecdotes narrated by the speaker in the discourses | • Does Swami speak about Wittgenstein's thesis defense? • Does Swami narrate any incident surrounding Shivaratri? • Does Swami speak about The Matrix movie? • Does Swami speak about Vachaspati Mishra? Does he narrate how Bhamati came to be written? • What was Christopher Isherwood's contribution to Vedanta? |
| Terminology | Probe the model's understanding of specific terms and concepts | • What is Adhyaropa Apavada? • What is Vikshepa Shakti? • What is the significance of the word 'Shraddha'? • What is Upadana Karana? • What constitutes Sadhana Chatushtaya? |
| Scriptural | Reference, interpret, and explain passages from religious or philosophical texts | • In Mandukya Upanishad, what is the significance of the word 'Om'? • In the Gospel, what parable does Sri Ramakrishna use to portray intense longing for God? • In the Mundaka Upanishad, how do we interpret the parable of the two birds? • How is Phala Vyapti and Vritti Vyapti defined in Vedantasara? • In the Gospel of Sri Ramakrishna, how do we understand the analogy of the salt doll that Thakur provides? |
| Comparative | Analyze and compare different concepts, philosophies, or texts | • As mentioned in the Yoga Sutras, is Samadhi necessary to attain enlightenment according to Advaita Vedanta? • Would Sri Ramakrishna's teachings be considered purely Advaitic? • In Kashmir Shaivism, Chit is both Prakasha and Vimarsha, i.e. it is both self-luminous and self-reflective. Would Advaita Vedanta agree with this? • How does Sankhya differ from Advaita Vedanta? • What is the main difference between Buddhist Shunyavada and Advaita Vedanta? |
| Reasoning | Require logical reasoning, critical thinking, and the application of principles to new scenarios | • Can AI ever become conscious? • Is the waking state similar to a dream or absolutely indistinguishable from the dream state? • Do our senses report reality to us? • Dis-identifying myself from the body-mind seems to require a lot of effort. Did the reverse process of identification also involve as much effort? • If Brahman as Existence-Consciousness-Bliss is the eternal subject that cannot be objectified, yet eternally reveals itself to give rise to apparent objects through the principle of Maya, can we infer that Brahman is Existence-Consciousness-Bliss-Maya? That is, is Brahman inseparable from Maya? |
<details>
<summary>Image 7 Details</summary>

### Visual Description
## Text Analysis: Brahman and Atman
### Overview
The image presents a question about Sam Altman's belief in the absolute equivalence of Brahman and Atman, followed by two different answers generated by "RAG Bot" and "Generic Non-RAG Bot". It also includes excerpts from passages related to the topic, specifically from the Bhagavad Gita.
### Components/Axes
* **Title:** Sam Altman says that he believes in the absolute equivalence of Brahman and Atman. What does it mean?
* **Answers:** Two answers are provided, one from "RAG Bot" and another from "Generic Non-RAG Bot".
* **Passages:** Excerpts from the Bhagavad Gita are included.
### Detailed Analysis
**1. Question:**
* "Sam Altman says that he believes in the absolute equivalence of Brahman and Atman. What does it mean?"
**2. Answer from RAG Bot:**
* Sam Altman, a prominent figure in the tech industry, has expressed his belief in the absolute equivalence of Brahman and Atman.
* To understand this, one needs to delve into the concepts of Brahman and Atman in the context of Advaita Vedanta, a school of Hindu philosophy.
* In Advaita Vedanta, Brahman refers to the ultimate reality or the absolute truth. It is considered the underlying essence of everything in the universe. Brahman is often described as infinite, eternal, and beyond all limitations. It is the source and substratum of all existence.
* Atman refers to the individual self or the true essence of an individual. It is the innermost core of our being, beyond the physical body, mind, and ego.
* Atman is said to be pure consciousness, unchanging and eternal.
* The belief in the absolute equivalence of Brahman and Atman means that there is no fundamental difference between the ultimate reality and the individual self.
* It suggests that the true nature of the individual self is identical to the nature of the ultimate reality. In other words, the essence of every individual is not separate from the essence of the universe.
* This concept is central to Advaita Vedanta, which emphasizes the non-dual nature of reality. It asserts that the perceived duality and multiplicity in the world is an illusion, and the ultimate truth is the oneness of Brahman.
* The individual self, or Atman, is considered to be a manifestation or reflection of Brahman.
* Recognizing the equivalence of Brahman and Atman allows one to realize their true nature and attain liberation or enlightenment.
* This realization involves transcending the limited identification with the ego and realizing the underlying unity of all existence.
* Sam Altman's belief aligns with the teachings of Advaita Vedanta. This philosophy has been expounded by various spiritual masters and is deeply rooted in the ancient texts of Hinduism, such as the Upanishads and the Bhagavad Gita.
**3. Answer from Generic Non-RAG Bot:**
* Sam Altman, the American entrepreneur and investor, has expressed his belief in the absolute equivalence of Brahman and Atman.
* To understand this, it is important to delve into the concepts of Brahman and Atman in Hindu philosophy.
* In Hinduism, Brahman refers to the ultimate reality or the supreme cosmic power that underlies and permeates everything in the universe.
* It is considered to be the unchanging, eternal, and infinite essence from which all existence arises. Brahman is often described as being beyond human comprehension and transcending all dualities.
* Atman refers to the individual self or soul. It is the innermost essence of an individual, which is believed to be eternal and unchanging.
* Atman is considered to be a part of Brahman, and it is through self-realization or spiritual enlightenment that one can realize their identity with Brahman.
* The belief in the absolute equivalence of Brahman and Atman suggests that there is no fundamental difference between the ultimate reality (Brahman) and the individual self (Atman).
* It implies that the true nature of every individual is divine and that the individual self is essentially one with the supreme cosmic power.
* This concept is rooted in Advaita Vedanta, a school of Hindu philosophy that emphasizes the non-dual nature of reality.
* According to Advaita Vedanta, the apparent distinctions and divisions in the world are illusory, and the ultimate truth is the oneness of Brahman.
* By stating his belief in the absolute equivalence of Brahman and Atman, Sam Altman aligns himself with this philosophical perspective.
* It suggests that he sees the individual self as an inseparable part of the divine reality and recognizes the inherent unity of all existence.
**4. Passages:**
* **117. Bhagavad Gita | Chapter 9 Verse 34 Revision | Swami Sarvapriyananda:**
* A discussion about Sam Altman's belief in the absolute equivalence of Brahman and Atman, mentioning YouTube comments and the concept of non-dualism. It also touches upon the ego and its role in spirituality, referencing Krishna's teachings in the 9th chapter.
* **8. Bhagavad Gita I Chapter 2 Verses 20-22 | Swami Sarvapriyananda:**
* An explanation of the verses, denying the sixfold changes of the Atman. It explains that Atman literally means the self, and Brahman means the vast. The conclusion of Advaita Vedanta is that Atman is Brahman. It mentions the great Mahavakyas, such as "tat tvam asi" (that thou art) and "Aham brahmasmi" (I am Brahman).
### Key Observations
* Both "RAG Bot" and "Generic Non-RAG Bot" provide similar explanations of Brahman and Atman, emphasizing their equivalence in Advaita Vedanta.
* The passages from the Bhagavad Gita provide context and support for the concept of Brahman and Atman.
### Interpretation
The image presents a philosophical question and provides explanations from two different AI models, along with relevant passages from the Bhagavad Gita. The answers and passages suggest that the belief in the absolute equivalence of Brahman and Atman is rooted in Advaita Vedanta, which emphasizes the non-dual nature of reality and the oneness of the individual self with the ultimate reality. The passages from the Bhagavad Gita offer further insight into the concepts of Brahman and Atman, highlighting their significance in Hindu philosophy. The discussion about Sam Altman's belief adds a contemporary context to these ancient concepts.
</details>
Figure 7: RAG vs Non-RAG: Sam Altman on Atman and Brahman. The retrieved passages are very good at capturing the context of the query. The first passage references the monk speaking about Sam Altman and his views on Atman and Brahman. The second passage contains a brief explanation of Atman and Brahman that is helpful for answering the query. The generic bot also generates a reasonable answer. However, it contains a sentence stating that 'Atman is a part of Brahman'. This is incorrect in the context of Advaita Vedanta, as a part-whole relation is not the same as equivalence; this point was highlighted by the domain experts in the survey.
<details>
<summary>Image 8 Details</summary>

### Visual Description
## Text Comparison: RAG Bot vs. Non-RAG Bot and Other Passages
### Overview
The image presents text from different sources, including responses from a RAG Bot and a Generic Non-RAG Bot to the question of whether Swamiji speaks about Wittgenstein's dissertation defense. It also includes passages from "The Language of Paradox in Advaita - Swami Sarvapriyananda" and "Vedanta Retreat at Loon Lake on Who am I by Swami Sarvapriyananda Lecture 11".
### Components/Axes
The image is divided into four main sections:
1. **RAG Bot Response:** This section contains the question and the answer provided by a RAG Bot.
2. **Generic Non-RAG Bot Response:** This section contains the answer provided by a Generic Non-RAG Bot.
3. **Passages:** This section contains text from "The Language of Paradox in Advaita - Swami Sarvapriyananda".
4. **Vedanta Retreat:** This section contains text from "Vedanta Retreat at Loon Lake on Who am I by Swami Sarvapriyananda Lecture 11".
### Detailed Analysis
**1. RAG Bot Response:**
* **Question:** "Does Swamiji speak about Wittgenstein's dissertation defense?"
* **Answer:**
* Yes, Swami Sarvapriyananda does speak about Wittgenstein's dissertation defense in one of his YouTube talks.
* He shares an interesting story about Wittgenstein's PhD examination at Cambridge University.
* The examiners for Wittgenstein's thesis were Bertrand Russell and G.E. Moore, both renowned philosophers.
* Instead of asking Wittgenstein questions about his thesis, Russell and Moore started chatting with each other.
* Eventually, Moore suggested that they should ask Wittgenstein some questions.
* They opened Wittgenstein's book, Tractatus, and asked him to explain a particular statement.
* Wittgenstein and Russell then started quarreling over it.
* At this point, Wittgenstein stood up, patted the examiners on their back, and said, "Give it up, boys. You'll never get it." He then walked out of the room.
* Surprisingly, Russell and Moore awarded him the PhD right then and there.
* Swami Sarvapriyananda uses this story to emphasize the difficulty of understanding certain philosophical concepts.
* He mentions that even great philosophers like Wittgenstein can have disagreements and struggles in their pursuit of knowledge.
* Swami Sarvapriyananda also highlights the importance of engaging in dialectics and considering opposing viewpoints to deepen our understanding of our own positions.
* While Swami Sarvapriyananda does not directly discuss Wittgenstein's dissertation defense in relation to spiritual growth or enlightenment, he does mention the value of engaging in discussions and responding to questions in the context of teaching and learning.
* He believes that teaching and responding to queries provide an opportunity to think about teachings from different angles and deepen one's own understanding.
**2. Generic Non-RAG Bot Response:**
* **Answer:**
* Yes, Swami Sarvapriyananda does speak about Wittgenstein's dissertation defense in one of his YouTube talks.
* In his talk titled "Understanding Wittgenstein's Philosophical Investigations," Swami Sarvapriyananda discusses various aspects of Wittgenstein's work, including his dissertation defense.
* He provides insights into the historical context of Wittgenstein's defense, the arguments he presented, and the impact it had on his later philosophical development.
* Swami Sarvapriyananda also delves into the significance of Wittgenstein's ideas and their relevance in contemporary philosophy.
* Overall, his talk offers a comprehensive analysis of Wittgenstein's dissertation defense and its implications.
**3. Passages:**
* **Title:** The Language of Paradox in Advaita - Swami Sarvapriyananda
* **Content:**
* Thesis and we will pass it. I'll get it through. So you'll get a doctorate and you don't have to bother about it. Then you can teach. We want you to teach in Cambridge University.
* So the PhD examination of Wittgenstein. The story is very interesting. The examiners for a PhD thesis. Imagine who are the examiners. Bertrand Russell is the examiner. It's already fixed.
* So and the other examiner, those who are in philosophy, know this name is G.E. Moore. G.E. Moore. Another one of Bertrand Russell's senior contemporary. One of the great names of English philosophy. So these two are the PhD examiners. Imagine the examination.
* And the student is Wittgenstein. They started chatting to each other. After some time, Moore said to Russell, Bertie, don't you think we should ask him some questions? So all right. They opened that book, Tractatus. They opened that book at a page and said, explain this. This statement. Wittgenstein said something. Bertrand Russell said something. They started quarreling over it.
* And then student Wittgenstein, he stands up. He pats the examiners on their back. The Oxford, the Cambridge dons on their back and says, give it up, boys. You'll never get it. And he walks out. And they give him the PhD there and then. You've got a PhD. Says, give it up, boys. You'll never get it. But we'll try to understand.
* And we'll bring the light of Shankaracharya's commentary to bear on this problem. 1,400 years ago in India, Shankaracharya, who was 16 years old, if you go by
**4. Vedanta Retreat:**
* **Title:** Vedanta Retreat at Loon Lake on Who am I by Swami Sarvapriyananda Lecture 11
* **Content:**
* Philosophers Of the next generation Now, one day One of the students Was saying that Professor this is what you said in the last class And Wittgenstein said No I didn't say this
* And the student was pretty sure that's exactly what the professor had said So that day he wrote down the exact words What Wittgenstein said Next day he said Sir you said in the last class You said this And Wittgenstein said no no no I did not say that And he said no I have written it down exactly as you said You said this And Wittgenstein said Oh but those are my words You don't understand what I mean by those words You are using the same words But we don't mean the same thing I can see that you do not understand that Very interesting What happened to him Did he denounce the Tractatus of the Praetorians And say that it was a really good answer No, when he wrote the Tractatus He said all the problems of philosophy have been solved I give up philosophy And he went away to become a school teacher And he taught mathematics To children in a village I think in Switzerland and Austria And then he realized That all that he had done in the Tractatus Was wrong So he came back again to Cambridge And he worked on a new book And he became the Philosophical Investigations So he is probably the only philosopher Who wrote two philosophical classics In his lifetime Both opposite, completely opposite Repudiating the earlier work Tractatus and Philosophical Investigations Anyway So language One has to be careful Don't
### Key Observations
* Both the RAG Bot and the Generic Non-RAG Bot confirm that Swami Sarvapriyananda speaks about Wittgenstein's dissertation defense.
* The RAG Bot provides a more detailed and narrative answer, including a specific anecdote about Wittgenstein's PhD examination.
* The Generic Non-RAG Bot provides a more concise and summary-like answer.
* The "Passages" section provides a similar anecdote about Wittgenstein's PhD examination, suggesting a common source or theme.
* The "Vedanta Retreat" section presents a different anecdote about Wittgenstein and a student, highlighting the complexities of language and understanding.
### Interpretation
The image compares responses from different AI models (RAG Bot and Generic Non-RAG Bot) regarding a specific topic. The RAG Bot provides a more detailed and contextualized answer, while the Generic Non-RAG Bot offers a more concise summary. The inclusion of passages from Swami Sarvapriyananda's lectures provides additional context and insights into the topic of Wittgenstein's work and philosophical discussions. The anecdote about Wittgenstein's PhD examination appears in both the RAG Bot's response and the "Passages" section, suggesting a common source or a well-known story related to Wittgenstein's life and work. The "Vedanta Retreat" section offers a different perspective on Wittgenstein's views and the challenges of interpreting his ideas.
</details>
Figure 8: RAG vs Non-RAG: Wittgenstein's thesis defense. The retrieved passage contains the monk's narration of a story about Wittgenstein's thesis defense, which is highly relevant to the query. The generated RAG bot response makes use of this passage to answer the query; however, the latter half of the response is not relevant to the query and is purely hallucinated. The generic bot response, on the other hand, is completely fabricated and references a non-existent YouTube talk.
<details>
<summary>Image 9 Details</summary>

### Visual Description
## Textual Analysis: AI Consciousness Debate
### Overview
The image presents a discussion on whether Artificial Intelligence (AI) can ever become conscious. It includes answers from two different AI bots (RAG Bot and Generic Non-RAG Bot) and excerpts from a Q&A session with Swami Sarvapriyananda.
### Components/Axes
The image is divided into three main sections:
1. **Left Side:** Two "Answer" sections, one for "RAG Bot" and one for "Generic Non-RAG Bot."
2. **Right Side:** Two "Passages" sections, both titled "Ask Swami with Swami Sarvapriyananda | May 28th, 2023."
### Detailed Analysis
**1. RAG Bot Answer:**
* **Main Idea:** According to Swami Sarvapriyananda, AI is not capable of becoming conscious, even though it can replicate human capacities like intelligence, memory, decision-making, and creativity.
* **Key Points:**
* "Artificial intelligence" implies imitation, not consciousness.
* AI can perform tasks and simulate human behavior.
* Imitating human creativity does not imply consciousness.
* Consciousness is the feeling of life itself and the capability of experiencing.
* AI can perform complex tasks but does not generate consciousness itself.
* AI is focused on replicating human intelligence and behavior but does not possess consciousness.
* The question of AI becoming conscious is a topic of debate.
**2. Generic Non-RAG Bot Answer:**
* **Main Idea:** AI refers to the development of computer systems that perform tasks requiring human intelligence. The question of AI consciousness is complex and debated.
* **Key Points:**
* Consciousness is the state of being aware and perceiving surroundings, thoughts, and emotions.
* It is a subjective experience difficult to define and measure.
* No consensus exists on what constitutes consciousness.
* Consciousness may emerge from neuron interactions in the human brain.
* Replicating human complexity in AI is a challenge.
* AI lacks subjective experience and self-awareness.
* AI systems use predefined algorithms, not genuine understanding.
* AI consciousness remains a theoretical possibility with no concrete evidence.
* The nature of consciousness and its emergence in the human brain is a complex research area.
**3. Passages - Ask Swami with Swami Sarvapriyananda | May 28th, 2023 (Top):**
* **Main Idea:** Addressing the question of AI consciousness in relation to Atman (the self).
* **Key Points:**
* Scientists are not busy creating consciousness through AI.
* AI is imitating human capacities like intelligence, memory, decision-making, and creativity.
* Mentions using ChatGPT to quickly write poems on Swami Vivekananda.
**4. Passages - Ask Swami with Swami Sarvapriyananda | May 28th, 2023 (Bottom):**
* **Main Idea:** AI can do many things, but it lacks consciousness, which is the feeling of life itself.
* **Key Points:**
* AI can do almost everything except consciousness.
* Consciousness is the capability of experiencing.
* Example of self-driving cars in San Francisco, which perform activities but lack the human sense of sound, sight, and continuous feeling of taking decisions.
### Key Observations
* Both AI bots and Swami Sarvapriyananda agree that AI currently lacks consciousness.
* The RAG Bot answer is based on Swami Sarvapriyananda's views.
* The Generic Non-RAG Bot provides a more general scientific perspective.
* Swami Sarvapriyananda emphasizes the importance of the feeling of life itself and the capability of experiencing as essential components of consciousness.
### Interpretation
The image presents a multifaceted perspective on the question of AI consciousness. It highlights the distinction between AI's ability to replicate human intelligence and its lack of subjective experience and self-awareness. The discussion emphasizes that while AI can perform complex tasks, it does not possess the feeling of life itself, which is considered a crucial aspect of consciousness. The inclusion of Swami Sarvapriyananda's views adds a philosophical dimension to the debate, linking it to concepts like Atman. The self-driving car example effectively illustrates the difference between AI's functional capabilities and the human experience of consciousness. The overall tone suggests that AI consciousness remains a complex and unresolved issue, requiring further exploration and debate.
</details>
Figure 9: RAG vs Non-RAG: AI and Consciousness. The retrieved passages are again highly relevant to the query, and the generated answer is coherent. However, it does contain a statement about 'a mirror reflecting a face' that is not present in the passages; the experts noted this as a relevant statement, possibly drawn from outside knowledge. The generic bot's response is highly general and does not represent the monk's views on the topic.
<details>
<summary>Image 10 Details</summary>

### Visual Description
## Textual Analysis: "Do our senses report reality to us?"
### Overview
The image presents a collection of text passages discussing the philosophical question of whether our senses accurately represent reality. It includes answers from a "RAG Bot" and a "Generic Non-RAG Bot," along with excerpts from a discussion with Swami Sarvapriyananda and an introduction to a Vedanta lecture.
### Components/Axes
The image is divided into three main sections:
1. **Left Side:** Contains two "Answer" blocks, one labeled "RAG Bot" and the other "Generic Non-RAG Bot."
2. **Top-Right:** Features a passage titled "Ask Swami with Swami Sarvapriyananda | Nov 19th, 2017."
3. **Bottom-Right:** Presents an excerpt titled "Introduction to Vedanta - Swami Sarvapriyananda - Aparokshanubhuti - Part 17 - November 29, 2016."
### Detailed Analysis
**1. RAG Bot Answer:**
* **Content:** The RAG Bot's answer states that, according to Swami Sarvapriyananda, our senses do not report reality accurately. This conclusion is supported by modern cognitive science and evolutionary science. The answer uses the example of a virtual reality movie to illustrate how we can become immersed in an experience that is not real. It references Advaita Vedanta and Shankaracharya to explain that our ignorance of our true nature leads us to identify ourselves with our mind and body, causing us to experience the external world and subjective feelings. The answer concludes that our senses provide a subjective experience of the world, and understanding Vedanta can help us understand the illusory nature of our sensory perceptions.
**2. Generic Non-RAG Bot Answer:**
* **Content:** The Generic Non-RAG Bot identifies itself as an AI assistant trained on a vast amount of data, including Swami Sarvapriyananda's YouTube talks. It states that our senses do not directly report reality but provide a filtered and interpreted version. It explains that our senses are limited and influenced by various factors, including our mental state, biases, and cultural conditioning. The answer emphasizes that our perception is a subjective experience created by our brain and that our senses should be understood as tools that provide a certain level of information about the world. It concludes that understanding the limitations and potential for distortion in our perception can help us develop a more nuanced and humble approach to our perception of the world.
**3. Ask Swami with Swami Sarvapriyananda | Nov 19th, 2017:**
* **Title:** Ask Swami with Swami Sarvapriyananda | Nov 19th, 2017
* **Content:** This passage is a transcript of a question-and-answer session with Swami Sarvapriyananda. It begins with an introduction to the session and a question from Ajoy about modern cognitive scientists veering towards the idea that consciousness is fundamental. The questioner references the sensory systems of honeybees and bats as examples of how different organisms perceive the world differently. The passage concludes with the statement that modern cognitive science and evolutionary science suggest that our senses do not report reality to us.
**4. Introduction to Vedanta - Swami Sarvapriyananda - Aparokshanubhuti - Part 17 - November 29, 2016:**
* **Title:** Introduction to Vedanta - Swami Sarvapriyananda - Aparokshanubhuti - Part 17 - November 29, 2016
* **Content:** This passage introduces a lecture by Swami Sarvapriyananda on Vedanta. It references an article in The Atlantic magazine by a cognitive scientist named Hoffman, who argues that what our senses reveal to us is illusory. The passage discusses the idea that our senses have been generated by evolution to give us an evolutionary advantage. It argues that if our senses did not report reality, then animals would not survive. However, it also states that the senses have evolved to make us fit for survival, not necessarily to report reality accurately.
### Key Observations
* Both the RAG Bot and the Generic Non-RAG Bot agree that our senses do not directly report reality.
* The passages reference Swami Sarvapriyananda and Advaita Vedanta as sources for this perspective.
* The passages discuss the role of evolution in shaping our sensory perceptions.
* The passages highlight the subjective nature of our sensory experiences.
### Interpretation
The image presents a philosophical discussion about the nature of reality and the role of our senses in perceiving it. The central argument is that our senses do not provide an accurate representation of reality but rather a filtered and interpreted version. This perspective is supported by modern cognitive science, evolutionary science, and the teachings of Advaita Vedanta. The passages suggest that our sensory experiences are subjective and influenced by various factors, including our mental state, biases, and cultural conditioning. The ultimate goal is to encourage a more nuanced and humble approach to our perception of the world, recognizing the limitations and potential for distortion in our sensory experiences.
</details>
Figure 10: RAG vs Non-RAG: Senses and reality. In this scenario, the retrieved passages provide only partial context needed to answer the query, with additional irrelevant information. For example, one passage mentions cognitive scientist Donald Hoffman, inadvertently influencing the LLM to generate a response linking the Swami's views to Hoffman's concept of the world as a virtual reality. This is an instance where the LLM diverges from the query, using incomplete information from the passage. To mitigate this, implementing retrieval models capable of processing longer contexts and summarizing them effectively before input to the LLM could be beneficial. Despite this, the rest of the response aligns reasonably well with Advaita Vedanta philosophy, similar to the generic bot's response, which is also congruent with the philosophical context.
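The mitigation suggested in the caption (condensing long retrieved contexts before they reach the LLM) can be approximated even without a second model call. The following is a minimal, hypothetical extractive sketch under our own assumptions (function and variable names are illustrative, not the paper's pipeline): it keeps only the passage sentences that share content words with the query, so the LLM sees a short, query-focused context instead of long, partially relevant passages.

```python
import re

def condense_passages(passages, query, max_sentences=5):
    """Extractively condense retrieved passages before prompting the LLM.

    Scores each sentence by overlap with the query's words and keeps only
    the top-scoring sentences, discarding sentences with no overlap.
    """
    query_terms = set(re.findall(r"[a-z']+", query.lower()))
    scored = []
    for passage in passages:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            terms = set(re.findall(r"[a-z']+", sentence.lower()))
            overlap = len(terms & query_terms)
            if overlap:
                scored.append((overlap, sentence.strip()))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return " ".join(sentence for _, sentence in scored[:max_sentences])
```

A real system would likely replace the overlap score with an abstractive summarizer or a re-ranker, but the shape of the step (retrieve, condense, then prompt) stays the same.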
<details>
<summary>Image 11 Details</summary>

### Visual Description
## Text Analysis: Dream State and Reality
### Overview
The image presents a question about the reality of dreams and provides answers from two different sources: a RAG Bot and a Generic Non-RAG Bot. It also includes relevant passages from the Mandukya Upanishad, offering philosophical insights into the nature of dreams and waking states.
### Components/Axes
* **Title:** Dreams feel absolutely real while dreaming. What is the guarantee that I am not fast asleep in my bed and dreaming right now?
* **Sections:**
* RAG Bot Answer
* Generic Non-RAG Bot Answer
* Passages (containing excerpts from Mandukya Upanishad)
### Detailed Analysis
**RAG Bot Answer:**
* Attribution: According to Swami Sarvapriyananda, referencing the insights of Gaudapada, an ancient Advaita Vedanta philosopher.
* Core Argument: There is no definitive way to differentiate between the dream state and the waking state. The distinction is fuzzy.
* Key Points:
* Dreams can feel absolutely real.
* The feeling of reality in dreams doesn't guarantee the truth or permanence of dream objects/experiences.
* Objects and experiences in the waking state are also appearances, similar to those in dreams.
* Both dream and waking states are dependent on consciousness.
* Gaudapada's teachings don't dismiss the waking state but invite questioning of reality.
* No guarantee that we are not currently dreaming.
**Generic Non-RAG Bot Answer:**
* Core Argument: It's difficult to determine with certainty whether we are dreaming or awake.
* Key Points:
* Introduces the concept of reality testing.
* Reality testing involves examining the characteristics of current experience.
* Examples of reality tests: checking the time, looking for inconsistencies, attempting impossible actions.
* Reality testing is not foolproof.
* Lucid dreaming allows awareness and control within dreams.
* Ultimately, no guarantee is provided, but reality testing can increase confidence.
**Passages:**
* **20. Mandukya Upanishad | Chapter 2 Karika 6 | Swami Sarvapriyananda:**
* Focuses on the unreality of dream objects and their origin in the mind.
* Compares dream experiences to waking experiences, suggesting that waking entities also do not exist apart from consciousness.
* Emphasizes the role of witness consciousness.
* **4. Mandukya Upanishad | Chapter 1 Mantra 5-6 | Swami Sarvapriyananda:**
* The dream state is a waking state for the dreamer.
* Objects in dreams feel absolutely real and not temporary.
* If waking state objects are like dream objects, then relatives would be mere ideas.
* Relatives in waking/dream states are as real as the "I" or yourself.
* Advaita's answer: relatives are appearances, they are dreams.
### Key Observations
* The RAG Bot provides a philosophical answer based on Advaita Vedanta, emphasizing the illusory nature of both dream and waking states.
* The Generic Non-RAG Bot offers a practical approach using reality testing.
* The passages from the Mandukya Upanishad support the philosophical perspective, highlighting the role of consciousness and the unreality of dream objects.
### Interpretation
The image explores the age-old question of the nature of reality and the distinction between dreams and waking life. The RAG Bot's answer, grounded in Advaita Vedanta, suggests that both states are ultimately dependent on consciousness and that there is no absolute guarantee of being awake. The Generic Non-RAG Bot offers a more pragmatic approach, suggesting reality testing as a means to increase confidence in one's state of awareness. The passages from the Mandukya Upanishad provide further philosophical context, emphasizing the illusory nature of dream objects and the role of consciousness in both dream and waking experiences.
The juxtaposition of these different perspectives highlights the complexity of the question and the lack of a definitive answer. It suggests that our understanding of reality is ultimately subjective and dependent on our individual consciousness. The image encourages a deeper exploration of the nature of reality and our assumptions about it.
</details>
Figure 11: RAG vs Non-RAG: Waking and dreaming. The passages retrieved for the query are relevant but offer only a segment of a broader context. As stated earlier, this can be mitigated by using more advanced retrieval models that summarize longer contexts before input to the LLM. The RAG bot's response, while relevant, is lengthy and complex, making it challenging to comprehend. The generic bot's response, although more concise and coherent, is completely general and bears no relation to the Advaita Vedanta philosophy.
Figure 12: Keyword-RAG vs Standard-RAG: Anecdotal. The keyword-based retriever accurately retrieves the correct snippet, while the dense retriever confuses 'Shivaratri' with 'Shivamayananda', possibly owing to similar embeddings, and completely misses the context. As a result, the generated standard RAG answer is meaningless and lacks substance. The keyword-RAG model correctly identifies the two key incidents relating to Shivaratri and effectively summarizes them.
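The contrast in this figure can be sketched with a toy keyword retriever. Everything below is illustrative (the corpus snippets and names are ours, not the paper's pipeline): exact keyword matching cannot conflate surface-similar terms such as 'Shivaratri' and 'Shivamayananda' the way dense embeddings apparently did here.

```python
def keyword_retrieve(query_keywords, corpus, top_k=1):
    """Minimal keyword-based retriever: rank passages by how many query
    keywords (human-annotated or model-extracted) they contain as exact,
    case-insensitive substrings, and drop passages with no match.
    """
    def score(passage):
        text = passage.lower()
        return sum(1 for kw in query_keywords if kw.lower() in text)

    ranked = sorted(corpus, key=score, reverse=True)
    return [p for p in ranked[:top_k] if score(p) > 0]

# Hypothetical passages illustrating the Figure 12 failure mode.
corpus = [
    "On Shivaratri, the night of Shiva, devotees fast and keep vigil.",
    "Swami Shivamayananda told a story about his guru.",
]
hits = keyword_retrieve(["Shivaratri"], corpus)
```

A production keyword retriever would use a scored index such as BM25 rather than substring counting, but the failure mode it avoids is the same: 'Shivaratri' simply does not occur in the second passage, so it cannot be returned.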
<details>
<summary>Image 12 Details</summary>

### Visual Description
## Text Analysis: Anecdotal Query Response
### Overview
The image presents a comparison of two different approaches to answering a query about whether Swami narrates any incident surrounding Shivaratri. The approaches are "Keyword-based RAG" and "Standard RAG." The image also includes the category of the query ("Anecdotal") and keywords used.
### Components/Axes
* **Category:** Anecdotal
* **Query:** Does Swami narrate any incident surrounding Shivaratri?
* **Keywords:**
* Human-annotated: Shivaratri
* Model ensemble: Shivaratri
* **Keyword-based RAG:** A block of text providing an answer to the query, highlighting the presence of an incident related to Shivaratri in Swami Sarvapriyananda's lecture. It also includes a source citation.
* **Standard RAG:** A block of text providing an answer to the query, mentioning an incident involving Swami Shivamayanji, but not explicitly mentioning Shivaratri. It also includes a source citation.
### Detailed Analysis
**Keyword-based RAG:**
* The text confirms that Swami Sarvapriyananda shares an incident related to the Hindu festival of Shivaratri in his lecture "Raja Yoga: The Path of Meditation (Part 1)."
* It mentions his own experience of fasting on Shivaratri when he became a monk.
* It narrates an incident where a mathematics teacher approached a Swami to ask about fasting on Shivaratri, with the Swami responding sarcastically.
* The source is identified as "Raja Yoga: The Path of Meditation (Part 1) | Swami Sarvapriyananda."
* The text contains additional details about fasting and memories related to Shivaratri.
**Standard RAG:**
* The text states that Swami Sarvapriyananda shares an incident involving Swami Shivamayanji, but it does not explicitly mention Shivaratri.
* It summarizes a story told by Swami Shivamayanji about an experience with his guru, Swami Gambhirananda, describing the nature of God realization.
* It includes a phrase in Bengali: "Amin jakhoni chok bhoji. Ishta murti. Jaljal kare bhaveshe othe."
* Translation: (Approximate) "The moment I close my eyes, the chosen deity appears."
* It mentions that Swami Shivamayanji used the story to teach young monks about expectations and the importance of their conduct.
* The source is identified as "79 Bhagavad Gita I Chapter 6 Verse 15 I Swami Sarvapriyananda."
* The text contains additional details about the experience and teachings.
### Key Observations
* The Keyword-based RAG directly addresses the query by confirming the presence of an incident related to Shivaratri.
* The Standard RAG provides a related incident but does not explicitly mention Shivaratri, focusing on the teachings and experiences of the swamis.
* Both approaches provide source citations for their information.
### Interpretation
The image demonstrates two different approaches to answering a query using Retrieval-Augmented Generation (RAG). The Keyword-based RAG focuses on directly answering the query by identifying content that explicitly mentions the keywords (Shivaratri). The Standard RAG, on the other hand, provides a related narrative that may not directly address the query but offers insights into the experiences and teachings of the swamis. The inclusion of Bengali phrases in the Standard RAG response adds cultural context and depth to the narrative. The choice between these approaches depends on the desired level of specificity and the goal of the information retrieval process.
</details>
Figure 13: Keyword-RAG vs Standard-RAG: Terminology. The keyword-RAG model retrieves a comprehensive exposition on the concept of Shraddha, loosely translated as conviction, in the context of qualifications for the study of Advaita Vedanta. The standard RAG does retrieve a passage containing the word, but one that is not directly related to what the questioner intends. This appears to be an unfortunate false positive for standard RAG, caused by inadequate or merely implied meaning in the query.
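The 'Model ensemble' keywords shown in these figures come from an automatic keyword-extraction step. As a rough, hypothetical stand-in for that step (the paper's actual ensemble is not described here), one can simply drop stopwords and keep the remaining content terms:

```python
import re

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "in", "does",
             "any", "to", "and", "word", "between", "main"}

def extract_keywords(query):
    """Naive keyword extraction for keyword-based retrieval: tokenize the
    query and discard stopwords, keeping content terms in order.
    """
    tokens = re.findall(r"[A-Za-z]+", query)
    return [t for t in tokens if t.lower() not in STOPWORDS]

keywords = extract_keywords("What is the significance of the word 'Shraddha'?")
```

On this query the sketch yields the same terms as the figure's model-ensemble keywords ('significance', 'Shraddha'); an actual ensemble would aggregate several extractors or LLM outputs instead of a fixed stopword list.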
<details>
<summary>Image 13 Details</summary>

### Visual Description
## Text Analysis: Significance of 'Shraddha'
### Overview
The image presents a comparative analysis of the term 'Shraddha' from two perspectives: a keyword-based approach and a standard reference approach. It includes the definition, context, and source of the term from different texts and viewpoints.
### Components/Axes
* **Category:** Terminology
* **Query:** What is the significance of the word 'Shraddha'?
* **Keywords:**
* Human-annotated: Shraddha
* Model ensemble: Shraddha, significance
* **Sections:**
* Keyword-based RAG (left side, green background)
* Standard RAG (right side, red background)
### Detailed Analysis
**Keyword-based RAG (Left Side):**
* **Definition and Context:**
* 'Shraddha' in Swami Sarvapriyananda's teachings is significant in learning and understanding spiritual philosophies.
* Translated as 'faith' in English, but goes beyond simple belief or trust.
* Implies reverence, respect, and acceptance for the teacher, teachings, and texts.
* Essential for meaningful investigation or learning.
* Allows approaching the subject with an open mind, rather than to judge or dismiss.
* In its absence, one might jump to conclusions and create barriers to understanding.
* With Shraddha, one approaches with humility, acknowledging the value in teachings.
* **Source:**
* Bhagavad Gita | Chapter 10 Verse 16-22 | Swami Sarvapriyananda
* **Additional Points:**
* Devotional approach cultivates love and faith in God.
* Starts with Shraddha and absolute acceptance.
* If one questions the existence of God, the path of devotion is not for them.
* Hinduism offers a spectrum of other paths.
* Jnani's path requires questioning.
* Clarity is important at every step.
* Teachers can teach through questions.
* Analogy of leaky, upside-down, and dirty pots.
* Traditional teachers insisted on Shraddha.
* Modern approach equates Shraddha with faith.
* Necessary for investigation.
* Necessary to have Shraddha on the syllabus and the text and the teacher.
* Shraddha is necessary to listen to something that's all wrong.
* Shraddha helps in many ways.
* Shraddha is a deep feeling of reverence for the teacher and the material.
* Approach with Shraddha is to seek understanding.
**Standard RAG (Right Side):**
* **Definition and Context:**
* 'Shraddha' has a significant meaning in the context of the Mandukya Upanishad and Hindu traditions.
* Sanskrit word referring to funeral rites for departed ancestors.
* Carries psychological and spiritual connotations.
* Psychologically, signifies letting go for those left behind.
* Helps the living come to terms with loss.
* Spiritually, performed to ensure the deceased embarks on a higher path.
* Prayers, worship, and rituals add to their store of good karma.
* Acts as a booster rocket for spiritual evolution.
* Symbolizes psychological release and spiritual boost for departed ancestors.
* **Source:**
* Mandukya Upanishad | Chapter 1 Mantra 12 & Karika 22-23 | Swami Sarvapriyananda
* **Additional Points:**
* One may suddenly remember the Mandukya Upanishad and wake up in a cold sweat, but it will attract you.
* Genuine religion or spirituality pulls you powerfully.
* All paths lead to the same reality.
* One will start practicing from wherever they left off.
* Shraddha is a Sanskrit word meaning the funeral rites.
* Psychological reason for letting go.
* For those transmigrating, let them have a higher gati, a higher path.
* Prayers and worship add to their store of good karma.
* Booster rocket to a better world.
* Prayers for the dead.
### Key Observations
* The keyword-based RAG focuses on 'Shraddha' as a foundation for learning and understanding, emphasizing reverence and acceptance.
* The standard RAG focuses on 'Shraddha' as a concept related to funeral rites and the afterlife, emphasizing psychological release and spiritual boost.
* Both approaches acknowledge Swami Sarvapriyananda as a source.
### Interpretation
The image presents two distinct interpretations of 'Shraddha.' The keyword-based approach emphasizes its role in intellectual and spiritual growth, while the standard approach highlights its significance in rituals related to death and the afterlife. The juxtaposition of these perspectives provides a comprehensive understanding of the term's multifaceted nature within Hindu traditions. The keyword-based RAG seems to be more focused on the living and the process of learning, while the standard RAG is focused on the dead and the process of letting go.
</details>
<details>
<summary>Image 14 Details</summary>

### Visual Description
## Comparative Analysis: Buddhist Shunyavada vs. Advaita Vedanta
### Overview
The image presents a comparative analysis of Buddhist Shunyavada and Advaita Vedanta, focusing on their differing understandings of the self. It includes information extracted using Keyword-based RAG and Standard RAG methods, along with source texts from Advaita Vedanta and Mandukya Upanishad.
### Components/Axes
* **Category:** Comparative
* **Query:** What is the main difference between Buddhist Shunyavada and Advaita Vedanta?
* **Keywords:**
* Human-annotated: Buddhist Shunyavada, Advaita
* Model ensemble: difference, buddhist shunyavada, advaita vedanta
* **Sections:**
* Keyword-based RAG (left side)
* Standard RAG (right side)
* **Sources:**
* Advaita Vedanta (Talk 1) - Swami Sarvapriyananda (left side)
* 65. Mandukya Upanishad | Chapter 4 Karika 74-78 | Swami Sarvapriyananda (right side)
### Detailed Analysis
**Keyword-based RAG (Left Side):**
* **Main Difference:** The primary distinction lies in their understanding of the self. Advaita Vedanta considers the self as unchanging and permanent (Atman), while Buddhist Shunyavada denies the existence of an unchanging self, positing Anatman or Anatta ("no self" or "no permanent self").
* **Impermanence:** Advaita Vedanta acknowledges a permanent self underlying the changing body-mind complex. Shunyavada asserts that everything, including the body-mind complex, is impermanent and momentary, thus no permanent self exists.
* **Shared Truths:** Despite differences, both philosophies share some common ground. Advaitins agree with Buddhists that "the body is not the self," and both acknowledge that mind, thoughts, feelings, and sensations do not constitute a permanent self.
* **Linguistic Approximations:** The contrasting viewpoints might be different linguistic approximations of the same truth, since the language and ideas of the final development of Buddhist philosophy (the Shunyavada of Nagarjuna) and those of Advaita Vedanta share similarities.
* **Source Text (Advaita Vedanta Talk 1):** A dialogue discussing the differences between the non-duality of Advaita Vedanta and Buddhism. It highlights the differing views on the self (unchanging vs. changing) and references concepts like Anityam (impermanence), Kshanikam (momentariness), and Shunyam (emptiness). It also mentions the Atman in Advaita Vedanta.
**Standard RAG (Right Side):**
* **Main Difference:** The main difference lies in the interpretations of Buddhist Shunyavada (as presented by Nagarjuna) and Advaita Vedanta, and how they are perceived by traditional followers of each philosophy.
* **Traditional Advaitin View:** Traditional Advaitins often dismiss Shunyavada as nihilism, suggesting it posits nothingness or the absence of existence. They argue that Nagarjuna's philosophy denies the existence of all things, which is considered an unfair characterization. Nagarjuna and Chandrakirti's texts clarify that they are not nihilists.
* **Modern Buddhist View:** Modern Tibetan Buddhists and other Buddhist scholars sometimes misinterpret Advaita Vedanta as positing a God-like, eternal substance called Brahman. However, this is not an accurate representation. Gaudapada asserts that the world is a mere appearance (vivarta).
* **Summary:** The main difference can be attributed to misinterpretations and oversimplifications by traditional followers on both sides. Traditional Advaitins accuse Shunyavada of nihilism, while modern Tibetan Buddhists misunderstand Advaita Vedanta as believing in an eternal, God-like substance.
* **Source Text (Mandukya Upanishad):** Discusses the Advaitic view of Nagarjuna's Madhyamaka Shunyavada and how it is sometimes misplaced by traditional Advaitins. It mentions that traditional Advaitins have dismissed Nagarjuna's Shunyavada as nothing, which is considered unfair. It also notes that traditional Buddhists may misinterpret Advaita as talking about an eternal substance called Brahman.
### Key Observations
* Both RAG methods identify the understanding of the "self" as a key point of divergence between the two philosophies.
* The Keyword-based RAG focuses more on the core philosophical tenets, while the Standard RAG emphasizes the interpretations and misinterpretations by followers of each tradition.
* Both sides acknowledge potential misinterpretations and oversimplifications in understanding the other's philosophy.
### Interpretation
The image highlights the nuanced differences between Buddhist Shunyavada and Advaita Vedanta. While both philosophies address the nature of reality and the self, their approaches and interpretations differ significantly. The Keyword-based RAG provides a concise overview of the core philosophical differences, while the Standard RAG delves into the historical and interpretative aspects, revealing how each tradition has sometimes misunderstood the other. The inclusion of source texts provides context and allows for a deeper understanding of the original philosophical arguments. The analysis suggests that a deeper understanding of both philosophies requires moving beyond simplistic interpretations and engaging with the original texts and historical context.
</details>
Figure 14: Keyword-RAG vs Standard-RAG: Comparative. This is a case where both models retrieve meaningful passages, with the keyword-based model providing greater context due to retrieval expansion. The standard model also seems to be using outside knowledge not mentioned in the passage.
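The 'retrieval expansion' mentioned in this caption can be sketched as returning a matched chunk together with its neighbours from the same transcript, so the LLM sees the surrounding discussion rather than an isolated snippet. This is a minimal illustration under our own assumptions, not the paper's exact implementation:

```python
def expand_retrieval(chunks, hit_index, window=1):
    """Retrieval expansion: after chunk `hit_index` matches the query,
    also return its neighbouring chunks (clamped to the transcript
    boundaries) as one contiguous context string.
    """
    start = max(0, hit_index - window)
    end = min(len(chunks), hit_index + window + 1)
    return " ".join(chunks[start:end])

# Hypothetical transcript chunks.
chunks = ["Intro to the talk.",
          "Advaita holds the self is unchanging.",
          "Shunyavada denies a permanent self.",
          "Closing remarks."]
context = expand_retrieval(chunks, hit_index=1, window=1)
```

The `window` size trades context for prompt length; expanding too far reintroduces the long-context problem noted in Figures 10 and 11.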
Figure 15: Keyword-RAG vs Standard-RAG: Scriptural. Both models retrieve from the correct scripture. However, while the retrieved passage in keyword-RAG is very comprehensive and accurately captures the context of the question, the standard RAG misses the point of the question and gives a generic answer.
<details>
<summary>Image 15 Details</summary>

### Visual Description
## Text Analysis: Significance of 'Om' in Mandukya Upanishad
### Overview
The image presents three text blocks answering the query: "In Mandukya Upanishad, what is the significance of the word 'Om'?" These blocks are categorized as "Keyword-based RAG," "Standard RAG," and "Keywords" with human and model annotations. The text explores the meaning of 'Om' within the context of the Mandukya Upanishad, referencing related concepts and scriptures.
### Components/Axes
* **Category:** Scriptural
* **Query:** In Mandukya Upanishad, what is the significance of the word 'Om'?
* **Keyword-based RAG:** A text block providing an answer to the query.
* **Source:** 48. Mandukya Upanishad | Chapter 3 Karika 37-39 | Swami Sarvapriyananda
* **Keywords:**
* **Human-annotated:** Om significance, Mandukya
* **Model ensemble:** Om, Mandukya Upanishad
* **Standard RAG:** A text block providing an answer to the query.
* **Source:** 13. Mandukya Upanishad | Chapter 1 Mantra 9-11 | Swami Sarvapriyananda
### Detailed Analysis or Content Details
**Keyword-based RAG:**
* 'Om' represents the entirety of our experience of the universe, encompassing gross, subtle, and causal levels.
* The letters 'A', 'U', and 'Ma' correspond to the waking, dreaming, and deep sleep states, respectively.
* 'A' represents the physical world and the waker.
* 'U' signifies the subtle level of mind, intellect, and memory.
* 'Ma' symbolizes the causal level where everything is withdrawn.
* Turiya is beyond these three levels, representing silent consciousness.
* 'Om' signifies the complete universe and points to the reality of Turiya.
* The real meaning of Om includes the silence after Om, which represents Turiya, the pure consciousness.
* Brahman is not the world, but the world is nothing other than brahman.
**Standard RAG:**
* 'Om' holds immense significance and is considered a Mahavakya.
* The Mahavakya is "I am Atma Brahma," meaning "This very self is Brahman."
* 'Om' is a symbol of pure consciousness and the ultimate reality.
* Meditating on 'Om' superimposes the idea of the universe and the divine.
* 'Om' signifies the ultimate reality and is a powerful symbol for meditation.
* The pure consciousness is represented by Om.
### Key Observations
* Both "Keyword-based RAG" and "Standard RAG" emphasize the significance of 'Om' in the Mandukya Upanishad.
* The "Keyword-based RAG" provides a more detailed breakdown of the components of 'Om' and their relation to different states of consciousness.
* The "Standard RAG" focuses on 'Om' as a symbol of ultimate reality and its role in meditation.
* The Keyword-based RAG references the concept of Turiya as being beyond the three levels represented by 'Om'.
### Interpretation
The image presents two distinct explanations of the significance of 'Om' within the Mandukya Upanishad. Both explanations agree on the importance of 'Om' as a representation of the universe and consciousness. The "Keyword-based RAG" offers a more analytical breakdown, while the "Standard RAG" emphasizes the symbolic and meditative aspects. The inclusion of "Keywords" highlights the central themes of Om, Mandukya, and their significance. The texts suggest that 'Om' is not merely a sound but a profound symbol that encapsulates the entirety of existence and points towards a deeper, transcendental reality.
</details>
<details>
<summary>Image 16 Details</summary>

### Visual Description
## Text Comparison: Waking State vs. Dream State
### Overview
The image presents a comparison of perspectives on the similarity between the waking state and the dream state, using different approaches to information retrieval. It includes a query, keywords, and responses generated by "Keyword-based RAG" and "Standard RAG" methods, along with human-annotated keywords and model ensemble keywords.
### Components/Axes
* **Category:** Reasoning
* **Query:** Is the waking state similar to a dream or absolutely indistinguishable from the dream state?
* **Keywords:**
* Human-annotated: waking, dream, indistinguishable
* Model ensemble: indistinguishable, dream state, waking state similar
* **Sections:**
* Keyword-based RAG
* Standard RAG
### Detailed Analysis or Content Details
**Keyword-based RAG:**
* **Content:** This section provides a response to the query based on a keyword-based retrieval approach. It references Swami Sarvapriyananda's lecture on Vedanta, stating that the waking state can be compared to a dream because both are temporary and illusory. It introduces the concepts of "vyavaharika jiva" (transactional self) and "pratibhasika jiva" (illusory self). It acknowledges a difference between the two states, noting that the waking state is experienced as more real and permanent, but both are illusory from Vedanta's perspective. It mentions the merging of "jivas" upon waking or enlightenment. It also references the Mandukya Upanishad, where Gaudapada considers the waking state a dream.
* **Source:** Vedanta Retreat at Loon Lake on Who am I by Swami Sarvapriyananda Lecture 11.
* **Additional Text:** A longer excerpt from the source lecture is included, elaborating on the concepts of "jiva," Brahman, and the nature of reality in different states of consciousness. It includes Sanskrit terms and references to other spiritual figures and texts.
**Standard RAG:**
* **Content:** This section provides a response to the query using a standard retrieval-augmented generation approach. It states that the waking state is not absolutely indistinguishable from the dream state due to differences. However, it suggests the dream state can be considered similar to a waking state for the dreamer. It highlights that objects and experiences feel real and permanent within their respective contexts in both states. It notes that in the waking state, objects and experiences are generally consistent, while in the dream state, they change. It mentions Swami Sarvapriyananda's explanation of Advaita Vedanta's nuanced approach, acknowledging similarities and differences between the states. It uses the perception of relatives as an example.
* **Source:** 4. Mandukya Upanishad | Chapter 1 Mantra 5-6 | Swami Sarvapriyananda.
* **Additional Text:** An excerpt from the Mandukya Upanishad is included, discussing the nature of dreams and waking states. It suggests that the dream state is a waking state for the dreamer and that objects in dreams feel real. It addresses the question of how dream relatives differ from waking relatives, stating that both are real to the individual experiencing them.
### Key Observations
* Both RAG approaches acknowledge similarities between the waking and dream states, but also highlight differences.
* The Keyword-based RAG focuses more on the illusory nature of both states from a Vedanta perspective.
* The Standard RAG emphasizes the feeling of reality and permanence within each state's context.
* Both sections reference Swami Sarvapriyananda and the Mandukya Upanishad.
* The additional text in the Keyword-based RAG provides a more extensive explanation of related concepts.
### Interpretation
The image demonstrates how different information retrieval and generation methods can provide varying perspectives on a philosophical question. The Keyword-based RAG, by focusing on specific terms, delivers a response rooted in Vedanta's concept of illusion. The Standard RAG offers a more balanced view, acknowledging both similarities and differences based on the individual's experience. The inclusion of human-annotated and model ensemble keywords suggests an attempt to guide the retrieval process and evaluate the quality of the generated responses. The image highlights the complexity of the question and the potential for different interpretations depending on the approach and source material used.
</details>
Figure 16: Keyword-RAG vs Standard-RAG: Reasoning. The passage retrieved by keyword-RAG is technical and comprehensive, and the generated answer effectively summarizes its main points. The standard model's answer is also reasonable, although its explanation is less effective owing to the lower quality of the retrieved passage.