## Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy
## Priyanka Mandikal
Department of Computer Science, UT Austin mandikal@utexas.edu
Figure 1: The dissemination of knowledge through the ages. Over time, methods of storing and transmitting knowledge have evolved from oral teachings to computers and the internet, significantly increasing the distribution and proliferation of human knowledge. The emerging LLM technology represents a new paradigm shift in this process.
<details>
<summary>Image 1 Details</summary>

### Visual Description
## Diagram: Evolution of Knowledge Dissemination
### Overview
The image is a diagram illustrating the historical progression of knowledge dissemination methods, from oral teachings to Large Language Models (LLMs). It depicts five stages, each represented by an icon within a grey square, arranged horizontally and connected by arrows indicating a sequential flow over time. The diagram also includes labels indicating the relationship between technological advancements and knowledge distribution.
### Components/Axes
* **Horizontal Axis:** "Time" - indicated by an arrow pointing from left to right at the bottom of the diagram.
* **Stages (from left to right):**
1. "Oral teachings" - Icon: People gathered around a tree.
2. "Hand-written manuscripts" - Icon: A rolled scroll.
3. "Printing press" - Icon: A traditional printing press.
4. "Computers and the Internet" - Icon: A desktop computer with a globe on the screen.
5. "LLMs" - Icon: A neural network diagram.
* **Arrows:** Black arrows connecting each stage, indicating the flow of progression.
* **Labels:**
* "Technological advancements" - positioned below the first two stages, with an upward pointing arrow.
* "Knowledge distribution & proliferation" - positioned below the last two stages, with an upward pointing arrow.
### Detailed Analysis or Content Details
The diagram presents a linear progression of knowledge dissemination methods.
1. **Oral teachings:** The leftmost stage depicts a group of people gathered around a tree, symbolizing the earliest form of knowledge transfer through spoken word.
2. **Hand-written manuscripts:** The second stage shows a rolled scroll, representing the transition to recording knowledge on physical materials.
3. **Printing press:** The third stage features a printing press, signifying the mechanization of knowledge reproduction and wider distribution.
4. **Computers and the Internet:** The fourth stage illustrates a computer with a globe, representing the digital revolution and global connectivity.
5. **LLMs:** The rightmost stage displays a neural network diagram, symbolizing the latest advancement in knowledge processing and generation.
The arrows indicate a clear sequential flow from one stage to the next, demonstrating how each advancement built upon its predecessor. The labels "Technological advancements" and "Knowledge distribution & proliferation" highlight the driving forces behind this evolution.
### Key Observations
The diagram emphasizes the increasing speed and scale of knowledge dissemination over time. Each stage represents a significant leap in the ability to record, reproduce, and share information. The final stage, LLMs, suggests a shift towards automated knowledge processing and generation.
### Interpretation
The diagram illustrates a historical narrative of how humanity has progressively improved its ability to acquire, store, and share knowledge. It suggests that technological advancements are intrinsically linked to the expansion of knowledge distribution. The progression from oral traditions to LLMs demonstrates a trend towards increasing accessibility, scalability, and automation in the realm of knowledge. The diagram implies that LLMs represent a potentially transformative stage in this evolution, with the capacity to not only disseminate but also generate and synthesize knowledge. The diagram is a simplified representation of a complex process, but it effectively conveys the core idea of continuous improvement in knowledge management and dissemination.
</details>
## Abstract
LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. We explore the potential of retrieval-augmented generation (RAG) models for long-form question answering (LFQA) in a specialized knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. Human evaluations by computational linguists and domain experts show that the RAG model significantly outperforms the standard model, producing factual and comprehensive responses with fewer hallucinations. In addition, a keyword-based hybrid retriever that emphasizes unique low-frequency terms further improves results. Our study provides insights into effectively integrating modern large language models with ancient knowledge systems.
Proceedings of the 1st Machine Learning for Ancient Languages Workshop, Association for Computational Linguistics (ACL) 2024. Dataset, code, and evaluation are available at: https://sites.google.com/view/vedantany-10m
ॐ केनेषितं पतति प्रेषितं मनः || 1.1 || यन्मनसा न मनुते येनाहुर्मनो मतम् । तदेव ब्रह्म त्वं विद्धि नेदं यदिदमुपासते || 1.6 ||
- "By whom willed and directed, does the mind alight upon its objects?"
- "What one cannot comprehend with the mind, but by which they say the mind comprehends, know that alone to be Brahman, not this which people worship here."
- Kena Upanishad, >3000 B.C.E.
## 1 Introduction
Answer-seeking has been at the heart of human civilization. Humans have climbed mountains and crossed oceans in search of answers to the greatest questions concerning their own existence. Over time, ancient wisdom has travelled from the silent solitude of mountain caves and forest hermitages into the busy cities and plains of the world. Technology has played a major role in this transmission, significantly increasing the distribution and proliferation of human knowledge. In recent times, large language models (LLMs) trained on large swathes of the internet have emerged as de facto question-answering machines. Recent studies on the societal impact of LLMs (Malhotra, 2021; Yiu et al., 2023) highlight their growing significance as cultural technologies. Analogous to earlier technologies like writing, print, and the internet, the power of LLMs can be harnessed meaningfully to preserve and disseminate human knowledge (Fig. 1).
Generic LLMs have proven to be highly effective for broad knowledge domains. However, they often struggle in niche and less popular areas, encountering issues such as factual inaccuracies and hallucinations in long-tail knowledge distributions (Kandpal et al., 2023; Mallen et al., 2023). Moreover, their inability to verify responses against authentic sources is particularly problematic in these domains, where LLMs can generate highly inaccurate answers with unwarranted confidence (Kandpal et al., 2023; Menick et al., 2022). In response to these limitations, there has been growing interest in retrieval-augmented generation (RAG) models (Karpukhin et al., 2020; Lewis et al., 2020b; Izacard et al., 2022; Ram et al., 2023). These models integrate external datastores to retrieve relevant knowledge and incorporate it into LLMs, demonstrating higher factual accuracy and reduced hallucinations compared to conventional LLMs (Shuster et al., 2021; Borgeaud et al., 2022). Updating these external datastores with new information is also more efficient and cost-effective than retraining LLMs. In this vein, we argue that RAG models show immense potential for enhancing study in unconventional, niche knowledge domains that are often underrepresented in pre-training data. Their ability to provide verified, authentic sources when answering questions is particularly advantageous for end-users.
In this work, we develop and evaluate a RAG-based language model specialized in the ancient Indian philosophy of Advaita Vedanta (Upanishads, >3000 B.C.E.; Bhagavad Gita, 3000 B.C.E.; Shankaracharya, 700 C.E.). To ensure that the LLM has not been previously exposed to the source material, we construct VedantaNY-10M, a custom philosophy dataset comprising transcripts of over 750 hours of public discourses on YouTube from the Vedanta Society of New York. We evaluate standard non-RAG and RAG models on this domain and find that RAG models perform significantly better. However, they still encounter issues such as irrelevant retrievals, sub-optimal retrieval passage length, and retrieval-induced hallucinations. In early attempts to mitigate some of these issues, we find that traditional sparse retrievers have a unique advantage over dense retrievers in niche domains with specialized terminology (Sanskrit terms, in our case). Consequently, we propose a keyword-based hybrid retriever that effectively combines sparse and dense embeddings to upsample low-frequency or domain-specific terms.
We conduct an extensive evaluation comprising both automatic metrics and human evaluation by computational linguists and domain experts. The models are evaluated along three dimensions: transcription, retrieval, and generation. Our findings are twofold. First, RAG LLMs significantly outperform standard non-RAG LLMs along all axes, offering more factual, comprehensive, and specific responses while minimizing hallucinations, with an 81% preference rate. Second, the keyword-based hybrid RAG model further outperforms the standard deep-embedding based RAG model in both automatic and human evaluations. Our study also includes detailed long-form responses from the evaluators, with domain experts specifically indicating the likelihood of using such LLMs to supplement their daily studies. Our work contributes to the broader understanding of how emerging technologies can continue the legacy of knowledge preservation and dissemination in the digital age.
## 2 Related Work
Language models for ancient texts. Sommerschield et al. (2023) recently conducted a thorough survey of machine learning techniques applied to the study and restoration of ancient texts. Spanning digitization (Narang et al., 2019; Moustafa et al., 2022), restoration (Assael et al., 2022), attribution (Bogacz and Mara, 2020; Paparigopoulou et al., 2022), and representation learning (Bamman and Burns, 2020), a wide range of use cases have benefited from the application of machine learning to ancient texts. Recently, Lugli et al. (2022) released a digital corpus of romanized Buddhist Sanskrit texts, training and evaluating embedding models such as BERT and GPT-2. However, the use of LLMs as a question-answering tool to enhance understanding of ancient esoteric knowledge systems has not yet been systematically studied. To the best of our knowledge, ours is the first work that studies the effects of RAG models in the niche knowledge domain of ancient Indian philosophy.
Retrieval-Augmented LMs. In current LLM research, retrieval-augmented generation (RAG) models are gaining popularity (Izacard et al., 2022; Ram et al., 2023; Khandelwal et al., 2020; Borgeaud et al., 2022; Menick et al., 2022). A key area of development in RAGs has been their architecture. Early approaches involved fine-tuning the language model on open-domain question answering before deployment. MLM approaches such as REALM (Guu et al., 2020) introduced a two-stage process combining retrieval and reading, while DPR (Karpukhin et al., 2020) focused on pipeline training for question answering. RAG (Lewis et al., 2020b) used a generative approach with no explicit language modeling. Very recently, in-context RALM (Ram et al., 2023) showed that retrieved passages can be used to augment the input to the LLM in-context, without the fine-tuning required by prior work. In this work, we adopt an in-context retrieval-augmented methodology similar to Ram et al. (2023), where neither the retriever nor the generator is fine-tuned. This also enables us to use any combination of retrieval and generation models that best suits our application.
Applications of RAGs. The applications of RAGs are diverse and evolving. ATLAS (Izacard et al., 2022) and GopherCite (Menick et al., 2022) have shown how fine-tuning and reinforcement learning from human feedback can enhance RAGs' ability to generate verifiable answers from reliable sources. Prompting techniques have also seen innovation: kNN-Prompt (Shi et al., 2022) extended kNN-LM to zero- and few-shot classification tasks, and retrieval in-context approaches (Ram et al., 2023; Shi et al., 2023) have proven effective in utilizing retrieval at the input stage. Retrieval-LMs have been shown to be particularly valuable for handling long-tail or less frequent entities (Kandpal et al., 2023; Mallen et al., 2023), updating knowledge (Izacard et al., 2022), improving parameter efficiency (Izacard et al., 2022; Mallen et al., 2023), and enhancing verifiability (Bohnet et al., 2022), making them increasingly relevant in a wide range of applications. In our work, we examine the application of RAGs to long-tail knowledge, conducting an extensive study on the niche knowledge domain of ancient Indian philosophy.
Evaluation of LFQA. The field of long-form question answering (LFQA) is an emerging area of active research (Krishna et al., 2021; Nakano et al., 2021; Xu et al., 2023). Recently, Xu et al. (2023) conducted a thorough examination of various LFQA metrics, encompassing both human and automatic evaluation methods, and found that existing automatic metrics do not always align with human preferences. Following their suggestion, we place special emphasis on conducting an extensive human evaluation utilizing the expertise of experienced computational linguists and domain experts.
Figure 2: Sanskrit terms in VedantaNY-10M. Frequently occurring Sanskrit terms in the corpus.
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Word Cloud: Spiritual Concepts
### Overview
The image is a word cloud composed of terms related to Hindu philosophy and spirituality. The size of each word corresponds to its frequency within the source text used to generate the cloud. There are no axes or scales, and the arrangement appears random, though larger words are more centrally located.
### Components/Axes
There are no axes or scales present. The components are solely the words themselves, varying in size and color. The colors used are primarily shades of orange, yellow, and green.
### Detailed Analysis or Content Details
The following words are visible, with approximate size estimations based on visual prominence (larger = more frequent):
* **brahman** (very large, central, orange) - Appears multiple times, dominating the cloud.
* **shanti** (large, top-left, green)
* **atman** (large, center-left, yellow)
* **yoga** (medium-large, center, green)
* **karma** (medium, top-right, yellow)
* **vedanta** (medium, bottom-right, orange)
* **jnana** (medium, bottom-center, yellow)
* **aham** (medium, top-right, yellow)
* **sat** (medium, bottom-left, orange)
* **vasudeva** (small-medium, center-right, yellow)
* **bhagavan** (small-medium, center-right, yellow)
* **pranam** (small-medium, bottom-center, yellow)
* **dharma** (small-medium, bottom-center, orange)
* **sri** (small, bottom-left, orange)
* **mithya** (small, bottom-left, orange)
* **samadhi** (small, center-left, yellow)
* **sukham** (small, center-left, yellow)
* **vairagya** (small, center-left, yellow)
* **viveka** (small, top-left, orange)
* **tvam** (small, top-left, orange)
* **ramakrishna** (small, top-left, green)
* **upanishad** (small, top-right, green)
* **tat** (small, top-right, yellow)
* **jiva** (small, center, yellow)
* **bhakti** (small, center, yellow)
* **mano** (small, center, yellow)
* **swami** (small, center, yellow)
* **gita** (small, center, yellow)
* **vritti** (small, bottom-center, yellow)
* **iti** (small, bottom-center, yellow)
* **eva** (small, bottom-center, yellow)
* **advaita** (small, top-left, green)
* **sankhya** (small, top-left, green)
* **sarvasatyam** (small, center, yellow)
* **chitta** (small, bottom-center, yellow)
* **buddhi** (small, bottom-center, yellow)
* **maharaj** (small, bottom-right, yellow)
* **drishya** (small, bottom-right, yellow)
* **asi** (small, bottom-right, yellow)
* **vande** (small, bottom-right, orange)
* **kosha** (small, bottom-right, orange)
* **jagat** (small, bottom-right, orange)
* **swastina** (small, center-right, yellow)
* **bhadram** (small, top-right, green)
* **param** (small, top-right, green)
* **tantam** (small, top-right, yellow)
* **bhavat** (small, top-right, yellow)
There are also several smaller, less prominent words that are difficult to decipher with certainty.
### Key Observations
The most prominent words are "brahman," "shanti," "atman," "yoga," and "karma," suggesting these concepts are central to the source material. The color distribution doesn't appear to have a clear thematic organization. The repetition of "brahman" is striking.
### Interpretation
This word cloud visually represents the core vocabulary of Hindu philosophical discourse. The dominance of "brahman" indicates its fundamental importance as the ultimate reality. The presence of terms like "atman" (the self), "karma" (action and consequence), and "yoga" (union) highlights the paths and concepts used to understand and realize this ultimate reality. "Shanti" (peace) suggests a desired state of being achieved through these practices. The cloud's overall impression is one of interconnectedness and the pursuit of spiritual understanding. The use of a word cloud format itself suggests a holistic, non-linear approach to these concepts, emphasizing their interwoven nature rather than a strict hierarchical structure. The image does not provide any quantitative data, but rather a qualitative representation of thematic emphasis.
</details>
## 3 The VedantaNY-10M Dataset
We first describe our niche-domain dataset creation process. The custom dataset for our study needs to satisfy the following requirements: (1) Niche: It must be a specialized knowledge domain within the LLM's long-tail distribution. (2) Novel: The LLM must not have previously encountered the source material. (3) Authentic: The dataset should be authentic and representative of the knowledge domain. (4) Domain experts: Experts should be available to evaluate the model's effectiveness and utility.
Knowledge domain. To satisfy the first requirement, we choose as our domain the niche knowledge system of Advaita Vedanta, a 1300-year-old Indian school of philosophy (Shankaracharya, 700 C.E.) based on the Upanishads (>3000 B.C.E.), Bhagavad Gita (3000 B.C.E.), and Brahmasutras (3000 B.C.E.)¹. It is a contemplative knowledge tradition that employs a host of diverse tools and techniques, including analytical reasoning, logic, linguistic paradoxes, metaphors, and analogies, to enable the seeker to enquire into their real nature. Although a niche domain, this knowledge system has been continuously studied and rigorously developed over millennia, offering a rich and structured setting for the purposes of our study. Being a living tradition, it offers the additional advantage of providing experienced domain experts to evaluate the language models in this work.
Composition of the dataset. Considering the outlined criteria, we introduce VedantaNY-10M, a curated philosophy dataset of public discourses.
¹ Currently there exists no consensus on accurately dating these ancient scriptures. The Upanishads (which are part of the Vedas) have been passed on orally for millennia and are traditionally not given a historic date. However, they seem to have been compiled and systematically organized sometime around 3000 B.C.E. by Vyasa. Likewise, the dating of Adi Shankaracharya also varies; he is usually placed between 450 B.C.E. and 700 C.E.
To maintain authenticity while ensuring that the LLM hasn't previously been exposed to the source material, we curate our dataset from a collection of YouTube videos on Advaita Vedanta, sourced from the Vedanta Society of New York. It contains 10M tokens and encompasses over 750 hours of philosophical discourses by Swami Sarvapriyananda, a learned monk of the Ramakrishna Order. These discourses provide a rich and comprehensive exposition of the principles of Advaita Vedanta, making them an invaluable resource for our research.
Languages and scripts. The dataset primarily features content in English, accounting for approximately 97% of the total material. Sanskrit, the classical language of Indian philosophical literature, constitutes around 3% of the dataset. The Sanskrit terms are transliterated into the Roman script. To accommodate the linguistic diversity and the specific needs of the study, the dataset includes words in both English and Sanskrit, without substituting the Sanskrit terms with English translations. Translating ancient Sanskrit technical terms with considerably nuanced definitions into English is a non-trivial problem (Malhotra and Babaji, 2020). Hence, our dual-language approach ensures that the Sanskrit terms and concepts are accurately represented and accessible, thereby enhancing the authenticity of our research material. Frequently occurring Sanskrit terms in the corpus are shown in Fig. 2. For excerpts from passages, please refer to Appendix Table 2.
## 4 In-context RAG for niche domains
We now discuss the methodology adopted to build an in-context retrieval augmented chatbot from the custom dataset described above.
We first define a generic chatbot $C_g$ that does not use retrieval as follows: $C_g : q \rightarrow a_g$, where $q$ is the user query and $a_g$ is the answer generated by the chatbot. Now, let $D_t$ represent the textual data corpus from our knowledge domain and $R$ be the retriever. Our goal is to build a retrieval-augmented generation chatbot $C_r : q \times R(D_t, q) \rightarrow a_r$ that will generate answer $a_r$ for the query by retrieving relevant context from $D_t$ using $R$. An overview of our approach is illustrated in Fig. 3. We first build $D_t$ from 765 hours of public discourses on Advaita Vedanta introduced in Sec. 3. When deployed, the system processes $q$ by first using retriever $R$ to identify the top-$k$ most relevant passages $P$ from $D_t$ using a similarity metric. Subsequently, a large language model (LLM) is prompted with both the query and the retrieved passages in-context, following Ram et al. (2023), to generate a contextually relevant response.
We now describe each of the components in detail. We follow a four-stage process as follows:
Transcription. We first need to create a dense textual corpus targeted at our niche domain. Since our dataset consists of YouTube videos, we first employ a transcription model to transcribe the audio into text. Our video corpus $D_v$ consists of 612 videos totaling 765 hours of content, with an average length of 1.25 hours per video. We extract audio content from $D_v$ and transcribe it using OpenAI's Whisper large-v2 model (Radford et al., 2023). This step converts the spoken discourses into a transcribed textual corpus $D_t$ consisting of 10M tokens in total. Since Whisper is a multilingual model, it supports the dual-language nature of our dataset. We evaluate the transcription quality of Whisper in Sec. C.1.
Datastore creation. The transcribed text in $D_t$ is then segmented into shorter chunks called passages $P$, each consisting of 1500 characters. These chunks are then processed by a deep embedder to produce dense embedding vectors $z_{dense}$, which are stored in a vector database $D_z$. Ultimately, we store approximately 25,000 passage embeddings $z \in D_z$, each representing a discrete chunk of the philosophical discourse in $D_t$.
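The chunking step above can be sketched in a few lines of Python. This is an illustrative simplification, not the released code: the embedder and the vector database are abstracted away, and only the fixed-length character splitting is shown.

```python
def chunk_transcript(text: str, size: int = 1500) -> list[str]:
    """Split a transcript into fixed-length character passages.

    Mirrors the datastore-creation step described in the text, where the
    10M-token corpus is segmented into ~25,000 passages of 1500 characters.
    """
    return [text[i:i + size] for i in range(0, len(text), size)]
```

In practice, each passage would then be passed through the deep embedder and the resulting vector stored alongside the passage text, so that retrieval can map a matched embedding back to its source passage.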
Retrieval. To perform retrieval-augmented generation, we first need to build a retrieval system $R : D_z \times q \rightarrow P$ that retrieves contextually relevant textual passages $P$ from $D_t$ given $D_z$ and $q$. The retriever performs the following operation: $P = D_t\big[\operatorname{argTop-}k_{\,z \in D_z}\, \mathrm{sim}(q, z)\big]$, where we use cosine similarity as the similarity metric. Standard RAG models employ state-of-the-art deep embedders to encode documents and retrieve them during inference. However, these semantic embeddings can struggle to disambiguate between specific niche terminology in custom domains (Mandikal and Mooney, 2024). This can be particularly problematic in datasets with long-tail distributions such as ours. In addition, retrieved fixed-length passages are sub-optimal: short, incomplete contexts can be particularly damaging for LFQA, while longer contexts can contain unnecessary information that can confuse the generation model. To mitigate these two issues, we experiment with two key changes: (1) a keyword-based
Figure 3: Overview of the RAG model. We present VedantaNY-10M, a dataset derived from over 750 hours of public discourses on the ancient Indian philosophy of Advaita Vedanta, and build a retrieval-augmented generation (RAG) chatbot for this knowledge domain. At deployment, given a query q , the retriever R first retrieves the top-k most relevant passages P from the datastore using a hybrid keyword-based retriever. It then refines this retrieved context using a keyword-based context reshaper to adjust the passage length. Finally, an LLM is prompted with the query and the refined passages in-context. We conduct an extensive evaluation with computational linguists and domain experts to assess the model's real-world utility and identify challenges.
<details>
<summary>Image 3 Details</summary>

### Visual Description
## Diagram: Retrieval Augmented Generation (RAG) System Flow
### Overview
The image depicts a diagram illustrating the flow of information in a Retrieval Augmented Generation (RAG) system. The system consists of three main components: a Datastore (D), a Retriever (R), and a Chatbot. The Datastore contains various images related to Swami Sarvapriyananda's teachings. The Retriever extracts relevant passages from the Datastore based on a user query. The Chatbot then uses these passages to generate an answer.
### Components/Axes
The diagram is divided into three vertical sections labeled:
* **Datastore D**: Located on the left, containing images of book covers and portraits.
* **Retriever R**: Located in the center, showing the process of passage retrieval and context refinement.
* **Chatbot**: Located on the right, illustrating the Large Language Model (LLM) and answer generation.
Key labels and components within each section:
* **Datastore D**: Contains images with titles like "Mandukya Upanishad Chapter 2", "Vedantasara", "Bhagavad Gita", "Consciousness and Self in Vedanta", "Sunyam The Void", "Purnam The Full", "Models of Consciousness in Indian Philosophy", "All of None of It", "Questions and Answers". Each image also includes "Swami Sarvapriyananda" and the University of Michigan logo.
* **Retriever R**: Includes components labeled "Hybrid Retrieval: Deep + Sparse", "Keyword extraction", "Context Refiner", and a list of "Passage 1" through "Passage k".
* **Chatbot**: Includes components labeled "LLM" and "Answer".
* **Flow Arrows**: Arrows indicate the direction of information flow between components.
* **Variables**: "q" represents the Query, "p" represents the Prompt, and "a" represents the Answer.
### Detailed Analysis or Content Details
The diagram illustrates the following flow:
1. A **Query (q)** is input into the system.
2. **Keyword extraction** is performed on the query.
3. The **Hybrid Retrieval** component (Deep + Sparse) searches the **Datastore (D)** based on the extracted keywords.
4. The Retriever returns **Passage 1** through **Passage k** (a variable number of passages).
5. The **Context Refiner** processes these passages.
6. A **Prompt (p)** is generated using the refined passages.
7. The **LLM** (Large Language Model) receives the prompt.
8. The LLM generates an **Answer (a)**.
The Datastore contains images of the following:
* **Mandukya Upanishad Chapter 2 Karika-38**
* **Vedantasara** Part 3, Mantra 1.14-9
* **Bhagavad Gita** Verses 16-20
* **Consciousness and Self in Vedanta**
* **Sunyam The Void**
* **Purnam The Full**
* **Models of Consciousness in Indian Philosophy**
* **All of None of It**
* **Questions and Answers** June 7th, 2020
Each image in the Datastore is attributed to "Swami Sarvapriyananda" and the University of Michigan.
### Key Observations
The diagram highlights the key steps involved in a RAG system. The use of "Hybrid Retrieval" suggests a combination of semantic and keyword-based search methods. The "Context Refiner" component is crucial for ensuring the LLM receives relevant and concise information. The Datastore is specifically populated with content related to the teachings of Swami Sarvapriyananda.
### Interpretation
This diagram demonstrates a RAG architecture designed to answer questions based on a specific knowledge base – in this case, the teachings of Swami Sarvapriyananda. The system aims to improve the accuracy and relevance of LLM-generated responses by grounding them in factual information retrieved from the Datastore. The hybrid retrieval approach suggests an attempt to balance precision (keyword-based) and recall (semantic-based) in the search process. The Context Refiner is likely used to filter out irrelevant information and format the retrieved passages for optimal LLM consumption. The overall system is designed to provide informed and contextually relevant answers to user queries related to the subject matter contained within the Datastore. The use of a specific teacher's work suggests a focused application of RAG for a specialized domain.
</details>
hybrid retriever to focus on unique low-frequency words, and (2) a context-refiner to meaningfully shorten or expand retrieved context.
1. Keyword-based retrieval. To emphasize the importance of key terminology, we first employ keyword extraction and named-entity recognition techniques on the query q to extract important keywords κ . During retrieval, we advocate for a hybrid model combining both deep embeddings as well as sparse vector space embeddings. We encode the full query in the deep embedder and assign a higher importance to keyphrases in the sparse embedder. The idea is to have the sparse model retrieve domainspecific specialized terms that might otherwise be missed by the deep model. Our hybrid model uses a simple weighted combination of the query-document similarities in the sparse and dense embedding spaces. Specifically, we score a document D for query q and keywords κ using the ranking function:
$$ s(q, \kappa, D) = \lambda \cdot \mathrm{Sim}\big(z_s(\kappa), z_s(D)\big) + (1 - \lambda) \cdot \mathrm{Sim}\big(z_d(q), z_d(D)\big) $$
where $z_d$ and $z_s$ denote the dense and sparse embedding functions, and $\mathrm{Sim}$ is the cosine similarity measuring the angle between the corresponding vector embeddings. In our experiments, we set $\lambda = 0.2$. Amongst the top-$n$ retrieved passages, we choose the $k$ passages containing the maximum number of unique keywords.
2. Keyword-based context refinement. Furthermore, we refine the retrieved passages by leveraging the extracted keywords in a heuristic-based refinement operation that produces $P' = \mathrm{Ref}(P, \kappa)$. We first expand the selected passage to include one preceding and one succeeding passage, and find the first and last occurrences of the extracted keywords within this window. We then trim the expanded context from the first occurrence to the last. Depending on the placement of keywords, this can either expand or shorten the original passage, ensuring that the retrieved context contains the information relevant to the generation model.
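The two heuristics above can be sketched as follows. This is an illustrative reimplementation, not the released code: which similarity term receives the weight $\lambda$ is an assumption (the text only states a weighted combination with $\lambda = 0.2$), and the refinement is shown as plain string matching over concatenated passages.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hybrid_score(q_dense, d_dense, kw_sparse, d_sparse, lam=0.2):
    """Weighted combination of sparse (keyword) and dense similarities.

    Assumption: `lam` weights the sparse keyword term, so the sparse channel
    can surface domain-specific Sanskrit terms missed by the dense embedder.
    """
    return lam * cosine(kw_sparse, d_sparse) + (1 - lam) * cosine(q_dense, d_dense)

def refine_context(passages, idx, keywords):
    """Ref(P, kappa): expand passage `idx` with one neighbor on each side,
    then trim from the first keyword occurrence to the last."""
    lo, hi = max(idx - 1, 0), min(idx + 1, len(passages) - 1)
    text = " ".join(passages[lo:hi + 1])
    low = text.lower()
    found = [k.lower() for k in keywords if k.lower() in low]
    if not found:
        return text  # no keywords present: keep the expanded window
    start = min(low.find(k) for k in found)
    end = max(low.rfind(k) + len(k) for k in found)
    return text[start:end]
```

Note that trimming to the keyword span is what allows the refined context to be either longer or shorter than the original 1500-character passage, depending on where the keywords fall in the expanded window.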
Generation. For answer generation, we construct a prompt $p$ from the query $q$ and the retrieved passages $(P'_1, P'_2, \ldots, P'_k)$ in context. Finally, we invoke the chatbot $C_r$ to synthesize an answer $a_r$ from the constructed prompt. For an example of the constructed RAG bot prompt, please refer to Fig. 5. This four-stage process produces a retrieval-augmented chatbot that can generate contextually relevant responses for queries in our niche domain.
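A minimal sketch of the prompt-construction step is shown below. The instruction wording here is illustrative only, not the paper's actual prompt (Fig. 5 shows that); the point is the in-context pattern of Ram et al. (2023), where retrieved passages are simply prepended to the question with no model fine-tuning.

```python
def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble an in-context RAG prompt from the query and the
    k refined passages returned by the retriever."""
    context = "\n\n".join(
        f"Passage {i + 1}:\n{p}" for i, p in enumerate(passages)
    )
    return (
        "Answer the question using only the passages below.\n\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The resulting string is sent directly to the generation model, so swapping the LLM (e.g., GPT-4-turbo vs. Mixtral) requires no change to the retrieval pipeline.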
Implementation Details. For embedding and generation, we experiment with both closed- and open-source language models. For the RAG vs non-RAG comparison, we use OpenAI's text-embedding-ada-002 model (Brown et al., 2020) as the embedder and GPT-4-turbo (OpenAI, 2023) as the LLM for both C r and C g . For comparing RAG model variants, we use the open-source nomic-embed-text-v1 (Nussbaum et al., 2024) as our deep embedder and Mixtral-8x7B-Instruct-v0.1 (Jiang et al., 2024) as our generation model. For keyword extraction, we use an ensemble of models including OpenKP (Xiong et al., 2019), KeyBERT (Grootendorst, 2020), and SpanMarker (Aarsen, 2020). We experimented with using language models such as ChatGPT for keyword extraction, but the results were very poor, as also corroborated in Song et al. (2024). For further implementation details of the evaluation metrics, see Appendix Sec. A. The VedantaNY-10M dataset, code, and evaluation are publicly available at https://github.com/priyankamandikal/vedantany-10m.
## 5 Evaluation
We now evaluate the model along two axes: automatic evaluation metrics and a human evaluation survey. To ensure a broad and comprehensive evaluation, we categorize the questions into five distinct types, each designed to test different aspects of the model's capabilities:
1. Anecdotal: Generate responses based on stories and anecdotes narrated by the speaker in the discourses.
2. Comparative: Analyze and compare different concepts, philosophies, or texts. This category tests the model's analytical skills and its ability to draw parallels and distinctions.
3. Reasoning: Require logical reasoning, critical thinking, and the application of principles to new scenarios.
4. Scriptural: Test the model's ability to reference, interpret, and explain passages from religious or philosophical texts.
5. Terminology: Probe the model's understanding of specific technical terms and concepts.
For a sample set of questions across the above five categories, please refer to Appendix Table 4.
## 5.1 Automatic Evaluation
Inspired by Xu et al. (2023), we conduct an extensive automatic evaluation of the two RAG models on our evaluation set. We describe each metric type below and provide implementation details in Appendix Sec. A. Due to the lack of gold answers, we are unable to report reference-based metrics.
Answer-only metrics: We assess features like fluency and coherence by analyzing responses with specific metrics: (1) Self-BLEU (Zhu et al., 2018) for text diversity in open-ended generation, where higher scores indicate less diversity; (2) GPT-2 perplexity for textual fluency, as used in prior work on constrained generation. We also report (3) word and (4) sentence counts as length-based metrics, owing to their significant influence on human preferences (Sun et al., 2019; Liu et al., 2022; Xu et al., 2023).
(Question, answer) metric: To ensure answers are relevant to the posed questions, we model p ( q | a ) for ranking responses with RankGen (Krishna et al., 2022). Leveraging the T5-XXL architecture, this encoder is specially trained via contrastive learning to evaluate model generations based on their relevance to a given prefix, in this context, the question. A higher RankGen score indicates a stronger alignment between the question and the answer, serving as a measure of relevance.
(Answer, evidence) metric: A key challenge in LFQA is assessing answer correctness without dedicated factuality metrics, akin to summarization's faithfulness. We apply QAFactEval (Fabbri et al., 2022), originally for summarization, to LFQA by considering the answer as a summary and evidence documents as the source. Answers deviating from source content, through hallucinations or external knowledge, will score lower on this metric.
## 5.2 Human Evaluation
We have three experienced domain experts evaluate the models across the five categories. Each expert is closely associated with the Vedanta Society of New York, has extensively studied the philosophy in question for up to a decade, and is well-versed in domain-specific terminology and conceptual analysis. We conduct the human survey along two dimensions: retrieval and generation. For retrieval, we evaluate relevance and completeness; for generation, we evaluate factual correctness and completeness. In addition, we ask the reviewers to provide free-form justification for their choices, which proves to be very useful in analyzing the two models.
Figure 4: Human evaluation: RAG vs non-RAG. Both transcription and retrieval performance receive high scores from the evaluators. For generation, the RAG model outperforms the generic model across various metrics, particularly in factuality, completeness and specificity, while being marginally lower in ease of understanding.
<details>
<summary>Image 4 Details</summary>

### Visual Description
## Chart Compilation: Evaluation of RAG vs. Non-RAG Systems
### Overview
This image presents a compilation of charts evaluating the performance of systems with and without Retrieval-Augmented Generation (RAG) across several metrics: Transcription Quality, Generation (Factuality, Completeness, Specificity, Ease of Understanding, Faithfulness), Retrieval Relevance, Hallucinations, and Outside Knowledge. The charts primarily use bar graphs and pie charts to visualize the data.
### Components/Axes
The image is divided into four main sections: Transcription, Generation, Retrieval, and Hallucinations/Outside Knowledge.
* **Transcription Quality:** Pie chart with a single value.
* **Generation:** Bar graph with five metrics on the x-axis (Factuality, Completeness, Specificity, Ease of Understanding, Faithfulness) and a y-axis ranging from 0.00 to 5.00. Two data series are presented: Non-RAG (red) and RAG (green).
* **Retrieval Relevance:** Pie chart with a single value.
* **Hallucinations:** Two pie charts, one for Non-RAG and one for RAG, showing the distribution of responses categorized as "No", "Not Sure", and "Yes".
* **Outside Knowledge:** Pie chart for RAG, showing the distribution of responses categorized as "No", "Not Sure", and "Yes".
* **Legend:** Located at the bottom-center of the image, defining the colors for "No" (green), "Not Sure" (yellow), and "Yes" (red). The Generation chart legend is at the top-right, defining "Non-RAG" (red) and "RAG" (green).
### Detailed Analysis or Content Details
**1. Transcription Quality:**
* The pie chart shows a Transcription Quality score of approximately 4.48.
**2. Generation:**
* **Factuality:** Non-RAG is approximately 2.7, RAG is approximately 4.0.
* **Completeness:** Non-RAG is approximately 2.8, RAG is approximately 4.0.
* **Specificity:** Non-RAG is approximately 2.5, RAG is approximately 3.5.
* **Ease of Understanding:** Non-RAG is approximately 3.0, RAG is approximately 4.0.
* **Faithfulness:** Non-RAG is approximately 3.0, RAG is approximately 4.0.
* *Trend:* For all five metrics, the RAG data series consistently outperforms the Non-RAG data series, with RAG bars being significantly higher.
**3. Retrieval Relevance:**
* The pie chart shows a Retrieval Relevance score of approximately 4.34.
**4. Hallucinations (Non-RAG):**
* No: 30.0%
* Not Sure: 30.0%
* Yes: 40.0%
**5. Hallucinations (RAG):**
* No: 47.5%
* Not Sure: 25.0%
* Yes: 27.5%
**6. Outside Knowledge (RAG):**
* No: 43.36%
* Not Sure: 38.5%
* Yes: 18.14%
### Key Observations
* RAG consistently outperforms Non-RAG across all Generation metrics.
* RAG significantly reduces the occurrence of hallucinations compared to Non-RAG.
* The majority of responses from RAG indicate "No" outside knowledge, with a substantial portion being "Not Sure".
* Transcription and Retrieval Relevance both have high scores, around 4.3-4.5.
### Interpretation
The data strongly suggests that incorporating Retrieval-Augmented Generation (RAG) significantly improves the quality of generated text. RAG leads to higher factuality, completeness, specificity, ease of understanding, and faithfulness compared to systems without RAG. Furthermore, RAG demonstrably reduces the frequency of hallucinations.
The "Outside Knowledge" pie chart for RAG indicates that the system primarily relies on retrieved information, as the majority of responses indicate "No" outside knowledge. The substantial "Not Sure" category suggests that the system is cautious about asserting information not directly supported by the retrieved context.
The high scores for Transcription Quality and Retrieval Relevance suggest that the underlying retrieval and transcription components are performing well, providing a solid foundation for the generation process. The consistent improvement across all metrics when RAG is employed highlights its effectiveness in enhancing the overall performance of the system. The data suggests that RAG is a valuable technique for building more reliable and trustworthy language models.
</details>
Relevance: Defined as the relevance of the retrieved passages to the user query, this metric is scored on a scale from 1 to 5 (1 = Not at all relevant, 5 = Extremely relevant).
Correctness: Factual accuracy of the generated answer (1 = Inaccurate, 5 = No inaccuracies).
Completeness: Measures whether the retrieved passage and generated answer comprehensively cover all parts of the query (1 = Not at all comprehensive, misses crucial points; 5 = Very comprehensive).
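For reporting, the 1-5 ratings above are mapped to the 0-1 range (as used in Table 1). The paper does not spell out the mapping; a min-max rescaling is one plausible sketch:

```python
def normalize(score, lo=1, hi=5):
    # Map a 1-5 Likert rating onto the 0-1 range.
    # The exact normalization used in the paper is an assumption here.
    return (score - lo) / (hi - lo)
```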
## 5.3 Results: RAG vs Non-RAG
We first conduct a human evaluation survey with 5 computational linguists and 3 domain experts on the RAG vs non-RAG models. In evaluating the generation capabilities of our models, we consider five metrics: factuality, completeness, specificity, ease of understanding, and faithfulness. The performance of the RAG model is compared against a baseline non-RAG model across these dimensions in Fig. 4. The RAG model substantially outperforms the non-RAG model across various metrics, particularly in factuality, completeness, and specificity, while being marginally lower in ease of understanding. Sample responses are shown in Figs. 7-11.
## 5.4 Results: Standard RAG vs Keyword-based RAG
We report results in Table 1. The keyword-based RAG model shows strong improvement across all automatic metrics while significantly outperforming the standard model in the human evaluation. Amongst the answer-only metrics, the model tends to produce longer, more comprehensive answers (indicated by greater length) that are also more coherent (lower perplexity). The question-answer RankGen metric (Krishna et al., 2022) evaluates the probability of the answer given the question; a higher score suggests answers that are more relevant to the question. Most notably, the keyword model does very well on QAFactEval (Fabbri et al., 2022), which evaluates faithfulness by treating the answer as a summary and the retrieved passages as the evidence. A higher score indicates greater faithfulness of the answer to the retrieved passages, implying fewer hallucinations and less reliance on outside knowledge.
For the human evaluation in Table 1, we report scores normalized between 0 and 1. A relevance rating of 0.82 for keyword-based RAG vs. 0.59 for standard RAG indicates strong alignment between the retrieved content and users' queries, demonstrating the efficacy of our retrieval process. Conversely, the standard model sometimes fails to disambiguate unique terminology and retrieves incorrect passages (see Fig. 12). In assessing the accuracy of the generated answers, the keyword-based RAG model significantly outperforms the standard model, indicating better alignment with verifiable facts; Fig. 13 shows a factually inaccurate response from the generic model. The keyword model also achieves higher completeness scores for both retrieval and generation. Sample responses are shown in Figs. 12-16.
Table 1: Automatic and human evaluation: standard RAG (M1) vs keyword-based RAG (M2). We report both automatic and human evaluation metrics calculated on 25 triplets of {question, answer, retrievals} across 5 different question categories. The keyword-based RAG model shows strong improvement across all automatic metrics while significantly outperforming the standard model in the human evaluation. The better score in each M1/M2 pair is shown in bold.

| Category | Mean (M1/M2) | Anecdotal (M1/M2) | Comparative (M1/M2) | Reasoning (M1/M2) | Scriptural (M1/M2) | Terminology (M1/M2) |
|----------|--------------|-------------------|---------------------|-------------------|--------------------|---------------------|
| Automatic metrics | | | | | | |
| Answer-only | | | | | | |
| GPT2-PPL ↓ | 16.6 / **15.3** | 16.6 / 16.6 | 16.9 / **15.7** | 13.9 / **11.9** | **14.2** / 14.7 | 21.5 / **17.7** |
| Self-BLEU ↓ | **0.12** / 0.13 | 0.11 / **0.05** | 0.10 / **0.06** | **0.15** / 0.27 | **0.13** / 0.16 | **0.09** / 0.14 |
| # Words ↑ | 196 / **227** | 189 / 189 | 174 / **206** | 218 / **282** | 225 / **243** | 216 / **261** |
| # Sentences ↑ | 9.0 / **10.1** | **8.2** / 7.6 | 7.8 / **9.4** | 9.6 / **11.8** | 10.0 / **10.6** | 9.4 / **11.0** |
| (Question, answer) | | | | | | |
| RankGen ↑ | 0.46 / **0.48** | 0.42 / **0.52** | 0.44 / **0.47** | 0.41 / **0.43** | 0.51 / **0.52** | **0.52** / 0.46 |
| (Answer, retrievals) | | | | | | |
| QAFactEval ↑ | 1.36 / **1.60** | 1.01 / **1.14** | 1.53 / **1.94** | 1.18 / **1.61** | **1.52** / 1.36 | 1.56 / **1.95** |
| Human evaluation | | | | | | |
| Retrieval | | | | | | |
| Relevance ↑ | 0.59 / **0.82** | 0.41 / **0.88** | 0.79 / **0.85** | 0.73 / **0.83** | 0.48 / **0.73** | 0.55 / **0.81** |
| Completeness ↑ | 0.52 / **0.79** | 0.41 / **0.86** | 0.72 / **0.79** | 0.57 / **0.83** | 0.37 / **0.68** | 0.52 / **0.79** |
| Answer | | | | | | |
| Correctness ↑ | 0.61 / **0.86** | 0.40 / **0.89** | 0.81 / **0.88** | 0.71 / **0.85** | 0.52 / **0.81** | 0.63 / **0.89** |
| Completeness ↑ | 0.58 / **0.85** | 0.42 / **0.92** | 0.80 / **0.85** | 0.72 / **0.81** | 0.49 / **0.77** | 0.63 / **0.91** |
## 6 Challenges
The evaluation in Sec. 5 shows that the RAG model provides responses that are not only more aligned with the source material but are also more comprehensive, specific, and user-friendly compared to the responses generated by the generic language model. In this section, we discuss the challenges we encountered while building the retrieval-augmented chatbot for the niche knowledge domain of ancient Indian philosophy introduced in this work.
Transcription. Our requirement of using a niche data domain with long-tail knowledge precludes the use of source material that the LLM has previously been exposed to. To ensure this, we construct a textual corpus derived from automated transcripts of YouTube discourses. These transcripts can contain errors such as missing punctuation, incorrect transcriptions, and faulty transliterations of Sanskrit terms. A sample of such errors is shown in Appendix Table 3. A proofreading mechanism and/or improved transcription models can alleviate these issues to a large extent.
Spoken vs written language. Unlike traditional textual corpora that are compiled from written sources, our dataset is derived from spoken discourses. Spoken language is often more verbose and less structured than written text, with the speaker frequently jumping between concepts mid-sentence. This unstructured nature of the text can be unfamiliar for a language model trained extensively on written text. A peculiar failure case arising from this issue is shown in Appendix Fig. 6. This can be addressed by converting the spoken text into a more structured prose format with the help of well-crafted prompts to LLMs, followed by human proofreading.
Context length. The passages retrieved in the standard model are of a fixed length and can sometimes be too short for many queries, especially for long-form answering. For instance, the retrieved passage may include a snippet from the middle of the full context, leading to incomplete or incoherent chatbot responses (Fig. 11). This prompted us to employ a keyword-based context-expansion mechanism to provide a more complete context. While this results in much better answer generation, the retrieved passage may contain too much information, making it difficult for the generator to reason effectively. Moreover, the increase in the number of tokens increases processing time. Future work can explore more advanced retrieval models capable of handling longer contexts and summarizing them effectively before input to the LLM.
Retrieval-induced hallucinations. There are scenarios where the RAG model latches onto a particular word or phrase in the retrieved passage and hallucinates a response that is not only irrelevant but also factually incorrect. A sample of such a hallucination is shown in Fig. 10. This is a more challenging problem to address. However, retrieval models that can extract the full context, summarize it, and remove irrelevant information should be able to mitigate this issue to a reasonable extent.
## 7 Conclusion
In this work, we integrate modern retrieval-augmented large language models with the ancient Indian philosophy of Advaita Vedanta. Toward this end, we present VedantaNY-10M, a large dataset curated from automatic transcriptions of extensive philosophical discourses on YouTube. Validating these models along various axes using both automatic and human evaluation provides two key insights. First, RAG models significantly outperform non-RAG models, with domain experts expressing a strong preference for using such RAG models to supplement their daily studies. Second, the keyword-based hybrid RAG model underscores the merits of integrating classical information retrieval and contemporary deep learning techniques in niche and specialized domains. While there is much work to be done, our study underscores the potential of integrating modern machine learning techniques to unravel ancient knowledge systems.
## Limitations and Future Work
While our study demonstrates the utility of integrating retrieval-augmented LLMs with ancient knowledge systems, there are limitations and scope for future work. First, this study focuses on a single niche domain of Advaita Vedanta as taught by one teacher. Expanding this study to include other ancient philosophical systems, such as the Vedantic schools of Vishishtadvaita and Dvaita, as well as various Buddhist and Jain traditions, would be a valuable extension. Second, incorporating primary scriptural sources, in addition to spoken discourses, would enhance the authenticity of the RAG model's outputs. Third, while we only experiment with RAG models in this study, finetuning the language models themselves on philosophy datasets is an interesting future direction. Fourth, the context refiner is currently heuristic-based and may not generalize well to all scenarios. Replacing it with a trained refiner using abstractive or extractive summarization techniques would considerably improve its utility and efficiency. Fifth, expanding the evaluation set and involving more subjects in the evaluation will considerably strengthen the study's robustness. Finally, while the language models in this work are primarily in English and Latin script, building native LLMs capable of functioning in the original Sanskrit language of the scriptures using Devanagari script is essential future work.
## Acknowledgments
The author would like to thank Prof. Kyle Mahowald for his insightful course on form and functionality in LLMs, which guided the evaluation of the language models presented in this paper. Fangyuan Xu provided valuable information on automatic metrics for LFQA evaluation. The author extends their gratitude to all the human evaluators who took the survey and provided valuable feedback, with special thanks to Dr. Anandhi who coordinated the effort among domain experts. Finally, the author expresses deep gratitude to the Vedanta Society of New York and Swami Sarvapriyananda for the 750+ hours of public lectures that served as the dataset for this project.
## Ethics Statement
All data used in this project has been acquired from public lectures on YouTube delivered by Swami Sarvapriyananda of the Vedanta Society of New York. While our study explores integrating ancient knowledge systems with modern machine learning techniques, we recognize their inherent limitations. These knowledge traditions have always emphasized the importance of the teacher in transmitting knowledge. We do not see LLMs as replacements for monks and teachers of these ancient traditions, but only as tools to supplement analysis and study. Moreover, users of these tools need to be made well aware that these models can and do make errors, and should therefore seek guidance from qualified teachers to carefully progress on the path.
## References
- Tom Aarsen. 2020. SpanMarker for Named Entity Recognition.
- Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, and Nando de Freitas. 2022. Restoring and attributing ancient texts using deep neural networks. Nature , 603.
- David Bamman and Patrick J Burns. 2020. Latin bert: A contextual language model for classical philology. arXiv preprint arXiv:2009.10053 .
- Bhagavad Gita. 3000 B.C.E. The Bhagavad Gita.
- Bartosz Bogacz and Hubert Mara. 2020. Period classification of 3d cuneiform tablets with geometric neural networks. In ICFHR .
- Bernd Bohnet, Vinh Q Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, et al. 2022. Attributed question answering: Evaluation and modeling for attributed large language models. arXiv preprint arXiv:2212.08037 .
- Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In ICML .
- The Brahmasutras. 3000 B.C.E. The Brahmasutras.
- Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In NeurIPS .
- Anthony Chen, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2020. MOCHA: A dataset for training and evaluating generative reading comprehension metrics. In EMNLP .
- Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pretraining text encoders as discriminators rather than generators. In ICLR .
- Alexander Fabbri, Chien-Sheng Wu, Wenhao Liu, and Caiming Xiong. 2022. QAFactEval: Improved QAbased factual consistency evaluation for summarization. In NAACL .
- Maarten Grootendorst. 2020. Keybert: Minimal keyword extraction with bert.
- Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Realm: Retrieval augmented language model pre-training. In ICML .
- Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. spaCy: Industrialstrength Natural Language Processing in Python.
- Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane DwivediYu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2022. Atlas: Few-shot learning with retrieval augmented language models. JMLR .
- Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, MarieAnne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2024. Mixtral of experts. Preprint , arXiv:2401.04088.
- Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. 2023. Large language models struggle to learn long-tail knowledge. In ICML .
- Vladimir Karpukhin, Barlas O˘ guz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for opendomain question answering. EMNLP .
- Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Generalization through memorization: Nearest neighbor language models. ICLR .
- Kalpesh Krishna, Yapei Chang, John Wieting, and Mohit Iyyer. 2022. Rankgen: Improving text generation with large ranking models. arXiv:2205.09726 .
- Kalpesh Krishna, Aurko Roy, and Mohit Iyyer. 2021. Hurdles to progress in long-form question answering. In NAACL .
- Philippe Laban, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. 2022. Summac: Re-visiting nlibased models for inconsistency detection in summarization. TACL .
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020a. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In ACL .
- Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020b. Retrieval-augmented generation for knowledge-intensive nlp tasks. NeurIPS .
- Yixin Liu, Alexander R. Fabbri, Pengfei Liu, Yilun Zhao, Linyong Nan, Ruilin Han, Simeng Han, Shafiq R. Joty, Chien-Sheng Wu, Caiming Xiong, and Dragomir R. Radev. 2022. Revisiting the gold standard: Grounding summarization evaluation with robust human evaluation. ArXiv , abs/2212.07981.
- Ligeia Lugli, Matej Martinc, Andraž Pelicon, and Senja Pollak. 2022. Embeddings models for buddhist Sanskrit. In LREC .
- Rajiv Malhotra. 2021. Artificial Intelligence and the Future of Power: 5 Battlegrounds . Rupa Publications.
- Rajiv Malhotra and Satyanarayana Dasa Babaji. 2020. Sanskrit Non-Translatables: The Importance of Sanskritizing English . Amaryllis.
- Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Hannaneh Hajishirzi, and Daniel Khashabi. 2023. When not to trust language models: Investigating effectiveness and limitations of parametric and nonparametric memories. ACL .
- Priyanka Mandikal and Raymond Mooney. 2024. Sparse meets dense: A hybrid approach to enhance scientific document retrieval. In The 4th CEUR Workshop on Scientific Document Understanding, AAAI .
- Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. 2020. On faithfulness and factuality in abstractive summarization. In ACL .
- Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy CampbellGillingham, Geoffrey Irving, et al. 2022. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147 .
- Ragaa Moustafa, Farida Hesham, Samiha Hussein, Badr Amr, Samira Refaat, Nada Shorim, and Taraggy M Ghanim. 2022. Hieroglyphs language translator using deep learning techniques (scriba). In International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC) .
- Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. 2021. Webgpt: Browser-assisted questionanswering with human feedback. arXiv preprint arXiv:2112.09332 .
- Sonika Narang, MK Jindal, and Munish Kumar. 2019. Devanagari ancient documents recognition using statistical feature extraction techniques. S¯ adhan¯ a , 44.
- Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. 2024. Nomic embed: Training a reproducible long context text embedder. Preprint , arXiv:2402.01613.
OpenAI. 2023. Gpt-4 technical report.
- Asimina Paparigopoulou, John Pavlopoulos, and Maria Konstantinidou. 2022. Dating Greek papyri images with machine learning. In ICDAR Workshop on Computational Paleography .
- Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2023. Robust speech recognition via large-scale weak supervision. In ICML .
- Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083 .
Ramakrishna Order. Belur math.
- Adi Shankaracharya. 700 C.E. Commentary on the Upanishads.
- Weijia Shi, Julian Michael, Suchin Gururangan, and Luke Zettlemoyer. 2022. Nearest neighbor zero-shot inference. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing .
- Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. Replug: Retrievalaugmented black-box language models. arXiv preprint arXiv:2301.12652 .
- Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval augmentation reduces hallucination in conversation. ACL .
- Thea Sommerschield, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, and Nando de Freitas. 2023. Machine Learning for Ancient Languages: A Survey. Computational Linguistics .
- Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, and Liping Jing. 2024. Large language models as zero-shot keyphrase extractors: A preliminary empirical study. Preprint , arXiv:2312.15156.
- Simeng Sun, Ori Shapira, Ido Dagan, and Ani Nenkova. 2019. How to compare summarizers without target length? pitfalls, solutions and re-examination of the neural summarization literature. Workshop on Methods for Optimizing and Evaluating Neural Language Generation .
Upanishads. >3000 B.C.E. The Upanishads.
- Vedanta Society of New York. Swami Sarvapriyananda's Vedanta discourses.
- Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, and Arnold Overwijk. 2019. Open domain web keyphrase extraction beyond language modeling. arXiv preprint arXiv:1911.02671 .
- Fangyuan Xu, Yixiao Song, Mohit Iyyer, and Eunsol Choi. 2023. A critical evaluation of evaluations for long-form question answering. In ACL .
- Eunice Yiu, Eliza Kosoy, and Alison Gopnik. 2023. Imitation versus innovation: What children can do that large language and language-and-vision models cannot (yet)? ArXiv , abs/2305.07666.
- Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval .
## A Implementation Details
## A.1 Automatic Metrics
Following Xu et al. (2023), we implement a number of automatic evaluation metrics for LFQA as described below.
Length We use the spaCy package (Honnibal et al., 2020) for word tokenization.
Self-BLEU We calculate Self-BLEU by regarding one sentence as hypothesis and all others in the same answer paragraph as reference. We report self-BLEU-5 as a measure of coherence.
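Concretely, the per-answer computation looks like the following from-scratch sketch. The paper follows the Texygen implementation (Zhu et al., 2018), which may differ in details such as smoothing; this version uses plain, unsmoothed BLEU over pre-tokenized sentences.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, references, max_n=5):
    # Clipped n-gram precision BLEU with brevity penalty, no smoothing.
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        overlap = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty against the closest reference length.
    ref_len = min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]
    bp = 1.0 if len(hypothesis) >= ref_len else math.exp(1 - ref_len / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def self_bleu(sentences, max_n=5):
    # Each sentence is scored as a hypothesis against all other sentences
    # in the same answer as references; the mean is the answer's Self-BLEU.
    scores = [bleu(s, sentences[:i] + sentences[i + 1:], max_n)
              for i, s in enumerate(sentences)]
    return sum(scores) / len(scores)
```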
RankGen For a given question q and a model-generated answer a, we first transform them into fixed-size vectors (q, a) using the RankGen encoder (Krishna et al., 2022). To assess their relevance, we compute the dot product q · a. We utilize the T5-XXL (11B) encoder, which has been trained using both in-book negative instances and generative negatives.
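In code, the scoring step reduces to a dot product of the two encodings; `encode` below is a hypothetical stand-in for the RankGen T5-XXL encoder, which is not reproduced here.

```python
def rankgen_score(encode, question, answer):
    # RankGen-style relevance: dot product q . a between the fixed-size
    # encodings of the question (prefix) and the answer (continuation).
    q = encode(question)
    a = encode(answer)
    return sum(x * y for x, y in zip(q, a))
```

A higher score indicates stronger question-answer alignment, so scores for different answers to the same question can be compared directly.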
QAFactEval QAFactEval is a QA-based metric recently introduced by Fabbri et al. (2022). It has demonstrated exceptional performance across multiple factuality benchmarks for summarization (Laban et al., 2022; Maynez et al., 2020). The pipeline includes four key components: (1) noun-phrase (NP) extraction from a sentence S, represented as Ans(S); (2) BART-large (Lewis et al., 2020a) for question generation, denoted QG; (3) Electra-large (Clark et al., 2020) for question answering, denoted QA; and (4) the learned metric LERC (Chen et al., 2020) to measure similarity, as Sim(p_i, s_i). An additional answerability classification module assesses whether a question can be answered with the information provided in document D. Following Xu et al. (2023), we report LERC, which uses the learned metric to compare Ans_S and Ans_D(a).
## A.2 Chat Prompt
For an example of the constructed RAG bot prompt, please refer to Fig. 5. The RAG bot C_r is given the top-k retrieved passages alongside the query when generating a response, whereas the generic bot C_g receives only the query, without additional context.
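The prompt construction can be sketched as below. The two templates mirror the wording shown in Fig. 5; the helper name `build_prompt` is ours, for illustration:

```python
RAG_TEMPLATE = (
    "You are a helpful assistant that accurately answers queries using "
    "Swami Sarvapriyananda's YouTube talks. Use the following passages to "
    "provide a detailed answer to the query: {query}\n\nPassages:\n{passages}"
)

GENERIC_TEMPLATE = (
    "You are a helpful assistant that accurately answers queries using "
    "Swami Sarvapriyananda's YouTube talks. Provide a detailed answer to "
    "the query: {query}"
)

def build_prompt(query, passages=None):
    """RAG bot prompt includes the top-k retrieved passages; the generic
    bot prompt contains only the query."""
    if passages:
        return RAG_TEMPLATE.format(query=query,
                                   passages="\n\n".join(passages))
    return GENERIC_TEMPLATE.format(query=query)
```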
## B Sample Sanskrit terms
Table 2 contains excerpts from passages containing Sanskrit terms. The Sanskrit terms are italicized
<details>
<summary>Image 5 Details</summary>

### Visual Description
\n
## Diagram: RAG Bot vs. Generic Bot Prompt Comparison
### Overview
The image presents a comparison between the prompts used for a Retrieval-Augmented Generation (RAG) Bot and a Generic Bot. The comparison is visually represented as two rectangular blocks, one green for the RAG Bot and one red for the Generic Bot, stacked vertically. Each block contains the text of the prompt used for the respective bot.
### Components/Axes
The diagram consists of two main components:
* **RAG Bot Block:** A green rectangle at the top.
* **Generic Bot Block:** A red rectangle at the bottom.
Each block contains text defining the bot's role and instructions. The variable `{query}` appears in both prompts.
### Content Details
**RAG Bot Prompt (Green Block):**
"You are a helpful assistant that accurately answers queries using Swami Sarvapriyananda's YouTube talks. Use the following passages to provide a detailed answer to the query: {query}
Passages:
{Passage 1}
{Passage 2}
...
{Passage k}"
**Generic Bot Prompt (Red Block):**
"You are a helpful assistant that accurately answers queries using Swami Sarvapriyananda's YouTube talks. Provide a detailed answer to the query: {query}"
### Key Observations
The key difference between the two prompts is the inclusion of "Passages:" and the placeholder for multiple passages (`{Passage 1}`, `{Passage 2}`, ..., `{Passage k}`) in the RAG Bot prompt. This indicates that the RAG Bot is designed to utilize retrieved information (passages) to formulate its responses, while the Generic Bot does not explicitly have access to such passages within its prompt.
### Interpretation
The diagram illustrates the core distinction between a standard Large Language Model (LLM) and a RAG-enhanced LLM. The Generic Bot relies solely on its pre-trained knowledge to answer queries. The RAG Bot, however, is augmented with external knowledge retrieved from a source (in this case, Swami Sarvapriyananda's YouTube talks) and provided as context within the prompt. This allows the RAG Bot to provide more accurate and contextually relevant answers, especially for queries requiring information not readily available in the LLM's pre-trained data. The use of `{query}` in both prompts suggests that both bots are designed to respond to user input, but the RAG Bot's response will be informed by the provided passages. The "..." in the RAG Bot prompt indicates that the number of passages can vary. This is a conceptual diagram demonstrating the difference in prompt structure, not a presentation of data.
</details>
Figure 5: Prompts for the RAG and generic chatbots. RAG Bot receives the top-k retrieved relevant passages in the prompt along with the query, while the generic bot only receives the query.
and underlined. Notice that the passages contain detailed English explanations of these terms. To retain the linguistic diversity, authenticity, and comprehensiveness of the source material, we keep these Sanskrit terms as is in our passages, as described in Sec. 3. Note that these are direct Whisper (Radford et al., 2023) transcriptions with no further post-processing or proofreading, so transcriptions may not always be accurate.
## C Transcription
We assess the transcript quality and list some common errors.
## C.1 Transcript Evaluation
Transcription quality is scored on a scale from 1 to 5 (where 1 = Poor, 5 = Perfect). On 10 randomly sampled transcripts, evaluators assign a high average score of 4.48, suggesting that the transcription of the YouTube audio into text is highly accurate and clear, and that our constructed custom dataset D_t is of high quality.
## C.2 Transcript Errors
Table 3 contains a few sample transcription errors. The transcriptions are largely accurate for English words and sentences. However, errors often arise from incorrectly transcribing Sanskrit terms and verses. Other, less common errors include missing
| Sl. No. | Excerpts from passages | Notes |
|---------|------------------------|-------|
| 1 | *Om Bhadram Karne Bhishrinu Yamadevaha Bhadram Pashyam Akshabhirya Jatraaha Sthirai Rangai Stushta Vagam Sasthanubhi Vyase Madevahitaiyadayoh Swasthina Indro Vriddha Shravaha Swasthina Phusa Vishwa Vedaaha Swasthina Starksho Arishta Nemi Swasthino Brihas Patir Dadhatu Om Shanti Shanti Shanti.* | This is a Sanskrit chant which is directly Romanized and processed. The automatic transcriptions often contain errors in word segmentation for Sanskrit verses. |
| 2 | *Samsara* is our present situation, the trouble that we are caught in, the mess that we are caught in. *Samsara* is this. In Sanskrit, normally when you use the word *samsara*, it really means this world of our life, you know, being born and struggling in life and afflicted by suffering and death and hopelessness and meaninglessness. | *Samsara* is a Sanskrit term. The excerpt contains an explanation of the concept in English. |
| 3 | The problem being ignorance, solution is knowledge and the method is *Jnana Yoga*, the path of knowledge. So what is mentioned here, *Shravana Manana Nididhyasana*, hearing, reflection, meditation, that is *Jnana Yoga*. So that's at the highest level of practice, way of knowledge. | The excerpt contains an explanation of *Jnana Yoga*, the path of knowledge. |
| 4 | In Sanskrit, *ajnana* and *adhyasa*, ignorance and superimposition. Now if you compare the four aspects of the self, the three appearances and the one reality, three appearances, waker, dreamer, deep sleeper, the one reality, *turiyam*, if you compare them with respect to ignorance and error, you will find the waker, that's us right now. We have both ignorance and error. | *Ajnana*, *adhyasa* and *turiyam* are Sanskrit terms. Notice that the passage implicitly contains rough English translations of these terms in the context of the overall discourse. For instance, *ajnana* is translated as ignorance and *adhyasa* is translated as superimposition. |
| 5 | Mandukya investigates this and points out there is an underlying reality, the *Atman*, pure consciousness, which has certain characteristics. This is causality, it is beyond causality. It is neither a cause nor an effect. The *Atman* is not produced like this, nor is it a producer of this. It is beyond change. No change is there in the *Atman*, *nirvikara*. And third, it is not dual, it is non-dual, *advaitam*. This is *kadyakarana* in Sanskrit, this is *kadyakarana vilakshana Atma*. In Sanskrit this is *savikara*, this is *nirvikara Atma*. This is *dvaita*, this is *advaita Atma*. So this is *samsara* and this is *moksha*, freedom. | The excerpt contains an explanation of different Sanskrit technical terms. |
Table 2: Excerpts from passages containing Sanskrit terms. These excerpts contain detailed English descriptions of technical terms in Sanskrit (italicized and underlined). To retain authenticity to the source material, we retain these Sanskrit terms as is in our passages. Note that these are direct Whisper (Radford et al., 2023) transcriptions with no further post-processing or proofreading, so transcriptions may not always be accurate. For more details, refer to Sec. B.
Figure 6: Subject confusion: Aldous Huxley vs Alan Watts. In the retrieved passage, the speaker jumps between speaking about Aldous Huxley and Alan Watts. Moreover, the transcription has poor punctuation. As a result, the RAG bot incorrectly attributes information about Huxley to Watts in the response. Due to the unstructured nature of the spoken text, such errors can occur. This can be addressed by converting the spoken text into a more structured prose format with the help of well-crafted prompts to LLMs, followed by human proofreading (see Sec. D).
<details>
<summary>Image 6 Details</summary>

### Visual Description
\n
## Screenshot: Question & Answer with Passages
### Overview
This screenshot displays a question-and-answer interface, likely from a Retrieval-Augmented Generation (RAG) bot. The question asks about Alan Watts' association with Vedanta. The interface presents an answer generated by the "RAG Bot" and supporting passages from a text titled "Swami Vivekananda: The Power and the Glory" by Sarvapriyananda.
### Components/Axes
The screenshot is divided into three main sections:
2. **Question:** Located at the top-left, posing the query "Was Alan Watts associated with Vedanta?".
2. **Answer:** Positioned below the question, labeled "Answer" and attributed to "RAG Bot".
3. **Passages:** Occupying the right side of the screenshot, labeled "Passages" and attributed to "Swami Vivekananda: The Power and the Glory | Swami Sarvapriyananda".
### Content Details
**Question:** "Was Alan Watts associated with Vedanta?"
**Answer (RAG Bot):** "Yes, Alan Watts was associated with Vedanta. Although he was not directly connected with the Vedanta Society, he taught Vedanta and Zen, which was a kind of eclectic mixture of both. In the 1950s and 60s, when he wrote his book “The Perennial Philosophy,” some reviews were critical, suggesting that he had become part of some Hindu cult. However, he maintained his association with the Vedanta Society of Southern California until the very end. Alan Watts was one of the leading intellectuals in Britain, along with other British intellectuals like Aldous Huxley, Christopher Isherwood, and Gerald Hurd. While Huxley and Isherwood were directly associated with Vedanta, Alan Watts taught Vedanta and Zen, which shows his connection to Vedantic ideas."
**Passages (Swami Vivekananda: The Power and the Glory | Swami Sarvapriyananda):**
"50s and 60s and when he wrote that book some reviews were nasty some said that oh he’s become part of some Hindu cult or something like that you know but he maintained his association with the Vedanta Society till the very end it’s very closely connected with the Vedanta Society of Southern California Aldous Huxley one of the leading intellectuals in Britain and then they were all British by the way Huxley Christopher Isherwood Gerald Hurd, Alan Watts. Alan Watts was not directly connected with Vedanta but he taught Vedanta and Zen a kind of eclectic mixture of both. Aldous Huxley his amazing book the perennial philosophy. Houston Smith major figure in the study of comparative religion and his book the world’s religions he wrote that he was a disciple of Swami Satprakashananda in the Vedanta Society in St Louis so this spread of Vedantic ideas, yogic ideas of and then later on Buddhism and so on that Vivekananda was the pioneer he opened the door here and he went back to India the other side of his work. Sister Nivedita writes when he stood up to speak here in Chicago his message for the West he said I have a message for the West as Buddha had the message for the East but his message his words Nivedita writes traveled back across the dark oceans to a land to his motherland asleep to awaken her to a sense of her greatness. When he went back to India, India which was colonized which was starving superstitious divided he was the first person historian says to"
### Key Observations
* The answer provided by the RAG Bot directly addresses the question, confirming Alan Watts' association with Vedanta, albeit not a direct connection to the Vedanta Society.
* The passages provided support the answer, mentioning Alan Watts alongside other intellectuals involved with Vedanta and related philosophies.
* The passages are somewhat fragmented and appear to be transcriptions of spoken words, indicated by phrases like "you know" and incomplete sentences.
* There is repetition of Alan Watts' name within both the answer and the passages.
### Interpretation
The screenshot demonstrates a RAG system in action. The system successfully retrieves relevant passages from a source text ("Swami Vivekananda: The Power and the Glory") to support its answer to a user's question. The passages provide context and evidence for the claim that Alan Watts was associated with Vedanta, even though he wasn't a direct member of the Vedanta Society. The fragmented nature of the passages suggests the source material might be a transcript of a lecture or discussion. The system's ability to connect Watts to figures like Huxley and Isherwood highlights the broader intellectual context of Vedanta's influence in the mid-20th century. The system is functioning as expected, providing a relevant answer backed by supporting evidence.
</details>
or incorrect punctuation. Human proofreading will remove these errors to a large extent.
## D Spoken vs written language
Unlike traditional textual corpora compiled from written sources, our dataset is derived from spoken discourses. Spoken language is often more verbose and less structured than written text, with the speaker frequently jumping between concepts mid-sentence. This unstructured input can be unfamiliar to a language model trained extensively on written text, which expects more coherent, structured prose. Fig. 6 shows a peculiar failure case arising from this issue: the speaker jumps between speaking about Aldous Huxley and Alan Watts, and as a result, the RAG bot incorrectly attributes information about Huxley to Watts while answering the query. Such errors can be addressed by converting the spoken text into a more structured prose format with the help of well-crafted prompts to LLMs, followed by human proofreading.
## E Question Categories
To ensure a broad and comprehensive evaluation, we categorize the questions into five distinct types: anecdotal, terminology, scriptural, comparative, and reasoning. Questions from each of these categories are provided in Tab. 4. Each category is designed to test different aspects of the model's capabilities. For details, refer to Sec. 5.
## F Sample Responses: RAG vs Non-RAG
In Figs. 7-11, we present sample queries and responses from the RAG and generic chatbots for five queries. The query is highlighted in yellow, the RAG response is in the green box, and the non-RAG response is in the red box. We use GPT-4-turbo (OpenAI, 2023) as the large language model for both chatbots.
## G Sample Responses: Standard RAG vs Keyword-based Hybrid RAG
In Figs. 12-16, we present sample queries and responses from the standard RAG and keyword-based hybrid RAG models for one query from each of the five categories. We additionally show the human-annotated and model-generated keywords. The Keyword-RAG response is in the green box; the standard-RAG response is in the red box. We use Mixtral-8x7B-Instruct-v0.1 (Jiang et al., 2024) as the large language model for both models.
Table 3: Sample transcription errors. For constructing our text corpus, we directly use the transcripts obtained from Whisper (Radford et al., 2023) with no further post-processing or proofreading. The transcriptions are largely good (with a score of 4.5/5 from human evaluators). However, errors arise from incorrectly transcribed Sanskrit terms, missing punctuation, etc. Human proofreading will remove these errors to a large extent.
| Sl. No. | Transcription errors | Notes |
|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| 1 | That's what Sam Altman, Chachjipiti, somebody asked him. | Should be ChatGPT |
| 2 | Last year, you studied extensively with Professor Garfield, I believe, studying Vajamaka and the teachings of the Garjuna. | Should be Madhyamaka and Nagarjuna, respectively |
| 3 | From attachment comes desire, raga , I want it and if that desire is satisfied then there is no end to it, greed, lobha . But if it is somehow thwarted, then anger, kama krodho vijayate . | Should be bhijayate |
| 4 | In fact, one of the terms which is used in Mandukya Upanishad, Brahman is abhyavaharyam. | Should be avyavaharyam |
| 5 | So, one of them was the Brahmo Samad , which was quite popular in Calcutta in those days. | Should be Samaj |
| 6 | I am awareness I'm eternal consciousness Aldous Huxley Christopher Isherwood Gerald Hurd all of them were very close to Swami Prabhavananda in Southern California in Hollywood and look at the product of that Isherwood wrote that one of the most amazing biographies | The transcripts sometimes miss punctuation marks, making the passage difficult to comprehend for both humans and language models |
Table 4: Question categories. To ensure a broad and comprehensive evaluation, we categorize the questions into five distinct types, each designed to test different aspects of the model's capabilities. For details, refer to Sec. 5.
| Category | Description | Questions |
|-------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Anecdotal | Stories and anecdotes narrated by the speaker in the discourses | • Does Swami speak about Wittgenstein's thesis defense? • Does Swami narrate any incident surrounding Shivaratri? • Does Swami speak about The Matrix movie? • Does Swami speak about Vachaspati Mishra? Does he narrate how Bhamati came to be written? • What was Christopher Isherwood's contribution to Vedanta? |
| Terminology | Probe the model's understanding of specific terms and concepts | • What is Adhyaropa Apavada? • What is Vikshepa Shakti? • What is the significance of the word 'Shraddha'? • What is Upadana Karana? • What constitutes Sadhana Chatushtaya? |
| Scriptural | Reference, interpret, and explain passages from religious or philosophical texts | • In Mandukya Upanishad, what is the significance of the word 'Om'? • In the Gospel, what parable does Sri Ramakrishna use to portray intense longing for God? • In the Mundaka Upanishad, how do we interpret the parable of the two birds? • How is Phala Vyapti and Vritti Vyapti defined in Vedantasara? • In the Gospel of Sri Ramakrishna, how do we understand the analogy of the salt doll that Thakur provides? |
| Comparative | Analyze and compare different concepts, philosophies, or texts | • As mentioned in the Yoga Sutras, is Samadhi necessary to attain enlightenment according to Advaita Vedanta? • Would Sri Ramakrishna's teachings be considered purely Advaitic? • In Kashmir Shaivism, Chit is both Prakasha and Vimarsha i.e. it is both self-luminous and self-reflective. Would Advaita Vedanta agree with this? • How does Sankhya differ from Advaita Vedanta? • What is the main difference between Buddhist Shunyavada and Advaita Vedanta? |
| Reasoning | Require logical reasoning, critical thinking, and the application of principles to new scenarios | • Can AI ever become conscious? • Is the waking state similar to a dream or absolutely indistinguishable from the dream state? • Do our senses report reality to us? • Dis-identifying myself from the body-mind seems to require a lot of effort. Did the reverse process of identification also involve as much effort? • If Brahman as Existence-Consciousness-Bliss is the eternal subject that cannot be objectified, yet eternally reveals itself to give rise to apparent objects through the principle of Maya, can we infer that Brahman is Existence-Consciousness-Bliss-Maya? That is, is Brahman inseparable from Maya? |
<details>
<summary>Image 7 Details</summary>

### Visual Description
\n
## Text Extraction: Sam Altman and the Equivalence of Brahman and Atman
### Overview
The image presents a screenshot of a question-and-answer interface, likely from a chatbot or similar AI tool. The question concerns Sam Altman's statements about his belief in the absolute equivalence of Brahman and Atman. The interface displays both the chatbot's answer and relevant passages from the Bhagavad Gita.
### Components/Axes
The interface is divided into three main sections:
1. **Question:** At the top, the user's question is displayed.
2. **Answer:** Below the question, the chatbot's response is presented. This section is labeled "Answer" and "RAG Bot".
3. **Passages:** At the bottom, excerpts from the Bhagavad Gita are provided as supporting context. This section is labeled "Passages" and includes chapter and verse information.
### Detailed Analysis or Content Details
**Question:**
"Sam Altman says that he believes in the absolute equivalence of Brahman and Atman. What does it mean?"
**Answer (RAG Bot):**
"Sam Altman, a prominent figure in the tech industry, has expressed his belief in the absolute equivalence of Brahman and Atman. To understand what this means, we need to delve into the concepts of Brahman and Atman in the context of Advaita Vedanta, a school of Hindu philosophy. In Advaita Vedanta, Brahman refers to the ultimate reality or the absolute truth. It is considered to be the underlying essence of everything in the universe. Brahman is often described as infinite, eternal, and beyond all limitations. It is the source and substratum of all existence. On the other hand, Atman refers to the individual self or the true essence of an individual. It is the innermost core of our being, beyond the physical body, mind, and ego. Atman is said to be pure consciousness, unchanging and eternal. The belief in the absolute equivalence of Brahman and Atman means that there is no fundamental difference between the ultimate reality and the individual self. It suggests that the true nature of the individual self is identical to the nature of the ultimate reality. In other words, the essence of every individual is not separate from the essence of the universe. This concept is central to Advaita Vedanta, which emphasizes the non-dual nature of reality. It asserts that the perceived duality and multiplicity in the world is an illusion, and the truth is the oneness of Brahman. The individual self, or Atman, is considered to be a manifestation or reflection of Brahman. By recognizing the equivalence of Brahman and Atman, one can realize their true nature and attain liberation or enlightenment. This realization involves transcending the limited notion of individual self and ego and recognizing the underlying unity of all existence. It is worth noting that Altman’s views on the equivalence of Brahman and Atman align with the teachings of Advaita Vedanta. This philosophical perspective is expounded by various spiritual teachers and scholars, such as Adi Shankaracharya. 
The core idea is that the individual self (Atman) is ultimately identical to the universal self (Brahman)."
**Passages:**
1. **117. Bhagavad Gita | Chapter 9 Verse 34 Revision | Swami Sarvapriyananda**
"That’s what Sam Altman, Chachipiti, somebody asked him. Somebody in YouTube wrote a gave a talk and vaguely I mentioned I heard this, and then somebody on YouTube wrote a comment, that if you look up his Twitter feed, somebody asked him, Sam Altman, that tell us one thing you believe which mostly people don’t believe. And he said, I believe in the absolute equivalence of Brahman and Atman, using those words. I think he is a non-dualist maybe in a Vedantic sense. Okay, I will come to you. Gentleman at the back. Yes. It’s subtle. The ego is very subtle. It can, when you want to be spiritual and rise above the ego, the ego will come and say, I can help you do that. I will volunteer. I am going to be a good ego from now on. And for a long time we can’t help because we are so closely identified with the ego, the sense of I. It’s very difficult, you can say that I am witnessing consciousness. I am the witness of the ego, which is actually the fact. But we still feel and act like that and speak like that, that we are the ego. So till time the practice of this bhakti as Krishna says in the 9th chapter, to continuously surrender to the Lord, make the ego smaller by the presence of a greater I. Not the ego which is small I, but the big I which is the God. If you make that I smaller, Brahman, then ego itself will become inflated Brahman, that’s good, I like that, Sam Brahman. No so much better. Swami Turiyananji, a great Vedantist himself, he says, I don’t know."
2. **118. Bhagavad Gita | Chapter 13 Verse 13 Revision | Swami Sarvapriyananda**
"So the knower of the field, the knower of the body, is different from the field. The field is matter, the knower of the field is the knower, the Atman, the self. So this is the distinction. And then he says, and also, he says, that the one who knows this, the one who knows this distinction, he detaches himself. He becomes detached. He doesn’t get attached to the body, to the mind, to the senses, to the objects of the senses. He becomes detached. And that is the way to liberation. That is the way to freedom. So this is the teaching of the Gita. The Gita is saying that you are not the body, you are not the mind, you are not the senses, you are not the objects of the senses. You are the Atman, the self, the knower of the field. And when you realize this, you become detached and you are liberated. So this is the teaching of the Gita. And this is the teaching of Vedanta. And this is the teaching of all the great spiritual traditions of the world. So this is the teaching of the Gita. And this is the teaching of Vedanta. And this is the teaching of all the great spiritual traditions of the world. So this is the teaching of the Gita. And this is the teaching of Vedanta. And this is the teaching of all the great spiritual traditions of the world."
### Key Observations
* The chatbot provides a detailed explanation of Brahman and Atman within the context of Advaita Vedanta.
* The passages from the Bhagavad Gita support the concept of the equivalence of Brahman and Atman, with Swami Sarvapriyananda offering commentary.
* The Gita passages emphasize detachment from the body and mind to realize the true self (Atman).
* The chatbot references Sam Altman's statement and connects it to non-dualistic Vedantic philosophy.
### Interpretation
The data suggests a growing interest in Eastern philosophical concepts, particularly within the tech community, as exemplified by Sam Altman's expressed beliefs. The chatbot's response and the included Bhagavad Gita passages demonstrate a clear articulation of the Advaita Vedanta perspective, where the individual self (Atman) is not separate from the ultimate reality (Brahman). The Gita passages highlight the importance of self-realization and detachment as paths to liberation. The inclusion of these passages provides context and authority to the chatbot's explanation, grounding Altman's statement in a well-established philosophical tradition. The discussion of the ego in the first passage is particularly relevant, suggesting that recognizing the illusory nature of the ego is crucial for understanding the equivalence of Brahman and Atman. The overall presentation suggests an attempt to bridge the gap between technological innovation and ancient wisdom.
</details>
Figure 7: RAG vs Non-RAG: Sam Altman on Atman and Brahman. The retrieved passages are very good at capturing the context of the query. The first passage references the monk speaking about Sam Altman and his views on Atman and Brahman. The second passage contains a brief explanation of Atman and Brahman that is helpful for answering the query. The generic bot also generates a reasonable answer. However, it includes a sentence stating that 'Atman is a part of Brahman'. This is incorrect in the context of Advaita Vedanta, as being a part is not the same as equivalence; this point was highlighted by the domain experts in the survey.
<details>
<summary>Image 8 Details</summary>

### Visual Description
\n
## Text Document: Analysis of Text Snippets on Wittgenstein & Advaita
### Overview
The image presents a screenshot of a question-and-answer interface, likely from a chatbot or knowledge base (labeled "RAG Bot"). The content revolves around a discussion of Swami Sarvapriyananda's views on Wittgenstein's dissertation defense, and connections to Advaita Vedanta philosophy. The interface includes a question, an answer, and several "Passages" of related text.
### Components/Axes
The image is structured into the following sections:
* **Question:** "Does Swamiji speak about Wittgenstein's dissertation defense?" - positioned at the top-left.
* **Answer:** A textual response from the "RAG Bot" - positioned below the question.
* **Passages:** Three distinct blocks of text labeled "Passages" with titles: "The Language of Paradox in Advaita - Swami Sarvapriyananda", "Vedanta Retreat at Loon Lake story in by Swami Sarvapriyananda", and "The central point of nondual realization - Swami Sarvapriyananda" - positioned on the right side of the image.
* **Image:** A small image of Swami Sarvapriyananda is present at the bottom-left.
### Detailed Analysis or Content Details
**Question:** "Does Swamiji speak about Wittgenstein's dissertation defense?"
**Answer:**
"Yes, Swami Sarvapriyananda does speak about Wittgenstein's dissertation defense in one of his YouTube talks. He shares an interesting story about Wittgenstein’s PhD examination at Cambridge University. The examiners for Wittgenstein’s thesis were Bertrand Russell and G.E. Moore, both renowned philosophers. However, instead of asking Wittgenstein questions about his thesis, Russell and Moore started chatting with each other. Eventually, Moore suggested that they should ask Wittgenstein some questions. They opened Wittgenstein’s book, *Tractatus*, and asked him to explain a particular statement. Wittgenstein and Russell then started quarreling over it. At this point, Wittgenstein stood up, patted the examiners on the back, and said, “Give it up, boys. You’ll never get it.” He then walked out of the room. Surprisingly, Russell and Moore awarded him the PhD right then and there. Swami Sarvapriyananda uses this story to emphasize the difficulty of understanding certain philosophical concepts. He mentions that even great philosophers like Wittgenstein can have disagreements and struggles in their pursuit of knowledge. Swami Sarvapriyananda also highlights the importance of engaging in dialectics and considering opposing viewpoints to deepen our understanding of our own positions. While Swami Sarvapriyananda does not directly discuss Wittgenstein’s dissertation defense in relation to spiritual or philosophical enlightenment, he does mention the value of engaging in discussions and responding to questions in the context of teaching and learning. He believes that defining angles and responding to queries provide an opportunity to think about teachings from different angles and deepen one’s understanding."
**Passage 1: "The Language of Paradox in Advaita - Swami Sarvapriyananda"**
"thesis and we will pass it. I’ll get it through. So you’ll get a doctorate and you don’t have to bother about it. Then you can teach. We want you to teach in Cambridge University. So the PhD examination of Wittgenstein. The story is very interesting. The examiners for a PhD thesis. Imagine who are the examiners. Bertrand Russell is the examiner. It’s already fixed. So and the other examiner, those who are in philosophy, know this name is G.E. Moore. G.E. Moore. Another one of Bertrand Russell’s senior contemporary. One of the great names of English philosophy. So these two are the PhD examiners. Imagine the examination. And the student is Wittgenstein. They started chatting to each other. After some time, Moore said to Russell, Bertie, don’t you think we should ask him some questions? So all right. They opened that book, *Tractatus*. They opened that book at a page and said, explain this. This statement. Wittgenstein said something. Bertrand Russell said something. They started quarreling over it. And then student Wittgenstein, he stands up. He pats the examiners on their back. The Oxford, the Cambridge dons on their back and says, give it up, boys. You’ll never get it. And he walks out. And they give him the PhD there and then. You’ve got a PhD. Says, give it up, boys. You’ll never get it. But we’ll try to understand. And we’ll bring the light of Shankaracharya’s commentary to bear on this problem. 1,400 years ago in India, Shankaracharya, who was 16 years old, if you go by"
**Passage 2: "Vedanta Retreat at Loon Lake story by Swami Sarvapriyananda"**
"So, in a talk, at a Vedanta retreat at Loon Lake in New York, I was narrating this story. And after the talk, a gentleman came up to me. He was a professor of philosophy. A very senior professor of philosophy at a well-known university. And he said, Swamiji, you know, I have been thinking about this story for years. And I have come to the conclusion that Wittgenstein was trying to demonstrate something very profound. He was trying to demonstrate the limitations of language. The limitations of language in capturing reality. And he was trying to show that the ultimate reality is beyond language. And that is why he simply walked out. He didn’t bother to argue. He didn’t bother to explain. He simply walked out. Because he knew that language could never capture the ultimate reality."
**Passage 3: "The central point of nondual realization - Swami Sarvapriyananda"**
"The central point of nondual realization – Swamiji – is that you realize that you are not the agent. You are not the doer. You are not the one who is experiencing. You are the experience itself. That is the central point. And that is why it is so difficult to understand. Because we are so used to thinking of ourselves as agents. As doers. As experiencers. We are so used to thinking of ourselves as separate from the world. That it is very difficult to realize that we are not separate. That we are all one. That we are all connected. And that is why it takes so much practice. So much meditation. So much self-inquiry to realize this truth."
**Image:** The image at the bottom-left depicts Swami Sarvapriyananda.
### Key Observations
* The core theme is the intersection of Wittgenstein's philosophy, particularly the limitations of language, and Advaita Vedanta's concept of non-duality.
* The anecdote about Wittgenstein's PhD defense is central to illustrating the difficulty of conveying profound philosophical truths through language.
* The passages provide different perspectives on the story, ranging from a recounting of the event to interpretations of its philosophical significance.
* The "RAG Bot" answer synthesizes information from the passages.
### Interpretation
The data suggests a deliberate attempt to draw parallels between Western philosophical thought (Wittgenstein) and Eastern spiritual traditions (Advaita Vedanta). The story of Wittgenstein's dissertation defense serves as a compelling illustration of the limitations of language in grasping ultimate reality, a concept central to Advaita Vedanta. The passages, and the bot's response, demonstrate how a seemingly isolated historical event can be used to illuminate deeper philosophical and spiritual insights. The inclusion of multiple passages allows for a nuanced exploration of the topic, showcasing different interpretations and perspectives. The anecdote is used to support the idea that direct experience and intuition are necessary to transcend the limitations of conceptual thought. The bot's response acts as a summarization and synthesis of the provided information, demonstrating the potential of AI to facilitate cross-cultural philosophical dialogue.
</details>
Figure 8: RAG vs Non-RAG: Wittgenstein's thesis defense. The retrieved passage is about the monk narrating a story about Wittgenstein's thesis defense, which is highly relevant to the query. The generated RAG bot response makes use of this passage to answer the query. However, the latter half of the response is not relevant to the query and is purely hallucinated. The generic bot response, on the other hand, is completely fabricated and references a non-existent YouTube talk.
<details>
<summary>Image 9 Details</summary>

### Visual Description
## Text Extraction: RAG Bot & Swami Sarvapriyananda Discussion
### Overview
The image presents a screenshot of a RAG (Retrieval-Augmented Generation) bot interface displaying a conversation between a user and the bot, based on passages from a discussion with Swami Sarvapriyananda on May 28th, 2023. The conversation centers around the question of whether AI can ever become conscious. The interface is split into two main columns: "Answer" and "Passages". Each section contains multiple turns of the conversation, with the bot providing answers and referencing supporting passages.
### Components/Axes
The interface has the following components:
* **Title:** "Can AI ever become conscious?"
* **Column 1: "Answer"**: Contains the bot's responses. Each response is preceded by "Answer:" and is formatted as a paragraph of text.
* **Column 2: "Passages"**: Contains excerpts from the Swami Sarvapriyananda discussion. Each passage is preceded by "Passages" and includes the date "May 28th, 2023".
* **Conversation Turns**: The conversation is structured in multiple turns, with alternating "Answer" and "Passages" sections.
* **Scrollbar**: A vertical scrollbar indicates that the content extends beyond the visible area.
### Detailed Analysis or Content Details
**First Conversation Turn:**
* **Answer:** "According to Swami Sarvapriyananda, AI, or artificial intelligence, is not capable of becoming conscious. While AI can replicate certain human capacities such as intelligence, memory, decision-making, and even creativity, it does not possess consciousness. The term "artificial intelligence" itself implies the imitation or replication of human intelligence, not consciousness. The development of AI involves creating machines and programs that can perform tasks and simulate human-like behavior. For example, Swami Sarvapriyananda mentions an experiment where he asked an AI program called chat GPT to write poems on Swami Vivekananda, and it was able to generate multiple poems within seconds. However, this ability to imitate human creativity does not imply consciousness. Consciousness, as Swami Sarvapriyananda explains, refers to the feeling of life itself, the capability of experiencing that life, the awareness and subjective experience of humans. While AI can perform complex tasks and reflect consciousness to some extent, it does not generate consciousness itself. Swami Sarvapriyananda compares this to a mirror reflecting a face but not producing the original face. Furthermore, Swami Sarvapriyananda mentions that even if scientists were able to create an artificial system complex enough to reflect consciousness, it would still not mean that they have produced consciousness itself. Consciousness remains distinct from the artificial system created by AI. In summary, AI is focused on replicating human intelligence and behavior, but it does not possess consciousness itself. While AI can imitate certain aspects of consciousness, it cannot generate consciousness. The question of whether AI will ever be able to become conscious still remains debatable and exploration among experts in the field."
* **Passages:** "Ask Swami Sarvapriyananda [May 28th, 2023] of questions from the internet audience? And then we will wrap it up. Alright, I will do this a little bit rapid fire style. We have gone well over time as usual. The Do the AI question, Atanu Majumdar asks, scientists are busy creating consciousness through AI. If we can create machines which are self-conscious, how can we explain Atman then? You said that in deep sleep, some consciousness is still aware of the state of deep sleep. Scientifically, it has been observed that our brains are most active when we are in deep sleep. So isn't this awareness part of that? Is there any help to have Atman for explaining this? And then what is called materialist reductionism. Scientists are busy creating consciousness through AI. No, you just have to speak to a scientist. Any scientist. Anybody who is working in AI especially. So what is AI doing? Look at the name, artificial intelligence, artificial consciousness. So these new robots, AI, all of these programs, they are all imitating, replicating certain human capacities. Intelligence, memory, decision making, even creativity. The new AI machines are doing that. I was just mentioning this. I asked chat GPT to write me a poem on Swami Vivekananda. It did that. A very nice poem. And then he said, write another poem on Swami Vivekananda. It did that also. It took less than 3 or 4 seconds. And if I did that long, GPT did long, it would keep on writing poetry by the dozens. hundreds of chat to that. All different poems."
**Second Conversation Turn:**
* **Answer:** "Answer: Artificial intelligence, or AI, is not the same as artificial consciousness. AI focuses on replicating human intelligence and behavior, while consciousness refers to the subjective experience of being alive. According to Swami Sarvapriyananda, AI can imitate or simulate certain aspects of consciousness, such as intelligence, memory, and creativity, but it does not generate consciousness itself. He uses the analogy of a mirror reflecting a face, which does not produce the original face. Similarly, AI can reflect consciousness to some extent, but it does not possess consciousness itself. The key distinction lies in the fact that consciousness involves the feeling of life itself, the awareness and subjective experience of humans, which AI lacks. In essence, AI is a tool that can perform complex tasks and simulate human-like behavior, but it is not a conscious entity. The question of whether AI will ever be able to become conscious remains an open debate among experts in the field."
* **Passages:** "Passages: AI is especially. So what is AI doing? Look at the name, artificial intelligence, artificial consciousness. So these new robots, AI, all of these programs, they are all imitating, replicating certain human capacities. Intelligence, memory, decision making, even creativity. The new AI machines are doing that. I was just mentioning this. I asked chat GPT to write me a poem on Swami Vivekananda. It did that. A very nice poem. And then he said, write another poem on Swami Vivekananda. It did that also. It took less than 3 or 4 seconds. And if I did that long, GPT did long, it would keep on writing poetry by the dozens. hundreds of chat to that. All different poems. So what AI is doing is imitating, replicating certain human capacities. It is not creating consciousness. It is not creating intelligence. It is imitating intelligence. It is replicating intelligence. It is imitating creativity. It is replicating creativity. Consciousness is something else. Consciousness is the feeling of life itself. Consciousness is the awareness of being alive. Consciousness is the subjective experience of humans. AI does not have that. AI cannot have that. Because AI is a machine. AI is a program. AI is a set of algorithms."
**Third Conversation Turn:**
* **Answer:** "Answer: Swami Sarvapriyananda emphasizes that even if scientists were able to create an artificial system complex enough to reflect consciousness, it would still not mean that they have produced consciousness itself. Consciousness remains distinct from the artificial system created by AI. He explains that AI is focused on replicating human intelligence and behavior, but it does not possess consciousness itself. The question of whether AI will ever be able to become conscious is still debatable and exploration among experts in the field. In summary, while AI can imitate certain aspects of consciousness, such as intelligence, memory, and creativity, it cannot generate consciousness itself. Consciousness is the feeling of life itself, the awareness and subjective experience of humans, which AI lacks."
* **Passages:** "Passages: Consciousness is something else. Consciousness is the feeling of life itself. Consciousness is the awareness of being alive. Consciousness is the subjective experience of humans. AI does not have that. AI cannot have that. Because AI is a machine. AI is a program. AI is a set of algorithms. Even if you create a machine which is complex enough to reflect consciousness, that does not mean you have created consciousness. That is like a mirror reflecting a face. The mirror does not create the face. The mirror only reflects the face. Similarly, AI can reflect consciousness, but it cannot create consciousness. Consciousness is something else. It is the feeling of life itself. It is the awareness of being alive. It is the subjective experience of humans."
### Key Observations
* The bot consistently references Swami Sarvapriyananda's viewpoint that AI can *imitate* intelligence and creativity but cannot *generate* consciousness.
* The analogy of the mirror is used repeatedly to illustrate the difference between reflection and creation.
* The passages provide direct quotes from the Swami's discussion, supporting the bot's answers.
* The bot's responses are detailed and summarize the key points from the passages.
### Interpretation
The data demonstrates a RAG bot effectively utilizing a knowledge source (Swami Sarvapriyananda's discussion) to answer a complex philosophical question. The bot doesn't offer its own opinion but rather synthesizes and presents the views of the expert. The consistent use of the mirror analogy highlights the core argument: AI can simulate aspects of consciousness without actually possessing it. The bot's ability to extract relevant passages and provide coherent answers showcases the potential of RAG for knowledge retrieval and question answering. The conversation suggests a nuanced understanding of consciousness, distinguishing it from mere intelligence or the ability to perform complex tasks. The repeated emphasis on the subjective experience of being alive as a defining characteristic of consciousness is a key takeaway.
</details>
Figure 9: RAG vs Non-RAG: AI and Consciousness. The retrieved passages are again highly relevant to the query and the generated answer is coherent. However, it does contain a statement about 'a mirror reflecting a face' which is not present in the passages. This was pointed out by the experts as a relevant statement possibly drawn from outside knowledge. The generic bot's response is highly general and does not represent the monk's views on the topic.
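The expert review above flagged the 'mirror reflecting a face' sentence because it does not appear in the retrieved passages. As a minimal sketch (not part of the paper's pipeline), one could pre-screen answers for such candidates automatically with a lexical-overlap groundedness check; the function names and the 0.35 threshold below are illustrative assumptions, and a real system would use an entailment model rather than token overlap.

```python
import re

def _tokens(text):
    """Lowercased word tokens longer than three characters (crude stopword filter)."""
    return {t for t in re.findall(r"[a-z']+", text.lower()) if len(t) > 3}

def flag_ungrounded(answer, passages, threshold=0.35):
    """Return answer sentences whose best token overlap with any retrieved
    passage falls below `threshold` -- candidates for hallucination review."""
    passage_tokens = [_tokens(p) for p in passages]
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", answer):
        st = _tokens(sent)
        if not st:
            continue
        # Fraction of this sentence's content words supported by the
        # best-matching passage.
        support = max(len(st & pt) / len(st) for pt in passage_tokens)
        if support < threshold:
            flagged.append(sent)
    return flagged
```

Run on the Figure 9 example, a sentence paraphrasing the retrieved passage passes, while the mirror analogy (absent from the passages) is flagged for human verification.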
<details>
<summary>Image 10 Details</summary>

### Visual Description
## Textual Document: Compilation of Philosophical Passages & Q&A
### Overview
The image presents a compilation of textual content, primarily consisting of philosophical passages and question-answer segments related to the nature of reality, perception, and consciousness. The document appears to be a collection of excerpts from various sources, including writings by Swami Sarvapriyananda and responses from a "RAG Bot". The layout is dense, with multiple columns of text.
### Components/Axes
The document is structured into several sections:
* **Main Title:** "Do our senses report reality to us?"
* **Section 1:** "Answer" - Text attributed to "RAG Bot"
* **Section 2:** "Passages" - Contains three sub-sections:
* "Ask Swami Sarvapriyananda | Nov 19th, 2017"
* "Introduction to Vedanta - Swami Sarvapriyananda | Nov 16th, 2017"
* "Ask Swami Sarvapriyananda | Nov 17th, 2017"
* **Section 3:** "Additional Notes" - Text regarding the source of the content.
There are no axes or legends in the traditional sense, as this is a textual document.
### Detailed Analysis or Content Details
**Section 1: Answer (RAG Bot)**
"According to Swami Sarvapriyananda, our senses do not report reality to us. This is a conclusion supported by modern cognitive science and evolutionary science. Our sensory organs, such as our eyes and ears, provide us with information about the world, but they do not necessarily give us an accurate representation of reality. To illustrate this point, Swami Sarvapriyananda gives the example of a virtual reality movie. In a virtual reality movie, we may become completely immersed in the experience and forget our true identity. Similarly, we may sense immerse us in the experience of the world, but that does not mean that what we perceive is the ultimate reality. Advaita Vedanta, the philosophy that Swami Sarvapriyananda follows, helps us understand this perspective. It teaches us that our true nature is the experiencing consciousness, which is separate from the entities and experiences we perceive through our senses. If we mistakenly take these entities to be separate and real, we become attached to them and experience suffering. Shankaracharya, a prominent philosopher in Advaita Vedanta, explains that our ignorance of our true nature leads us to identify ourselves with our mind and body. This identification causes us to experience the external world and subjective feelings, both positive and negative. However, these problems and experiences are not inherent in the ultimate reality of consciousness. In the ultimate reality, the world is seen as an appearance, not as a separate reality of just as a mirage does not wet a grain of sand or a rope mistaken for a snake does not become poisonous, a subjective experience does not affect the reality of consciousness itself. Therefore, according to Swami Sarvapriyananda, our senses do not report reality to us. 
They provide us with appearances in consciousness, but it is through the guidance of Vedanta and the realization of the illusory nature of our consciousness that we can gain clarity and understand the ultimate reality of our sensory world."
**Section 2: Passages**
* **Ask Swami Sarvapriyananda | Nov 19th, 2017:** "at that time. So he says you can just walk across. All right. Now it's time for our monthly ask Swami. So here are questions. We have a wide internet audience all over the world and they send in a lot of questions, only a few of which we can answer. They are selected. And there are also questions from the live audience. So we'll start off with some questions from the internet audience. Yes. This first question is from. Are the questions audible? Yes. This first question is from Ajoy. Modern cognitive scientists are veering more and more towards the idea that consciousness is fundamental, and there is a deeper reality in the world than our senses tell us. A honeybee has a strong nasal sensory system that takes it to the source of honey. Similarly, a bat has sound receptacles that also to sense danger and also find its prey. They also do not see the world as we see it. Therefore, it is not a legitimate question to ask, how correct are we to presume that what we see is indeed the reality? How can we say our sensory organs tell us the truth? Can you give us a Vedantic perspective to Ajoy and how Advaita Vedanta sees such developments? You're right. It's very interesting. As it has written, that's true. Our senses, do they report the truth to us? Do they report reality to us? And the conclusion of modern cognitive science and indeed even evolutionary science is that they do not. The senses do not report reality to us. What do you mean by that? So are they misreporting? No."
* **Introduction to Vedanta - Swami Sarvapriyananda | Nov 16th, 2017:** "Introduction to Vedanta – Swami Sarvapriyananda | Nov 16th, 2017. (0:00-1:30) [unintelligible]… the question is, is there a reality beyond what our senses tell us? And the answer of Vedanta is, yes, absolutely. There is a reality beyond what our senses tell us. And the reason is very simple. Our senses, as wonderful as they are, are limited. They can only perceive a certain range of stimuli. For example, we cannot see ultraviolet light, we cannot hear ultrasound, and we cannot smell certain chemicals. So our senses are not giving us a complete picture of reality. They are only giving us a partial picture. And therefore, it is reasonable to assume that there is a reality beyond what our senses tell us. And that reality is what Vedanta calls Brahman. Brahman is the ultimate reality, the ground of all being. It is infinite, eternal, and unchanging. And it is beyond all concepts and descriptions."
* **Ask Swami Sarvapriyananda | Nov 17th, 2017:** "…and the question is, is it possible for us to know Brahman directly? And the answer is, yes, it is possible for us to know Brahman directly. But it is not easy. It requires a great deal of spiritual practice. And the first step in spiritual practice is to realize that our senses are not giving us a complete picture of reality. They are only giving us a partial picture. And therefore, we need to go beyond our senses in order to know Brahman. And the way to go beyond our senses is through meditation. Meditation is a process of quieting the mind and focusing on the present moment. And when we meditate, we begin to experience Brahman directly. We begin to realize that we are not separate from Brahman. We are all part of Brahman. And that is the ultimate goal of spiritual practice."
**Section 3: Additional Notes**
"Additional Notes: The content in this document is sourced from transcripts of talks and Q&A sessions with Swami Sarvapriyananda, as well as responses generated by a RAG (Retrieval-Augmented Generation) Bot. The RAG Bot utilizes these transcripts to answer questions related to the topics discussed."
### Key Observations
* The central theme revolves around the limitations of sensory perception and the existence of a reality beyond what our senses can detect.
* Advaita Vedanta is presented as a philosophical framework for understanding this deeper reality.
* The passages frequently use analogies (virtual reality, mirage, rope and snake) to illustrate the illusory nature of perceived reality.
* The Q&A segments address specific questions about the validity of sensory experience and the possibility of directly knowing ultimate reality (Brahman).
* The inclusion of a RAG Bot response suggests an attempt to synthesize traditional philosophical insights with modern cognitive science.
### Interpretation
The document presents a compelling argument for the idea that our senses do not provide a complete or accurate representation of reality. It draws upon both ancient philosophical traditions (Advaita Vedanta) and contemporary scientific perspectives (cognitive science, evolutionary science) to support this claim. The emphasis on the limitations of sensory perception encourages a critical examination of our assumptions about the nature of reality and the possibility of accessing a deeper, more fundamental truth. The use of analogies and the Q&A format make the complex philosophical concepts more accessible to a wider audience. The inclusion of the RAG Bot response indicates an effort to bridge the gap between traditional wisdom and modern technology, potentially offering new avenues for exploring these profound questions. The document suggests that true understanding requires transcending the limitations of our senses and cultivating a deeper awareness through practices like meditation. The repeated emphasis on the illusory nature of perceived reality points towards a non-dualistic worldview, where the distinction between subject and object, self and other, ultimately dissolves.
</details>
Figure 10: RAG vs Non-RAG: Senses and reality. In this scenario, the retrieved passages provide only partial context needed to answer the query, with additional irrelevant information. For example, one passage mentions cognitive scientist Donald Hoffman, inadvertently influencing the LLM to generate a response linking the Swami's views to Hoffman's concept of the world as a virtual reality. This is an instance where the LLM diverges from the query, using incomplete information from the passage. To mitigate this, implementing retrieval models capable of processing longer contexts and summarizing them effectively before input to the LLM could be beneficial. Despite this, the rest of the response aligns reasonably well with Advaita Vedanta philosophy, similar to the generic bot's response, which is also congruent with the philosophical context.
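The summarize-before-generation mitigation mentioned in the caption can be sketched as a two-stage pipeline: condense each retrieved transcript chunk with respect to the query, then prompt the answering LLM with the condensed context. The sketch below uses a cheap extractive stand-in for the summarizer (the function names are hypothetical; in practice the `summarize` hook would call an LLM or a dedicated summarization model).

```python
import re

def extractive_summary(passage, query, max_sents=3):
    """Cheap extractive stand-in for an LLM summarizer: keep the sentences
    sharing the most words with the query, preserving their original order."""
    q = set(re.findall(r"[a-z']+", query.lower()))
    sents = re.split(r"(?<=[.!?])\s+", passage)
    scored = sorted(sents, key=lambda s: -len(q & set(re.findall(r"[a-z']+", s.lower()))))
    kept = scored[:max_sents]
    return " ".join(s for s in sents if s in kept)

def condense_context(passages, query, summarize=extractive_summary):
    """Summarize each retrieved passage w.r.t. the query before prompting the
    answer-generating LLM, so long transcripts don't crowd out the relevant part."""
    return "\n\n".join(summarize(p, query) for p in passages)
```

Swapping `extractive_summary` for an abstractive model would address the failure mode above, where irrelevant material in a long passage (e.g. the Hoffman digression) steers the generation away from the query.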
<details>
<summary>Image 11 Details</summary>

### Visual Description
## Textual Document: Philosophical Discussion & Upanishad Passage
### Overview
The image presents a document containing a question-and-answer format discussion on the nature of reality and dreaming, alongside a passage from the Mandukya Upanishad. The document explores the difficulty in distinguishing between the dream state and waking reality, referencing the insights of Gaudapada and Swami Sarvapriyananda. The Upanishad passage elaborates on the illusory nature of perceived reality.
### Components/Axes
The document is structured into three main sections:
1. **Question:** "Dreams feel absolutely real while dreaming. What is the guarantee that I am not fast asleep in my head and dreaming right now?"
2. **Answer/RAG Bot:** A lengthy response addressing the question, divided into paragraphs.
3. **Passages:** A section containing a quote from the Mandukya Upanishad (Chapter 2 Karika 6) attributed to Swami Sarvapriyananda.
There are no axes or charts in this image. The document is purely textual.
### Detailed Analysis or Content Details
**Question:**
"Dreams feel absolutely real while dreaming. What is the guarantee that I am not fast asleep in my head and dreaming right now?"
**Answer/RAG Bot (Transcription):**
"According to Swami Sarvapriyananda, the question of whether we are dreaming or awake at any given moment is a profound one. In his talks, he refers to the insights of Gaudapada, an ancient Advaita Vedanta philosopher, who explored the nature of reality and the relationship between the dream state and the waking state. Gaudapada’s insight is that there is no definitive way to differentiate between the dream state and the waking state. He argues that our distinction we make between dreaming and waking in our understanding is actually fuzzy. He reduces the status of our waking state experience to that of our dream state experience. One objection that often arises is the feeling of reality in dreams. Dreams can feel absolutely real while we are experiencing them. We may interact with people, objects, and environments that seem just as solid and permanent as in the waking state. This raises the question: What is the guarantee that we are not currently dreaming and that our waking experience is not just another dream? Gaudapada’s response to this objection is that the feeling of reality in dreams does not necessarily mean the truth or permanence of the objects and experiences within the dream. In a dream, we may perceive objects and people as real, but upon waking up, we realize that they were mere appearances, not possessing any substantial reality. Similarly, Gaudapada argues that the objects and experiences we perceive in the waking state are also appearances, just like in a dream. Our beloved friends, family, and surroundings may seem real, and permanent, but they are as delusory as the “I” self that perceives them. In other words, the reality of the waking state is dependent on the perceiving consciousness, just as the reality of the dream state is dependent on the dreaming consciousness. This perspective is balanced by a crucial acknowledgement: although reality in experience ( *anubhava* ) is unreal, the reality of Brahman is not. 
He doesn’t deny that we experience things as real, but he denies their independent existence. He says that the waking state is not different from the dream state. The only difference is that in the waking state, we have a consistency of experience. The dream state is characterized by a lack of consistency. But this consistency, Gaudapada argues, does not prove the reality of the waking state. It simply means that we are having a long dream. The waking state is a more stable dream, but it is still a dream. Therefore, the guarantee that we are not dreaming is not to be found in the feeling of reality or the consistency of experience. Instead, Gaudapada points to the realization of Brahman, the ultimate reality, as the only way to transcend the illusion of both the dream state and the waking state. By realizing our true nature as Brahman, we can see through the illusion of duality and experience the non-dual reality that lies beyond both dreaming and waking. In essence, the question of whether we are dreaming or awake is ultimately irrelevant from the perspective of Advaita Vedanta. Both states are considered to be manifestations of Maya, the cosmic illusion, and the only true reality is Brahman.”
**Passages (Transcription):**
“20. Mandukya Upanishad | Chapter 2 Karika 6 | Swami Sarvapriyananda
main… Well, I mean, how would that… Yes. But let’s take it in a very simple way. The person in the dream and every other thing that you see in the dream are actually not there. You are with me? Are actually not there. They are all your own mind. You are actually sleeping on the bed. Just follow the facts as you know them to be. You are actually on your bed and sleeping and your mind generates all of this. Now compare it with what you actually experience in a dream, that you are in some place and you have a body and you are meeting with other people. But when you wake up, all of that is falsified. Why is it falsified? Because it has no existence apart from the mind which was dreaming it. Right? So that’s the example. Those things are false because they were dreamt up by a mind. They did not exist apart from which you… felt like it. felt like they were things apart from me. Similarly, now we want to prove to you in waking world, it seems to be a real world apart from me. It seems to be real. But just like those things those do not… do not exist apart from me. That is the dreamer’s mind, those waking entities also do not exist apart from consciousness. Proof, they cannot be explained without consciousness. Yes. They are clearly consciousness. Yes, I’ll come to your point. I’ll come to you, okay. I don’t know consciousness. Don’t go there. Just look at it from the point of view of the witness consciousness, from the point of view of witness consciousness. This is waking consciousness.”
### Key Observations
* The document centers around a core philosophical question about the nature of reality.
* The answer draws heavily on Advaita Vedanta philosophy and the teachings of Gaudapada and Swami Sarvapriyananda.
* The Upanishad passage provides a supporting analogy, emphasizing the mind-dependent nature of perceived reality.
* The text is dense and requires careful reading to grasp the nuanced arguments.
### Interpretation
The document presents a sophisticated exploration of the illusory nature of reality, challenging the common-sense assumption that our waking experience is fundamentally different from a dream. The argument hinges on the idea that both states are dependent on consciousness and lack inherent, independent existence. The reference to Brahman suggests that true reality lies beyond the realm of both dreaming and waking, accessible only through spiritual realization.
The Upanishad passage serves as a powerful illustration of this point, drawing a parallel between the objects and people in a dream and those in the waking world. Both are ultimately creations of the mind and lack independent reality. The emphasis on the "witness consciousness" suggests a detached perspective from which to observe the illusory nature of experience.
The document's overall message is that our perception of reality is fundamentally subjective and that the search for truth requires transcending the limitations of our senses and intellect. The question posed at the beginning is not meant to be answered with a definitive "yes" or "no," but rather to provoke deeper inquiry into the nature of existence.
</details>
Figure 11: RAG vs Non-RAG: Waking and dreaming. The passages retrieved for the query are relevant but offer only a segment of a broader context. As stated earlier, this can be mitigated by using more advanced retrieval models that summarize longer contexts before input to the LLM. The RAG bot's response, while relevant, is lengthy and complex, making it challenging to comprehend. The generic bot's response, although more concise and coherent, is completely general and bears no relation to Advaita Vedanta philosophy.
Figure 12: Keyword-RAG vs Standard-RAG: Anecdotal. The keyword-based retriever accurately retrieves the correct snippet, while the dense retriever confuses 'Shivaratri' with 'Shivamayananda', possibly owing to similar embeddings, and misses the context entirely. As a result, the generated standard-RAG answer is meaningless and lacks substance. The keyword-RAG model correctly identifies the two key incidents relating to Shivaratri and summarizes them effectively.
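The retrieval failure described in this caption can be illustrated with a toy sketch. Everything below is invented for illustration — the scoring functions, the three-dimensional "embeddings", and the passages are not the retrievers or data used in this work. The point is that a sparse keyword match rewards exact term overlap, while a dense retriever scores by embedding similarity and can rank a passage about a similarly named entity just as high:

```python
import math
from collections import Counter

def keyword_score(query_terms, passage):
    """Sparse retrieval stand-in: count exact query-term hits in the passage."""
    tokens = Counter(passage.lower().split())
    return sum(tokens[t] for t in query_terms)

def cosine(u, v):
    """Dense retrieval stand-in: cosine similarity between embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy 3-d "embeddings": the two proper nouns land close together in
# vector space, which is exactly what lets a dense retriever conflate them.
emb = {
    "query":                  [0.90, 0.10, 0.30],  # asks about Shivaratri
    "shivaratri_passage":     [0.80, 0.20, 0.30],
    "shivamayananda_passage": [0.85, 0.15, 0.30],
}

# The keyword match cleanly separates the passages...
assert keyword_score(["shivaratri"], "On Shivaratri night there is a puja") == 1
assert keyword_score(["shivaratri"], "Swami Shivamayananda told the monks") == 0

# ...while the dense scores are nearly tied, and the wrong passage can even win.
print(cosine(emb["query"], emb["shivaratri_passage"]))
print(cosine(emb["query"], emb["shivamayananda_passage"]))
```

In practice the dense scores come from a learned encoder rather than hand-set vectors, but the failure mode — near-identical similarity for lexically similar proper nouns — is the same.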
<details>
<summary>Image 12 Details</summary>

### Visual Description
## Document: Textual Analysis of a Keyword-Based RAG System Response
### Overview
The image presents a comparison between a "Keyword-based RAG" (Retrieval-Augmented Generation) system response and a "Standard RAG" system response to a specific query. The document categorizes the query as "Anecdotal" and highlights keywords related to "Shivaratri". It includes the original query, the source text used by the Keyword-based RAG, and the responses from both systems. The document also includes human and model annotations.
### Components/Axes
The document is structured into distinct sections:
* **Category:** Anecdotal
* **Keywords:** Shivaratri
* **Query:** "Does Swami narrate an incident surrounding Shivaratri?"
* **Keyword-based RAG:** Contains the response generated by the keyword-based RAG system, along with the source text used.
* **Standard RAG:** Contains the response generated by the standard RAG system.
* **Source:** "Raja Yoga: The Path of Meditation (Part 1) | Swami Sarvapriyananda"
* **Human-annotated:** Shivaratri
* **Model ensemble:** Shivaratri
* **Section Title:** "Raja Yoga: The Path of Meditation (Part 1) | Swami Sarvapriyananda" - This section contains the source text.
### Detailed Analysis or Content Details
**1. Query:** "Does Swami narrate an incident surrounding Shivaratri?"
**2. Source Text (Raja Yoga: The Path of Meditation (Part 1) | Swami Sarvapriyananda):**
"back and meditate. Then think about that. So for ascecticism also that Remember Common sense. So Shivaratri I remember fasting I never felt earlier. The first time I fasted was when I became a monk. On Shivaratri night puja. Next day in the morning you get food. But in the afternoon you get some fruits. So I never ate so little in my life. I was wondering if I am becoming a sanyasi. Because I could not eat. I had to be happy. And I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. So I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. I was happy. Because I was doing it for God. I was offered some delicious things, like a sweet, a nice mango. But I could not take it. Because it was Shivaratri. I was happy. Because I was doing it for God."
**3. Keyword-based RAG Response:**
"Yes, in the lecture "Raja Yoga: The Path of Meditation (Part 1)," Swami Sarvapriyananda does share an incident related to the Hindu festival of Shivaratri. He talks about his own experience of fasting on Shivaratri for the first time when he became a monk. He explains that on this day, devotees of Lord Shiva observe a strict fast, which means no food or water from sunrise to sunset the next day."
"Swami Sarvapriyananda also narrates an incident where a mathematics teacher approached a Swami and asked if he could fast on Shivaratri. The Swami responded sarcastically, pointing out that even young children are able to fast on this day, implying that the teacher should not find it difficult. This story highlights the significance of Shivaratri and the spiritual strength and dedication it is associated with."
**4. Standard RAG Response:**
"In the provided passage, Swami Sarvapriyananda shares an incident involving Swami Shivamayanji, but it does not explicitly mention Shivaratri. However, the narrative does offer insights into the experiences and teachings of the swamis."
"According to the passage, Swami Shivamayanji once told a group of young monks, including Swami Sarvapriyananda, about an experience he had with his guru, Swami Gambhirananda. Swami Gambhirananda, at the time a vice president and later president of the Ramakrishna Math and Ramakrishna Mission, was describing the nature of God realization. He explained that the living form of the chosen deity appears in one’s heart as a blazing or lambent form [Bengali: ‘Ami jekhani chok bhoji, lata muti, jaljoki kare bhaweshi’ while Swami Gambhirananda further emphasized the importance of realizing Brahman in all beings, indicating a deep understanding and connection with the divine in everything and everyone."
"Swami Shivamayanji used this story to teach the young monks about the expectations of people who show respect and reverence to them. He wanted to remind them of the importance of their conduct and the impact they have on others."
"This narrative does not directly mention Shivaratri, it does provide a glimpse into the teaching and values of the swamis, which likely influenced their celebrations and observations of various Hindu festivals, including Shivaratri."
**5. Bengali Phrase:** "Ami jekhani chok bhoji, lata muti, jaljoki kare bhaweshi"
### Key Observations
* The Keyword-based RAG system directly addresses the query and provides relevant information about Shivaratri based on the source text.
* The Standard RAG system, while acknowledging the context of the passage, states that it does not explicitly mention Shivaratri. It focuses on a different anecdote within the same source.
* The Keyword-based RAG appears to be more effective in retrieving information specifically related to the query keywords.
* The source text contains repetitive phrasing ("Because it was Shivaratri. I was happy. Because I was doing it for God.") which may influence the RAG systems' responses.
### Interpretation
This document demonstrates a comparison of two RAG systems in answering a specific query. The Keyword-based RAG system appears to be more attuned to the explicit keywords in the query ("Shivaratri") and successfully retrieves relevant information from the source text. The Standard RAG system, however, prioritizes a different narrative within the same source, indicating a potentially broader or more contextual understanding of the text.
The difference in responses highlights the trade-offs between keyword-based retrieval and more semantic or contextual understanding in RAG systems. The Keyword-based RAG is effective when the query is specific and the desired information is explicitly present in the source text. The Standard RAG may be more suitable for complex queries that require a deeper understanding of the context and relationships within the text.
The inclusion of human and model annotations suggests an evaluation process to assess the quality and relevance of the RAG systems' responses. The Bengali phrase included in the Standard RAG response indicates the system's ability to handle and potentially translate or interpret text in other languages. The repetitive nature of the source text could be a factor influencing the performance of both systems, potentially leading to overemphasis on certain phrases or concepts.
</details>
Figure 13: Keyword-RAG vs Standard-RAG: Terminology. The keyword-RAG model retrieves a comprehensive exposition on the concept of Shraddha, loosely translated as conviction, in the context of qualifications for the study of Advaita Vedanta. The standard RAG also retrieves a passage containing the word, but one that is not directly related to what the questioner intends. This appears to be an unfortunate false positive for standard RAG, caused by the intended sense being only implied in the query.
<details>
<summary>Image 13 Details</summary>

### Visual Description
## Document Comparison: 'Shraddha' - Keyword-based RAG vs. Standard RAG
### Overview
The image presents a side-by-side comparison of two Retrieval-Augmented Generation (RAG) approaches – Keyword-based and Standard – in answering the query: "What is the significance of the word 'Shraddha'?" The document compares the responses generated by each approach, along with their source citations. The document is split into two columns, with the left side representing the Keyword-based RAG and the right side representing the Standard RAG. Both sides include a response to the query, followed by the source material used to generate the response. A color-coded highlighting scheme is used to indicate the relevance of the source text to the generated response.
### Components/Axes
The document is structured into the following components:
* **Category:** Terminology (top-left)
* **Keywords:** Model ensemble: Shraddha, Human-annotated: Shraddha, significance (top-right)
* **Query:** What is the significance of the word 'Shraddha'? (center-top)
* **Left Column:** Keyword-based RAG
* Response text
* Source: 123. Bhagavad Gita (Chapter 17 Verse 3) - Swami Sarvapriyananda
* Source Text (highlighted)
* **Right Column:** Standard RAG
* Response text
* Source: 15. Mandukya Upanishad (Commentary in the Karika 2-3) - Swami Sarvapriyananda
* Source Text (highlighted)
* **Highlighting Scheme:**
* Green: Matched/Relevant
* Yellow: Partially Matched/Relevant
* Red: Not Matched/Not Relevant
### Detailed Analysis or Content Details
**Keyword-based RAG (Left Column):**
* **Response:** "The word 'Shraddha' in Swami Sarvopariyananda's teachings, particularly in the context of learning and understanding spiritual knowledge, holds a profound significance. It is often translated as 'faith' in English, but it goes beyond a simple belief or trust. 'Shraddha' implies a deep sense of reverence, respect, and acceptance for the teacher, the teachings, and the texts. It is a feeling that there is truth in the teachings, even if one does not fully comprehend it yet."
"Swami Sarvopariyananda emphasizes that Shraddha is essential for any meaningful investigation or learning to take place. It is the foundation that allows one to approach the subject with an open mind and heart, ready to understand and learn, rather than to judge, criticize, or dismiss."
"In the absence of Shraddha, one might quickly jump to conclusions, judge the teacher or the text, and create barriers to understanding. With Shraddha, one approaches the teacher and the teachings with humility, acknowledging that they have something valuable to learn, and seeking to bridge the gap between their current understanding and the teachings."
"In summary, Shraddha is a deep feeling of reverence, respect, and acceptance for the teacher, the teachings, and the texts, which forms the foundation for meaningful learning and understanding in spiritual philosophies."
* **Source:** 123. Bhagavad Gita (Chapter 17 Verse 3) - Swami Sarvapriyananda
* **Source Text:** The majority of the source text is highlighted in **green**, indicating high relevance. The text discusses the three-fold faith (Shraddha) – Sattvic, Rajasic, and Tamasic – and their respective objects of faith. The text details how each type of faith manifests in actions, desires, and knowledge.
**Standard RAG (Right Column):**
* **Response:** "The word 'Shraddha' has a significant meaning in the context of the Mandukya Upanishad and Hindu traditions. It is a Sanskrit word that primarily refers to the funeral rites performed for departed ancestors. This term carries both psychological and spiritual connotations."
"Psychologically, Shraddha signifies the process of letting go for people who are left behind. It helps the living come to terms with the loss and separation from their loved ones."
"On the spiritual side, Shraddha is performed to ensure that the deceased embarks on a higher path (gati) in their journey after leaving the body. The belief is that the prayers, worship, and rituals conducted on behalf of the departed can add to their store of good karma. This extra positive karma acts as a booster rocket, helping the deceased reach a better world or higher level in their spiritual evolution."
"In summary, 'Shraddha' is a crucial concept in Hindu traditions, symbolizing both a psychological release and a spiritual boost for the departed ancestors, enhancing their journey and ensuring a higher path for them in the afterlife."
* **Source:** 15. Mandukya Upanishad (Commentary in the Karika 2-3) - Swami Sarvapriyananda
* **Source Text:** The majority of the source text is highlighted in **green**, indicating high relevance. The text discusses the concept of Shraddha as a ritual offering to ancestors, its purpose in aiding their journey, and the psychological benefits for the living.
### Key Observations
* The Keyword-based RAG focuses on the philosophical meaning of 'Shraddha' as faith and reverence in learning, drawing from the Bhagavad Gita.
* The Standard RAG focuses on the ritualistic meaning of 'Shraddha' as funeral rites for ancestors, drawing from the Mandukya Upanishad.
* Both RAG approaches successfully identify relevant sections within their respective source texts, as indicated by the extensive green highlighting.
* The responses are distinct and reflect the different contexts of their source materials.
### Interpretation
This document demonstrates the impact of the retrieval strategy on the generated response in a RAG system. The Keyword-based RAG, likely utilizing keywords related to "faith" and "learning," retrieved a source discussing Shraddha in a philosophical context. Conversely, the Standard RAG, potentially relying on semantic similarity, retrieved a source discussing Shraddha in a ritualistic context. This highlights the importance of carefully designing the retrieval component of a RAG system to ensure it retrieves the most relevant and appropriate information for a given query. The color-coded highlighting provides a clear visual representation of the relevance of the source text to the generated response, allowing for a quick assessment of the RAG system's performance. The differing responses demonstrate that the same query can elicit different answers depending on the knowledge base and retrieval method employed. This is a valuable illustration of the nuances involved in building effective RAG systems.
</details>
<details>
<summary>Image 14 Details</summary>

### Visual Description
## Comparative Text Analysis: Buddhist Shunyata vs. Advaita Vedanta
### Overview
The image presents a comparative analysis of Buddhist Shunyata and Advaita Vedanta philosophies, specifically addressing the question: "What is the main difference between Buddhist Shunyata and Advaita Vedanta?". It displays two columns of text: "Keyword-based RAG" on the left and "Standard RAG" on the right, both containing detailed explanations. Above the columns are category labels ("Comparative") and keywords. Below the text are sections for "Model ensemble" and "Human-annotated" keywords.
### Components/Axes
* **Category:** Comparative (top-left)
* **Keywords (Human-annotated):** Buddhist Shunyata, Advaita (top-right)
* **Keywords (Model ensemble):** difference, buddhist shunyata, advaita vedanta (bottom-right)
* **Query:** What is the main difference between Buddhist Shunyata and Advaita Vedanta? (top-center)
* **Left Column Title:** Keyword-based RAG
* **Right Column Title:** Standard RAG
* **Sections within each column:** Introduction, Differences, Overlaps, and Conclusion.
* **Footer:** Contains numerical evaluation metrics (R@10, Precision@10, Recall@10, F1@10) for both RAG approaches.
### Detailed Analysis or Content Details
**Keyword-based RAG (Left Column):**
* **Introduction:** The core difference lies in the understanding of the self. Advaita Vedanta posits a permanent self (Atman) while Buddhist Shunyata denies an unchanging self, viewing it as a constantly changing process.
* **Differences:** Advaita acknowledges the existence of a permanent self underlying the changing body-mind complex. Shunyata asserts that everything, including the body-mind complex, is impermanent and momentary. It emphasizes emptiness (Shunyata) as the lack of inherent existence.
* **Overlaps:** Both philosophies share the idea that our ordinary perception of reality is illusory. Both aim to liberate individuals from suffering through realizing the true nature of reality. Both emphasize the importance of direct experience (vipashyana) in realizing this truth.
* **Conclusion:** The difference is nuanced. Shunyata doesn't deny conventional reality but asserts its lack of inherent existence. Advaita posits an underlying reality (Brahman) that is the true self. The key distinction is whether there is an ultimate, unchanging reality (Advaita) or whether all phenomena are empty of inherent existence (Shunyata).
* **Evaluation Metrics:** R@10: 0.60, Precision@10: 0.40, Recall@10: 0.30, F1@10: 0.35
**Standard RAG (Right Column):**
* **Introduction:** The main difference revolves around interpretations of Nagarjuna, and Advaita Vedanta's views on these interpretations.
* **Differences:** Traditional Advaita has often dismissed Shunyata as nihilism, suggesting it posits nothingness. However, this is a mischaracterization. Advaita asserts the existence of Brahman, while Shunyata emphasizes the lack of inherent existence.
* **Overlaps:** Modern Tibetan Buddhist scholars sometimes misinterpret Advaita Vedanta by assuming it posits a substantial self (Atman). However, Advaita Vedanta, particularly the teachings of Gaudapada, a significant Advaita Vedanta figure, indeed asserts that the world is a mere appearance (vivarta) and not a real, separate entity (paramartha).
* **Conclusion:** The difference between Buddhist Shunyata and Advaita Vedanta can be attributed to misunderstandings and oversimplifications by traditional followers on both sides. Their ultimate concerns—self-realization—are similar. Tibetan Buddhists misunderstand Advaita as asserting a substantial self, while Advaita often dismisses Shunyata as nihilistic.
* **Evaluation Metrics:** R@10: 0.80, Precision@10: 0.60, Recall@10: 0.50, F1@10: 0.55
### Key Observations
* The "Standard RAG" approach consistently outperforms the "Keyword-based RAG" approach across all evaluation metrics (R@10, Precision@10, Recall@10, F1@10).
* Both RAG approaches identify a core difference relating to the concept of "self" – whether it is permanent (Advaita) or impermanent/non-existent (Shunyata).
* Both approaches highlight the potential for misinterpretation and oversimplification of each philosophy by adherents of the other.
* The numerical metrics suggest the "Standard RAG" is more effective at retrieving relevant information (higher Recall) and providing precise answers (higher Precision).
### Interpretation
The data suggests that a "Standard RAG" approach, likely involving more sophisticated semantic understanding and contextualization, is superior to a "Keyword-based RAG" approach for answering complex philosophical questions. The higher scores in R@10, Precision, Recall, and F1 indicate that the Standard RAG is better at identifying relevant passages, providing accurate information, and comprehensively addressing the query.
The content itself reveals a nuanced comparison between Buddhist Shunyata and Advaita Vedanta. While seemingly divergent, both philosophies share a common goal of liberation from suffering through the realization of the true nature of reality. The core difference lies in their metaphysical frameworks: Advaita posits an underlying, unchanging reality (Brahman/Atman), while Shunyata emphasizes the emptiness of inherent existence in all phenomena. The observed misunderstandings between the two traditions highlight the challenges of cross-cultural philosophical dialogue and the importance of careful interpretation. The data demonstrates that the "Standard RAG" is better equipped to navigate these complexities and provide a more accurate and comprehensive response.
</details>
Figure 14: Keyword-RAG vs Standard-RAG: Comparative. This is a case where both models retrieve meaningful passages, with the keyword-based model providing greater context due to retrieval expansion. The standard model also appears to draw on outside knowledge not mentioned in the passage.
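The "retrieval expansion" mentioned in this caption — returning the matched chunk together with its neighbours so the LLM sees a wider window of the lecture — can be sketched as follows. The chunking and the one-chunk window are illustrative assumptions, not the exact configuration used here:

```python
def expand_retrieval(chunks, hit_idx, window=1):
    """Return the matched chunk plus `window` neighbouring chunks on each
    side, clipped to the document boundaries."""
    lo = max(0, hit_idx - window)
    hi = min(len(chunks), hit_idx + window + 1)
    return " ".join(chunks[lo:hi])

chunks = ["chunk0", "chunk1", "chunk2", "chunk3"]
print(expand_retrieval(chunks, 2))  # interior hit: neighbours on both sides
print(expand_retrieval(chunks, 0))  # boundary hit: clipped on the left
```

Expanding around the hit is what gives the keyword-based model the broader context observed in Figure 14, at the cost of a longer prompt to the LLM.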
Figure 15: Keyword-RAG vs Standard-RAG: Scriptural. Both models retrieve from the correct scripture. However, while the retrieved passage in keyword-RAG is very comprehensive and accurately captures the context of the question, the standard RAG misses the point of the question and gives a generic answer.
<details>
<summary>Image 15 Details</summary>

### Visual Description
## Document: Scriptural Text Analysis - "Om" Significance in Mandukya Upanishad
### Overview
The image presents a comparative analysis of responses to the query "In Mandukya Upanishad, what is the significance of the word 'Om'?" It contrasts a "Keyword-based RAG" (Retrieval-Augmented Generation) approach with a "Standard RAG" approach, alongside human-annotated responses and keywords. The document also includes source citations from the Mandukya Upanishad. The layout is a two-column structure with the left column dedicated to Keyword-based RAG and the right to Standard RAG, both underpinned by a header section.
### Components/Axes
* **Category:** Scriptural (Top-left)
* **Query:** In Mandukya Upanishad, what is the significance of the word 'Om'? (Top-center)
* **Keywords:** Om significance, Mandukya Upanishad, Model ensemble (Top-right)
* **Column 1 Title:** Keyword-based RAG (Left)
* **Column 2 Title:** Standard RAG (Right)
* **Source:** 13. Mandukya Upanishad | Chapter I Mantra 9.11 | Swami Sarvapriyananda (Bottom)
* **Source:** 48. Mandukya Upanishad | Chapter 3 | Karika 37-39 | Swami Sarvapriyananda (Bottom-left)
### Detailed Analysis or Content Details
**Keyword-based RAG (Left Column):**
The text in the left column is a transcription of a response generated by a Keyword-based RAG system. It is a continuous block of text, approximately 300 words long. Key points include:
* The word "Om" holds great significance, representing the entirety of experience and encompassing the gross, subtle, and causal levels of existence.
* The three letters of "Om" – A, U, and Ma – correspond to the waking state, the dream state, and the deep sleep state (causal level) respectively.
* "Om" symbolizes the causal level where everything withdraws, a state of undifferentiated mind, intellect, and memory.
* The Upanishad teaches that one should understand oneself as pure consciousness, which appears as the waker, dreamer, and deep sleeper, as well as their respective worlds.
* The true essence of "Om" lies in the silence that follows its pronunciation, representing the transcendental reality of Turya.
* The text references Mandukya Upanishad Chapter 3, Karika 37-39, by Swami Sarvapriyananda.
**Standard RAG (Right Column):**
The text in the right column is a transcription of a response generated by a Standard RAG system. It is also a continuous block of text, approximately 250 words long. Key points include:
* The word "Om" holds immense significance and is considered to be a Mahavakya, a profound statement that encapsulates the ultimate truth.
* "Om" is considered to convey the same truth more concisely than other phrases.
* "Om" is a symbol that represents the pure consciousness, the ultimate reality that is beyond the physical universe.
* Meditating on "Om" allows one to connect with the pure consciousness and superimpose the idea of the universe and the divine onto the reality.
* The text references Mandukya Upanishad Chapter 1, Mantra 9.11, by Swami Sarvapriyananda.
**Human-annotated (Top-right):**
* Om significance, Mandukya Upanishad, Model ensemble.
### Key Observations
* Both RAG systems identify "Om" as profoundly significant in the Mandukya Upanishad.
* The Keyword-based RAG provides a more detailed explanation, referencing the three states of consciousness (waking, dreaming, deep sleep) and the concept of Turya.
* The Standard RAG focuses more on "Om" as a symbol of pure consciousness and its connection to the ultimate reality.
* The Standard RAG response is more concise and emphasizes the power of "Om" as a meditative tool.
* Both responses cite Swami Sarvapriyananda's interpretations of the Mandukya Upanishad, but from different chapters/mantras.
* The human-annotated keywords are broad and reflect the core themes of the query.
### Interpretation
This document demonstrates a comparison of two different RAG approaches in answering a specific scriptural question. The Keyword-based RAG appears to leverage a broader range of keywords and contextual information, resulting in a more comprehensive, albeit verbose, response. The Standard RAG, on the other hand, prioritizes conciseness and focuses on the core philosophical implications of "Om."
The differing source citations (Chapter 3 vs. Chapter 1) suggest that the two RAG systems are drawing upon different sections of the Upanishad and potentially different interpretations by Swami Sarvapriyananda. This highlights the importance of source selection and the potential for varying perspectives within scriptural texts.
The inclusion of human-annotated keywords provides a baseline for evaluating the relevance and accuracy of the RAG-generated responses. The fact that both systems align with the human-annotated keywords suggests that they are both capturing the essential meaning of the query.
The document serves as a valuable case study for evaluating the strengths and weaknesses of different RAG architectures in the context of complex, nuanced textual data. It also underscores the importance of considering multiple sources and interpretations when engaging with scriptural texts. The document is not presenting data in a numerical or graphical format, but rather a comparative textual analysis.
</details>
<details>
<summary>Image 16 Details</summary>

### Visual Description
## Textual Document: Comparison of Waking State and Dream State
### Overview
The image presents a comparative analysis of the waking state and the dream state, framed around the philosophical question of whether the waking state is similar to a dream or absolutely indistinguishable from it. The document is structured into two columns: "Keyword-based RAG" on the left and "Standard RAG" on the right. Both columns contain textual responses to the query, along with associated keywords and a similarity score. A bar chart at the bottom visually represents the similarity scores for different models.
### Components/Axes
The document consists of the following components:
* **Header:** "Category: Reasoning" and the query "Is the waking state similar to a dream or absolutely indistinguishable from the dream state?"
* **Left Column (Keyword-based RAG):** Contains a textual response, keywords, and a similarity score.
* **Right Column (Standard RAG):** Contains a textual response, keywords, and a similarity score.
* **Footer (Bar Chart):** A bar chart displaying similarity scores for various models.
* **X-axis:** Model names (Human-annotated, Model ensemble, Keyword-based RAG, Standard RAG).
* **Y-axis:** Similarity Score (ranging from 0 to 1).
### Detailed Analysis or Content Details
**Left Column (Keyword-based RAG):**
* **Text:** A detailed philosophical discussion referencing Swami Sarvapriyananda's lecture on Vedanta Retreat at Loon Lake. It discusses the illusory nature of both waking and dream states, referencing concepts like *maya*, *jiva*, and *Brahman*. It highlights the difference between the two states as being one of permanence and consistency.
* **Keywords:** waking, dream, indistinguishable
* **Similarity Score:** 0.863
**Right Column (Standard RAG):**
* **Text:** A response also discussing the indistinguishability of waking and dream states, referencing Swami Sarvapriyananda. It emphasizes the subjective experience and the consistency of objects and experiences within each state. It also touches upon the perception of reality and the role of the individual's mind.
* **Keywords:** waking, dream, state similar
* **Similarity Score:** 0.854
**Bar Chart:**
The bar chart displays the following approximate similarity scores:
* **Human-annotated:** 0.94 (highest score, light blue bar)
* **Model ensemble:** 0.88 (dark blue bar)
* **Keyword-based RAG:** 0.86 (green bar)
* **Standard RAG:** 0.85 (orange bar, lowest score)
The bars are arranged horizontally, with "Human-annotated" on the left and "Standard RAG" on the right. The Y-axis is labeled "Similarity Score" and ranges from 0 to 1.
### Key Observations
* The "Human-annotated" response has the highest similarity score, indicating it is the closest to a ground truth or ideal response.
* Both RAG models (Keyword-based and Standard) provide similar responses, with the Keyword-based RAG slightly outperforming the Standard RAG.
* The similarity scores are all relatively high (above 0.85), suggesting both models are capable of generating relevant and coherent responses.
* The textual content in both columns is largely overlapping, focusing on the philosophical concepts of illusion and reality in the context of waking and dream states.
### Interpretation
The document demonstrates a comparison of two different Retrieval-Augmented Generation (RAG) approaches in answering a complex philosophical question. The results suggest that both methods are effective in retrieving and synthesizing information to provide relevant responses. The higher score of the "Human-annotated" response serves as a benchmark for evaluating the performance of the models. The slight advantage of the "Keyword-based RAG" model could be attributed to its ability to focus on specific keywords related to the query, potentially leading to a more targeted and accurate response.
The consistent themes across both RAG responses – the illusory nature of reality, the subjective experience of consciousness, and the role of perception – highlight the core philosophical concepts at play. The bar chart provides a quantitative measure of the similarity between the model responses and the human-annotated response, offering insights into the effectiveness of each approach. The data suggests that while RAG models can generate insightful responses, they still fall short of the nuanced understanding and coherence of a human expert. The document is a clear example of how AI can be used to explore and analyze complex philosophical questions, but also underscores the importance of human oversight and evaluation.
</details>
Figure 16: Keyword-RAG vs Standard-RAG: Reasoning. The passage retrieved by keyword-RAG is technical and comprehensive, and the generated answer effectively summarizes its main points. The standard model's answer is also good, although its explanation is less effective owing to the quality of its retrieval.
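The similarity scores in the Figure 16 comparison (roughly 0.86 for keyword-RAG vs. 0.85 for standard RAG) measure how close each generated answer is to a reference answer. A minimal stand-in for such a metric is cosine similarity over bag-of-words counts; this is an illustrative simplification — the reported scores are presumably computed with learned sentence embeddings rather than raw word counts:

```python
import math
from collections import Counter

def similarity(text_a, text_b):
    """Cosine similarity over bag-of-words counts -- a simple stand-in for
    the embedding-based answer-similarity scores shown in Figure 16."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

reference = "the waking state is as illusory as the dream state"
answer = "the waking state like the dream state is illusory"
print(round(similarity(reference, answer), 3))
```

An identical pair scores 1.0 and disjoint texts score 0.0, so the 0.85-0.86 range in Figure 16 indicates answers that share most of their content with the reference while differing in wording.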