This repository includes 100+ RAG (Retrieval-Augmented Generation) interview questions with answers.

- 📗 LLM Interview Questions and Answers Hub - 100+ LLM interview questions and answers.
- 🚀 Prompt Engineering Techniques Hub - 25+ prompt engineering techniques with LangChain implementations.
- 👨🏻‍💻 LLM Engineer Toolkit - Category-wise collection of 120+ LLM, RAG, and Agent-related libraries.
- 🩸 LLM, RAG and Agents Survey Papers Collection - Category-wise collection of 200+ survey papers.

| # | Question | Answer |
|---|---|---|
| 1 | Explain the need for RAG when LLMs are already powerful. | link |
| 2 | Is RAG still relevant in the era of long context LLMs? | link |
| 3 | What are the fundamental challenges of RAG systems? | link |
| 4 | What are effective strategies to reduce latency in RAG systems? | link |
| 5 | Explain R, A, and G in RAG. | link |
| 6 | How does RAG help reduce hallucinations in LLM-generated responses? | link |
| 7 | Why is re-ranking important in the RAG pipeline after initial document retrieval? | link |
| 8 | What is the purpose of character overlap during chunking in a RAG pipeline? | link |
| 9 | What role does cosine similarity play in relevant chunk retrieval within a RAG pipeline? | link |
| 10 | Can you give examples of real-world applications where RAG systems have demonstrated value? | link |
| 11 | Explain the steps in the indexing process in a RAG pipeline. | link |
| 12 | Explain the importance of chunking in RAG. | link |
| 13 | How do you choose the chunk size for a RAG system? | link |
| 14 | What are the potential consequences of having chunks that are too large versus chunks that are too small? | link |
| 15 | Explain the retrieval process step-by-step in a RAG pipeline. | link |
| 16 | What are the key considerations when choosing an LLM for a RAG system? | link |
| 17 | How is the prompt provided to the LLM in a RAG system different from a standard, non-RAG prompt? | link |
| 18 | What are the key hyperparameters in a RAG pipeline? | link |
| 19 | What are the popular frameworks to implement a RAG system? Justify your choice of framework. | link |
| 20 | Explain the influence of LLM context window size on RAG hyperparameters. | link |
| 21 | How do you choose values for various LLM inference hyperparameters in a RAG system? | link |
| 22 | Compare reasoning vs. non-reasoning LLMs for RAG systems. | link |
| 23 | What happens with a weak generator LLM in a RAG system? | link |
| 24 | How do you handle ambiguous or vague user queries in RAG systems? | link |
| 25 | What are the different query transformation techniques that enhance user queries in RAG? | link |
| 26 | What are the pros and cons of query transformation techniques? | link |
| 27 | Explain how the HyDE query transformation technique works. | link |
| 28 | Explain how the HyPE technique works in RAG. | link |
| 29 | Compare HyPE and HyDE techniques in RAG. | link |
| 30 | To minimize RAG system latency, which pre-retrieval enhancement technique would you choose? | link |
| 31 | What are the different chunk enhancement techniques in RAG? | link |
| 32 | What are the pros and cons of chunk enhancement techniques in RAG? | link |
| 33 | Explain how the contextual chunk header technique enhances RAG retrieval. | link |
| 34 | What are some common chunking methods used in RAG? | link |
| 35 | What are the criteria to choose a specific chunking method in RAG? | link |
| 36 | Explain the pros and cons of semantic chunking. | link |
| 37 | How does the chunking strategy differ when dealing with structured documents (like PDFs with tables and figures) versus plain text documents? | link |
| 38 | What are the possible reasons for the poor performance of a RAG retriever? | link |
| 39 | What happens with a weak retriever in Retrieval-Augmented Generation (RAG) systems? | link |
| 40 | What are the common retrieval approaches used in RAG systems? | link |
| 41 | What are some common challenges in RAG retrieval? | link |
| 42 | What are the key metrics for evaluating retrieval quality in RAG? | link |
| 43 | What are embeddings, and how are they utilized in RAG retrieval? | link |
| 44 | What are the key considerations when choosing an embedding model for a RAG system? | link |
| 45 | What is a VectorDB, and how is it utilized in RAG retrieval? | link |
| 46 | Explain the role of ANN (Approximate Nearest Neighbor) search algorithms in RAG retrieval. | link |
| 47 | Explain the step-by-step working of ANN algorithms for fast search in RAG retrieval. | link |
| 48 | What are the typical distance metrics used for similarity search in vector databases, and why are they chosen? | link |
| 49 | Explain why cosine similarity is preferred over other distance metrics in RAG retrieval. | link |
| 50 | Compare keyword-based retrieval and semantic retrieval in RAG systems. | link |
| 51 | How does hybrid search work in the context of RAG retrieval? | link |
| 52 | When do you opt for hybrid search instead of semantic search? | link |
| 53 | How do you balance relevance and diversity when retrieving document chunks for RAG? | link |
| 54 | How do sparse embeddings differ from dense embeddings in terms of keyword matching and retrieval interpretability? | link |
| 55 | How can fine-tuning embedding models improve the retriever’s performance in RAG? | link |
| 56 | Design a retrieval strategy for a RAG system that needs to handle both structured data (knowledge graphs) and unstructured data (text documents) simultaneously. | link |
| 57 | Discuss the strategies to scale embeddings in RAG retrieval. | link |
| 58 | What advantages does quantization offer over dimensionality reduction for scaling embeddings? | link |
| 59 | Explain the pros and cons of quantized embeddings in RAG retrieval. | link |
| 60 | Compare scalar and binary quantization for embeddings in RAG retrieval. | link |
| 61 | How does re-ranking differ from the initial retrieval process in RAG? | link |
| 62 | Explain the pros and cons of using re-rankers in RAG. | link |
| 63 | What are the different types of re-ranker models that can be used in RAG? | link |
| 64 | Compare general re-rankers and instruction-following re-rankers in RAG. | link |
| 65 | Why is the cross-encoder typically used as the re-ranker rather than the bi-encoder? | link |
| 66 | A RAG system retrieves 20 candidate document chunks but can only fit 5 in the LLM's context window. Without re-ranking, how might this limitation affect response quality, and what specific problems would a re-ranker solve? | link |
| 67 | Describe a scenario where a BM25 retrieval might return relevant chunks but in poor ranking order. How would a neural re-ranker specifically address this limitation? | link |
| 68 | If your RAG system serves both simple factual queries and complex analytical questions, how would you decide when to bypass the re-ranker for efficiency while maintaining quality? | link |
| 69 | Describe the vector pre-computation and storage strategy in a bi-encoder + cross-encoder pipeline. Why can't cross-encoders pre-compute text representations like bi-encoders can? | link |
| 70 | Compare the noise reduction capabilities of re-rankers versus simply increasing the similarity threshold in initial retrieval. When would each approach be more appropriate? | link |
| 71 | What challenges do re-rankers face regarding computational overhead and latency? | link |
| 72 | In real-time applications with strict latency requirements, describe two specific optimization strategies you could implement to reduce re-ranking overhead while preserving most of the quality gains. | link |
| 73 | How would you evaluate the effectiveness of a reranker in a RAG system? Which metrics (e.g., MRR, MAP, NDCG) would you prioritize and why? | link |
| 74 | Explain the difference between Precision@k and Recall@k in the context of RAG. When might you prefer one over the other? | link |
| 75 | Why is MRR unsuitable when there are multiple relevant chunks per query, and how does MAP address this limitation? | link |
| 76 | Given a retrieval result, show how to manually calculate the MAP@5 (Mean Average Precision at 5). What does MAP reveal about the retrieval system that raw Precision does not? | link |
| 77 | If all the relevant chunks are at the very bottom, how would this affect MRR, MAP, and NDCG metrics? Explain each. | link |
| 78 | Suppose your RAG retriever gets perfect Recall@10 but low Precision@10. What problems could this cause for the downstream generator? | link |
| 79 | Compare and contrast “order-aware” and “order-unaware” retrieval metrics in RAG, giving examples for each from the set (Precision, Recall, MRR, MAP, NDCG). | link |
| 80 | How would the value of NDCG@k change if all relevant chunks are retrieved but in the reverse order (least to most relevant)? | link |
| 81 | What is the significance of Context Precision@K in evaluating a RAG retriever, and how does it differ from standard Precision@k in traditional information retrieval? | link |
| 82 | Why does Context Precision@K use a weighted sum approach with relevance indicators, and how does this better reflect RAG retriever performance? | link |
| 83 | Given a retrieval result where relevant chunks appear at positions 2, 4, 6, and 8 out of 10 total chunks, manually calculate the Context Precision@10. What does this score tell us about the retriever's ranking ability? | link |
| 84 | A RAG system achieves Context Precision@5 = 0.8. What are the possible scenarios that could lead to this score? | link |
| 85 | Explain the possible reasons for a RAG retrieval system with consistently low context precision. | link |
| 86 | Compare Context Recall with traditional information retrieval recall. Why is Context Recall computed using "ground truth claims" rather than simply counting relevant documents? | link |
| 87 | What does context precision measure in a RAG retriever, and how does it differ from context recall? | link |
| 88 | In a RAG pipeline, how might context recall impact the completeness of generated answers? Describe a scenario illustrating this relationship. | link |
| 89 | If your retriever achieves high context precision but low context recall, what types of user queries would likely suffer most? | link |
| 90 | In what situations would you prioritize Context Precision over Context Recall in a RAG retriever, and how would this impact the generator’s performance? | link |
| 91 | Describe a scenario where a RAG system might achieve high Context Recall but still produce poor answers. What complementary metrics would you use alongside Context Recall to get a complete picture of retriever performance? | link |
| 92 | If your RAG retriever consistently shows Context Recall scores below 0.6, what are the three potential root causes? | link |
| 93 | Why is it important for RAG systems to optimize both context precision and context recall simultaneously? What trade-offs might occur? | link |
| 94 | Explain why Context Relevancy is considered a "reference-free" metric while Context Precision and Context Recall are "reference-dependent." When would you prefer using Context Relevancy over the other two metrics? | link |
| 95 | Describe a scenario where a RAG retriever achieves high Context Relevancy but low Context Precision. What does this imply about the retriever’s performance? | link |
| 96 | Suppose a RAG retriever retrieves all relevant chunks but includes many irrelevant ones, leading to low Context Relevancy. How would you improve the retriever to address this issue? | link |
| 97 | How does the Faithfulness metric assess the quality of a RAG generator? | link |
| 98 | Distinguish between Faithfulness and Context Precision metrics in RAG evaluation. Why might a system have high Context Precision but low Faithfulness, and what would this indicate about your pipeline? | link |
| 99 | A RAG system has high context precision but low faithfulness. How would you address this? | link |
| 100 | Why might a RAG system with perfect Context Recall still fail to produce accurate responses? How does the Faithfulness metric help diagnose this issue? | link |
| 101 | Explain how hallucinations in LLMs specifically impact the Faithfulness metric. What techniques could you implement to improve the Faithfulness metric score? | link |
| 102 | How does Response Relevancy differ from Context Relevancy, and why do you need both metrics to properly evaluate a RAG system? | link |
| 103 | The generator’s response mentions facts not present in the retrieved context. Describe how faithfulness and response relevancy metrics would be impacted. | link |
| 104 | How does the Response Relevancy metric help evaluate whether a RAG generator is addressing the user’s query effectively? | link |
| 105 | When evaluating RAG generator output, what are the risks of relying solely on response relevancy? How can including the faithfulness metric improve reliability? | link |
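Several of the metrics questions above (73–84) ask for manual calculations of Precision@k, Recall@k, MRR, MAP, NDCG, and Context Precision@K. The sketch below is a minimal, binary-relevance reference implementation for self-study; the function names are illustrative, and it is not tied to any particular evaluation library.

```python
import math

# Minimal binary-relevance retrieval metrics for self-study.
# `rels` is a list of 0/1 flags for the ranked chunks, best-ranked first.

def precision_at_k(rels, k):
    """Fraction of the top-k retrieved chunks that are relevant."""
    return sum(rels[:k]) / k

def recall_at_k(rels, k, total_relevant):
    """Fraction of all relevant chunks that appear in the top-k."""
    return sum(rels[:k]) / total_relevant

def mrr(rels):
    """Reciprocal rank of the first relevant chunk (0 if none)."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1 / rank
    return 0.0

def average_precision_at_k(rels, k):
    """Mean of Precision@i over the relevant positions i <= k.
    With binary relevance this matches the weighted-sum formulation
    of Context Precision@K (sum of Precision@i * v_i over relevant i,
    divided by the number of relevant chunks in the top k)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def ndcg_at_k(rels, k):
    """DCG of the ranking divided by the DCG of the ideal ranking."""
    def dcg(seq):
        return sum(r / math.log2(rank + 1) for rank, r in enumerate(seq, start=1))
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal else 0.0

# Worked example from question 83: relevant chunks at positions 2, 4, 6, 8 of 10.
rels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
print(precision_at_k(rels, 10))          # 0.4
print(mrr(rels))                         # 0.5 (first hit at rank 2)
print(average_precision_at_k(rels, 10))  # 0.5 -> Context Precision@10
```

In a real pipeline, the `rels` flags would come from judging each retrieved chunk against ground truth (or via an LLM judge, as reference-free evaluation frameworks do), rather than being hand-written.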
If you find this repository useful, please consider giving it a star.