Youtube: RAG Crash Course for Beginners
RAG (Retrieval-Augmented Generation)
- Definition: a method
- where an AI model first retrieves relevant information from external sources (documents, databases, web pages)
- then uses that retrieved content to generate a more accurate answer.
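The retrieve-then-generate flow above can be sketched in a few lines. This is a toy illustration, not a real system: `retrieve` here ranks by simple word overlap (a stand-in for a real retriever), and the "generation" step is just building the grounded prompt an LLM would receive — the corpus and function names are assumptions for the example.

```python
# Minimal RAG flow: retrieve relevant text, then ground the prompt in it.
# Toy stand-in for a real retriever + LLM call.

def retrieve(query, docs, k=1):
    """Rank docs by word overlap with the query (placeholder for real retrieval)."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, contexts):
    """Stuff retrieved context into the prompt the model would see."""
    context_block = "\n".join(contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

docs = [
    "RAG retrieves documents before generating an answer.",
    "Fine-tuning trains a model further on task data.",
]
top = retrieve("what does RAG retrieve?", docs)
prompt = build_prompt("what does RAG retrieve?", top)
```

In a real pipeline, `retrieve` would query a vector store and `prompt` would be sent to an LLM; the structure stays the same.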
Q: How to get better responses?
- Prompt engineering: improve results by writing better prompts
- Fine-tuning:
- Train a pre-trained model further on a smaller, task-specific dataset
- better for communication style / tone
- RAG: add knowledge from documents/databases
Q: When not to use RAG?
- If the main goal is not to add missing knowledge via external documents
- but to keep the model’s output highly controlled and consistent, like
- strict output schema (JSON),
- stable tone,
- or a fixed template,
- prompting or fine-tuning is usually better.
Q: How do we find relevant chunks? → Retrieval
- Lexical Search (keyword-based): matches query terms with documents and ranks by term overlap.
- TF-IDF (Term Frequency – Inverse Document Frequency)
- weights terms higher if they are frequent in a doc but rare in the corpus.
- BM25: improves TF-IDF with TF saturation + document-length normalization.
- If a term appears many times, BM25 gives diminishing returns (repeating the same word doesn’t keep boosting the score a lot).
- If the chunk/document is long, BM25 reduces the score a bit, because long text naturally contains more words and shouldn’t get an unfair advantage.
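Both BM25 effects above can be seen numerically. A minimal sketch of the per-term BM25 score (standard formula with the usual `k1` and `b` parameters; the `idf` value here is just an assumed constant for illustration):

```python
import math  # not strictly needed here, kept for real-idf variants

def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.5, b=0.75):
    """BM25 contribution of one query term in one document.
    k1 controls TF saturation; b controls document-length normalization."""
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# TF saturation: going from tf=1 to tf=5 helps more than tf=5 to tf=50.
s1 = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, idf=2.0)
s5 = bm25_term_score(tf=5, doc_len=100, avg_doc_len=100, idf=2.0)
s50 = bm25_term_score(tf=50, doc_len=100, avg_doc_len=100, idf=2.0)

# Length normalization: same tf scores lower in a much longer document.
short = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100, idf=2.0)
long_ = bm25_term_score(tf=3, doc_len=400, avg_doc_len=100, idf=2.0)
```

Note the score is bounded: as `tf` grows, it approaches `idf * (k1 + 1)` but never exceeds it — that is the saturation.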
- Semantic Search
- Embedding retrieval:
- ranks by meaning similarity, not exact word overlap (handles paraphrases/synonyms).
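A sketch of embedding retrieval with cosine similarity. The 3-d vectors below are hand-made toy "embeddings" (a real system would get hundreds of dimensions from an embedding model); the point is that a query sharing no words with a document can still rank it first if their vectors are close.

```python
import math

# Toy vectors standing in for an embedding model's output (assumed values).
embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover your login": [0.8, 0.2, 0.1],  # paraphrase: nearby vector
    "Office opening hours":        [0.0, 0.1, 0.9],  # unrelated: far vector
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec, k=1):
    """Rank stored texts by cosine similarity to the query vector."""
    ranked = sorted(embeddings, key=lambda t: cosine(query_vec, embeddings[t]),
                    reverse=True)
    return ranked[:k]

# Vector for a query like "forgot my login" — no word overlap with the
# password docs, but close in embedding space.
query_vec = [0.85, 0.15, 0.05]
top2 = semantic_search(query_vec, k=2)
```

This is exactly where lexical search fails and semantic search wins: BM25 would score the paraphrase near zero, while cosine similarity ranks it at the top.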