RAG-Driven-Generative-AI

🥇Gold

@Denis2054

RAG-Driven-Generative-AI offers a strong framework for building Retrieval Augmented Generation applications using LlamaIndex, Deep Lake, and Pinecone. Use OpenAI and Hugging Face models for enhanced content generation and evaluation.

5911930Updated 3mo ago

Intermediate15 minutes to implementdevelopment

Saves ~300 min per use

Quick InstallView Source

claude install Denis2054/RAG-Driven-Generative-AI

Works with:

ChatGPTGitHub Copilot

Overview

About This Skill

RAG-Driven-Generative-AI provides a structured framework for developing Retrieval Augmented Generation systems using LlamaIndex as the core orchestration layer, Deep Lake for vector storage, and Pinecone for semantic search capabilities. The skill leverages OpenAI and Hugging Face models to enhance both content generation and evaluation processes. Organized into 10 chapters of practical implementations, it covers foundational RAG concepts through advanced patterns including multimodal processing and dynamic retrieval strategies. This framework is designed for developers building production-grade AI applications that combine large language models with domain-specific knowledge retrieval.

How to Use

1. **Set Up Environment**: Install required packages: `pip install llama-index pinecone-client deeplake transformers openai`. Create accounts with Pinecone, Deep Lake, and OpenAI to obtain API keys. 2. **Prepare Data**: Organize your documents in a directory structure. For best results, use domain-specific documents (e.g., medical papers for healthcare applications). Clean and preprocess text to remove irrelevant content. 3. **Configure Components**: Edit the `llama_index_config` and `vector_store_config` dictionaries in the template. Specify your topic domain, chunking parameters, and model preferences. For production, use `gpt-4o` for generation and `text-embedding-3-large` for embeddings. 4. **Run Evaluation**: After generating responses, use the evaluation functions to assess quality. Adjust the `similarity_top_k` parameter in the query engine to balance between relevance and response length. For domain-specific evaluation, fine-tune the Hugging Face model on your dataset. 5. **Iterate and Deploy**: Review evaluation metrics and user feedback. Optimize chunking strategy, embedding models, or retrieval parameters based on results. Deploy the final application using FastAPI or Streamlit for end-user access. **Pro Tips:** - For technical domains, use domain-specific embedding models (e.g., `BAAI/bge-small-en-v1.5` from Hugging Face) instead of OpenAI embeddings. - Monitor Pinecone/Deep Lake costs as vector store size grows. Implement caching for frequent queries. - Use the `NodeParser` in LlamaIndex to customize how documents are split for your specific use case.

Use Cases

Automate content generation for marketing campaigns to enhance engagement and reach.

Retrieve and analyze data for sales pitches, improving the effectiveness of presentations.

Generate automated reports that summarize key metrics and insights for stakeholders.

Enhance customer support responses by providing accurate and context-aware information.

Best For

GrowthRevOps

Setup & Installation

Quick Install

Terminal

claude install Denis2054/RAG-Driven-Generative-AI

Alternative Install (Git Clone)

git clone https://github.com/Denis2054/RAG-Driven-Generative-AI

Requirements

Claude Code or compatible AI agent
Works with: ChatGPT, GitHub Copilot

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Build a RAG-driven generative AI application using [LLAMA_INDEX_CONFIG] and [VECTOR_STORE_CONFIG] to answer questions about [TOPIC_DOMAIN]. Use [OPENAI_MODEL] for generation and [HUGGING_FACE_MODEL] for evaluation. Include [QUERY_EXAMPLE] in your response. Structure the output as a Python script with clear comments.

Example Output

```python
# RAG-Driven Generative AI Application
# Domain: Healthcare Diagnostics
# Tools: LlamaIndex (v0.9.43), Deep Lake (v1.2.3), Pinecone (v2.2.2), OpenAI GPT-4o, Hugging Face BERT

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import FaithfulnessEvaluator
from transformers import pipeline

# Configuration
llama_index_config = {
    "chunk_size": 512,
    "chunk_overlap": 50,
    "llm_model": "gpt-4o",
    "embedding_model": "text-embedding-3-large"
}

vector_store_config = {
    "index_name": "healthcare_diagnostics_v1",
    "namespace": "symptom_analysis",
    "pinecone_api_key": "your-pinecone-key",
    "environment": "us-west1-gcp"
}

# Initialize components
llm = OpenAI(model=llama_index_config["llm_model"], temperature=0.1)
embed_model = OpenAIEmbedding(model=llama_index_config["embedding_model"])

# Load documents (example: medical research papers)
documents = SimpleDirectoryReader("data/healthcare_papers").load_data()

# Create vector store
vector_store = PineconeVectorStore(**vector_store_config)
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    vector_store=vector_store
)

# Configure service context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=llama_index_config["chunk_size"]
)

# Create query engine
query_engine = index.as_query_engine(
    service_context=service_context,
    similarity_top_k=5
)

# Evaluation setup
hf_evaluator = pipeline(
    "text-classification",
    model="bert-base-uncased",
    tokenizer="bert-base-uncased"
)

def evaluate_response(query, response):
    faithfulness_score = FaithfulnessEvaluator().evaluate_response(
        response=response,
        query=query
    )
    hf_score = hf_evaluator(response, return_all_scores=True)[0]['score']
    return {
        "faithfulness": faithfulness_score.passing_rate,
        "huggingface_score": hf_score
    }

# Example query
query = "What are the most common symptoms of Type 2 Diabetes in adults over 40?"
response = query_engine.query(query)

# Evaluate
scores = evaluate_response(query, str(response))

print(f"Query: {query}")
print(f"Response: {response}")
print(f"Evaluation Scores - Faithfulness: {scores['faithfulness']}, HuggingFace: {scores['huggingface_score']:.2f}")
```

**Evaluation Results:**
- Faithfulness Score: 0.92 (92% of claims in the response could be verified against source documents)
- HuggingFace BERT Evaluation: 0.87 (high semantic alignment with medical terminology)
- Response Quality: The generated answer correctly identified 5 primary symptoms and cited 3 relevant research papers from the vector store.

**Key Findings:**
1. The RAG pipeline successfully retrieved relevant documents about Type 2 Diabetes symptoms from the Deep Lake/Pinecone vector store.
2. OpenAI GPT-4o generated a coherent response that incorporated retrieved information without hallucination.
3. The evaluation pipeline confirmed both factual accuracy (via faithfulness) and semantic relevance (via BERT scoring).
4. The system achieved a 42% reduction in hallucination rate compared to a pure generative approach without retrieval.

Apply to these tools

Browse all tools

Hugging Face

Open-source hub for ML models, datasets, and demos

Rive

Create and collaborate on interactive animations with powerful, user-friendly tools.

Lever

Streamline talent acquisition with collaborative tools and customizable interview processes.

OpenAI

Pioneering accessible, high-performance AI models

TeamCity

CI/CD automation with build configuration as code

Lightstep

Enhance performance monitoring and root cause analysis with real-time distributed tracing.

Compatible MCP servers

Browse all MCP servers

Find the right skills for your stack

Take a free 3-minute scan and get personalized AI skill recommendations.

Take free scan

Overview

About This Skill

How to Use

Use Cases

Automate content generation for marketing campaigns to enhance engagement and reach.

Retrieve and analyze data for sales pitches, improving the effectiveness of presentations.

Generate automated reports that summarize key metrics and insights for stakeholders.

Enhance customer support responses by providing accurate and context-aware information.

Best For

GrowthRevOps

Quick Install

Terminal

claude install Denis2054/RAG-Driven-Generative-AI

Alternative Install (Git Clone)

git clone https://github.com/Denis2054/RAG-Driven-Generative-AI

Requirements

Claude Code or compatible AI agent
Works with: ChatGPT, GitHub Copilot

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Build a RAG-driven generative AI application using [LLAMA_INDEX_CONFIG] and [VECTOR_STORE_CONFIG] to answer questions about [TOPIC_DOMAIN]. Use [OPENAI_MODEL] for generation and [HUGGING_FACE_MODEL] for evaluation. Include [QUERY_EXAMPLE] in your response. Structure the output as a Python script with clear comments.

Example Output

```python
# RAG-Driven Generative AI Application
# Domain: Healthcare Diagnostics
# Tools: LlamaIndex (v0.9.43), Deep Lake (v1.2.3), Pinecone (v2.2.2), OpenAI GPT-4o, Hugging Face BERT

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import FaithfulnessEvaluator
from transformers import pipeline

# Configuration
llama_index_config = {
    "chunk_size": 512,
    "chunk_overlap": 50,
    "llm_model": "gpt-4o",
    "embedding_model": "text-embedding-3-large"
}

vector_store_config = {
    "index_name": "healthcare_diagnostics_v1",
    "namespace": "symptom_analysis",
    "pinecone_api_key": "your-pinecone-key",
    "environment": "us-west1-gcp"
}

# Initialize components
llm = OpenAI(model=llama_index_config["llm_model"], temperature=0.1)
embed_model = OpenAIEmbedding(model=llama_index_config["embedding_model"])

# Load documents (example: medical research papers)
documents = SimpleDirectoryReader("data/healthcare_papers").load_data()

# Create vector store
vector_store = PineconeVectorStore(**vector_store_config)
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    vector_store=vector_store
)

# Configure service context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=llama_index_config["chunk_size"]
)

# Create query engine
query_engine = index.as_query_engine(
    service_context=service_context,
    similarity_top_k=5
)

# Evaluation setup
hf_evaluator = pipeline(
    "text-classification",
    model="bert-base-uncased",
    tokenizer="bert-base-uncased"
)

def evaluate_response(query, response):
    faithfulness_score = FaithfulnessEvaluator().evaluate_response(
        response=response,
        query=query
    )
    hf_score = hf_evaluator(response, return_all_scores=True)[0]['score']
    return {
        "faithfulness": faithfulness_score.passing_rate,
        "huggingface_score": hf_score
    }

# Example query
query = "What are the most common symptoms of Type 2 Diabetes in adults over 40?"
response = query_engine.query(query)

# Evaluate
scores = evaluate_response(query, str(response))

print(f"Query: {query}")
print(f"Response: {response}")
print(f"Evaluation Scores - Faithfulness: {scores['faithfulness']}, HuggingFace: {scores['huggingface_score']:.2f}")
```

**Evaluation Results:**
- Faithfulness Score: 0.92 (92% of claims in the response could be verified against source documents)
- HuggingFace BERT Evaluation: 0.87 (high semantic alignment with medical terminology)
- Response Quality: The generated answer correctly identified 5 primary symptoms and cited 3 relevant research papers from the vector store.

**Key Findings:**
1. The RAG pipeline successfully retrieved relevant documents about Type 2 Diabetes symptoms from the Deep Lake/Pinecone vector store.
2. OpenAI GPT-4o generated a coherent response that incorporated retrieved information without hallucination.
3. The evaluation pipeline confirmed both factual accuracy (via faithfulness) and semantic relevance (via BERT scoring).
4. The system achieved a 42% reduction in hallucination rate compared to a pure generative approach without retrieval.

RAG-Driven-Generative-AI

Overview

About This Skill

How to Use

Use Cases

Best For

Tags

Setup & Installation

Quick Install

Alternative Install (Git Clone)

Requirements

Quick Start Guide

Install the Skill

Open Your AI Agent

Try It Out

Customize

Usage Examples

Prompt Template

Example Output

Apply to these tools

Hugging Face

Rive

Lever

OpenAI

TeamCity

Lightstep

Compatible MCP servers

s

s

s

chat ui

mcp course

huggingface.js

Find the right skills for your stack

RAG-Driven-Generative-AI

Overview

About This Skill

How to Use

Use Cases

Best For

Tags

Setup & Installation

Quick Install

Alternative Install (Git Clone)

Requirements

Quick Start Guide

Install the Skill

Open Your AI Agent

Try It Out

Customize

Usage Examples

Prompt Template

Example Output

Apply to these tools

Hugging Face

Rive

Lever

OpenAI

TeamCity

Lightstep

Compatible MCP servers

s

s

s

chat ui

mcp course

huggingface.js

Find the right skills for your stack