engineering · Independent · ✓ Verified

Build a PDF Q&A System with LlamaIndex, OpenAI Embeddings & Pinecone Vector DB

Parse, Normalize, Extract, and Store PDF Content for RAG in Pinecone

About

This workflow automates a full RAG pipeline for structured documents (like insurance policies).

What it does

Watches a Google Drive folder for new PDFs
Uploads to LlamaIndex Cloud for parsing → returns clean Markdown
Normalizes text (removes headers, footers, page numbers, formatting artifacts)
Splits text into chunks (~1200 chars with 150 overlap)
Generates embeddings with OpenAI
Stores vectors in Pinecone with m
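The normalization and chunking steps described above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual implementation, which the listing does not show. The function names `normalize` and `chunk_text` and the page-number patterns are assumptions made for the example; only the chunk size (about 1200 characters) and the overlap (150 characters) come from the description.

```python
import re


def normalize(markdown: str) -> str:
    """Drop page-number lines and tidy whitespace (hypothetical cleanup rules)."""
    kept = []
    for line in markdown.splitlines():
        stripped = line.strip()
        # Assumed artifact patterns: "12", "Page 12", "Page 12 of 40".
        if re.fullmatch(r"(?i)(page\s+)?\d+(\s+of\s+\d+)?", stripped):
            continue
        # Collapse runs of spaces and tabs inside the line.
        kept.append(re.sub(r"[ \t]+", " ", stripped))
    text = "\n".join(kept)
    # Collapse three or more newlines into a single paragraph break.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def chunk_text(text: str, size: int = 1200, overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each chunk begins 1050 characters after the previous one, so the last 150 characters of one chunk are repeated at the start of the next. That repetition keeps a sentence that straddles a boundary retrievable from at least one chunk. Each chunk would then be sent to an embedding model and stored as a vector; those calls are omitted here because they depend on account-specific configuration.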

Pricing

Free


Marketplace

Independent

Category

engineering
