Parse, Normalize, Extract, and Store PDF Content for RAG in Pinecone
This workflow automates a full RAG pipeline for structured documents (like insurance policies).

What it does
- Watches a Google Drive folder for new PDFs
- Uploads to LlamaIndex Cloud for parsing → returns clean Markdown
- Normalizes text (removes headers, footers, page numbers, formatting artifacts)
- Splits text into chunks (~1200 chars with 150 overlap)
- Generates embeddings with OpenAI
- Stores vectors in Pinecone with m
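The chunking step can be illustrated with a minimal sketch. It assumes a plain fixed-size character window using the sizes quoted in the description (about 1200 characters with a 150-character overlap). The function name `chunk_text` and its parameters are hypothetical, and the workflow's actual splitter may behave differently, for example by preferring paragraph or sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: chunk_size and overlap mirror the numbers in the
    listing; the real workflow's splitter is not shown in the source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made purely of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(3000))
    parts = chunk_text(sample)
    print(len(parts), [len(p) for p in parts])
```

With these defaults each window advances by 1050 characters, so the last 150 characters of one chunk reappear at the start of the next. The repetition helps a sentence that straddles a boundary remain retrievable from at least one chunk.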
Marketplace
Independent
Category
engineering