Vector Databases for RAG: Comparing pgvector, Pinecone, Chroma, and Weaviate
A practical comparison of four popular vector databases for RAG — pgvector, Pinecone, Chroma, and Weaviate — covering setup, indexing, query performance, and when to choose each one.
Why Vector Databases Matter for RAG
The retrieval step in RAG depends on finding document chunks whose embedding vectors are closest to the query vector. A vector database is purpose-built for this operation: it stores high-dimensional vectors, builds indexes for fast approximate nearest-neighbor (ANN) search, and returns results in milliseconds even across millions of documents.
Choosing the right vector database depends on your scale, infrastructure preferences, and operational requirements. This post gives you working code and honest tradeoffs for four leading options.
Option 1: pgvector — Vectors Inside PostgreSQL
pgvector is a PostgreSQL extension that adds vector data types and similarity search operators. If you already run Postgres, this is the lowest-friction path to production RAG.
```python
import psycopg2
import numpy as np

conn = psycopg2.connect("postgresql://user:pass@localhost/ragdb")
cur = conn.cursor()

# Enable the extension
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Create a table with a vector column
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        metadata JSONB DEFAULT '{}',
        embedding vector(1536)
    )
""")

# Insert a document with its embedding
embedding = np.random.rand(1536).tolist()  # replace with a real embedding
cur.execute(
    "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s, %s)",
    ("Refund policy for enterprise...", '{"source": "policies.md"}', str(embedding)),
)
conn.commit()

# Query: find the 5 nearest neighbors by cosine distance (the <=> operator)
query_vec = np.random.rand(1536).tolist()
cur.execute("""
    SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (str(query_vec), str(query_vec)))

for doc_id, content, sim in cur.fetchall():
    print(f"[{sim:.3f}] {content[:80]}...")
```
Create an HNSW index for fast queries at scale:
```sql
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
```
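Recall can also be tuned at query time via pgvector's `hnsw.ef_search` setting, which controls how many index candidates are examined per search (higher values trade latency for recall):

```sql
-- Default is 40; raise it when recall matters more than latency
SET hnsw.ef_search = 100;
```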
Pros: No new infrastructure — lives in your existing Postgres. Full SQL joins with relational data. ACID transactions. Metadata filtering via standard WHERE clauses.
Cons: Slower than purpose-built vector DBs at very large scale (50M+ vectors). Limited to single-node without partitioning.
Option 2: Pinecone — Fully Managed Cloud Vector DB
Pinecone is a managed service that handles scaling, replication, and index management. You interact through an API — no servers to operate.
```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create a serverless index
pc.create_index(
    name="rag-docs",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("rag-docs")

# Upsert vectors (embedding_vector is a 1536-dim list from your embedding model)
index.upsert(vectors=[
    {
        "id": "doc-001",
        "values": embedding_vector,
        "metadata": {"source": "policies.md", "category": "billing"},
    }
])

# Query with a metadata filter
results = index.query(
    vector=query_vector,  # embedding of the user's question
    top_k=5,
    include_metadata=True,
    filter={"category": {"$eq": "billing"}},
)
for match in results["matches"]:
    print(f"[{match['score']:.3f}] {match['id']} — {match['metadata']}")
```
Pros: Zero infrastructure management. Scales to billions of vectors. Built-in metadata filtering. SOC2 compliant.
Cons: Vendor lock-in. Network latency for every query. Monthly costs grow with scale. Data leaves your network.
Option 3: Chroma — Open-Source and Embedded
Chroma is an open-source embedding database designed for simplicity. It can run in-process (embedded) or as a client-server deployment.
```python
import chromadb
from chromadb.utils import embedding_functions

# Persistent on-disk client (use chromadb.Client() for in-memory)
client = chromadb.PersistentClient(path="./chroma_db")

# Embed documents automatically with OpenAI
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-openai-api-key",
    model_name="text-embedding-3-small",
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"},
)

# Add documents — Chroma embeds them automatically
collection.add(
    ids=["doc-001", "doc-002"],
    documents=["Refund policy for enterprise plans...", "Billing cycle details..."],
    metadatas=[{"source": "policies.md"}, {"source": "billing.md"}],
)

# Query by text, filtered on metadata
results = collection.query(
    query_texts=["What is the refund policy?"],
    n_results=5,
    where={"source": "policies.md"},
)
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"[{1 - dist:.3f}] {doc[:80]}...")
```
Pros: Simple API. Embedded mode has zero network overhead. Free and open-source. Great for prototyping.
Cons: Single-node only in embedded mode. Limited production tooling (no built-in backups, monitoring). Performance degrades past a few million vectors.
Option 4: Weaviate — Hybrid Search Built-In
Weaviate is an open-source vector database that natively supports both vector search and keyword (BM25) search, making hybrid retrieval straightforward.
```python
import weaviate
import weaviate.classes.config as wc
from weaviate.classes.query import MetadataQuery

client = weaviate.connect_to_local()  # or weaviate.connect_to_weaviate_cloud(...)

# Define a collection with an OpenAI vectorizer
collection = client.collections.create(
    name="Document",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small"
    ),
    properties=[
        wc.Property(name="content", data_type=wc.DataType.TEXT),
        wc.Property(name="source", data_type=wc.DataType.TEXT),
    ],
)

# Insert — Weaviate auto-embeds the content
collection.data.insert({"content": "Refund policy...", "source": "policies.md"})

# Hybrid search (vector + keyword)
results = collection.query.hybrid(
    query="refund policy enterprise",
    alpha=0.7,  # 0 = pure keyword (BM25), 1 = pure vector
    limit=5,
    return_metadata=MetadataQuery(score=True),  # needed to read obj.metadata.score
)
for obj in results.objects:
    print(f"[{obj.metadata.score:.3f}] {obj.properties['content'][:80]}...")

client.close()
```
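To build intuition for `alpha`: Weaviate's relative score fusion normalizes each result list's scores before blending them. A simplified sketch of that weighting (hypothetical scores, not Weaviate's internal implementation):

```python
def hybrid_score(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Blend normalized vector and keyword scores; alpha=1 is pure vector search."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# A document that ranks well on keywords but only moderately on vectors
# still surfaces when alpha leaves some weight for BM25:
print(hybrid_score(vector_score=0.55, keyword_score=0.90, alpha=0.7))
```

At `alpha=0.7` the blended score is 0.655, so keyword relevance still moves the ranking even when vector search dominates.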
Pros: Native hybrid search. Auto-vectorization. Multi-tenancy support. Active open-source community.
Cons: Heavier operational footprint. Steeper learning curve. Runs as a separate server process with notable memory requirements.
Quick Comparison Table
| Feature | pgvector | Pinecone | Chroma | Weaviate |
|---|---|---|---|---|
| Hosting | Self-managed | Managed | Either | Either |
| Hybrid search | Manual BM25 | Via sparse vectors | No | Native |
| Max scale | ~10M vectors | Billions | ~5M vectors | ~100M vectors |
| Best for | Postgres shops | Zero-ops teams | Prototyping | Hybrid search |
FAQ
Can I start with Chroma and migrate to Pinecone or pgvector later?
Yes. Your embedding vectors are portable — they are just arrays of floats. Export them from Chroma and import into any other vector store. The main migration effort is adapting your query code and metadata schema to the target system's API.
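As a concrete illustration (a generic sketch, not tied to any particular store's export API): a migration payload is just IDs, vectors, and metadata serialized to a neutral format such as JSON, which any target store can re-ingest:

```python
import json

# Hypothetical exported records — id, embedding, and metadata travel together
records = [
    {"id": "doc-001", "embedding": [0.12, -0.03, 0.88],
     "metadata": {"source": "policies.md"}},
]

# Serialize to a neutral interchange format...
payload = json.dumps(records)

# ...and the target system re-ingests the identical vectors
restored = json.loads(payload)
print(f"Restored {len(restored)} record(s), "
      f"{len(restored[0]['embedding'])}-dim vectors intact")
```

Python's `json` round-trips floats exactly, so the restored vectors are bit-identical to the exported ones.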
Should I use a vector database or just compute cosine similarity in application code?
For fewer than ~10,000 documents, brute-force cosine similarity in NumPy is fast enough and simpler to operate. As the corpus grows, the ANN indexes in a vector database provide sub-linear search time that brute force cannot match; the crossover point where a dedicated vector DB becomes worthwhile is typically around 50K-100K vectors.
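The brute-force approach mentioned above fits in a few lines of NumPy — one matrix multiply over L2-normalized vectors (a self-contained sketch with random stand-in embeddings):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, docs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `docs` most similar to `query`."""
    # Cosine similarity = dot product of L2-normalized vectors
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = docs_n @ query_n
    return np.argsort(sims)[::-1][:k]

# 10,000 docs x 1536 dims: a single matrix multiply, no index needed
rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 1536))
query = docs[42] + 0.01 * rng.standard_normal(1536)  # a query near doc 42
print(top_k_cosine(query, docs, k=3))
```

On a corpus this size the whole search runs in tens of milliseconds on a laptop; the cost grows linearly with document count, which is what an ANN index avoids.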
Is pgvector production-ready?
Yes. pgvector is used in production at companies of all sizes. With HNSW indexing, it handles millions of vectors with low-millisecond query times. The main limitation is that it runs on a single PostgreSQL node, so if you need distributed vector search across billions of vectors, a purpose-built solution like Pinecone or Weaviate is more appropriate.
#RAG #VectorDatabase #Pgvector #Pinecone #Chroma #Weaviate #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.