
Embeddings and Vector Representations: How LLMs Understand Meaning

Learn what embeddings are, how they capture semantic meaning as vectors, how to use embedding models for search and clustering, and the role cosine similarity plays in AI applications.

What Are Embeddings?

An embedding is a dense vector of floating-point numbers that represents the meaning of a piece of text. Instead of treating words as arbitrary symbols, embeddings place them in a continuous mathematical space where similar meanings are close together and different meanings are far apart.

The sentences "The dog chased the cat" and "A canine pursued the feline" would have very similar embedding vectors, even though they share almost no surface vocabulary. This is the foundation of semantic search, RAG, recommendation systems, and many other AI applications.

From Words to Vectors

Every LLM starts by converting tokens into embedding vectors. This lookup is the very first operation in the model, before any attention layers run:

import numpy as np

# Simplified embedding lookup
# In a real model, these are learned during training
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
embedding_dim = 4

# Embedding matrix: vocab_size x embedding_dim
# Each row is the learned vector for one token
embedding_matrix = np.array([
    [0.2, 0.1, -0.3, 0.5],   # "the"
    [0.8, -0.2, 0.6, 0.1],   # "cat"
    [-0.1, 0.7, 0.3, -0.4],  # "sat"
    [0.3, 0.0, -0.1, 0.8],   # "on"
    [0.5, 0.4, -0.2, 0.3],   # "mat"
])

def embed_tokens(tokens, embedding_matrix, vocab):
    """Look up embedding vectors for a sequence of tokens."""
    indices = [vocab[t] for t in tokens]
    return embedding_matrix[indices]  # Shape: (seq_len, embedding_dim)

sentence = ["the", "cat", "sat", "on", "the", "mat"]
embeddings = embed_tokens(sentence, embedding_matrix, vocab)
print(f"Shape: {embeddings.shape}")  # (6, 4) — 6 tokens, 4 dimensions each

In production models, the embedding dimension is much larger — 1536 for OpenAI's text-embedding-3-small, 3072 for text-embedding-3-large. These higher dimensions allow the model to capture more nuanced distinctions in meaning.
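Dimension size also has a direct storage cost. A back-of-the-envelope estimate, assuming float32 vectors (4 bytes per value):

```python
def index_size_mb(num_docs: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate memory needed to hold num_docs embeddings of a given dimension."""
    return num_docs * dim * bytes_per_value / (1024 ** 2)

# One million documents at common embedding dimensions
for dim in (384, 1536, 3072):
    print(f"1M docs at dim={dim}: ~{index_size_mb(1_000_000, dim):,.0f} MB")
```

At a million documents, the jump from 384 to 3072 dimensions is roughly 1.5 GB versus 11.7 GB of raw vector data, before any index overhead, which is why dimension choice matters for in-memory stores.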

Using Embedding Models in Practice

Modern embedding models convert entire passages of text into a single vector that captures the overall meaning:

from openai import OpenAI

client = OpenAI()

def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Get the embedding vector for a piece of text."""
    response = client.embeddings.create(
        input=text,
        model=model,
    )
    return response.data[0].embedding

# Embed some sample texts
texts = [
    "How to train a machine learning model",
    "Steps for building an ML pipeline",
    "Best Italian restaurants in New York",
    "NYC dining guide for pasta lovers",
    "Understanding neural network backpropagation",
]

embeddings = [get_embedding(text) for text in texts]

print(f"Embedding dimension: {len(embeddings[0])}")  # 1536
print(f"Number of texts embedded: {len(embeddings)}")

Cosine Similarity: Measuring Meaning Distance

The standard way to compare embeddings is cosine similarity. It measures the angle between two vectors, ignoring their magnitude. Mathematically, values range from -1 (opposite directions) to 1 (identical directions); in practice, unrelated texts from modern embedding models tend to score near zero or in a low positive band rather than strongly negative:


import numpy as np

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare all pairs of our sample texts
print("Similarity matrix:")
print(f"{'':>45s}", end="")
for i in range(len(texts)):
    print(f"  [{i}]", end="")
print()

for i, text_a in enumerate(texts):
    print(f"[{i}] {text_a[:40]:>42s}", end="")
    for j, text_b in enumerate(texts):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        print(f" {sim:.2f}", end="")
    print()

# Expected results:
# [0] and [1] — high similarity (both about ML training)
# [2] and [3] — high similarity (both about NYC restaurants)
# [0] and [2] — low similarity (ML vs restaurants)

Building a Semantic Search Engine

Embeddings power semantic search — finding documents by meaning rather than keyword matching. Here is a complete implementation:

import numpy as np
from openai import OpenAI

client = OpenAI()

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

class SemanticSearchEngine:
    """Simple in-memory semantic search using OpenAI embeddings."""

    def __init__(self, model="text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []

    def add_documents(self, documents: list[str]):
        """Embed and index a list of documents."""
        response = client.embeddings.create(
            input=documents,
            model=self.model,
        )
        new_embeddings = [item.embedding for item in response.data]

        self.documents.extend(documents)
        self.embeddings.extend(new_embeddings)

    def search(self, query: str, top_k: int = 3) -> list[dict]:
        """Find the most semantically similar documents."""
        # Embed the query
        query_embedding = client.embeddings.create(
            input=query,
            model=self.model,
        ).data[0].embedding

        # Compute similarity against all documents
        similarities = [
            cosine_similarity(query_embedding, doc_emb)
            for doc_emb in self.embeddings
        ]

        # Return top-k results
        ranked = sorted(
            enumerate(similarities),
            key=lambda x: x[1],
            reverse=True,
        )[:top_k]

        return [
            {"document": self.documents[idx], "score": score}
            for idx, score in ranked
        ]

# Usage
engine = SemanticSearchEngine()
engine.add_documents([
    "Python is a high-level programming language known for readability.",
    "JavaScript runs in web browsers and on Node.js servers.",
    "PostgreSQL is an advanced open-source relational database.",
    "Redis is an in-memory data structure store used as cache.",
    "Docker packages applications into portable containers.",
    "Kubernetes orchestrates container deployment at scale.",
])

results = engine.search("How do I store data efficiently?")
for r in results:
    print(f"  Score: {r['score']:.3f} | {r['document']}")
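The per-document loop in `search` is fine for small collections. At larger scale, the same ranking collapses into a single matrix-vector product if the index is kept as a NumPy array of unit-normalized rows. A minimal sketch with random stand-in vectors (not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in index: 1000 documents, 1536-dim, rows normalized to unit length
doc_matrix = rng.normal(size=(1000, 1536))
doc_matrix /= np.linalg.norm(doc_matrix, axis=1, keepdims=True)

# Normalized query vector
query = rng.normal(size=1536)
query /= np.linalg.norm(query)

# Cosine similarity against every document in one vectorized operation
scores = doc_matrix @ query            # shape: (1000,)
top_k = np.argsort(scores)[::-1][:3]   # indices of the 3 best matches
print(top_k, scores[top_k])
```

For thousands to low millions of vectors, this brute-force matrix approach is often fast enough; dedicated vector databases become worthwhile when exact search over the full matrix no longer fits your latency or memory budget.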

Vector Databases: Scaling Beyond Memory

For production applications with millions of documents, you need a vector database that can perform approximate nearest neighbor (ANN) search efficiently:

# Using ChromaDB — a popular open-source vector database
import chromadb

client_db = chromadb.PersistentClient(path="./chroma_data")

# Create a collection with automatic embedding
collection = client_db.get_or_create_collection(
    name="knowledge_base",
    metadata={"hnsw:space": "cosine"},  # Use cosine similarity
)

# Add documents — ChromaDB handles embedding automatically
collection.add(
    documents=[
        "Machine learning models learn patterns from data.",
        "Neural networks are inspired by biological neurons.",
        "Gradient descent optimizes model parameters iteratively.",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"topic": "ml", "difficulty": "beginner"},
        {"topic": "dl", "difficulty": "intermediate"},
        {"topic": "optimization", "difficulty": "advanced"},
    ],
)

# Query with metadata filtering
results = collection.query(
    query_texts=["How do AI models improve over time?"],
    n_results=2,
    where={"difficulty": {"$ne": "advanced"}},  # Exclude advanced docs
)

print(results["documents"])
print(results["distances"])  # Lower = more similar for cosine

Embedding Models: Choosing the Right One

Different embedding models offer different trade-offs:

# Compare embedding dimensions and performance
embedding_models = {
    "text-embedding-3-small": {"dim": 1536, "cost_per_M": 0.02, "provider": "OpenAI"},
    "text-embedding-3-large": {"dim": 3072, "cost_per_M": 0.13, "provider": "OpenAI"},
    "voyage-3": {"dim": 1024, "cost_per_M": 0.06, "provider": "Voyage AI"},
    "all-MiniLM-L6-v2": {"dim": 384, "cost_per_M": 0.00, "provider": "Local (HuggingFace)"},
}

for model, info in embedding_models.items():
    print(f"{model:30s} | dim={info['dim']:5d} | ${info['cost_per_M']:.2f}/M tokens | {info['provider']}")

For local embedding without API calls, use the sentence-transformers library:

from sentence_transformers import SentenceTransformer

# Download and load the model locally — no API key needed
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How to deploy a Python application",
    "Steps for shipping a Python app to production",
    "Best pizza places in Chicago",
]

# Generate embeddings locally
embeddings = model.encode(sentences)
print(f"Shape: {embeddings.shape}")  # (3, 384)

# Compute similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity between first two: {similarity.item():.3f}")

FAQ

What is the difference between embeddings from an LLM and from an embedding model?

LLMs like GPT-4 produce internal embeddings as part of text generation, but these are not exposed through the API. Embedding models like text-embedding-3-small are specifically trained to produce embeddings optimized for similarity comparison. They are smaller, faster, and cheaper than full LLMs, and their embeddings are better suited for search and retrieval. Use embedding models for search and RAG; use LLMs for text generation.

How many dimensions should my embeddings have?

It depends on the complexity of your data and your storage budget. 384 dimensions (MiniLM) work well for many applications and are very storage-efficient. 1536 dimensions (text-embedding-3-small) capture more nuance and are the sweet spot for most production use. 3072 dimensions (text-embedding-3-large) offer marginal gains for specialized tasks. OpenAI's text-embedding-3 models support dimension reduction via the dimensions parameter, letting you choose your trade-off point.
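According to OpenAI's documentation, the `dimensions` parameter shortens a vector by dropping values from the end and renormalizing. The same operation can be sketched on an already-stored vector (synthetic here; retrieval quality at shorter lengths depends on the model being trained for truncation, as the text-embedding-3 family is):

```python
import numpy as np

def truncate_embedding(vec, dim: int) -> np.ndarray:
    """Shorten an embedding by truncating to the first dim values, then renormalize."""
    truncated = np.asarray(vec[:dim], dtype=float)
    return truncated / np.linalg.norm(truncated)

rng = np.random.default_rng(1)
full = rng.normal(size=1536)
full /= np.linalg.norm(full)

short = truncate_embedding(full, 512)
print(short.shape, np.linalg.norm(short))  # (512,) with unit norm
```

Note that this trick is only safe for models trained to support it; truncating vectors from an arbitrary embedding model will usually degrade quality sharply.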

Can I update embeddings without re-embedding everything?

No. If you change the embedding model or its version, all existing embeddings become incompatible and must be regenerated. This is because different models map texts to different vector spaces — a vector from model A is meaningless in model B's space. Plan for re-indexing when upgrading embedding models. Some teams maintain version numbers on their vector collections and run parallel indexes during transitions.


#Embeddings #VectorSearch #CosineSimilarity #SemanticSearch #RAG #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.