The short version
Computers are good with numbers and bad with meaning. The sentence "how do I deploy my app?" and "getting my code live on the internet" mean roughly the same thing, but a keyword search would miss the connection. Embeddings solve this by converting text into a list of numbers (a vector) where similar meanings end up close together in mathematical space.
This is the technology behind semantic search, recommendation systems, and RAG pipelines. It's how AI tools find relevant documents even when the search terms don't match the exact words.
How it works
An embedding model takes a piece of text and returns a vector: a list of numbers, typically 768 to 3,072 of them. Together, the numbers encode the text's meaning, but you don't need to understand what any individual number represents. What matters is that similar texts produce similar vectors.
The process:
- Embed your content. Take each document, paragraph, or chunk of text and run it through an embedding model. Store the resulting vectors in a database alongside the original text.
- Embed the query. When a user asks a question, run that through the same embedding model to get a query vector.
- Compare. Use a similarity measure (usually cosine similarity) to find which stored vectors are closest to the query vector. The closest matches are the most semantically similar content.
"how do I deploy?" → [0.12, -0.34, 0.56, ...] (1536 numbers)
"getting code live" → [0.11, -0.33, 0.55, ...] (similar numbers = similar meaning)
"best pizza in London" → [0.87, 0.23, -0.91, ...] (different numbers = different meaning)
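The "similar numbers" intuition above can be checked with plain cosine similarity. A minimal sketch in Python; note the three vectors here are made-up three-dimensional stand-ins for real model output, chosen only to illustrate the comparison:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes.
    # 1.0 means the vectors point the same way; values near or below 0 mean
    # the texts are unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up stand-ins for the much longer vectors a real model would return.
deploy = [0.12, -0.34, 0.56]     # "how do I deploy?"
code_live = [0.11, -0.33, 0.55]  # "getting code live"
pizza = [0.87, 0.23, -0.91]      # "best pizza in London"

print(cosine_similarity(deploy, code_live))  # close to 1.0
print(cosine_similarity(deploy, pizza))      # negative: unrelated meanings
```

Real embedding vectors are compared the same way; only the number of dimensions changes.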
Embedding models are separate from generation models. OpenAI's text-embedding-3-small, Cohere's embed-english-v3, and open-source models like nomic-embed-text are all embedding models. They're cheaper and faster than LLMs because they only need to read and encode, not generate.
Vector databases (Pinecone, Weaviate, Chroma, or the pgvector extension for PostgreSQL) are optimised for storing and searching these vectors efficiently.
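Under the hood, the search those databases perform is conceptually simple. Here is a toy in-memory version, a sketch only: the vectors are hand-written stand-ins rather than real embeddings, and a production system would delegate storage and nearest-neighbour search to one of the databases above:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical store of (text, vector) pairs. In practice the vectors come
# from an embedding model and live in a vector database, not a Python list.
store = [
    ("Deploying your app to production", [0.12, -0.34, 0.56]),
    ("Picking a pizza topping",          [0.85, 0.20, -0.90]),
    ("Going live on the internet",       [0.10, -0.31, 0.50]),
]

def search(query_vector, top_k=2):
    # Brute-force nearest neighbour: score every stored vector against the
    # query, sort by similarity, return the top_k texts.
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in store]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Query vector for "how do I deploy?" (again, a made-up stand-in).
print(search([0.11, -0.33, 0.55]))
```

The two deployment-related texts outrank the pizza one even though they share no keywords with the query. Vector databases exist because this brute-force loop stops scaling once you have millions of vectors; they use approximate nearest-neighbour indexes to avoid scoring everything.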
Why it matters
Embeddings are the mechanism behind "chat with your docs," semantic search, and content recommendations. If you're building a RAG pipeline, embeddings are the retrieval step. Understanding what they do and how they work helps you make better decisions about chunking strategy (how you split documents), model choice, and why search results sometimes miss or surprise you.
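To make "chunking strategy" concrete, here is one common approach among many (others split on sentences, paragraphs, or semantic boundaries): fixed-size chunks with overlap. The sizes are illustrative defaults, not recommendations:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Fixed-size character chunks with overlap, so a sentence that straddles
    # a chunk boundary still appears intact in at least one chunk.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk would then be embedded and stored separately. Chunk size is a trade-off: too small and chunks lose the context needed to be meaningful; too large and a single vector has to average over several topics, which blurs the match.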