
Semantic Search - AI-Powered Search

semantic-search embeddings ai vector-search

Semantic search uses AI to understand the meaning of queries and documents, not just keywords. It's powered by vector embeddings that represent text as points in high-dimensional space, enabling searches based on conceptual similarity.

How Semantic Search Works

Vector Embeddings

Text is converted to high-dimensional vectors (arrays of numbers):

// Text to embedding
"javascript async programming" → [0.2, 0.8, 0.1, ..., 0.4]  // 768 dimensions

// Similar concepts have similar vectors
"javascript async programming" → [0.2, 0.8, 0.1, ..., 0.4]
"js asynchronous code"         → [0.3, 0.7, 0.2, ..., 0.5]  // Close in vector space

// Different concepts are far apart
"javascript async programming" → [0.2, 0.8, 0.1, ..., 0.4]
"cooking pasta recipes"        → [0.9, 0.1, 0.8, ..., 0.2]  // Far in vector space

Cosine Similarity

Measures how similar two vectors are:

// Calculate similarity (-1 to 1, higher is more similar)
function cosineSimilarity(vecA, vecB) {
  const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

// Example
const queryVector = [0.2, 0.8, 0.1];
const doc1Vector  = [0.3, 0.7, 0.2];  // similarity ≈ 0.98 (very similar)
const doc2Vector  = [0.9, 0.1, 0.8];  // similarity ≈ 0.34 (not similar)
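Running cosineSimilarity on the example vectors confirms the ranking (the vectors are illustrative, not real embeddings):

// Verify the example ranking
console.log(cosineSimilarity(queryVector, doc1Vector).toFixed(2));  // "0.98", doc1 is closest
console.log(cosineSimilarity(queryVector, doc2Vector).toFixed(2));  // "0.34", doc2 is far off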

Search Process

  1. Indexing (done once):
// Convert documents to embeddings
const documents = [
  "How to deploy a JavaScript application",
  "Deploying apps to production servers",
  "Best practices for async JavaScript"
];

const embeddings = await Promise.all(
  documents.map(doc => generateEmbedding(doc))
);

// Store in database with vector index
await db.execute(`
  INSERT INTO articles (content, embedding)
  VALUES (?, vector(?))
`, [documents[0], JSON.stringify(embeddings[0])]);  // vector() parses a '[...]' text literal
  2. Searching (real-time):
// User query
const query = "how do I push my app to production?";

// Convert query to embedding
const queryEmbedding = await generateEmbedding(query);

// Find similar documents
// vector_distance_cos returns a cosine *distance* (lower = closer),
// so convert it to a similarity before ranking
const results = await db.execute(`
  SELECT
    content,
    1 - vector_distance_cos(embedding, vector(?)) AS similarity
  FROM articles
  ORDER BY similarity DESC
  LIMIT 10
`, [JSON.stringify(queryEmbedding)]);

// Results ranked by semantic similarity:
// 1. "Deploying apps to production servers" (0.89)
// 2. "How to deploy a JavaScript application" (0.82)
// 3. "Best practices for async JavaScript" (0.45)
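Both steps call a generateEmbedding helper that isn't defined in this note. A minimal sketch, assuming the local @xenova/transformers model introduced in the next section (the function name and model choice are illustrative):

import { pipeline } from '@xenova/transformers';

// Load the model once and reuse it across calls
let embedder;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);  // plain number[] that can be serialized for storage
}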

Embedding Models

Sentence Transformers

Pre-trained models that convert text to vectors:

import { pipeline } from '@xenova/transformers';

// Load embedding model
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);

// Generate embedding
const text = "javascript async programming";
const embedding = await embedder(text, {
  pooling: 'mean',
  normalize: true
});

// Result: Float32Array of 384 dimensions
console.log(embedding.data);  // [0.123, -0.456, 0.789, ...]

Model                           | Dimensions | Speed  | Quality
--------------------------------|------------|--------|----------
all-MiniLM-L6-v2                | 384        | Fast   | Good
all-mpnet-base-v2               | 768        | Medium | Better
text-embedding-3-small (OpenAI) | 1536       | API    | Excellent
text-embedding-ada-002 (OpenAI) | 1536       | API    | Excellent
textembedding-gecko (Gemini)    | 768        | API    | Excellent

Implementation in Astro Vault

Indexing Content

// scripts/index-content.ts
import { indexContent } from '@logan/libsql-search';
import { getTursoClient } from './lib/turso';

const client = getTursoClient();
const articles = [
  {
    slug: 'javascript-async',
    title: 'JavaScript Async Programming',
    content: 'Learn how to use async/await...',
    tags: ['javascript', 'async'],
  },
  // ... more articles
];

// Generate embeddings and store with 768-dimensional vectors
await indexContent(
  client,
  'articles',
  articles,
  'local',  // Use local embedding model
  768       // Embedding dimensions
);

Searching

// src/pages/api/search.json.ts
import { searchArticles } from '@logan/libsql-search';
import { getTursoClient } from '../../lib/turso';

export async function GET({ request }) {
  const url = new URL(request.url);
  const query = url.searchParams.get('q') || '';

  const client = getTursoClient();

  // Semantic search
  const results = await searchArticles(
    client,
    'articles',
    query,
    'local',
    10
  );

  return new Response(JSON.stringify({ results }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
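On the client, the endpoint is just a fetch away; a minimal usage sketch (the /api/search.json path and the { results } shape come from the route above):

// Call the search endpoint from the browser
const res = await fetch(`/api/search.json?q=${encodeURIComponent('how do I deploy my app?')}`);
const { results } = await res.json();
console.log(results);  // articles ranked by semantic similarity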

Example Queries

Query: "how do I deploy my app?"

Full-Text Search:

-- Looks for keywords: "deploy", "app"
SELECT * FROM articles
WHERE content LIKE '%deploy%' AND content LIKE '%app%';

-- Results (keyword matching):
-- 1. "Deploy your application to production"
-- 2. "App deployment best practices"
-- 3. "Deploying Docker apps"

Semantic Search:

// Understands: user wants to publish/release software
const results = await searchArticles(client, 'articles',
  "how do I deploy my app?", 'local', 10);

// Results (meaning-based):
// 1. "Pushing to production servers" (0.91 similarity)
// 2. "Publishing your application" (0.88)
// 3. "CI/CD deployment pipelines" (0.85)
// 4. "Deploy your application to production" (0.83)

Feature Comparison

Feature         | Full-Text         | Semantic
----------------|-------------------|---------------------
Speed           | 10-20ms           | 50-100ms
Setup           | Built-in DB       | Requires embeddings
Synonyms        | Manual dictionary | Automatic
Typos           | Poor              | Better
Concept match   | No                | Yes
Natural queries | Poor              | Excellent
Resource usage  | Low               | Medium
Index size      | Small             | Large

Advantages of Semantic Search

1. Understanding Synonyms

// Query: "car"
// Full-text: Only finds "car"
// Semantic: Finds "car", "automobile", "vehicle", "auto"

2. Natural Language

// Query: "how to make my site faster?"
// Full-text: "how", "to", "make", "my", "site", "faster"
// Semantic: Understands user wants performance optimization
//          Finds "speed up website", "optimize performance", etc.

3. Conceptual Understanding

// Query: "best laptop for coding"
// Full-text: Finds documents with those exact words
// Semantic: Understands "laptop for coding" = "developer laptop",
//          "programming computer", "development machine"

4. Typo Tolerance

// Query: "javascrpt async"  (typo)
// Full-text: No results (exact match only)
// Semantic: Still finds JavaScript async content (similar embedding)

5. Cross-Language Concepts

// Query in English: "error handling"
// Can find similar concepts even if expressed differently
// "exception management", "dealing with failures", etc.

Limitations

1. Slower Than Full-Text

Full-text search:  10ms
Semantic search:   50-100ms

Reason: Vector distance calculations are expensive
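Without an approximate index, the query embedding has to be compared against every stored vector, which is O(N · d) work per query. A sketch of that brute-force scan, reusing the cosineSimilarity function from earlier (docs and k are illustrative):

// Brute-force top-k: score every stored embedding against the query vector
function bruteForceTopK(queryVec, docs, k = 10) {
  return docs
    .map(doc => ({ ...doc, score: cosineSimilarity(queryVec, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// 10,000 documents x 768 dimensions ≈ 7.7 million multiply-adds per query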

2. Requires Vector Index

-- LibSQL/SQLite
CREATE INDEX idx_embedding ON articles(libsql_vector_idx(embedding));

-- Index size for 10,000 documents with 768-dim embeddings:
-- ~30 MB (vs 5 MB for full-text index)
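The index also assumes the embedding column was declared as a fixed-size vector type; in libSQL that is an F32_BLOB column sized to the embedding dimensions. A sketch using the Turso client from the earlier examples (the table layout is illustrative):

// libSQL: vector columns are fixed-size F32_BLOBs
await client.execute(`
  CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    content TEXT,
    embedding F32_BLOB(768)  -- 768-dimensional float32 vector
  )
`);

await client.execute(`
  CREATE INDEX IF NOT EXISTS idx_embedding
  ON articles(libsql_vector_idx(embedding))
`);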

3. Cold Start Problem

// First query in session
const embedding = await generateEmbedding(query);
// Takes 200-500ms to initialize model

// Subsequent queries
const embedding2 = await generateEmbedding(query2);
// Takes 20-50ms (model cached)
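A common mitigation is to warm the model when the server process starts rather than on the first user query (a sketch, assuming the cached generateEmbedding helper shown earlier):

// Fire-and-forget warm-up at startup so the first real search skips model init
generateEmbedding('warm-up').catch(console.error);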

4. Context Window Limits

// Most models have token limits
const maxTokens = 512;  // ~400 words

// Long documents need chunking
const chunks = splitIntoChunks(longDocument, maxTokens);
const embeddings = await Promise.all(
  chunks.map(chunk => generateEmbedding(chunk))
);
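splitIntoChunks isn't defined in this note; a rough word-based sketch with a small overlap between chunks (the tokens-to-words ratio is an approximation, real tokenizers count tokens):

// Rough chunking by words; ~0.75 words per token is an approximation
function splitIntoChunks(text, maxTokens = 512, overlap = 50) {
  const maxWords = Math.floor(maxTokens * 0.75);  // 512 tokens ≈ 384 words
  const words = text.split(/\s+/);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords - overlap) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}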

5. Exact Match Can Be Worse

// Query: "React.useState"
// Full-text: Finds exact "React.useState"
// Semantic: Might return general React state management docs
//          (less precise for exact API names)

Hybrid Search (Best of Both)

Combine full-text and semantic search:

async function hybridSearch(query: string) {
  // Full-text search (PostgreSQL tsquery syntax shown here; on SQLite/libSQL use FTS5)
  const fulltextResults = await db.execute(`
    SELECT id, ts_rank(search_vector, to_tsquery(?)) * 2 AS score
    FROM articles
    WHERE search_vector @@ to_tsquery(?)
  `, [query, query]);

  // Semantic search (convert cosine distance to similarity so higher = better)
  const queryEmbedding = await generateEmbedding(query);
  const semanticResults = await db.execute(`
    SELECT id, 1 - vector_distance_cos(embedding, vector(?)) AS score
    FROM articles
    ORDER BY score DESC
    LIMIT 20
  `, [JSON.stringify(queryEmbedding)]);

  // Merge and re-rank
  const combined = mergeResults(fulltextResults, semanticResults);

  return combined.sort((a, b) => b.totalScore - a.totalScore);
}
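mergeResults is left undefined above; one simple approach sums the scores per document id so that items found by both searches rank highest (a sketch assuming the libSQL client's rows result shape, not a tuned ranking formula):

// Merge score lists by document id; documents found by both methods score highest
function mergeResults(fulltextResults, semanticResults) {
  const byId = new Map();
  for (const row of [...fulltextResults.rows, ...semanticResults.rows]) {
    const entry = byId.get(row.id) ?? { id: row.id, totalScore: 0 };
    entry.totalScore += row.score;
    byId.set(row.id, entry);
  }
  return [...byId.values()];
}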

When to Use Hybrid

  • Technical documentation: Exact API names (full-text) + concepts (semantic)
  • E-commerce: Product codes (full-text) + descriptions (semantic)
  • Code search: Function names (full-text) + purpose (semantic)

Embedding Providers

Local (Xenova Transformers)

// Pros: Free, private, no API limits
// Cons: Slower, uses CPU/GPU

import { pipeline } from '@xenova/transformers';
const embedder = await pipeline('feature-extraction',
  'Xenova/all-MiniLM-L6-v2');
const embedding = await embedder(text, { pooling: 'mean', normalize: true });

OpenAI

// Pros: High quality, fast
// Cons: Costs money, rate limits

import { OpenAI } from 'openai';
const openai = new OpenAI();
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: text,
});
const embedding = response.data[0].embedding;

Gemini

// Pros: High quality, generous free tier
// Cons: Rate limits

import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'embedding-001' });
const result = await model.embedContent(text);
const embedding = result.embedding.values;

Use Cases

✅ Excellent For

  • Documentation search: Natural language queries
  • Customer support: Find relevant help articles
  • Content discovery: "More like this" recommendations
  • Question answering: Match questions to answers
  • Knowledge bases: Conceptual search across docs

⚠️ Consider Alternatives

  • Exact code search: Use full-text or grep
  • Product SKUs: Use full-text or database queries
  • Date/numeric filtering: Use traditional indexes
  • Very large scale: Specialized vector databases (Pinecone, Weaviate)

Last updated: October 16, 2025