Basic Text Search - Before Full-Text Search
Before full-text search capabilities were built into databases, searching text meant using basic SQL pattern matching. This approach had significant limitations that made it unsuitable for real search applications.
The LIKE Operator
Basic Pattern Matching
-- Find articles with "javascript" anywhere in title
SELECT * FROM articles
WHERE title LIKE '%javascript%';
-- Case-insensitive (PostgreSQL)
SELECT * FROM articles
WHERE title ILIKE '%javascript%';
-- Case-insensitive (MySQL)
SELECT * FROM articles
WHERE LOWER(title) LIKE LOWER('%javascript%');
Multiple Terms
-- Find articles with both "javascript" AND "async"
SELECT * FROM articles
WHERE title LIKE '%javascript%'
AND title LIKE '%async%';
-- Find articles with "javascript" OR "typescript"
SELECT * FROM articles
WHERE title LIKE '%javascript%'
OR title LIKE '%typescript%';
Why LIKE Queries Sucked
1. Performance
Cannot Use Indexes
-- This query does a full table scan
SELECT * FROM articles
WHERE content LIKE '%search term%';
-- Even with an index on content, it's not used
CREATE INDEX idx_content ON articles(content);
-- Index is useless for %term% patterns
O(n) Complexity
- Every query scans every row
- No way to optimize
- Gets slower as data grows
2. No Relevance Ranking
-- Returns results in insertion order, not relevance
SELECT * FROM articles
WHERE content LIKE '%javascript%';
-- No way to know which results are most relevant
All results are equal - no concept of:
- Term frequency
- Document importance
- Match quality
3. Poor Matching
Exact Substrings Only
-- Finds "running"
SELECT * FROM articles WHERE content LIKE '%running%';
-- Does NOT find "run", "runs", "ran"
No Stemming
LIKE '%run%'
won't match "running", "runs", "ran"- Must search for every variant manually
No Stop Words
- Searches "the", "and", "or" waste resources
- No way to exclude common words
4. No Multi-Field Search
-- Searching multiple columns is verbose
SELECT * FROM articles
WHERE title LIKE '%javascript%'
OR content LIKE '%javascript%'
OR tags LIKE '%javascript%';
-- No way to weight fields differently
5. Complex Queries Are Impossible
-- Phrase search is clunky
WHERE content LIKE '%async function%'
-- Proximity search? Not possible
-- "javascript" within 5 words of "async"? Can't do it
-- Fuzzy matching? Nope
-- Synonyms? Manual only
Real-World Problems
E-commerce Search
-- User searches "laptop computer"
SELECT * FROM products
WHERE name LIKE '%laptop%'
AND name LIKE '%computer%';
-- Problems:
-- - Product "laptop" excluded (doesn't contain "computer")
-- - Results not ranked by relevance
-- - Slow on 1M+ products
-- - Can't boost popular items
Documentation Search
-- User searches "how to deploy application"
SELECT * FROM docs
WHERE content LIKE '%how%'
AND content LIKE '%deploy%'
AND content LIKE '%application%';
-- Problems:
-- - Matches "how", "to", "deploy" literally
-- - Doesn't match "deployment", "deploying"
-- - No concept of document relevance
-- - Extremely slow
Blog Search
-- User searches "react hooks tutorial"
SELECT * FROM posts
WHERE title LIKE '%react%'
AND content LIKE '%hooks%'
AND content LIKE '%tutorial%';
-- Problems:
-- - Doesn't understand "react" is more important than "tutorial"
-- - Can't boost newer posts
-- - No way to search tags efficiently
-- - Full table scan on every search
Workarounds (All Bad)
Pre-computing Search Columns
-- Concatenate searchable fields
ALTER TABLE articles
ADD COLUMN search_text TEXT;
UPDATE articles
SET search_text = LOWER(title || ' ' || content || ' ' || tags);
-- Still requires full scan
SELECT * FROM articles
WHERE search_text LIKE '%javascript%';
Multiple LIKE Clauses
-- Try to be smart with OR conditions
SELECT * FROM articles
WHERE content LIKE '%javascript%'
OR content LIKE '%Java Script%'
OR content LIKE '%js%'
OR content LIKE '%JS%';
-- Verbose, slow, still no ranking
Application-Side Filtering
// Fetch everything, filter in application
const articles = await db.query('SELECT * FROM articles');
const results = articles.filter(a =>
a.content.toLowerCase().includes(searchTerm.toLowerCase())
);
// Even worse - transfers all data!
Regular Expressions
-- PostgreSQL regex
SELECT * FROM articles
WHERE content ~ 'javascript|typescript';
-- Still slow, no ranking, complex syntax
Performance Impact
Benchmark (1 million rows)
Operation | Time | Notes |
---|---|---|
LIKE '%term%' |
2000ms | Full table scan |
LIKE 'term%' |
50ms | Can use index (prefix only) |
Full-text search | 10ms | Indexed, ranked |
Semantic search | 50ms | Vector index, relevance |
Why So Slow?
-- PostgreSQL EXPLAIN
EXPLAIN SELECT * FROM articles WHERE content LIKE '%javascript%';
-- Result:
-- Seq Scan on articles (cost=0.00..10000.00 rows=100)
-- Filter: (content ~~ '%javascript%'::text)
-- Sequential scan = reads every row
-- No index usage possible
The Index Problem
Prefix Indexes Work
-- This CAN use an index
SELECT * FROM articles
WHERE title LIKE 'javascript%';
-- Index scan, fast!
Suffix/Infix Indexes Don't
-- This CANNOT use an index
SELECT * FROM articles
WHERE title LIKE '%javascript%';
-- Full table scan, slow!
Why?
B-tree indexes store data in sorted order:
'a'
'aa'
'aaa'
'javascript'
'javascriptcore'
- Prefix search: Start at 'javascript', read sequentially
- Suffix search: Would need to check every entry
When LIKE Actually Works
Autocomplete (Prefix Search)
-- Fast with index
SELECT name FROM users
WHERE name LIKE 'joh%'
LIMIT 10;
-- Works because:
-- 1. Prefix search can use index
-- 2. LIMIT reduces rows scanned
-- 3. Small result set
Exact Matching
-- Use = instead of LIKE
SELECT * FROM articles
WHERE slug = 'how-to-deploy';
-- Fast with unique index
Very Small Tables
-- 100 rows? LIKE is fine
SELECT * FROM settings
WHERE key LIKE '%cache%';
-- Fast enough, don't overcomplicate
Migration Path
From LIKE to Full-Text
-- Before
SELECT * FROM articles
WHERE content LIKE '%javascript%';
-- After (PostgreSQL)
SELECT * FROM articles
WHERE to_tsvector(content) @@ to_tsquery('javascript');
-- After (MySQL)
SELECT * FROM articles
WHERE MATCH(content) AGAINST('javascript');
From LIKE to Semantic
-- Before
SELECT * FROM articles
WHERE content LIKE '%how to deploy%';
-- After (with embeddings)
SELECT *, vector_distance_cos(embedding, query_vector) as similarity
FROM articles
ORDER BY similarity
LIMIT 10;
Key Takeaways
LIKE Problems
- Performance: Full table scans, O(n) complexity
- No relevance: Results not ranked
- Poor matching: Exact substrings only
- No stemming: Can't match word variants
- Complex queries: Impossible or very slow
When LIKE Is OK
- Small tables (< 1000 rows)
- Prefix search with index
- Exact matching
- Internal tools with low usage
Better Alternatives
- Full-text search: For keyword search (10-100x faster)
- Semantic search: For natural language (more relevant)
- Dedicated search: Algolia, Elasticsearch for complex needs
Evolution of Search
1970s: LIKE queries (slow, no ranking)
↓
1990s: Full-text search (fast, ranked)
↓
2000s: Elasticsearch (distributed, scalable)
↓
2020s: Semantic search (AI, understands meaning)
Conclusion
LIKE queries were a necessary evil before full-text search existed. They're slow, can't be indexed properly, provide no ranking, and don't understand language. Modern databases have built-in full-text search, and AI-powered semantic search is becoming standard for user-facing search.
Rule of thumb: If you're using LIKE '%term%'
for search in production, you're probably doing it wrong.
Resources
- PostgreSQL Full-Text: postgresql.org/docs/current/textsearch
- MySQL Full-Text: dev.mysql.com/doc/refman/fulltext-search
- Query Performance: use-the-index-luke.com