Chapter 4
Data Foundation
Garbage in, garbage out. A search engine is only as good as the data you feed it. This chapter covers the often-ignored but critical work of modeling your documents, cleaning your data, and keeping the index current.
In This Chapter
4.1 Quality as a Data Problem
Why code changes rarely fix data issues.
4.2 Types of Data
Text, Keyword, Numeric, Date, and Geo. Choosing the right field type for each job.
4.3 Document Modeling
Denormalization, nested objects, and parent-child relationships.
4.4 Text vs Structured
When to analyze text and when to match exactly. Common pitfalls.
4.5 Cleaning & Normalization
Dealing with HTML tags, encoding errors, and whitespace.
4.6 Freshness & Updates
Near-real-time ingestion vs batch processing.
4.7 Deletes & Reindexing
Handling soft deletes, hard deletes, and index aliasing.