Data
Lucene Search Engine Index + Query Pipeline
Built a search pipeline on Apache Lucene fundamentals: ingesting and indexing text files, then running queries through analyzers and returning relevance-scored results. Automated builds with Apache Ant and validated search behavior against test datasets.
Role
Search / Data Engineer
Timeframe
Sep 2024 - Oct 2024
Highlights
- Indexed large sets of .txt documents into a Lucene index with analyzers and structured fields.
- Implemented query workflows to return ranked results and validate relevance behavior.
- Used Apache Ant for repeatable builds and Luke to inspect/verify index structure.
- Documented how tokenization and analyzers impacted retrieval quality.
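To illustrate the last highlight, here is a minimal sketch of how the choice of tokenization can change what a query matches. It is a toy inverted index in plain Java, not Lucene's API and not the project's code; the class name `TokenizationSketch`, both analyzer functions, and the sample documents are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy inverted index for illustration only; this is NOT Lucene's API.
class TokenizationSketch {

    // Hypothetical analyzer 1: split on whitespace, keep case and punctuation.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Hypothetical analyzer 2: lowercase, then split on non-alphanumeric runs.
    static List<String> normalizedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Map each term to the sorted set of document ids that contain it.
    static Map<String, Set<Integer>> buildIndex(
            List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Analyze the query with the same analyzer and union the matching postings.
    static Set<Integer> search(
            Map<String, Set<Integer>> index, String query,
            Function<String, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            hits.addAll(index.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample documents, not the project's dataset.
        List<String> docs = List.of(
                "Lucene indexes text.",
                "Searching with LUCENE, quickly",
                "Ant builds the project");

        Function<String, List<String>> raw = TokenizationSketch::whitespaceTokens;
        Function<String, List<String>> norm = TokenizationSketch::normalizedTokens;

        System.out.println("whitespace: " + search(buildIndex(docs, raw), "lucene", raw));
        System.out.println("normalized: " + search(buildIndex(docs, norm), "lucene", norm));
    }
}
```

With these sample documents, the whitespace-only analyzer finds nothing for the query `lucene` because the indexed terms keep their original case and trailing punctuation, while the normalizing analyzer matches the first two documents. Real Lucene analyzers do considerably more, but the same effect is why the analyzer used at index time and at query time needs to agree.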
Tech Stack
Java, Apache Lucene, Apache Ant, Luke, Git