Data

Lucene Search Engine Index + Query Pipeline

Built a search engine pipeline on Apache Lucene fundamentals: ingesting and indexing text files, then running queries through analyzers and ranking results by relevance score. Automated builds with Apache Ant and validated search behavior against test datasets.
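
The index-then-query flow described above can be illustrated with a minimal sketch in plain Java. This is not Lucene's API and not the project's actual code: `TinyIndex`, `analyze`, `add`, and `search` are hypothetical names, and the scoring is a simple TF-IDF variant chosen for brevity rather than the similarity Lucene uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of an inverted index with TF-IDF ranking.
public class TinyIndex {
    // term -> (document id -> number of times the term occurs in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();
    private final Set<String> docIds = new HashSet<>();

    // Simplified "analyzer": lowercase, then split on anything that is not a-z or 0-9.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one document: record a term frequency per (term, document) pair.
    public void add(String docId, String text) {
        docIds.add(docId);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Return matching document ids, highest score first (ties broken by id).
    public List<String> search(String query) {
        Map<String, Double> scores = new HashMap<>();
        for (String term : analyze(query)) {
            Map<String, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in any indexed document
            }
            // Rarer terms get a higher weight.
            double idf = Math.log(1.0 + (double) docIds.size() / docs.size());
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

The same analysis step runs at index time and at query time, which is what lets a query such as "Ant" match a document containing "ant".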

Role

Search / Data Engineer

Timeframe

Sep 2024 - Oct 2024

Highlights

  • Indexed large sets of .txt documents into a Lucene index with analyzers and structured fields.
  • Implemented query workflows to return ranked results and validate relevance behavior.
  • Used Apache Ant for repeatable builds and Luke to inspect/verify index structure.
  • Documented how tokenization and analyzers impacted retrieval quality.
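
As a rough illustration of the last point, the sketch below compares two tokenization strategies in plain Java. It is loosely analogous to the difference between Lucene's `WhitespaceAnalyzer` and an analyzer that lowercases and removes stop words, but it does not use Lucene itself; the class name, method names, and stop-word list are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical comparison of two tokenization strategies.
public class AnalyzerComparison {
    // Illustrative stop-word list, not taken from any library.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "and");

    // Splits on whitespace only: case and punctuation are preserved.
    static List<String> whitespaceAnalyze(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Lowercases, splits on non-alphanumeric characters, and drops stop words.
    static List<String> normalizingAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Index, rebuilt";
        System.out.println(whitespaceAnalyze(text));
        System.out.println(normalizingAnalyze(text));
    }
}
```

With whitespace-only splitting, the token `Index,` keeps its capital letter and trailing comma, so a query for `index` would not match it; the normalizing version produces `index` and drops `The`, which is the kind of difference that shows up in retrieval quality.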

Tech Stack

Java, Apache Lucene, Apache Ant, Luke, Git