The naive PII strategy is to scrub the corpus at index time. It's also the strategy that quietly destroys recall on every query that legitimately mentions a public entity. Here's why I moved the redaction pass downstream of retrieval — and how a DeBERTa PII model, an HNSW index, and a cross-encoder reranker fit inside a sub-2s p95 budget without stepping on each other.
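To make the recall problem concrete, here is a toy sketch, not the production pipeline. It assumes a regex `redact` function as a crude stand-in for the DeBERTa PII model and a token-overlap `retrieve` function as a stand-in for the HNSW index. Both names, the pattern, and the two-document corpus are hypothetical, chosen only to show what changes when the redaction pass moves from index time to after retrieval.

```python
import re

# Hypothetical stand-in for a PII model: masks "Firstname Lastname" pairs and emails.
PII_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b|\b[\w.]+@[\w.]+\b")


def redact(text: str) -> str:
    """Replace every span the toy PII pattern matches with a placeholder."""
    return PII_PATTERN.sub("[REDACTED]", text)


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[int]:
    """Toy retriever: rank docs by token overlap with the query (stand-in for ANN search)."""
    q = tokens(query)
    scored = [(len(q & tokens(d)), i) for i, d in enumerate(docs)]
    hits = [(s, i) for s, i in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in hits[:k]]


corpus = [
    "Angela Merkel addressed the summit on energy policy.",
    "Quarterly invoice sent to jane.doe@example.com for review.",
]
query = "Angela Merkel"

# Strategy A: scrub at index time, then search the scrubbed text.
scrubbed_index = [redact(d) for d in corpus]
hits_index_time = retrieve(query, scrubbed_index)

# Strategy B: search the raw text, then redact only what is returned.
hits_downstream = retrieve(query, corpus)
returned = [redact(corpus[i]) for i in hits_downstream]

print("index-time scrub hits:", hits_index_time)
print("downstream redaction hits:", hits_downstream)
print("returned passages:", returned)
```

In this toy, scrubbing first removes the very tokens the query needs to match, so the relevant passage is never retrieved. Searching the raw text keeps the match, and redaction still runs before anything leaves the pipeline. How the redaction step should treat a public entity in the returned passage is a separate policy question that this sketch does not address.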