I gave my phone number to an AI. The naive path was Gemini Live, with its ~200ms speech-to-speech latency. That didn't survive contact with reality. Here's the three-stage pipeline I fell back to, the per-stage latency budget it forced, and the UX trick that makes 700ms feel like 300ms.