Actual Latency
Actual latency is the real, measurable time it takes for a process to complete, from request to response. For my app, I can reduce it in a few ways:
- Stream the LLM response: Get the AI's response word by word instead of waiting for the whole thing.
- Chain the TTS request: As soon as the first sentence arrives from the LLM stream, send it to the TTS API. This pipelines the two steps.
- Use faster models: Opt for models like Gemini 1.5 Flash that are optimized for speed.
- Co-locate services: Keep my Firebase Functions in the same region as the AI services to minimize network lag.
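The first two ideas combine into one pattern: buffer the streamed tokens, and the moment a sentence boundary appears, fire the TTS request without waiting for the rest of the response. Here is a minimal sketch of that pipeline; `streamLlm` and `synthesizeSpeech` are hypothetical stand-ins for the real LLM streaming and TTS API calls, and the sentence splitter is a naive regex rather than production logic.

```typescript
// Hypothetical LLM stream: yields the response a few tokens at a time.
// A real implementation would wrap the provider's streaming API.
async function* streamLlm(prompt: string): AsyncGenerator<string> {
  const tokens = "Hello there. This is a streamed reply. Goodbye.".split(/(?<=\s)/);
  for (const t of tokens) yield t;
}

// Hypothetical TTS call: a real one would return audio bytes from the TTS API.
async function synthesizeSpeech(sentence: string): Promise<string> {
  return `audio(${sentence.trim()})`;
}

// Buffer streamed tokens and flush each complete sentence to TTS
// immediately, instead of waiting for the full LLM response.
async function pipeline(prompt: string): Promise<string[]> {
  const pending: Promise<string>[] = [];
  let buffer = "";
  for await (const token of streamLlm(prompt)) {
    buffer += token;
    // Naive sentence boundary: text up to the first ., ! or ?.
    const match = buffer.match(/^(.*?[.!?])\s*(.*)$/s);
    if (match) {
      pending.push(synthesizeSpeech(match[1])); // fire TTS without awaiting
      buffer = match[2];
    }
  }
  if (buffer.trim()) pending.push(synthesizeSpeech(buffer)); // flush any tail
  return Promise.all(pending);
}
```

The key design choice is that `synthesizeSpeech` calls are started as soon as each sentence is complete and only awaited together at the end, so TTS for sentence one overlaps with the LLM still generating sentence two.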