Actual Latency
Actual latency is the real, measurable time it takes for a process to complete, from request to response. For my app, I can reduce it in a few ways:
- Stream the LLM response: Get the AI's response word by word instead of waiting for the whole thing.
- Chain the TTS request: As soon as the first sentence arrives from the LLM stream, send it to the TTS API. This pipelines the two steps.
- Use faster models: Opt for models like Gemini 1.5 Flash that are optimized for speed.
- Co-locate services: Keep my Firebase Functions in the same region as the AI services to minimize network lag.
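The first two ideas combine into one pattern: buffer the streamed tokens, and the moment a sentence boundary appears, fire the TTS request without waiting for the rest of the response. Here is a minimal sketch of that pipeline; `streamLlm` and `synthesizeSpeech` are hypothetical stand-ins for the real LLM streaming and TTS API calls, and the sentence splitter is a naive regex rather than production logic.

```typescript
// Hypothetical LLM stream: yields the response a few tokens at a time.
// A real implementation would wrap the provider's streaming API.
async function* streamLlm(prompt: string): AsyncGenerator<string> {
  const tokens = "Hello there. This is a streamed reply. Goodbye.".split(/(?<=\s)/);
  for (const t of tokens) yield t;
}

// Hypothetical TTS call: a real one would return audio bytes from the TTS API.
async function synthesizeSpeech(sentence: string): Promise<string> {
  return `audio(${sentence.trim()})`;
}

// Buffer streamed tokens and flush each complete sentence to TTS
// immediately, instead of waiting for the full LLM response.
async function pipeline(prompt: string): Promise<string[]> {
  const pending: Promise<string>[] = [];
  let buffer = "";
  for await (const token of streamLlm(prompt)) {
    buffer += token;
    // Naive sentence boundary: text up to the first ., ! or ?.
    const match = buffer.match(/^(.*?[.!?])\s*(.*)$/s);
    if (match) {
      pending.push(synthesizeSpeech(match[1])); // fire TTS without awaiting
      buffer = match[2];
    }
  }
  if (buffer.trim()) pending.push(synthesizeSpeech(buffer)); // flush any tail
  return Promise.all(pending);
}
```

The key design choice is that `synthesizeSpeech` calls are started as soon as each sentence is complete and only awaited together at the end, so TTS for sentence one overlaps with the LLM still generating sentence two.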