Why it matters
Users notice latency above 500ms and hang up above one second. In an already optimized pipeline, 75ms of network latency from models sitting in a different data center adds 30% overhead. Colocating everything in the same building drops that to around 5ms. Rishabh Bhargava from Together AI walks through the full speech
My takeaway: Users notice latency above 500ms and hang up above one second. In an already optimized pipeline, 75ms of network latency from models sitting in a different data center adds 30% overhead. Colocating everything in the same building drops that to around 5ms. Rishabh Bhargava from Together AI walks through the full speech