TTFT measure #11300
Unanswered
kde-kairntech
asked this question in
Q&A
Replies: 1 comment
-
TTFT does not include the startup time of the vLLM engine. If the engine is not ready, you should get a "500 Internal Server Error" response instead. The TTFT clock starts when the server receives a request, so it includes the time the request spends waiting in the pending queue. When you benchmark at a high request rate, requests queue up and both the mean and the median TTFT increase.
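To illustrate the queueing effect, here is a toy simulation. It is only a sketch under simplifying assumptions: requests arrive at a fixed rate, the server handles a fixed number of prefills at once, and each prefill takes a constant time. It does not model vLLM's actual scheduler or batching, and `simulate_ttft` and its parameters are made-up names for this example.

```python
import statistics


def simulate_ttft(request_rate, num_requests, prefill_time, max_concurrent=1):
    """Toy FIFO queue model. Returns the TTFT (seconds) of each request.

    Assumptions (not vLLM behaviour): evenly spaced arrivals, constant
    prefill time, and at most `max_concurrent` prefills running at once.
    """
    # Time at which each processing slot becomes free again.
    slot_free_at = [0.0] * max_concurrent
    ttfts = []
    for i in range(num_requests):
        arrival = i / request_rate
        # Take the slot that frees up earliest (FIFO service).
        slot = min(range(max_concurrent), key=lambda s: slot_free_at[s])
        start = max(arrival, slot_free_at[slot])  # wait in queue if busy
        first_token = start + prefill_time
        slot_free_at[slot] = first_token
        # TTFT is counted from arrival, so it includes the queue wait.
        ttfts.append(first_token - arrival)
    return ttfts


if __name__ == "__main__":
    for rate in (1, 2, 4):
        ttfts = simulate_ttft(rate, num_requests=100, prefill_time=0.5)
        print(
            f"rate={rate} req/s "
            f"mean={statistics.mean(ttfts):.2f}s "
            f"median={statistics.median(ttfts):.2f}s"
        )
```

In this model TTFT stays at the prefill time while the server keeps up with arrivals. Once the request rate exceeds what the server can process, every new request waits longer than the previous one, so the mean and median grow with the number of requests in the benchmark.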
-
I may have misunderstood the "TTFT" measurement. As I understand it, the median value is the median time from sending a request to receiving the first generated token. Why do I get around 65s with Llama-3.1-70B on an H100? Does this include start-up time or something similar? Thanks a lot :-)
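For reference, this is roughly what I mean by TTFT, written as a small client-side sketch. `time_to_first_token` and `fake_stream` are hypothetical helpers for illustration, not part of vLLM or its benchmark scripts; the fake stream just sleeps to stand in for queue wait plus prefill.

```python
import time


def time_to_first_token(token_stream):
    """Return seconds from starting to read the stream until the first item.

    Returns None if the stream ends without producing anything.
    """
    start = time.perf_counter()
    for _ in token_stream:
        return time.perf_counter() - start
    return None


def fake_stream(queue_delay, prefill_delay):
    """Stand-in for a streaming response (assumption: delays are constant)."""
    # Everything that happens before the first token counts towards TTFT.
    time.sleep(queue_delay + prefill_delay)
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    ttft = time_to_first_token(fake_stream(queue_delay=0.2, prefill_delay=0.1))
    print(f"TTFT: {ttft:.3f}s")
```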