Operation Elapsed property is not calculated properly for completed operations with retriable response status
Description
I reproduced the issue on the scenario with an error VBucketBelongsToAnotherServer so let’s look at it as an example.
Note: it is important to be able to simulate an operation that completes w/t exceptions, but retriable status code and to retry the operation until timeout.
When this kind of response is received, SDK calls HandleOperationCompleted and this method regardless of the branch always calls StopRecording method which actually stops internal Stopwatch object that an operation uses to track how much time the operation took to be completed.
Assuming that no exceptions are thrown in the above try block we see that eventually it calls:
Since it’s not a transport failure or something similar the ExecuteOp method is successfully completed with the response status as a return value.
Going up the stack we reach RetryOrchestrator.RetryAsync method which checks if the status is retriable. In our case it is retriable which means that the operation will be attempted to execute again. Thus the operation will be retried until the cancellation token provided is not cancelled by timeout.
And as we know when the operation was completed on the first time, its stopwatch was stopped, so {{Elapsed }}represents only the first attempt of the operation, so this code will throw TaskCancelledException, but in fact should throw TimeoutException.
I reproduced the issue on the scenario with an error VBucketBelongsToAnotherServer so let’s look at it as an example.
Note: it is important to be able to simulate an operation that completes w/t exceptions, but retriable status code and to retry the operation until timeout.
When this kind of response is received, SDK calls HandleOperationCompleted and this method regardless of the branch always calls StopRecording method which actually stops internal Stopwatch object that an operation uses to track how much time the operation took to be completed.
Assuming that no exceptions are thrown in the above try block we see that eventually it calls:
Which completes the task, so that [the rest of the ClusterNode.ExecuteOp code|https://github.com/couchbase/couchbase-net-client/blob/91b3674ab2c40f760c1ec75092f89c85b18ec9f2/src/Couchbase/Core/ClusterNode.cs#L555-L559] might be executed
Since it’s not a transport failure or something similar the ExecuteOp method is successfully completed with the response status as a return value.
Going up the stack we reach RetryOrchestrator.RetryAsync method which checks if the status is retriable. In our case it is retriable which means that the operation will be attempted to execute again. Thus the operation will be retried until the cancellation token provided is not cancelled by timeout.
After timeout the execution goes to the catch block in the RetryAsync method. And there is the following code:
And as we know when the operation was completed on the first time, its stopwatch was stopped, so {{Elapsed }}represents only the first attempt of the operation, so this code will throw
TaskCancelledException
, but in fact should throwTimeoutException.