Upgrade swap rebalance is re-tried with different params on operator pod deletion

Description

Cluster Setup

  • Kind cluster locally run on Mac

  • 7 nodes with services kv, index, n1ql

  • 6 buckets

  • Initial Cluster version : 7.6.0-2176

  • Upgrade Cluster version : 7.6.1-3200

Steps taken in the scenario

  • Created a cluster

  • Created 6 buckets

  • Issued an upgrade from 7.6.0-2176 to 7.6.1-3200 using swap rebalance

  • After the first pod was upgraded successfully, stopped/failed the rebalance during the second pod's swap rebalance (cb-example-0002 was being ejected and cb-example-0008 was the upgraded pod).

  • Also deleted the operator pod.

  • When the new operator pod comes back, rebalance is re-tried with different params.

  • cb-example-0000 was ejected and cb-example-0008 was added as the upgraded pod.
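
For anyone trying to reproduce this, the steps above can be sketched roughly as the command sequence below. This is only one possible way to drive the scenario, not the exact commands used in the original run. The YAML file names are taken from the deployment links in this ticket; the operator pod label `app=couchbase-operator`, the pod used for `kubectl exec`, and the credentials are assumptions, and `couchbase-cli rebalance-stop` is just one way to interrupt a rebalance.

```python
import shlex


def repro_commands(operator_selector="app=couchbase-operator",  # assumed label
                   exec_pod="cb-example-0000",                  # assumed pod
                   username="Administrator",                    # assumed creds
                   password="password"):
    """Return the command lines for the scenario, in the order of the steps."""
    return [
        # Create the cluster and the 6 buckets.
        ["kubectl", "apply", "-f", "couchbase-cluster.yaml"],
        ["kubectl", "apply", "-f", "couchbase-buckets.yaml"],
        # Issue the upgrade from 7.6.0-2176 to 7.6.1-3200.
        ["kubectl", "apply", "-f", "couchbase-cluster-upgrade.yaml"],
        # While the second pod is being swapped, stop the rebalance.
        ["kubectl", "exec", exec_pod, "--",
         "couchbase-cli", "rebalance-stop",
         "-c", "localhost", "-u", username, "-p", password],
        # Delete the operator pod so that a new one is scheduled.
        ["kubectl", "delete", "pod", "-l", operator_selector],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the timing of the
    # rebalance stop can be chosen manually.
    for cmd in repro_commands():
        print(shlex.join(cmd))
```

The commands are printed rather than executed because the rebalance has to be stopped at the right moment (after the first pod is upgraded and while the second swap is in progress).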

Rebalance before operator pod restart

 

Rebalance post operator pod restart

Issue

  • After the operator pod was killed, the upgrade stopped behaving like the swap rebalance that was intended.

  • If 3 buckets are already swapped between cb-example-0002 and cb-example-0008 in the first rebalance which then failed, the remaining 3 should be rebalanced between the same pods.

  • Because a different pod is ejected on the retry, the data already moved to cb-example-0008 has to be deleted and a new swap rebalance started from scratch. This nullifies the advantages of the swap rebalance.

  • When the number of buckets is high and the data size is in terabytes, this causes a huge performance deterioration.

  • The rebalance should be retried with the same configurations
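
To illustrate the expected behaviour, here is a minimal sketch of a retry that reuses the interrupted swap instead of choosing a new pod. It is not the operator's actual code, and the fix may well work differently. `SwapPlan`, `next_swap` and the idea of a `pending` plan that survives an operator restart are hypothetical names introduced only for this example, and the pod list is illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SwapPlan:
    eject: str  # old-version pod being removed
    add: str    # upgraded pod being added


def next_swap(pending: Optional[SwapPlan],
              old_pods: Sequence[str],
              new_pod: str) -> SwapPlan:
    """Choose the swap to run next.

    If an earlier swap was interrupted and the pod it was ejecting is
    still in the cluster, resume that swap unchanged. Otherwise pick the
    first remaining old-version pod (hypothetical selection rule).
    """
    if pending is not None and pending.eject in old_pods:
        return pending
    return SwapPlan(eject=sorted(old_pods)[0], add=new_pod)


# Illustrative state after the failed second swap.
old_pods = ["cb-example-0000", "cb-example-0002", "cb-example-0003"]
interrupted = SwapPlan(eject="cb-example-0002", add="cb-example-0008")

# With the interrupted plan remembered, the same pair is retried.
resumed = next_swap(interrupted, old_pods, "cb-example-0008")

# With the plan lost (e.g. after a restart), a different pod is chosen,
# which matches the behaviour reported in this ticket.
restarted = next_swap(None, old_pods, "cb-example-0008")
```

In this sketch `resumed` ejects cb-example-0002 again, so the buckets already moved to cb-example-0008 stay valid, while `restarted` ejects cb-example-0000 and loses that progress.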


Operator logs : https://cb-engineering.s3.amazonaws.com/K8S-3605/cbopinfo-20240801T185041+0530.tar.gz

Cluster logs : 
https://cb-engineering.s3.amazonaws.com/K8S-3605/collectinfo-2024-08-01T132027-ns_1%40cb-example-0007.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3605/collectinfo-2024-08-01T132027-ns_1%40cb-example-0008.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3605/collectinfo-2024-08-01T132027-ns_1%40cb-example-0009.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3605/collectinfo-2024-08-01T132027-ns_1%40cb-example-0010.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3605/collectinfo-2024-08-01T132027-ns_1%40cb-example-0011.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3605/collectinfo-2024-08-01T132027-ns_1%40cb-example-0012.cb-example.default.svc.zip
Cluster deployment : https://cb-engineering.s3.amazonaws.com/K8S-3605/couchbase-cluster.yaml

Bucket deployment : https://cb-engineering.s3.amazonaws.com/K8S-3605/couchbase-buckets.yaml

Cluster upgrade deployment : https://cb-engineering.s3.amazonaws.com/K8S-3605/couchbase-cluster-upgrade.yaml


  The cao tool and operator images were built locally on this commit

Environment

  • Initial Cluster version : 7.6.0-2176

  • Upgrade Cluster version : 7.6.1-3200

  • Kubernetes Version : v1.30.0

  • CAO and operator : 2.7.0 built locally

  • Environment : Kind cluster

Release Notes Description

None

Attachments

2

Activity


Aryaan Bhaskar February 12, 2025 at 6:45 AM

Tested on Operator 2.8.0-310, the issue has been resolved.

CB robot December 3, 2024 at 8:35 AM

Build couchbase-operator-2.8.0-245 contains couchbase-operator commit 98441f7 with commit message:
Fixed swap rebalance parameters

Fixed

Created August 1, 2024 at 1:00 PM
Updated February 24, 2025 at 1:35 PM
Resolved December 3, 2024 at 8:06 AM