Eventing consumes a large amount of CPU with no functions deployed.
Description
Activity
Sujay Gad October 7, 2021 at 5:17 AM (edited)
Verified the fix on 7.0.2-6700, 7.1.0-1429.
STEPS
Create a cluster of 3 nodes with the kv, index, query, and eventing services colocated on each node.
Create 15 buckets, each with a 100 MB RAM quota.
Delete and recreate all 15 buckets in quick succession.
Check CPU utilisation on all 3 nodes.
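The bucket churn in the steps above can be driven against the cluster-management REST API. This is a minimal sketch, assuming placeholder admin credentials (`Administrator:password`) and a node reachable on the default port 8091; `couchbase-cli bucket-create`/`bucket-delete` would work equally well.

```python
import base64
import urllib.parse
import urllib.request

# Assumptions: node address and credentials below are placeholders.
HOST = "http://127.0.0.1:8091"
AUTH = base64.b64encode(b"Administrator:password").decode()

def bucket_path(name=None):
    """REST path for the bucket collection, or for one bucket if named."""
    base = "/pools/default/buckets"
    return base if name is None else base + "/" + urllib.parse.quote(name)

def _call(method, path, data=None):
    req = urllib.request.Request(HOST + path, data=data, method=method)
    req.add_header("Authorization", "Basic " + AUTH)
    return urllib.request.urlopen(req)

def create_bucket(name, ram_mb=100):
    body = urllib.parse.urlencode({"name": name, "ramQuotaMB": ram_mb}).encode()
    _call("POST", bucket_path(), body)

def delete_bucket(name):
    _call("DELETE", bucket_path(name))

if __name__ == "__main__":
    names = ["bucket%d" % i for i in range(1, 16)]
    for n in names:                 # step 2: 15 buckets, 100 MB quota each
        create_bucket(n)
    for n in names:                 # step 3: delete all 15 ...
        delete_bucket(n)
    for n in names:                 # ... and recreate in quick succession
        create_bucket(n)
```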
CASE A
Reproduced the issue on 7.0.2-6698.
CPU utilisation remains high on all 3 nodes after deletion and recreation of buckets.
CASE B
Verified the fix on 7.0.2-6700.
CPU utilisation was high only for a brief moment during bucket creation.
CASE C
Verified the fix on 7.1.0-1429.
CPU utilisation was high only for a brief moment during bucket creation.
CB robot October 5, 2021 at 4:36 PM
Build couchbase-server-7.0.2-6700 contains eventing commit 3c24dc9 with commit message: Fix goroutine leak due to bucket delete and recreate
Jon Strabala October 4, 2021 at 1:38 PM (edited)
Jeelan and Rita, the problem still occurs if I add a 65-second delay between the CRUD operations (I showed this in my prior tests above), and also when just adding buckets with no deletions.
So it is not dependent on "quick" CLI commands (although those do lower the threshold by a few buckets). Also, once the high eventing-producer/beam.smp CPU issue occurs, there seems to be no way to unwind it other than removing the Eventing Service nodes and rebalancing, or stopping and restarting every node (or deleting all the buckets; I believe I had to drop them all to stop the HTTP traffic and lower the CPU).
Maybe there are other workarounds or avoidance techniques, like creating the cluster's KV nodes first, then adding your buckets, and finally adding the Eventing Service (not sure, as I haven't tested this).
So 6.5.1 through 7.0.1 work with 30 buckets, but if you use Eventing in 7.0.2, at 13 buckets your system goes into a busy spin no matter how careful you are. I also expect that customers with 15+ buckets who use Eventing will consistently run into this when they configure their test clusters.
CB robot October 4, 2021 at 7:59 AM
Build couchbase-server-7.1.0-1411 contains eventing commit 6fd5212 with commit message: Fix goroutine leak due to bucket delete and recreate
Jeelan Poola October 4, 2021 at 6:45 AM
Agreed. Marking it for release note in 7.0.2. Also lowering the priority, as it is not a common 80% use case, and there is an easy workaround (delete the bucket and wait a minute or so before recreating it).
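The workaround above can be sketched as a delete-wait-recreate helper. This is a hedged sketch against the REST API, assuming placeholder credentials and a default node address; the ~65-second pause matches the delay Jon mentions in his tests.

```python
import base64
import time
import urllib.parse
import urllib.request

HOST = "http://127.0.0.1:8091"   # assumed node address (placeholder)
AUTH = base64.b64encode(b"Administrator:password").decode()  # placeholder creds
DELETE_RECREATE_WAIT_S = 65      # pause long enough for eventing teardown

def _call(method, path, data=None):
    req = urllib.request.Request(HOST + path, data=data, method=method)
    req.add_header("Authorization", "Basic " + AUTH)
    return urllib.request.urlopen(req)

def recreate_bucket(name, ram_mb=100, wait_s=DELETE_RECREATE_WAIT_S):
    """Delete a bucket, wait, then recreate it with the same quota."""
    _call("DELETE", "/pools/default/buckets/" + urllib.parse.quote(name))
    time.sleep(wait_s)  # workaround: avoid deleting and recreating in quick succession
    body = urllib.parse.urlencode({"name": name, "ramQuotaMB": ram_mb}).encode()
    _call("POST", "/pools/default/buckets", body)
```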
Details
Assignee: Sujay Gad
Reporter: Jon Strabala (Deactivated)
Is this a Regression?: Yes
Triage: Untriaged
Story Points: 1
Priority: Critical
Instabug: Open Instabug
While doing some tests on a 3 x r5.2xlarge AWS cluster, I noticed that a set of symmetric servers (Data, Query, Index, Eventing) with default memory quotas shows excessive CPU utilization, while completely idle, on two of the three nodes. I am running Enterprise Edition 7.0.2 build 6683.
Each node is an r5.2xlarge: 64 GiB of memory, 8 vCPUs, 64-bit platform.
I created 20 buckets (default scope and default collection), loaded 50K small documents into each bucket, and made a primary index in each.
There has never been an Eventing Function configured on any of the nodes (nor does one exist in the Eventing UI), yet on two (2) of the nodes the "eventing-producer" and "beam.smp" processes seem to interact adversely when they shouldn't. The first node (10.21.24.37) looks correct, but the next two nodes (10.21.25.181 and 10.21.26.101) burn far too much CPU doing absolutely nothing: both are above 84% CPU utilization, while the first node is under 7%.
There is no issue if I drop Eventing as a service from every node and re-run the exact same test (Data, Query, Index): 20 buckets (default scope and default collection), 50K small documents loaded into each, and a primary index in each. In that case every node looks the same in the idle state, all measuring under 10% CPU utilization (9.3%, 7.8%, and 7.6%); see picture "compare_with_eventing_and_without_eventing.JPG".
ec2-user@ec2-15-223-36-143.ca-central-1.compute.amazonaws.com
private IP 10.21.24.37
ec2-user@ec2-3-99-49-144.ca-central-1.compute.amazonaws.com
private IP 10.21.25.181
ec2-user@ec2-15-223-36-53.ca-central-1.compute.amazonaws.com
private IP 10.21.26.101
I have attached CPU utilization pictures from both AWS and the Couchbase UI.
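The per-process numbers described above can be spot-checked on each node with a small sketch. This assumes a Linux box with `ps` available; the process names are the ones named in the description (note `ps`'s `comm` column truncates names to 15 characters, hence the shortened match for eventing-producer).

```python
import subprocess

def process_cpu(name_fragment):
    """Return the %CPU values `ps` reports for processes whose name matches."""
    out = subprocess.run(["ps", "-eo", "pcpu,comm"],
                         capture_output=True, text=True).stdout
    return [float(line.split()[0])           # first column is %CPU
            for line in out.splitlines()[1:]  # skip the header line
            if name_fragment in line]

if __name__ == "__main__":
    # Sum CPU across instances of each suspect process on this node.
    for proc in ("eventing-produ", "beam.smp"):
        print(proc, sum(process_cpu(proc)))
```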