ChargeAssist webhook publishing failed for many session updates
Date of incident: 2026-02-26
Summary of Impact
Customers reported that they were not receiving CDR_AVAILABLE status updates through their webhooks after receiving the COMPLETED status.
The affected sessions were already in CDR_AVAILABLE status in the database, but an error occurred before the session webhook was published.
Timeline of Events
- 26/02 10:35 Support reported that customers were not receiving webhook notifications for the CDR_AVAILABLE status.
- 26/02 14:00 Developers started analyzing the issue.
- 26/02 15:00 The root cause of the issue was identified.
- 26/02 16:30 A code fix was implemented and integration tests were executed.
- 27/02 09:52 The fix was deployed to Production.
Confirmed Root Cause of the Incident
The issue was caused by an error thrown by the database during the insertion of a CDR field.
Investigation revealed that one of the database containers had a hot partition, which caused intermittent failures. Whenever this error occurred, it prevented the webhook publishing process from completing, so the CDR_AVAILABLE webhook was never sent.
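The failure mode described above can be illustrated with a minimal sketch. It assumes the CDR field insert and the webhook publish ran sequentially in the same handler, which the report implies but does not state outright. All names here (HotPartitionError, handle_cdr_available, the callbacks) are hypothetical and do not come from the ChargeAssist code base.

```python
class HotPartitionError(Exception):
    """Hypothetical stand-in for the intermittent database error."""


def handle_cdr_available(session_id, insert_cdr_field, publish_webhook):
    # Assumption: the session status was already updated before this point.
    # Step 1: persist the CDR field. If this raises, execution stops here.
    insert_cdr_field(session_id)
    # Step 2: only reached when step 1 succeeded.
    publish_webhook(session_id, "CDR_AVAILABLE")


published = []


def record_publish(session_id, status):
    # Stand-in for the real webhook call; it only records what would be sent.
    published.append((session_id, status))


def failing_insert(session_id):
    raise HotPartitionError(f"write rejected for {session_id}")


def working_insert(session_id):
    return None


try:
    handle_cdr_available("session-1", failing_insert, record_publish)
except HotPartitionError:
    pass  # the webhook for session-1 is silently lost

handle_cdr_available("session-2", working_insert, record_publish)
```

In this sketch only session-2 ends up in `published`, which mirrors what customers observed: the status existed in the database, but no webhook arrived.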
Mitigation and Resolution
The issue was resolved by implementing a more appropriate partition key in ChargeAssist, eliminating the hot partition problem.
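To show why the choice of partition key matters, the sketch below compares how writes spread across partitions for a low-cardinality key versus a high-cardinality key. The report does not say which key was used before or after the fix, so the field names (status, session_id), the partition count, and the hash-based placement are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def partition_of(key: str, partitions: int) -> int:
    # Stable hash so the result does not change between runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def hottest_share(keys, partitions=16):
    # Fraction of all writes that land on the busiest partition.
    counts = Counter(partition_of(k, partitions) for k in keys)
    return max(counts.values()) / len(keys)


# Hypothetical workload: 10,000 CDR writes.
writes = [
    {"session_id": f"session-{i}", "status": "CDR_AVAILABLE"}
    for i in range(10_000)
]

# Every write shares one key value, so one partition takes all the load.
skewed = hottest_share([w["status"] for w in writes])

# Each write has a distinct key value, so the load spreads out.
spread = hottest_share([w["session_id"] for w in writes])
```

With the low-cardinality key every write lands on a single partition, while the per-session key distributes writes roughly evenly. A real database assigns partitions with its own scheme, so this only demonstrates the general effect.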
Lessons Learned for System and Process Improvements
- Incorrect partition key selection for the DB container led to a hot partition scenario.
- Missing alerting/monitoring for webhook publishing failures delayed detection of the issue.
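As one possible way to close the monitoring gap, the sketch below tracks webhook publish outcomes in a sliding time window and signals when the failure ratio crosses a threshold. The class name, thresholds, and window size are hypothetical. In practice this logic would more likely live in an existing metrics and alerting system than in application code.

```python
from collections import deque


class PublishFailureMonitor:
    """Sliding-window failure ratio for webhook publish attempts."""

    def __init__(self, window_seconds=300.0, min_events=20, max_failure_ratio=0.05):
        self.window_seconds = window_seconds
        self.min_events = min_events
        self.max_failure_ratio = max_failure_ratio
        self._events = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, timestamp, succeeded):
        self._events.append((timestamp, succeeded))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that are older than the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def should_alert(self, now):
        self._evict(now)
        total = len(self._events)
        if total < self.min_events:
            return False  # too few samples to judge
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / total > self.max_failure_ratio


monitor = PublishFailureMonitor(window_seconds=60, min_events=10, max_failure_ratio=0.1)
for second in range(10):
    # Three of ten publish attempts fail in this made-up sample.
    monitor.record(float(second), succeeded=second >= 3)

alert_now = monitor.should_alert(10.0)
alert_later = monitor.should_alert(100.0)
```

The `min_events` guard avoids alerting on a single failed publish during quiet periods. Once the sampled events fall outside the window, the monitor stops signalling.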
