
2026-03-06 ChargeAssist API performance degradation

RCA for ChargeAssist API incident

Summary of Impact

On 6 March at approximately 09:30 CET, customers experienced significant slowdowns and occasional unavailability of the ChargeAssist API.

This led to delays in processing API requests and interruptions in normal operations for services depending on the API.

Timeline of Event

Time (CET) | Event
09:30 | Customers reported that API responses were extremely slow or unavailable.
09:40 | The development team identified the underlying cause and mitigated it by scaling up the affected services.
09:48 | The system fully recovered and API performance returned to normal levels.

Confirmed Root Cause

The performance issues were caused by failures at one external webhook endpoint. This endpoint repeatedly returned errors, which led to a buildup of failed webhook delivery attempts and retries. Because the API and webhook processing ran on the same shared infrastructure, the load from these repeated retries consumed resources that the API also depended on. As a result, the ChargeAssist API became slow and, at times, temporarily unavailable.

Mitigation and Resolution Steps

To restore normal operations, the following actions were taken:

  • Increased compute capacity by scaling up service instances.
  • Cleared queued webhook messages that were contributing to the resource overload.

These steps immediately stabilized the system and restored API performance.

Lessons Learned & Improvement Actions

This incident revealed that external webhook failures can significantly affect API performance when both services share the same infrastructure.

To prevent similar issues in the future, we will work on the following improvements:

System Improvements

  • Isolate webhook processing components from the main API by running them on separate service plans or compute resources.
  • Strengthen webhook retry and throttling mechanisms to ensure they remain stable even under persistent failure conditions.
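
As an illustration of the second improvement, here is a minimal sketch of bounded webhook retries, assuming capped exponential backoff with full jitter. The function `deliver_with_retries`, its parameters, and the `send` callable are hypothetical names chosen for this example; they do not describe the actual ChargeAssist implementation.

```python
import random
import time


def deliver_with_retries(send, payload, max_attempts=5, base=1.0, cap=60.0,
                         sleep=time.sleep, rng=None):
    """Call send(payload) until it succeeds or max_attempts is reached.

    Returns True on success and False once the attempts are exhausted,
    so the caller can park the message instead of retrying forever.
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            send(payload)
            return True
        except Exception:
            # No wait after the final attempt: give up and report failure.
            if attempt == max_attempts - 1:
                break
            # Exponential backoff, capped so a single delay never exceeds `cap`.
            ceiling = min(cap, base * 2 ** attempt)
            # Full jitter spreads retries out so they do not arrive in bursts.
            sleep(rng.uniform(0, ceiling))
    return False
```

Capping both the number of attempts and the delay keeps the retry load per message bounded, even when an endpoint fails persistently.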

Process Improvements

  • Enhance monitoring and alerting around webhook error patterns and retry spikes.
  • Review circuit breaker configurations to better protect core API resources.
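
For readers unfamiliar with the circuit breaker pattern mentioned above, here is a minimal sketch, assuming one breaker per webhook endpoint. The class name, the thresholds, and the injectable `clock` are hypothetical and are not taken from the ChargeAssist configuration.

```python
import time


class CircuitBreaker:
    """Stops calls to a failing endpoint after repeated consecutive errors.

    Illustrative sketch only; thresholds and names are assumptions.
    """

    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a delivery attempt may be made now."""
        if self.opened_at is None:
            return True
        # Once the timeout has elapsed, let trial calls through again.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the timeout.
            self.opened_at = self.clock()
```

A delivery worker would check `allow()` before each attempt and skip or defer the message while the circuit is open, which keeps a persistently failing endpoint from consuming resources shared with the API.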