Summary of Impact
Service Bus, the message broker that integrates most GFX services, suffered a period of slowness. This was caused by an infrastructure issue at Microsoft.
As a result, many GreenFlux services were unresponsive or responded with delays.
Detail of Events
Timestamps below are in Central European Summer Time (CEST, local Amsterdam time).
2024-05-31 11:55: A software engineer reaches out in the incident chat after an alert is triggered.
2024-05-31 11:57: Customers informed about incident on StatusPage.
2024-05-31 11:58: Investigation of metrics shows overall slowness/delays on Service Bus. Ticket with Microsoft opened.
2024-05-31 12:05: Service Bus recovered and back within operational boundaries. GFX services still working through the backlog of queued messages.
2024-05-31 13:36: System back to normal operating state.
Confirmed Root Cause of the Incident
Internal infrastructure issue at Microsoft.
A migration to the Service Bus Premium tier is being planned. This higher service tier will increase redundancy and isolation, which may prevent similar issues from happening again.
Lessons Learned for System and Process Improvements
Migrate Service Bus to the Premium tier.
