Live CME outage

Incident Report for Databento

Postmortem

Summary

On July 30, 2025 17:21Z, all production CME Globex live gateways had an issue with the network stack and stopped normalizing data. This lasted until 17:48Z. Failover instances without gaps were brought into the production pool at 17:56Z.

ICE live gateways also experienced the same issue with the network stack at 17:21Z, but continued processing data with degraded performance. Failover instances were activated at 18:03Z.

Timeline

  • July 30, 2025 17:21Z: an issue with the network stack on the CME Globex and ICE production live gateways begins. The CME Globex gateways stop normalizing market data
  • July 30, 2025 17:23Z: internal alerting is triggered
  • July 30, 2025 17:47Z: an incident is created on Statuspage
  • July 30, 2025 17:48Z: CME Globex live gateways are restarted and resume processing data
  • July 30, 2025 17:56Z: the gapless failover gateways are brought online for CME Globex
  • July 30, 2025 18:03Z: the gapless failover gateways are brought online for ICE
  • July 30, 2025 18:08Z: the incident is announced as resolved
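
The intervals implied by the timeline above can be worked out with a few lines of Python. This is only an illustrative sketch: the event labels are our own shorthand for the timeline entries, not names used by any internal system, and the timestamps are copied from the list above.

```python
from datetime import datetime, timezone

def at(hhmm: str) -> datetime:
    """Parse an HH:MM time on July 30, 2025 as a UTC timestamp."""
    return datetime.strptime(f"2025-07-30 {hhmm}", "%Y-%m-%d %H:%M").replace(
        tzinfo=timezone.utc
    )

# Labels are illustrative shorthand for the timeline entries above.
events = {
    "issue_begins": at("17:21"),
    "internal_alert": at("17:23"),
    "statuspage_incident": at("17:47"),
    "cme_gateways_restarted": at("17:48"),
    "cme_failover_online": at("17:56"),
    "ice_failover_online": at("18:03"),
    "announced_resolved": at("18:08"),
}

start = events["issue_begins"]

# Minutes elapsed between the start of the issue and each event.
minutes_from_start = {
    name: int((when - start).total_seconds() // 60) for name, when in events.items()
}

for name, minutes in minutes_from_start.items():
    print(f"{name}: +{minutes} min")
```

Read this way, CME Globex data stopped for 27 minutes, and the gapless CME failover gateways were in the pool 35 minutes after the issue began.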

Root cause

The outage was caused by an issue with the network stack used in our GLBX and ICE live gateways. We've seen this behavior a handful of times now. We opened a ticket with the vendor on February 3; however, neither the vendor's investigation nor our own has produced any leads.

Impact

The network stack issue for GLBX meant no packets were being delivered to the socket, so customers stopped receiving market data updates. Customers remained connected to the gateways, but would have received only heartbeats. ICE datasets were also affected, although packets were still passed through to the socket. ICE customers saw high rates of packet loss and increased latency.
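
Because connections stayed up and heartbeats kept arriving, a client that only watches for disconnects would not have noticed this failure. One way a client could detect it is a stale-data watchdog that tracks the time since the last non-heartbeat record. The sketch below is a minimal illustration, not part of any Databento client library: the `StaleDataWatchdog` class, the `is_heartbeat` flag, and the threshold are all hypothetical, and a sensible threshold depends on how active the subscribed instruments normally are.

```python
import time
from typing import Callable

class StaleDataWatchdog:
    """Flags a feed as stale when only heartbeats arrive for too long.

    Hypothetical helper for illustration; not a real client API.
    """

    def __init__(
        self,
        max_silence_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_silence_s = max_silence_s
        self._clock = clock
        # Start the timer at construction so a feed that never delivers
        # data is eventually reported as stale too.
        self._last_data_ts = clock()

    def on_record(self, is_heartbeat: bool) -> None:
        """Call for every record received from the gateway."""
        # Heartbeats prove the connection is alive, not that data is flowing,
        # so only market data records reset the timer.
        if not is_heartbeat:
            self._last_data_ts = self._clock()

    def is_stale(self) -> bool:
        """True if no market data has arrived within the allowed window."""
        return self._clock() - self._last_data_ts > self.max_silence_s
```

A client could call `on_record` from its message handler and poll `is_stale()` on a timer, reconnecting or raising an alert when it returns true. During quiet market periods a long gap between updates can be legitimate, so the threshold is a judgment call.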

Corrective and preventative measures

We'll continue to work with the vendor to identify the root cause of the loss of network acceleration. Failover took longer than anticipated, so we've already made changes to speed up this process and shorten the impact of any future occurrence.

We're setting up automated incident response to improve response time in the future. We'll also move the ICE live gateways to their own NIC to limit the impact, should the issue reoccur.

Posted Aug 06, 2025 - 14:24 UTC

Resolved

This incident has been resolved.
Posted Jul 31, 2025 - 01:42 UTC

Monitoring

We updated our pool of production gateways for CME to ones without gaps. We also updated the ICE production gateways after observing decreased performance. Clients are instructed to reconnect.
Posted Jul 30, 2025 - 18:08 UTC

Identified

We are investigating an outage with live CME data out of our Aurora DC3 datacenter.
Posted Jul 30, 2025 - 17:21 UTC
This incident affected: Live CME - Aurora DC3 and Live ICE - Aurora DC3.