On Thursday afternoon, a deployment caused issues in our background order processing queue for some platforms. At first, this appeared to be a performance problem with orders being written to FTP with delays. We later discovered that some orders were being transmitted multiple times. Once identified, the deployment was rolled back and the issue resolved.
Incident timeline (manual log): 13:45 - Deployment to production 13:50 - First errors in monitoring tool of performance problems 13:52 - Investigation of this problems 14:00 - Orders observed with delays in being written to the FTP 14:21 - First fix of the problem being deployed 14:30 - Secondary errors being visible in the monitoring tools and further investigation 14:54 - Identification of some orders being transmitted multiple times, even though with the correct unique order ID 14:56 - Reverting and disabling of the implementation 15:00 - Working on manually getting any stuck orders through 15:30 - Confirmation that all orders have now gone through and order processing queue operating normally again