Service outage in AP1

Incident Report for Live Assist for Microsoft Dynamics 365

Postmortem

Incident Report

Root Cause Analysis

Summary

An infrastructure fault experienced by our cloud service provider (CSP) resulted in a number of our servers being taken offline.

Detailed Summary

A hardware issue experienced by our CSP caused a number of our servers to be taken offline ungracefully, and the automatic recovery mechanisms failed to restore them. As a result, the servers repeatedly tried, and failed, to start.

Resolution

Summary of resolution

The failed servers were quarantined to restore service. Manual intervention was then required to redeploy them to new hardware.

This was a very rare occurrence. New procedures are now in place to detect this kind of fault so that we are not impacted in this manner again.

Posted Nov 22, 2018 - 13:19 UTC

Resolved

There was a service outage affecting the Admin Portal and the Agent Widget service.
We identified the issue and were able to restore service.
This issue has now been resolved.

Posted Nov 18, 2018 - 00:57 UTC

This incident affected: Co-browse Service, Chat Service, Voice and Video Escalation Service, Administration Portal, and Billing Portal.