Intermittent timeouts and slowness in Brick River
Incident Report for Brick River
Resolved
The issue has been resolved by Microsoft. Here is their statement on what happened. Fair warning: it's technical.

+++++++++++++++++++++++++++++++++++

Summary of Impact: Between 19:43 and 22:35 UTC on 02 May 2019, customers may have experienced intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc). Most services were recovered by 21:30 UTC with the remaining recovered by 22:35 UTC.

Preliminary Root Cause: Engineers identified the underlying root cause as a nameserver delegation change affecting DNS resolution and resulting in downstream impact to Compute, Storage, App Service, AAD, and SQL Database services. During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.

Mitigation: To mitigate, engineers corrected the nameserver delegation issue. Applications and services that accessed the incorrectly configured domains may have cached the incorrect information, leading to a longer restoration time until their cached information expired.
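For readers wondering why recovery lagged behind the fix, here is a minimal sketch of the caching behavior described above. It is a toy model, not Microsoft's resolver code: the `TtlCache` class, the 300-second lifetime, and the names `example.test`, `bad-ns`, and `good-ns` are all assumptions made up for illustration.

```python
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy caching resolver: remembers each answer for ttl_seconds."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._lookup = lookup      # asks the "authoritative" source
        self._ttl = ttl_seconds    # how long a cached answer stays valid
        self._clock = clock        # injected so the demo is deterministic
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            # Still inside the TTL: serve the cached answer, even if the
            # upstream record has been corrected in the meantime.
            return entry[0]
        answer = self._lookup(name)
        self._entries[name] = (answer, now + self._ttl)
        return answer


# Hypothetical scenario: the record is wrong, gets cached, then is fixed.
now = [0.0]
records = {"example.test": "bad-ns"}
cache = TtlCache(lambda n: records[n], ttl_seconds=300, clock=lambda: now[0])

first = cache.resolve("example.test")        # caches the wrong answer
records["example.test"] = "good-ns"          # upstream fix is applied
now[0] = 299.0
still_cached = cache.resolve("example.test") # TTL not yet expired
now[0] = 300.0
after_expiry = cache.resolve("example.test") # cache entry has expired
```

In this sketch the corrected answer only shows up once the cached entry expires, which is the same reason some services took longer to recover than others.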

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences. A detailed RCA will be provided within 72 hours.
Posted May 02, 2019 - 20:46 EDT
Update
We're almost there. Microsoft is saying "Mitigation Applied and Validating Final Recovery."

So it looks like we're out of the woods. We're not seeing any more problems on our end. We'll send the official "resolved" message when we get final confirmation from Microsoft.
Posted May 02, 2019 - 20:01 EDT
Update
According to Microsoft, most services are showing recovery.

We have been constantly checking Microsoft's DNS as well and we too see that it seems to be recovering.
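For the curious, a check like this can be sketched in a few lines. This is only an illustration, not the actual tooling we used: the `check_resolution` function is hypothetical, and you would pass in whichever hostnames you care about.

```python
import socket
from typing import Callable, Dict, Iterable


def check_resolution(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Return, for each hostname, whether a DNS lookup succeeded."""
    results: Dict[str, bool] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = True
        except OSError:
            # socket.gaierror (a lookup failure) is a subclass of OSError.
            results[name] = False
    return results
```

Running it on a schedule and watching the failures drop off is one simple way to see recovery in progress.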

This is the latest from Microsoft. The key part is the last sentence.
Starting at 19:43 UTC on 02 May 2019, customers may experience intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc).

Engineers have identified the underlying root cause as a name server delegation issue with DNS resolution, affecting network connectivity and downstream impact to Compute, Storage, App Service, AAD, and SQL Database services. Mitigation has been applied, and engineering teams are clearing resolver cache to fully mitigate the issue. Most services are showing recovery.
Posted May 02, 2019 - 19:14 EDT
Update
We just heard from Microsoft, and they are saying "some customers may start to see recovery."

This is what Microsoft just wrote...
Customers may experience intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc). Engineers are investigating DNS resolution issues affecting network connectivity. Connectivity issues are resulting in downstream impact to Compute, Storage, and Database services, and some customers may be unable to file support requests. More information will be provided as it becomes available.

Some customers may start to see recovery.
Posted May 02, 2019 - 18:10 EDT
Update
Microsoft (Azure) is still working on the issue. It's a DNS problem that's affecting all their customers.
Posted May 02, 2019 - 17:45 EDT
Update
It looks like it is a Microsoft Azure DNS issue.

HERE IS WHAT MICROSOFT JUST WROTE
Customers may experience intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc). Engineers are investigating DNS resolution issues affecting network connectivity. Connectivity issues are resulting in downstream impact to Compute, Storage, and Database services, and some customers may be unable to file support requests. More information will be provided as it becomes available.
Posted May 02, 2019 - 17:25 EDT
Monitoring
We now know that the cause is Microsoft Azure (their cloud) having problems. We are monitoring the issue.

You can see this in a 21-second video:
https://www.screencast.com/t/VC1Uc10fagZ8

Microsoft Azure is Microsoft's cloud that we and thousands of other companies across the world are on. So it's affecting companies everywhere.
Posted May 02, 2019 - 17:12 EDT
Identified
We have found that the cause of the issue is a problem on Microsoft Azure. Here's what they posted.

MICROSOFT AZURE POSTED THIS...
We are investigating reports of connectivity issues with Azure Services. More information will be provided as it becomes available.

https://azure.microsoft.com/en-us/status/
Posted May 02, 2019 - 16:42 EDT
Update
We are continuing to investigate this issue.
Posted May 02, 2019 - 16:41 EDT
Investigating
We are seeing intermittent slowness and timeouts in Brick River websites and the system. It's not affecting everyone all the time, but it is a problem. We are investigating.
Posted May 02, 2019 - 16:39 EDT
This incident affected: Brick River Websites, Brick River Web Console, Brick River Reg System, and Brick River Emailer.