Breakdown: Google Maps Outage (18th March 2022)

Asutosh Panda
3 min read · Mar 29, 2022

Another outage and another breakdown. On 18th March, Google Maps went down twice, from 08:27 to 10:58 and again from 12:08 to 12:11. Users experienced high error rates and elevated response latency. The impacted services were: Google Maps JavaScript API, Maps Static API, Directions API, Maps SDK for Android, Maps SDK for iOS, Navigation SDK, and Gaming Services. The Maps Tile API was down for around 3 hours and 45 minutes globally. Many of these services were returning HTTP 500 errors. Let’s check out the breakdown of the outage.

Root Cause

Every now and then Google rolls out new features in highly controlled and monitored environments. This time, though, a rollout triggered an issue similar to the GitHub outage. They were rolling out a new feature for Google Maps which, once deployed, ate up the resources allocated to the service. All the components of such a huge application are interlinked, so when one goes down, others are bound to feel the impact. Here, Google's backend Map Tile-rendering servers started returning timeout errors.

  • Cause for Latency : The Map Tile service of Google Maps is responsible for mapping coordinates to pixels. The service providing the Google Maps Platform API exhausted the resources allocated to it, and the Map Tile service kept retrying its requests to that service. The Tile service has internal queues to hold the retry requests, and eventually those also hit their capacity limit. As a result, the Tile servers ran out of memory and started crashing, which in the end led to increased latency. That’s the reason many clients were getting rejections with HTTP 503 errors.
  • DoS in Tile Servers : The Tile-rendering server outage led to the failure of the Maps APIs. In response, external clients were hitting the APIs at about 10x the usual request rate.
  • Cause of failure of other SDKs : The domino effect took over, and one service after another collapsed once the Tile servers were down. Maps SDK and Navigation SDK clients were the most affected.
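
To make the retry queue problem above more concrete, here is a minimal simulation sketch. It is not Google's implementation. The function name `simulate_retry_queue`, the tick model and all the numbers are assumptions chosen only for illustration. It compares a queue with no limit against one that rejects new entries once it is full.

```python
from collections import deque


def simulate_retry_queue(ticks, arrivals_per_tick, served_per_tick, max_queue=None):
    """Simulate a retry queue in front of a backend that serves too slowly.

    Returns (peak_queue_length, rejected_count). With max_queue=None the
    queue is unbounded, which stands in for unchecked memory growth.
    """
    queue = deque()
    peak = 0
    rejected = 0
    for tick in range(ticks):
        # New and retried requests arrive faster than the backend can serve.
        for i in range(arrivals_per_tick):
            if max_queue is not None and len(queue) >= max_queue:
                rejected += 1  # shed the request instead of holding more memory
            else:
                queue.append((tick, i))
        peak = max(peak, len(queue))
        # The exhausted backend only completes a few requests per tick.
        for _ in range(min(served_per_tick, len(queue))):
            queue.popleft()
    return peak, rejected


unbounded = simulate_retry_queue(ticks=100, arrivals_per_tick=10, served_per_tick=2)
bounded = simulate_retry_queue(ticks=100, arrivals_per_tick=10, served_per_tick=2, max_queue=50)
print("unbounded (peak, rejected):", unbounded)  # (802, 0)
print("bounded   (peak, rejected):", bounded)    # (50, 752)
```

In the unbounded case the queue just keeps growing for as long as the backend stays slow, which is roughly the path to an out-of-memory crash. In the bounded case memory stays flat, at the cost of rejecting requests early.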

Remediation by Google

As I have mentioned in previous articles, the way a company responds to and handles its outages tells a lot about its engineering brilliance and culture. As a baseline, an alert for the rate limit fired in their monitoring services. Their engineering team then traced the error logs to find the root cause. Once they learned that the feature rollout was the origin of all this, they disabled it for the time being as mitigation. As a result, all the cascading errors and failures cleared up and the services were back to normal. The server outage, the API failures, the DoS: everything was out of the picture.
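
Disabling a freshly rolled out feature quickly usually depends on some kind of feature flag or kill switch. How Google actually gates its features is not described in the incident details, so the following is only a minimal sketch of the general idea. `FeatureFlags`, `render_tile` and the flag name `new_overlay` are all hypothetical.

```python
class FeatureFlags:
    """Tiny in-memory flag store; a real system would read from a config service."""

    def __init__(self):
        self._enabled = {}

    def set(self, name, enabled):
        self._enabled[name] = enabled

    def is_enabled(self, name):
        # Unknown flags default to off, so a missing entry is the safe state.
        return self._enabled.get(name, False)


def render_tile(flags, x, y, zoom):
    """Build a tile description, adding the new layer only when its flag is on."""
    layers = ["base"]
    if flags.is_enabled("new_overlay"):  # hypothetical flag name
        layers.append("new_overlay")
    return {"tile": (x, y, zoom), "layers": layers}


flags = FeatureFlags()
flags.set("new_overlay", True)       # the rollout
print(render_tile(flags, 1, 2, 3))   # includes the new layer
flags.set("new_overlay", False)      # the kill switch used as mitigation
print(render_tile(flags, 1, 2, 3))   # back to the base layer only
```

The point of this design is that turning the feature off is a config change and not a new deployment, which keeps the mitigation fast.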

What they could have done to avoid it

There are a few things the Google Maps team could have done to avoid this kind of resource exhaustion issue:

  1. Some sort of mechanism that rejects further requests once the service reaches its limit, without impacting the requests already in progress.
  2. They can put bounds on a discrete-time multi-server queue, and on different parameters too, such as the time interval, the request load, and the waiting time.
  3. Similarly, every time a new feature is introduced, there should be monitoring parameters that alert the team about the usage of allocated resources.
  4. By blocking internal traffic to the backend servers, they can cut off some of the overlay features.
  5. Faster health checks, notifications, and posts about major outages should be visible instantly on the monitoring board of the application.
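
Points 1 and 3 above can be sketched together as a small admission controller. This is a simplified illustration under my own assumptions, not a production design. `LoadShedder`, `handle`, the `alert_at` threshold and the in-memory `alerts` list are hypothetical stand-ins for a real rate limiter and a real alerting system.

```python
import threading


class LoadShedder:
    """Admit at most `capacity` concurrent requests and reject the rest quickly."""

    def __init__(self, capacity, alert_at):
        self.capacity = capacity
        self.alert_at = alert_at  # in-flight count at which the team gets alerted
        self.in_flight = 0
        self.alerts = []          # stand-in for a real alerting channel
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            if self.in_flight >= self.capacity:
                return False  # at the limit: reject without touching admitted work
            self.in_flight += 1
            if self.in_flight >= self.alert_at:
                self.alerts.append(f"usage at {self.in_flight}/{self.capacity}")
            return True

    def release(self):
        with self._lock:
            self.in_flight = max(0, self.in_flight - 1)


def handle(shedder, work):
    """Run `work` if admitted, otherwise answer 503 immediately."""
    if not shedder.try_acquire():
        return 503, None
    try:
        return 200, work()
    finally:
        shedder.release()


shedder = LoadShedder(capacity=5, alert_at=4)
admitted = [shedder.try_acquire() for _ in range(7)]
print(admitted)        # the first 5 are admitted, the last 2 are rejected
print(shedder.alerts)  # alerts raised at 4/5 and 5/5
print(handle(shedder, lambda: "tile"))  # (503, None) while the service is full
```

Rejecting early keeps the requests that were already admitted unaffected, and the alert fires before the hard limit is reached, so the team hears about resource pressure while there is still headroom.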

Thanks for reading up to here😊

More resources to understand it in depth:

https://developers.google.com/maps/documentation/javascript/coordinates#:~:text=Tiles%20in%20Google%20Maps%20are,increasing%20from%20north%20to%20south.

https://www.researchgate.net/publication/265615811_Bounds_and_an_approximation_for_single_server_queues


Asutosh Panda

I am a DevOps Engineer interested in the SRE and DevOps world. Apart from tech, I am into cinematography, poetry, and dance.