(Update: Less than four hours after tweeting out this blog post, we got our access turned back on. So the Google support team is definitely listening on social media. We very much appreciate that, because it resolves this issue as a crisis. We are still concerned by the “auto-off” trend and the missing button. But we will be working to make sure there is a better long term solution. Will update this post as appropriate moving forward.)
So today our Google Cloud Account was suspended. This is a pretty substantial problem, since we had committed to leveraging the Google Cloud at DocGraph. We presumed that Google Cloud was as mature and battle tested as other carrier grade cloud providers like Microsoft, Rackspace and Amazon. But it has just been made painfully clear to us that Google Cloud is not mature at all.
Last Thursday, we were sent a message titled “Action required: Critical Problem with your Google Cloud Platform / API Project ABCDE123456789” here is that message.
Which leads to our first issue Google is referring to the project by its id, and not its project name. It took us a considerable amount of time to figure out what they were talking about when they said “625834285688”. We probably lost a day trying to figure out what that meant. This is the first indication that they would be communicating with us in a manner that was deeply biased towards how they view their world of their cloud service internally, totally ignoring what we were seeing from the outside. While that was the first issue, it was nowhere near the biggest.
The biggest issue is that it was not possible to complete the “required action”. Thats right, Google threatened to shut our cloud account down in 3 days unless we did something… but made it impossible to complete that action.
Note that they do not actually detail the action needed, in the “action required” email. Instead they refer to a FAQ, where you find these instructions:
So we did that.. and guess what, we could not find the blue “Request an appeal” button anywhere. So we played a little “wheres waldo” on the Google Cloud console.
- We looked where they instructed us to.
- We looked at the obvious places
- We looked at the not-obvious places
As far as we can tell, there was no “Request an appeal” button anywhere in our interface. Essentially, this makes actually following the request impossible.
So we submitted a support request saying “Hey you want us to click something, but we cannot find it” and also “what exactly is the problem you have with our account in any case?”
However, early yesterday morning, despite us reaching out to their support services to figure out what was going on, Google shut our entire project down. Note that we did not say “shutdown the problematic server” or even “shutdown all your servers”. Google Cloud services shutdown the entire project. Although we use multiple google cloud APIs we thought it made sense to keep everything we were doing on the same project. For those wondering that is a significant design flaw, since Google has fully-automated systems that can shut down entire projects that cannot be manually overridden. (Or at least, they were not manually overridden for us).
We have lost access to multiple critical data stores because Google has an automated threat detection system that is incapable of handling false positives. This is the big takeaway: It is not safe to use any part of Google Cloud Services because their threat detection system has a fully automated allergic reaction to anything that has not seen before, and it is capable of taking down all of your cloud services, without limitation.
So how are we going to get out of this situation? Google offers support solutions where you can talk to a person if you have a problem. We view it as problematic that interrupting an “allergic reaction” as a “support issue”. However, we would be willing to purchase top-tier support in order to get this resolved quickly. But there does not appear to be an option to purchase access to a human to get this resolved. Apparently, we should have thought about that before our project was suspended.
Of course, we are very curious as to why our account was suspended. As data journalists, we are heavy users of little-known web services. We suspect that one of our API client implementations looked to Googles threat detection algorithms like it was “hacking” in one way or another. There are other, much less likely explanations, but that is our best guess as to what is happening.
But we have not idea what the problem is, because Google has given us no specific information about where to look for the problem. If were actually doing something nefarious, we would know which server was the problem. We would know exactly how we are breaking the rules, but because we are (in one way or another) a false positive in their system, we have no idea where to even start looking for the traffic pattern that Google finds concerning.
Now when we are logged in, we simply see an “appeal” page that asserts, boldly “Project is in violation of Google’s Terms of Service”. There is no conversation capacity, and filling out the form appears to simply loopback to the form itself.
It hardly matters, Googles support system is so completely broken, that this issue represents a denial of service attack vector. The simplest way to take down any infrastructure that relies on Google would be to hack a single server, and then send out really obvious hack attempts outbound from that server. Because Google ignores inbound support requests and has a broken “action required” mechanism, the automated systems will automatically take down an entire companies Cloud infrastructure, no matter what.
Obviously, we will give Google a few hours to see if they can fix the problem and we will update this article if they respond in a reasonable timeframe, but we will likely have to move our infrastructure to a Cloud provider that has a mature user interface and support ticketing system. While Google Cloud offers some powerful features, they are not safe to use until Google abandons its “guilty until proven innocent, without an option to prove” threat response model.