September 6, 2012 | Authored by: Vindicia Team Blogs
It's 4:00 AM. Do You Know Where Your Keys Are?
The details are still sparse, but it appears that Recurly has had a failure in their Hardware Security Module (HSM) that has left them unable to decrypt some or all of their clients' stored payment methods.
From the description on their blog, it seems that they had a primary/secondary setup, where the configuration on the primary was mirrored to the secondary. When the primary failed in a way that corrupted the key, that corruption was subsequently propagated to the secondary, leaving them with the encrypted account numbers, but no way to decrypt them.
As CTO at Vindicia, I've fielded a lot of questions today about what we do, and whether we're vulnerable to the same problem. I am extremely confident that we are not exposed to the same risk. While I obviously don't want to give out all the details of how our encryption keys are stored and used in a public blog post, I can explain a little about what we do differently, and why that puts us in a much better position.
First, our analogous systems are clustered, not in a primary/secondary failover configuration. We do it this way both for reliability and for scalability - during peak times, we have more cards to decrypt than a single box could handle. Second, we never automatically propagate keys or configurations between them. If a single box in the cluster became corrupted, our monitoring systems would detect the failures to decrypt and pull it out of rotation. Third, there is no way to propagate that error to other systems - no mechanism to automatically copy the data.
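To make the "detect the failures to decrypt, and pull it out of rotation" idea concrete, here is a minimal sketch of a known-answer health check. It is an illustration only, not a description of Vindicia's actual monitoring: the `Node` class, the `healthy_nodes` function, the box names, and the canary value are all hypothetical, and the XOR "cipher" is a toy stand-in for whatever a real decryption box would do.

```python
from dataclasses import dataclass
from typing import List


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher. XOR is NOT secure; it only
    gives the sketch something to encrypt and decrypt with."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class Node:
    """Hypothetical decryption box holding its own copy of the key."""
    name: str
    key: bytes

    def decrypt(self, ciphertext: bytes) -> bytes:
        return xor_cipher(self.key, ciphertext)


def healthy_nodes(nodes: List[Node], canary_ciphertext: bytes,
                  canary_plaintext: bytes) -> List[Node]:
    """Return only the nodes that decrypt the canary correctly.

    A node with a corrupted key produces the wrong plaintext (or
    raises), so it is left out of the rotation. Nothing is copied
    between nodes, so a bad key cannot spread.
    """
    in_rotation = []
    for node in nodes:
        try:
            ok = node.decrypt(canary_ciphertext) == canary_plaintext
        except Exception:
            ok = False
        if ok:
            in_rotation.append(node)
    return in_rotation


# Assumed test values: a well-known test card number as the canary.
GOOD_KEY = b"demo-key"
CANARY_PLAINTEXT = b"4111111111111111"
CANARY_CIPHERTEXT = xor_cipher(GOOD_KEY, CANARY_PLAINTEXT)

cluster = [
    Node("box-a", GOOD_KEY),
    Node("box-b", b"corrupt!"),  # simulates a corrupted key
    Node("box-c", GOOD_KEY),
]
rotation = [n.name for n in healthy_nodes(cluster, CANARY_CIPHERTEXT,
                                          CANARY_PLAINTEXT)]
print(rotation)
```

The point of the sketch is that each box is checked on its own against a value with a known answer, and the remaining boxes keep serving traffic while the bad one is repaired from a trusted copy.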
Obviously, the fundamental error here was one that we've all had beaten into our heads as technologists forever - have a backup! The trick is that people are rightly concerned about backing up something as sensitive as the key that decrypts all of your customers' credit card numbers.
We achieve that through a number of mechanisms. We have multiple copies in the decryption cluster in our primary datacenter in Las Vegas. There are also multiple copies in the identical cluster in our secondary datacenter in San Jose. Finally, we have two secure, undisclosed locations where copies are tightly guarded and physically secured. In one of those locations, the key is stored on a USB drive. In the other, it is stored on CD-ROM and, as a final measure, on a paper printout that we could literally type back in as our absolute worst-case solution.
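Offline copies are only useful if they still match the live key, so one common practice is to record a fingerprint of the key and periodically check every copy against it. The sketch below shows that idea under stated assumptions: the function names, the location labels, and the use of SHA-256 as the fingerprint are mine for illustration and are not a description of how Vindicia verifies its backups.

```python
import hashlib
import hmac
from typing import Dict


def fingerprint(key_material: bytes) -> str:
    """SHA-256 digest of the key, used only to compare copies.
    Assumes a high-entropy key, so the digest does not help an
    attacker guess it."""
    return hashlib.sha256(key_material).hexdigest()


def verify_copies(copies: Dict[str, bytes], expected: str) -> Dict[str, bool]:
    """Map each storage location to True if its copy matches the
    recorded fingerprint, False if it has been damaged or mistyped."""
    return {
        location: hmac.compare_digest(fingerprint(material), expected)
        for location, material in copies.items()
    }


# Hypothetical demo key and copies; the last one simulates a
# single wrong byte, such as a typo when re-entering a printout.
key = bytes(range(32))
recorded = fingerprint(key)
copies = {
    "usb-drive": key,
    "cd-rom": key,
    "paper-printout": key[:-1] + b"\xff",
}
results = verify_copies(copies, recorded)
print(results)
```

A check like this catches a damaged copy while the other copies are still good, which is exactly when it can be repaired.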
Very early in Vindicia's life, I realized that "never ever ever lose the decryption key" had to be a core architectural design principle, and implemented our controls to achieve that goal. I'm confident nothing short of complete devastation of three Northern California cities and Las Vegas simultaneously would destroy all copies of our keys.