During the past several weeks, we've accelerated a longer-running project to add redundancy and resilience to our websites and other non-network resources (we call these "non-network" because they are not part of delivering cryptostorm's secure network itself, which is entirely separate from any websites or other single-point-of-failure components).
In the first two years of our existence, we didn't judge the need for such capacity to be mission critical; a small bit of downtime here and there with cryptostorm.is, for example, might be a minor inconvenience for all of us but would not be critical path. Of course, we retain rolling backups of files (and most of our website source is already hosted at github and thus is on independent infrastructure), so in the event of a sustained outage we could - and several times, did - switch over to secondary server capacity with the backup images.
Most companies handle this issue by outsourcing their hosting to a "content delivery network" like Cloudflare. For a basket of reasons too long to list here, this is not an approach with which we are comfortable, though it is "easier" and for less technically centred project teams it will in many cases be too tempting to pass up.
So, as cryptostorm has grown and evolved since 2013, we've known that the need for redundant website (and email, and IRC... we'll just say "website" and assume all that is included, as well) capacity would eventually be something we'd need to address. As we discuss in a bit more detail in a parallel blog post, recent attacks on Iceland's internet infrastructure have caused access to our websites (which have always been hosted there, with our colleagues at Datacell) to become, in a word, sporadic (through no fault of Datacell's, to be clear).
Given that, we pushed forward to complete our internal effort to provide redundant, distributed, failsafe website access. We'd been making steady progress, but with no deadline in sight the work naturally slipped behind critical tasks and was, in some senses, sleepwalking. The issues in Iceland kicked things into high gear, and we set a tight timeline to get everything in place.
Two days ago, on Saturday, we did our first production cut-over test of the new model we've put in place. Most of it went smoothly, and our security procedures held together comfortably. However, there were the (if we're being candid) expected hiccups here and there: the database powering this forum was intermittently refusing to stay up on Sunday evening, for example. Those issues are all now resolved, and we're fine-tuning the details.
In this thread, we'll post a bit more technical detail on how we've approached this infrastructure redundancy project - some of it's a bit routine and boring, but other components are perhaps novel and even somewhat elegant in final form. It's worth noting that the overall project is not complete; what we've done is the first cut-over test. Now, we're layering in the automated redundancy itself (in technical terms, that first step was actually more of a challenge than the redundancy itself).
Finally, it appears that our automated 'tokenbot' delivery of newly-purchased tokens was inactive from early Sunday through Monday morning. We'd concluded this was merely the result of cached DNS data in email delivery systems, but that conclusion was not accurate: the tokenbot was simply not delivering tokens. Since then, we've manually confirmed that all tokens not delivered promptly during that period have now been delivered. Further, we've provided complimentary 66-day tokens to all those members affected by the delay. This was a genuine screw-up on our part - timely token delivery is a big deal to us, and to many members - and we offer our apologies for not being aware of the issue, and resolving it, sooner.
If there are additional questions or reports of transitional bugs, please do feel free to post them here - we'll do our best to stay current with replies. Through today, we've invested substantially all available team effort in completing the first step of this project, and thus haven't posted much detail here on what's been in process. Now that the first step is complete, we're able to do a better job of keeping the membership informed as to ongoing developments.
Best regards, ~ cryptostorm_team