(note: this post continues discussion started in a parallel thread, which provides useful backstory ~pj)
I sat down to write up this summary of recent investigative and sanitization work undertaken after identifying a form of polymorphic, browser-based malware (which I unilaterally started calling "Sauron's Eye" because it seems apt, and because that blazing eye really does capture the creepy element of all this)... and then deleted a small pile of draft reports after trying and failing to edit them into a form that worked. They all veered deep into technical terrain and quickly lost the thread of why any of it matters in the first place. Technical data is important, of course, and I've got a few hundred gigs of captured material that needs to be analysed, published, and reviewed. But that work is perhaps best kept separate from a shorter, more relevant attempt to explain in plain language what's been going on and what it means.
So, this post is steering (mostly) away from technical analysis in favour of the "what and why" focus - which, it turns out, is a larger challenge to write than is the technical detail. Anyhow, let's get on with it.
- - -
I've chosen to wrap this essay in the narrative fabric of the Aliens movie, for two reasons. One, the movie's thematic arc really does help pull forward the important parts of this topic, and thus provides a nice outline to keep my writing focussed. Two, it provides the opportunity for a bit of humour injection, and that matters. Humour helps us retain a good sense of perspective, and reminds us that even "Serious Business" is often not quite as serious as it feels when we're in the middle of it. It's good to chuckle a bit; it helps keep us sane. Mostly.
What happened is easy to summarise: for the past couple months, I've taken it upon myself to commandeer bits and pieces of cryptostorm staff time and expertise in several research projects involving what we usually call "malware." Some of it's been related to malware compiled into and/or packaged along with "VPN service" installers, and some of it's related to the "superfish" SSL-interception toolkit Lenovo was including on newly-purchased laptops since last fall. In both cases, I felt the issues were relevant enough to cryptostorm to justify some research effort and resources investment.
Then, in mid-March, as we were fine-tuning the torstorm conventional-to-Tor .onion gateway service we provide, I noted some highly unusual behaviours on the part of the browser installs being used for the testing itself. Curiosity piqued, I started saving off the source code being received as I loaded landing pages for a small group of .onion-based websites - as well as .pcap packet capture sessions. I quickly became confident that something was coming across the wire and into the browsers I was using during this testing, although I wasn't entirely sure what. I reached this conclusion based on the behaviour of the browsers themselves: cache metrics, stack snapshots, and monotonically increasing instability that persisted across application and underlying OS restarts.
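For the curious, that capture workflow - pairing each page-load with a saved copy of the source and a .pcap session - can be sketched roughly like this. This is a hypothetical, dry-run illustration: the helper name, output layout, and tcpdump flags are my own stand-ins, not the actual tooling used.

```python
import time
from pathlib import Path

def build_capture_plan(url, outdir="captures", iface="eth0"):
    """Build (but don't run) a capture plan for one page-load session:
    a tcpdump command to record the raw packets, plus paths for saving
    the received page source. Hypothetical helper, illustrative only."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out = Path(outdir)
    pcap_path = out / f"session-{stamp}.pcap"
    html_path = out / f"source-{stamp}.html"
    tcpdump_cmd = [
        "tcpdump", "-i", iface,   # capture interface
        "-w", str(pcap_path),     # write raw packets to disk
        "-s", "0",                # full snap length: whole packets
    ]
    return {"tcpdump": tcpdump_cmd, "pcap": pcap_path,
            "source": html_path, "url": url}

plan = build_capture_plan("http://example.onion/")
print(plan["tcpdump"])
```

In a real session you'd start the tcpdump process first, load the page, save the received source to the planned path, then stop the capture - keeping the pair together for later correlation.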
My suspicions were apparently confirmed when, within 10 days of my publication of these conclusions, a number of high-severity bugs were patched by the Mozilla browser team - bugs directly relating to the attack vector I'd concluded was most likely causing the behaviour I was documenting. Then, less than a week later, a series of mysterious but powerful attacks on all of the largest "dark markets" hosted on .onion websites became publicly known. The second-largest such market, Evolution, vanished from the web in an alleged "exit scam," leaving no trace. The largest - Agora Marketplace, which I'd used in much of my load-testing - became even more consistently unavailable than it had historically been... and when it did appear, it was often serving strange, exotic fragments of gibberish. Word spread (and I was made aware of it, thanks to several generous community members) of one or several 0day attacks targeting Tor relay nodes that allowed for injection of payload code into active .onion web-browsing sessions.
In sum, all appearances were that I'd noted a confluence of trends that resulted in a powerful, widespread, and ongoing attack on Tor hidden-service .onion websites - the "dark markets," in particular - and indeed had successfully captured forensic data and actual samples of whatever it was that was being deployed. Yay for me, I suppose.
What's funny, looking back, is that I never considered the obvious likelihood of being infected with exactly what I was studying. Why didn't I consider this? Honestly, I have no good answer: at some subconscious level I suppose I considered myself "exempt" from infection since I'm merely a researcher documenting events, not a participant... a bit like a journalist in a war zone, who has some safety in knowing she's not the intentional target of the belligerents.
Except of course, that's silly. This is code - it doesn't decide who is or is not a legitimate target.
- - -
To make a long story short(er), I got several of my research machines infected with SauronsEye. Despite being naive about the likelihood of infection, I was immediately aware something was afoot on these machines... whatever other flaws I have, I'm fairly good at noticing unusual local machine behaviours. I immediately air-gapped the machines in question, and cheerfully began looking into what had come to reside in their digital innards. This initial cheerfulness faded as I realised both that I had no idea what was going on with the machines and, more importantly, that I had no good handle on how it spread.
Without knowing how it spread, I had no idea whether it had spread to other machines on my local network... or to the machines of other cryptostorm staff. That's a big deal.
Once I realised that, I made the decision to shift cryptostorm's non-sysadmin staff computers to lockdown mode: every machine was assumed compromised, and any data being removed from them was assumed to carry an infection vector. This is a procedure we have in place for such situations, although it's the first time we've made use of it. Is it overkill? Likely yes, but the point is that by the time we know for sure it's far too late to go back and redo things if it's not overkill. Is this frustrating for staff? Oh, hell yes - trust me, I have heard those frustrations loud and clear.
It's also profoundly frustrating for our members, partners, and vendors. As is quite obvious, staff availability suddenly dropped - and stayed dropped for nearly two weeks. Email unanswered, twitter questions hanging, threads here in the forum not getting replies. What we've now learned the hard way is that locking down support staff computers basically locks them out of the resources needed to provide support. And this has direct, tangible impact on members and on everyone with whom we do business, as a project. To be honest, I imagined we'd simply lock down machines and staff could continue on with new hardware... which is naive, in hindsight.
Initially, we shifted hard drives to new computers and that seemed a reasonable approach - but almost immediately I shut that down, as of course anything on the hard drives would still be there on the new machines. Then we tried new machines, bringing over files selectively... but I saw test data myself that suggested the drive media itself might be a transmission vector - so I shut that down, too. Eventually we got to newly-purchased hardware, kept air-gap separated from existing hardware, with needed passwords copied over manually, via pencil and paper. Which, sadly, doesn't work well for things like PGP keys and other long-form credentials. Nor for staff geographically distributed, and often unavailable via physical postal service for opsec reasons.
In sum, the lockdown was (I believe) successful in ensuring the infection didn't spread within the staff or compromise production systems... but at the cost of making a hash of our ability to provide quality services as a team. That's a big lesson learned.
As I was the one who called the lockdown and enforced unreasonable demands on its implementation, I bear the responsibility for the disruptions it caused. I also, of course, was the one who got myself infected doing research without adequate protections - so I'm doubly at fault, here.
- - -
Infected with what, exactly?
Technical details set aside for a separate post, it's a modular exploit kit whose initial vector is browser-based and that shortly thereafter jumps out of the browser, and which then takes up residence on dedicated hard drive partitions. Once installed, it implements a series of proxy hijackings of all ip4 network traffic - including "encrypted" https/tls sessions. It can generate fake certificates if needed, and it can redirect traffic to alternative destinations without leaving an obvious footprint in ip4 routing and traffic statistics. Finally, it's able to hijack Debian-based repository updates and thus deepen its roots within the operating system itself (I assume the same is true of Windows, but my competence in that OS is so low as to be all but useless).
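One cheap way to notice that kind of https/tls interception is certificate pinning by fingerprint: record the SHA-256 fingerprint of a server's certificate from a known-clean network, then compare what a suspect machine sees. A minimal sketch of the comparison mechanic - the helper names are mine, and the pinned value would come from a trusted, out-of-band source:

```python
import hashlib

def cert_fingerprint(der_bytes):
    """SHA-256 fingerprint of a DER-encoded certificate,
    in the familiar colon-separated uppercase-hex form."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def matches_pin(der_bytes, pinned_fingerprint):
    """True if the certificate observed on this machine matches the
    fingerprint recorded earlier from a known-clean vantage point."""
    return cert_fingerprint(der_bytes) == pinned_fingerprint
```

In practice you'd pull the live certificate (e.g. via Python's `ssl` module) and compare; a mismatch on a site whose certificate hasn't legitimately rotated is a strong signal of exactly the kind of forged-certificate proxying described above.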
Once it's got its claws into the kernel, it creates an early boot-stage, modified Xen hypervisor that captures subsequent OS instances within its virtualised environment - thus taking the role of dom0 itself, and leaving the other OS instances as unknowing "guest sessions," i.e. domU. Getting out of that bind, from within domU, is all but impossible - I pulled it off once, on a test machine, but the slightest fumble and it all unwound back down to the evil hypervisor being in control.
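Checking whether you're unknowingly running as a guest is at least partially possible from inside a Linux domU, since the kernel exposes a few tells. A best-effort sketch (Linux-specific paths; and note that a genuinely hostile hypervisor can lie about all of this, so absence of tells proves nothing):

```python
from pathlib import Path

def hypervisor_tells():
    """Collect a few Linux-side hints that we're running as a guest.
    Best-effort only: a clean result does NOT prove bare metal."""
    tells = {}
    # Xen/KVM guests typically expose their hypervisor type here.
    hv_type = Path("/sys/hypervisor/type")
    tells["sys_hypervisor"] = (
        hv_type.read_text().strip() if hv_type.exists() else None
    )
    # The 'hypervisor' CPU flag is set for most virtualised guests.
    try:
        cpuinfo = Path("/proc/cpuinfo").read_text()
        tells["cpuinfo_hypervisor_flag"] = "hypervisor" in cpuinfo
    except OSError:
        tells["cpuinfo_hypervisor_flag"] = None
    return tells

print(hypervisor_tells())
```

On a machine that believes itself to be bare metal, a `sys_hypervisor` value of `"xen"` or a set `hypervisor` CPU flag is exactly the kind of "that cannot happen" result described below.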
The hard-drive based foundations are quite resilient, and quite virulent. I managed to spread it across OS variants, across Linux distributions, and across "full-wipe" deletions of the partitions in question. I think I managed to eradicate it with full-disk/all-sectors formatting... but I'm still not entirely confident in that. I did note some Android anomalies on devices that were in NFC/bluetooth/wired LAN connection reach of the infected machines - but I have no idea if that's just the usual tragic Android "security" or if it's related. I'd suspect it's unrelated - but I'd not bet heavily on that conclusion without further testing.
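The distinction between "full-wipe of the partitions in question" and "full-disk/all-sectors formatting" matters here: wiping a partition leaves the partition table, boot sectors, and inter-partition slack untouched, while zeroing the whole device does not. A deliberately dry-run sketch of the latter - the device name is a placeholder, and the command is destructive if actually executed:

```python
def build_wipe_command(device, block_size="4M"):
    """Construct (without running!) a dd command that zeroes every
    sector of a WHOLE device - not just one partition on it.
    DESTRUCTIVE if executed; shown here dry-run only."""
    return [
        "dd",
        "if=/dev/zero",        # source: a stream of zeros
        f"of={device}",        # target: the entire device, not a partition
        f"bs={block_size}",    # large block size for throughput
        "conv=fsync",          # flush to the medium before exiting
        "status=progress",
    ]

cmd = build_wipe_command("/dev/sdX")  # placeholder device name
print(" ".join(cmd))
```

Even this is only as trustworthy as the firmware beneath it - remapped sectors, host-protected areas, and drive controller firmware are all outside what dd can reach, which is one reason I remain less than entirely confident.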
I don't know who runs it, where it's from, who is targeted, or why it has been created. I can speculate, but no more than can anyone else - I have no special insight into any of those questions. That said, I have no sense it targeted me - or cryptostorm - specifically. Indeed, it seems likely we picked it up merely as a side-effect of the research I cited above. The fact that we're all, as a team, suspicious of ip6 and largely have it disabled on local machines likely helped prevent its ability to "phone home" successfully, even before machines were air-gapped.
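For what it's worth, verifying that ip6 really is disabled on a Linux box is a quick read against the sysctl tree. A small sketch (Linux-only paths; returns None where the path doesn't exist):

```python
from pathlib import Path

def ipv6_disabled(iface="all"):
    """Read the Linux sysctl reporting whether ip6 is disabled for
    the given interface. Returns True/False, or None if the path
    doesn't exist (non-Linux, or no ip6 support in the kernel)."""
    p = Path(f"/proc/sys/net/ipv6/conf/{iface}/disable_ipv6")
    if not p.exists():
        return None
    return p.read_text().strip() == "1"

print(ipv6_disabled())
```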
There's more, but that's for a separate post.
- - -
On a personal level, the experience of seeing several local machines become infected, working to understand the infection, ensuring it didn't spread within cryptostorm, and eventually giving a green light for staff to begin using full infrastructure once again has been... difficult. I am by no means a malware analysis specialist, and in the early days of this cycle I vastly underestimated the complexity and capability of what I'd invited into my local network. The word I'm looking for here is "hubris" - I assumed I could not only clean the infection, but do a nice tight job of documenting the process along the way: pithy, concise blog post to follow.
I tackled the work in a rush of enthusiastic confidence, and I pushed myself hard to get it done fast so the team could green-light and I could get back to 'real' cryptostorm work. I didn't sleep enough, assuming I'd push through the process and catch up on rest later... I went into that fugue state known to many technical folks, where hours and days blurred as I reviewed code and built my mental model of what was going on. This, of course, is not sustainable over the long term - although it can be quite useful for short-term productivity.
Within several days, I was strung-out and coming to the realisation that I'd underestimated the task badly. At the same time, the rest of the staff was waiting on me to clear them and their computers to get back to work - frustration grew. I could only say that I still didn't know what was happening but that I knew something was happening... hardly confidence-inspiring forensic analysis, on which to base the partial shutdown of a high-functioning team. As the pressure grew on me to, speaking bluntly, shit or get off the pot - come up with malware samples that could be used to clean and re-approve our computers - I became more and more stubborn in my insistence that the threat was real and that it had to be enumerated before anyone could start connecting to our in-house systems again.
At the same time, I was routinely seeing things happen on my test machines that absolutely could not happen, period. To someone like me with a rather rigid, formalistic turn of mind, this opens up a yawning chasm of epistemological vertigo. Terror, in a sense. One feels one has a general grasp of "how computers work" after decades in the trenches... and then over a period of a week or two one sees these assumptions seemingly ripped away by cold, hard, stubborn facts on the ground. I had moments of severe despondency - no sense in denying it. I questioned even my questions, which can become pretty self-destructive in no time at all.
Fortunately I have great colleagues, a supportive family, and several outside researchers who were at once supportive and also not intrusive in their questioning about what I was getting up to with all this. It's not that I am in the least bit protective of or "grabby" about owning these data - quite the reverse! However, before I could articulate even a loose theory of what was going on, I was loath to dump the whole mess on others' doorsteps. Call it pride, or call it hard experience, but that's not a path I will go down - and in the early days it meant I was largely isolated, trying to make sense of things from down at the bottom of a pit of self-doubt and intellectual uncertainty.
Once I confirmed the presence of non-intended virtualisation, the pieces began to come back together for me and from there I knew how to clear our machines and get back to full production status.
And, yes... I cursed computers. I yelled at hard drives. I took long walks in the middle of the night, muttering to myself and undoubtedly causing the local fauna to wonder about my sanity. In daylight hours when sharing polite company, I slipped into the wrong spoken language many times - a social gaffe to which I'm prone when overly tired. I missed appointments; I let my lagging academic duties lag further. I took jangled notes in the heat of battle that, read later, made no sense and seemed more like graffiti than research data.
Also, I decided early on not to rely heavily on google or on searching the literature in general, so that I wouldn't carry preconceptions about what was happening - a hard-edged decision to make, but in the end I'm glad I did. When I came back to present my unofficial conclusions to colleagues better qualified than I, we were able to see whether my conclusions matched up with published research others have done. They did, almost to a perfect match. That helped me be confident enough to green-light cryptostorm's machines for full use, once again.
Overall, these two weeks left me with several distinct byproducts: fatigue, a certain paranoia, humility, embarrassment, and - yes - a morbid, curious fascination. They only come out at night, right? Well, yeah... mostly.
- - -
Blah blah blah... what does it all mean, eh?
For nonspecialists such as myself who read the work done by front-line experts on state-level/APT malware technology, it's all fascinating but seemingly disconnected from our daily life. Regin, or Stuxnet, or DarkHotel target other people - not us. That's a comforting assumption, as it assumes both that targeting is rational (someone really does choose who gets targeted, and who doesn't) and that the logic of that targeting is obvious to us.
Neither assumption, of course, is true.
Inevitably, some kit designed to target one group will "jump free" and run wild - that's how Stuxnet was discovered, after all. Worse, there's every incentive for spy agencies to essentially infect everyone and then only activate modules when they want to pick someone out of that vast ocean of candidates - much more reliable than trying to infect targets after they become targets, right? Is this happening, today? I'd turn the question around and ask it this way: is there any reasonable scenario under which it is not happening? No, there isn't. Ergo, it's happening. Q.E.D.
Further, who really knows who is targeted - or why. This isn't LEO (law enforcement organisations), with court orders and documents available via Freedom of Information requests. These toolkits are run by spy shops, largely off the books and designed to be plausibly deniable. Who knows why they target some people - perhaps they seek to leapfrog through them to their "real" target? No idea.
If the Tao wants your noods, the Tao gets your noods.
Mostly I suspect this happens quite often, but most folks don't notice it most of the time. Sometimes, however, someone picks up a corner of the carpet and sees the wild exuberance of what's squirming underfoot... the illusion of a stable foundation is shattered. I think that's what happened here - attuned to such matters, I heard the scratching at the cellar door and I opened the door.
Here's where I do the overly-broad, unsupportable, if-you-ask-me-on-a-dark-night style of "I can't prove it but if my life depends on it this is how I jump" meta-analysis:
- If you are running a Windows-based machine, and you connect routinely to the internet, you are compromised by at least one such APT rootkit - and likely several. If you run a mainstream Linux distro, rely on routine repository synch to keep your OS and packages current, and don't routinely self-compile and fingerprint-validate binaries on your machine, you're compromised. Compromise takes the form of performance anomalies, transient network hiccups, ssl validation problems, and difficulties with some encryption packages - all unintentional side-effects of the technical tools being used to maintain a toehold on your system, for future use.
You are compromised not by a specific "virus" with a specific name and code signature, but rather by these meta-frameworks that stitch together dozens of components - many of which are legitimate packages being used for nefarious purposes. The local mix of such sub-components on your machine, your router, your Android phone is going to be all but unique to you - and it'll vary over days and weeks, as each component self-updates, dies off, is replaced by other pieces, or is obviated by an OS update or whatnot. You exist in stable equilibrium, more or less, with this local micro-ecosystem of code that answers not to you as "root" superuser on your local machine, but to remote puppetmasters... or to nobody at all, if the c&c infrastructure has gone down but the orphaned bits are left out there to fend for themselves.
There's a real risk that those open backdoors to your - to our - machines will be used to harm us or do evil... a risk impossible to quantify given the vast uncertainties of the whole affair. But it's not a zero risk, this is clear. And, knowing that we're not 100% secure in our own local network and machines, we know that we must be careful what we say, who we say it to, and what we do. This fear is the real cost of such a scenario - it's unhealthy, but it's also based in reality.
Finally, I do know of techniques that can eliminate this thin smear of digital bacteria from our local machines - but they aren't the usual techniques we grew up being taught were "good security practice." Running antivirus apps perhaps doesn't do much harm, but it won't do anything to mount a robust defence against this sort of meta-threat. Antivirus apps are the modern Maginot line - and just as effective in protecting France from aggression, sadly.
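On the "self-compile and fingerprint-validate binaries" point above: the core mechanic is just comparing a locally computed digest against one obtained out-of-band - crucially, not from the same possibly-hijacked repository mirror that served the binary. A minimal sketch, with helper names of my own invention:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it whole,
    so large binaries and ISOs are handled gracefully."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def binary_matches(path, expected_hex):
    """True iff the on-disk binary matches a digest we obtained
    out-of-band (verified channel, signed checksum file, or even
    pencil and paper)."""
    return sha256_of_file(path) == expected_hex.lower()
```

This doesn't defeat an attacker who controls both the binary and your out-of-band channel - but it does break the easy case where only the repository path is hijacked, which is precisely the vector described earlier.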
These are all fair points, but I'm writing anyway - for two reasons.
One, what I am writing is true. This is reality, this is objectively accurate as a description of the universe in which we live. Whether it is what we want to be true, or not, is not relevant to whether it is true or not. Rarely do I import much of my Buddhist viewpoint into technical work (not overtly), but here I do: reality transcends meaning, transcends explanation, transcends theory. Reality simply is.
Two, I don't feel these are gloomy, tragic, hopeless topics - genuinely I don't. I may sound paranoid and Enemy of the State creepy about "Them" and how "They" are tracking us, etc... but my strange obsessions also help illuminate a path forward that's largely immune from these in-the-dark creepy-crawlies. The bearer of bad tidings may be bad news, as the old Quebec aphorism goes... but if the bad news brings with it the seeds for future good news, I think we've more than balanced things out along the way.
It's a complicated world out there. Big players with big power are battling over prizes and issues us mere mortals will most likely never understand. And yet, we're still at risk in the crossfire - so it behooves us to know the lay of the land, a bit, so we can stay out of the worst areas of danger. I knew all this, in theory... but the last few weeks have made it real, tangible, and personal to me.
I've felt the gaze of Sauron's Eye on me, and I've turned to look back at it. I don't recommend it to others - do as I say, not as I do! - but I also wasn't burned to a crisp merely by that gaze. And, having looked back, now I know how to slip further from view in the future. Even Sauron has blind spots... once we know where they are, we know where to go so that we can live free from that nauseous feeling we're always being watched.
Because, yeah, we all know what happens at the end of Aliens, right?
- ~ pj
ps: just because