Hi tc, I saw your note on twitter & wanted to reply in very short form, though I'm sure others will be contributing as well.
Tealc wrote:- When testing the new widget I've found that the Portugal node was up and running, but I really think that he (or is it she?) is not in full production and I will explain this later on.
This is basically true, as there's been extensive work behind the scenes to get this cluster on solid ground. The issues with datacenters in Portugal are not ones I've seen before, myself, and I have to say it's been a learning experience. I would almost characterise this as a "hostile network environment," in terms of limited resilience of capacity.
- CryptoFree is awesome work, bravo, humans behind this idea, give yourselves a round of applause, this way I don't need a new token for my android phone, tablet
We did jump cryptofree "up the queue" in terms of other project tasks, because we feel it's important to get it into circulation. So far, so good - it scales elegantly, so the tech team workload on it has dropped noticeably now that it's successfully deployed.
- There have been a lot of questions concerning the way to connect to CS and the dispersed information on how to do things the right way and what openvpn conf file to use, correct me if I'm wrong, wouldn't ALL nodes connect with the same type of conf file? So maybe it's better to make an official version with no remote server and leave that to be entered by the user? (vpnDarknet we need to talk and join forces to upgrade/update your wiki page, I really think that this will be the way to go, maybe do a new page in the new domain run by us with the knowledge base in it)
I know that folks smarter than me are at work on this already, so I don't have much to add beyond noting that you're right in terms of a more cohesive set of howtos. Most of this relates to HAF and the 1.4 rollout, which is much more "interesting" to complete than it may appear on the surface. And I'm the bottleneck on that, as every step of it must be reviewed by me to ensure there are no risks to the security model. This slows down deployment enormously, and is frustrating, and makes folks curse my name because I cause it to lag. So it goes...
- I have improved the OpenWRT 14.07 installation of OpenVPN while keeping the "no leak" policy of CS, I'm searching for some beta testers that have an OpenWRT capable router to try the installation method (this is done ONLY by terminal, sorry, no working LuCI OpenVPN module is available)
We'll be happy to help get word out on this - people do worry about bricking routers with tweaks (normal people, with actual lives - or so I hear), so when they hear "beta" and "router" they tend to head for the hills. But people also frantically want "leak protection," so there's a balance between fear and greed. Or something.
- "Status" page still has the "404" error, for me this is the best and simplest page we can get and we could maybe put it in the CS.is domain, this way making it "official" and eliminating the "nodelist.txt"
Caught up in major internal/team philosophical/technical debate (or argument, or holy war... pick your descriptor). In a healthy way, to be clear - but not resolved fully. Getting close, I think. More below...
- Portugal isn't in a very good shape, the ping is the worst possible I have ever seen and pingtest.net gives this:
Maybe this is why Portugal Tagus isn't still officially announced?
Portugal has not been performance-tuned since the rollover. Hence not announced in full production yet. Hence testing that specific node for speed optimality is premature. It's there, and shows up in the widget... it's also secured and kernel-hardened and closely monitored by us as are all our nodes. It's not perf-tuned yet.
- ALL other nodes work very well, stable and everything, but I don't know why the France node gives out "Auth Failed" 2 out of 5 times I try to connect
This would have to be diagnosed with a logfile capture and whatnot; it's the first I've heard of it, so I hope you'll let us know more details. It is important to understand that network auth is PICKY and intentionally so. Handshake windows are quite tight. That is not an accident. If you've not bumped into the security considerations underlying that, let me or someone else know and we can dig up pointers to the relevant architectural essays here.
- Speeds have decreased A LOT, I remember a time where I could get almost 90 Mb/s in the Germany node, now it's more like a stable 10 Mb/s, I believe that this has something to do with the increased number of "new users"
Not directly, no. However please understand that the ability to move 100megabit/second chunks of UDP-tunnelled data around the internet, in general, is not universal! Hiccups anywhere along the line can limit that ability, far outside the scope of our nodes or control. Indeed, even on dedicated LAN segments and using flow-optimised TCP, getting into hundreds of megabits/second of real, sustained throughput takes a little tuning. Not a lot, but a little (I can show you the papers on the process involved, if you're bored and curious). Out in the wilds of the public interweb, speeds at that level are delicate. Sometimes there are hiccups: someone's DDoSing a DC, or AS interconnect, or BGP router somewhere and your packet stream is caught up in that. The NSA has a fiber tap getting installed under the Pacific Ocean and packets are getting phase-torqued during the optical redirect. There are literally unlimited ways things can get weird out there. And they do, every day.
That said, if a dozen folks on any node of ours are all pushing 100 megabit chunks of data concurrently, then yes they will step on each other's toes a bit. Rarely happens, but isn't impossible. And that's one reason loadbalancing at the HAF level is so bloody important (see below). The better the topological framework for the network, the better speeds will be on all nodes, all the time, in aggregate, for all members, ceteris paribus.
I want to see sustained, robust support of 50 megabit sessions for members in every cluster we deploy, anywhere in the world. That's my own personal metric. Above 50 is super, but can be a bit hit and miss. Under 50 is unacceptable, and must be addressed.
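For anyone who wants the "stepping on each other's toes" point as arithmetic, here's a rough sketch. It assumes a hypothetical node with a 1 Gbit/s uplink (not a figure for any real node of ours) and an idealised even split of capacity between sessions, which real networks only approximate. The function name and the numbers are illustrative only.

```python
from typing import List


def fair_share_mbps(uplink_mbps: float, demands_mbps: List[float]) -> List[float]:
    """Idealised max-min fair split of one uplink among concurrent sessions.

    Sessions wanting less than an even share get what they ask for; whatever
    capacity is left over is split evenly among the remaining sessions.
    """
    allocation = [0.0] * len(demands_mbps)
    remaining = float(uplink_mbps)
    # Session indices still waiting for an allocation, smallest demand first.
    pending = sorted(range(len(demands_mbps)), key=lambda i: demands_mbps[i])
    while pending:
        share = remaining / len(pending)
        smallest = pending[0]
        if demands_mbps[smallest] <= share:
            # This session fits inside an even share: satisfy it fully.
            allocation[smallest] = demands_mbps[smallest]
            remaining -= demands_mbps[smallest]
            pending.pop(0)
        else:
            # Nobody left fits: everyone remaining gets the same even share.
            for i in pending:
                allocation[i] = share
            break
    return allocation


# Hypothetical uplink capacity, assumed purely for illustration.
UPLINK_MBPS = 1000.0

five = fair_share_mbps(UPLINK_MBPS, [100.0] * 5)
dozen = fair_share_mbps(UPLINK_MBPS, [100.0] * 12)
print("5 sessions each get:", five[0])
print("12 sessions each get:", round(dozen[0], 1))
```

Under those assumptions, five members pulling 100 megabits each all get what they ask for, while a dozen end up at roughly 83 apiece: still above the 50 megabit floor, but no longer the full 100. Spreading those dozen sessions across more than one node is the part loadbalancing is meant to handle.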
- Great idea to "be more active" in the Twitter account, still there is a long way to go to be on the same level as some of the other "farmers mill" VPNs
Ha, "farmers mill" is a great phrase!
And yes, perhaps someday when we grow up we'll have a twitter feed as cool as PIA
Yes, so a quick little note on 1.4, HAF, nodes, instances, and ontological coherence. This is, to be blunt, terrain I occupy on the team. Usually, we've a good balance of expertise and experience when it comes to particular questions - someone might be particularly good in a given subject matter, but others also have things to offer and that's super healthy. It's part of what makes things work, here: diverse experience, pooling of talent, multiple perspectives.
But when it comes to systems ontology, this is my formal academic background. And yes, putting some crusty old academic into a production role is a Bad Idea... but in this case there are enormous benefits to the project, and to the membership, if we do this right. And we're doing this right - in steps, but doing it right.
When the project launched last year in current form, I hacked together a workable little network topology model to get things going, and to ensure we could debug and tune things effectively during beta testing without risking any security impact on members connecting to the network. This required some trade-offs in terms of purity of model, and elegance of scaling behaviour going forward. We chose to make those trade-offs. Beta was successful, and the model is now scaling as fast as we can keep up.
However, now that old "workable little network topology model" is unable to keep up with the network's growth both in size and in connectional complexity. A good problem to have, perhaps... but still a problem. And something of a hideous problem, because replacing an "ontological model" in a network-based service, on the fly, is sort of sounding like maybe not such a great task to tackle.
It's all good. When I hacked together that interim model, last year, I also put together the proper version to which I hoped to migrate the network in the future. Further, I knew there'd be a window during our growth in which - if all went ok - I could swap in the new model, concurrently with the old model, and phase over with minimal/no drama for members. Tight timing, but not impossible.
That's what's happening right now.
It's subtle and fiddly and security-relevant work. It involves a leveraging of elements of the DNS architecture of the internet that I think are highly efficient, robust against many attack vectors, and - if I do say so myself - perhaps even a little bit elegant. But they only really work if the whole thing deploys at once, and there's no "subversion for DNS"... not that I know of, anyhow. Testing is production; miss an entry in hundreds of A Record updates, and you've got a nasty, subtle, frustrating, persistent bug that is all but impossible to track down later.
Because yes, flowing the complete 1.4 HAF-compliant topological model is sort of an "all at once" deal, in many regards. It gets done in a big push, or never gets done. Every day we wait to do it, the network grows... and the old hacked model gets deeper roots in production. This is a big juggle: rush, and risk production impact of considerable badness from error. Delay, and the switchover could end up being all but impossible.
This migration involves loadbalancers, widget nodelist, status page, cryptofree, conf for all non-windows deploys... the whole ball of yarn, basically. Do it all, and do it right, or don't do it.
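To make the "miss an entry in hundreds of A Record updates" worry a bit more concrete, here's a minimal sketch of the sort of check that can catch a stray record after a push. This is not our actual tooling. The hostnames are made-up documentation names, the addresses come from the reserved 192.0.2.0/24 test range, and `audit_records` and `resolve_a` are hypothetical helpers. It compares the records you meant to publish against what a resolver actually returns.

```python
import socket
from typing import Callable, Dict, Set


def resolve_a(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local system resolver gives for hostname."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
        return set(addresses)
    except OSError:
        # Covers lookup failures (socket.gaierror and socket.herror).
        return set()


def audit_records(
    expected: Dict[str, Set[str]],
    resolver: Callable[[str], Set[str]] = resolve_a,
) -> Dict[str, Dict[str, Set[str]]]:
    """Report every hostname whose resolved A records differ from the plan."""
    problems: Dict[str, Dict[str, Set[str]]] = {}
    for hostname, want in sorted(expected.items()):
        got = resolver(hostname)
        if got != want:
            problems[hostname] = {
                "missing": want - got,      # planned but not published
                "unexpected": got - want,   # published but not planned
            }
    return problems


# Hypothetical plan: names and addresses are placeholders, not real nodes.
planned = {
    "node-a.example.net": {"192.0.2.10"},
    "node-b.example.net": {"192.0.2.20", "192.0.2.21"},
}

# Stand-in for live DNS so the sketch runs offline; node-b is missing an entry.
published = {
    "node-a.example.net": {"192.0.2.10"},
    "node-b.example.net": {"192.0.2.20"},
}

report = audit_records(planned, resolver=lambda name: published.get(name, set()))
for name, diff in report.items():
    print(name, "missing:", sorted(diff["missing"]), "unexpected:", sorted(diff["unexpected"]))
```

Leaving the `resolver` argument at its default asks the local system resolver, which may hand back cached answers. A serious check would query the authoritative nameservers directly, and that part is left out here.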
So that's the behind the scenes on this. I'm very close to ready to pull the bell-cord and do the final roll-forward. There are also issues of backwards compatibility with earlier versions of the topology; these impact members, and thus are a Big Deal. They have been debated intensively amongst the team. They continue to be debated. The end result is good decisions, we feel... it takes a bit of chewing to get there.
Once there's a minute of time, I'd like to write all this up with much more formal precision, for community review and critique. But that waits 'till it gets deployed, ironically enough, since the need to migrate is acute.
I know that, from the outside (and amongst our team), this whole thing seems not terribly complex: a few nodes, some instances, some IP addresses. Just do it! But it's not like that, at all. There's deep structural stuff afoot in the decisions we make here, and how we implement them... those decisions reify in the future evolution of the entire network and everything connecting to it. They are largely non-reversible, phase-shift type systemic transitions, too. No rewind button.
There are few areas of technology, or much else, in which I feel comfortable saying "I know how to do this and do it right" - this is one of those areas for me. It's partly why I was pulled into the team during pre-beta work, last year - this is my sphere of expertise if I have such a thing anywhere. So I'm not concerned we're on the wrong track with this, at all. Rather, I can't fucking wait to get it deployed and watch it expand into full production. It's, in a word, beautiful.
In the meantime, a spot of patience: this is worth doing right, and there's no shortcut to make it go faster. I'm the bottleneck, I'm the final sign-off, and I'm not sorry to be taking it at a pace I know will work. Once it's done, this will all show in how things go from there.