December 17, 2017

Blog (Ivan Pepelnjak)

Please Respond: Survey on Interconnection Agreements

Marco Canini is working on another IXP-related research project and would like to get your feedback on inter-AS interconnection agreements, or as he said in an email he sent me:

As academics, it would be extremely valuable for us to receive feedback from network operators in the industry.

It’s fantastic to see researchers who want to base their work on real-life experience (as opposed to ideas that result in great-looking YouTube videos but fail miserably when faced with reality), so if you’re working for an ISP please take a few minutes and fill out this survey.

by Ivan Pepelnjak ( at December 17, 2017 07:49 AM

December 16, 2017

Blog (Ivan Pepelnjak)

What Exactly Should My MAC Address Be?

Looks like I’m becoming the gateway-of-last-resort for people encountering totally weird Nexus OS bugs. Here’s another beauty…

I'm involved in a Nexus 9500 (NX-OS) migration project, and one bug recently caused vPC-connected Catalyst switches to err-disable (STP channel-misconfig) their port-channel members (CSCvg05807), effectively shutting down the network for our campus during what was supposed to be a "non-disruptive" ISSU upgrade.

Weird, right? Wait, there’s more…

Read more ...

by Ivan Pepelnjak ( at December 16, 2017 08:22 AM

Networking Now (Juniper Blog)

Adaptive Security Policies for Dynamic Security Environments


Statically defined network security policies impose a significant operational burden when adapting to an ever-changing security environment. Let’s see how Juniper’s innovative, patent-pending construct called “Dynamic Policy Actions” allows you to have the right security for the right conditions.

by snimmagadda at December 16, 2017 02:50 AM

December 15, 2017

Moving Packets

Microburst: PSIRT Notifications – Are They Good Or Bad?

If your hardware or software vendor issues a lot of PSIRT (Product Security Incident Response Team) notifications, is that a good thing or a bad thing? After all, a PSIRT bulletin means that there’s a security issue with the product, so lots of PSIRTs means that the product is insecure, right?


What about the alternative, then? If a vendor issues very few PSIRT notifications does it mean that their product is somehow more secure? This is an issue I’ve been thinking about a lot over the last year, and the conclusion I came to is that if a vendor is not issuing regular bulletins, it’s a bad thing. Either the vendor doesn’t think its customers should be aware of vulnerabilities in the product, or perhaps the bugs aren’t being fixed. A PSIRT bulletin involves the vendor admitting that it got something wrong and potentially exposed its customers to a security vulnerability, and I’m ok with that. Sure, I don’t like sloppy coding, but I do appreciate the transparency.

I believe that when a vendor is shy about publishing security notifications it’s probably a decision made by management based on the naive belief that limiting the number of times they admit to a security vulnerability will give the impression to their customers that the product is, by inference, more secure. I’d argue though that the opposite is true. We know that coders make mistakes and we know that common libraries used by developers within their code or within the OS have bugs in them. As a nerd, I want to see those bugs; I need to see those bugs. Far from making me think the vendor sucks, it proves to me that the vendor acknowledges that there are issues, is responsive to vulnerabilities, and is proud to say that they have fixed them (hopefully quickly).

My 2 Bits

The announcement of a vulnerability, potential workarounds and the all-important “fixed-in” version, is operationally critical to users of the product. A vendor that quietly fixes bugs without announcing them runs the risk of its customers not realizing how important it is to upgrade their installed codebase, and thus leaves the customer vulnerable and unaware, for months or even years.

For our part as engineers, we should not be casting doubt upon companies that issue frequent PSIRTs. In my opinion the alternative is much worse.

If you liked this post, please do click through to the source at Microburst: PSIRT Notifications – Are They Good Or Bad? and give me a share/like. Thank you!

by John Herbert at December 15, 2017 05:25 PM

The Networking Nerd

Should We Build A Better BGP?

One story that seems to have flown under the radar this week with the Net Neutrality discussion being so dominant was the little hiccup with BGP on Wednesday. According to reports, sources inside AS39523 were able to redirect traffic from some major sites like Facebook, Google, and Microsoft through their network. Since the ISP in question is located inside Russia, there’s been quite a lot of conversation about the purpose of this misconfiguration. Is it simply an accident? Or is it a nefarious plot? Regardless of the intent, the fact that we live in 2017 and can cause massive portions of Internet traffic to be rerouted has many people worried.

Routing by Suggestion

BGP is the foundation of the modern Internet. It’s how routes are exchanged between every autonomous system (AS) and how traffic destined for your favorite cloud service or cat picture hosting provider gets to where it’s supposed to be going. BGP is the glue that makes the Internet work.

But BGP, for all of the greatness that it provides, is still very fallible. It’s prone to misconfiguration. Look no further than the Level 3 outage last month. Or the outage that Google caused in Japan in August. And those are just the top searches from Google. There have been a myriad of problems over the course of the past couple of decades. Some are benign. Some are more malicious. And in almost every case they were preventable.

BGP runs on the idea that people configuring it know what they’re doing. Much like RIP, the suggestion of a better route is enough to make BGP change the way that traffic flows between systems. You don’t have to be an evil mad genius to see this in action. Anyone that’s ever made a typo in their BGP border router configuration will tell you that if you make your system look like an attractive candidate for being a transit network, BGP is more than happy to pump a tidal wave of traffic through your network without regard for the consequences.

But why does it do that? Why does BGP act so stupid sometimes in comparison to OSPF and EIGRP? Well, take a look at the BGP path selection mechanism. CCIEs can probably recite this by heart. Things like Local Preference, Weight, and AS_PATH govern how BGP will install routes and change transit paths. Notice that these are all set by the user. There are no automatic conditions outside of the route’s origin. Unlike OSPF and EIGRP, there is no consideration for bandwidth or link delay. Why?

Well, the old Internet wasn’t incredibly reliable from the WAN side. You couldn’t guarantee that the path to the next AS was the “best” path. It may be an old serial link. It could have a lot of delay in the transit path. It could also be the only method of getting your traffic to the Internet. Rather than letting the routing protocol make arbitrary decisions about link quality the designers of BGP left it up to the person making the configuration. You can configure BGP to do whatever you want. And it will do what you tell it to do. And if you’ve ever taken the CCIE lab you know that you can make BGP do some very interesting things when you’re faced with a challenge.

BGP assumes a minimum level of competency to use correctly. The protocol doesn’t have any built in checks to avoid doing stupid things outside of the basics of not installing incorrect routes in the routing table. If you suddenly start announcing someone else’s AS with better metrics then the global BGP network is going to think you’re the better version of that AS and swing traffic your way. That may not be what you want. Given that most BGP outages or configurations of this type only last a couple of hours until the mistake is discovered, it’s safe to say that fat fingers cause big BGP problems.
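For illustration, the attribute ordering described above can be sketched as a simple comparator. This is a toy model, not a faithful implementation of the full BGP decision process: the attribute defaults and the truncated tie-break order are simplifications.

```python
# Toy sketch of the first few steps of the BGP best-path algorithm the
# post refers to. Every tie-breaker here is operator-controlled; nothing
# measures bandwidth or delay, which is exactly the point above.

def best_path(paths):
    """Pick the 'best' path from a list of candidate route dicts."""
    return sorted(
        paths,
        key=lambda p: (
            -p.get("weight", 0),        # higher weight wins (local to router)
            -p.get("local_pref", 100),  # higher LOCAL_PREF wins (set by operator)
            len(p.get("as_path", [])),  # shorter AS_PATH wins
            p.get("med", 0),            # lower MED wins
        ),
    )[0]

# A mistyped LOCAL_PREF is enough to swing traffic toward the longer path:
paths = [
    {"as_path": [64500, 64510], "local_pref": 100},
    {"as_path": [64520], "local_pref": 90},
]
print(best_path(paths)["as_path"])  # [64500, 64510]
```

Note that the shorter AS_PATH never even gets considered once LOCAL_PREF differs, which is how one fat-fingered route-map can reroute a tidal wave of traffic.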

Buttoning Down BGP

How do we fix this? Well, aside from making sure that anyone touching BGP knows exactly what they’re doing? Not much. Some Regional Internet Registries (RIRs) require you to preconfigure new prefixes with them before they can be brought online. As mentioned in this Reddit thread, RIPE is pretty good about that. But some ISPs, especially ones in the US that work with ARIN, are less strict about that. And in some cases, they don’t even bring the pre-loaded prefixes online at the correct time. That can cause headaches when trying to figure out why your networks aren’t being announced even though your config is right.

Another person pointed out the Mutually Agreed Norms for Routing Security (MANRS). These look like some very good common sense things that we need to be doing to ensure that routing protocols are secure from hijacks and other issues. But, MANRS is still a manual setup that relies on the people implementing it to know what they’re doing.

Lastly, another option would be the Resource Public Key Infrastructure (RPKI) service that’s offered by ARIN. This service allows people who own IP address space to specify which autonomous systems can originate their prefixes. In theory, this is an awesome idea that gives a lot of weight to trusting that only specific ASes are allowed to announce prefixes. In practice, it requires the use of PKI cryptographic infrastructure on your edge routers. And anyone that’s ever configured PKI on even simple devices knows how big of a pain that can be. Mixing PKI and BGP may be enough to drive people back to sniffing glue.

Tom’s Take

BGP works. It’s relatively simple and gets the job done. But it is far too trusting. It assumes that the people running the Internet are nerdy pioneers embarking on a journey of discovery and knowledge sharing. It doesn’t believe for one minute that bad people could be trying to do things to hijack traffic. Or, better still, that some operator fresh from getting his CCNP isn’t going to reroute Facebook traffic through a Cisco 2524 router in Iowa. BGP needs to get better. Or we need to make some changes to ensure that even if BGP still believes that the Internet is a utopia someone is behind it to ensure those rose colored glasses don’t cause it to walk into a bus.

by networkingnerd at December 15, 2017 04:37 PM

Blog (Ivan Pepelnjak)

Create IP Multicast Tree Graphs from Operational Data

A while ago I created an Ansible playbook that creates network diagrams from LLDP information. Ben Roberts, a student in my Building Network Automation Solutions online course used those ideas to create an awesome solution: he’s graphing multicast trees.

Here’s how he described his solution:

Read more ...

by Ivan Pepelnjak ( at December 15, 2017 11:05 AM

Video: Avaya [now Extreme] Data Center Solutions

I haven’t done an update on what Avaya was doing in the data center space for years, so I asked my good friend Roger Lapuh to do a short presentation on:

  • Avaya’s data center switches and their Shortest Path Bridging (SPB) fabric;
  • SPB fabric features;
  • Interesting use cases enabled by SPB fabric.

The videos are now available to everyone with a valid account – the easiest way to get it is a trial subscription.

by Ivan Pepelnjak ( at December 15, 2017 08:28 AM

XKCD Comics

December 14, 2017

Blog (Ivan Pepelnjak)

Data Center BGP: Autonomous Systems and AS Numbers

Two weeks ago we discussed whether it makes sense to use BGP as the routing protocol in a data center fabric. Today we’ll tackle three additional design challenges:

  • Should you use IBGP or EBGP?
  • When should you run BGP on the spine switches?
  • Should every leaf switch have a different AS number or should they share the same AS number?

by Ivan Pepelnjak ( at December 14, 2017 07:58 AM

December 13, 2017

My Etherealmind

Internet Giants Should Be Broken Up

This is a 30 minute presentation that highlights the lack of societal value that Google, Apple, Facebook and Amazon deliver. Galloway examines their market dominance and how the market is failing to regulate or control the tech companies. I recommend watching this and considering the ideas proposed here. Galloway is well known and worth listening […]

by Greg Ferro at December 13, 2017 06:13 PM

XKCD Comics

December 12, 2017

Networking Now (Juniper Blog)

The Art of Fighting Cyber Crime

When it comes to defending your organization from cyber crime, time matters. Visibility matters. Environment matters. And, more than ever, conditions matter. In order to shrink the time from detection to remediation, security operators need a cyber defense system that truly adapts to a hyper-active threat climate and is designed from inception to be agile. That window of time between detection and remediation defines the overall potential impact of a security breach. The longer the time, the greater potential for damage. The diversity of environments – physical, virtual, private cloud, public cloud, locations, and departments – drives the need for a more responsive and unified approach to cybersecurity. The sheer volume of information generated by your security environment creates a firehose of alerts from so many sources that security operators often have difficulty seeing the most crucial characteristics of the threats that come into their view.

by Amy James at December 12, 2017 09:25 PM

Moving Packets

One Little Thing Can Break All Your Automation

I’ve been doing some work automating A10 Networks load balancers recently, and while testing I discovered a bug which broke my scripts. As hard as I try to code my scripts to cope with both success and failure, I realized that the one thing I can’t do much with is a failure of a key protocol.


Breaking Badly

So what matters when I’m automating device configurations? Three things come to mind immediately:

The Network / Transport

I need reliable network connectivity, or my automation requests will constantly be timing out or failing. While a script should be able to cope with an occasional network failure, unreliable networks are not fun to work with.

Data Transfer Format

Pick a format (XML, JSON) and use it consistently. I’m happy so long as I can send a request and read a response in a given format. If I send a request using JSON, send a response using JSON. Funnily enough I was troubleshooting WordPress xmlrpc recently and noticed that when there was an error, the XML command being issued received a 404 error followed by, well, you’d hope an XML error response, right? No, because it was an HTTP 404 error, the site was returning the blog search page instead. I think I would have preferred an XML response explaining what the error actually was. Unsurprisingly, the client code using the XMLRPC connection was complaining about an unexpected XML response (correct, since it was HTML).
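A defensive client can at least refuse to feed an HTML error page into a JSON (or XML) parser. Here’s a minimal sketch of that check, assuming the server sets Content-Type honestly:

```python
import json

# Sketch: decode a response body as JSON only if the server actually
# claims it sent JSON, instead of blindly parsing whatever came back.

def decode_response(content_type, body):
    if content_type != "application/json":
        # e.g. an HTML 404/search page instead of a structured error
        raise ValueError(f"expected JSON, got {content_type}: {body[:60]!r}")
    return json.loads(body)

# A JSON API error should come back as JSON...
print(decode_response("application/json", '{"error": "not found"}'))

# ...but an HTML page must not be fed to the JSON parser blindly.
try:
    decode_response("text/html", "<html>Search results...</html>")
except ValueError as e:
    print("refused:", e)
```

It won’t fix the server, but it turns a confusing parse error into an actionable one.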

Consistent API

Create an API that makes sense (I can only dream). Create consistent responses so that I don’t have to “special case” every single response based on the request I make. If a particular response element can be an array, always send back an array, even if it only has a single entry; don’t send a string instead. Wrap responses consistently so that errors and responses can be easily distinguished and extracted. For example, I found this note to self in some code I wrote last year:

// Decode response into an object.
// But because infoblox is stupid:
//  - if it's an error it returns a hash;
//  - if it's an actual result, it provides an array of hashes, because screw you.
// So, we need to test the first byte of the response to see whether it's [ or {.
// If it's a { decode it as an error. Otherwise decode as an IPAM object.

Ok, it’s not the end of the world, but it does add an additional step which I really don’t appreciate.
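That comment translates to something like the sketch below. The response shapes are assumptions modeled on the comment itself, not the documented Infoblox API:

```python
import json

# Sketch of the "test the first byte" workaround: the (assumed) API
# returns a JSON object for errors and a JSON array for actual results.

def parse_ipam_response(body):
    text = body.lstrip()
    data = json.loads(text)
    if text.startswith("{"):
        # Error shape: a single hash/object
        raise RuntimeError(data.get("text", "IPAM error"))
    return data  # result shape: an array of objects

print(parse_ipam_response('[{"ip": ""}]'))  # [{'ip': ''}]
try:
    parse_ipam_response('{"text": "no such network"}')
except RuntimeError as e:
    print("error:", e)
```

A consistent envelope (say, `{"error": ..., "result": [...]}`) would make this dispatch unnecessary, which is the whole complaint.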

ACOS For Concern

So, my recent discovery with the A10 Networks load balancers, which run the ACOS operating system, was that the encoding of escaped characters within the configuration can mean that ACOS will return invalid JSON in response to my request. For example, imagine that a health check must be configured to request the URL /checkseq\8s1. It’s an unusual URL because it has a backslash in it, but that’s what the server in question asks for, so that’s the health check that’s needed. ACOS understands escaped characters (using a backslash), so to send a \ in the health check, it would have to be entered as \\. Similarly, to send \r\n (carriage return, new line) the health check would contain \\r\\n, and that allows the addition of a custom HTTP header as well, which in this example is called “X-Custom-Field” and has a value of 101:

health monitor checkweb
 method http url GET "/checkseq\\8s1\\r\\nX-Custom-Field: 101" expect 200

When the health check is used, the GET string is analyzed and the escaped characters resolve to a more normal looking string:

/checkseq\8s1\r\nX-Custom-Field: 101

However, when viewing the health monitor’s configuration via the REST API, the same exact process occurs and the JSON for the method is encoded something like this:

    "method": "http",
    "type": "url",
    "subtype": "GET",
    "expect": "200",
    "url": "/checkseq\8s1\r\nX-Custom-Field: 101"

When read by the recipient, the url string is again analyzed for escaped characters.

Unfortunately, \8 is not a valid escape code, thus the JSON decoding process spews an error at this point. To me this is a failure in the JSON encoder in ACOS; it should take the interpreted string then make it ‘JSON-safe’. By having encoded a string including the invalid character “\8”, ACOS generated invalid JSON. Since my JSON decoder can’t handle invalid JSON, my automation fails on the spot. I don’t know if the query worked or not; I only know it couldn’t be decoded. Highly annoying.
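The failure is easy to reproduce with any strict JSON decoder. Here’s a small demonstration using Python’s json module:

```python
import json

# "\8" is not a legal JSON escape sequence, so a strict decoder
# rejects the whole document, just like the automation script did.

good = r'{"url": "/checkseq\\8s1"}'  # backslash properly escaped as \\
bad = '{"url": "/checkseq\\8s1"}'    # a raw \8 inside the JSON text

print(json.loads(good)["url"])       # /checkseq\8s1

try:
    json.loads(bad)
except json.JSONDecodeError as e:
    print("decode failed:", e)
```

A correct encoder would have emitted the `good` form; ACOS emitted the `bad` form, and the decoder can only throw its hands up.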

The Workaround

This all started because of a health check URL containing a backslash. The workaround, rather than using “\\” is to URL-encode that backslash as %5C (or similar) in the original health check. However, there’s no way to stop a user creating another “url bomb” in the future, because ACOS will accept \\ in the url string without generating an error.

My 2 Bits

What this really brings home to me is how a breakdown in a key protocol – in this case JSON – can bring automation to its knees. We assume that protocols like TCP will just work, and at this point I think of JSON, largely, in the same way. Scripts rely upon formats like JSON to allow the accurate storage and transport of information, but if the JSON can’t be read by the recipient, the data is lost. In the case of my automation scripts, it brought a workflow to a screeching halt, and it was not possible to get past that point in the process without manually applying a workaround to the health check which was causing problems.

There’s certainly a lesson here about checking results, and raising alarms when an unexpected result shows up. Even a reliable automation script will need some tender loving care at times.

If you liked this post, please do click through to the source at One Little Thing Can Break All Your Automation and give me a share/like. Thank you!

by John Herbert at December 12, 2017 08:13 PM

Blog (Ivan Pepelnjak)

Moving Complexity to Application Layer?

One of my readers sent me this question:

One thing that I notice is you mentioned moving the complexity to the upper layer. I was wondering why browsers don't support multiple IP addresses for a single site – when a browser receives more than one IP address in a DNS response, it could try to perform TCP SYN to the first address, and if it fails it will move to the other address. This way we don't need an anycast solution for DR site.

Of course I pointed out an old blog post ;), and we all know that Happy Eyeballs works this way.
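The fallback behavior the reader describes might look like the sketch below: resolve every address for a name, then try each one until a TCP connection succeeds. This is a simplified sequential version; real Happy Eyeballs (RFC 8305) races attempts in parallel with small delays.

```python
import socket

# Sketch of "try the next address if the first one fails": iterate over
# all resolved addresses and return the first socket that connects.

def connect_any(host, port, timeout=2.0):
    last_err = None
    for family, stype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, stype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock            # first address that answers wins
        except OSError as e:
            sock.close()
            last_err = e           # dead address: fall through to the next
    raise last_err or OSError("no usable addresses for %s" % host)
```

The catch, as the old blog post argues, is that a failed DR site often still completes the TCP handshake, so SYN-level fallback alone doesn’t solve the disaster-recovery problem.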

Read more ...

by Ivan Pepelnjak ( at December 12, 2017 07:53 AM

December 11, 2017

A Christmas Support Story

Warning: Non-Technical Post

As it’s the festive period and this time of the year is for caring and sharing, here’s a short story from many years ago. This might make some chuckle, but some of these times were not pleasant and I can assure you, they were very real!

Like most IT related people, I started in support. The job paid peanuts, it was shift work and I had much to learn. Being quite eager to please, many mistakes were made and in these cases seniors were supposed to help the younglings (like me). For some companies, a functioning support network just isn’t there, and low-rank power struggles leave you fighting fires solo.

Within the first three months of the job, I experienced two major backhaul fibre outages, a group of people stealing our generator power cables and the air conditioning system failed to the point of meltdown. We also had a total power outage which took 40 hours or so of non-stop work to get everything back online and healthy.

These kinds of experiences make or break you. The phones do not stop ringing (at least when the power is on) and customers rightfully do not stop complaining. If you survive the pressure, your skin begins to thicken.

The thing that happened that should never have happened can also happen again. Lightning does strike twice in the same place!

It can always happen so be honest with yourself and customers. Understanding contractual obligations for both parties is important and despite the customer trying to push you into a corner, you have real limits on what you can do.


When it comes to customers, I like to make sure they get what they pay for and feel connected to us. When you have a good relationship with your customers, awkward conversations are sometimes easier to have. I recall one particular customer who had leased line issue after leased line issue. Her issues combined with our issues made her job very risky after placing her eggs in our basket. Knowing that her job was at risk, she was often the first customer I called proactively when we were having issues. I nearly lost my job over this.

Despite how proactive you think you’re being, always ensure your management team understands what position your customers may be in due to issues in your realm. Sometimes, management want to come forward with a more rounded package and story as compensation, instead of a young happy-go-lucky person ringing them up to apologise and assure them everything is being done to restore a service.

Do not allow yourself to be brushed off by poor management but allow yourself to be guided by them when they understand what the full situation is. Shared responsibility goes a long way for your customer and for yourself. If they do not understand, try harder; that’s their job and don’t let them forget it!


To all of the support people over this festive period, good luck my friends. May it be an event-less and merry time.

The post A Christmas Support Story appeared first on

by David Gee at December 11, 2017 06:45 PM

Blog (Ivan Pepelnjak)

First Speakers in the Spring 2018 Automation Online Course

For the first two sessions of the Building Network Automation Solutions online course I got awesome guest speakers, and it seems we’ll have another fantastic lineup in the Spring 2018 course:

Most network automation solutions focus on device configuration based on user request – service creation or change of data model describing the network. Another very important but often ignored aspect is automatic response to external events, and that’s what David Gee will describe in his presentation.

Read more ...

by Ivan Pepelnjak ( at December 11, 2017 07:43 AM

Networking Now (Juniper Blog)

Necurs Malspam Delivers GlobeImposter Ransomware



The Necurs botnet seems to be coming up with a fresh wave of malspam delivering GlobeImposter ransomware. The malspam comes in the form of a quasi-blank email with little to no message content, a short subject line and an attached 7z archive containing a VBScript that downloads the ransomware.


Indicators of Compromise (IOCs)

GlobeImposter Payload MD5sum: c99e32fb49a2671a6136535c6537c4d7


Technical Analysis


Mail Attachment VBScript Analysis

Comparing spam mail that has already been gathered with other samples seen on VirusTotal in the last few days, the samples so far show mostly blank mail with little to no content and malicious attachments in the form of a 7z archive containing VBScript files. The naming scheme looks either numerical, like 10006000420.7z, or like FL_386828_11.30.2017.7z. Other attachment naming schemes have also been seen.


The VBScript file is somewhat obfuscated. It stores a hard-coded string that it parses to obtain various sub-strings, which it then uses to figure out the objects it needs to create for network communication, the file name to use for the ransomware to be downloaded, and so on. The format of this string looks like the example below, delimited by the character “>”, and is stored in reverse form, which the malware reverses before extracting the necessary substrings.
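The scheme amounts to reversing the string and splitting on the delimiter. In the sketch below, the string contents and substring layout are assumptions for illustration, not values taken from the actual sample:

```python
# Sketch of the deobfuscation scheme: a ">"-delimited configuration
# string stored in reverse. The values here are invented examples.

stored = "Microsoft.XMLHTTP>GET>tmp.exe"[::-1]  # as it would sit in the script
config = stored[::-1]                 # the script reverses it back before use
obj_name, verb, filename = config.split(">")
print(obj_name, verb, filename)       # Microsoft.XMLHTTP GET tmp.exe
```

Obfuscation this shallow defeats naive string scanning but, as noted below, still leaves recurring patterns that signatures can match.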




The VBScript file has a hardcoded list of URLs, as shown in the snapshot below, to download the ransomware from. It then loops through the list of URLs until it is successful. Various attachments from multiple spam emails were analyzed and showed a largely non-intersecting list of URLs in each attachment. Below left is the obfuscated code looping through the URL list trying to download the malware. Below right is the corresponding simplified pseudocode.



The VBS file saves the payload to the “C:\Users\user\AppData\Local\Temp\” folder and executes it, as seen below.



While the attachment itself is compressed, the VBScript file inside the archive, seen across various attachment samples, shows a recurring pattern in the code that can be leveraged by an IDS/IPS or YARA engine that can decompress the archive and match on the VBScript file.


Common patterns seen across the VBScript attachments include:


"krapivec\s*=\s*Array(" -> Regex



CUA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"



And so on.


 Ransomware Payload Analysis

The ransomware payload is in the form of an NSIS installer. When spawned, it unpacks itself, spawns a copy of itself in suspended mode and injects its code into the child using the process hollowing technique.




When the malicious payload is executed, the malware encrypts all files, adding the “..doc” extension. Previous and other versions of the ransomware are known to use other extensions.


The malware ensures persistence by adding a run key entry named “UpdateBrowserCheck”, with the path to the ransomware executable, under HKCU\Software\Microsoft\Windows\RunOnce.



It writes a temporary batch file to the system, as shown below, with commands to delete any shadow volume copies in order to prevent restoration of encrypted files. It also clears all event logs from the system to cover its tracks.




Strings from the unpacked sample:


The ransom file name and the ransom note.



Batch commands run by the malware:





Ransom Note

The ransom note, “Read__ME.html”, is dropped into every directory where files are encrypted, asking the user to connect to the TOR network for the decryptor. Upon clicking the “Buy Decryptor” button, the user is redirected to an onion link mentioning the ransom amount, starting a 48-hour timer, and doubling the ransom amount when the timer expires.


It is interesting to note the difference from other ransomware: the start of the timer is not tied to the encryption time, and the ransom amount is not known at the time the victim is notified of what just happened. Both require the victim to visit the TOR web address before the timer is triggered and the ransom amount revealed.


Akin to other ransomware, this variant of GlobeImposter allows the victim to decrypt one file of their choice to gain assurance that the decryption is possible prior to paying the ransom. Many victims indeed refuse to pay the ransom because they do not believe there is a guarantee of recovering their files. The ransomware criminal gangs are having to recover from the damage to their image done by wipers disguised as ransomware.




Both Cyphort (now a Juniper company) and Juniper Sky ATP detect the email attachment and the ransomware payload.



Many thanks to Abhijit Mohanta from the Threat Research Team for co-authoring this blog.

by asaldanha at December 11, 2017 04:36 AM

XKCD Comics

December 10, 2017

Blog (Ivan Pepelnjak)

New Content: Debugging Ansible Playbooks and Jinja2 Templates

Here’s a quote from one of my friends who spent years working with Ansible playbooks:

Debugging Ansible is one of the most terrible experiences one can endure…

It’s not THAT bad, particularly if you have a good debugging toolbox. I described mine in the Debugging Ansible Playbooks part of the Ansible for Networking Engineers online course.

Please note that the Building Network Automation Solutions online course includes all material from the Ansible online course.

by Ivan Pepelnjak ( at December 10, 2017 12:23 PM

December 09, 2017

Blog (Ivan Pepelnjak)

Worth Reading: Postpone Inbox Procrastination

This blog post by Ethan Banks totally describes my (bad) Inbox habits. If you're anything like me, you might find Ethan's ideas useful (I do... following them is a different story though).

by Ivan Pepelnjak ( at December 09, 2017 06:12 AM

December 08, 2017

Ethan Banks on Technology

Pre-Order My Computer Networking Problems & Solutions Book And Save 40%

I co-authored Computer Networking Problems And Solutions with Russ White. The nice folks at InformIT are accepting pre-orders of the book and ebook at 40% off until December 16, 2017. Go get yourself a copy of this short 832 page read via this link containing all of InformIT’s titles coming soon.

Or, if you use the book’s product page instead of the “coming soon” link above, use code PREORDER to get the discount.

All “coming soon” titles on sale at InformIT:

Product Page for Computer Networking Problems & Solutions:

by Ethan Banks at December 08, 2017 08:11 PM

The Networking Nerd

Network Visibility with Barefoot Deep Insight

As you may have heard this week, Barefoot Networks is back in the news with the release of their newest product, Barefoot Deep Insight. Choosing to go down the road of naming a thing after what it actually does, Barefoot has created a solution to finding out why network packets are behaving the way they are.

Observer Problem

It’s no secret that modern network monitoring is coming out of the Dark Ages. ping, traceroute, and SNMP aren’t exactly the best tools for getting any kind of real information about things. They were designed for a different time with much less packet flow. Even Netflow can’t keep up with modern networks running at multi-gigabit speeds. And even if it could, it’s still missing in-flight data about network paths and packet delays.

Imagine standing outside of the Holland Tunnel. You know that a car entered at a specific time. And you see the car exit. But you don’t know what happened to the car in between. If the car takes 5 minutes to traverse the tunnel you have no way of knowing if that’s normal or not. Likewise, if a car is delayed and takes 7-8 minutes to exit you can’t tell what caused the delay. Without being able to see the car at various points along the journey you are essentially guessing about the state of the transit network at any given time.

Trying to solve this problem in a network can be difficult. That’s because the OS running on the devices doesn’t generally lend itself to easy monitoring. The old days of SNMP proved that time and time again. Today’s networks are getting a bit better with regard to APIs and the like. You could even go all the way up the food chain and buy something like Cisco Tetration if you absolutely needed that much visibility.

Embedding Reporting

Barefoot solves this problem by using their P4 language in concert with the Tofino chipset to provide a way for there to be visibility into the packets as they traverse the network. P4 gives Tofino the flexibility to build on to the data plane processing of a packet. Rather than bolting the monitoring on after the fact you can now put it right along side the packet flow and collect information as it happens.

The other key is that the real work is done by the Deep Insight Analytics Software running outside of the switch. The Analytics platform takes the data collected from the Tofino switches and starts processing it. It creates baselines of traffic patterns and starts looking for anomalies in the data. This is why Deep Insight claims to be able to detect microbursts. Because the monitoring platform can analyze the data being fed to it and provide the operator with insights.
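
The baselining idea itself is simple to sketch. The following is an illustrative simplification, not Barefoot’s actual algorithm — the queue-depth samples, window size, and threshold are all invented:

```python
import statistics

def find_anomalies(samples, window=8, nstd=3.0):
    """Flag samples that spike far above a trailing baseline.

    A microburst shows up as a queue-depth sample well above the
    recent mean; anything more than `nstd` standard deviations over
    the trailing window's average gets flagged.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1.0  # avoid zero stdev
        if samples[i] > mean + nstd * stdev:
            anomalies.append(i)
    return anomalies

# Steady queue depths with a single burst at index 10.
depths = [4, 5, 4, 6, 5, 4, 5, 6, 5, 4, 90, 5, 4]
print(find_anomalies(depths))  # flags the burst at index 10
```

A real platform would baseline far richer signals (per-flow latency, path changes, drops), but the flag-what-deviates-from-recent-history pattern is the same.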

It’s important to note that this is info only. The insights gathered from Deep Insight are for informational purposes. This is where the skill of the network professional comes into play. By gaining perspective from the software into what could be causing issues like microbursts, you gain the ability to take your skills and fix those issues. Perhaps it’s a misconfigured ECMP pair. Maybe it’s a dead or dying cable in a link. Armed with the data from the platform, you can work your networking magic to make it right.

Barefoot says that Deep Insight builds on itself via machine learning. While machine learning seems to be one of the buzzwords du jour, it could be hoped that a platform that can analyze the states of packets can start to build an idea of what’s causing them to behave in certain ways. While not mentioned in the press release, it could also be inferred that there are ways to upload the data from your system to a larger set of servers. Then you can have more analytics applied to the datasets and more insights extracted.

Tom’s Take

The Deep Insight platform is what I was hoping to see from Barefoot after I saw them earlier this year at Networking Field Day 14. They are taking the flexibility of the Tofino chip and the extensibility of P4 and combining them to build new and exciting things that run right alongside the data plane on the switches. This means that they can provide the kinds of tools that companies are willing to pay quite a bit for and do it in a way that is 100% capable of being audited and extended by brilliant programmers. I hope that Deep Insight takes off and sees wide adoption for Barefoot customers. That will be the biggest endorsement of what they’re doing and give them a long runway to building more in the future.

by networkingnerd at December 08, 2017 02:34 PM
XKCD Comics

December 07, 2017

Dyn Research (Was Renesys Blog)

Puerto Rico’s Slow Internet Recovery

On 20 September 2017, Hurricane Maria made landfall in Puerto Rico.  Two and a half months later, the island is still recovering from the resulting devastation.  This extended phase of recovery is reflected in the state of the local internet and reveals how far Puerto Rico still has to go to make itself whole again.

While most of the BGP routes for Puerto Rico have returned, DNS query volumes from the island are still only a fraction of what they were on September 19th  — the day before the storm hit.  DNS activity is a better indicator of actual internet use (or lack thereof) than the simple announcements of BGP routes.

We have been analyzing the impacts of natural disasters such as hurricanes and earthquakes going back to Hurricane Katrina in 2005.  Compared to the earthquake near Japan in 2011, Hurricane Sandy in 2012, or the earthquake in Nepal in 2015, Puerto Rico’s disaster stands alone with respect to its prolonged and widespread impact on internet access.  The following analysis tells that story.

DNS statistics

Queries from Puerto Rico to our Internet Guide recursive DNS service have still not recovered to pre-hurricane levels, as illustrated in the plot below.  Earlier this week, on December 4th, we handled only 53% of the query volume from Puerto Rico that was received on September 18th, just before the hurricane.  Both dates are Mondays, hopefully ruling out possible day-of-the-week effects.


Queries from Puerto Rico to our authoritative DNS services are also reduced from pre-hurricane levels, but not as much as those to our recursive DNS service.  This may be because caching effects are more pronounced with our authoritative DNS, since those servers handle queries for a smaller set of domains than our recursive DNS service does.  Additionally, we may have lost some Internet Guide clients if those computers reverted to a default DNS configuration upon returning to service.  Regardless, the volume is still lower than pre-hurricane levels for authoritative DNS. On December 4th, we handled 71% of the query volume from Puerto Rico as compared to September 18th.


Based on these two figures (53% and 71%), we estimate that internet service in Puerto Rico is only a little more than half of where it was before the hurricane.
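
The arithmetic behind that estimate is just the midpoint of the two indicators — the 53% and 71% figures come from the measurements above; weighting them equally is a simplification:

```python
# December 4th query volume as a fraction of September 18th volume.
recursive_ratio = 0.53      # Internet Guide (recursive) DNS
authoritative_ratio = 0.71  # authoritative DNS

# A rough combined estimate of how much internet activity has returned.
estimate = (recursive_ratio + authoritative_ratio) / 2
print(f"{estimate:.0%}")  # a little more than half
```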

BGP and traceroute measurement statistics

The graphic below shows the impact of the hurricane on the routed networks of Puerto Rico colored by the major providers.  Many of these BGP routes were withdrawn as the hurricane came ashore and the island suffered what has been labeled the largest power outage in US history.  By early November, most of these routes were once again being announced in the global routing table, however, damage to last-mile infrastructure meant that many Puerto Ricans were still unable to obtain internet access.


Our traceroute measurements to Puerto Rico, illustrated below, tell a similar story — a steep drop-off on 20 September 2017, followed by a long slow recovery that appears to come incrementally as different pieces of Puerto Rican infrastructure come back online.  Despite an island-wide power outage, some networks in Puerto Rico (like Critical Hub Networks) continued to be reachable throughout the period of analysis.  While the plot below shows a steeper dip than the BGP-based plot above, the responding hosts we measured to are often part of the core infrastructure and more likely than access-layer networks to be connected to backup power; as a result, these measurements could, like the BGP routes above, overstate the degree of recovery.


Submarine cable impact

Perhaps less appreciated about this incident is Hurricane Maria’s impact on connectivity in several South American countries.  Puerto Rico is an important landing site for several submarine cables that link South America to the global internet.  The cable landing station serving Telecom Italia’s Seabone network had to be powered down due to flooding.  A statement from Seabone read:

We must inform you that Hurricane Maria (Category 5) has impacted Puerto Rico causing serious damage and flooding on the island. We had to de-energize our nodes at the station to avoid serious damage to the equipment.

As a result, in the early afternoon on 21 September 2017, we observed traffic shifts away from Telecom Italia in multiple South American countries as the submarine cable became unavailable.  To illustrate the impact, below are four South American ASNs that experienced a loss of one of their transit providers at this moment in time.  Cablevision Argentina (AS10481) is from Argentina, while the other three are from Brazil.  Brazilian provider Citta Telecom (AS27720) lost service from Eweka Internet, while the others lost Telecom Italia transit.

Additionally, following the hurricane, Venezuelan incumbent CANTV announced that their international capacity had been cut by 50% due to storm-related submarine cable damage. The announcement was met with skepticism from a citizenry increasingly subjected to censorship and surveillance by their government.


However, our data shows impairment of CANTV’s international links aligns with other outages in the region due to the effects of the hurricane. The plots below show latencies to CANTV from several cities around the world spiking on 21 September 2017 after the submarine cable station in Puerto Rico was flooded.


Immediately following Hurricane Maria’s arrival in Puerto Rico, Sean Donelan, Principal Security Architect of Qmulos, began dutifully posting status updates he had collected to the NANOG email list about the connectivity situation on the island.  In addition, the website was set up to collect various metrics about the status of the recovery.

Now with over two months of hindsight, we can truly appreciate just how devastating the hurricane was in many respects other than simply internet impairment.  Puerto Rico may no longer be in the headlines as it was just after the storm, but the resources required to get this part of the United States back on its feet are truly extensive.

Below are links about how you can help:


by Doug Madory at December 07, 2017 05:07 PM

December 06, 2017 Blog (Ivan Pepelnjak)

Automate Remote Site Hardware Refresh Process

Every time we finish the Building Network Automation Solutions online course I ask the attendees to share their success stories with me. Stan Strijakov was quick to reply:

I have yet to complete the rest of the course and assignments, but the whole package was a tremendous help for me to get our Ansible running. We now deploy whole WAN sites within an hour.

Of course I wanted to know more and he sent me a detailed description of what they’re doing:

Read more ...

by Ivan Pepelnjak ( at December 06, 2017 07:12 AM


December 05, 2017 Blog (Ivan Pepelnjak)

Stop Googling and Start Testing

Here’s a question I got on one of my ancient blog posts:

How many OSPF process ID can be used in a single VRF instance?

Seriously? You have to ask that? OK, maybe the question isn’t as simple as it looks. It could be understood as:

Read more ...

by Ivan Pepelnjak ( at December 05, 2017 12:43 PM

December 04, 2017

Ethan Banks on Technology

Postpone Inbox Procrastination

I’ve recently admitted to myself that my ineptitude with my inbox is due largely to procrastination. That is, I can’t face the task that a particular inbox message presents, and thus I ignore the message. With this admission comes a desire to reach inbox zero each and every day. I don’t like my productivity squashed by ineptitude. I must overcome!

But how?

  1. Getting to inbox zero each day is, first of all, an important goal. In other words, I really want to be at inbox zero each day. I don’t want to leave items hanging around for the next day. Therefore, among all my tasks, I have to prioritize inbox management.
  2. I filter messages heavily. I use Gmail, and have begun digging into the filtering system. At the moment, I have 27 rules that route messages to folders. Those rules cover several dozen PR agencies, newsletters, and auto-notifiers. This helps me to focus when I’m working on my inbox, making it much easier to evaluate and react to messages depending on the folder they were routed to.
  3. I unsubscribe from uninteresting lists. Because I work in media, I receive pitches every day from PR firms who don’t know me, but found me in a database and hope I’ll cover their customer’s product. Therefore, every day I have to unsubscribe from certain lists I didn’t ask to be on.
  4. Outbound messages breed inbound messages. Therefore, I don’t respond to messages unless absolutely necessary. When I do respond, I attempt to be as complete as possible to minimize the conversational exchange. That means I anticipate questions and action items, and handle everything up front in a single message if possible. I don’t create a minimum effort message and throw it over the wall, which is really just delaying completion of the task.
  5. I remind myself that my inbox is not a task management tool. If I have a message I can’t complete that moment, I will create a task with a due date and tackle it when my task manager says I need to get it done. Then I am free to archive the message and respond to it later. I’m starting to feel that archiving is greater than deleting, because the message goes away while still being searchable. On the other hand, that could lead to a large mail database, and I’m not sure how I feel about that.
  6. I postpone procrastination. If I open my inbox, that means I’m there to tackle each and every item, moving them all towards closure. I’m not going to cherrypick favorite items for that dopamine hit. Rather, I’m going to go through each message chronologically (I’m still terrible at this), and work it through. I will not leave for another day items that invoke dread, because another day becomes another week. A week becomes a month. A month becomes two months, or even three. Procrastination is not getting things done, so I leave procrastination for another day.

The big deal here…

…is focus. To be able to grind through the daily inbox flood, I stack the deck in favor of focus. When I focus, I get the inbox cleared out.

I think of inbox management like cleaning the catbox. Doing it every day is best. If I miss a day, it’s tolerable, but sort of gross. If I skip a couple of days, I don’t really want to go in there, because cleaning it up is just a nasty, nasty job.

Therefore, it’s best to exercise self-discipline, focus once a day, and sift the inbox clean.

by Ethan Banks at December 04, 2017 04:50 PM Blog (Ivan Pepelnjak)

Simplifying Products

When I started my project, life was simple: I had a few webinars, and you could register for the live sessions. After a while I started adding recordings, subscriptions, bundles, roadmaps (and tracks), books… and a few years later workshops and online courses.

As you can imagine, the whole thing became a hard-to-navigate mess. Right now you can buy almost 70 different products on Time for a cleanup.

Read more ...

by Ivan Pepelnjak ( at December 04, 2017 07:12 AM

Potaroo blog

Network Neutrality - Again

It strikes me as odd to see a developed and, by any reasonable standard, a prosperous economy getting into so much trouble with its public communications policy framework.

December 04, 2017 12:45 AM


December 03, 2017 Blog (Ivan Pepelnjak)

Worth Reading: The Basic Math behind Reliability

Diptanshu Singh wrote a nice explanation of the math behind reliability calculations. Definitely worth reading even if you hated statistics.

by Ivan Pepelnjak ( at December 03, 2017 09:03 AM

December 02, 2017 Blog (Ivan Pepelnjak)

How Did NETCONF Start on Software Gone Wild

A long while ago Marcel Wiget sent me an interesting email along the lines of “I think you should do a Software Gone Wild podcast with Phil Shafer, the granddaddy of NETCONF.”

Not surprisingly, as we started discovering the history behind NETCONF we quickly figured out that all the API and automation hype being touted these days is nothing new – some engineers have been doing that stuff for almost 20 years.

Read more ...

by Ivan Pepelnjak ( at December 02, 2017 02:59 PM

December 01, 2017

The Networking Nerd

Does Juniper Need To Be Purchased?

You probably saw the news this week that Nokia was looking to purchase Juniper Networks. You also saw pretty quickly that the news was denied, emphatically. It was a curious few hours when the network world was buzzing about the potential to see Juniper snapped up into a somewhat larger organization. There was also talk of product overlap and other kinds of less exciting but very necessary discussions during mergers like this. Which leads me to a great thought exercise: Does Juniper Need To Be Purchased?

Sins of The Father

More than any other networking company I know of, Juniper has paid the price for trying to break out of their mold. When you think Juniper, most networking professionals will tell you about their core routing capabilities. They’ll tell you how Juniper has a great line of carrier and enterprise switches. And, if by some chance, you find yourself talking to a security person, you’ll probably hear a lot about the SRX Firewall line. Forward thinking people may even tell you about their automation ideas and their charge into the world of software defined things.

Would you hear about their groundbreaking work with Puppet from 2013? How about their wireless portfolio from 2012? Would anyone even say anything about Junosphere and their modeling environments from years past? Odds are good you wouldn’t. The Puppet work is probably bundled in somewhere, but the person driving it in that video is on to greener pastures at this point. The wireless story is no longer a story, but a footnote. And the list could go on longer than that.

When Cisco makes a misstep, we see it buried, written off, and eventually become the butt of really inside jokes between groups of engineers that worked with the product during the short life it had on this planet. Sometimes it’s a hardware mistake. Other times it’s software architecture missteps. But in almost every case, those problems are anecdotes you tell as you watch the 800lb gorilla of networking squash their competitors.

With Juniper, it feels different. Every failed opportunity is just short of disaster. Every misstep feels like it lands on a land mine. Every advance not expanded upon is the “one that got away”. Yet we see it time and time again. If a company like Cisco pushed the envelope the way we see Juniper pushing it we would laud them with praise and tell the world that they are on the verge of greatness all over again.

Crimes Of The Family

Why then does Juniper look like a juicy acquisition target? Why are they slowly being supplanted by Arista as the favored challenger of the Cisco Empire? How is it that we find Juniper in the crosshairs of everyone, fighting to stay alive?

As it turns out, wars are expensive. And when you’re gearing up to fight Cisco you need all the capital you can get. That forces you to make alliances that may not be the best for you in the long run. And in the case of Juniper, it brought in some of the people that thought they could get in on the ground floor of a company that was ready to take on the 800lb gorilla and win.

Sadly, those “friends” tend to be the kind that desert you when you need them the most. When Juniper was fighting tooth and nail to build their offerings up to compete against Cisco, the investors were looking for easy gains and ways to make money. And when those investors realized that toppling empires takes more than two quarters, they got antsy. Some bailed. Those needed to go. But the ones that stayed caused more harm than good.

I’ve written before about Juniper’s issues with Elliott Capital Management, but it bears repeating here. Elliott is an activist investor in the same vein as Carl Icahn. They take a substantial position in a company and then immediately start demanding changes to raise the stock price. If they don’t get their way, they release paper after paper decrying the situation to the market until the stock price is depressed enough to get the company to listen to Elliott. Once Elliott’s demands are met, they exit their position. They get a small profit and move on to do it all over again, leaving behind a shell of a company wondering what happened.

Elliott has done this to Juniper repeatedly. Pulse VPN. Trapeze. They’ve demanded executive changes and forced Juniper to abandon good projects with long-term payoffs because they won’t bounce the stock price higher this quarter. And worse yet, if you look back over the last five years you can find story after story in the financial press about Juniper being up for sale or being a potential acquisition target. Five. Years. When’s the last time you heard about Cisco being a potential target for buyout? Hell, even Arista doesn’t get shopped as much as Juniper.

Tom’s Take

I think these symptoms all share the same root issue. Juniper is a great technology company that does some exciting and innovative things. But, much like a beautiful potted plant in my house, they have reached the maximum size they can grow to without making a move. Like a plant, they can only grow as big as their container. If you leave them in a small one, they’ll only ever be small. You can transfer them to something larger, but you risk harm or death. But you’ll never grow if you don’t change. Juniper has the minds and the capability to grow. And maybe with the eyes of the Wall Street buzzards looking elsewhere for a while, they can build a practice that gives them the capability to challenge in the areas they are good at, not just being the answer for everything Cisco is doing.

by networkingnerd at December 01, 2017 05:37 PM

Ethan Banks on Technology

All Of Ethan’s Podcasts And Articles For November 2017

Here’s a catalog of all the media I produced (or helped produce) in November 2017. I’ve included content summaries to motivate you to click. See, that’s coming right at you with how I’m trying to manipulate your behavior. I’m honest like that.



  • Episode 134 – Meet ZeroTier–Open Source Networking. I interview Adam Ierymenko about ZeroTier, his overlay networking baby that connects devices at L2 no matter where they are. Really interesting tech. This could be one of the most discussed shows in the Packet Pushers catalog. Plus, Adam joined the Packet Pushers audience Slack channel and has been interacting with the community.
  • Episode 135 – Master Python Networking–The Book. I interview Eric Chou, author of this book, who is donating all the proceeds to charity. Lots of folks have reacted to this interview, reflecting the strong interest from the networking community in automation.


  • Episode 108 – Building Service Meshes With Avi Networks (Sponsored). Service meshes are the latest in the evolution of application delivery controllers. The big idea is to put a service anywhere it’s needed and route traffic through it in a dynamic infrastructure environment. Pair “service mesh” with “cloud native,” and you’re starting to get it.
  • Episode 109 – Run VMware Apps In The Cloud With Ravello (Sponsored). Oracle Ravello makes a product that allows you to pick up your data center as-is and run it in the cloud. Lots of use cases–lab, change validation, infrastructure modeling, user acceptance testing, quality assurance, and even production.
  • Episode 110 – The Future Of Storage. We interview Tom Lyon, Sun Microsystems employee number 8, about where storage is headed. He seems to be a good person to ask, as he’s working at DriveScale these days, creating a distributed storage product designed for leading edge compute.
  • Episode 111 – NVMe And Its Network Impact. Cisco’s J Metz makes a repeat Datanauts appearance. We go full nerd discussing how the incredible performance of NVMe drives will impact storage networks. This is a very big deal that not enough people are talking about, IMHO.
  • Episode 112 – Building The Perfect Data Center Beast. We talk through several aspects of building a physical data center including power distribution, hot/cold aisle designs, racks, and cabling plant. A longer-than-average show that’s seen a good bit of feedback on Twitter already, including opening up the question of, “Does anyone build raised floor facilities anymore?”

Briefings In Brief Podcast

  • Understanding Wireshark Capture Filters. I dive deep on a specific Wireshark capture filter, explaining how it works piece by piece, and concluding with a list of resources to find even more information.


  • Nothing new this month, although I have decided that I am going to focus on personal productivity in this blog. I have felt for a while that I needed a specific topic to write about here, and productivity is an area where I continue to evolve.


  • Human Infrastructure Magazine 70 – How Do You Learn? I ask for feedback from you about how you learn. We’re working on new styles of content over at Packet Pushers Heavy Industries, and want to come as close as we can to getting it correct out of the gate. Your feedback appreciated.


  • I delivered a QoS Fundamentals webinar over at this month. That went reasonably well, although I got some feedback that made me question how I should be doing slides, etc. when doing live over-the-Internet presentations. I’ve since bought a Wacom tablet that I need to figure out how to use. My idea is to use the Wacom to do live whiteboarding during webinars.
  • I spent a day with a higher ed institution, acting as facilitator for a devops workshop they ran internally. That was quite intriguing as their issues were far more human than technical. There’s content there somewhere. I need to think it through and decide what to focus on.
  • The devops workshop did leave me with a technical question I don’t have a great answer to yet. That is, can devops practices be applied in the case of shops deploying lots of shrink-wrap software? That’s a different pipeline than an in-house dev shop pushing code through a CI/CD pipeline into prod using microservices over cloud native. And yet…there are many parallels as well as demands of efficiency. Where does devops, as traditionally defined, fit? I have homework to do and perhaps some folks to interview to shine some light on this topic as I have mixed opinions right now.
  • I am knee-deep into the Todoist task management app, working to make it my single source of truth. Post coming.
  • I have completed migration of my home and lab networks to a D-Link L3 switch instead of a Cisco SG-300 that was running, but had taken a lightning hit and lost an ASIC (I think) and therefore several front panel ports. The D-Link is a DGS-1510-52 gigabit Ethernet switch with a ton of capabilities. I have spent time going through the manual, and I’m favorably impressed. I will likely blog about some of the lesser-known features when I get a chance to study them.
  • I have also migrated my home firewall from a VMware instance of pfSense to a bare-metal instance. Now I have a beast-mode firewall at home with a quad-core Xeon CPU and 32GB of RAM. It’s barely ticking over with the load I’m placing on it, but that was the point. I am going to be loading it up with as many features as I think I can take advantage of, and I don’t want hardware to be a question mark. I still have a ways to go on this box, but so far I’ve got it serving forward and reverse DNS locally, which has made some of the auxiliary packages like BandwidthD offer some more interesting statistics. For example, I have Amazon Echo devices sucking down gigabytes of data from the Internet. Fascinating, and vaguely worrying until I have a better idea of what that data is. In any case, I have plans for ZeroTier on this pfSense box, but I need to do more homework to grok how to install the package, as it doesn’t seem to be supported as a core function. Not sure yet on this, as I’ve heard it can be done, but haven’t managed to hit the right support page explaining it.
  • After many hours, I managed to get minikube (virtualized Kubernetes cluster running on a single host) running on my iMac. I was making it harder than it needed to be, wanting to run minikube in a Linux VM. It kept failing miserably until I opted to do the minikube install like the guide suggested, leveraging Fusion as the hypervisor but otherwise running OS X native. I need minikube to support my reading of the Kubernetes Up And Running book sitting on my desk. It’s not a long book, but it won’t mean much without the lab work to reinforce concepts.

by Ethan Banks at December 01, 2017 05:00 AM


November 30, 2017 Blog (Ivan Pepelnjak)

Automate End-to-End Latency Measurements

Here’s another idea from the Building Network Automation Solutions online course: Ruben Tripiana decided to implement a latency measurement tool. His playbook takes a list of managed devices from Ansible inventory, generates a set of unique device pairs, measures latency between them, and produces a summary report (see also his description of the project).

Read more ...

by Ivan Pepelnjak ( at November 30, 2017 07:19 AM

November 29, 2017 Blog (Ivan Pepelnjak)

BGP as a Better IGP? When and Where?

A while ago I helped a large enterprise redesign their data center fabric. They did a wonderful job optimizing their infrastructure, so all they really needed were two switches in each location.

Some vendors couldn’t fathom that. One of them proposed to build a “future-proof” (and twice as expensive) leaf-and-spine fabric with two leaves and two spines. On top of that they proposed to use EBGP as the only routing protocol because of draft-lapukhov-bgp-routing-large-dc – a clear case of missing the customer’s needs.

Read more ...

by Ivan Pepelnjak ( at November 29, 2017 08:13 AM


November 28, 2017

Dyn Research (Was Renesys Blog)

The Migration of Political Internet Shutdowns

In January 2011, what was arguably the first significant disconnection of an entire country from the Internet took place when routes to Egyptian networks disappeared from the Internet’s global routing table, leaving no valid paths by which the rest of the world could exchange Internet traffic with Egypt’s service providers. It was followed in short order by nationwide disruptions in Bahrain, Libya, and Syria. These outages took place during what became known as the Arab Spring, highlighting the role that the Internet had come to play in political protest, and heralding the wider use of national Internet shutdowns as a means of control.

“How hard is it to disconnect a country from the Internet, really?”

After these events, and another significant Internet outage in Syria, this question led to a blog post published in November 2012 by former Dyn Chief Scientist Jim Cowie that examined the risk of Internet disconnection for countries around the world, based on the number of Internet connections at their international border. “You can think of this, to [a] first approximation,” Cowie wrote, “as the number of phone calls (or legal writs, or infrastructure attacks) that would have to be performed in order to decouple the domestic Internet from the global Internet.”

Defining Internet Disconnection Risk

Based on our aggregated view of the global Internet routing table at the time, we identified the set of border providers in each country: domestic network providers (autonomous systems, in BGP parlance) who have direct connections, visible in routing, to international (foreign) providers. From that data set, four tiers were defined to classify a country’s risk of Internet disconnection. A summary of these classifications is below – additional context can be found in the original blog post:

  • If a country has only 1 or 2 service providers at its international frontier, it is classified as being at severe risk of Internet disconnection.
  • With fewer than 10 service providers at its international frontier, a country is classified as being at significant risk of Internet disconnection.
  • A country’s risk of Internet disconnection is classified as low risk with between 10 and 40 internationally-connected service providers.
  • Finally, countries with more than 40 providers at their borders are considered to be resistant to Internet disconnection.
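
The four tiers amount to a simple threshold function; here’s a sketch in Python (the thresholds come from the classification above, the function name is ours):

```python
def disconnection_risk(border_providers: int) -> str:
    """Classify internet-disconnection risk from the number of
    internationally-connected service providers at a country's
    frontier, per the four tiers defined above."""
    if border_providers <= 2:
        return "severe"
    if border_providers < 10:
        return "significant"
    if border_providers <= 40:
        return "low"
    return "resistant"

# The boundary case discussed later in the post: 39 vs. 41 providers
# land in different tiers despite little real difference in resilience.
print(disconnection_risk(39), disconnection_risk(41))
```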

The original blog post classified 223 countries and territories, with the largest number of them classified as being at significant risk of Internet disconnection.

A February 2014 update to the original post, entitled “Syria, Venezuela, Ukraine: Internet Under Fire” examined changes observed in the 16 months since the original post, highlighting both increases and decreases in Internet disconnection risk level across a number of countries. The post noted the continued fragility of Internet connectivity in Syria, owing in part to its classification of being at severe risk of Internet disconnection, as well as mentioning the lack of nationwide Internet disruptions in Venezuela despite periodic slowdowns and regional access disruptions.

It has been five years since the original blog post, and over three and a half years since the followup post, so we thought that it would be interesting to take a new look at Internet resiliency around the world. Has connection diversity increased, and does that lead to a potential decrease in vulnerability to Internet shutdown?

However, as the 2014 blog post notes, “We acknowledge the limitations of such a simple model in predicting complex events such as Internet shutdowns. Many factors can contribute to making countries more fragile than they appear at the surface (for example, shared physical infrastructure under the control of a central authority, or the physical limitations of a few shared fiber optic connections to distant countries).” For instance, at the time of the original (2012) post, New Zealand relied primarily on the Southern Cross submarine cable connection to Australia for international Internet connectivity, despite our data showing dozens of border network providers. And while Iraq has gained numerous border relationships since 2012, most of the country (except for Kurdistan in the north) relies on a national fiber backbone which the Iraqi government has shut down dozens of times since 2014 to combat cheating on student exams, stifle protests, and disrupt ISIS communication.

In addition, it’s worth recognizing that there is likely no meaningful difference in resilience between a country with 39 border providers (which would classify it as “low risk”) and one with 41 (which would classify it as “resistant”). With these caveats in mind, an updated world map reflecting the risk of Internet disconnection as classified in our 2017 data set is presented below.

What’s Happened Since Then?

In reviewing other notable Internet shutdowns that have occurred since the 2014 post was published, a few things stood out:

However, the most interesting observation was the ‘migration’ of politically-motivated nationwide Internet disruptions. The outages that occurred during the Arab Spring time frame were largely concentrated in North Africa and the Middle East, shifting over the last several years into sub-Saharan Africa. This shift has not gone unnoticed, with online publication Quartz also highlighting the growing trend of African governments blocking the Internet to silence dissent, and the United Nations taking note as well. In addition, as these shutdowns are now a more regular occurrence, both in Africa and in other areas around the world, it is also worth looking at the financial impact that they have on affected countries.

Nearly three years ago, in January 2015, an Internet shutdown was put into place in Kinshasa, the capital of the Democratic Republic of Congo, after bloody clashes took place between opponents of President Joseph Kabila and police.  Banks and government agencies reportedly regained access after four days, while subscribers remained offline for three weeks. Almost two years later, in December 2016, an Internet shutdown was ordered as a means of blocking access to social media sites to prevent mobilization of those protesting against the president’s stay in office beyond the two-term limit.


While many governments force Internet shutdowns that last for just a few hours, or across multiple days or weeks, Gabon combined both in September 2016, implementing a nightly “Internet curfew” that lasted for 23 days. The regular Internet disruptions occurred on the heels of a disputed national election that ultimately saw the incumbent president win a second term by a slim vote margin. International Internet connectivity was also reportedly restricted in the week before the election. With Internet access largely concentrated through Gabon Telecom, the country is at severe risk of Internet shutdown.


In late November 2016, Internet connectivity in Gambia was shut down ahead of a national election that saw the country’s president of more than 20 years upset by the opposition candidate. Published reports noted that the opposition party relied on Internet messaging apps to organize rallies and demonstrations. Efforts by the incumbent party to disrupt Internet connectivity were presumably intended to derail this organizing, as well as to limit potential protests depending on the outcome of the election.


In Cameroon, Internet connectivity was blocked in English-speaking parts of the country starting in January 2017, reportedly affecting about 20 percent of the population. The government reportedly suspended Internet service for users in the Southwest and Northwest provinces after a series of protests that resulted in violence and the arrest of community leaders. Ten months later, Internet access remains unstable in Cameroon, highlighted by the #BringBackOurInternet hashtag on Twitter.


In Togo, throughout the fall of 2017, protesters have been calling for the resignation of President Faure Gnassingbe, who has been in power since his father died in 2005. In response, the country’s government has limited Internet access in an effort to prevent demonstrators from organizing on social media, and has also blocked text messaging. Published reports indicate that the mobile messaging app WhatsApp was a particular target, although some users resorted to VPNs to maintain access to the tool. Looking at the graph below, the Internet restrictions have not generally been implemented through broad manipulation or removal of routes — while some instability is evident, there have not been widespread outages, as have been seen in the past in countries such as Syria.


Most recently, the government of Equatorial Guinea broadly blocked access to the Internet ahead of a nationwide election that was widely expected to keep the ruling party in power. Local service providers GuineaNet and IPXEG, among others, were taken completely offline. This disruption followed the blocking of opposition Web sites, which has been going on since 2013, as well as the blocking of Facebook, which was put into place when the electoral campaign started on October 27.


“Swift and Dramatic” Economic Damage

In 2011, the Organisation for Economic Co-operation and Development (OECD) estimated that the economic impact of Egypt’s five-day national Internet shutdown “incurred direct costs of at minimum USD 90 million.” They estimated that lost revenues due to blocked telecommunications and Internet services accounted for approximately USD 18 million per day. However, the OECD also noted that “this amount does not include the secondary economic impacts which resulted from a loss of business in other sectors affected by the shutdown of communication services e.g. e-commerce, tourism and call centres.”

The true cost to a country of a nationwide Internet shutdown can be significant. An October 2016 study produced by Deloitte reached the following conclusions:

“The impacts of a temporary shutdown of the Internet grow larger as a country develops and as a more mature online ecosystem emerges. It is estimated that for a highly Internet connected country, the per day impact of a temporary shutdown of the Internet and all of its services would be on average $23.6 million per 10 million population. With lower levels of Internet access, the average estimated GDP impacts amount to $6.6 million and to $0.6 million per 10 million population for medium and low Internet connectivity economies, respectively.”
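As a rough back-of-envelope illustration only — the country population and shutdown duration below are invented, and the Deloitte figures are cross-country averages, not predictions — the per-day estimates can be applied like this:

```python
# Deloitte's estimated per-day impact of a full national Internet shutdown,
# in USD millions per 10 million population, by connectivity level.
PER_DAY_PER_10M = {"high": 23.6, "medium": 6.6, "low": 0.6}

def estimated_shutdown_cost(population, connectivity, days):
    """Rough estimate, in USD millions, of a multi-day national shutdown."""
    rate = PER_DAY_PER_10M[connectivity]
    return rate * (population / 10_000_000) * days

# A hypothetical 5-day shutdown in a highly connected country of 90 million:
print(round(estimated_shutdown_cost(90_000_000, "high", 5), 1))  # 1062.0
```

By comparison, Egypt’s five-day shutdown in 2011 (roughly 80 million people, but with much lower connectivity at the time) lines up with the OECD’s far smaller USD 90 million figure, which is consistent with the study’s point that impact grows as an online ecosystem matures.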

The study also noted that if Internet disruptions become more frequent and longer-term in nature, these impacts are likely to be magnified.

The Brookings Institution also published a report in October 2016 that looked at the cost of Internet shutdowns over the previous year. The report’s headline claim that “Internet shutdowns cost countries $2.4 billion last year” was cited in publications including Techcrunch and an Internet Society Policy Brief. However, within the report, so-called Internet shutdowns are broken down into a number of categories. By their count, 36 instances of “national Internet” shutdowns led to just under 20 days of aggregate downtime, responsible for almost USD 295 million of financial impact. In contrast, blocking access to apps at a nationwide level accounted for nearly half of the claimed financial impact.

The costs of a nationwide Internet shutdown to a country’s economy are clearly very real. In an October 2016 article in The Atlantic on this topic, my colleague Doug Madory noted “The hope is that a government would be less likely to order an Internet blackout if it knew the negative impacts of such a decision in terms of hard dollar figures.” We can hope that in the future, national governments will recognize that the money that these nationwide outages would cost them would be better redirected into improving Internet connectivity for citizens and businesses across their countries.


In 2012, we published the “Could It Happen In Your Country?” analysis in the aftermath of the Internet disruptions of the Arab Spring. Since then, we have observed and documented the trend of national Internet blackouts as they have migrated, most recently, to Africa.

While the studies by Deloitte and Brookings have pointed out the severe negative economic consequences of these blackouts, NGOs like AccessNow and Internet Sans Frontières do advocacy work by drawing attention to the adverse impacts on human rights when governments decide to cut communications lines. The role we play, and have played for many years, is to inform the Internet blackout discussion with expert technical analysis.

We can only hope that our combined efforts help to reduce the frequency of future government-directed Internet disruptions. Given the number of blackouts we’ve observed in recent months, help can’t come fast enough.

by David Belson at November 28, 2017 03:30 PM

Blog (Ivan Pepelnjak)

Security or Convenience, That’s the Question

One of my readers was so delighted that something finally happened after I wrote about a NX-OS bug that he sent me a pointer to another one that has been pending for a long while, and is now officially terminated as FAD (Functions-as-Designed… even documented in the Further Problem Description).

Here’s what he wrote (slightly reworded)…

Read more ...

by Ivan Pepelnjak ( at November 28, 2017 08:21 AM

November 27, 2017

Potaroo blog

Helping Resolvers to help the DNS

Here, I'd like to look at ways that recursive resolvers in the DNS can take some further steps that assist other parts of the DNS, notably the set of authoritative name servers, including root zone servers, to function more efficiently, and to mitigate some of the negative consequences if these authoritative name servers are exposed to damaging DoS attacks.

November 27, 2017 12:45 AM

Hiding the DNS

I’d like to look in a little more detail at the efforts to hide the DNS behind HTTPS, and put the work in the IETF's DOH Working Group into a broader perspective. There are a number of possible approaches here, and they can be classified according to the level of interaction between the DNS application and the underlying HTTPS encrypted session.

November 27, 2017 12:30 AM

XKCD Comics

November 26, 2017

Potaroo blog


It took some hundreds of years, but Europe eventually reacted to the introduction of gunpowder and artillery by recognising that they simply could not build castles large enough to defend against any conceivable attack. So they stopped. I hope it does not take us the same amount of time to understand that building ever more massively fortified and over-provisioned DNS servers is simply a tactic for today, not a strategy for tomorrow.

November 26, 2017 10:30 PM

