September 21, 2018

Blog (Ivan Pepelnjak)

Smart or Dumb NICs on Software Gone Wild

Hardware vendors are always making their silicon more complex and feature-rich. Is that a great idea or a disaster waiting to happen? We asked Luke Gorrie, the lead developer of Snabb Switch (an open-source user-land virtual switch written in Lua) about his opinions on the topic.

TL&DL version: Give me a dumb NIC, software can do everything else.

If you want to know more, listen to Episode 93 of Software Gone Wild.

by Ivan Pepelnjak at September 21, 2018 06:26 AM

XKCD Comics

September 20, 2018

Blog (Ivan Pepelnjak)

Using CSR1000V in AWS Instead of Automation or Orchestration System

As anyone starting their journey into AWS quickly discovers, cloud is different (or, as I wrote in the description of my AWS workshop, you feel like Alice in Wonderland). One of the gotchas: when you link multiple routing domains (Virtual Private Clouds – the other VPC) you have to create static routing table entries on both ends. Even worse, there’s no transit VPC – you have to build a full mesh of peering relationships.
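To see why that quickly gets out of hand, here’s a back-of-the-envelope sketch (Python, with made-up VPC names): the number of peerings grows as n(n-1)/2, and every one of them needs routes configured on both ends.

```python
from itertools import combinations

def full_mesh_peerings(vpcs):
    """Without a transit VPC, every pair of VPCs needs its own
    peering connection."""
    return list(combinations(sorted(vpcs), 2))

def static_routes_needed(vpcs):
    """Each peering needs a static routing table entry on both ends."""
    return 2 * len(full_mesh_peerings(vpcs))

# Hypothetical VPC names, purely for illustration
vpcs = ["vpc-app", "vpc-db", "vpc-mgmt", "vpc-shared"]
print(len(full_mesh_peerings(vpcs)))  # → 6 peerings
print(static_routes_needed(vpcs))     # → 12 route entries
```

Four VPCs already need six peerings and a dozen route entries; at ten VPCs you’re maintaining 45 peerings and 90 routes by hand.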

The correct solution to this challenge is automation:

Read more ...

by Ivan Pepelnjak at September 20, 2018 05:34 AM

September 19, 2018

Dyn Research (Was Renesys Blog)

First Subsea Cable Across South Atlantic Activated

Yesterday marked the first time in recent Internet history that a new submarine cable carried live traffic across the South Atlantic, directly connecting South America to Sub-Saharan Africa.  The South Atlantic Cable System (SACS) built by Angola Cables achieved this feat around midday on 18 September 2018.

Our Internet monitoring tools noticed a change in latency between our measurement servers in various Brazilian cities and Luanda, Angola, decreasing from over 300ms to close to 100ms.  Below are measurements to Angolan telecoms TVCABO (AS36907) and Movicel (AS37081) as the SACS cable came online yesterday.
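A back-of-the-envelope way to characterize such a path change is to compare median round-trip times before and after the event. The RTT samples below are illustrative, not the actual measurements:

```python
from statistics import median

def latency_shift(before_ms, after_ms):
    """Change in median round-trip time between two sample windows.
    A large negative value suggests traffic moved to a shorter
    physical path."""
    return median(after_ms) - median(before_ms)

# Illustrative RTT samples, Brazil to Luanda (not actual measurements):
before = [310, 305, 298, 312, 300]  # via Europe or the United States
after = [102, 98, 105, 99, 101]     # direct via SACS
print(latency_shift(before, after))  # → -204
```

A roughly 200ms drop in median latency is exactly the kind of step change you’d expect when traffic shifts from a circuitous northern-hemisphere path to a direct transatlantic cable.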

A race to be first

In the past decade there have been multiple submarine cable proposals to fill this gap in international connectivity, such as South Atlantic Express (SAEx) and South Atlantic Inter Link (SAIL) cables.

In recent weeks, the backers of the SAIL cable, financed and built by China, announced that they had completed construction and that it was the first cable connecting Brazil to Africa (Cameroon). However, since we haven’t seen any changes in international connectivity for Cameroon, we don’t believe this cable is carrying any traffic yet.

What’s the significance?

In addition to directly connecting Brazil to Portuguese-speaking Angola, the cable offers South America its first new submarine cable link to the outside world in 18 years that doesn’t go through the United States.  The upcoming EllaLink cable, which will connect Brazil directly to Europe (Portugal), has an RFS (ready-for-service) date of 2020.

The SACS cable will enable South America to more directly reach the growing Internet economies of Africa, as well as offer an alternative path to Europe after cross-connecting to other submarine cables hugging Africa’s western coast.  Eventually the SACS cable, by traversing the African continent itself, will enable more direct connectivity between South America and Asia bypassing Europe and the United States altogether.

In addition, the SACS cable connects to Google’s MONET cable at Fortaleza, Brazil, giving African networks a more direct path to the United States without first passing through Europe.


It is hard to overstate the potential for this new cable to profoundly alter how traffic is routed (or not) between the northern and southern hemispheres of the Internet.  The South Atlantic was the last major unserviced transoceanic Internet route and the activation of SACS is a tremendous milestone for the growth and resilience of the global Internet.

by Doug Madory at September 19, 2018 09:48 PM

Blog (Ivan Pepelnjak)

Infrastructure-as-Code, NETCONF and REST API

This is the third blog post in “thinking out loud while preparing Network Infrastructure as Code presentation for the network automation course” series. You might want to start with Network-Infrastructure-as-Code Is Nothing New and Adjusting System State blog posts.

As I described in the previous blog post, the hardest problem any infrastructure-as-code (IaC) tool must solve is “how to adjust current system state to desired state described in state definition file(s)”… preferably without restarting or rebuilding the system.
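One way to make that problem concrete: model both states as simple key-value maps (a deliberately toy model; real tools deal with much richer device or cloud state) and compute the operations needed to converge them.

```python
def plan_changes(current, desired):
    """Compute the operations that turn the current system state
    into the desired state (both modeled as flat dicts)."""
    add = {k: v for k, v in desired.items() if k not in current}
    change = {k: v for k, v in desired.items()
              if k in current and current[k] != v}
    remove = [k for k in current if k not in desired]
    return {"add": add, "change": change, "remove": remove}

# Toy example: VLAN names on a switch
current = {"vlan10": "users", "vlan20": "voice", "vlan99": "legacy"}
desired = {"vlan10": "users", "vlan20": "voip", "vlan30": "iot"}
print(plan_changes(current, desired))
# → {'add': {'vlan30': 'iot'}, 'change': {'vlan20': 'voip'}, 'remove': ['vlan99']}
```

Computing the diff is the easy part; the hard part is executing that plan through whatever API the platform exposes, without restarting or rebuilding the system.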

There are two approaches to adjusting system state:

Read more ...

by Ivan Pepelnjak at September 19, 2018 10:56 AM

XKCD Comics

September 18, 2018

Blog (Ivan Pepelnjak)

Data Point: Why Automation Won’t Replace Humans

Here’s a bit of good news for those of you scared of network automation replacing your jobs: even Elon Musk didn’t manage to pull it off, so I don’t think a networking vendor dabbling in intent will manage to do it (particularly considering the track record of networking vendors’ network management and orchestration systems).

Read more ...

by Ivan Pepelnjak at September 18, 2018 07:12 AM

September 17, 2018

Blog (Ivan Pepelnjak)

Valley-Free Routing in Data Center Fabrics

You might have noticed that almost every BGP as Data Center IGP design uses the same AS number on all spine switches (there are exceptions coming from people who use BGP as RIP with AS-path length serving as hop count… but let’s not go there).
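One mechanism this design leans on is standard BGP loop prevention: a router discards any route whose AS path already contains its own AS number, so a shared spine AS automatically blocks spine-to-spine (valley) paths. A minimal sketch, with made-up private AS numbers:

```python
def accepts_route(router_asn, as_path):
    """Standard BGP loop prevention: a router discards any route
    whose AS path already contains its own AS number."""
    return router_asn not in as_path

SPINE_AS = 65000  # the same AS number on every spine switch

# A route that already crossed one spine carries SPINE_AS in its path,
# so another spine (a would-be "valley" transit) refuses to use it:
print(accepts_route(SPINE_AS, [65101, SPINE_AS, 65102]))  # False
# A route heard directly from a leaf is fine:
print(accepts_route(SPINE_AS, [65101]))                   # True
```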

There are two reasons for that design choice:

Read more ...

by Ivan Pepelnjak at September 17, 2018 05:48 AM

XKCD Comics

September 14, 2018

The Networking Nerd

A Matter of Perspective

Have you ever taken the opportunity to think about something from a completely different perspective? Or seen someone experience something you have seen through new eyes? It’s not easy for sure. But it is a very enlightening experience that can help you understand why people sometimes see things entirely differently even when presented with the same information.

Overcast Networking

The first time I saw this in action was with Aviatrix Systems. I first got to see them at Cisco Live 2018. They did a 1-hour presentation about their solution and gave everyone an overview of what it could do. For the networking people in the room it was pretty straightforward. Aviatrix did a lot of the things that networking should do. It was just in the cloud instead of in a data center. It’s not that Aviatrix wasn’t impressive. It’s that networking people have a very clear idea of what a networking platform should do.

Fast forward two months to Cloud Field Day 4. Aviatrix presents again, only this time to a group of cloud professionals. The message was a little more refined than their first presentation. They included some different topics to appeal more to a cloud audience, such as AWS encryption or egress security. The reception from the delegates was like night and day. Rather than just being satisfied with the message that Aviatrix put forward, the Cloud Field Day delegates were completely blown away! They loved everything that Aviatrix had to say. They loved the way that Aviatrix approached a problem they had seen and couldn’t quite understand: how to extend networking into the cloud and take control of it.

Did Aviatrix do something different? Why was the reaction between the two groups so stark? How did it happen this way? I think it is in part because networking people talk to a networking company and see networking. They find the things they expect to find and don’t look any deeper. But when the same company presents to an audience that doesn’t have networking on the brain for the entirety of their career it’s something entirely different. While a networking audience may understand the technology a cloud audience may understand how to make it work better for their needs because they can see the advantages. Perspective matters in this case because people exposed to new ideas find ways to make them work in ways that can only be seen with fresh eyes.

Letting Go of Wires

The second time I saw an example of perspective at play was at Mobility Field Day 3 with Arista Networks. Arista is a powerhouse in the data center networking space. They have gone up against Cisco and taken them head-to-head in a lot of deals. They have been gaining market share from Cisco in a narrow range of products focused on the data center. But they’re also now moving into campus switching as well as wireless with the acquisition of Mojo Networks.

When Arista stepped up to present at Mobility Field Day 3, the audience wasn’t a group of networking people that wanted to hear about CloudVision or 400GbE or even EOS. The audience of wireless and mobility professionals wanted to hear how Arista is going to integrate the Mojo product line into their existing infrastructure. The audience was waiting for a message that everything would work together and the way forward would be clear. I don’t know that they heard that message, but it wasn’t because of anything that Arista did on purpose.

Arista is very much trying to understand how they’re going to integrate Mojo Networks into what they do. They’re also very focused on the management and control plane of the access points. These are solved problems in the wireless world right now. When you talk to a wireless professional about centralized management of the device or a survivable control plane that can keep running if the management system is offline, they’ll probably laugh. They’ve been able to experience this for the past several years. They know what SDN should look like because it’s the way that CAPWAP controllers have always operated. Wireless pros can tell you the flaws behind backhauling all your traffic through a controller and why there are much better options to keep from overwhelming the device.

Wireless pros have a different perspective from networking people right now. Things that networking pros are just now learning about are the past to wireless people. Wireless pros are focused more on the radio side of the equation than the routing and switching side. That perspective gives the wireless crowd a very narrow focus on solving some very hard problems. But it can also make them miss how invaluable their expertise could be in helping both networking pros and networking companies take the best elements of wireless control mechanisms and implement them in a way that benefits everyone.

Tom’s Take

For me, the difficulty in seeing things differently doesn’t come from having an open mind. Instead, it comes from the fact that most people don’t have a conception of anything outside their frame of reference. We can’t really comprehend things we can’t conceive of. What you need to do to really understand what it feels like to be in someone else’s shoes is have someone show you what it looks like to be in them. Observe people learning something for the first time. Or see how they react to a topic you know well. Odds are good you’ll find that you know it better because they helped you understand it.

by networkingnerd at September 14, 2018 06:59 AM

Blog (Ivan Pepelnjak)

Video: What Is SDWAN?

Pradosh Mohapatra, the author of last week’s SD-WAN Overview webinar, started his presentation with a seemingly simple question: What Is SD-WAN?

You need at least a free subscription to watch his answer.

by Ivan Pepelnjak at September 14, 2018 05:39 AM

XKCD Comics

September 13, 2018

Blog (Ivan Pepelnjak)

Worth Reading: Intent-Based Networking Taxonomy

This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.

Saša Ratković (Apstra) published a must-read Intent-Based Networking Taxonomy which (not surprisingly) isn’t too far from what I had to say about the topic in a blog post and related webinar.

It’s also interesting to note that the first three levels of intent-based networking he described match closely what we’re discussing in Building Network Automation Solutions online course and what David Barroso described in Network Automation Use Cases webinar:

Read more ...

by Ivan Pepelnjak at September 13, 2018 05:42 AM

September 12, 2018

Blog (Ivan Pepelnjak)

GitOps in Networking

This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.

Tom Limoncelli published a must-read article in ACM Queue describing GitOps – the idea of using Pull Requests together with a CI/CD pipeline to give your users the ability to request changes to infrastructure configuration.

Using GitOps in networking is nothing new – Leslie Carr talked about this concept almost three years ago @ RIPE 71, and I described some of the workflows you could use in Network Automation 101 webinar.

Read more ...

by Ivan Pepelnjak at September 12, 2018 04:41 PM

Adjusting System State with Infrastructure as Code

This is the second blog post in “thinking out loud while preparing Network Infrastructure as Code presentation for the network automation course” series. If you stumbled upon it, you might want to start here.

An anonymous commenter to my previous blog post on the topic hit the crux of the infrastructure-as-code challenge when he wrote: “It's hard to do a declarative approach with Ansible and the nice network vendor APIs.” Let’s see what he was trying to tell us.

Read more ...

by Ivan Pepelnjak at September 12, 2018 05:42 AM

XKCD Comics

September 11, 2018

Blog (Ivan Pepelnjak)

Network Automation with Ansible for Undergraduate Students

Last year’s experiment generated so much interest that I decided to repeat it this year: if you’re an undergraduate or Master's student and manage to persuade us that you’re motivated enough to automate the **** out of everything, you’ll get a free seat in Ansible for Networking Engineers online course.

Interested? Check out the details, and apply before October 1st.

Too old? Please spread the word ;)

by Ivan Pepelnjak at September 11, 2018 07:04 AM

September 10, 2018

About Networks

Site-to-site VPN tunnels between Meraki MX and Cisco ASA

As I wrote in my recent post here, I was involved in a project to implement a Meraki MX in the Azure Cloud. This project also includes a migration phase with site-to-site VPN tunnels between Meraki MX and Cisco ASA. Even if the “Non-Meraki VPN peers” are supported on the Meraki MX, you may have
Read More »

The post Site-to-site VPN tunnels between Meraki MX and Cisco ASA appeared first on

by Jerome Tissieres at September 10, 2018 10:39 AM

Blog (Ivan Pepelnjak)

Routing in Data Center: What Problem Are You Trying to Solve?

Here’s a question I got from an attendee of my Building Next-Generation Data Center online course:

As far as I understood […] it is obsolete nowadays to build a new DC fabric with routing on the host using BGP, the proper way to go is to use IGP + SDN overlay. Is my understanding correct?

Ignoring for the moment the fact that nothing is ever obsolete in IT, the right answer is it depends… this time on answer(s) to two seemingly simple questions “what services are we offering?” and “what connectivity problem are we trying to solve?”.

Read more ...

by Ivan Pepelnjak at September 10, 2018 05:49 AM

XKCD Comics

September 08, 2018

Blog (Ivan Pepelnjak)

Repost: Tony Przygienda on Valley-Free (or Non-ZigZag) Routing

Most blog posts generate the usual noise from the anonymous peanut gallery (if only they'd have at least a sliver of Statler and Waldorf in them), but every now and then there's a comment that's pure gold. The one made by Tony Przygienda (of RIFT fame) on Valley-Free Routing post is so good and relevant that I decided to republish it as a separate blog post. Enjoy!

Read more ...

by Ivan Pepelnjak at September 08, 2018 09:59 AM

September 07, 2018

The Networking Nerd

A Review of Ubiquiti Wireless

About six months ago, I got fed up with my Meraki MR34 APs. They ran just fine, but they needed attention. They needed licenses. They needed me to pay for a dashboard I used rarely yet had to renew yearly. And that dashboard had most of the “advanced” features hidden away under lock and key. I was beyond frustrated. I happened to be at the Wireless LAN Professionals Conference (WLPC) and told Darrell DeRosia (@Darrell_DeRosia) about my plight. His response was pretty simple:

“Dude, you should check out Ubiquiti.”

Now, my understanding of Ubiquiti up to that point was practically nothing. I knew they sold into the SMB side of the market. They weren’t “enterprise grade” like Cisco or Aruba or even Meraki. I didn’t even know the specs on their APs. After a conversation with Darrell and some of the fine folks at Ubiquiti, I replaced my MR34s with a UniFi AP-AC-HD and an AP-AC-InWall-Pro. I also installed one of their UniFi Security Gateways to upgrade my existing Linksys connection device.

You may recall my issue with redundancy and my cable modem battery when I tried to install the UniFi Security Gateway for the first time. After I figured out how to really clear the ARP entries in my cable modem I got to work. I was able to install the gateway and get everything back up and running on the new Ubiquiti APs. How easy was it? Well, after renaming the SSID on the new APs to the same as the old one, I was able to connect all my devices without anyone in the house having to reconnect any of their devices. As far as they knew, nothing changed. Except for the slightly brighter blue light in my office.

I installed the controller software on a spare machine I had running. No more cloud controllers for me. I knew that I could replicate those features with a Ubiquiti Cloud Key, but my need to edit wireless settings away from home was pretty rare.

Edit: As pointed out by my fact-checker Marko Milivojevic, you don’t need a Cloud Key for remote access. The Cloud Key functions more as a secure standalone controller instance that has remote access capabilities. You can still run the UniFi controller on lots of different servers, including dedicated rack-mount gear or a Mac Mini (like I have).

I logged into my new wireless dashboard for the first time:

It’s lovely! It gives me all the info I could want for my settings and my statistics. At a glance, I can see clients, devices, throughput, and even a quick speed test of my WAN connection. You’re probably saying to yourself right now “So what? This kind of info is table stakes, right?” And you wouldn’t be wrong. But the great thing about Ubiquiti is that it’s going to keep working after 366 days of installation without my buying any additional licenses. It’s not going to start emailing me telling me it’s time to sink a few hundred dollars into keeping the lights on. That’s a big deal for me at home. Enterprises may be able to amortize license costs over the long haul, but small businesses aren’t so lucky.

The Ubiquiti UniFi dashboard also has some other great things. Like a settings page:

Why is that such a huge deal? Well, Ubiquiti doesn’t remove functionality from the dashboard. They put it where you can find it. They make it easy to tweak settings without wishing on a star. They want you to use the wireless network the way you need to use it. If that means enabling or disabling features here and there to get things working, so be it. Those features aren’t locked away behind a support firewall that needs an act of Congress to access.

But the most ringing endorsement of Ubiquiti for me? Zero complaints in my house. Not once has anyone said anything about the wireless. It just “works”. With all the streaming and YouTube watching and online video game playing that goes on around here it’s pretty easy to saturate a network. But the Ubiquiti APs have kept up with all the things that have been thrown at them and more.

I also keep forgetting that I even have them installed. That’s a good thing. Because I don’t spend all my time tinkering with them they tend to fade away into the background of the house. Even the upstairs in-wall AP is chugging right along and serving clients with no issues. Small enough to fit into a wall box, powerful enough to feed Netflix for a whole family.

Tom’s Take

I must say that I’m very impressed by Ubiquiti. My impressions about their suitability for SMB/SME were all wrong. Thanks to Darrell I now know that Ubiquiti is capable of handling a lot of things that I considered “enterprise only” features. Even Lee Hutchinson at Ars Technica is a fan of Ubiquiti at home. I also noticed that the school my kids attend installed Ubiquiti APs over the summer. It looks like Ubiquiti is making inroads into SMB/SME and education. And it’s a very workable solution for what you need from a wireless system. Add in the fact that the software doesn’t require yearly upkeep and it makes all the sense in the world for someone that’s not ready to commit to the treadmill of constant licensing.

by networkingnerd at September 07, 2018 04:16 PM

Blog (Ivan Pepelnjak)

Worth Reading: IPv6 Renumbering == Pain in the …

Johannes Weber was forced to stress-test the “IPv6 networks are easy to renumber” nonsense and documented his test results – a must-read for everyone deploying IPv6.

He found out that renumbering IPv6 in his lab required almost four times as many changes as renumbering (outside) IPv4 in the same lab.

My cynical take on that experience: “Now that you’ve documented everything that needs to be changed, make sure it’s automated the next time ;)”

by Ivan Pepelnjak at September 07, 2018 09:04 AM

XKCD Comics

September 06, 2018

Networking Now (Juniper Blog)

Black Hat 2018: Securing the Expanding Cyberattack Landscape

 There is a recurring theme at Black Hat every year: security researchers come together and show the world how to hack into systems and things. This year was no exception. 


The sheer size of the cyber-attack surface becomes more daunting by the day. Networks now connect data centers and on-premise hardware to private and public clouds and IoT devices operating at the network edge, exposing more potential entry points and increasing vulnerability. The growing attack landscape places defenders and security teams at a disadvantage against cybercriminals. From the booth demonstrations to the keynote speeches and session presentations, this reality was inescapable at Black Hat.


Especially with IoT devices, the explosion of endpoints creates more vulnerabilities and exacerbates potential security risks. As consumers and enterprises witness the rapid proliferation of IoT – from smart watches and home security systems to medical devices and industrial equipment – security has widely been an afterthought. The majority of stock IoT devices are not designed or built with cybersecurity in mind, which is why we see proof-of-concept attacks compromising smart city systems and telecom gateways.




There were two sessions that I particularly enjoyed and that stood out to me, both centered around machine learning. The first session showed how supervised machine learning can lead a security solution down the wrong path when data scientists training the models are not paired with subject matter experts. The other session showed how deep neural networks can be used to mount a targeted attack where both the intended victim’s identity and the malicious payload are hidden within the ML models and are impossible to extricate by reverse engineering.


When touring the exhibit floor, a striking observation was that the majority of exhibiting vendors act as though they have solved the security problem for everyone. As an industry, it’s important we remember that our customers are people responsible for defending their networks against all kinds of attacks and in turn we have to help them achieve that goal, not swamp them with flashy screens that bypass the difficult use cases. Security vendors are for-profit companies and there is nothing wrong with that, but we need to make sure we are wowing customers with real capabilities that solve some of the tough problems they’re facing while at the same time clearly showing them where the gaps exist. No single vendor can solve the security challenge on their own – this is an already established reality. So, we need to help customers identify the gaps that they need to close with a handful of solutions. It comes down to the fact that we are all fighting the same fight, vendors and customers alike.


New security challenges are emerging from the explosive growth of connected devices, multicloud adoption and 5G, and they do not have clear solutions. Although the conference shed light on the challenges the security industry needs to address, unfortunately, solutions were not abundant because these challenges are complex and lack a silver bullet. There is no quick fix for securing the vast attack surface created by our networks. But, as a defender, I’m optimistic at the industry’s commitment to meet these challenges head-on. One initiative worth mentioning is the creation of the Cyber Threat Alliance in which major security vendors are sharing threat intelligence data on a daily basis to raise everyone’s customers’ security posture to a better level. Juniper Networks is a member of this alliance and strongly believes that defending all customers’ networks, not just those with Juniper security products, is a cause worth pursuing.


by mhahad at September 06, 2018 11:12 PM

Dyn Research (Was Renesys Blog)

Last Month In Internet Intelligence: August 2018

During August 2018, the Oracle Internet Intelligence Map surfaced Internet disruptions around the world due to familiar causes including nationwide exams, elections, maintenance, and power outages. A more targeted disruption due to a DDoS attack was also evident, as were a number of issues that may have been related to submarine cable connectivity. In addition, in a bit of good news, the Internet Intelligence Map also provided evidence of two nationwide trials of mobile Internet services in Cuba.


On August 15, the Oracle Internet Intelligence Twitter account highlighted that a surge in DNS queries observed the prior day was related to a nationwide test of mobile Internet service, marking the first time in Cuba’s history that Internet service was available nationwide. The figure below shows two marked peaks in DNS query rates from resolvers located in Cuba during the second half of the day (GMT) on the 14th. Paul Calvano, a Web performance architect at Akamai, also observed a roughly 25% increase in Akamai’s HTTP traffic to Cuba during the trial period.
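Detecting that kind of surge is conceptually simple; a crude z-score sketch (with made-up hourly query-rate samples, not our actual data) flags intervals well above the baseline:

```python
from statistics import mean, stdev

def spikes(rates, z=1.5):
    """Indexes of samples that sit more than z standard deviations
    above the mean. Crude: the spikes themselves inflate the
    baseline, which is why a low z threshold is used here."""
    mu, sigma = mean(rates), stdev(rates)
    return [i for i, r in enumerate(rates) if r > mu + z * sigma]

# Made-up hourly DNS query rates with two surges (the trial windows)
rates = [100, 110, 95, 105, 400, 98, 102, 390, 100]
print(spikes(rates))  # → [4, 7]
```

A production detector would compare against a longer, seasonally adjusted baseline, but the idea is the same: query rates from Cuban resolvers jumped far above their normal level during each trial window.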

This testing was reported by ETECSA (the Cuban state telecommunications company) in a Facebook post in which they noted:

The Telecommunications company of Cuba S.A. Etecsa, reports that as part of the preparatory actions for the start of the internet service via mobile, it has carried out some tests to verify the functioning of all elements involved in it.

A timely test with customers of the prepaid cell service was carried out today; they were able to make Internet connections at no cost. This and other tests make it possible to assess available capacity and the different usage experiences achieved, in correspondence with the characteristics of the access network present at the time and place of connection, in order to make some adjustments. Further tests shall be carried out in successive days.

The start date of the service, as well as its rates and other details of interest, shall be informed through the official media and official channels of the company.

A little more than a week later, another surge in DNS queries from Cuba was observed, resulting from ETECSA performing a second test of national mobile Internet service. As seen in the figure below, it appears that service was enabled around 12:00 GMT on August 22, but was shut off around 05:00 GMT the following day. A notice posted by ETECSA on their Facebook page on the 22nd informed customers of prepaid cellular services that they had the opportunity to acquire a one-time 70 MB data “package” free of charge that could be used at any time during the test period. A subsequent Facebook post noted that customers that exceeded their 70 MB allocation would no longer be able to browse the Internet.

DDoS Attack

On August 27, the Bank of Spain Web site was targeted with a DDoS attack that published reports indicate temporarily disrupted access to the site. A DNS lookup on the site’s hostname shows that it resolves to a single IP address, part of a block of addresses routed by AS20905, registered to Banco de España. The figure below shows a significant increase in latency in traceroutes to endpoints within that autonomous system during the reported time of the attack. Such increased latency is a concomitant effect of a DDoS attack that floods a target with traffic.

Presumably in response to being targeted by a DDoS attack, the Bank of Spain activated a DDoS mitigation service the following day, as the figure below shows that on August 28, the majority of traceroutes to endpoints in AS20905 started going through Akamai’s Prolexic service (in green on the graph).


Following a number of similar disruptions at the end of July, the Internet in Syria was down from 01:00-05:30 GMT (4:00 am-8:30 am local) on August 1 & 2 as part of an effort to prevent cheating on high school exams, as seen in the figure below.

Four similar exam-related disruptions occurred the following week as well, as the figure below shows traceroute completion ratios and the number of routed prefixes dropping to zero on August 5, 6, 7, and 8. (As noted in last month’s post, we believe that the concurrent spikes in DNS queries are due to the Internet shutdowns being implemented asymmetrically – that is, traffic from within Syria can reach the global Internet, but traffic from outside the country can’t get in. These spikes in DNS traffic are likely related to local DNS resolvers retrying when they don’t receive the response from Oracle Dyn authoritative nameservers.)

Elections


Ahead of presidential run-off elections in the country on August 11, authorities in Mali reportedly disrupted Internet access in the country. While not as obvious as disruptions observed in other countries, the figure below shows a decrease in the number of routed networks within the country for several hours in the middle of the day (GMT) on August 10. A published report notes that the Internet shutdown was confirmed by on-the-ground reports from Internet users in Bamako and Gao. In addition, a study by advocacy group NetBlocks found that access to social media and messaging platforms was also disrupted during this period.

Another Internet disruption occurred in Mali after the election took place, but before the results were publicly announced. On August 16, advocacy group Internet Without Borders tweeted:

As the figure below shows, both the Traceroute Completion Ratio and BGP Routes metrics experienced noticeable multi-hour decreases starting later on August 15 (GMT). While not as obvious, the DNS Query Rate metric also appears to be at a slightly lower level than at similar times during the previous days. Published reports (Bourse Direct, La Nouvelle Tribune) indicated that this second disruption may have primarily targeted mobile networks in the capital city of Bamako, with local reports of connectivity working over fixed-line/Wi-Fi, but problems connecting over 3G.

Maintenance


On August 20, Sure Telecom of Diego Garcia in the British Indian Ocean Territory posted a  notice on their homepage alerting users of a multi-hour “maintenance outage” that would impact availability of Internet services offered by the company.

The impact of this maintenance outage can be seen in the figure below, with the Traceroute Completion Ratio and DNS Query Rates dropping to zero during the specified maintenance window. (British Indian Ocean Territory is GMT+6.) The BGP Routes metric was also lower during the maintenance period, but didn’t drop to zero. Sure Telecom appears to be the sole commercial Internet Service Provider in the British Indian Ocean Territory, so it is not surprising that this maintenance had such a significant impact on Internet availability in the region.

Power Outages

On August 17, a massive power outage affected the Sindh and Balochistan provinces in Pakistan for several hours. The blackout also had an impact on Internet availability within the country. As the figure below shows, the Traceroute Completion Ratio metric declined sharply as the power outage occurred, remained low for a few hours, and then gradually recovered. This indicates that traceroute endpoints in impacted locations were likely unreachable due to the power outage. However, because it occurred later on Friday evening local time, Internet usage was likely ramping down anyway, so there was no clear impact on the DNS Query Rate metric; the BGP Routes metric was unaffected because the routers announcing routes to IP address prefixes in the affected regions are either located in data centers with backup power, or are located in Pakistani cities not impacted by the power outage.

Closing out the month, on August 31, an explosion at an electrical substation in Maracaibo, Venezuela, plunged much of the city into darkness, and had a visible impact on the country’s Internet connectivity as well. The figure below shows a decline in the Traceroute Completion Ratio metric at approximately 05:30 GMT, coincident with the explosion, which published reports state occurred at 1:36 AM local time. A minor increase in the unstable network count can be seen at approximately the same time as well.

Submarine Cables

Internet disruptions due to issues with submarine cables are not uncommon, but are often hard to confirm, as cable operators are often reticent to publicize faults in the cables that they are responsible for. However, sometimes the issues are significant enough to be covered in the news, and other times impacted service providers will cite such issues as the root cause of problems that their customers are experiencing.

The latter scenario occurred on August 28/29 in the Maldives, as illustrated by Tweets from two local providers:

[Embedded tweets]

As the figure below illustrates, at approximately 01:00 GMT on August 29, both the Traceroute Completion Ratio and BGP Routes metrics for the Maldives declined from their ‘steady-state’ values. As evidenced by the Tweets from local Internet Service Providers shown above, these declines are likely due to the referenced submarine cable issue.

Two cables carry international traffic for the Maldives: the WARF Submarine Cable, which runs to Sri Lanka and India; and the Dhiraagu-SLT Submarine Cable Network, which runs to Sri Lanka. Based on analysis of routing path data collected by Oracle’s Internet Intelligence team, we can posit that issues with the WARF cable likely caused the observed/reported Internet disruption, as upstream providers for both Ooredoo Maldives and Raajjé Online include companies listed as owners of the cable.

Several additional unattributed Internet disruptions were observed during August in areas that have a significant reliance on submarine cables for international Internet connectivity.

On August 14, the Internet Intelligence Map showed an Internet disruption in Vanuatu, a nation made up of approximately 80 islands in the South Pacific. As seen in the figure below, the values of all three metrics declined for a multi-hour period.

Vanuatu is connected to Fiji via Interchange Cable Network 1 (ICN1). The figure below shows that nearly all traceroutes to Telecom Vanuatu reach the network through Vodafone Fiji Limited, and that the number of completed traceroutes to Telecom Vanuatu dropped to near zero during the same period shown in the figure above. Nearly all traceroutes to Vodafone Fiji Limited reach the network through Telstra Global, an Australian provider, and the figure below also shows that the number of completed traceroutes to Vodafone Fiji Limited dropped to near zero concurrently. Fiji connects to Australia via the Southern Cross Cable Network (SCCN). As such, the observed disruption may be related to issues with one of these two cables, possibly cable maintenance, as the disruption occurred in the middle of the night local time. Interestingly, this disruption occurred while the 2018 Asia Pacific region Internet Governance Forum (APrIGF 2018) was taking place in Port Vila, Vanuatu.

On August 4, concurrent Internet disruptions were observed in Angola, Cameroon, and Gabon, as seen in the figures below. Although brief, all three metrics were affected across the three impacted countries. While no specific publicly available information about the cause of the disruption could be found, research shows that all three countries are connected to both the SAT-3/WASC and Africa Coast to Europe (ACE) submarine cable systems. Damage to, or maintenance on, a segment of one of these cables could potentially have caused the observed disruption. This blog has previously covered the impact of damage to the ACE cable on Internet connectivity in African countries.

In the Caribbean, a brief Internet disruption was observed on August 5 in Grenada and St. Vincent & the Grenadines. Both countries connect to the Eastern Caribbean Fiber System (ECFS) and Southern Caribbean Fiber cables. On August 21, a brief Internet disruption was observed in Saint Barthelemy, which is also connected to the Southern Caribbean Fiber cable (on a spur from Anguilla). These disruptions are evident in the figures below. As island nations, all three countries are heavily reliant on submarine cables for international Internet connectivity, meaning that the observed disruptions could have been caused by damage to, or maintenance of, these cables.


Although we have historically focused on the value of the Internet Intelligence Map in identifying disruptions to Internet service, it was encouraging to also see it offer evidence in August of the nationwide mobile Internet service tests conducted in Cuba. Limited Internet access has been available on the island through paid public hotspots, but if ETECSA makes mobile access widely available (and affordable), then disruptions to Internet connectivity in Cuba (ultimately visible in the Internet Intelligence Map) will have a much more significant impact.

by David Belson at September 06, 2018 02:38 PM Blog (Ivan Pepelnjak)

Valley-Free Routing

Reading academic articles about Internet-wide routing challenges you might stumble upon valley-free routing – a pretty important concept with applications in WAN and data center routing design.

If you’re interested in the academic discussions, you’ll find a pretty exhaustive list of papers on this topic in the Informative References section of RFC 7908; here’s the over-simplified version.

Read more ...

by Ivan Pepelnjak ( at September 06, 2018 07:56 AM

September 05, 2018 Blog (Ivan Pepelnjak)

Network Infrastructure as Code Is Nothing New

Following “if you can’t explain it, you don’t understand it” mantra I decided to use blog posts to organize my ideas while preparing my Networking Infrastructure as Code presentation for the Autumn 2018 Building Network Automation Solutions online course. Constructive feedback is highly appreciated.

Let’s start with a simple terminology question: what exactly is Infrastructure as Code that everyone is raving about? Here’s what Wikipedia has to say on the topic:

Read more ...

by Ivan Pepelnjak ( at September 05, 2018 06:12 AM

XKCD Comics

September 04, 2018

My Etherealmind

September 03, 2018 Blog (Ivan Pepelnjak)

Do We Need Leaf-and-Spine Fabrics?

Evil CCIE left a lengthy comment on one of my blog posts including this interesting observation:

It's always interesting to hear all kind of reasons from people to deploy CLOS fabrics in DC in Enterprise segment typically that I deal with while they mostly don't have clue about why they should be doing it in first place. […] Usually a good justification is DC to support high amount of East-West Traffic....but really? […] Ask them if they even have any benchmarks or tools to measure that in first place :)

What he wrote proves that most networking practitioners never move beyond regurgitating vendor marketing (because that’s so much easier than making the first step toward becoming an engineer by figuring out how technology really works).

Read more ...

by Ivan Pepelnjak ( at September 03, 2018 06:38 AM

XKCD Comics

August 31, 2018

The Networking Nerd

The Long And Winding Network Road

How do you see your network? Odds are good it looks like a big collection of devices and protocols that you use to connect everything. It doesn’t matter what those devices are. They’re just another source of packets that you have to deal with. Sometimes those devices are more needy than others. Maybe it’s a phone server that needs QoS. Or a storage device that needs a dedicated transport to guarantee that nothing is lost.

But what does the network look like to those developers?

Work Is A Highway

When is the last time you thought about how the network looks to people? Here’s a thought exercise for you:

Think about a highway. Think about all the engineering that goes into building a highway. How many companies are involved in building it. How many resources are required. Now, think of that every time you want to go to the store.

It’s a bit overwhelming. There are dozens, if not hundreds, of companies that are dedicated to building highways and other surface streets. Perhaps they are architects or construction crews or even just maintenance workers. But all of them have a function. All for the sake of letting us drive on roads to get places. To us, the road isn’t the primary thing. It’s just a way to go somewhere that we want to be. In fact, the only time we really notice the road is when it is in disrepair or under construction. We only see the road when it impacts our ability to do the things it enables.

Now, think about the network. Networking professionals spend their entire careers building bigger, faster networks. We take weeks to decide how best to handle routing decisions or agonize over which routing protocols are necessary to make things work the way we want. We make it our mission to build something that will stand the test of time and be the Eighth Wonder of the World. At least until it’s time to refresh it again for slightly faster hardware.

The difference between these two examples is the way that the creators approach their creation. Highway workers may be proud of their creation but they don’t spend hours each day extolling the virtues of the asphalt they used or the creative way they routed a particular curve. They don’t demand respect from drivers every time someone jumps on the highway to go to the department store.

Networking people have a visibility problem. They’re too close to their creation to have the vision to understand that it’s just another road to developers. Developers spend all their time worrying about memory allocation and UI design. They don’t care if the network actually works at 10GbE or 100 GbE. They want a service that transports packets back and forth to their destination.

The Old New Network

We’ve had discussion in the last few years about everything under the sun that is designed to make networking easier. VXLAN, NFV, Service Mesh, Analytics, ZTP, and on and on. But these things don’t make networking easier for users. They make networking easier for networking professionals. All of these constructs are really just designed to help us do our jobs a little faster and make things work just a little bit better than they did before.

Imagine all the work that goes into electrical power generation. Imagine the level of automation and monitoring necessary to make sure that power gets from the generation point to your house. It’s impressive. And yet, you don’t know anything about it. It’s all hidden away somewhere unimpressive. You don’t need to describe the operation of a transformer to be able to plug in a toaster. And no matter how much that technology changes it doesn’t impact your life until the power goes out.

Networking needs to be a utility. It needs to move away from the old methods of worrying about how we create VLANs and routing protocols and instead needs to focus on disappearing just like the power grid. We should be proud of what we build. But we shouldn’t make our pride the focus of hubris about what we do. Networking professionals are like highway workers or electrical company employees. We work hard behind the scenes to provide transport for services. The cloud has changed the way we look at the destination for those services. And it’s high time we examine our role in things as well.

Tom’s Take

Telco workers. COBOL programmers. Data entry specialists. All of these people used to be the kings and queens of their field. They were the people with the respect of hundreds. They were the gatekeepers because their technology and job roles were critical. Until they weren’t any more. Networking is getting there quickly. We’ve been so focused on making our job easy to do that we’ve missed the point. We need to be invisible. Just like a well built road or a functioning electrical grid. We are not the goal of infrastructure. We’re just a part of it. And the sooner we realize that and get out of our own way, the sooner we’re going to find that the world is a much better place for everyone involved in IT, from developers to users.

by networkingnerd at August 31, 2018 08:14 PM Blog (Ivan Pepelnjak)

Is BGP Good Enough with Dinesh Dutt on Software Gone Wild

In recent Software Gone Wild episodes we explored emerging routing protocols trying to address the specific needs of highly-meshed data center fabrics – RIFT and OpenFabric. In Episode 92 with Dinesh Dutt we decided to revisit the basics trying to answer a seemingly simple question: do we really need new routing protocols?

Read more ...

by Ivan Pepelnjak ( at August 31, 2018 06:48 AM

XKCD Comics

August 30, 2018 Blog (Ivan Pepelnjak)

Traditional Leaf-and-Spine Fabric Versus Cisco ACI

One of my subscribers wondered whether it would make sense to build a traditional leaf-and-spine fabric or go for Cisco ACI. He started his email with:

One option is a "standalone" Spine/Leaf VXLAN-with EVPN deployment based on Nexus equipment. This approach could probably be accompanied by some kind of automation like Ansible to ease operation/maintenance of the network.

This is what I would do these days if the customer feels comfortable investing at least the minimum amount of work into an automation solution. Having simpler technology + well-understood automation solution is (in my biased opinion) better than having a complex black box.

Read more ...

by Ivan Pepelnjak ( at August 30, 2018 05:27 AM

August 29, 2018 Blog (Ivan Pepelnjak)

Upcoming Webinars and Events: Autumn 2018

The summer break is over, and we’ve already scheduled a half-dozen events and webinars in August and September:

We’ll run an event or webinar in almost every single week in September:

Read more ...

by Ivan Pepelnjak ( at August 29, 2018 10:46 AM

XKCD Comics

August 28, 2018 Blog (Ivan Pepelnjak)

Interview: Benefits of Network Automation (Part 2)

As promised, here’s the second part of my Benefits of Network Automation interview with Christoph Jaggi published in German on Inside-IT last Friday (part 1 is here).

What are some of the challenges?

The biggest challenge everyone faces when starting with network automation is the snowflake nature of most enterprise networks and the million one-off exceptions we had to make in the past to cope with badly-designed applications or unrealistic user requirements. Remember: you cannot automate what you cannot describe in enough detail.

Read more ...

by Ivan Pepelnjak ( at August 28, 2018 09:13 AM

August 27, 2018

Dyn Research (Was Renesys Blog)

Does Establishing More IXPs Keep Data Local? Brazil and Mexico Might Offer Answers

Much like air travel, the internet has certain hubs that play important relay functions in the delivery of information. Just as Heathrow Airport serves as a hub for passengers traveling to or from Europe, AMS-IX (Amsterdam Internet Exchange) is a key hub for information getting in or out of Europe. Instead of airline companies gathering in one place to drop off or pick up passengers, it’s internet service providers coming together to swap data – lots and lots of data.

Where the world’s largest internet exchange points (IXPs) reside are mostly where you would expect to find them: advanced economies with sophisticated internet infrastructure. As internet access reached new populations around the world, however, growth in IXPs lagged and traffic tended to make some roundabout, and seemingly irrational, trips to the more established IXPs.

For example, users connected to a server just a few miles away may be surprised to discover that data will cross an entire ocean, turn 180 degrees, and cross that ocean again to arrive at its destination. This occurrence, known as the “boomeranging” or “hair-pinning” (or “trombone effect” due to the path’s shape), is especially true for emerging markets, where local ISPs are less interconnected and hand off data to bigger carriers located in more established markets such as the United States or Europe, forcing all traffic to cover more ground and be billed transit rates.
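
Hair-pinning of this kind can be spotted in traceroute data by geolocating each hop and checking whether the path leaves the origin country and later re-enters it. The Go sketch below illustrates the idea; the countryOf prefix table is a made-up stand-in for a real IP-geolocation database, and the address ranges are entirely hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// countryOf is a stand-in for a real IP-geolocation lookup;
// the prefix-to-country table is illustrative only.
func countryOf(ip string) string {
	prefixes := map[string]string{
		"187.": "MX", // hypothetical Mexican ISP range
		"201.": "MX",
		"4.":   "US", // hypothetical US transit range
		"63.":  "US",
	}
	for p, cc := range prefixes {
		if strings.HasPrefix(ip, p) {
			return cc
		}
	}
	return "??"
}

// hairpins reports whether a traceroute path leaves the origin country
// and then comes back -- the "boomerang" / "trombone" pattern.
func hairpins(hops []string) bool {
	if len(hops) == 0 {
		return false
	}
	origin := countryOf(hops[0])
	left := false
	for _, h := range hops[1:] {
		cc := countryOf(h)
		switch {
		case cc != origin && cc != "??":
			left = true
		case cc == origin && left:
			return true // re-entered the origin country after leaving
		}
	}
	return false
}

func main() {
	// Guadalajara -> US transit -> back to Mexico City.
	path := []string{"187.1.2.3", "4.5.6.7", "63.8.9.10", "201.11.12.13"}
	fmt.Println(hairpins(path)) // true
}
```

In practice the lookup would be backed by a geolocation database and the hop list by real traceroute output, but the detection logic stays this simple.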

To prevent this from occurring, and after realizing that building submarine cables wasn’t enough, there was and continues to be a strong push to build out internet infrastructure by developing more IXPs with the idea that offering a place where local network operators can connect would reduce latencies (delays) and costs. International organizations like the Packet Clearing House set out to provide equipment, training, and operational support to governments and network providers to establish hundreds of IXPs around the world. The Internet Society offers an excellent policy brief describing how cost savings from IXPs can be upwards of 20 percent depending on what portion of a country’s overall internet traffic is local.
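
As a back-of-the-envelope illustration of how the savings scale with the local-traffic share, the sketch below estimates the monthly transit spend avoided by exchanging a fraction of traffic locally. The traffic volume and transit price are entirely hypothetical, and peering is treated as free, which ignores real-world IXP port and membership fees.

```go
package main

import "fmt"

// transitSavings estimates monthly savings when a fraction of traffic
// that previously rode paid IP transit is instead exchanged locally.
// Peering cost is assumed to be zero here -- a deliberate simplification.
func transitSavings(totalGbps, localFraction, usdPerMbpsMonth float64) float64 {
	localMbps := totalGbps * 1000 * localFraction
	return localMbps * usdPerMbpsMonth
}

func main() {
	// Hypothetical numbers: 10 Gbps of demand, 20% of it local,
	// transit priced at $3 per Mbps per month.
	fmt.Printf("$%.0f/month\n", transitSavings(10, 0.20, 3))
}
```

The point of the exercise is only that the savings grow linearly with the share of traffic that stays local, which is why the Internet Society brief ties the benefit to that proportion.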

The impact in some countries has been significant. In a 2012 study evaluating the impact of IXPs in sub-Sahara Africa, the introduction of the Internet Exchange Point of Nigeria (IXPN) allowed local operators to save over $1 million USD per year on international connectivity and encouraged Google to place a cache in Lagos (a service that significantly speeds up access to popular web content). The Kenya Internet Exchange Point (KIXP) allowed ISPs to save almost $1.5 million USD per year and increased mobile data revenues by an estimated $6 million USD. Both IXPs now act as regional hubs.

The call to expand IXP development grew louder as privacy concerns and the desire to keep data more sovereign increased, backed up by research from the Organization for Economic Cooperation and Development (OECD) asserting that countries with fewer IXPs typically experience more cross-border data flows. As a result, IXP growth accelerated even more, particularly in emerging economies. In 2011, OECD researchers counted 357 IXPs around the globe. By 2015, the number had grown to 452. Latin America went from 20 to 34 IXPs. In the past year alone, the percentage change in IXP growth for all regions was in the double digits.

Source: Packet Clearing House, Internet exchange point directory reports. Retrieved on 17 August 2018 from

Recent research, however, illustrates the challenges that hold IXPs back. In its 2012 review of Telecommunication Policy and Regulation in Mexico, the OECD noted that Mexico was the only member state that did not have an IXP and recommended they establish one for many of the reasons stated above: it improves efficiency, lowers cost, and keeps data local. They said it would also incentivize the creation of local content and data center infrastructure.

In 2017, the OECD followed up with another report, this time lamenting the fact that despite the establishment of an IXP in Mexico City, traffic exchange was low. They noted that the incumbent telecommunications operator in the country wasn’t participating in the IXP, leading them to conclude that IXPs are inhibited in markets where a single player has an overwhelming share and does not participate in the exchange. Subsequent studies have concluded the same.

Our own data confirms these insights. A week’s worth of traceroutes showed numerous examples of packets originating in Guadalajara destined for Mexico City hair-pinning up to the United States. One example actually went straight to Mexico City and still hair-pinned outside of the country’s borders before coming right back.

Even in Brazil, which implemented a unique government-commissioned, multi-stakeholder steering committee that created IXPs throughout the country, local traffic still sometimes hairpins to the US, Italy, or Spain. Again, research shows that the dynamic political economy of the telecommunications industry may be at play. However, instead of an absent incumbent operator, IXPs are sometimes bypassed due to distrust among the country’s many operators, who are concerned that interconnection would benefit competitors. Moreover, trans-regional routing remains very common. In at least one scenario, traffic going from Sao Paulo, Brazil to Uruguay (about 1,700 kilometers away from each other) will transit through two US cities, Spain, and Argentina before arriving in Montevideo. Using our own data, we were able to observe packets routing up to Miami on multiple traceroutes from Sao Paulo to Rio de Janeiro.

While IXPs have certainly helped several regions reduce the costs and latencies of internet traffic, they should not be thought of as a panacea to transnational routing. Their ability to do that can be inhibited by a host of other factors, which the Internet Society summarizes in four categories: a lack of trust between service providers, limited technical expertise, the cost of network infrastructure, and the cost of hosting an IXP in a neutral location.

Until these are resolved, a lot of data makes intercontinental roundtrips. And for now, the United States appears to still be Latin America’s biggest hub.

by Conor Sanchez at August 27, 2018 12:15 PM

XKCD Comics

August 24, 2018

The Networking Nerd

Fixing My Twitter

It’s no surprise that Twitter’s developers are messing around with the platform. Again. This time, it’s the implementation of changes announced back in May. Twitter is finally cutting off access to their API that third party clients have been using for the past few years. They’re forcing these clients to use their new API structure for things like notifications and removing support for streaming. This new API structure also has a hefty price tag. For 250 users it’s almost $3,000/month.

You can imagine the feedback that Twitter has gotten. Users of popular programs like Tweetbot and Twitterific were forced to degrade client functionality thanks to the implementation of these changes. Twitter power users have been voicing their opinions with the hashtag #BreakingMyTwitter. I’m among the people that are frustrated that Twitter is chasing the dollar instead of the users.

Breaking The Bank

Twitter is beholden to a harsh mistress. Wall Street doesn’t care about user interface or API accessibility. They care about money. They care about results and profit. And if you aren’t turning a profit you’re a loser that people will abandon. So Twitter has to make money somehow. And how is Twitter supposed to make money in today’s climate?


Users are the coin of Twitter’s realm. The more users they have active the more eyeballs they can get on their ads and sponsored tweets and hashtags. Twitter wants to court celebrities with huge followings that want to put sponsored tweets in their faces. Sadly for Twitter, those celebrities are moving to platforms like Instagram as Twitter becomes overrun with bots and loses the ability to have discourse about topics.

Twitter needs real users looking at ads and sponsored things to get customers to pay for them. They need to get people to the Twitter website where these things can be displayed. And that means choking off third party clients. But it’s not just a war on Tweetbot and Twitterific. They’ve already killed off their Mac client. They have left Tweetdeck in a state that’s barely usable, positioning it for power users. Yet, power users prefer other clients.

How can Twitter live in a world where no one wants to use their tools but can’t use the tools they like because access to the vital APIs that run them are choked off behind a paywall that no one wants to pay for? How can us poor users continue to use a service that sucks when used through the preferred web portal?

You probably heard my rant on the Gestalt IT Rundown this week. If not, here you go:

[Embedded video: The Gestalt IT Rundown]

I was a little animated because I’m tired of getting screwed by developers that don’t use Twitter the way that I use it. I came up with a huge list of things I didn’t like. But I wanted to take a moment to talk about some things that I think Twitter should do to get their power users back on their side.

  1. Restore API Access to Third Party Clients – This is a no-brainer for Twitter. If you don’t want to maintain the old code, then give API access to these third party developers at the rates they used to have it. Don’t force the developers working hard to make your service usable to foot the bills that you think you should be getting. If you want people to continue to develop good features that you’ll “borrow” later, you need to give them access to your client.
  2. Enforce Ads on Third Party Clients – I hate this idea, but if it’s what it takes to restore functionality, so be it. Give API access to Tweetbot and Twitterific, but in order to qualify for a reduced rate they have to start displaying ads and promoted tweets from Twitter. It’s going to clog our timeline but it would also finance a usable client. Sometimes we have to put up with the noise to keep the signal.
  3. Let Users Customize Their Experience – If you’re going to drive me to the website, let me choose how I view my tweets. I don’t want to see what my followers liked on someone else’s timeline. I don’t want to see interesting tweets from people I don’t follow. I want to get a simple timeline with conversations that don’t expand until I click on them. I want to be able to scroll the way I want, not the way you want me to use your platform. Customizability is why power users use tools like third party clients. If you want to win those users back, you need to investigate letting power users use the web platform in the same way.
  4. Buy A Third Party Client and Don’t Kill Them Off – This one’s kind of hard for Twitter. Tweetie. The original Tweetdeck. There’s a graveyard of clients that Twitter has bought and caused to fail through inattention and inability to capitalize on their usability. I’m sure Loren Brichter is happy to know that his most popular app is now sitting on a scrap heap somewhere. Twitter needs to pick up a third party developer, let them develop their client in peace without interference internally at Twitter, and then not get fired for producing.
  5. Listen To Your Users, Not Your Investors – Let’s be honest. If you don’t have users on Twitter, you don’t have investors. Rather than chasing the dollars every quarter and trying to keep Wall Street happy, you should instead listen to the people that use your platform and implement the changes they’re asking for. Some are simple, like group DMs in third party clients. Or polls that are visible. Some are harder, like robust reporting mechanisms or the ability to remove accounts that are causing issues. But if Twitter keeps ignoring their user base in favor of their flighty investors they’re going to be sitting on a pile of nothing very soon.

Tom’s Take

I use Twitter all the time. It’s my job. It’s my hobby. It’s a place where I can talk to smart people and learn things. But it’s not easy to do that when the company that builds the platform tries as hard as possible to make it difficult for me to use it the way that I want. Instead of trying to shut down things I actively use and am happy with, perhaps Twitter can do some soul searching and find a way to appeal to the people that use the platform all the time. That’s the only way to fix this mess before you’re in the same boat as Orkut and MySpace.

by networkingnerd at August 24, 2018 04:03 AM

XKCD Comics

August 23, 2018

Junos Kafka & InfluxDB Exporters

This post acts as the introduction for two other posts, which cover Junos data collection tools for Kafka and InfluxDB. The code is open-sourced and licensed under MIT. Both applications are ready for release and I’ve spent considerable spare time building and testing both pieces of software.

To Go or not to Go

Back in yesteryear, I used to be a C developer and enthusiast. Thanks to the infamous K&R C book, it made perfect sense when I needed a language whose syntax sat at what I thought of at the time as one level above assembly.
Roll the clock forwards a decade and Go has become my ‘go to’ (multi-pun intended) language. Its power comes from its simplicity; it’s easy to debug and super easy to observe when things aren’t going as you planned. The concurrency capabilities seem to make perfect sense, and development cycles are short thanks to the powerful “batteries included” tool-chain. Building binaries couldn’t be easier, and building containers for the likes of Docker is a piece of cake. Thanks also to the language’s popularity, tools like Travis-CI are easy to work with. Go is powerful enough to do almost anything, easy enough to learn in days, and graceful when handling concurrent logic flows.

For example, Junos runs FreeBSD and building binaries to run on Junos is as easy as one can imagine. Whether you develop on OSX, Linux or Windows, the process is identical:

GOOS=freebsd go build

Ain’t that simple?

Just one note here: on veriexec-enabled Junos it isn’t enough to just build the binaries; they need packaging too!


Working for vendors (at the time of writing, Juniper Networks), I get a fair amount of insight into the real problems network operators (SP/enterprise/telco/data centre) face daily whilst simultaneously trying to evolve their strategy. Thanks to the current automation craze, any set of data applied to the network to mutate its state comes from a decision to make the change (the when), the desired graph end state (the what) and the input mechanisms that trigger the mutation (the how).
This post acts as an umbrella for two other posts based on the “when”, and on how to correlate what that “when” actually is.
The problem with “when” is that it is multi-dimensional: time moves on whilst you’re looking back at the past. Put another way, correlation tasks over time become a squeeze-box sliding window of observation. Real-time distributed systems apply back pressure on correlation systems as state machines hold in various positions, waiting for the signals that trigger transitional shifts to the end state. The only reasonable way forward is to make the data pipelines as fluid as possible. Kafka and InfluxDB are perfect weapons for this challenge, treating correlation as one or more components that do not lock up these data systems.

From a “signal-to-event” correlation perspective, I’ve begun to think correlation is best served by a micro-services architecture, with each correlation being its own application. The current trend is to use a platform that holds a lot of state whilst correlating, which means that if it crashes, all of that state is lost too. I’ll write something about that soon! For now and for the purpose of this post, the focus is squarely on extracting the data from Junos.

Exploring the Process

Kafka and InfluxDB have their own requirements for the insertion of data.

With respect to data, Kafka requires that the producer writes data to a topic; InfluxDB requires that the producer knows the database name and the tag and field sets.

I’ve seen frustration around InfluxDB when there isn’t enough tag or field information to discern data sources, leading to data overwrites when the timestamp is the same. Kafka does not have this issue, because it appends records at offsets within a partition, within a topic, within a cluster (unless you forcibly overwrite the offset!).

When it came to the design of the collector, I followed the IP Engineer learning charter and discovered most of the work was done for me. Here’s the UML diagram of what I needed to do.

// Here is the YUML that created the diagram above
// {type: activity}
(start) -> (Start Kafka & ZooKeeper) -> (Kafka & ZooKeeper)
(start) -> (Start InfluxDB) -> (InfluxDB)
(start) -> (Start Chronograf)
(note: For admin of InfluxDB and checking) -> (Start Chronograf)
(start) -> (Collect specific Junos metrics over NETCONF) -> (Transform data to Go variables) -> (Go variables)
(Go variables) -> (Extract name for Kafka topic) -> (Insert data to topic) -> (Kafka & ZooKeeper)
(Go variables) -> (Extract data for InfluxDB tag and field sets) -> (Insert data with timestamp) -> (InfluxDB)

After a short hunt around using my Google powers, I found Daniel Czerwonk’s Junos exporter for Prometheus. This appeared to do all of the hard work of data acquisition; all I needed to do was replace the Prometheus code with Kafka and InfluxDB client function calls and add some handling for a clean exit.

What I did next was:

  1. Used my fork of go-netconf to acquire the data from Junos.
  2. Added the Influx & Kafka clients to Daniel’s code and removed all traces of Prometheus.
  3. Wrapped the code with channels for starting and stopping routines and for sharing data between Go routines.
  4. Added command-line arguments to each application, plus the ability to use SSH keys as well as plain-text user/password credentials.

Both applications were relatively easy to put together. I was concerned about memory leaks for a short while, but after using ‘gops’ for a few days and monitoring memory usage with InfluxDB and Chronograf, I put those worries out of my mind. Any application that launches multiple Go routines and is truly asynchronous will behave in a way that feels organic: as tasks are dealt with by routines and context switching happens, memory usage can feel very fluid, with waves of growth and shrinkage over time.

For instructions on using both the Junos Kafka and InfluxDB applications, use the links below. Each application is command line argument driven and is easy to build and install. Each application also comes with Dockerfiles for ease of use.

Exporter idea

Modern operational processes demand that device, interface, process and service metrics be obtained from network nodes for a variety of uses, such as machine-learning-based prediction and event-driven automation. Junos offers a solid NETCONF interface that can be used to obtain operational and configuration information, as well as newer gRPC-based telemetry subsystems. On Junos systems that do not offer a telemetry subsystem, data can still be reliably obtained over NETCONF, transformed and published in the required format to time-series databases and streaming platforms. Junos cannot do this on-box today, which is where applications like the ones below come into play.

Applications listed in the following sections can be run in Docker containers or as foreground applications. They connect to the NETCONF service on a Junos instance via username/password credential pairs or SSH keys, and to the target data-producer APIs, which at the time of writing are InfluxDB and Kafka.

One might be wondering about the use of Go here and the answer is simple and comes in the form of bullet points:

  • Go is simple
  • Go does not have version 2.x vs 3.x differences that will break things
  • Go compiles into a target-OS-specific binary, exceptionally easily
  • Go empowers the user to create custom collectors and compile them into the binary
  • Go offers powerful concurrency right out of the box, without thinking about dodgy parallelism libraries

Collectors are very easy to create and are included in the build. Take a look inside the directories of each project to figure out how to create and customize your own. A number are included, covering the major topics.


InfluxDB & Junos link

Note: Requires that InfluxDB has been built with uint64 support


Kafka & Junos link


I’ve spent considerable time working on both applications and will support them best effort. Pull requests are welcomed for bug fixes, feature additions and general help!


Please note, each of these applications has been published by me directly and is by no means supported by Juniper Networks. They will be supported best effort through GitHub. You use them squarely at your own risk.

The post Junos Kafka & InfluxDB Exporters appeared first on

by David Gee at August 23, 2018 07:36 PM

August 22, 2018

XKCD Comics

August 21, 2018

Potaroo blog

The Law of Snooping

There is a saying, attributed to Abraham Maslow, that when all you have is a hammer then everything looks like a nail. A variation is that when all you have is a hammer, then all you can do is hit things! For a legislative body, when all you can do is enact legislation, then that’s all you do! Even when it’s pretty clear that the underlying issues do not appear to be all that amenable to legislative measures, some legislatures will boldly step forward into the uncertain morass and legislate where wiser heads may have taken a more cautious and considered stance.

August 21, 2018 04:15 AM

August 20, 2018

Potaroo blog


In this article I'd like to look at the roles of Security Extensions for the DNS (DNSSEC) and DNS over Transport Layer Security (DoT), and question whether DoT could conceivably replace DNSSEC in the DNS.

August 20, 2018 04:15 AM

XKCD Comics

August 17, 2018

The Networking Nerd

The Cargo Cult of Google Tools

You should definitely watch this amazing video from Ben Sigelman of LightStep that was recorded at Cloud Field Day 4. The good stuff comes right up front.


In less than five minutes, he takes apart crazy notions that we have in the world today. I like the observation that you can’t build a system that scales across more than three or four orders of magnitude. Yes, you really shouldn’t be using Hadoop for simple things. And Machine Learning is not a magic wand that fixes every problem.

However, my favorite thing was the quick mention of how emulating Google for the sake of using their tools for every solution is folly. Ben should know, because he is an ex-Googler. I think I can sum up this entire discussion in less than a minute of his talk here:

Google’s solutions were built for scale that basically doesn’t exist outside of maybe a handful of companies with trillion-dollar valuations. It’s foolish to assume that their solutions are better. They’re just more scalable. But they are actually very feature-poor. There’s a tradeoff there. We should not be imitating what Google did without thinking about why they did it. Sometimes the “whys” will apply to us, sometimes they won’t.

Gee, where have I heard something like this before? Oh yeah. How about this post. Or maybe this one on OCP. If I had a microphone I would have handed it to Ben so he could drop it.

Building a Laser Mousetrap

We’ve reached the point in networking and other IT disciplines where we have built cargo cults around Facebook and Google. We practically worship every tool they release into the wild and try to emulate that style in our own networks. And it’s not just the tools we use, either. We also keep trying to emulate the service provider style of Facebook and Google where they treated their primary users and consumers of services like your ISP treats you. That architectural style is being lauded by so many analysts and forward-thinking firms that you’re probably sick of hearing about it.

Guess what? You are not Google. Or Facebook. Or LinkedIn. You are not solving massive problems at the scale that they are solving them. Your 50-person office does not need Cassandra or Hadoop or TensorFlow. Why?

  • Google Has Massive Scale – Ben mentioned it in the video above. The published scale of Google is massive, and even that is on the low side. The real numbers could be an order of magnitude higher than what we realize. When you have to start quoting throughput in “Library of Congress” units to make sense to normal people, you’re in a class by yourself.
  • Google Builds Solutions For Their Problems – It’s all well and good that Google has built a ton of tools to solve their issues. It’s even nice of them to have shared those tools with the community through open source. But realistically speaking, when are you really going to use Cassandra to solve all but the most complicated and complex database issues? It’s like a guy that goes out to buy a pneumatic impact wrench to fix the training wheels on his daughter’s bike. Sure, it will get the job done. But it’s going to be way overpowered and cause more problems than it solves.
  • Google’s Tools Don’t Solve Your Problems – This is the crux of Ben’s argument above. Google’s tools aren’t designed to solve a small flow issue in an SME network. They’re designed to keep the lights on in an organization that maps the world and provides video content to billions of people. Google tools are purpose built. And they aren’t flexible outside that purpose. They are built to be scalable, not flexible.

Down To Earth

Since Google’s scale numbers are hard to comprehend, let’s look at a better example from days gone by. I’m talking about the Cisco Aironet-to-LWAPP Upgrade Tool:

I used this a lot back in the day to upgrade autonomous APs to LWAPP controller-based APs. It was a very simple tool. It did exactly what it said in the title. And it didn’t do much more than that. You fed it an image and pointed it at an AP and it did the rest. There was some magic on the backend of removing and installing certificates and other necessary things to pave the way for the upgrade, but it was essentially a batch TFTP server.

It was simple. It didn’t check that you had the right image for the AP. It didn’t throw out good error codes when you blew something up. It only ran on a maximum of 5 APs at a time. And you had to close the tool every three or four uses because it had a memory leak! But it was still a better choice than trying to upgrade those APs by hand through the CLI.

This tool is over ten years old at this point and is still available for download on Cisco’s site. Why? Because you may still need it. It doesn’t scale to 1,000 APs. It doesn’t give you any functionality other than upgrading 5 Aironet APs at a time to LWAPP (or CAPWAP) images. That’s it. That’s the purpose of the tool. And it’s still useful.

Tools like this aren’t built to be the ultimate solution to every problem. They don’t try to pack in every possible feature to be a “single pane of glass” problem solver. Instead, they focus on one problem and solve it better than anything else. Now, imagine that tool running at a scale your mind can’t comprehend. And you’ll know now why Google builds their tools the way they do.

Tom’s Take

I have a constant discussion on Twitter about the phrase “begs the question”. Begging the question is a logical fallacy. Almost every time the speaker really means “raises the question”. Likewise, every time you think you need to use a Google tool to solve a problem, you’re almost always wrong. You’re not operating at the scale necessary to need that solution. Instead, the majority of people looking to implement Google solutions in their networks are like people that put chrome everything on a car. They’re looking to show off instead of get things done. It’s time to retire the Google Cargo Cult and instead ask ourselves what problems we’re really trying to solve, as Ben Sigelman mentions above. I think we’ll end up much happier in the long run and find our work lives much less complicated.

by networkingnerd at August 17, 2018 01:29 PM

XKCD Comics