November 12, 2018

Blog (Ivan Pepelnjak)

How Network Automation Increases Security

This blog post was initially sent to subscribers of my SDN and Network Automation mailing list. Subscribe here.

After publishing the Manual Work Is a Bug blog post, I got this feedback from Michele Chubirka explaining why automating changes in your network also increases network security:

Read more ...

by Ivan Pepelnjak ( at November 12, 2018 06:27 AM

XKCD Comics

November 09, 2018

Dyn Research (Was Renesys Blog)

Last Month in Internet Intelligence: October 2018

The level of significant Internet disruptions observed through the Oracle Internet Intelligence Map was lower in October, though the underlying reasons for these disruptions remained generally consistent compared to prior months. For enterprises, the importance of redundant Internet connectivity and regularly exercised failover plans is clear. Unfortunately, for state-imposed Internet outages, such planning and best practices may need to include failsafes for operations while periodically offline.

Directed disconnection

On October 10, Ethiopian Prime Minister Abiy Ahmed met with several hundred soldiers who had marched on his office to demand increased pay. The Ethiopian Broadcasting Corporation (formerly known as ETV) did not cover the soldiers marching but noted that Internet connectivity within the country had been shut off for several hours to prevent “fake news” from circulating on social media. This aligned with residents’ reports of a three-hour Internet outage. The figure below shows that the disruption began around 12:00 GMT, significantly impacting both traceroutes to, and DNS query traffic from, Ethiopia for several hours.

The impact of the Internet shutdown is also clearly evident in the figure below, which shows traceroutes into Ethio Telecom, the state-owned telecommunications service provider. Similar to the country-level graph shown above, the number of completed traceroutes into Ethio Telecom dropped significantly for several hours. However, a complete Internet outage was not observed in either case.


On October 14, the Ministry of Communications in Iraq announced the latest round of Internet shutdowns within the country in conjunction with nationwide exams. According to the translation of an article posted on the Iraqi Media Network, “The ministry’s spokesman Hazem Mohammad Ali said in a statement that the Internet service will be interrupted for two hours a day from 11 am to 1 pm for ten days from Sunday, 2018/10/14 until Wednesday, 2018/10/24, This piece came at the request of the Ministry of Education and will stop the Internet service for the days of examinations exclusively.”

As shown in the figures below, multiple Internet shutdowns were observed during the specified period, but they did not appear to take place on October 19, 23, or 24 as expected. As has been seen during previous Internet disruptions in Iraq, the government’s actions cause a decline in the number of completed traceroutes to targets in the country, a reduction in the number of routed networks from the country, and lower levels of DNS traffic seen from resolvers on Iraqi networks.
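Disruption signals like the ones described above can be approximated with a simple baseline comparison. The sketch below is purely illustrative (the thresholds, data structures, and sample values are assumptions, not Oracle's actual methodology): it flags a disruption only when several independent metrics fall well below their trailing medians, which mirrors how the traceroute, routing, and DNS signals are read together here.

```python
from statistics import median

# Illustrative thresholds -- not Oracle's actual methodology.
DROP_RATIO = 0.5      # flag a metric that falls below 50% of its baseline
MIN_METRICS = 2       # require agreement across at least two signals

def metric_drops(history, current, ratio=DROP_RATIO):
    """True if `current` sits below `ratio` times the trailing median."""
    baseline = median(history)
    return baseline > 0 and current < ratio * baseline

def disruption(snapshot):
    """`snapshot` maps metric name -> (trailing samples, latest value)."""
    dropped = [name for name, (hist, cur) in snapshot.items()
               if metric_drops(hist, cur)]
    return dropped if len(dropped) >= MIN_METRICS else []

snapshot = {
    "traceroute_completion": ([0.92, 0.90, 0.91, 0.93], 0.31),
    "routed_networks":       ([118, 120, 119, 121], 54),
    "dns_query_rate":        ([1000, 980, 1020, 990], 960),  # normal dip
}
print(disruption(snapshot))  # → ['traceroute_completion', 'routed_networks']
```

Requiring agreement across metrics keeps ordinary diurnal dips in a single signal (such as DNS query volume) from being reported as outages.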

As noted in the past, during these nationwide disruptions, telecoms with independent Internet connections through the north of Iraq often stay online, as do those with independent satellite links. However, the figures below illustrate the impact of these state-imposed disruptions on two major Iraqi network providers, ITC and Earthlink. These graphs show that the observed disruptions within these networks were near-complete, and they also confirm the absence of the anticipated outages on the 19th, 23rd, and 24th.

Power outage

(Embedded Tweets)

On October 15, the Tweets above (among others) provided insight into the scope of a widespread power outage in Venezuela. This October blackout follows similar events in July and August. As seen in the figure below, the traceroute completion ratio metric saw a sharp but partial drop during the evening of October 15, aligned with the time the power outage reportedly began. (Venezuela is 4 hours behind GMT.) The metric recovered gradually over the following 24 hours, though it took a few days for it to return to normal. The blackout did not have a meaningful impact on the BGP routes metric, which is not surprising, because the relevant routers are generally located in data centers with backup/auxiliary power, such as generators. Interestingly, the power outage appeared to have something of a delayed impact on the DNS query rate metric; while the request traffic followed a pattern roughly similar to that seen on preceding/following days, the volume of requests was slightly lower.

The impact of the power outage was also visible in the Traffic Shifts graphs of a number of Venezuelan network providers, as shown in the figure below. It is particularly evident in the graphs for Net Uno and Inter, seen in the top row, with noticeable declines in the number of completed traceroutes to targets in those networks. The impact was less pronounced in other networks such as Digitel and CANTV, with the number of completed traceroutes seeing a more nominal decline.

Severe weather

After getting battered by Typhoon Mangkhut in September, the Northern Mariana Islands were devastated on October 24-25 by Super Typhoon Yutu, which hit as a Category 5 typhoon.  The figure below shows an Internet disruption starting mid-morning (GMT) on October 24. (The Northern Mariana Islands are 10 hours ahead of GMT.) As the graphs show, both the traceroute completion ratio and DNS query rate metrics dropped concurrent with the arrival of the storm, but the routing infrastructure remained stable. The other figure below illustrates the impact of Yutu on IT&E Overseas, a Guam-based telecommunications firm that also provides Internet connectivity in the Northern Mariana Islands.  As seen in the figure, the number of completed traceroutes reaching endpoints in IT&E declined as the storm hit, with upstream connectivity through Hutchinson, Level 3, Tata, and Cogent evidently impacted.

Network malfunction

On October 22, East Timor (also known as Timor-Leste) suffered a multi-hour Internet disruption, reportedly due to a failure at an upstream provider of the country’s largest telecommunications operator. Coverage of the issue noted that Timor Telecom’s link to upstream provider Telkomsel failed at around 17:45 local time (08:45 GMT), and that a failover to satellite provider O3b did not occur as expected. Services were reportedly restored just before 23:00 local time (14:00 GMT). The figure below shows how the link failure impacted connectivity at a country level in East Timor. The traceroute completion ratio metric was lower for the 5-plus hour duration of the disruption, and the number of routed networks dipped lower for the period as well. The impact is harder to see in the DNS query rate graph, likely due to the skew caused by the spikes on October 21 and 23, but the graph does appear to flatten during the period of the disruption.

The traffic shifts graphs below illustrate how the Telkomsel link failure impacted network providers in East Timor. Published reports quoted a representative of Timor Telecom, and the first figure corroborates their report of the problems with Telkomsel and the failed shift of upstream traffic to O3b. (Telkomsel is a subsidiary of Telekomunikasi Indonesia International, which is shown in the graphs below.) The graph shows that the majority of the traceroutes to targets in Timor Telecom go through satellite Internet provider O3b, with a fraction going through Telekomunikasi Indonesia International (T.L.), which is presumably a network identifier that the Indonesian provider uses for Internet services in East Timor. However, when the link to Telekomunikasi Indonesia International failed, it appears that the link to O3b did as well, dropping the number of completed traceroutes to near zero, and spiking the latency for those that did complete.

The second figure shows that Telekomunikasi Indonesia International (T.L.) gets nearly all of its upstream connectivity through PT Telekomunikasi Indonesia, and the link failure is clearly evident in that graph. Finally, the third figure illustrates the impact of the link failure on Viettel Timor Leste, which also uses Telekomunikasi Indonesia International (T.L.) as an upstream provider. The graph shows that when the problems with Telekomunikasi Indonesia International (T.L.) occurred, traceroutes to targets in Viettel found alternate paths, with increasing numbers going through Asia Satellite Internet eXchange (ASIX) and PT. Sarana Mukti Adijaya.


In addition to the Internet disruptions reviewed above, notable irregularities were observed in Mayotte, Mali, Botswana, and St. Vincent and the Grenadines during October. However, the root causes of these disruptions remain unknown. Observed network-level disruptions aligned with the country-level ones, but no public information was found that explained exactly why these Internet outages occurred. And in addition to these, the Oracle Internet Intelligence Map surfaced hundreds of brief and/or minor Internet disruptions around the world over the course of the month.

Regardless of the underlying causes, the importance of redundant Internet connections and the need to regularly test failover/backup infrastructure cannot be overstated, as we saw in East Timor. While this may be challenging, and even expensive, in remote locations dependent on submarine cables or satellite connectivity for Internet access, the growing importance of the Internet for communication, commerce, and even government services means that wide-scale Internet disruptions, even brief ones, can no longer be tolerated.

by David Belson at November 09, 2018 03:51 PM

Blog (Ivan Pepelnjak)

No Scripting Required to Start Your Automation Journey

The “everyone should be a programmer” crowd did a really good job of scaring network engineers (congratulations, just what we need!). Here’s a typical question I’m getting:

Do I need to be good in scripting to attend your automation course?

TL&DR: Absolutely not.

Read more ...

by Ivan Pepelnjak ( at November 09, 2018 07:47 AM

XKCD Comics

November 07, 2018

My Etherealmind

Blog (Ivan Pepelnjak)

Using MPLS+EVPN in Data Center Fabrics

Here’s a question I got from someone attending the Building Next-Generation Data Center online course:

Cisco NCS5000 is positioned as a building block for a data center MPLS fabric – a leaf-and-spine fabric with MPLS and EVPN control plane. This raised a question regarding MPLS vs VXLAN: why would one choose to build an MPLS-based fabric instead of a VXLAN-based one assuming hardware costs are similar?

There’s a fundamental difference between MPLS- and VXLAN-based transport: the amount of coupling between edge and core devices.

Read more ...

by Ivan Pepelnjak ( at November 07, 2018 06:19 AM

XKCD Comics

November 06, 2018

The Networking Nerd

Presenting To The D-Suite

Do you present to an audience? Odds are good that most of us have had to do it more than once in our life or career. Some of us do it rather often. And there’s no shortage of advice out there about how to present to an audience. A lot of it is aimed at people that are trying to speak to a general audience. Still more of it is designed as a primer on how to speak to executives, often from a sales pitch perspective. But, how do you present to the people that get stuff done? Instead of honing your skills for the C-Suite, let’s look at what it takes to present to the D-Suite.

1. No Problemo

If you’ve listened to a presentation aimed at execs any time recently, such as on Shark Tank or Dragon’s Den, you know all about The Problem. It’s a required part of every introduction. You need to present a huge problem that needs to be solved. You need to discuss why this problem is so important. Once you’ve got every head nodding, that’s when you jump in with your solution. You highlight why you are the only person that can do this. It’s a home run, right?

Except when it isn’t. Executives love to hear about problems. Because, often, that’s what they see. They don’t hear about technical details. They just see challenges. Or, if they don’t, then they are totally unaware of this particular issue. And problems tend to have lots of nuts and bolts. And the more you’re forced to summarize them the less impact they have.

Now, what happens when you try this approach with the Do-ers? Do they nod their heads? Or do they look bored because it’s a problem they’ve already seen a hundred times? Odds are good if you’re telling me that WANs are complicated or software is hard to write or the cloud is expensive I’m already going to know this. Instead of spending a quarter of your presentation painting the Perfect Problem Picture, just acknowledge there is a problem and get to your solution.

Hi, we’re Widgets Incorporated. We make widgets that fold spacetime. Why? Are you familiar with the massive distance between places? Well, our widget makes travel instantaneous.

With this approach, you tell me what you do and make sure that I know about the problem already. If I don’t, I can stop you and tell you I’m not familiar with it. Cue the exposition. Otherwise, you can get to the real benefits.

2. Why Should I Care?

Execs love to hear about Return on Investment (ROI). When will I make my investment back? How much time will this save me? Why will this pay off down the road? C-Suite presentations are heavy on the monetary aspects of things because that’s how execs think. Every problem is a resource problem. It costs something to make a thing happen. And if that resource is something other than money, it can quickly be quoted in those terms for reference (see also: man hours).

But what about the D-Suite? They don’t care about costs. Managers worry about blowing budgets. People that do the work care about time. They care about complexity. I once told a manager that the motivation to hit my budgeted time for a project was minimal at best. When they finished gasping at my frankness, I hit them with the uppercut: My only motivation for getting a project done quickly was going home. I didn’t care if it took me a day or a week. If I got the installation done and never had to come back I was happy.

Do-ers don’t want to hear about your 12% year-over-year return. They don’t want to hear about recurring investment paying off as people jump on board. Instead, they want to hear about how much time you’re going to save them. How you’re going to end repetitive tasks. Give them control of their lives back. And how you’re going to reduce the complexity of dealing with modern IT. That’s how you get the D-Suite to care.

3. Any Questions? No? Good!

Let me state the obvious here: if no one is asking questions about your topic, you’re not getting through to them. Take any course on active listening and they’ll tell you flat out that you need to rephrase. You need to reference what you’ve heard. Because if you’re just listening passively, you’re not going to get it.

Execs ask pointed questions. If they’re short, they are probably trying to get it. If they’re long winded, they probably stopped caring three slides ago. So most conventional wisdom says you need to leave a little time for questions at the end. And you need to have the answers at your fingertips. You need to anticipate everything that might get asked but not put it into your presentation for fear of boring people to tears.

But what about the Do-ers? You better be ready to get stopped. Practitioners don’t like to wait until the end to summarize. They don’t like to expend effort thinking through things only to find out they were wrong in the first place. They are very active listeners. They’re going to stop you. Reframe conversation. Explore tangent ideas quickly. Pick things apart at a detail level. Because that’s how Do-ers operate. They don’t truly understand something until they take it apart and put it back together again.

But Do-ers hate being lied to more than anything else. Don’t know the answer? Admit it. Can’t think of the right number? Tell them you’ll get it. But don’t make something up on the spot. Odds are good that if a D-Suite person asks you a leading question, they have an idea of the answer. And if your response is outside their parameters they’re going to pin you to the wall about it. That’s not a comfortable place to get grilled for precious minutes.

4. Data, Data, Data

Once you’re finished, how should you proceed? Summarize? Thank you? Go on about your life? If you’re talking to the C-Suite that’s generally the answer. You boil everything down to a memorable set of bullet points and then follow up in a week to make sure they still have it. Execs have data points streamed into their brains on a daily basis. They don’t have time to do much more than remember a few talking points. Why do you think elevator pitches are honed to an art?

Do-ers in the D-Suite are a different animal. They want all the data. They want to see how you came to your conclusions. Send them your deck. Give them reference points. They may even ask who your competitors are. Share that info. Let them figure out how you came to the place where you are.

Remember how I said that Do-ers love to disassemble things? Well, they really understand everything when they’re allowed to put them back together again. If they can come to your conclusion independently of you then they know where you’re coming from. Give them that opportunity.

Tom’s Take

I’ve spent a lot of time in my career both presenting and being presented to. And one thing remains the same: You have to know your audience. If I know I’m presenting to executives I file off the rough edges and help them make conclusions. If I know I’m talking to practitioners I know I need to go a little deeper. Leave time for questions. Let them understand the process, not the problem. That’s why I love Tech Field Day. Even before I went to work there I enjoyed being a delegate. Because I got to ask questions and get real answers instead of sales pitches. People there understood my need to examine things from the perspective of a Do-er. And as I’ve grown with Tech Field Day, I’ve tried to help others understand why this approach is so important. Because the C-Suite may make the decisions, but it’s up to the D-Suite to make things happen.

by networkingnerd at November 06, 2018 04:27 PM

Blog (Ivan Pepelnjak)

Upcoming Webinars and Events: November 2018

The last two months of 2018 will be jam-packed with webinars and on-site events:

December will be a storage, EVPN and SDN month:

Read more ...

by Ivan Pepelnjak ( at November 06, 2018 06:35 AM

November 05, 2018

Dyn Research (Was Renesys Blog)

China Telecom’s Internet Traffic Misdirection

In recent weeks, the Naval War College published a paper that contained a number of claims about purported efforts by the Chinese government to manipulate BGP routing in order to intercept internet traffic.

In this blog post, I don’t intend to address the paper’s claims around the motivations of these actions. However, there is truth to the assertion that China Telecom (whether intentionally or not) has misdirected internet traffic (including out of the United States) in recent years. I know because I expended a great deal of effort to stop it in 2017.

Traffic misdirection by AS4134

On 9 December 2015, SK Broadband (formerly Hanaro) experienced a brief routing leak lasting little more than a minute. During the incident, SK’s ASN, AS9318, announced over 300 Verizon routes that were picked up by OpenDNS’s BGPstream service:

(Embedded BGPstream listing of the leaked routes)

The leak was announced exclusively through China Telecom (AS4134), one of SK Broadband’s transit providers. Shortly afterwards, AS9318 began transiting the same routes from Verizon APAC (AS703) to China Telecom (AS4134), who in turn began announcing them to international carriers such as Telia (AS1299), Tata (AS6453), GTT (AS3257) and Vodafone (AS1273). This resulted in AS paths such as:

… {1299, 6453, 3257, 1273} 4134 9318 703

Networks around the world who accepted these routes inadvertently sent traffic to Verizon APAC (AS703) through China Telecom (AS4134). Below is a traceroute mapping the path of internet traffic from London to address space belonging to the Australian government. Prior to this routing phenomenon, it never traversed China Telecom.

Over the course of several months last year, I alerted Verizon and other Tier 1 carriers of the situation and, ultimately, Telia and GTT (the biggest carriers of these routes) put filters in place to ensure they would no longer accept Verizon routes from China Telecom. That action reduced the footprint of these routes by 90% but couldn’t prevent them from reaching those who were peering directly with China Telecom.
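Catching this kind of misdirection typically means monitoring the AS paths of routes to your own prefixes and flagging any path in which the hop adjacent to your ASN is not a known upstream or peer. A minimal sketch follows; the set of "allowed upstreams" for AS703 is an assumption for the demo, not data from the incident:

```python
def parse_as_path(path):
    """Split a textual AS path into integers, ignoring AS-set braces."""
    cleaned = path.replace("{", " ").replace("}", " ").replace(",", " ")
    return [int(tok) for tok in cleaned.split()]

def unexpected_transit(path, origin, allowed_upstreams):
    """Flag a path whose hop adjacent to the origin is not a known upstream.

    For the leak described above, AS703 should only be reached through
    its legitimate transits, never via AS9318/AS4134.
    """
    ases = parse_as_path(path)
    if len(ases) < 2 or ases[-1] != origin:
        return False
    return ases[-2] not in allowed_upstreams

# AS703's set of legitimate upstreams here is assumed for illustration.
print(unexpected_transit("1299 4134 9318 703", 703, {701, 1299}))  # True
print(unexpected_transit("1299 703", 703, {701, 1299}))            # False
```

A check like this only covers the hop next to the origin; as the rest of the post explains, problematic routing decisions can occur several AS hops away, which is why path-wide verification matters.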

At times in the past year, Verizon APAC sent routes from Verizon North America (AS701) via this AS path creating AS paths such as:

… (peers_of_4134) 4134 9318 703 701

When these routes were in circulation, networks peering with China Telecom (including those in the US) accepted AS701 routes via AS4134, sending US-to-US traffic via mainland China. One of our affected clients was a major US internet infrastructure company. Shortly after alerting them of the situation, they deployed filters on their peering sessions with China Telecom to block Verizon routes from being accepted. Below is a screenshot of one of thousands of traceroutes from within the US to Verizon (in the US) that illustrate the path of traffic outside of the country.

Internet Path Monitoring

The common focus of BGP hijack alerting is looking for unexpected origins or immediate upstreams for routed address space. However, traffic misdirection can occur at other parts of the AS path. In this case, Verizon APAC (AS703) likely established a settlement-free peering relationship with SK Broadband (AS9318), unaware that AS9318 would then send Verizon’s routes exclusively on to China Telecom and who would in turn send them on to the global internet.

We would classify this as a peer leak and the result was China Telecom’s network being inserted into the inbound path of traffic to Verizon. The problematic routing decisions were occurring multiple AS hops from the origin, beyond its immediate upstream.

Conversely, the routes accepted from one’s peers also need monitoring – a fairly rare practice. Blindly accepting routes from a peer enables the peer to (intentionally or not) insert itself into the path of your outbound traffic.


In 2014, I wrote a blog post entitled “Use Protection if Peering Promiscuously” that highlighted the problems with bad route propagation, such as in the incidents described above. It is problems like these that spurred my friend Alexander Azimov at QRator Labs to lead an ongoing effort to create an IETF standard for RPKI-based AS path verification. Such a mechanism, if deployed, would drop BGP announcements whose AS paths violate the valley-free property, based on a known set of AS-AS relationships, and would at the very least have contained some of the bad routing described above.
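The valley-free property can be checked mechanically once AS-AS relationships are known. The sketch below uses a hand-assembled relationship table for the ASes in this incident (the relationships are inferred from the narrative above, not from an authoritative source):

```python
# AS relationships inferred from the incident narrative (illustrative,
# not authoritative): (a, b) -> a's relationship to b.
RELATIONSHIPS = {
    (9318, 4134): "c2p",   # SK Broadband buys transit from China Telecom
    (703, 9318): "p2p",    # Verizon APAC peers with SK Broadband
    (4134, 1299): "p2p",   # China Telecom peers with Telia
}

def rel(a, b):
    """Relationship of a to b: 'c2p', 'p2c', 'p2p', or None if unknown."""
    if (a, b) in RELATIONSHIPS:
        return RELATIONSHIPS[(a, b)]
    if (b, a) in RELATIONSHIPS:
        return {"c2p": "p2c", "p2c": "c2p", "p2p": "p2p"}[RELATIONSHIPS[(b, a)]]
    return None

def valley_free(path):
    """Check the valley-free property of a BGP AS path.

    `path` is listed collector-first, origin-last, so we walk it in
    reverse: uphill (c2p) hops must come first, then at most one peer
    hop, then only downhill (p2c) hops.
    """
    hops = list(reversed(path))   # origin first
    state = "up"                  # "up" -> "peer" -> "down"
    for a, b in zip(hops, hops[1:]):
        r = rel(a, b)
        if r is None:
            return None           # unknown relationship: cannot judge
        if r == "c2p":
            if state != "up":
                return False      # a valley: went down or across, then up
        elif r == "p2p":
            if state != "up":
                return False      # peer hop after the path stopped climbing
            state = "peer"
        else:                     # "p2c"
            state = "down"
    return True

# The leaked path: Telia <- China Telecom <- SK Broadband <- Verizon APAC
print(valley_free([1299, 4134, 9318, 703]))  # False
```

The leaked path fails because the Verizon-to-SK hop is a peer link, after which the path climbs back up into China Telecom: exactly the shape of the peer leak described above.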

In the meantime, sign on to the Internet Society’s MANRS project to enforce routing security and watch your routes!

by Doug Madory at November 05, 2018 08:53 PM

Blog (Ivan Pepelnjak)

Architecture before Products

Yves Haemmerli, Consulting IT Architect at IBM Switzerland, sent me a thoughtful response to my we need product documentation rant. Hope you’ll enjoy it as much as I did.

Yes, whatever the project is, the real added value of an IT/network architect consultant is definitely his/her ability to create models (sometimes meta-models) of what exists, and what the customer is really looking for.

Read more ...

by Ivan Pepelnjak ( at November 05, 2018 06:54 AM

XKCD Comics

November 02, 2018

The Networking Nerd

Clear Skys for IBM and Red Hat

There was a lot of buzz this week when IBM announced they were acquiring Red Hat. A lot has been discussed about this in the past five days, including some coverage that I recorded with the Gestalt IT team on Monday. What I wanted to discuss quickly here is the aspirations that IBM now has for the cloud. Or, more appropriately, what they aren’t going to be doing.

Build Your Own Cloud

It’s funny how many cloud providers started springing from the earth as soon as AWS started turning a profit. Microsoft and Google seem to be doing a good job of challenging for the crown. But the next tier down is littered with people trying to make a go of it. VMware with vCloud Air before they sold it. Oracle. Digital Ocean. IBM. And that doesn’t count the number of companies offering a specific function, like storage, and are calling themselves a cloud service provider.

IBM was well positioned to be a contender in the cloud service provider (CSP) market. Except they started the race with a huge disadvantage. IBM was a company that was focused on selling solutions to their customers. Just like Oracle, IBM’s primary customer was external. The people they cared most about wrote them checks and they shipped out eServers and OS/2 Warp boxes.

Compare and contrast that with Amazon. Who is Amazon’s customer? Well, anyone that wants to buy something. But who consumes the products that Amazon builds for IT? Amazon people. Their infrastructure is built to provide a better experience for the people using their site. Amazon is very good at building systems that can handle a high amount of users and load and not buckle.

Now, contrary to the myth, Amazon did not start AWS with spare capacity. While that makes for a good folk tale, Amazon probably didn’t have a lot of spare capacity lying around. Instead, they had a lot of great expertise in building large-scale reliable systems and they parlayed that into a solution that could be used to bring even more people into the Amazon ecosystem. They built AWS with an eye toward selling services, not hardware.

Likewise, Microsoft’s biggest customers are their developers. They are focused on making the best applications and operating systems they can. They don’t sell hardware, aside from the rare occasional foray into phones or laptops. But they wanted their customers to benefit from the years of work they had put in to developing robust systems. That’s where Azure came from.

IBM is focused on customers buying their hardware and their expertise for installing it. AWS and Microsoft want to rent their expertise and software for building platforms. That difference in perspective is why IBM’s cloud aspirations were never going to take off to new heights. They couldn’t challenge for the top three places unless Google suddenly decided to shut down Google Cloud Engine. And no matter how hard they tried, Larry Ellison was always going to be nipping at their heels by pouring money into his cloud offerings to be on top. He may never get there but he is determined to make the best showing he can.

Putting On The Red Hat

Where does that leave IBM after buying Red Hat? Well, Red Hat sells software and services for it. But those services are all focused on integration. Red Hat has never built their own cloud platform. Instead, they work on everyone else’s platform effectively. They can deploy an OS or a container system on Amazon or Azure with no hiccups.

IBM has to realize now that they will never unseat Amazon. The momentum behind this 850-lb gorilla is just too much to challenge. The remaining players are fighting for a small piece of third or fourth place at this point. And yes, while Google has a comfortable hold on third place right now, they do have a tendency to kill projects that aren’t AdWords or the search engine homepage. Anything else lives in a world of uncertainty.

So, how does IBM compete? They need to leverage their expertise. They’ve sold off anything that has blinking lights, save for the mainframe division. They need to embrace their Global Services heritage and shepherd the SMEs that are afraid of the cloud. They need to help enterprises in the mid-range build into AWS and Azure instead of trying to make a huge profit off them and leave them high and dry. The days of making a fortune from Fortune 100 companies with no cloud aspirations are over. Just like the fight for cloud dominance, the battle lines are drawn and the prize isn’t one or two big companies. It’s a bunch of smaller ones.

The irony isn’t lost on me that IBM’s future lies in smaller companies. The days of “No one ever got fired for buying IBM” are long past in the rearview mirror. Instead, companies need the help of smart people to move into the cloud. But they also need to do it natively. They don’t need to keep running their old hybrid infrastructure. They need a trusted advisor that can help them build something that will last. IBM could be that company with the help of Red Hat. They could reinvent themselves all over again and beat the coming collapse of providers of infrastructure. As more companies start to look toward the cloud, IBM can help them along the path. But it’s going to take some realistic looks at what IBM can provide. And the end of IBM’s hope of running their own CSP.

Tom’s Take

I’m an old IBMer. At least, I interned there in 2001. I was around for all the changes that Lou Gerstner was trying to implement. I worked in IBM Global Services where they made the AS/400. As I’m fond of saying over and over again, IBM today is not Tom Watson’s IBM. It’s a different animal that changed with the times at just the right time. IBM is still changing today, but they aren’t as nimble as they were before. Their expertise lies all over the landscape of hot new tech, but people don’t want blockchain-enabled AI for IoT edge computing. They want a trusted partner that can help them with the projects they can’t get done today. That’s how you keep your foot in the door. Red Hat gives IBM that advantage. The key is whether or not IBM can see that the way forward for them isn’t as cloudy as they had first imagined.

by networkingnerd at November 02, 2018 01:59 PM

Blog (Ivan Pepelnjak)

Worth Reading: Notes on Distributed Systems

A long while ago I stumbled upon an excellent resource describing why distributed systems are hard (what I happened to be claiming years ago when OpenFlow was at the peak of the hype cycle ;)… lost it and found it again a few weeks ago.

If you want to understand why networking is hard (apart from the obvious MacGyver reasons) read it several times; here are just a few points:

Read more ...

by Ivan Pepelnjak ( at November 02, 2018 07:51 AM

XKCD Comics

November 01, 2018

Brian Raaen

New Python Project for RPZ Blacklists

I have been working with RPZ policy blacklists to do DNS firewalling. A great introduction to DNS firewalling can be found here.

An overview of the RPZ syntax can be found here.

I had previously done blacklisting at home using pi-hole and wanted to migrate to an RPZ-based solution. I wrote a Python script that downloads the blacklist files and then creates an RPZ-formatted file I can use with ISC named.

Here is a link to that project
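The general shape of such a script is straightforward. The following is a hedged sketch of the approach, not the author's actual code: the blacklist URL is a placeholder, and a real deployment would use the feeds and zone layout from the linked project.

```python
import urllib.request

# Placeholder blacklist feed -- substitute real URLs; this is not the
# feed the author's project uses.
BLACKLIST_URLS = ["https://example.com/hosts.txt"]

RPZ_HEADER = """\
$TTL 300
@ IN SOA localhost. admin.localhost. (1 12h 15m 3w 2h)
  IN NS  localhost.
"""

def parse_hosts(text):
    """Extract domain names from a hosts-file-style blacklist."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # hosts files look like "0.0.0.0 bad.example"; plain lists
        # are just "bad.example"
        domain = line.split()[-1]
        if "." in domain and not domain.replace(".", "").isdigit():
            yield domain.rstrip(".")

def rpz_zone(domains):
    """Render an RPZ zone returning NXDOMAIN (CNAME .) for each domain."""
    lines = [RPZ_HEADER]
    for d in sorted(set(domains)):
        lines.append(f"{d} IN CNAME .")      # the name itself
        lines.append(f"*.{d} IN CNAME .")    # and all of its subdomains
    return "\n".join(lines) + "\n"

def build(urls=BLACKLIST_URLS):
    """Download each feed and render one combined RPZ zone file."""
    domains = []
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            domains.extend(parse_hosts(resp.read().decode("utf-8", "replace")))
    return rpz_zone(domains)

print(rpz_zone(["ads.example", "tracker.example"]))
```

The generated zone is then listed in named's `response-policy` configuration; `CNAME .` is the RPZ idiom for rewriting a matching query to NXDOMAIN.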

by braaen at November 01, 2018 09:09 PM

October 31, 2018

Potaroo blog

Analyzing the KSK Roll

It's been more than two weeks since the roll of the Key Signing Key (KSK) of the root zone on October 11 2018, and it's time to look at the data to see what we can learn from the first roll of the root zone's KSK.

October 31, 2018 02:00 AM

XKCD Comics

October 30, 2018


Interview with Juniper Networks Ambassador Pierre-Yves Maunier

In our next Juniper Ambassador interview, I spend time with fellow Juniper Ambassador and French compatriot Pierre-Yves Maunier at the Juniper NXTWORK 2018 conference in Las Vegas. We discuss his life as an Ambassador, his architecture role at Dailymotion, his thoughts on the conference around DevOps and automation, and his family life back home. Pierre’s …

by Stefan Fouant at October 30, 2018 01:10 PM Blog (Ivan Pepelnjak)

It’s All About Business…

A few years ago I got cornered by an enthusiastic academic praising the beauties of his cryptography-based system that would (after replacing the whole Internet) solve all the supposed woes we’re facing with BGP today.

His ideas were technically sound, but probably won’t ever see widespread adoption – it doesn’t matter if you have great ideas if there’s not enough motivation to implement them (The Myths of Innovation is mandatory reading if you’re interested in these topics).

Read more ...

by Ivan Pepelnjak ( at October 30, 2018 07:53 AM

October 29, 2018

Network Design and Architecture

Network documentation 101 ! How? When? Why?

Documentation is an extremely important part of building a network. You will know what has been done in your network. With good network documentation, the network support and maintenance procedures can handle incidents in a more professional and organized way.     Without good network documentation, there is no map, topology or […]

The post Network documentation 101 ! How? When? Why? appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by admin at October 29, 2018 10:02 AM Blog (Ivan Pepelnjak)

Observability Is the New Black

In early October I had a chat with Dinesh Dutt discussing the outline of the webinar he’ll do in November. A few days later Fastly published a blog post on almost exactly the same topic. Coincidence? Probably… but it does seem like observability is the next emerging buzzword, and Dinesh will try to put it into perspective answering these questions:

Read more ...

by Ivan Pepelnjak ( at October 29, 2018 07:49 AM

XKCD Comics

October 28, 2018 Blog (Ivan Pepelnjak)

New: Expert Subscription

Earlier this month I got this email from someone who had attended one of my online courses before and wanted to watch another one of them:

Is it possible for you to bundle a 1 year subscription at no extra cost if I purchase the Building Next-Generation Data Center course?

We were planning to do something along these lines for a long time, and his email was just what I needed to start a weekend-long hackathon.

End result: Expert Subscription. It includes:

Read more ...

by Ivan Pepelnjak ( at October 28, 2018 10:04 AM

October 26, 2018

The Networking Nerd

How High Can The CCIE Go?

Congratulations to Michael Wong, CCIE #60064! And yes, you’re reading that right. Cisco has certified 30,000 new CCIEs in the last nine years. The next big milestone for CCIE nerds will be 65,536, otherwise known as CCIE 0x10000. How did we get here? And what does this really mean for everyone in the networking industry?

A Short Disclaimer

Before we get started here, a short disclaimer. I am currently on the Cisco CCIE Advisory Board for 2018 and 2019. My opinions here do not reflect those of Cisco, only me. No insider information has been used in the crafting of this post. Any sources are freely available or represent my own opinions.

Ticket To Ride

Why the push for a certified workforce? It really does make sense when you look at it in perspective. More trained people means more people that know how to implement your system properly. More people implementing your systems means more people that will pick that solution over others when they’re offered. And that means more sales. And hopefully also less support time spent by your organization based on the trained people doing the job right in the first place.

You can’t fault people for wanting to show off their training programs. CWNP just announced at Wi-Fi Trek 2018 that they’ve certified CWNE #300, Robert Boardman (@Robb_404). Does that mean that any future CWNEs won’t know what they’re doing compared to the first one, Devin Akin? Or does it mean that CWNP has hit critical mass with their certification program and their 900-page tome of wireless knowledge? I’d like to believe it’s the latter.

You can’t fault Cisco for their successes in getting people certified. Just like Novell and Microsoft, Cisco wants everyone installing their products to be trained. Which would you rather deal with? A complete novice who has no idea how the command line works? Or someone competent that makes simple mistakes that cause issues down the road? I know I’d rather deal with a semi-professional instead of a complete amateur.

The only way that we can get to a workforce that has pervasive knowledge of a particular type of technology is if the certification program expands. When someone claims they want to keep their numbers small, you should have a bit of reflective doubt: either they don’t want to spend the money to expand their program or they don’t have the ability to expand it. A rising tide lifts all boats. When everyone knows more about your solutions, the entire community and industry benefit from that knowledge.

Tradition Is An Old Word

Another criticism of the CCIE today is that it doesn’t address the changing way we’re doing our jobs. Every month I hear people asking for a CCIE Automation or CCIE SDN or some thing like that. I also remember years ago hearing people clamoring for CCIE OnePK, so just take that with a grain of salt.

Why is the CCIE so slow to change? Think about it from the perspective of the people writing the test. It takes months to get single changes made to questions. It takes many, many months to get new topics added to the test via blueprints. And it could take at least two years (or more) to expand the number of topics tested by introducing a new track. So why would Cisco or any other company spend time introducing new and potentially controversial topics into one of their most venerable and traditional tests without vetting things thoroughly before finalizing them?

Cisco took some flak for introducing the CCIE Data Center with the Application Control Engine (ACE) module in version 1. Many critics felt that the solution was outdated and no one used it in real life. Yet it took a revision or two before it was finally removed. Imagine what would happen if something like that were to occur as someone was developing a new test.

Could you imagine the furor if Cisco had decided to build a CCIE OpenFlow exam? What would be tested? Which version would have been used? How would you test integration on non-Cisco devices? Which controller would you use? Why aren’t you testing this esoteric feature in 1.1 that hasn’t officially been deprecated yet? Why don’t you just forget it because OpenFlow is a failure? I purposely picked a controversial topic to highlight how silly it would have been to build an OpenFlow test, but feel free to attach that to the technology du jour, like IoT.

Tom’s Take

The CCIE is a bellwether. It changes when it needs to change. When the CCIE Voice became the CCIE Collaboration, it was an endorsement of the fact that the nature of communications was changing away from a focus on phones and more toward presence and other methods. When the CCIE Data Center was announced, Cisco formalized their plans to stay in the data center instead of selling a few servers and then exiting the market. The CCIE doesn’t change to suit the whims of everyone in the community that wants to wear a badge that’s shiny or has a buzzword on it. Just like the retired CCIE tracks like ISP Dial or Design, you don’t want to wear that yoke around your neck going into the future of technology.

I’m happy that Cisco has a force of CCIEs. I’m deeply honored to know quite a few of them going all the way back to Terry Slattery. I can tell you that every person that has earned their number has done so with the kind of study and intense concentration that is necessary to achieve this feat. Whether they get it through self-study, bootcamp practice, or good old fashioned work experience you can believe that, no matter what their number might be, they’re there because they want to be there.

by networkingnerd at October 26, 2018 04:20 PM Blog (Ivan Pepelnjak)

netdev 0x12 Update on Software Gone Wild

In recent years Linux networking started evolving at an amazing pace. You can hear about all the cool new stuff at netdev conference… or listen to Episode 94 of Software Gone Wild to get a CliffsNotes version.

Roopa Prabhu, Jamal Hadi Salim, and Tom Herbert joined Nick Buraglio and myself, and we couldn’t help diverging into the beauties of tc and the intricacies of low-latency forwarding before coming back on track and discussing cool stuff like:

Read more ...

by Ivan Pepelnjak ( at October 26, 2018 11:28 AM

Potaroo blog

Has Internet Governance become Irrelevant?

A panel session has been scheduled at the forthcoming Internet Governance Forum (IGF) in Paris in November that speaks to the topic that Internet Governance is on a path to irrelevance. What's this all about?

October 26, 2018 07:50 AM

XKCD Comics

October 25, 2018

Peter's CCIE Musings and Rants

Collaboration Edge Deployment Guide
The default login for a Tandberg device is username admin, password TANDBERG.

by Peter ( at October 25, 2018 07:25 PM

Virl 2.0: Bigger, Better and 100 percent More Cisco (Wait, just version, but with good enhancements)

VIRL has been around for a little while now. For those like me who bit the bullet and paid the $150 asking price, it's a decent product, BUT a lot of its functionality you could already obtain in GNS3, admittedly with older hardware platforms.

The fundamental missing component for GNS3 has always been a lack of Catalyst switch support. GNS3 includes something called "IOSvL2".

For those with VIRL already bought and up and running, you can find in-place upgrade instructions here:

For those of you going to purchase VIRL, you can find the setup instructions and a link to purchase here:

(Note: Installing VIRL is not hugely complicated but it's fairly time consuming, so set aside some time to get it up and running)

by Peter ( at October 25, 2018 07:25 PM

CCIE DC: Narrowing down server pool qualification

Time for some Server Pool Qualification-Inception!

So, you have made a server pool, and you assign a qualification against it etc, etc. All pretty normal right? Yep of course it is.

So next, you want to assign a service profile to a blade, during this process you can select a pool, one of the pools available is called "Default":

What you are doing HERE is specifying that not only does the server have to be a member of a particular pool (as shown below), but it must also meet ANOTHER qualification, which is your separate server pool qualification shown below


As the dialogue box itself says, "the selected qualification will be used to narrow down the set of eligible servers; it will NOT overwrite the pool policies associated with the pool".

So there you have it, Pool Policy Inception

by Peter ( at October 25, 2018 07:25 PM

How to tell what dial-peers are being matched on an ALREADY ACTIVE call

Hey Guys!

Working out what dial-peers got matched after a call has already begun sometimes seems like a bit of a mystery. For me at least I could never quite work it out. I ran across this command and thought I would share:

maui-gwy-06#show call active voice brief

!--- This information was captured once the call was placed and active.
!--- <some output omitted>
!--- Notice that in this case, default VoIP (keyword IP) dial-peer 0 was
!--- matched inbound.

Total call-legs: 2
87 : 257583579hs.1 +105 pid:0 Answer active
In the example above, dial-peer 0 was matched for the incoming call leg, since the output shows Answer; Originate is shown for the outgoing leg.

Here's an example of a complete call:


0 : 1008757 10:18:47.997 CST Fri Aug 14 2015.1 +2630 pid:9999 Originate 15552226550 connected
dur 00:17:21 tx:52067/8327756 rx:52050/8328000
IP 216.200.200.XXX:19296 SRTP: off rtt:0ms pl:0/0ms lost:0/0/0 delay:0/0/0ms g711ulaw TextRelay: off
media inactive detected:n media contrl rcvd:n/a timestamp:n/a
long duration call detected:n long duration call duration:n/a timestamp:n/a

0 : 1008795 10:30:13.867 CST Fri Aug 14 2015.1 +11460 pid:99 Answer 5555921332 active
dur 00:05:46 tx:17311/2769760 rx:17313/2770080
IP SRTP: off rtt:0ms pl:0/0ms lost:0/0/0 delay:0/0/0ms g711ulaw TextRelay: off
media inactive detected:n media contrl rcvd:n/a timestamp:n/a
long duration call detected:n long duration call duration:n/a timestamp:n/a
The important sections are in bold: pid:XXXX (where XXXX is the dial-peer), the keyword Answer or Originate (depending on whether this is the incoming or outgoing dial-peer), and then the number that has been dialed.
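If you have to do this often, the pid and call-leg direction can be pulled out of the output programmatically. A quick sketch (a hypothetical helper, keyed only off the sample output shown above, not every format variation of the command):

```python
import re

# Matches e.g. "... pid:9999 Originate 15552226550 connected"
LEG_RE = re.compile(r"pid:(\d+)\s+(Originate|Answer)\s+(\S+)")

def parse_call_legs(show_output):
    """Extract (dial-peer, direction, number) tuples from
    'show call active voice brief' output. Answer marks the
    inbound leg, Originate the outbound leg."""
    legs = []
    for m in LEG_RE.finditer(show_output):
        pid, direction, number = m.groups()
        legs.append((int(pid), direction, number))
    return legs

sample = "0 : 1008757 10:18:47.997 CST Fri Aug 14 2015.1 +2630 pid:9999 Originate 15552226550 connected"
print(parse_call_legs(sample))  # [(9999, 'Originate', '15552226550')]
```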

I hope this helps!

More Info Here:


by Peter ( at October 25, 2018 07:25 PM

Networking Now (Juniper Blog)

Safeguarding the Nation’s Critical Infrastructure: Q&A with Mounir Hahad and David Mihelcic

Mounir Hahad is head of Juniper Threat Labs, the organization at Juniper Networks identifying and tracking malicious threats in the wild and ensuring Juniper products implement effective detection techniques. David Mihelcic is federal chief technology and strategy officer for Juniper Networks, supporting the company’s federal sales, engineering and operations teams. Prior to joining Juniper, David spent 18 years with the Defense Information Systems Agency.

by mhahad at October 25, 2018 01:45 PM

My Etherealmind

Keynote address from Tim Cook to European Parliament

A passionate advocacy of privacy by Tim Cook, as the voice of Apple, in this speech to the European Parliament.

The post Keynote address from Tim Cook to European Parliament appeared first on EtherealMind.

by Greg Ferro at October 25, 2018 12:24 PM Blog (Ivan Pepelnjak)

What’s the Big Deal with Validation?

This blog post was initially sent to subscribers of my mailing list. Subscribe here.

In his Intent-Based Networking Taxonomy blog post Saša Ratković mentioned real-time change validation as one of the requirements for a true intent-based networking product.

Old-time networkers would instinctively say “sure, we need that” while most everyone else might be totally flabbergasted. After all, when you create a VM, the VM is there (or you’d get an error message), and when you write to a file and sync the file system the data is stored, right?

As is often the case, networking is different.

Read more ...

by Ivan Pepelnjak ( at October 25, 2018 05:51 AM

October 24, 2018 Blog (Ivan Pepelnjak)

VMware NSX: The Good, the Bad and the Ugly

After four live sessions we finished the VMware NSX Technical Deep Dive webinar yesterday. Still have to edit the materials, but right now the whole thing is already over 6 hours long, and there are two more guest speaker sessions to come.

Anyways, in the previous sessions we covered all the good parts of NSX and a few of the bad ones. Everything that was left for yesterday were the ugly parts.

Read more ...

by Ivan Pepelnjak ( at October 24, 2018 05:44 AM

XKCD Comics

October 23, 2018

Networking Now (Juniper Blog)

HoneyProcs: Going Beyond Honeyfiles for Deception on Endpoints

Co-Author: Abhijit Mohanta


Deploying detection solutions on an endpoint host comes with constraints - limited availability of CPU, memory, disk and other resources, stability constraints, policy adherence and restrictions, the need to be non-intrusive to the user, the host OS and other applications on the host.


In response to this, Juniper Threat Labs research presents HoneyProcs, a new deception methodology (patent pending) and an all-user-space method that extends existing deception honeypot technology on endpoint hosts. HoneyProcs complements existing deception technology by using forged, controlled decoy processes to catch info stealers, Banking Trojans, rootkits and other generic malware, and it does so by exploiting a common trait exhibited by these malware families - code injection.


By limiting its inspection footprint to only these decoy processes, HoneyProcs effectively addresses efficacy and performance concerns that otherwise constrain endpoint deployments.  Throughout this article, we further explain how the reduced and targeted inspection footprint can be leveraged to turn HoneyProcs into an intelligence gathering toolkit that can be used to write automated signatures for other antivirus and detection solutions to remediate infections on the system.


Turning Malware Behavior Against Itself


A common trait shared by most malware is code injection - HoneyProcs exploits this trait and uses it to form the foundation of its detection methodology.


Malware injects code into other processes for the following reasons:

  • Malware can inject its payload into an existing clean system process like svchost or explorer in order to avoid detection by solutions looking for suspicious process names.
  • Malware can inject into explorer and task manager to create user mode rootkits in order to hide their artifacts, like their files and processes.
  • Information stealers and banking malware inject into browsers in order to intercept and steal user credentials when they log into a website of interest.


While there are malware that spawn new processes and inject into them, the above mentioned categories of malware like info stealers, Banking Trojans, rootkits and some other generic malware inject their malicious code into existing, running benign processes without necessarily breaking their functionality.


Info Stealers


Malware that steals credentials and other important data from your computer is called an info stealer. Info stealers can steal credentials from social networking sites, and many of them, like Zeus, inject their malicious code into browsers. Keylogging is one of the oldest methods for stealing data, so why complicate the process by injecting code into browsers? There can be multiple reasons; one is the introduction of virtual keyboards. By injecting code into browsers, malware can hook the APIs that are responsible for sending and receiving HTTP(S) requests and responses, gaining the capability to intercept, steal and manipulate them. This kind of attack is sometimes categorized as a “man in the browser” attack.


Banking Trojans


Banking malware is a type of info stealer that saw a rise of 50 percent in 2018. Banking trojans target banking credentials. This can be done by installing keyloggers on a machine, stealing data from browsers or redirecting the victim to phishing sites. The most common technique used these days is stealing data from the browser, which is done by injecting a malware module into it. The module is mostly used for API hooking, a technique that can manipulate the functionality of a legitimate API. As an example, one common API hooked by banking trojans is HttpSendRequest() from wininet.dll on a Windows machine. An application can use this API to send an HTTP request to a server. After hooking the API, the malware can intercept the HTTP requests sent from the browser to the banking site. The HTTP request can contain the username, password and other credentials, and the hooked function can send the intercepted data to the attacker’s command and control server. This technique is called “form grabbing”.


Another popular technique used by banking malware is the “web inject”. In this method, the injected malicious code in the browser modifies the response from a legitimate site. Most of the time, malware injects JavaScript code into the HTML page in the browser; note that there are no changes in the server code. The victim only sees a forged view of the original page.






Some famous banking trojans like zbot, spyeye, trickbot and kronos use web injects.




Rootkits

Rootkits are used to hide malware artifacts on the system such as files, processes, network connections and registry entries on Windows. Rootkits can be either user mode or kernel mode. User mode rootkits are usually created by API hooking, while kernel mode rootkits are created by injecting kernel drivers that hook kernel APIs/system calls or manipulate kernel data structures related to processes, files and the network.


A regular Windows user browses the file system using “explorer”. So, in order to hide its files, malware injects code into the explorer.exe process. FindFirstFile() and FindNextFile() are Windows APIs used to traverse files. Malware can hook these APIs in the explorer.exe process and manipulate the results they return in order to hide its files.


Similarly, in order to hide a particular process in task manager, malware hooks Process32First and Process32Next in the Windows task manager process. So, a regular user who tries to view the list of running processes using task manager cannot locate the malware’s processes.
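The hooking idea is easy to illustrate outside of Windows. As a loose Python analogy (monkey-patching os.listdir stands in for hooking FindFirstFile/FindNextFile inside explorer.exe; the file name is made up), a user-mode rootkit effectively wraps the listing API and filters its results:

```python
import os

HIDDEN = {"malware.exe"}  # names the "rootkit" wants to hide

_real_listdir = os.listdir  # keep a reference to the real API

def hooked_listdir(path="."):
    # Call the real API, then strip the hidden entries from the result,
    # just as a hooked FindNextFile would skip over them.
    return [name for name in _real_listdir(path) if name not in HIDDEN]

os.listdir = hooked_listdir
# Any code that now calls os.listdir() never sees "malware.exe".
```

The caller still gets a perfectly normal-looking listing; it simply never contains the hidden names, which is the same effect a hooked explorer.exe gives the user.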


HoneyProcs - A New Dawn in Deception Technology for Endpoints


HoneyProcs is a new deception methodology that complements and extends existing honeypot technology on endpoint hosts. It works by exploiting an important trait of Banking Trojans and rootkits - code injection - and extends to all kinds of malware that inject into legitimate processes.


HoneyProcs works by using forged controlled decoy processes that mimic other legitimate processes that are usually targeted by aforementioned malware for injecting code. By controlling the state and properties of these decoy processes and using this fixed state as a baseline, and by monitoring for any changes to this state, we are able to effectively track the presence of infections on the system.


Our solution consists of two components: The Decoys and The Scanner.


The Decoys


To start, we have forged multiple programs whose processes are the usual targets of Banking Trojans and Rootkits - Chrome, Firefox, Internet Explorer, explorer and svchost.




Each of the forged programs has been developed to have its processes mimic and look similar to its corresponding original benign counterpart’s processes. Some of the methods used by HoneyProcs to mimic the benign counterparts include loading the same dependent libraries, using the same file size on disk, a similar amount of memory, similar PE properties, a similar directory location on disk, the same working directory, the same number of threads, etc.


The screenshot below shows the loaded libraries for the benign processes on the left hand side and its corresponding HoneyProc decoy processes on the right hand side. As you can see, the loaded libraries are similar.




The forged processes have also been developed to go into a fixed, non-modifying state after starting up, which is achieved either by having the process go into a sleep loop or by carrying out some other NO-OP type of activity that keeps the threads running without changing the process state or properties. None of these forged processes has a UI, so there is no chance of a regular user interacting with them and modifying their state.


The forged processes have been created to handle all exceptions, in order to cover the scenario where the process might crash due to a faulty injection by malware. While a crash of the decoy process can indicate the possibility of some meddling with the process, an exception handler can also help the scanner (explained in the next section) accurately detect the presence of an injection and extract other intelligence on the injection and the infection.


The Scanner


Once the forged decoy processes are deployed and reach their steady state, the scanner process monitors them. The scanner stores a baseline of the process state for each of the decoys. The baseline state includes a snapshot of the memory map (the size, properties and permissions of the memory pages), the number of threads, etc. The properties can be expanded to include a hash of each page, but to keep the scanning lightweight this may not be necessary.




After saving the baseline, the scanner continuously monitors the decoy processes, periodically snapshotting their properties and comparing them to the baseline saved at the start of the decoy process. A change in state indicates a code injection from a malware infection on the system, and the scanner generates an alert.
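The comparison step itself can be sketched in a few lines. Assuming each snapshot is represented as a mapping of region start address to (size, permissions) tuples (an assumption for illustration; the real scanner would build this from OS memory-query APIs), the detection reduces to a diff against the baseline:

```python
def diff_memory_maps(baseline, current):
    """Compare two snapshots of a decoy process's memory map.
    Each snapshot maps region start address -> (size, permissions).
    Any new region, or any change in size/permissions, is suspicious,
    because the decoy never modifies itself after reaching steady state."""
    alerts = []
    for addr, props in current.items():
        if addr not in baseline:
            alerts.append(f"new region at {hex(addr)}: {props}")
        elif baseline[addr] != props:
            alerts.append(f"region changed at {hex(addr)}: {baseline[addr]} -> {props}")
    return alerts

baseline = {0x400000: (0x1000, "r-x"), 0x600000: (0x2000, "rw-")}
# A classic injection: a fresh RWX block appears in the decoy.
current = dict(baseline)
current[0x1910000] = (0x5000, "rwx")
print(diff_memory_maps(baseline, current))
```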





HoneyProcs: Case in Point


The screenshots below show HoneyProcs in action against the Feurboos trojan.


We have set up a decoy process mimicking the Chrome browser process. On the left hand side we show the memory map of the decoy process before injection. On the right hand side, we see the memory map post the malware’s code injection.


The malware starts up and goes through the list of currently running processes on the system until it finds the decoy process for “Chrome”, then injects its code into it via a new memory block allocated at 0x1910000 with RWX (Read Write Execute) permissions.


The scanner detects the injection and alerts with the MessageBox alert “INJECTION DETECTED”.






Code injection remains a vital component of malware, even more so for Banking Trojans, rootkits and certain categories of malware where code injection is the focal point of their functionality. Effective detection of such malware on endpoints is important, while keeping the solution lightweight, efficient, non-intrusive and stable. On the deception front, although solutions exist, few of them target endpoint host deployments.


HoneyProcs opens the door for a new deception technique that is lightweight, non-intrusive and efficient. It complements existing honeypot technologies and extends the detection net laid out by other solutions. Being an all-user-space solution, it also addresses stability and complexity issues that otherwise concern kernel-based solutions.


Catching malware is a cat and mouse game and we do expect malware to get smart and add armoring against HoneyProcs on the system. Such armoring enhancements from malware have to be dealt with on a case by case basis. Some of our future research will focus on tackling possible armoring directions.

by asaldanha at October 23, 2018 01:45 PM Blog (Ivan Pepelnjak)

Figuring Out AWS Networking

One of my friends reviewing the material of my AWS Networking webinar sent me this remark:

I'm always interested in hearing more about how AWS network works under the hood – it’s difficult to gain that knowledge.

As always, it’s almost impossible to find out the behind-the-scenes details, and whatever Amazon is telling you at their re:Invent conference should be taken with a truckload of salt… but it’s relatively easy to figure out a lot of things just by observing them and performing controlled experiments.

Read more ...

by Ivan Pepelnjak ( at October 23, 2018 05:00 AM

YANG, OpenAPI, Swagger and Code Generation

Sometimes during exploration or projects, I want to take a YANG model and convert it along with related dependencies to a Swagger format (think OpenAPI if you’re not familiar with this) so I can create a REST or RESTCONF API interface. OpenDaylight does something very similar for its Swagger-based North Bound Interface (NBI), more information here, and just being able to look at the model this way is sometimes helpful. If you’re wondering how helpful this could be, think about developing a client. Using this approach, it’s possible to create stub client and server code for a software implementation, leaving just the logic of what to do when a POST is made or a GET is requested, etc.

You may be familiar enough with YANG to know that it is a modeling language with its own extensible type system. YANG models are mostly used for modeling how a programmatic interface to control a feature should look on routers and switches. More recently, thanks to the wave of automation sweeping across the globe, YANG models are also used for modeling services, which are in turn rendered over one or more nodes by something else. We’re not going to cover the “else” here, just the conversion of YANG to Swagger and Swagger to something useful like Go or Python stub bindings!

I’ve done this before but have never documented it, thinking it wasn’t of any value. That said, some tools do not support YANG built-in types like “bits” and other tools have issues with XPATH expressions. Not being a Java programmer (most of the tools are written in Java), I decided not to create a PR and mend them, but to continue onwards and find a tool that just worked. Nothing (too) against Java, but I just don’t code in it, nor do I want to.

I’ve found that the tools listed in this post have about an 80% success rate and are fine for generic models. When you get into the weirder or more complex models, these tools are pushed beyond that 80% and present a pandemonium of errors.

The Tale of RFC8299

RFC8299 is the “YANG Data Model for L3VPN Service Delivery” and is interesting for many reasons. If you’re not familiar with SP networks, that’s fine; this post will not dive into MPLS, BGP-MP, VRFs, RTs, RDs and the like. This particular RFC has several nested YANG dependencies and uses the “bits” YANG native type, which some of the tools do not support. Here is one that is documented.

If you want to reproduce this for fun, follow along! You’ll require a browser, a working installation of Docker, and Python (2.7 is just fine). You’ll also need to follow the breadcrumbs in each of the YANG files. Whilst this post helps you figure out how to convert YANG to an OpenAPI variant, I’ve left you the challenge of obtaining the YANG dependencies and placing them in the correct directory for import by the tools. Directly below is an example import statement; in this case, put the .yang file extension on the name and that’s your YANG file name (the import below would be named ietf-netconf-acm.yang and contains a valid YANG model).

import ietf-netconf-acm {
  prefix nacm;
}

1. Create a directory that your YANG and RFC files will live in.

$ mkdir yang_rfcs
$ cd yang_rfcs
$ export YANGDIR=`echo $PWD`

2. Download the text version using the button of RFC8299 from this link:

3. Next, we’re going to use the ‘xym’ tool to extract the YANG module from the RFC. Modules are published elsewhere on GitHub but I prefer the source, just in case!

Follow the instructions here:

$ git clone
$ cd xym
$ virtualenv xym
$ source xym/bin/activate
$ pip install requests
$ python setup.py install
$ xym ../rfc8299.txt --dstdir $YANGDIR
$ cd ../
$ ls -la
-rw-r--r--   1 davidgee  staff   72828  ietf-l3vpn-svc@2018-01-19.yang
-rw-r--r--@  1 davidgee  staff  344738  rfc8299.txt
drwxr-xr-x  17 davidgee  staff     544  xym
$ deactivate
$ mv ietf-l3vpn-svc@2018-01-19.yang ietf-l3vpn-svc.yang

4. Now that we have the YANG module extracted from RFC8299, it’s time to convert it from YANG to JSON. For this trick, we’ll use EAGLE from the Open Networking Foundation. This project uses pyang, a Python tool for YANG model validation, conversion, transformation and code generation.

$ git clone
$ cd EAGLE-Open-Model-Profile-and-Tools/YangJsonTools
$ virtualenv eagle
$ source eagle/bin/activate # At this point, the prompt will change to signify the venv activation
$ pip install pyang
$ export PYBINDPLUGIN=`echo $PWD`

Up until this point, we’ve activated a Python virtual environment, created an environment variable (PYBINDPLUGIN) and the most important line is next. This is the use of the pyang tool to achieve conversion using the plugins located in the current directory.

$ pyang --plugindir $PYBINDPLUGIN -f swagger -p $YANGDIR -o export/rfc8299.json $YANGDIR/ietf-l3vpn-svc.yang --generate-rpc=False
$ deactivate
$ ls -la export
-rw-r--r--  1 davidgee  staff       42
-rw-r--r--  1 davidgee  staff  1298758  rfc8299.json

With regards to the last line, the -p switch allows you to tell pyang where the YANG module directory with the dependencies is.

5. Now that we have our OpenAPI JSON file, we can use the Swagger-UI to visualize it.

$ cd export
$ export EXPORTS=`echo $PWD`
$ docker run --name swagger -d -p 80:8080 -e BASE_URL=/swagger -e SWAGGER_JSON=/swaggerfiles/rfc8299.json -v $EXPORTS:/swaggerfiles swaggerapi/swagger-ui

6. Sit back, relax and enjoy your handiwork by taking a browse of the Swagger UI displaying the API generated from the YANG model and associated Swagger conversion!

Open your browser to: http://localhost/swagger/.

7. We’re not done yet. Maybe not with this model (because it’s just ruddy huge), but with a simpler one, you might fancy generating some code bindings. Code bindings are automatically generated blobs of code that allow us to create a full implementation of an API. This might be pulling information out of a database, or doing some writes in the background to a device, orchestrator or graph.

You have two main choices with Swagger Codegen:

  1. Download and use
  2. Do it online with SwaggerHub

Let’s download Swagger-Codegen and create some Golang client bindings! Such fun!

$ cp $PYBINDPLUGIN/export/rfc8299.json ./
# Instructions located here
$ wget -O swagger-codegen-cli.jar
$ mkdir go_output
$ java -jar swagger-codegen-cli.jar help
$ java -jar swagger-codegen-cli.jar generate -i rfc8299.json -l go -o ./go_output
$ ls -la go_output

If you want to create the server bindings, replace ‘go’ with ‘go-server’.

I’ll leave you to see the surprising number of Go files, but try it with python instead of go if it isn’t your bag.

Here is some more helpful reading with regards to generating code.

$ mkdir python_output
$ java -jar swagger-codegen-cli.jar generate -i rfc8299.json -l python -o ./python_output

When it comes to Python servers, replace ‘python’ with ‘python-flask’ for a popular variant of server.

What Failed

For converting the YANG model to Swagger, yang-swagger spat its dummy out over the lack of support for the native “bits” type. This is not uncommon and will probably change in the future.

yang-swagger -f yaml -o swagger.yaml ietf-l3vpn-svc.yang
# Output
WARNING: No configurations found in configuration directory:/Users/davidgee/Documents/yang_rfc/config
WARNING: To disable this warning set SUPPRESS_NO_CONFIG_WARNING in the environment.
unable to parse './ietf-netconf-acm.yang' YANG module from '/Users/davidgee/Documents/yang_rfc/ietf-netconf-acm.yang'
{ ExpressionError: [module(ietf-netconf-acm)/typedef(access-operations-type)/type(bits)] unable to resolve typedef for bits
# Output truncated to save eyeball space

For the same conversion step, swagger-generator-cli failed, mainly due to XPath issues. I don’t understand Java well enough to comment, other than as a tool it didn’t work for this particular model and its dependencies.

java -jar ~/.m2/repository/com/mrv/yangtools/swagger-generator-cli/1.1-SNAPSHOT/swagger-generator-cli-1.1-SNAPSHOT-executable.jar -yang-dir $YANGDIR/yang -output swagger.yaml ietf-l3vpn-svc.yang
# Output
2018-10-22 16:44:39,782 [main] WARN  o.o.y.yang.parser.stmt.rfc6020.Utils - Argument "derived-from-or-self(../rp-discovery-type, 'l3vpn-svc:bsr-rp')" is not valid XPath string at "null:800:7"
javax.xml.xpath.XPathExpressionException: javax.xml.transform.TransformerException: Could not find function: derived-from-or-self
# Output truncated to save eyeball space. Lots and lots of error output because Java is noisy!


Thanks for reading and taking part by following along if you did. Hopefully this was useful!

I’ve tried out the steps involved in this post and at the time of writing, they were correct and functional.

The post YANG, OpenAPI, Swagger and Code Generation appeared first on

by David Gee at October 23, 2018 12:04 AM

October 22, 2018


Interview with Juniper Networks Ambassador Jeff Fry

In this first ever Juniper Ambassador interview, I spend time with fellow Juniper Ambassador Jeff Fry at the Juniper NXTWORK 2018 conference in Las Vegas. We discuss his life as an Ambassador, his job at Dimension Data, his recent contribution to the 2018 Juniper Ambassador’s Cookbook, his interest in DevOps and automation, and his contribution …

by Stefan Fouant at October 22, 2018 04:46 PM

Blog (Ivan Pepelnjak)

Automation Win: Configure Cisco ACI with an Ansible Playbook

This blog post was initially sent to subscribers of my mailing list. Subscribe here.

Following on from his previous work with Cisco ACI, Dirk Feldhaus decided to create an Ansible playbook that would create and configure a new tenant and provision a vSRX firewall for the tenant when working on the Create Network Services hands-on exercise in the Building Network Automation Solutions online course.

Read more ...

by Ivan Pepelnjak ( at October 22, 2018 05:38 AM


October 20, 2018

The Networking Nerd

What Makes a Security Company?

When you think of a “security” company, what comes to mind? Is it a software house making leaps in technology to save us from DDoS attacks or malicious actors? Maybe it’s a company that makes firewalls or intrusion detection systems that stand guard to keep the bad people out of places they aren’t supposed to be. Or maybe it’s something else entirely.

Tradition Since Twenty Minutes Ago

What comes to mind when you think of a traditional security company? What kinds of technology do they make? Maybe it’s a firewall. Maybe it’s an anti-virus program. Or maybe it’s something else that you’ve never thought of.

Is a lock company like Schlage a security company? Perhaps they aren’t a “traditional” IT security company, but you can guarantee that you’ve seen their products protecting data centers and IDF closets. What about a Halon system manufacturer? They may not be a first thought for security, but you can believe that a fire in your data center is going to cause security issues. Also, I remember that I learned more about Halon and wet/dry pipe fire sprinkler systems from my CISSP study than anywhere else.

The problem with classifying security companies as “traditional” or “non-traditional” is that it doesn’t reflect the ways that security can move and change over the course of time. Even for something as cut-and-dried as anti-virus, tradition doesn’t mean a lot. Symantec is a traditional AV vendor according to most people. But the product that used to be called Norton Antivirus and the product suite that now includes it are worlds apart in functionality. Even though Symantec is “traditional”, what they do isn’t. And when you look at companies that are doing more advanced threat protection mechanisms like deception-based security or using AI and ML to detect patterns, the lines blur considerably.

But, it doesn’t obviate the fact that Symantec is a security company. Likewise, a company can be a security company even if security isn’t their main focus. Like the Schlage example above, you can have security aspects to your business model without being totally and completely focused on security. And there’s no bigger example of this than a company like Cisco.

A Bridge Not Far Enough?

Cisco is a networking company, right? Or are they a server company now? Maybe they’re a wireless company? Or do they do cloud now? There are many aspects to their business model, but very few people think of them as a security company. Even though they have firewalls, identity management, mobile security, malware protection, VPN products, email and web security, DNS protection, and even threat detection. Does that mean they aren’t really a security company?

It could be rightfully pointed out that Cisco isn’t a security company because many of these technologies were purchased over the years from other companies. But does that mean that their solutions aren’t useful or maintained? As I was doing research for this post, a friend pointed out the story of Cisco MARS and how it was purchased and ultimately retired by Cisco. However, the Cisco acquisition of Protego that netted them MARS happened in 2004. The EOL announcement was in 2011, and the final end-of-support was in 2016. Twelve years is a pretty decent lifetime for any security product.

The other argument is that Cisco doesn’t have a solid security portfolio because they have trouble integrating their products together. A common criticism of large companies like Cisco or Dell EMC is that it is too difficult to integrate their products together. This is especially true in situations where the technologies were acquired over time, just like Cisco.

However, is the converse true? Are standalone products easier to integrate? Is it simpler to take solutions from six different companies and integrate them together in some fashion? I’d be willing to bet that outside of robust API support, most people will find that integrating security products from different vendors is as difficult (if not more so) than integrating products from one vendor. Does Cisco have a perfect integration solution? No, they don’t. But why should they? Why should it be expected that companies that acquire solutions immediately burn cycles to make everything integrate seamlessly? Sure, that’s on the roadmap. But integration with other products is on everyone’s road map.

The last argument that I heard in my research is that Cisco isn’t a security company because they don’t focus on it. They’re a networking (or wireless or server) company. Yet, when you look at the number of people that Cisco has working in a specific business unit on a product, it can often be a higher headcount than some independent firms have working on their solutions. Does that mean that Cisco doesn’t know what they’re doing? Or does it mean that individual organizations can have multiple focuses? That’s a question for the customers to answer.

Tom’s Take

I take issue with a definition of “traditional” versus non-traditional, for the simple reason that Apple is a traditional computer company and so is Wang Computers. Guess which one is still making computers? And even in the case of Apple, you could argue that their main line of business is mobile devices now. But does anyone dispute Apple’s ability to make a laptop? Would a company that does nothing but make laptops be a “better” computer company? The trap of labels like that is that they ignore a significant amount of investment in a business at the expense of a quick and easy label. What makes a company a computer company or a security company isn’t how they label themselves. It’s what they do with the technology they have.

by networkingnerd at October 20, 2018 05:25 PM

October 19, 2018

Automation: Flow Control & Dimensionality

Human beings that we are, we sometimes struggle to think multi-dimensionally about tasks. Our brains seem to have a conscious layer and a sub-conscious layer. Whether you think in words, noise or images, your brain is a single-threaded engine with a silent co-processor that can either assist or annoy. Experience has shown that we look at network automation challenges through this shaped lens and try to solve things in ways that make sense to humans, but not necessarily for mechanized processes.

In an attempt not to lose my own thread, I’ll try and explain some different viewpoints through examples.

Example One: I’m English, Make me some Tea!

Making a cup of tea is a very English thing to do and the process of making one will suffice for this example.

Let’s look at the process involved:

// { type: activity}
(Start)-><a>[kettle empty]->(Fill Kettle)->|b|
<a>-(note: Kettle activities)
<a>[kettle full]->|b|->(Boil Kettle)->|c|
|b|->(Add Tea Bag)-><d>[Sugar: yes]->(Add Sugar)->(Add Milk)
<d>[Sugar: no]->(Add Milk)
<d>-(note: Sweet tooth?)
(Add Milk)->|c|->(Pour Boiled Water)
(Pour Boiled Water)->(Enjoy)->(Stop)


This makes us a relatively standard cup of English breakfast tea.

Let’s assume macros exist for milk and sugar quantity and that the choice of a mug or the best china has been dealt with.

Let’s analyze from a flow control perspective.

  1. Human wants tea
  2. Invoke start
  3. Check if kettle needs water
  4. If kettle needs water, fill it, else;
  5. Boil kettle
  6. Add tea bag
  7. If I want sugar, add it (decision)
  8. Add milk (Ok, some would argue the milk will be scalded)
  9. Pour boiled water
  10. Drink

There are interesting points in this flow chart that need pointing out. Boiling of the kettle is a long task, so we can do the other parallel tasks whilst the kettle boils. Even so, adding the tea bag, sugar and milk are still sequential tasks. So we have a long lived task and short sequential tasks giving the impression of efficiency. All tasks then merge and wait for the kettle to finish boiling. Note, no question was asked if the water has boiled as the statement implies you pour boiled water! It can’t be poured boiled water if it hasn’t boiled! The “Pour Boiled Water” can be represented in a self-contained flow-chart and for the purposes of this post is also an asynchronous function.

// { type: activity}
(Start)-><a>[yes]->(Pour Boiling Water)
<a>-(note: Has kettle light gone off?)


Flow control in this flowchart is sequential in a single time dimension. We could optimize this and the argument against it would be the classic “optimizing the point of constraint”. For one cup of tea, the main flow-chart illustrated in this section is good enough.
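As an illustrative sketch (my own Python, not part of the original tea workflow), the shape of this flow is one long-lived task running concurrently with short sequential tasks, which then merge before the pour:

```python
# Illustrative sketch only: the kettle boil is the long-lived task,
# the mug preparation is the short sequential work, and result() is
# the merge point where everything waits for the boil to finish.
from concurrent.futures import ThreadPoolExecutor
import time

def boil_kettle():
    time.sleep(0.2)                        # stands in for the slow boil
    return "boiled"

steps = []
with ThreadPoolExecutor() as pool:
    kettle = pool.submit(boil_kettle)      # long-lived, runs concurrently
    for task in ("add tea bag", "add sugar", "add milk"):
        steps.append(task)                 # short sequential tasks
    kettle.result()                        # merge: block until boiled
    steps.append("pour boiled water")

print(steps)
# → ['add tea bag', 'add sugar', 'add milk', 'pour boiled water']
```

The pour can only ever land after the merge point, however quickly the sequential tasks finish.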

Example Two: Surprise Family Visit Tea Factory

It’s a Sunday and the family have arrived. Ten of them. Each take their tea differently and with different tea bags. Although the process is ultimately the same, we now have multiple kettles to cope with the water quantity and a host of ingredients. A sequential model just won’t cut it as the family will moan that their beverages are all different temperatures. Some tasks are sequential and single dimensional and other tasks can be run in multi-dimensions, but all with state injected from the originating dimension.

Two major trains of thought exist here:

  1. Orchestrated Workflows
  2. Autonomous Workflows

Orchestrated Workflows are driven from a central “brain” and workflows are typically single threaded with most platforms having the ability to spawn branches if instructed to do so by the workflow creator. Platforms like StackStorm or Salt serve these needs.

In the case of our tea making exercise, a single workflow is triggered, which may spawn jobs both parallel and sequential to boil the kettles, prepare each mug and pour the boiled water. This relies on jobs being spawned, and the orchestrator waiting for each job to exit successfully before carrying on with the next task. The last sentence may require some more explaining but here goes. One cannot pour boiling water without the cups being appropriately laced with tea, sugar and milk (according to the person’s order).

Autonomous Workflows are a little more complicated. These workflows can split from and join each other and can be viewed as sharing data. The whole workflow is applicable to one cup of tea, and as a result, the workflow starts with an identification of all ten cups of tea. Some actions are designated to run once, or in lower numbers than the ten cups of tea. Imagine, programmatically speaking, that all ten tea-making workflows live in a list type; the first two entries in the list might be responsible for boiling the kettles, although each workflow has the kettle task. The first entry might be responsible for asking and confirming the type of each drink and the quantities of milk and sugar. These details are available to all workflows (as the workflows are identical) and the only defining item is a spawn $ID, in this case from 0-9. Therefore, imagine a “mandate” which looks like the below:

    0:  # Drink ID
        kind: "english breakfast"
        sugar: 1
        milk: "whole"
        water_from_kettle: 0
        for: "Aunt Dorris"
    1:  # Drink ID
        kind: "earl grey"
        sugar: 0
        milk: "lemon juice"
        water_from_kettle: 1
        for: "Uncle Dave"
    2:  # Etc
    kettles:
        -   0
        -   1


It’s possible for each instance of the workflow to get relevant information from the data structure about what it is expected to do, like what drink to make and whether it is responsible for boiling a kettle. Some workflows may require automatic spawning of kettles and quantities of hot water, but for this example this mandate will do just fine.
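As a hedged sketch (hypothetical Python mirroring the mandate, not any real platform's API), each spawned instance can use its own $ID as an index into the shared data:

```python
# Hypothetical sketch: the mandate as a dict, with each workflow
# instance using its spawn $ID to look up its orders and to decide
# whether it is one of the instances responsible for a kettle.
mandate = {
    0: {"kind": "english breakfast", "sugar": 1, "milk": "whole",
        "water_from_kettle": 0, "for": "Aunt Dorris"},
    1: {"kind": "earl grey", "sugar": 0, "milk": "lemon juice",
        "water_from_kettle": 1, "for": "Uncle Dave"},
}
kettles = [0, 1]   # instance IDs also tasked with boiling a kettle

def workflow(instance_id):
    order = mandate[instance_id]
    actions = []
    if instance_id in kettles:                 # only some instances boil
        actions.append(f"boil kettle {order['water_from_kettle']}")
    actions.append(f"make {order['kind']} for {order['for']}")
    return actions

print(workflow(0))
# → ['boil kettle 0', 'make english breakfast for Aunt Dorris']
```

Every instance runs the identical workflow; only the $ID it is handed changes what it actually does.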

Creation of the workflow has to be crafted very carefully to ensure the correct decisions are being made at the correct time. Execution style can change from platform to platform, however. Ansible will run each task sequentially, dealing with specifics of parallelism when required. For example, one task might be to start boiling the kettle by two of the ‘hosts’, or in our case the drink IDs; other tasks are performed, like loading of the mugs, then a looped check might happen until the kettles have boiled and both pouring tasks can be completed. This is asynchronous activity versus synchronous activity and has to be managed correctly with correct exit, error and re-try logic.

Networks have a habit of screwing up by transitioning to states that require human intervention, and any good workflow will provide a “hammer time” revert alongside careful mutation, careful being “mutate” then “verify”.

Here is what a modified version of the workflow may look like:

// {type: activity}
(Start)->(Get Drink Mandate)
(Get Drink Mandate)->[kettle $ID empty]->(Fill Kettle $ID)->|b|
-(note: Kettle activities)
[kettle $ID full]->|b|->(Boil Kettle $ID)->|c|
|b|->(Add Tea Bag to $ID)-><d>[$ID Sugar: yes]->(Add Sugar to $ID)->(Add Milk to $ID)
<d>[$ID Sugar: no]->(Add Milk to $ID)
(Add Milk to $ID)->|c|->(Pour Boiled Water from kettle $ID)
(Pour Boiled Water from kettle $ID)->(Enjoy)->(Stop)


The drink and kettle IDs are indexed using the instance $ID of the workflow, so for example, instance 0 boils one of the kettles and deals with making the drink for Aunty Dorris, chatterbox extraordinaire. Each workflow can be used autonomously and in parallel, but with instance 0 and 1 boiling the water and workflows 2-9 consuming water from the kettles boiled by 0 and 1. The kettles in this instance are external and asynchronous services.


I’ve flicked between words like dimensional, sequential, parallel, synchronous and asynchronous. Language in automation mainly comes from control-theory and industrial automation.

I propose the following language for orchestration:

Orchestrated Workflows For workflows requiring central co-ordination. A good example would be coordinating humans to fetch ingredients and handle the ingredient stock. The tasks would then be executed one by one with central decision making on task progression. Large mechanized processes between organizational units require this kind of approach. Business Process Management tools fall into this category, as do the workflow engines previously mentioned (Salt, StackStorm, etc.).

Autonomous Workflows These can be viewed as self-contained workflows. A set of ingredients is despatched with an instruction set. A tool interprets the instruction set and gets on with the job using the ingredients. Some of these tasks might be intra-organizational, and CI/CD tooling like Jenkins and GitLab can fall into this category.

I propose the following language for your workflows:

Single-Dimension For workflows that make all decisions in a single plane of logic, like making one cup of tea. This could be a task engine like Ansible.

Constraints here might be boiling many kettles and having the first drink go cold whilst finishing the last.

Multi-Dimensional For workflows requiring more than one set of tasks to run in parallel and in different time domains. Cleaning the dishes whilst making the tea would be advantageous to being a good host. A workflow engine could spawn these workflows from an orchestration perspective. Just to confuse you more, the tea making exercise can be spawned ten times with some tasks running just once or tied to an instance ID.

Managing multi-dimensional state can be difficult, so the danger here is not managing error and exit conditions correctly. Instance awareness happens in this mode of operation. This mechanism provides a great way of tying variables to workflow instances, like our tea making mandate example (Fig3). In simple terms, an instance can apply its own ID as an index into the data it needs.

I propose the following language for the tasks that live in the workflows:

Sequential For tasks that require one to finish before the next can start.

This might be taking a mug out of a cupboard and placing it on to the side before throwing in a tea bag, sugar and milk.

Parallelized Sequential For identical tasks that take the same amount of time, like loading ten tea bags into ten mugs.

Another example might be taking mugs out of the cupboard in parallel, then loading them all with tea bags, sugar and milk. After all, we’re making one batch of tea but with multiple cups.

Mixed Parallelized Sequential For workflows containing tasks requiring multiple branches of differing tasks. This could be preparing dinner whilst making the drinks if we carry on with our use cases. The tasks are now varying lengths, but to gain maximum efficiency, how cool would it be to deliver drinks and announce the time of dinner after one “as short as possible” kitchen visit?

Asynchronous For tasks that take a long time to complete, like boiling a kettle. These tasks will have a mechanism that reports their state, or a mechanism to report task completion or error. These mechanisms are accessible platform-wide and queryable from any dimension or task. Another example would be having a chef prepare your dinner: give him or her instructions, then occasionally shout through to the kitchen.
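To illustrate that reporting mechanism (a sketch under my own assumptions, not any platform's API), a long-running task can publish its state somewhere queryable so other tasks can poll it instead of blocking on it:

```python
# Sketch: the kettle reports its state into a shared, queryable
# structure; other tasks poll it (the "shout through to the kitchen")
# rather than sitting and waiting on the boil.
import threading
import time

status = {"kettle": "idle"}            # platform-wide, queryable state

def boil_kettle():
    status["kettle"] = "boiling"
    time.sleep(0.2)                    # stands in for the boil
    status["kettle"] = "boiled"        # report completion

kettle = threading.Thread(target=boil_kettle)
kettle.start()
while status["kettle"] != "boiled":    # occasional poll from another task
    time.sleep(0.05)
kettle.join()
print(status["kettle"])
# → boiled
```

In a real platform the shared state would live in a message bus or datastore rather than a local dict, so any dimension of the workflow can query it.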

Closing Words

I can hear screaming and the reason for that is, choice complicates the approach. What about a “Centrally Orchestrated, Multi-dimensional, Parallelized Sequential” workflow? Now you’re talking coffee shop grade automation with two very simple constraints: till speed and worst drink preparation time. It would be like being in a cafe but being served as if you were the only person in it.

It’s better to be armed with knowledge than fumble your way through. My advice is always to draw this out on a whiteboard or on paper. I draft everything in UML which includes the programmatically created diagrams in this post.

When it comes to design, split workflows and tasks apart and do the simplest thing, always. In our tea example, if Aunty Dorris is chatting everyone’s ear off, then your guests may require their refreshments faster, so remove as many bottlenecks as possible. If Uncle Dave, ‘Dare Devil Extreme’, is sharing a story about base jumping, they might not care about time, but eventual delivery within the time constraints of the story will be well received.

As the number of components increases, so does the number of error and recovery scenarios you have to consider. A centrally orchestrated, multi-dimension with asynchronous and parallelized sequential tasked workflow *breathes in* needs to be designed with reliability in mind. Mechanisms like watchdog timers, auto-remediation for predictable errors and hard mutation reverts based on unrecoverable errors all help. When there is an unrecoverable failure, the creator should have a path to invoke human assistance. One example might be to use a ChatOps module or call a phone with a pre-recorded message! *Imagine Google’s Assistant ringing you? Cool or eerie?* Multi-dimension automation without central orchestration still needs recovery plans, but with fewer things to go wrong, it’s more likely a single task will fail rather than the orchestration engine itself.
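A minimal sketch of that mutate/verify/revert idea (hypothetical helper names; a real platform wraps this in far more machinery, such as watchdog timers and alerting):

```python
# Sketch: attempt a careful mutation a few times, verifying after each
# try; if it never verifies, fall back to the hard revert (and, in a
# real system, invoke human assistance via ChatOps or similar).
def apply_with_retry(mutate, verify, revert, attempts=3):
    for _ in range(attempts):
        mutate()
        if verify():
            return "committed"
    revert()
    return "reverted"      # a real system would also page a human here

calls = []
result = apply_with_retry(
    mutate=lambda: calls.append("mutate"),
    verify=lambda: len(calls) >= 2,    # pretend success on the second try
    revert=lambda: calls.append("revert"),
)
print(result)
# → committed
```

The revert path is the "hammer time" escape hatch: it only fires once the bounded retries are exhausted.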

In short, the more stuff you have, the more fragile it becomes. With great automation power comes great potential failure, but also fantastic gains if done correctly! Sorry Uncle Ben, this needed closure.

This post has mainly focussed on an easy-to-explain “drink making” workflow, and network automation really isn’t any different. A workflow is a workflow, domain differences notwithstanding. Great network automations come from people who truly understand their own domain. Despite being wordy, the language used in this post should help you to define how a workflow behaves and identify the correct way to execute tasks. From experience, half the battle is knowing where to start, and that is with drafting workflows. The tools and platforms can be identified later and should never lead discussions of process mechanization.

Until the next time, thank you for reading and please leave comments or ask questions.

Helpful Notes

Theory of Constraints:
Control Theory:

The post Automation: Flow Control & Dimensionality appeared first on

by David Gee at October 19, 2018 05:45 PM

Potaroo blog

Diving into the DNS

DNS OARC organizes two meetings a year. They are two-day meetings with a concentrated dose of DNS esoterica. Here’s what I took away from the recent 29th meeting of OARC, held in Amsterdam in mid-October 2018.

October 19, 2018 02:50 PM

Networking Now (Juniper Blog)

Outsmarting Cybercriminals at Work: Your Employees are Your First Line of Defense

No matter where you work – be it a corporate office, a retail store, healthcare institution, place of academia or government agency – every employee has a role to play in ensuring your organization maintains good security hygiene.


by Amy James at October 19, 2018 01:45 PM
