September 18, 2019

Blog (Ivan Pepelnjak)

Worth Reading: TCP MSS Values in the Wild

In Never-Ending Story of IP Fragmentation I described how you could use TCP Maximum Segment Size to minimize the impact of IP fragmentation and PMTUD blackholes (more details on TCP MSS clamping)… but one has to wonder how people use TCP MSS in the wild and what values you might see.

As is often the case, Geoff Huston found a way to measure them, and published the answer: TCP MSS Values
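As a quick reminder of where typical MSS values come from, here is a minimal sketch of the arithmetic, assuming IPv4, IPv6 and TCP headers without options. The function name and the GRE example are illustrative and are not taken from either article.

```python
# Header sizes in bytes, assuming no IP or TCP options.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def tcp_mss(mtu: int, ipv6: bool = False, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits into one unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - tunnel_overhead - ip_header - TCP_HEADER


print(tcp_mss(1500))                      # 1460 (plain Ethernet, IPv4)
print(tcp_mss(1500, ipv6=True))           # 1440 (plain Ethernet, IPv6)
# Illustrative tunnel: GRE over IPv4 adds a 20-byte outer IP header
# plus a 4-byte GRE header.
print(tcp_mss(1500, tunnel_overhead=24))  # 1436
```

MSS clamping amounts to rewriting the MSS option in TCP SYN packets down to a value like the last one, so that segments still fit after the tunnel headers are added.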

by Ivan Pepelnjak at September 18, 2019 07:32 AM

September 17, 2019

My Etherealmind

Formalisation of Automation: WIP

For the last five or six years, I’ve not really done any networking and have focussed on software, automation and the mechanisation of processes so that they may be manifested as network driving workflows. I try to keep up with networking technology and working for Juniper has really made me level up in this aspect. I’m lucky to be surrounded by an army of real experts and it’s humbling. What’s still a thorn in my side is the beginner expert community around automation, and I’m working to bring awareness to this through providing questions and insight with methodologies to bootstrap the journey. More on that another time. This paragraph is to position some emotions for what’s about to follow!

To get to the crux of this post, now shift your view to your every day life. How many times a day does an app crash on your phone, laptop or tablet? When was the last time a feature wasn’t available on your TV because you didn’t upgrade to the latest version of software? Right at the beginning of my career, I worked in real time electronics. Machinery that should not die randomly, or just become obsolete because of the hardware swap out cycles. I carried this mentality forward into whatever path my career went. Now I’m in software full time, my brain reels at the fact that quality and fit-for-purpose state in so many cases is oh so low. The ever moving change frontier, architecture without end-state or just the socio need of ‘more features’ can be pointed at as a source of blame, but I believe these things are not the absolute root cause.

Obsessed with complexity and the handling of it in software, I’ve added to my tech library significantly with books from Harlan Mills, Fred Brooks, Edsger Dijkstra and have followed characters like Joe Armstrong until he passed away sadly in April 2019. What we do isn’t new and history has a great many things to teach us. Living in a world where I see network engineers desperately trying to enhance their skills, I see disparity between the perceived end-goals and actual. Writing a Python “Hello World” is a great start to invoking the interpreter of the language and a print statement, but it teaches you nothing of semantics, data handling or the expression of logic. Ivan Pepelnjak wrote about the expert beginners a while ago and part of my current mission is to provide a framework to describe network automation requirements, a language to have conversations and methods of diagramming and visual expression. Ultimately, this is math based. These tasks should be expressed and solvable in an abstract way, the characteristics identified then lead you to the selection of tools and platforms. Today, I fear the popularity of tools like Ansible has led us to an Ansible-first world, where the ‘problem’ at hand is only truly understood once the tool has been pushed to its limits. Only at that point is the ‘problem’ understood and what follows is a set of requests to make Ansible do new things. As is often the way with software, something that is successful is pushed beyond recognition in the name of familiarity. To burn out the villain in this forest, I desire a ‘first principles’ approach instead of a solution based one. A solution is great once you truly know the problem. Applying off-the-shelf solutions to barely understood requirements shapes the personal and professional development of people and ultimately the success or failure of the organisations employing them.

How about this? Give a person a fish, they will eat for a day. Teach them how to fish, they will eat regularly. The state of learning in my opinion is akin to what’s happening to the oceans. Teach a person to fish today and some of the catch will be plastic, inedible sea creatures and the odd fish. The act of teaching someone to fish needs an upgrade. Today if I was automating in operations instead of being a designer and architect, I’m not sure I could handle the anxiety given the blunderbuss approach. Maybe it’s the state of industry infancy in this discipline or the vendors themselves pushing narrowband solutions, a call for first principles in my simple view is required.

Imagine a world where you can write a first principles automata formula on a whiteboard, then lay your automation requirements out in the same language. The design will naturally and methodically come together for your workflows through a formal design framework. Implementation becomes a case of transferring each identified requirement and laying it out over composable architecture like a workflow engine, configuration management tooling, graph databases and data analysis systems. Stay tuned if this is of interest!

The post Formalisation of Automation: WIP appeared first on

by David Gee at September 17, 2019 11:29 AM

Blog (Ivan Pepelnjak)

Beware the Marketing Magic of GUI-Based Programming

Someone working for a network automation startup desperately tried to persuade me how cool their product is. Here’s what he sent me:

We let network engineers build their own network automation solutions in no time without requiring coding or scripting knowledge. It’s all GUI based, specifically geared towards network engineers - they can simply model services or roll-out networks “as-designed”.

The only problem: I’ve seen that same argument numerous times…

Read more ...

by Ivan Pepelnjak at September 17, 2019 06:33 AM

September 16, 2019

The Data Center Overlords

Wow: NVMe and PCIe Gen 4

Recently it’d come to my attention that my old PC rig wasn’t cutting it.

Considering it was 10 years old, it was doing really well. I mean, I went from HDD to 500 GB SSD to 1 TB SSD, up’d the RAM, and replaced the GPU at least once. But still, it was a 4-core system (8 threads) and it had performed admirably.

The Intel NIC was needed because the built-in ASUS Realtek NIC was a piece of crap, only able to push about 90 MB/s. The Intel NIC was able to push 120 MB/s (close to the theoretical max for 1 Gigabit which is 125 MB/s).

The thing that broke the camel’s back, however, was video. Specifically 4K video. I’ve been doing video edits and so forth in 1080p, but moving to 4K and the power of Premiere Pro (as opposed to iMovie) was just killing my system. 1080p was a challenge, and 4K made it keel over.

I tend to get obsessive about new tech purchases. My first flat screen TV purchase in 2006 was the result of about a month of in-depth research. I pore over specs and reviews for everything from parachutes (btw, did you know I’m a skydiver?) to RAM.

Eventually, here’s the system I settled on:

AMD came out of nowhere and launched the third generation of Ryzen, which took AMD from a budget has-been to a major contender in the desktop world. Plus, they were the first to come out with PCIe Gen 4.0, which allows each lane of PCIe to give you about 2 GB/s of bandwidth. M.2 drives can connect to 4 lanes, giving a possible throughput of 8 GB/s.
Compare that with SATA 3, at 600 MB/s, and that’s quite a difference. SATA is fine for spinning rust, but it’s clear NVMe is the only way to unlock SSD storage’s potential.
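The arithmetic behind those numbers can be sketched in a few lines. The function names are made up for illustration, and the inputs are the round figures quoted in this post rather than exact specification values.

```python
def m2_throughput_gb_s(lane_gb_s: float = 2.0, lanes: int = 4) -> float:
    """Aggregate PCIe bandwidth available to an M.2 drive, in GB/s."""
    return lane_gb_s * lanes


def speedup_over_sata(nvme_gb_s: float, sata_mb_s: float = 600.0) -> float:
    """How many times faster than SATA 3 a given GB/s figure is."""
    return nvme_gb_s * 1000 / sata_mb_s


def line_rate_mb_s(bits_per_second: float) -> float:
    """Convert a link speed in bits per second to megabytes per second."""
    return bits_per_second / 8 / 1_000_000


print(m2_throughput_gb_s())                # 8.0 GB/s for a x4 PCIe Gen 4 link
print(f"{speedup_over_sata(8.0):.1f}x")    # 13.3x the SATA 3 ceiling
print(line_rate_mb_s(1_000_000_000))       # 125.0 MB/s for Gigabit Ethernet
```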
When I built the system, I initially installed Linux (CentOS 7.6, to be exact) just to run a few benchmarks. I was primarily interested in the NVMe drive and the throughput I could expect. The drive advertises 5 GB/s reads and 4.3 GB/s writes.
Using dd if=/dev/zero of=testfile and using various blocksizes and counts to write a 100 GB file, I was able to get about 2.8 GB/s writes. Not quite what the drive had promised in terms of writes, but much better than the 120. I was able to get about 3.2 GB/s reads.
For various reasons (including that while Linux is a fantastic OS in lots of regards, it still sucks on the desktop, especially for my particular needs) I loaded up Windows 10. CrystalDiskMark is a good free benchmark and I was able to test my new NVMe drive there.
I ran it, thinking I’d get the same results from Linux. Nope!
I got pretty much what the drive promised.
As a comparison, here’s how my old SATA SSD fared:
About 10x performance. Here’s a couple of takeaways:

  • PCIe 4 does matter for storage throughput. Would I actually notice in my day-to-day operations the difference between PCIe 3 and PCIe 4? Probably not. But I’m working with 4K video and some people are already working with 6K and even 8K video, that’s not too far down the line for me.
  • SATA is dead for SSD storage. The new drives are more than capable of utterly overwhelming SATA 3 (600 MB/s, LOL). Right now, SATA is sufficient for HDDs, but as platters get bigger sequential reads will continue to climb.
  • I don’t doubt that Linux can do the same, it’s just my methodology failed me. The dd command from /dev/zero had never failed to be the best way to test write speeds for HDD and SATA SSDs, but now I need to find another method for Linux (or perhaps there is some type of bottleneck in Linux).
  • New PCIe 4 NVMe SSDs are super fast and can be had for a relatively low amount of money ($180 USD for 1 TB). They’re insanely fast.
  • I need a new way to benchmark Linux storage.
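As a starting point for that last item, here is a minimal sketch of a sequential write test in Python. It is an assumption-laden illustration, not a verified explanation of the gap: it writes incompressible data and includes the final fsync in the timing, but like dd it is a single thread issuing one write at a time, so it may still undercount a drive that needs deeper queues (which a dedicated tool such as fio can generate). The function name and default sizes are made up.

```python
import os
import tempfile
import time


def sequential_write_mb_s(path: str, total_mb: int = 256, block_kb: int = 1024) -> float:
    """Write about total_mb of random data in block_kb chunks and return MB/s.

    The timing includes the final fsync, so data still sitting in the page
    cache is not counted as written.
    """
    block = os.urandom(block_kb * 1024)       # incompressible payload
    blocks = (total_mb * 1024) // block_kb    # number of whole blocks to write
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            f.write(block)
        os.fsync(f.fileno())                  # force the data to the device
    elapsed = time.perf_counter() - start
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rate = sequential_write_mb_s(os.path.join(tmp, "testfile"), total_mb=64)
        print(f"{rate:.0f} MB/s")
```

Point the path at a directory on the drive under test and raise total_mb well beyond the RAM-backed cache size before trusting the number.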

by tonybourke at September 16, 2019 06:48 PM

Blog (Ivan Pepelnjak)

Just Published: High-Level Azure Networking Concepts

Last week we started the Microsoft Azure Networking saga that will eventually mirror the AWS Networking materials.

I recorded the hands-on demos in advance so we had plenty of time to discuss Azure API and CLI, geographies, regions and availability zones, high-availability concepts, and deployments models… and spent the second half of the live session focusing on virtual networks, subnets, interface, and IP addresses. The videos are already online and accessible with Standard Subscription.

Next step (on September 24th): network security and user-defined routes.

by Ivan Pepelnjak at September 16, 2019 06:41 AM

XKCD Comics

September 15, 2019

Blog (Ivan Pepelnjak)

If You Travel to Slovenia, You SHOULD NOT Fly with Adria Airways

I apologize to my regular readers for a completely off-topic post, but if I manage to save a single traveller the frustrations I experienced a few weeks ago it was well worth it. Also, please help spread the word…

TL&DR: If you travel to Slovenia, DO NOT even consider flying with Adria Airways (and carefully check the code-share flights, they might be hiding under a Lufthansa or Swiss flight number). Their actual flight schedule resembles a lottery, and while I always had great experience with the friendly, courteous and highly professional cabin crews, it’s totally impossible to reach their customer service.

Alternate nearby destinations are Vienna, Zagreb, Graz or Trieste, or you could go via Venice and Treviso. There are regular shuttles operating between all those airports and Ljubljana.

Read more ...

by Ivan Pepelnjak at September 15, 2019 07:26 AM

September 13, 2019

The Networking Nerd

Keynote Hate – Celebrity Edition

We all know by now that I’m not a huge fan of keynotes. While I’ve pulled back in recent years from the all out snark during industry keynotes, it’s nice to see that friends like Justin Warren (@JPWarren) and Corey Quinn (@QuinnyPig) have stepped up their game. Instead, I try to pull nuggets of importance from a speech designed to rally investors instead of the users. However, there is one thing I really have to stand my ground against.

Celebrity Keynotes.

We’ve seen these a hundred times at dozens of events. After the cheers and adulation of the CEO giving a big speech and again after the technical stuff happens with the CTO or product teams, it’s time to talk about…nothing.

Celebrity keynotes break down into two distinct categories. The first is when your celebrity is actually well-spoken and can write a speech that enthralls the audience. This means they get the stage to talk about whatever they want, like their accomplishments in their career or the charity work they’re pushing this week. I don’t mind these as much because they feel like a real talk that I might want to attend. Generally the celebrity talking about charity or about other things knows how to keep the conversation moving and peppers the speech with anecdotes or amusing tales that keep the audience riveted. These aren’t my favorite speeches but they are miles ahead of the second kind of celebrity keynote.

The Interview.

These. Are. The. Worst. Nothing like a sports star or an actor getting on stage for 45 minutes of forced banter with an interviewer. Often, the person on stage is a C-level person that has a personal relationship with the celebrity and called in a favor to get them to the event. Maybe it’s a chance to talk about their favorite charity or some of the humanitarian work they’re doing. Or maybe the celebrity donated a bunch of their earnings to the interviewer’s pet project.

No matter the reasons, most of these are always the same. A highlight reel of the celebrity in case someone in the audience forgot they played sports ball or invented time travel. Discussion of what they’ve been doing recently or what their life was like after they retired. A quirky game where the celebrity tries to guess what the company does or tries to show they know something about IT. Then the plug for the charity or the fund they’re wanting to really talk about. Then a posed picture on stage with lots of smiles as the rank and file shuffle out of the room to the next session.

Why is this so irritating? Well, for one, no one cares what a quarterback thinks about enterprise storage. Most people rarely care about what the CEO thinks about enterprise storage as long as they aren’t going to shut down the product line or sell it to a competitor. Being an actor in a movie doesn’t qualify you to talk about hacking things on the Internet any more than being in Top Gun qualifies you to fly a fighter jet. Forcing “regular” people to talk about things outside their knowledge base is painful at best.

So why do it then? Well, prestige is one thing. Notice how the C-level execs flock to the stage to get pics with the celebrity after the speech? More posters for the power wall in their office. As if having a posed pic with a celebrity you paid to come to your conference makes you friends. Or perhaps it’s a chance to get some extra star power later on for a big launch. I can’t tell you the number of times that I’ve made a big IT purchasing decision based on how my favorite actor feels about using merchant silicon over custom ASICs. Wait, I can tell you. Hint, it’s a number that rhymes with “Nero”.

Getting Your Fix

As I’ve said many times, “Complaining without a solution is whining.” So, how do we fix the celebrity keynote conundrum?

Well, my first suggestion would be to get someone that we actually want to listen to. Instead of a quarterback or an actor, how about instead we get someone that can teach us something we need to learn? A self-help expert or a writer that’s done research into some kind of skill that everyone can find important? Motivational speakers are always a good touch too. Anyone that can make us feel better about ourselves or let us walk away from the presentation with a new skill or idea is especially welcome.

Going back to the earlier storyteller keynotes, these are also a big draw for people. Why? Because people want to be entertained. And who better to entertain than someone that does it for their job? Instead of letting some C-level exec spend another keynote dominating half the conversation, why not let your guest do that instead? Of course, it’s not easy to find these celebrities. And more often than not they cost more in terms of speaker fees or donations. And they may not highlight all the themes of your conference like you’d want with someone guiding them. But I promise your audience will walk away happier and better off.

Tom’s Take

Keynotes are a necessary evil of conferences. You can’t have a big event without some direction from the higher-ups. You need to have some kind of rally where everyone can share their wins and talk about strategy. But you can totally do away with the celebrity keynotes. Instead of grandstanding about how you know someone famous you should do your attendees a favor and help them out somehow. Let them hear a great story or give them some new ideas or skill to help them out. They’ll remember that long after the end of an actor’s career or a sports star’s run in the big leagues.

by networkingnerd at September 13, 2019 04:46 PM

September 12, 2019

Blog (Ivan Pepelnjak)

Disaster Recovery Test Faking: Another Use Case for Stretched VLANs

The March 2019 Packet Pushers Virtual Design Clinic had to deal with an interesting question:

Our server team is nervous about full-scale DR testing. So they have asked us to stretch L2 between sites. Is this a good idea?

The design clinic participants were a bit more diplomatic (watch the video) than my TL&DR answer which would be: **** NO!

Let’s step back and try to understand what’s really going on:

Read more ...

by Ivan Pepelnjak at September 12, 2019 12:57 PM

September 11, 2019

Networking Now (Juniper Blog)

Juniper in the Eyes of a CISO: The Customer's Perspective (Part One)

Being a Chief Information Security Officer (CISO) is a challenging and complex task with growing importance. Nearly every company today is technology dependent, and the potential impact of information risk expands. In parallel, there is an increasing pressure on CISOs to drive forward digital transformation, or risk losing their influence on board and management key business decisions. In recent years, related roles such as Chief Data Officer (CDO) and Chief Privacy Officer (CPO) have emerged. The division of workloads and responsibilities between company functions, as well as the management reporting structure, have changed to minimize conflicts of interest and increase cooperation.

While historically it was a technology-driven position, the expectations of a CISO today can be summarized in the following phrase: Enable and Protect Everything.


  • Enable – As a CISO allows the digital transformation and growth into the cloud that the business units are seeking in order to gain a competitive advantage, he/she is also managing and securing a growing number of complex third-party relationships. Supply chain attacks, which target less-secure elements in the supply network of organizations, are on the rise.
  • Protect – As the CISO still maintains the traditional responsibility for managing risks and resources, and acts as a guardian for data that often resides elsewhere, he/she might also pay the price when incidents occur.
  • Everything – A CISO is often required to have a corporate-wide view beyond IT that also covers IoT, business resilience, privacy aspects and more.


Value proposition in key activity areas

The four key activity domains listed in the Certified Information Security Manager (CISM) framework of the Information Systems Audit and Control Association (ISACA) are: 1) maintaining a security governance to allow management oversight for all operations; 2) establishing a comprehensive information security program; 3) managing risk throughout the organization and; 4) handling incident response. Let’s see how Juniper can help modern CISOs achieve their goals by leveraging the technology and features that they care about, in each of these domains.


1. Governance

Governance supports management’s oversight for all operations to ensure information security strategy is aligned with organizational goals and objectives. A major part of governance is risk and compliance management, which has become more important than ever due to the increasingly intangible nature of business value and the growing risk of violating customer trust.

Juniper Networks contributes to governance by allowing proper security processes. Our management software Security Director fully supports change control workflow (also known as change management) and separation of duties. This capability allows for a staged deployment of changes after a review of the risk, reducing the likelihood of unexpected risks as a result of any security policy change. It also ensures tasks and privileges are distributed between people to prevent fraud and human errors.


Figure: Change control workflow and separation of duties (SoD) in action

In addition, Juniper Networks helps organizations meet regulatory compliance standards such as GDPR with Juniper Secure Analytics (JSA) SIEM powered by IBM® QRadar®. By adding the Risk Manager license, network topology visualization and device configuration monitoring help in mapping systems and user access to resources. The Risk Scanner free extension app scans databases for Personally Identifiable Information (PII) and populates an asset risk value based on the results. Other governance, risk management and compliance (GRC) add-ons for JSA are offered by third-party vendors and are available in the IBM Security App Exchange.


2. Information security program

The information security program is an organization’s practical plan for governance and security which should include effective resource management. People and their skillsets are considered the most valuable resource to many organizations. But just as technologies and organizational structure continue to change, so does the skillset required from security personnel.

Current trends in cybersecurity require three new types of skills. The first is data science and analytics as machine learning security implementations become widely deployed. The second is threat hunting as incident handling becomes a ‘must’ function in many organizations. The third is automation as DevSecOps – the scripting of security operations – evolves. More on the latter in this blog post: Harnessing Cybersecurity Automation.

Juniper’s Advanced Threat Prevention (JATP) Appliance can greatly help in overcoming the skill gaps in threat hunting and incident handling. It leverages machine learning-based detection engines, threat intelligence feeds and incident timeline views.

As demonstrated in our cybersecurity calculator, JATP Appliance can simplify and accelerate time-consuming tasks and introduce annual incident response cost savings. Additionally, Juniper Connected Security can facilitate out-of-the-box automation to simplify security operations. There will be more on the benefits of both solutions in Part Two of this blog.


Figure: Juniper’s cybersecurity calculator

In the context of governance, security policy is the collection of technology-agnostic definitions of internal laws and regulations for protecting important assets in alignment with security strategy. Security baselines, which include product rules and definitions, should reflect the intention of the security policy across multiple domains.

Juniper can help organizations deploy a single rule-set regardless of data location and cloud boundaries. This approach is often referred to as “user-intent policy”, since rules are expressed by user- and business-related terms such as location, applications, user groups, business processes, etc. rather than technical terms such as VLANs, IPs, subnets or ports.


Figure: User intent-based security rules

3. Risk management

Juniper can enhance the risk management process by improving control over change management risks. If proper procedures are in place, every change made to a firewall rule requires reevaluation of the risk – a task that consumes time and resources.

A feature called Dynamic Address Group automates a stream of metadata based contextual objects, so rules can be static, and the only change required will be automated label-mapping, which does not require a “commit” action on the SRX Series Services Gateway nor risk reevaluation. This work mode can be used to facilitate a unified user-intent policy across the multicloud as discussed earlier, or restrict resource access based on GeoIP, bad reputation, compliance needs, etc. The feed sources include Juniper threat feeds from our cloud-based service, third-party threat feeds, and any other feed derived from detection products deployed on-site. Administrators can define enforcement policies from all feeds via a single, centralized management point, Junos Space Security Director.


Figure: Static rules with dynamically updated objects (Dynamic Address Group)
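The idea of static rules that reference dynamically updated objects can be illustrated with a small, vendor-neutral sketch. The class and function names below are hypothetical and do not represent Junos or Security Director APIs; the point is only that a feed update changes group membership while the rule itself is never edited.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class DynamicAddressGroup:
    """A named object whose membership is driven by an external feed."""
    name: str
    members: set = field(default_factory=set)

    def apply_feed(self, cidrs):
        # Replace membership from the feed; rules that reference this
        # group are not touched.
        self.members = {ipaddress.ip_network(c) for c in cidrs}

    def contains(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.members)


@dataclass
class Rule:
    """A static rule: it names a group, never individual addresses."""
    source_group: DynamicAddressGroup
    action: str


def evaluate(rules, src_ip: str, default: str = "permit") -> str:
    """Return the action of the first rule whose group matches src_ip."""
    for rule in rules:
        if rule.source_group.contains(src_ip):
            return rule.action
    return default


blocklist = DynamicAddressGroup("threat-feed")
rules = [Rule(blocklist, "deny")]          # defined once, never edited
print(evaluate(rules, "203.0.113.7"))      # permit (feed is still empty)
blocklist.apply_feed(["203.0.113.0/24"])   # automated feed update
print(evaluate(rules, "203.0.113.7"))      # deny (same rule, new members)
```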

4. Incident handling

A declaration of an incident in any organization must be followed by a clear and pre-defined list of steps and actions, intended to contain the threat and maintain business continuity. This is true for every level of incident, from minor to a complete disaster. Some of these steps may involve the need to change firewall rules according to the severity of the situation.

Using Adaptive Security Policy in Security Director allows the quick adaptation of firewall policy to the changing security environment upon an incident or disaster onset. The mechanism to move between severities can be manual or automated to achieve agility and OPEX savings with an out-of-the-box DevOps model. More on this topic in this blog post: Adaptive Security Policies for Dynamic Security Environments.

Figure: A pre-defined policy with three colored states and different actions
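A pre-defined policy with one action per severity state can be modelled as a simple lookup table. The sketch below is purely illustrative: the state names (green, yellow, red), the traffic classes and the actions are assumptions, and none of it reflects the actual Security Director data model.

```python
from enum import Enum


class Severity(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Hypothetical policy: one pre-defined action per traffic class and severity.
POLICY = {
    "guest-wifi": {
        Severity.GREEN: "permit",
        Severity.YELLOW: "log",
        Severity.RED: "deny",
    },
    "partner-vpn": {
        Severity.GREEN: "permit",
        Severity.YELLOW: "permit",
        Severity.RED: "log",
    },
}


def action_for(traffic_class: str, severity: Severity) -> str:
    """Look up the action pre-defined for the current severity."""
    return POLICY[traffic_class][severity]


current = Severity.GREEN
print(action_for("guest-wifi", current))   # permit
current = Severity.RED                     # incident declared
print(action_for("guest-wifi", current))   # deny
```

Because every state is defined ahead of time, moving between severities is a single variable change that can be triggered manually or by automation.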

Another benefit Juniper can bring is the automation of incident response lifecycles. This industry trend, known as security orchestration, automation and response (SOAR), represents a shift in focus from prevention to detection and response.

Due to the evasiveness of modern malware, it is no longer enough to apply a real-time prevention engine such as next-gen firewall since more time might be required to identify a threat. Therefore, a detection layer must be added, followed by automatic response should any compromise be identified. The importance of automation is well illustrated in this quote from Don Welch, the CISO of Penn State University[1]: “If you can write an algorithm to respond to an incident, then it needs to be automated. If you’re not automating it, then you’re wasting resources, because people are slower and more expensive. You need to have security people who are focusing on the more difficult things.”

Juniper Networks offers an automated workflow solution covering all three components (prevention, detection, response) based on the following two options:

  • Juniper Connected Security: An open multi-vendor architecture that combines Juniper Networks and third-party products to automate detection and response in both campus and multi-cloud environments. Expanding network enforcement points beyond just firewalls into switches, access points, virtualized data centers, and public cloud environments, substantially increases the efficiency and efficacy of security operations.
  • JATP Appliance: A detection and response “in a box” solution that offers advanced malware detection as well as an analytics engine to correlate and consolidate threat information with event data from other security tools, and mitigation policies enforced on solutions provided by a selected list of third-party vendors.


What’s next?

In Part Two of this blog post, we will discuss the strategic value Juniper can bring CISOs on their journey to become business influencers and enablers in their organizations. A recording of the webinar delivered to our partner community covering the essence of this blog post (both Part One and Part Two) is available here (partner credentials are required to access).


[1] 7 Experts on Security Automation and Analytics, Mighty Guides, Inc. 2018

by Aviram Zrahia at September 11, 2019 05:57 PM

September 10, 2019

Network Design and Architecture

Recommended Networking Resources for September 2019 Second Week

There are so many good resources for Network Engineers out there. I started to share the ones I liked last week. Click here to see September 2019, First Week Networking Recommended Resources. As you know, I will share 5 resources every week. There are so many in my list already, I can’t wait for the …

The post Recommended Networking Resources for September 2019 Second Week appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 10, 2019 10:31 AM

Blog (Ivan Pepelnjak)

Response: The OSI Model Is a Lie

Every now and then I stumble upon a blog post saying “OSI 7-layer model sucks” or “OSI 7-layer model is a lie”, the most recent one coming from Robert Graham.

Before going into the details, let’s agree on the fundamentals.

Most everyone who ever tried to build a network spanning more than one transmission technology and including intermediate nodes came to the conclusion that a layered approach to networking makes sense.

Whether you have three, four, five, or seven layers in your model doesn’t matter. What really matters is that your model contains all the functionality you need to implement host-to-host networking in the target environment.

Read more ...

by Ivan Pepelnjak at September 10, 2019 07:23 AM

September 09, 2019

My Etherealmind

Replacing a Network Element Config System with Git

In this post I’ll explore replacing the heart of a network operating system’s configuration mechanism with the software developer’s take on version control. It can be argued that network operating systems, or at least good ones, already have a version control system. It’s that very system that allows you to roll back and carry out operations like commit-confirmed. More specifically, this is a version control system like Git but not specifically git.

As my day job rotates around Junos, I’ll concentrate on that. So why would anyone want to rip out the heart of Junos and replace it with a git backed directory full of configuration snippets? Software developers and now automation skilled engineers want the advantages of being able to treat the network like any other service delivering node. Imagine committing human readable configuration snippets to a network configuration directory and having the network check it out and do something with it.

Junos already has a configuration engine capable of rollbacks and provides sanity through semantic and syntax commit time checks. Mgd (the service you interact with) provides mechanisms to render interfaces through YANG models and generates the very configuration tree you interact with. You could say mgd takes care of the git-like mechanisms for you but interactively. Through a user interface, a user loads text, XML or JSON snippets into mgd instead of text files. Instead of a git commit, mgd commits the new semantically checked configuration tree to the data store and then performs syntax and logic checking. I believe the gains from mgd outweigh the loss in perceived flexibility from using a version control piece of software like git. Check out RFC 6241 for the intricacies of NETCONF, which mandates the data store functionality which Junos is built upon. This mechanism has been there for twenty years and is not an afterthought to keep up with trends.
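To make the comparison with version control concrete, here is a toy model of a candidate/active data store with commit-time checks and rollback. It is a sketch only: the class and method names are invented, and it does not represent how mgd is implemented or any Junos API.

```python
import copy


class ConfigStore:
    """Toy candidate/active data store with a commit history."""

    def __init__(self, initial=None):
        self.active = dict(initial or {})
        self.candidate = copy.deepcopy(self.active)
        self.history = []   # previously active configs, newest first

    def load(self, snippet: dict):
        """Merge a configuration snippet into the candidate."""
        self.candidate.update(snippet)

    def commit(self, check=lambda cfg: True):
        """Activate the candidate, but only if the check passes."""
        if not check(self.candidate):
            raise ValueError("commit check failed; active config unchanged")
        self.history.insert(0, copy.deepcopy(self.active))
        self.active = copy.deepcopy(self.candidate)

    def rollback(self, n: int = 1):
        """Load the n-th previous config into the candidate.

        n == 0 discards uncommitted changes. A commit is still needed
        to activate the result.
        """
        source = self.active if n == 0 else self.history[n - 1]
        self.candidate = copy.deepcopy(source)


store = ConfigStore({"hostname": "r1"})
store.load({"hostname": "r1-new"})
store.commit(check=lambda cfg: bool(cfg.get("hostname")))
store.rollback(1)      # bring back the previous config as candidate
store.commit()
print(store.active)    # {'hostname': 'r1'}
```

The check callback stands in for the semantic and syntax checks performed at commit time, which is the part a plain git repository does not give you by itself.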

As an attempt to provide some defining language, I want to describe what my take on primitives, artefacts and assets are. These can all be version controlled using git or whatever your tool of choice is.


Primitives: VLANs, VXLANs, IP ranges, port numbers, ASNs etc


Artefacts: Configuration templates (J2, Mustache, language native templating), Terraform resources, service descriptors etc to be consumed by a workflow (machine or human driven)


Assets: Generated configuration state items in a network element native language or high level tool like Terraform HCL.

These components allow your configuration pipeline, whether a CI/CD system or a workflow engine, to change the state of your network element, satisfying the asset input conditions.

The description I gave earlier: “Imagine committing human readable configuration snippets to a network configuration directory and having the network check it out and do something with it.” is a basic description for a ‘component’ within a CI/CD pipeline. Engineers can check configuration asset fragments into a repository for the pipeline to check them out, perform some pre-tests, merge and post-test. Depending on what your pipeline actually does, with an operating system like Junos that already performs semantic and syntax sanity checking, you’re actually just doubling up the work higher up in the layers. It’s worth knowing what you’re dealing with before generating work for yourself. Generating the assets from artefacts and primitives, however, is always low hanging fruit.
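As a rough illustration of generating an asset from an artefact and some primitives, here is a minimal Python sketch. The VLAN names and IDs are made up, the template is a stand-in for whatever templating tool your pipeline uses, and the `render_asset` helper is hypothetical; the generated lines are only meant to resemble Junos set-style configuration.

```python
from string import Template

# Primitives: plain values kept under version control (hypothetical examples).
primitives = [
    {"name": "users", "vlan_id": 110},
    {"name": "voice", "vlan_id": 120},
]

# Artefact: a template consumed by the workflow (illustrative set-style line).
vlan_template = Template("set vlans $name vlan-id $vlan_id")


def render_asset(items):
    """Render one configuration line per primitive after a basic range check."""
    lines = []
    for item in items:
        if not 1 <= item["vlan_id"] <= 4094:
            raise ValueError(f"VLAN ID out of range: {item['vlan_id']}")
        lines.append(vlan_template.substitute(item))
    return "\n".join(lines)


# Asset: the generated configuration fragment the pipeline would hand to the device.
asset = render_asset(primitives)
print(asset)
```

The range check here is deliberately shallow. The deeper semantic and syntax checks stay with the network operating system at commit time, which is the point made above about not doubling up the work.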


Whilst the notion of replacing the guts of a network element with something like Git seems attractive, a lot of engineering time has been spent creating these network operating systems, a lot of user time has been spent getting the best out of them and many problems you may not be aware of have been solved before the industry got to this point of pondering. Moving your service logic upwards to a set of workflows, with primitive, artefact and asset management, however, makes perfect sense. This becomes your go-to abstraction layer, leaving much of the value in place lower down the stack.

The post Replacing a Network Element Config System with Git appeared first on

by David Gee at September 09, 2019 09:35 AM Blog (Ivan Pepelnjak)

Supply-Chain Security in Open-Source Software

Last week we started the Autumn 2019 Building Network Automation Solutions online course with an interesting presentation from Matthias Luft focused on open-source supply chain security

TL&DR: Can I download whatever stuff I found as my first Google hit and use it in my automation solution? ****, NO!

Matthias covered these topics:

Read more ...

by Ivan Pepelnjak at September 09, 2019 07:10 AM

XKCD Comics

September 08, 2019

Routing Freak

Why we cannot live without a Telco Cloud, and how does one build one?

There are more mobile phone connections (~7.9 billion) than there are humans (~7.7 billion) colonising this planet.

Let me explain.

Clearly, not every person in the world has a mobile device. Here we’re talking about mobile connections that come from people with multiple devices (dual SIMs, tablets) and other integrated devices like cars, and other smart vehicles, and of course the myriad IOT devices. I don’t have to go too far — my electric 2 wheeler has a mobile connection that it uses to cheerfully download the updated firmware version and the software patches every now and then.

While the global population is growing at 1.08% annually, the mobile phone connections are growing at 2.0%. We will very soon be outnumbered by the number of mobile subscriptions, all happily chatting, tweeting and in general sending data over the network. Some of it would need low latency and low jitter, while some may be more tolerant to the delays and jitter.

What’s the big deal with mobile connections growing?

Well, historically most people have used their mobile phones to talk; to catch up on all the gossip about their neighbours and relatives.

Not anymore.

Now, it’s primarily being used to watch video.

And lots of it; both cached and live.

And it will only grow.

Video traffic in mobile networks is forecast to grow by around 34 percent annually up to 2024 to account for nearly three-quarters of mobile data traffic, from approximately 60 percent currently.

Why is the mobile video traffic growing?

The growth is driven by the increase of embedded video in many online applications, growth of video-on-demand (VoD) streaming services in terms of both subscribers and viewing time per subscriber, multiple video sharing platforms, the evolution towards higher screen resolutions on smart devices. All of these factors have been influenced by the increasing penetration of video-capable smart devices.

India had (still has?) the highest average data usage per smartphone at around 9.8 GB per month by the end of 2018.

And the Internet traffic’s not even hit the peak yet.

It will hit the roof once 5G comes in. Will reach dizzying stratospheric heights when mobile content in the Indian vernacular languages comes of age.

India is home to around 19,500 languages or dialects. Every state has its own primary language, which is often alarmingly different from that of the state bordering it. There is a popular Hindi saying:

Kos-kos par badle paani, chaar kos par baani

The languages spoken in India change every few kilometres, as does the taste of the water.

Currently, most of the mobile content is in a few popular Indian languages.

However, that’s changing.

How is the Internet traffic related to the number of languages in India?

According to a Sharechat report, 2018 was the year when for the first time internet users in great numbers accessed social media in their regional languages and participated actively in contributing to user generated content in native languages.

A report by KPMG India and Google claims that Indian language internet users are expected to grow at a CAGR of 18% vs English users at a CAGR of 3%.

This explains a flurry of investments in vernacular content startups in India.

When all these users come online, we are looking at a prodigious growth in the Internet traffic. More specifically, in the user generated traffic, which primarily would be video — again, video that is live or could be cached.

In short, we’re looking at massive quantities of data being shipped at high speeds over the Internet.

And for this to happen, the telco networks need to change.

From the rigid hardware based network to a more agile, elastic, virtualized, cloud based network. The most seismic changes will happen in the service provider network closest to the customer — the edge network. In the Jurassic age, this would have meant more dedicated hardware at the telco edge. However, given the furious rate of innovation, locking into rigid hardware platforms may not be very prudent since the networks will need to support a range of new devices, service types, and use cases. 5G with its enhanced mobile data experience will unleash innovation that’s not possible for most ordinary mortals to imagine today. The networks however need to be ready for that onslaught. They need to be designed to accommodate the agility and the flexibility that is not needed today.

And how do the networks get that agility and flexibility?

I agree with Wally for one.

The telcos will get that flexibility by virtualizing their network functions, and by, ahem, moving it all to the cloud.

Let me explain this.

Every node, every element in a network exists for a reason. It’s there to serve a function (routing, firewall, intrusion detection, etc). All this while we had dedicated, proprietary hardware that was optimized and purpose built to serve that one network function. These physical appliances had to be manually lugged and installed in the networks. I had written about this earlier here and here.

Now, replace this proprietary hardware with a pure software solution that runs on off-the-shelf, x86 based, server grade hardware. One could run this software on a bare metal server or inside a virtual machine running on a hypervisor.

Voilà, you just “virtualized” the “network function”!

This is your VNF.

So much for the fancy acronym.

The networks get that flexibility and agility by replacing the physical appliances with a telco cloud running the VNFs. By bringing the VNFs closer to their customer’s end devices. By distributing the processing, and management of data to micro datacenters at the periphery of the network, closer to the customer end devices. Think of it as content caching 2.0.

The edge cloud will be the first point of contact and a lot of processing will happen there. The telco giants are pushing what’s known as edge computing: where VNFs run on a telco cloud closer to the end user, thereby cutting the distance to a computer making a given decision. These VNFs, distributed across different parts of a network, run at the “edges” of the network.

Because the VNFs run on virtual machines, one could potentially run several such virtual functions on a single hypervisor. Not only does this save on hardware costs, space and power, it also simplifies the process of wiring together different network functions, as it’s all done virtually within a single device/server. The service function chaining got a lot simpler!

While we can run multiple VNFs on a single server, we can also split the VNFs across different servers to gain additional capacity during demanding periods. The VNFs can scale up, and scale down, dynamically, as the demand ebbs and flows.

This just wasn’t possible in the old world where physical network functions (fancy word for network appliances) were used. The telco operators would usually over-provision the network to optimize around the peak demand.

In the new paradigm we could use artificial intelligence and deep learning algorithms to predict the network demand and spin up the VNFs in advance to meet it.

How can machine learning help?

Virtual Network Functions (VNFs) are easy to deploy, update, monitor, and manage. It’s after all just a special workload running on a VM. It takes less than a few seconds to spin a new instance of a VM. 

The number of VNF instances, similar to generic computing resources in cloud, can be easily scaled based on load. Auto-scaling (of resources without human intervention) has been investigated in academia and industry. Prior studies on auto-scaling use measured network traffic load to dynamically react to traffic changes.

There are several papers that explore using a Machine Learning (ML) based approach to perform auto-scaling of VNFs in response to dynamic traffic changes. The ML classifiers learn from past VNF scaling decisions and seasonal/spatial behavior of network traffic load to generate scaling decisions ahead of time. This leads to improved QoS and significant cost savings for the Telco operators.
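To make the idea of scaling ahead of time concrete, here is a small Python sketch. It is not the ML classifier approach from those papers; it swaps in a much simpler seasonal average as the predictor, and the function names, the per-instance capacity and the headroom factor are all assumptions made for illustration.

```python
import math


def forecast_next(load_history, season_length):
    """Predict the next sample as the average of the samples that sit exactly
    one, two, ... seasons before it (a seasonal-naive forecast).

    load_history must be non-empty; with less than one season of data the
    most recent sample is returned instead."""
    idx = len(load_history) - season_length
    samples = []
    while idx >= 0:
        samples.append(load_history[idx])
        idx -= season_length
    if not samples:
        return load_history[-1]
    return sum(samples) / len(samples)


def instances_needed(predicted_gbps, per_instance_gbps, headroom=0.2, minimum=1):
    """Number of VNF instances to have running before the load arrives."""
    target = predicted_gbps * (1 + headroom)
    return max(minimum, math.ceil(target / per_instance_gbps))


# Two "days" of four samples each (Gbps); the next sample is the start of day 3.
history = [10, 20, 30, 40, 12, 22, 32, 42]
predicted = forecast_next(history, season_length=4)
print(predicted, instances_needed(predicted, per_instance_gbps=5))
```

A real system would replace `forecast_next` with a trained model and feed the result into the orchestrator, but the shape of the decision (predict, add headroom, round up to whole instances) stays the same.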

In a 2017 Heavy Reading survey, most respondents said that AI/ML would become a critical part of their network operations by 2020. AI/ML and Big data technologies would play a pivotal role in making real time decisions when managing virtualized 5G networks. I had briefly written about it here.

Is Telco Cloud the same as a data center?

Oh, the two are different.

Performance is the key in Telco Cloud. The workloads running on the Telco Cloud are extremely sensitive to latency and packet loss. A lot of hard work goes into ensuring that the packet reaches the VNF as soon as it hits the server’s NIC. You don’t want the packet to slowly inch upwards through the host’s (it’s almost always Linux) OS before it finally reaches the VM hosting the VNF.

In a datacenter running regular enterprise workloads, a few milliseconds of delay may still be acceptable. However, on a telco cloud, running a VNF, such a delay can be catastrophic.

Linux, and its networking component, is optimized for general purpose computing. This means that achieving high performance networking inside the Linux kernel is not easy, and requires a bit, ok quite a bit, of customization and hacks to get it past the 50K packets per second limit that’s often incorrectly cited as an upper limit for the Linux kernel performance. Routing packets through the kernel may work for the regular data center workloads.

However, the VNFs need something better.

Because the Linux kernel is slow, we need to completely bypass the kernel.

One could start with SR-IOV.

Very simply, with SR-IOV, a VM hosting the VNF has direct access to a subset of PCI resources on a physical NIC. With an SR-IOV compliant driver, the VNF can directly DMA (Direct Memory Access) the outgoing packets to the NIC hardware to achieve higher throughput and lower latency. DMA operation from the device to the VM memory does not compromise the safety of underlying hardware. Intel IO Virtualization Technology (VT-d) supports DMA and interrupt remapping, which restricts the NIC hardware to the subset of physical memory allocated for a particular VM. No hypervisor interaction is needed except for interrupt processing.

However, there is a problem with SR-IOV.

Since the packet coming from the VNF goes out of the NIC unmodified, the telco operators would need some other HW switch, or some other entity to slap on the VxLAN or the other tunneling headers on top of the data packet so that it can reach the right remote VM. You need a local VTEP that all these packets hit when they come out of the NIC.
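For a feel of what “slapping on” the VXLAN header means, here is a minimal Python sketch that builds the 8-byte VXLAN header as laid out in RFC 7348 and prepends it to an inner Ethernet frame. The function names are made up for this example, and the outer Ethernet, IP and UDP headers that a real VTEP also adds are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field carries a valid value


def vxlan_header(vni):
    """Return the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prepend the VXLAN header to an inner Ethernet frame (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


print(vxlan_header(5000).hex())
```

With plain SR-IOV the frame leaves the NIC without this header, which is why something outside the server has to play the VTEP role.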

Having a VTEP outside complicates the design. The operators would like to push the VTEP into the compute, and have a plain IP fabric that only does IP routing. There are ways to solve this problem as well, but SR-IOV also limits potential migration of the VM hosting the VNF from one physical server to another. This is a big problem. If the VM gets locked down, then we lose the flexibility and the agility that we had spoken of before.

Can something else be used?


There’s a bunch of kernel bypass techniques, and I’ll only look at a few.

Intel DPDK (Data Plane Development Kit) has been used in some solutions to bypass the kernel, and then there are newer initiatives such as FD.io (Fast Data Input Output), based on VPP (Vector Packet Processing). More will likely emerge in the future.

DPDK and FD.io move networking into Linux user space to address both speed and technology plug-in requirements. Since these are built in the Linux user space, there are no changes in the Linux kernel. This eliminates the extra effort required to convince the Linux kernel community about the usefulness of the patches, so their adoption can be accelerated.

DPDK bypasses the Linux kernel and manages the NIC and CPU assignment directly. It uses up some CPU cores for the network processing. It has threads that handle the receiving and processing of packets from the assigned receive queues. They do this in a tight loop, and anything interrupting these threads can cause packets to be dropped. That is why these threads must run on dedicated CPU cores; that is, no other threads — including the various Linux kernel tasks — in the system should run on this core.

Telcos consider this a “waste” of their CPU cores. The cores that could have run the VNFs have now been hijacked by DPDK to process packets from the NICs. It’s also questioned whether we can get a throughput of 100Gbps and beyond with DPDK and other kernel bypass techniques. It might be asinine to dedicate 30 CPU cores in a 32 core server for packet processing, leaving only 2 cores for the VNF.

Looks almost impossible to get the 100Gbps+ that’s needed for NFV.
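Some back-of-the-envelope arithmetic shows why the core count becomes a worry. The sketch below works out the packet rate of a saturated link from the frame size plus the 20 bytes of preamble and inter-frame gap that Ethernet adds on the wire, then divides by a per-core forwarding rate. The per-core figure of 10 million packets per second is purely an assumption for illustration, not a measured DPDK number.

```python
import math

# Preamble and start-of-frame delimiter (8 bytes) plus inter-frame gap (12 bytes).
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_bps, frame_bytes):
    """Packet rate of a fully loaded link carrying frames of one size."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def cores_needed(link_bps, frame_bytes, per_core_pps):
    """Dedicated polling cores required at an assumed per-core packet rate."""
    return math.ceil(packets_per_second(link_bps, frame_bytes) / per_core_pps)


ASSUMED_PER_CORE_PPS = 10e6  # hypothetical figure, tune for your hardware

for frame in (64, 1518):
    pps = packets_per_second(100e9, frame)
    print(frame, round(pps), cores_needed(100e9, frame, ASSUMED_PER_CORE_PPS))
```

Small packets are what hurt: at minimum-size frames a 100Gbps link carries roughly 148.8 million packets per second, so the number of cores burned on polling grows quickly unless the per-core rate is very high.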

Fortunately, no — things are a lot better.

Enter SmartNICs: the brainier cousin of the regular NICs, or rather NICs on steroids. These days there are a lot more brains in the modern NIC, or at least some of them, than we might realize. They take the offloading capabilities to a whole new level. The NIC vendors are packing a lot of processors into their NIC ASICs to beef up their intelligence. Mellanox’s ConnectX-5 adapter card, which is widely deployed by hyperscalers, has six different processors built into it that were designed by Mellanox.

Ok, so these are not CPUs in the normal sense, the ones you and I understand. These are purpose built to allow the NIC to, for instance, look at fields in the incoming packets, look at the IP headers and the Layer 2 headers and look at the outer encapsulated tunnel packets for VXLAN and from that do flow identification and then finally take actions.
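The look-at-the-headers, identify-the-flow, take-an-action sequence is essentially a match-action table. Here is a toy Python sketch of that idea; the field names, the action strings and the `FlowTable` class are all invented for illustration and do not mirror any vendor’s NIC API.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Fields parsed from the outer tunnel header and the inner packet."""
    vni: int
    src_ip: str
    dst_ip: str
    dst_port: int


class FlowTable:
    """Exact-match flow table: hits are handled in 'hardware', misses go to software."""

    def __init__(self):
        self._rules = {}
        self.misses = 0

    def install(self, key, action):
        """Install (or overwrite) the action for one flow."""
        self._rules[key] = action

    def lookup(self, key):
        """Return the action for a flow, or punt it to the software path on a miss."""
        action = self._rules.get(key)
        if action is None:
            self.misses += 1
            return "punt-to-software"
        return action


table = FlowTable()
web_flow = FlowKey(vni=5000, src_ip="10.0.0.1", dst_ip="10.0.0.2", dst_port=443)
table.install(web_flow, "decap-and-forward-to-vf3")

print(table.lookup(web_flow))
print(table.lookup(web_flow._replace(dst_port=22)))
```

The first packet of an unknown flow takes the slow path; once a rule is installed, later packets of that flow never have to leave the fast path.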

This is history repeating itself.

Many many years ago, when dinosaurs still ruled the Earth, Cisco would use a MIPS processor to forward packets in software. And then the asteroid hit the Earth, and Cisco realized that to make the packet routing and forwarding more efficient and for it to scale, they needed custom ASICs, and they started making chips to forward packets.

This is exactly what is now happening in the Telco Cloud space. Open vSwitch was pure software that steered the data between individual virtual machines and routed it, but the performance and scalability were so bad that companies started asking why some of the processing couldn’t be offloaded to the hardware. Perhaps down to the NIC, if you will. And that’s what the latest and the greatest SmartNICs do. You can download the OVS rules onto the NIC cards so that you completely bypass the Linux kernel and do all that heavy lifting in hardware.

DPDK and SmartNICs are very interesting and warrant a separate post, which I will do in some time.

So, what is the conclusion?

Aha, I meandered. I often do when I’m very excited.

The Internet traffic is exploding. It’s nowhere near the saturation point, and will increase manifold with 5G and other technologies coming in.

The Telco network can only scale if it’s virtualized, à la the telco cloud. A pure hardware based, old-style network, especially at the edges, will fail miserably. It will not be able to keep pace with the rapid changes that 5G will bring in. Pure hardware will still rule in the network core. Not at the edge. The edge cloud is where most of the innovation (AI/ML, kernel bypass in software, SmartNICs, newer offloading capabilities, etc) will take place.

Telco cloud is possible. We have all the building blocks, today. We have the technology to virtualize, to ship packets at 100-200 Gbps to (and from) the VMs running the VNFs. Imagine the throughput that a rack full of commodity x86 servers, where each does 200Gbps, will get you.

I am very excited about the technology trendlines and the fact that what we’re working on at Nuage Networks is completely in line with where the networking industry is headed.

I am thoroughly enjoying this joyride. How about you?

by Manav Bhatia at September 08, 2019 04:05 PM

September 06, 2019

The Networking Nerd

IT Hero Culture

I’ve written before about rock stars and IT super heroes. We all know or have worked with someone like this in the past. Perhaps we still do have someone in the organization that fits the description. But have you ever stopped to consider how it could be our culture that breeds the very people we don’t want around?

Keeping The Lights On

When’s the last time you got recognition for the network operating smoothly? Unless it was in response to a huge traffic spike or an attack that tried to knock you offline, the answer is probably never or rarely. Despite the fact that networks are hard to build and even harder to operate, we rarely get recognized for keeping the lights on day after day.

It’s not all that uncommon. The accounting department doesn’t get recognized when the books are balanced. The janitorial staff doesn’t get an exceptional call out when the floors are mopped. And the electric company doesn’t get a gold star because they really did keep the lights on. All of these things are examples of expected operation. When we plug something into a power socket, we expect it to work. When we plug a router in, we expect it to work as well. It may take more configuration to get the router working than the electrical outlet, but that’s just because the hard work of the wiring has already been done.

The only time we start to notice things is when they’re outside our expectation. When the accounting department’s books are wrong. When the floors are dirty. When the lights aren’t on. We’re very quick to notice failure. And, often, we have to work very hard to minimize the culture that lays blame for failure. I’ve already talked a lot about things like blameless post-mortems and other ways to attack problems instead of people. Companies are embracing the idea that we need to fix issues with our systems and not shame our people into submission for things they might not have had complete control over.

Put On The Cape

Have you ever thought about what happens in the other direction, though? I can speak from experience because I spent a lot of time in that role. As a senior engineer from a VAR, I was often called upon to ride in and save the day. Maybe it was after some other company had tried to install something and failed. Or perhaps it was after one of my own technicians had created an issue that needed to be resolved. I was ready on my white horse to ride in and save the day.

And it felt nice to be recognized for doing it! Everyone feels a bit of pride when you are the person to fix an issue or get a site back up and running after an outage. Adulation is a much better feeling than shame without a doubt. But it also beats apathy. People don’t get those warm fuzzy feelings from just keeping the lights on, after all.

The culture we create that worships those that resolve issues with superhuman skill reinforces the idea that those traits are desirable in engineers. Think about which person you’d rather have working on your network:

  • Engineer A takes two months to plan the cutover and wants to make sure everything goes smoothly before making it happen.
  • Engineer B cuts over with very little planning and then spends three hours of the maintenance window getting all the systems back online after a bug causes an outage. Everything is back up and running before the end of the window.

Almost everyone will say they want Engineer A working for them, right? Planning and methodical reasoning beats a YOLO attitude any day of the week. But who do we recognize as the rockstar with special skills? Probably Engineer B. Whether or not they created their own issue they are the one that went above and beyond to fix it.

We don’t reward people for producing great Disaster Recovery documentation. We laud them for pulling a 36-hour shift to rebuild everything because there wasn’t a document in the first place. We don’t recognize people that spend an extra day during a wireless site survey to make sure they didn’t miss anything in a warehouse. But we really love the people that come in after-the-fact and spend countless hours fixing it.

Acknowledging Averages

Should we stop thanking people for all their hard work in solving problems? No. Because failure to appreciate true skills in a technical resource will sour them on the job quickly. But, if we truly want to stop the hero worshipping behavior that grows from IT, we have to start acknowledging the people that put in hard work day after day to stay invisible.

We need to give a pat on the back to an engineer that built a good script to upgrade switches. Or to someone that spent a little extra time making sure the site survey report covered everything in detail. We need to help people understand that it’s okay to get your job done and not make a scene. And we have to make sure that we average out the good and the bad when trying to ascertain root cause in outages.

Instead of lauding rock stars for spending 18 hours fixing a routing issue, let’s examine why the issue occurred in the first place. This kind of analysis often happens when it’s a consultant that has to fix the issue since a cost is associated with the fix, but it rarely happens in IT departments in-house. We have to start thinking of the cost of this rock star or white knight behavior as being something akin to money or capital in the environment.

Tom’s Take

Rock star culture and hero worship in IT isn’t going to stop tomorrow. It’s because we want to recognize the people that do the work. We want to hold up those that go above and beyond as the people we want others to emulate. But we should also be asking hard questions about why a hero was necessary in the first place. And we have to be willing to share some of the adulation with those that keep the lights on between disasters that need heroes.

by networkingnerd at September 06, 2019 05:13 PM Blog (Ivan Pepelnjak)

Intent-Based Networking with Batfish on Software Gone Wild

Imagine you had a system that would read network device configurations, figure out how those devices might be connected, reverse-engineer the network topology, and be able to answer questions like “what would happen if this link fails” or “do I have a fully-redundant network” or even “how will this configuration change impact my network”. Welcome to Batfish.

Interested? You’ll find more in Episode 104 of Software Gone Wild.

by Ivan Pepelnjak at September 06, 2019 06:08 AM

XKCD Comics

September 05, 2019

Network Design and Architecture

Fast Reroute, Fast Convergence, WRED and WFQ

Fast Reroute, Fast Convergence, WRED and WFQ. You may wonder why Orhan is putting all these mechanisms together. I will give you an analogy. Those who attend my talks know that I love using analogies. Before we try to understand how these mechanisms are related to each other, let me explain what …

The post Fast Reroute, Fast Convergence, WRED and WFQ appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 05, 2019 07:16 PM

My Etherealmind
Network Design and Architecture

What is AIGP – Accumulated IGP Metric Attribute? Where AIGP is used?

What is AIGP – Accumulated IGP Metric Attribute? Where AIGP is used? AIGP stands for Accumulated IGP Metric Attribute which is specified in RFC 7311. IGPs (Interior Gateway Protocols) are designed to run within a single administrative domain and they make path-selection decisions based on metric values. This post is written based on the information in BGP …

The post What is AIGP – Accumulated IGP Metric Attribute? Where AIGP is used? appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 05, 2019 12:57 PM

What is OTT – Over the Top mean? OTT Providers

What is OTT – Over the Top and How OTT Providers Work? Over the Top is a term used to refer to Content Providers. So, when you hear Over the Top Providers, they are Content Providers. Content can be any application, any service such as Instant messaging services (Skype, WhatsApp), streaming video services (YouTube, Netflix, …

The post What is OTT – Over the Top mean? OTT Providers appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 05, 2019 11:27 AM

What is Carrier Hotel?

What is Carrier Hotel? Carrier Hotel is a Company that owns large buildings and rents out redundant power and floor space. And of course, attracts many Telcos and Carrier networks to the building. Carrier Hotel often leases off large chunks of space to Service Providers or Enterprises. These companies operate the space as a datacenter …

The post What is Carrier Hotel? appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 05, 2019 11:17 AM Blog (Ivan Pepelnjak)

Measure Twice, Cut Once: Ansible net_interface

As I was preparing the materials for Ansible 2.7 Update webinar sessions I wanted to dive deeper into declarative configuration modules, starting with “I wonder what’s going on behind the scenes”

No problem: configure EEM applet command logging on Cisco IOS and execute an ios_interface module (more about that in another blog post)

Next step: let’s see how multi-platform modules work. Ansible has a net_interface module that’s supposed to be used to configure interfaces on many different platforms, significantly simplifying Ansible playbooks.

Read more ...

by Ivan Pepelnjak at September 05, 2019 05:51 AM

September 04, 2019

Network Design and Architecture

Is Protocol Independent Multicast (PIM) really Protocol Independent?

Is Protocol Independent Multicast (PIM) really Protocol Independent? What is that dependency? Does PIM require IP or can it work with non-IP? If you don’t know about PIM, please have a look here and here. One of my students asked whether PIM requires IP (Internet Protocol), which triggered me to share the …

The post Is Protocol Independent Multicast (PIM) really Protocol Independent? appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 04, 2019 06:41 PM

What is Optimal Routing and Suboptimal Routing in Networking

What is Optimal Routing and Suboptimal Routing in Networking? This may seem very easy to some of you, but let’s philosophize a little bit; that is, let’s design around optimal routing. Network engineers know that one of the tradeoffs in network design is Optimal Routing. We want our application traffic to follow Optimal …

The post What is Optimal Routing and Suboptimal Routing in Networking appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 04, 2019 12:16 PM

Recommended Networking Resources for September 2019 First Week

I would like to share with you every week some networking resources: videos, articles, books, diagrams, other websites etc. Whatever I believe can be useful for computer network engineers, mobile network providers, satellite engineers, transmission experts, datacenter engineers, basically whatever I am interested in and I like, …

The post Recommended Networking Resources for September 2019 First Week appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 04, 2019 10:55 AM

BGP Optimal Route Reflection – BGP ORR

BGP Optimal Route Reflection provides Optimal Routing for the Route Reflector Clients without sending all available paths. I recommend you read this post if you don’t know about BGP Route Reflector. If you are looking to learn BGP starting from Zero to Hero, Click Here. Service Providers mostly prefer Hot Potato Routing in their …

The post BGP Optimal Route Reflection – BGP ORR appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 04, 2019 10:01 AM Blog (Ivan Pepelnjak)

September 03, 2019

Network Design and Architecture

Will CCDE Practical Exam (Lab) Change in 2020?

Will CCDE Exam (Lab) change in 2020? I have been receiving this question again and again after Cisco’s announcement on Cisco certification exam changes. The short answer is NO. A slightly longer answer is that it will not change in February 2020, and in fact it has been the only design certification for many years. (Cisco I …

The post Will CCDE Practical Exam (Lab) Change in 2020? appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 03, 2019 07:47 PM

Technologies and the protocols may not be used for what they were intended

I was reading a book today called Deploying QoS for Cisco IP and NGN networks, which I can recommend for the history and future of QoS in the networking industry. There were a couple of paragraphs in the book which led me to share my thoughts about the protocols/technologies and their usage. In the book, as …

The post Technologies and the protocols may not be used for what they were intended appeared first on Cisco Network Design and Architecture | CCDE Bootcamp |

by Orhan Ergun at September 03, 2019 07:27 PM

Networking Now (Juniper Blog)

Defenses and Dangers: Understanding Your Data Protection Needs

The value of business data is constantly changing, and more so every year. Data used to consist of a few names and email addresses, some purchasing information and some contact details gleaned from badge-scans at the last event attended. As such, it was only valuable at certain times and rarely viewed as business critical. Fast forward to today, this has changed. Now, the quality and quantity of data that is stored directly contributes to business success on many different levels. Some organizations are ‘mining data’ for information or creating ‘data lakes’ using customer data which can then be used to understand purchasing patterns and attempt to predict future opportunities.

by lpitt at September 03, 2019 01:00 PM Blog (Ivan Pepelnjak)

If You Have to Simulate Your Whole Network, You're Doing It Wrong

This blog post was initially sent to subscribers of my SDN and Network Automation mailing list. Subscribe here.

Have you ever seen a presentation in which a startup is telling you how awesome their product is because it allows you to simulate your whole network in a virtual environment? Not only that, you can use that capability to build a test suite and a full-blown CI/CD pipeline and test whether your network works every time you make a change to any one box in the network.

Sounds awesome, right? It’s also dead wrong. Let me explain why that’s the case.

Read more ...

by Ivan Pepelnjak at September 03, 2019 05:03 AM

XKCD Comics


This comic best viewed on

September 03, 2019 12:00 AM

September 02, 2019 Blog (Ivan Pepelnjak)

Just Published: NSX-T Technical Deep Dive Slide Deck

Last year when I was creating the first version of VMware NSX Deep Dive content, NSX-V was mainstream and NSX-T was the new kid on the block. A year later NSX-V is mostly sidelined, and all the development efforts are going into NSX-T. Time to adapt the webinar to the new reality… taking the usual staged approach:

by Ivan Pepelnjak at September 02, 2019 05:43 AM

XKCD Comics

New book: How To

Hey there!

I'm excited to announce that my new book, How To, will be going on sale in a few hours!

I'm really proud of this book. It features information on everything from opening water bottles with nuclear weapons to how to be on time for meetings by altering the rotation of the Earth. It also includes real-life tips and advice from a number of experts who generously lent their time, including Col. Chris Hadfield and Serena Williams.

You can order it now on Amazon, Barnes & Noble, IndieBound, and Apple Books.

September 02, 2019 12:00 AM

August 30, 2019

The Networking Nerd

Positioning Policy Properly

Who owns the network policy for your organization? How about the security policy? Identity policy? Sound like easy questions, don’t they? The first two are pretty standard. The last generally comes down to one or two different teams depending upon how much Active Directory you have deployed. But have you ever really thought about why?

During Future:NET this week, those poll questions were asked to an audience of advanced networking community members. The answers pretty much fell in line with what I was expecting to see. But then I started to wonder about the reasons behind those decisions. And I realized that in a world full of cloud and DevOps/SecOps/OpsOps people, we need to get away from teams owning policy and have policy owned by a separate team.

Specters of the Past

Where does the networking policy live? Most people will jump right in with a list of networking gear. Port profiles live on switches. Routing tables live on routers. Networking policy is executed in hardware. Even if the policy is programmed somewhere else.

What about security policy? Firewalls are probably the first thing that come to mind. More advanced organizations have a ton of software that scans for security issues. Those policy decisions are dictated by teams that understand the way their tools work. You don’t want someone that doesn’t know how traffic flows through a firewall to be trying to manage that device, right?

Let’s consider the identity question. For a multitude of years the identity policy has been owned by the Active Directory (AD) admins, because identity was always closely tied to the server directory system. Novell (now NetIQ) eDirectory and Microsoft AD were the kings of the hill when it came to identity. Today’s world has so much distributed identity that it’s been handed to the security teams to help manage. AD doesn’t control the VPN concentrator or the cloud-enabled email services all the time. There are identity products specifically designed to aggregate all this information and manage it.

But let’s take a step back and ask that important question: why? Why is it that the ownership of a policy must be by a hardware team? Why must the implementors of policy be the owners? The answer is generally that they are the best arbiters of how to implement those policies. The network teams know how to translate applications into ports. Security teams know how to create firewall rules to implement connection needs. But are they really the best people to do this?

Look at modern policy tools designed to “simplify” networking. I’ll use Cisco ACI as an example but VMware NSX certainly qualifies as well. At a very high level, these tools take into account the needs of applications to create connectivity between software and hardware. You create a policy that allows a database to talk to a front-end server, for example. That policy knows what connections need to happen to get through the firewall and also how to handle redundancy to other members of the cluster. The policy is then implemented automatically in the network by ACI or NSX and magically no one needs to touch anything. The hardware just works because policy automatically does the heavy lifting.
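To make that idea a little more concrete, here is a minimal sketch in Python of what "policy does the heavy lifting" means. It is not how ACI or NSX actually model or push policy; the `Policy` class, the `render_rules` function, the tier names, and the addresses are all made up for illustration. The point is only that one statement of intent gets expanded into the per-host rules nobody has to type by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical application-level intent: which tier may talk to which."""
    name: str
    source_tier: str
    dest_tier: str
    protocol: str
    port: int


# Hypothetical inventory mapping application tiers to host addresses.
TIERS = {
    "front-end": ["10.0.1.10", "10.0.1.11"],
    "database": ["10.0.2.20"],
}


def render_rules(policy, tiers):
    """Expand one intent statement into a permit rule per source/destination pair."""
    return [
        f"permit {policy.protocol} {src} -> {dst}:{policy.port}"
        for src in tiers[policy.source_tier]
        for dst in tiers[policy.dest_tier]
    ]


web_to_db = Policy("web-to-db", "front-end", "database", "tcp", 5432)
rules = render_rules(web_to_db, TIERS)
for rule in rules:
    print(rule)
```

Add another front-end server to the inventory and the rule set grows on its own. Nobody logs into a box to make that happen, which is exactly why the question of who owns the policy gets interesting.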

So let’s step back for a moment and discuss this. Why does the networking team need to operate ACI or NSX? Sure, it’s because those devices still touch hardware at some point like switches or routers. But we’ve abstracted the need for anyone to actually connect to a single box or a series of boxes and type in lines of configuration that implement the policy. Why does it need to be owned by that team? You might say something about troubleshooting. That’s a common argument that whoever needs to fix it when it breaks is who needs to be the gatekeeper implementing it. But why? Is a network engineer really going to SSH into every switch and correct a bad application tag? Or is that same engineer just going to log into a web console and fix the tag once and propagate that info across the domain?

Ownership of policy isn’t about troubleshooting. It’s about territory. The tug-of-war to classify a device when it needs to be configured is all about collecting and consolidating power in an organization. If I’m the gatekeeper of implementing workloads then you need to pay tribute to me in order to make things happen.

If you don’t believe that, ask yourself this: If there was a Routing team and a Switching team in an organization, who would own the routed SVI interface on a layer 3 switch? The switching team has rights because it’s on their box. The routing team should own it because it’s a layer 3 construct. Both are going to claim it. And both are going to fight over it. And those are teams that do essentially the same job. When you start pulling in the security team or the storage team or the virtualization team you can see how this spirals out of control.

Vision of the Future

Let’s change the argument. Instead of assigning policy to the proper hardware team, let’s create a team of people focused on policy. Let’s make sure we have proper representation from every hardware stack: Networking, Security, Storage, and Virtualization. Everyone brings their expertise to the team for the purpose of making policy interactions better.

Now, when someone needs to roll out a new application, the policy team owns that decision tree. The Policy Team can have a meeting about which hardware is affected. Maybe we need to touch the firewall, two routers, a switch, and perhaps a SAN somewhere along the way. The team can coordinate the policy changes and propose an implementation plan. If there is a construct like ACI or NSX to automate that deployment then that’s the end of it. The policy is implemented and everything is good. Perhaps some older hardware exists that needs manual configuration of the policy. The Policy Team then contacts the hardware owner to implement the policy needs on those devices. But the Policy Team still owns that policy decision. The hardware team is just working to fulfill a request.
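Here is a rough sketch in Python of that decision tree, under some loud assumptions: the `Device` class, the `automated` flag, and the device names are hypothetical, and a real rollout would involve a lot more than two lists. It only shows the split described above, where the Policy Team pushes what can be automated and sends requests to hardware owners for the rest.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    owner: str        # hardware team that operates the box
    automated: bool   # True if an automation construct can push policy to it


def plan_rollout(devices):
    """Split affected devices into automated pushes and manual requests per owner."""
    pushes = [d.name for d in devices if d.automated]
    requests = {}
    for d in devices:
        if not d.automated:
            requests.setdefault(d.owner, []).append(d.name)
    return pushes, requests


# Hypothetical list of devices touched by one new application policy.
affected = [
    Device("fw-1", "Security", True),
    Device("rtr-1", "Networking", True),
    Device("rtr-2", "Networking", False),
    Device("san-1", "Storage", False),
]

pushes, requests = plan_rollout(affected)
print("push automatically:", pushes)
print("request from owners:", requests)
```

Either way the decision stays in one place. The hardware teams only show up in the second list, as the people fulfilling a request.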

Extend the metaphor past hardware now. Who owns the AWS network when your workloads move to the cloud? Is it still the networking team? They’re the best team to own the network, right? Except there are no switches or routers. They’re all software as far as the instance is concerned. Does that mean your cloud team is now your networking team as well? Moving to the cloud muddies the waters immensely.

Let’s step back into the discussion about the Policy Team. Because they own the policy decisions, they also own that policy when it changes hardware or location. If those workloads for email or productivity suite move from on-prem to the cloud then the policy team moves right along with them. Maybe they add a public cloud person to the team to help them interface with AWS but they still own everything. That way, there is no argument about who owns what.

The other beautiful thing about this Policy Team concept is that it also allows the rest of the hardware to behave as a utility in your environment. Because the teams that operate networking or security or storage are just fulfilling requests from the policy team they don’t need to worry about anything other than making their hardware work. They don’t need to get bogged down in policy arguments and territorial disputes. They work on their stuff and everyone stays happy!

Tom’s Take

I know it’s a bit of a stretch to think about pulling all of the policy decisions out of the hardware teams and into a separate team. But as we start automating and streamlining the processes we use to implement application policy the need for it to be owned by a particular hardware team is hard to justify. Cutting down on cross-faction warfare over who gets to be the one to manage the new application policy means enhanced productivity and reduced tension in the workplace. And that can only lead to happy users in the long run. And that’s a policy worth implementing.

by networkingnerd at August 30, 2019 04:00 PM Blog (Ivan Pepelnjak)

Video: Introducing Transmission Technologies

After discussing the challenges one encounters even in the simplest networking scenario, connecting two computers with a cable, we took a short diversion into an interesting complication: what if the two computers are far apart and we can’t pull a cable between them?

Trying to answer that question we entered the wondrous world of transmission technologies. It’s a topic one can spend a whole life exploring and mastering, so we were not able to do more than cover the fundamentals of modulations and multiplexing technologies.

You need a free subscription to watch the video, or a paid subscription to watch the rest of the webinar.

by Ivan Pepelnjak at August 30, 2019 04:53 AM

XKCD Comics

August 29, 2019 Blog (Ivan Pepelnjak)

Upcoming Events and Webinars (September 2019)

We’re back from the summer break for real - the first autumn 2019 event takes place today: I’ll talk about the fallacies of distributed computing.

September will be an intensive month:

Of course, we’ll keep going… our event calendar is fully packed till mid-November. More about that in a month.

by Ivan Pepelnjak at August 29, 2019 05:10 AM

August 28, 2019 Blog (Ivan Pepelnjak)

Updated: Never-Ending Story of IP Fragmentation

In the mid-2000s I wrote a number of articles describing various TCP/IP features. Most of them are a bit outdated, so I decided to clean up, update and repost the most interesting ones on, starting with Never-Ending Story of IP Fragmentation.

The first part of that article is already online, covering MTU basics and drawbacks of IP fragmentation.

by Ivan Pepelnjak at August 28, 2019 04:59 AM