October 19, 2018

IPEngineer.net

Automation: Flow Control & Dimensionality

As human beings, we sometimes struggle to think multi-dimensionally about tasks. Our brains seem to have a conscious layer and a sub-conscious layer. Whether you think in words, noise or images, your brain is a single-threaded engine with a silent co-processor that can either assist or annoy. Experience has shown that we look at network automation challenges through this lens and try to solve them in ways that make sense to humans, but not necessarily to mechanized processes.

In an attempt not to lose my own thread, I’ll try to explain some different viewpoints through examples.

Example One: I’m English, Make me some Tea!

Making a cup of tea is a very English thing to do, and the process of making one will suffice for this example.

Let’s look at the process involved:

// { type: activity}
(Start)-><a>[kettle empty]->(Fill Kettle)->|b|
<a>-(note: Kettle activities)
<a>[kettle full]->|b|->(Boil Kettle)->|c|
|b|->(Add Tea Bag)-><d>[Sugar: yes]->(Add Sugar)->(Add Milk)
<d>[Sugar: no]->(Add Milk)
<d>-(note: Sweet tooth?)
(Add Milk)->|c|->(Pour Boiled Water)
(Pour Boiled Water)->(Enjoy)->(Stop)

Fig.1

This makes us a relatively standard cup of English breakfast tea.

Let’s assume macros exist for milk and sugar quantity and that the choice between a mug and the best china has been dealt with.

Let’s analyze from a flow control perspective.

  1. Human wants tea
  2. Invoke start
  3. Check if kettle needs water
  4. If kettle needs water, fill it, else;
  5. Boil kettle
  6. Add tea bag
  7. If I want sugar, add it (decision)
  8. Add milk (Ok, some would argue the milk will be scalded)
  9. Pour boiled water
  10. Drink

There are interesting points in this flowchart worth pointing out. Boiling the kettle is a long task, so we can do the other tasks in parallel whilst the kettle boils. Even so, adding the tea bag, sugar and milk are still sequential tasks. So we have one long-lived task and several short sequential tasks, giving the impression of efficiency. All tasks then merge and wait for the kettle to finish boiling. Note that no question was asked about whether the water has boiled, as the statement implies you pour boiled water! It can’t be poured boiled water if it hasn’t boiled! The “Pour Boiled Water” step can be represented in a self-contained flowchart and, for the purposes of this post, is also an asynchronous function.
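As a rough illustration of the flow described above, here is a minimal Python sketch using asyncio. The function names and timings are assumptions made for the example, not part of the original flowchart. The long-lived boil runs alongside the short sequential mug tasks, and pouring only happens once both have finished.

```python
import asyncio

async def boil_kettle(log, seconds=0.05):
    # Long-lived task: the sleep stands in for the kettle boiling.
    log.append("kettle on")
    await asyncio.sleep(seconds)
    log.append("kettle boiled")

async def prepare_mug(log, sugar):
    # Short tasks that remain strictly sequential.
    steps = ["tea bag"] + (["sugar"] if sugar else []) + ["milk"]
    for step in steps:
        log.append(f"add {step}")
        await asyncio.sleep(0)  # hand control back to the event loop

async def make_tea(sugar=True):
    log = []
    # Both branches run concurrently and merge here (the |c| join in Fig.1).
    await asyncio.gather(boil_kettle(log), prepare_mug(log, sugar))
    log.append("pour boiled water")
    return log

tea_log = asyncio.run(make_tea())
print(tea_log)
```

The join is what guarantees the water is boiled before it is poured; no explicit “has it boiled?” question appears in the main flow.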

// { type: activity}
(Start)->[Yes]->(Pour Boiling Water)
-[No]->
(note: Has kettle light gone off?)->

Fig.2

Flow control in this flowchart is sequential in a single time dimension. We could optimize this and the argument against it would be the classic “optimizing the point of constraint”. For one cup of tea, the main flow-chart illustrated in this section is good enough.

Example Two: Surprise Family Visit Tea Factory

It’s a Sunday and the family have arrived. Ten of them. Each takes their tea differently and with different tea bags. Although the process is ultimately the same, we now have multiple kettles to cope with the water quantity and a host of ingredients. A sequential model just won’t cut it, as the family will moan that their beverages are all different temperatures. Some tasks are sequential and single-dimensional, while other tasks can run in multiple dimensions, but all with state injected from the originating dimension.

Two major trains of thought exist here:

  1. Orchestrated Workflows
  2. Autonomous Workflows

Orchestrated Workflows are driven from a central “brain” and workflows are typically single threaded with most platforms having the ability to spawn branches if instructed to do so by the workflow creator. Platforms like StackStorm or Salt serve these needs.

In the case of our tea making exercise, a single workflow is triggered, which may spawn jobs both parallel and sequential to boil the kettles, prepare each mug and pour the boiled water. This relies on jobs being spawned, and on the orchestrator waiting for each job to exit successfully before carrying on with the next task. That last sentence may need some explaining: one cannot pour boiling water until the cups have been appropriately laced with tea, sugar and milk (according to the person’s order).

Autonomous Workflows are a little more complicated. These workflows can split from and join each other and can be viewed as sharing data. The whole workflow is applicable to one cup of tea, and as a result, the workflow starts with an identification of all ten cups of tea. Some actions are designated to run once, or fewer times than the ten cups of tea. Imagine that, programmatically speaking, all ten tea-making workflows live in a list. The first two entries in the list might be responsible for boiling the kettles, although each workflow has the kettle task. The first entry might be responsible for asking and confirming the type of each drink and the quantities of milk and sugar. These details are available to all workflows (as the workflows are identical) and the only distinguishing item is a spawn $ID, in this case from 0-9. Therefore imagine a “mandate” that looks like the one below:

drinks:
    0:  # Drink ID
        kind: "english breakfast"
        sugar: 1
        milk: "whole"
        water_from_kettle: 0
        for: "Aunt Dorris"
    1:  # Drink ID
        kind: "earl grey"
        sugar: 0
        milk: "lemon juice"
        water_from_kettle: 1
        for: "Uncle Dave"
    3:  # Etc
        ...

kettle:
    workflow_to_boil:
        -   0
        -   1

Fig.3

It’s possible for each instance of the workflow to get relevant information from the data structure about what it is expected to do, such as which drink to make and whether it is responsible for boiling a kettle. Some workflows may require automatic spawning of kettles and quantities of hot water, but for this example this mandate will do just fine.
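To make the lookup concrete, here is a small Python sketch in which an instance uses its own ID to index into the mandate. The dictionary mirrors the two entries shown in Fig.3; the function name `plan_for` and the shape of its return value are assumptions made for illustration.

```python
# The mandate from Fig.3, limited to the two drinks spelled out there.
MANDATE = {
    "drinks": {
        0: {"kind": "english breakfast", "sugar": 1, "milk": "whole",
            "water_from_kettle": 0, "for": "Aunt Dorris"},
        1: {"kind": "earl grey", "sugar": 0, "milk": "lemon juice",
            "water_from_kettle": 1, "for": "Uncle Dave"},
    },
    "kettle": {"workflow_to_boil": [0, 1]},
}

def plan_for(instance_id, mandate=MANDATE):
    """Return what a single workflow instance is expected to do."""
    drink = mandate["drinks"][instance_id]
    return {
        "drink": drink,
        "kettle": drink["water_from_kettle"],
        # Only instances listed under workflow_to_boil boil a kettle.
        "boils_kettle": instance_id in mandate["kettle"]["workflow_to_boil"],
    }

print(plan_for(0))
```

Every instance runs identical logic; only the ID passed in changes what it does.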

The workflow has to be crafted very carefully to ensure the correct decisions are made at the correct time. Execution style can, however, change from platform to platform. Ansible will run each task sequentially, dealing with the specifics of parallelism when required. For example, one task might be for two of the ‘hosts’ (in our case, drink IDs) to start boiling the kettles. Other tasks, like loading the mugs, are then performed, and a looped check might run until the kettles have boiled and both pouring tasks can be completed. This is asynchronous activity versus synchronous activity, and it has to be managed correctly with correct exit, error and re-try logic.
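The looped check can be sketched generically in Python. This is not how Ansible implements it; `wait_until` and `FakeKettle` are hypothetical names used only to show a bounded poll with an explicit error exit.

```python
import time

def wait_until(check, attempts=5, delay=0.01):
    """Poll check() until it returns True; fail loudly when attempts run out."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt          # success exit: report how many polls it took
        time.sleep(delay)           # back off before the next poll
    raise TimeoutError(f"condition not met after {attempts} attempts")

class FakeKettle:
    """Stand-in for a real kettle: reports boiled after a fixed number of polls."""
    def __init__(self, polls_until_boiled):
        self.remaining = polls_until_boiled

    def has_boiled(self):
        self.remaining -= 1
        return self.remaining <= 0

attempts_used = wait_until(FakeKettle(3).has_boiled)
print(attempts_used)
```

Bounding the loop matters: a kettle that never boils should surface as an error, not as a workflow that hangs forever.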

Networks have a habit of failing mid-transition and landing in states that require human intervention, so any good workflow will pair careful mutation (careful meaning “mutate”, then “verify”) with a “hammer time” revert.
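A minimal Python sketch of the mutate, verify and revert pattern follows. The state dictionary, the `mtu` key and the limit used in the check are invented purely for illustration; a real workflow would talk to devices rather than a dict.

```python
import copy

def careful_change(state, mutate, verify):
    """Apply mutate(state); keep it only if verify(state) passes, else revert."""
    snapshot = copy.deepcopy(state)   # what we fall back to
    mutate(state)
    if verify(state):
        return True
    # "Hammer time": throw away the change and restore the snapshot.
    state.clear()
    state.update(snapshot)
    return False

def mtu_is_sane(state):
    # Hypothetical check: accept only MTU values up to 9216.
    return state["mtu"] <= 9216

interface = {"mtu": 1500}
good = careful_change(interface, lambda s: s.update(mtu=9000), mtu_is_sane)
bad = careful_change(interface, lambda s: s.update(mtu=99999), mtu_is_sane)
print(good, bad, interface)
```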

Here is what a modified version of the workflow may look like:

// {type: activity}
(Start)->(Get Drink Mandate)
(Get Drink Mandate)-><a>[kettle $ID empty]->(Fill Kettle $ID)->|b|
<a>-(note: Kettle activities)
<a>[kettle $ID full]->|b|->(Boil Kettle $ID)->|c|
|b|->(Add Tea Bag to $ID)-><d>[$ID Sugar: yes]->(Add Sugar to $ID)->(Add Milk to $ID)
<d>[$ID Sugar: no]->(Add Milk to $ID)
(Add Milk to $ID)->|c|->(Pour Boiled Water from kettle $ID)
(Pour Boiled Water from kettle $ID)->(Enjoy)->(Stop)

Fig.4

The drink and kettle IDs are indexed using the instance $ID of the workflow, so for example, instance 0 boils one of the kettles and deals with making the drink for Aunty Dorris, chatterbox extraordinaire. Each workflow can be used autonomously and in parallel, with instances 0 and 1 boiling the water and instances 2-9 consuming water from the kettles boiled by 0 and 1. The kettles in this instance are external and asynchronous services.
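Here is one way the ten instances could be sketched in Python with asyncio. Each kettle is modelled as an event that its boiling instance sets and that every consumer waits on. The rule assigning instances 2-9 to kettles (instance ID modulo the number of kettles) is an assumption; the post only states that they draw from the kettles boiled by 0 and 1.

```python
import asyncio

BOILERS = (0, 1)  # instance IDs that also boil a kettle, as in the mandate

async def tea_instance(instance_id, kettles, served):
    # Assumed sharing rule: spread instances evenly across the kettles.
    kettle_id = instance_id % len(BOILERS)
    if instance_id in BOILERS:
        await asyncio.sleep(0.01)       # stands in for boiling
        kettles[kettle_id].set()        # announce that this kettle has boiled
    await kettles[kettle_id].wait()     # every instance waits for its kettle
    served.append((instance_id, kettle_id))

async def tea_factory(count=10):
    kettles = {kettle_id: asyncio.Event() for kettle_id in BOILERS}
    served = []
    # Identical workflows, spawned once per drink, distinguished only by ID.
    await asyncio.gather(*(tea_instance(i, kettles, served) for i in range(count)))
    return sorted(served)

served = asyncio.run(tea_factory())
print(served)
```

No instance asks another instance for anything; they only observe the shared kettle state, which is what keeps the workflows autonomous.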

Language

I’ve flicked between words like dimensional, sequential, parallel, synchronous and asynchronous. Language in automation mainly comes from control-theory and industrial automation.

I propose the following language for orchestration:

Orchestrated Workflows: For workflows requiring central co-ordination. A good example would be coordinating humans to fetch ingredients and handle the ingredient stock. Then the tasks would be executed one by one with central decision making on task progression. Large mechanized processes between organizational units require this kind of approach. Business Process Management tools fall in this category, as do the workflow engines mentioned previously (Salt, StackStorm etc).

Autonomous Workflows: These can be viewed as self-contained workflows. A set of ingredients is despatched with an instruction set. A tool interprets the instruction set and gets on with the job using the ingredients. Some of these tasks might be intra-organizational, and CI/CD tooling like Jenkins and GitLab can fall in this category.

I propose the following language for your workflows:

Single-Dimension: For workflows that make all decisions in a single plane of logic, like making one cup of tea. This could be a task engine like Ansible.

Constraints here might be boiling many kettles and having the first drink go cold whilst finishing the last.

Multi-Dimensional: For workflows requiring more than one set of tasks to run in parallel and in different time domains. Cleaning the dishes whilst making the tea would be advantageous to being a good host. A workflow engine could spawn these workflows from an orchestration perspective. Just to confuse you more, the tea making exercise can be spawned ten times with some tasks running just once or tied to an instance ID.

Managing multi-dimensional state can be difficult, so the danger here is not managing error and exit conditions correctly. Instance awareness happens in this mode of operation. This mechanism provides a great way of tying variables to workflow instances, like our tea making mandate example (Fig.3). In simple terms, an instance can apply its own ID as an index into the data it needs.

I propose the following language for the tasks that live in the workflows:

Sequential: For tasks that require one to finish before the next can start.

This might be taking a mug out of a cupboard and placing it on to the side before throwing in a tea bag, sugar and milk.

Parallelized Sequential: For identical tasks that take the same amount of time, like loading ten tea bags into ten mugs.

Another example might be taking mugs out of the cupboard in parallel, then loading them all with tea bags, sugar and milk. After all, we’re making one batch of tea but with multiple cups.

Mixed Parallelized Sequential: For workflows containing tasks requiring multiple branches of differing tasks. This could be preparing dinner whilst making the drinks if we carry on with our use cases. The tasks are now of varying lengths, but to gain maximum efficiency, how cool would it be to deliver drinks and announce the time of dinner after one “as short as possible” kitchen visit?

Asynchronous: For tasks that take a long time to complete, like boiling a kettle. These tasks will have a mechanism that reports their state, or a mechanism to report task completion or error. These mechanisms are accessible platform-wide and queryable from any dimension or task. Another example would be having a chef prepare your dinner: you give him or her instructions, then occasionally shout through to the kitchen to check on progress.
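As a small sketch of an asynchronous task with a queryable state, consider the Python below. The `Kettle` class and its state names are hypothetical; the point is only that any other task can read the state without blocking on the boil itself.

```python
import asyncio

class Kettle:
    """A long-running task that exposes its state to anyone who asks."""
    def __init__(self):
        self.state = "idle"

    async def boil(self, seconds=0.02):
        self.state = "boiling"
        try:
            await asyncio.sleep(seconds)   # stands in for the boil
        except asyncio.CancelledError:
            self.state = "error"           # report failure, then propagate it
            raise
        self.state = "boiled"

async def demo():
    kettle = Kettle()
    task = asyncio.create_task(kettle.boil())
    await asyncio.sleep(0)          # let the boil start
    seen = [kettle.state]           # shout through to the kitchen
    await task                      # eventually wait for completion
    seen.append(kettle.state)
    return seen

states = asyncio.run(demo())
print(states)
```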

Closing Words

I can hear screaming, and the reason is that choice complicates the approach. What about a “Centrally Orchestrated, Multi-dimensional, Parallelized Sequential” workflow? Now you’re talking coffee shop grade automation with two very simple constraints: till speed and worst drink preparation time. It would be like being in a cafe but being served as if you were the only person in it.

It’s better to be armed with knowledge than fumble your way through. My advice is always to draw this out on a whiteboard or on paper. I draft everything in UML which includes the programmatically created diagrams in this post.

When it comes to design, split workflows and tasks apart and do the simplest thing, always. In our tea example, if Aunty Dorris is chatting everyone’s ear off, then your guests may need their refreshments faster, so remove as many bottlenecks as possible. If Uncle Dave, ‘Dare Devil Extreme’, is sharing a story about base jumping, they might not care about time, but eventual delivery within the time constraints of the story will be well received.

As the number of components increases, so does the number of error and recovery scenarios you have to consider. A centrally orchestrated, multi-dimensional workflow with asynchronous and parallelized sequential tasks *breathes in* needs to be designed with reliability in mind. Mechanisms like watchdog timers, auto-remediation for predictable errors and hard mutation reverts based on unrecoverable errors all help. When there is an unrecoverable failure, the creator should have a path to invoke human assistance. One example might be to use a ChatOps module or call a phone with a pre-recorded message! *Imagine Google’s Assistant ringing you? Cool or eerie?* Multi-dimensional automation without central orchestration still needs recovery plans, but with fewer things to go wrong, it’s more likely a single task will fail rather than the orchestration engine itself.

In short, the more stuff you have, the more fragile it becomes. With great automation powers comes great potential for failure, but also fantastic gains if done correctly! Sorry Uncle Ben, this needed closure.

This post has mainly focussed on an easy to explain “drink making” workflow, and network automation really isn’t any different. A workflow is a workflow, domain differences notwithstanding. Great network automations come from people who truly understand their own domain. Despite being wordy, the language used in this post should help you to define how the workflow behaves and identify the correct way to execute tasks. From experience, half the battle is knowing where to start, and that is with drafting workflows. The tools and platforms can be identified later and should never lead discussions of process mechanization.

Until the next time, thank you for reading and please leave comments or ask questions.

Helpful Notes

Theory of Constraints: https://en.wikipedia.org/wiki/Theory_of_constraints
Control Theory: https://en.wikipedia.org/wiki/Classical_control_theory

The post Automation: Flow Control & Dimensionality appeared first on ipengineer.net.

by David Gee at October 19, 2018 05:45 PM

Potaroo blog

Diving into the DNS

DNS OARC organizes two meetings a year. They are two-day meetings with a concentrated dose of DNS esoterica. Here’s what I took away from the recent 29th meeting of OARC, held in Amsterdam in mid-October 2018.

October 19, 2018 02:50 PM

Networking Now (Juniper Blog)

Outsmarting Cybercriminals at Work: Your Employees are Your First Line of Defense

No matter where you work – be it a corporate office, a retail store, healthcare institution, place of academia or government agency – every employee has a role to play in ensuring your organization maintains good security hygiene.

by Amy James at October 19, 2018 01:45 PM

ipSpace.net Blog (Ivan Pepelnjak)

New: Expert ipSpace.net Subscription

Earlier this month I got this email from someone who had attended one of my online courses before and wanted to watch another one of them:

Is it possible for you to bundle a 1 year subscription at no extra cost if I purchase the Building Next-Generation Data Center course?

We were planning to do something along these lines for a long time, and his email was just what I needed to start a weekend-long hackathon.

End result: Expert ipSpace.net Subscription. It includes:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 19, 2018 06:56 AM

XKCD Comics

October 18, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Leaf-and-Spine Fabric Myths (Part 3)

Evil CCIE concluded his long list of leaf-and-spine fabric myths (more in part 1 and part 2) with a layer-2 fabric myth:

Layer 2 Fabrics can't be extended beyond 2 Spine switches. I had a long argument with a $vendor guys on this. They don't even count SPB as Layer 2 fabric and so forth.

The root cause of this myth is the lack of understanding of what layer-2, layer-3, bridging and routing mean. You might want to revisit a few of my very old blog posts before moving on: part 1, part 2, what is switching, layer-3 switches and routers.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 18, 2018 05:52 AM

October 17, 2018

ipSpace.net Blog (Ivan Pepelnjak)

MUST READ: Operational Security Considerations for IPv6 Networks

A team of IPv6 security experts I highly respect (including my good friends Enno Rey, Eric Vyncke and Merike Kaeo) put together a lengthy document describing security considerations for IPv6 networks. The document is a 35-page overview of things you should know about IPv6 security, listing over a hundred relevant RFCs and other references.

No wonder enterprise IPv6 adoption is so slow – we managed to make a total mess.

by Ivan Pepelnjak (noreply@blogger.com) at October 17, 2018 06:30 AM

October 16, 2018

Potaroo blog

Routing Security at NANOG 74

The level of interest in the general topic of routing security seems to come in waves in our community. At times it seems like the interest from network operators, researchers, security folk and vendors climbs to an intense level, while at other times the topic appears to be moribund. If the attention on this topic at NANOG 74 is anything to go by we seem to be experiencing a local peak.

October 16, 2018 06:21 PM

ipSpace.net Blog (Ivan Pepelnjak)

Event-Driven Network Automation in Network Automation Online Course

Event-driven automation (changing network state and/or configuration based on events) is the holy grail of network automation. Imagine being able to change routing policies (or QoS settings, or security rules) based on changes in the network.

We were able to automate simple responses with on-box solutions like Embedded Event Manager (EEM) available on Cisco IOS for years; modern network automation tools allow you to build robust solutions that identify significant events from the noise generated by syslog messages, SNMP traps and recently streaming telemetry, and trigger centralized responses that can change the behavior of the whole network.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 16, 2018 06:44 AM

October 15, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Why Is Network Automation such a Hot Topic?

This blog post was initially sent to subscribers of my SDN and Network Automation mailing list. Subscribe here.

One of my readers asked a very valid question when reading the Why Is Network Automation So Hard blog post:

Why was network automation 'invented' now? I have been working in the system development engineering for 13+ years and we have always used automation because we wanted to save time & effort for repeatable tasks.

He’s absolutely right. We had fully-automated ISP service in the early 1990s, and numerous service providers have used network automation for decades.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 15, 2018 06:37 AM

October 13, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Worth Reading: Software Disenchantment

Found an awesome blog post describing how we’re wasting resources on incomprehensible scale. Here’s a tiny little morsel:

Only in software, it’s fine if a program runs at 1% or even 0.01% of the possible performance. Everybody just seems to be ok with it. People are often even proud about how much inefficient it is, as in “why should we worry, computers are fast enough”.

by Ivan Pepelnjak (noreply@blogger.com) at October 13, 2018 03:36 PM

October 12, 2018

Networking Now (Juniper Blog)

Seize the Opportunity: Crucial Skills for a Career in Cybersecurity

It's no secret that the cybersecurity industry is facing a shortage of skilled professionals. The threat landscape has evolved faster than nearly anyone was expecting, creating a high demand for talented individuals with niche skillsets to help protect enterprise and consumer infrastructure.

by SamanthaMadrid at October 12, 2018 03:32 PM

The Networking Nerd

Security Is Bananas

I think we’ve reached peak bombshell report discussion at this point. It all started this time around with the big news from Bloomberg that China implanted spy chips into SuperMicro boards in the assembly phase. Then came the denials from Amazon and Apple and even SuperMicro. Then started the armchair quarterbacking from everyone, including TechCrunch. From bad sources to lack of technical details all the way up to the crazy conspiracy theories that someone at Bloomberg was trying to goose their quarterly bonus with a short sale or that the Chinese planted the story to cover up future hacking incidents, I think we’ve covered the entire gamut of everything that the SuperMicro story could and couldn’t be.

So what more could there be to say about this? Well, nothing about SuperMicro specifically. But there’s a lot to say about the fact that we were both oblivious and completely unsurprised about an attack on the supply chain of a manufacturer. While the story moved the stock markets pretty effectively for a few days, none of the security people I’ve talked to were shocked by the idea of someone with the power of a nation state inserting themselves into the supply chain to gain the kind of advantage needed to execute a plan of collection of data. And before you scoff, remember we’re only four years removed from the allegation that the NSA had Cisco put backdoors into IOS.

Why are we not surprised by this idea? Well, for one because security is getting much, much better at what it’s supposed to be doing. You can tell that because the attacks are getting more and more sophisticated. We’ve gone from 419 scam emails being deliberately bad to snare the lowest common denominator to phishing attacks that fool some of the best and brightest out there thanks to a combination of assets and DNS registrations that pass the initial sniff test. Criminals have had to up their game because we’re teaching people how to get better at spotting the fakes.

Likewise, technology is getting better at nabbing things before we even see them. Take the example of Forcepoint. I first found out about them at RSA this year. They have a great data loss prevention (DLP) solution that keeps you from doing silly things like emailing out Social Security Numbers or credit card information that would violate PCI standards. But they also have an AI-powered analysis engine that is constantly watching for behavioral threats. If someone does this by accident once, it could just be a mistake. But a repeated pattern of behavior could indicate a serious training issue or even a malicious actor.

Forcepoint is in a category of solutions that are making the infrastructure smarter so we don’t have to be as vigilant. Sure, we’re getting much better at spotting things that don’t look right. But we also have a lot of help from our services. When Google can automatically filter spam and then tag presented messages as potentially phishing (proceed with caution), it helps me start my first read through as a skeptic. I don’t have to exhaust my vigilance for every email that comes across the wire.

The Dark Side Grows Powerful Too

Just because the infrastructure is getting smarter doesn’t mean we’re on the road to recovery. It means the bad actors are now exploring new vectors for their trade. Instead of 419 or phishing emails they’re installing malware on systems to capture keystrokes. iOS 12 now has protection from fake software keyboards that could capture information when something is trying to act as a keyboard on-screen. That’s a pretty impressive low-level hack when you think about it.

Now, let’s extrapolate the idea that the bad actors are getting smarter. They’re also contending with more data being pushed to cloud providers like Amazon and Azure. People aren’t storing data on their local devices. It’s all being pushed around in Virginia and Oregon data centers. So how do you get to that data? You can’t install bad software on a switch or even a class of switches or even a single vendor, since most companies are buying from multiple vendors now or even looking to build their own networking stacks, ala Facebook.

If you can’t compromise the equipment at the point of resale, you have to get to it before it gets into the supply chain. That’s why the SuperMicro story makes sense in most people’s heads, even if it does end up not being 100% true. By getting to the silicon manufacturer you have an entry point into anything they make. Could you imagine if this was Accton or Quanta instead of SuperMicro? If there was a chip inside every whitebox switch made in the last three years? If that chip had been scanning for data or relaying information out-of-band to a nefarious third-party? Now you see why supply chain compromises are so horrible in their potential scope.

This Is Bananas

Can it be fixed? That’s a good question that doesn’t have a clear answer. I look at it like the problem with the Cavendish banana. The Cavendish is the primary variant of the banana in the world right now. But it wasn’t always that way. The Gros Michel used to be the most popular all the way into the 1950s. It stopped because of a disease that infected the Gros Michel and caused entire crops to rot and die. That could happen because bananas are not grown through traditional reproductive methods like other crops. Instead, they are grafted from tree to tree. In a way, that makes almost all bananas clones of each other. And if a disease affects one of them, it affects them all. And there are reports that the Cavendish is starting to show signs of a fungus that could wipe them out.

How does this story about bananas relate to security? Well, if you can’t stop bananas from growing everywhere, you need to take them on at the source. And if you can get into the source, you can infect them without hope of removal. Likewise, if you can get into the supply chain and start stealing or manipulating data at a low level, you don’t need to worry about all the crazy protections put in at higher layers. You’ll just bypass them all and get what you want.


Tom’s Take

I’m not sold on the Bloomberg bombshell about SuperMicro. The vehement denials from Apple and Amazon make this a more complex issue than we may be able to solve in the next couple of years. But now that the genie is out of the bottle, we’re going to start seeing more and more complicated methods of attacking the merchant manufacturers at the source instead of trying to get at them further down the road. Maybe it’s malware that’s installed out-of-the-box thanks to a staging server getting compromised. Maybe it’s a hard-coded backdoor like the Xiongmai one that allowed webcams to become DDoS vectors. We can keep building bigger and better protections, but eventually we need to realize that we’re only one threat away from extinction, just like the banana.

by networkingnerd at October 12, 2018 02:26 PM

ipSpace.net Blog (Ivan Pepelnjak)

Making Sense of Software-Defined World

In mid-September I was invited to present at the vNIC 2018 event in Frankfurt, Germany. Unfortunately I wasn’t able to get there, but Zoom did a great job … and enabled me to record the talk.

by Ivan Pepelnjak (noreply@blogger.com) at October 12, 2018 07:07 AM

October 11, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Worth Watching: Machine Learning in a Nutshell

This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.

What could be better than an SDN product to bring you closer to a networking nirvana? You guessed it – an SDN product using machine learning.

Want to have some fun? The next time your beloved $vendor rep drops by trying to boost his bonus by persuading you to buy the next-generation machine-learning tool his company just released, invite him to watch James Mickens’ Usenix Security Symposium keynote with you.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 11, 2018 05:57 AM

October 10, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Leaf-and-Spine Fabric Myths (Part 2)

The next set of Leaf-and-Spine Fabric Myths listed by Evil CCIE focused on BGP:

BGP is the best choice for leaf-and-spine fabrics.

I wrote about this particular one here. If you’re not a BGP guru don’t overcomplicate your network. OSPF, IS-IS, and EIGRP are good enough for most environments. Also, don’t ever turn BGP into RIP with AS-path length serving as hop count.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 10, 2018 07:04 AM

October 09, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Feedback: Ansible for Networking Engineers

One of my subscribers sent me a nice email describing his struggles to master Ansible:

Some time ago I started to hear about Ansible as the new power tool for network engineer, my first reaction was “What the hell is this?” I searched the web and found many blah blahs about it… until I landed on your pages.

He found the Ansible for Networking Engineers material sufficient to start an automation project:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 09, 2018 06:55 AM

October 08, 2018

Dyn Research (Was Renesys Blog)

Last Month in Internet Intelligence: September 2018

Over the course of a given month, hundreds of Internet-impacting “events” are visible within the Oracle Internet Intelligence Map. Many are extremely short-lived, lasting only minutes, while others last for hours or days; some have a minor impact on a single metric, while others significantly disrupt all three metrics. In addition, for some events, the root cause is publicly available/known, while for other events, digging into the underlying data helps us make an educated guess about what happened. Ultimately, this creates challenges in separating the signal from the noise, triaging and prioritizing that month’s events for review in this blog post.

Having said that, in September we observed Internet disruptions due to exams, power outages, extreme weather, and submarine cable issues, as well as a number of others with unknown causes. Additionally, a third test of nationwide mobile Internet connectivity took place in Cuba.

Cuba

As noted in our August post, ETECSA (the Cuban state telecommunications company) carried out two tests of nationwide mobile Internet connectivity, which were evident as spikes in the DNS query rates from Cuba. In a Facebook post, they noted, “On August 14th was a first test that measured levels of traffic congestion and that audited the network in stress conditions, the second test was made on August 22 and its purpose was to try the portal my cubacel and the short codes for service management.”

The company planned a third test, this one lasting three days from September 8-10, highlighting it in a promotional graphic that was posted on their Facebook page. They noted that this third test “was designed for three days with the purpose of checking traffic management in different structures of the network,” intended to validate optimizations made as a result of the connection difficulties and network congestion that resulted from the August tests.

Similar to the prior tests, Cuba’s DNS Query Rate spikes at 05:00 GMT (midnight local time) on September 8, remaining elevated through the end of the day (local time) on the 10th, when it settles back down into a much lower diurnal pattern. ETECSA’s Facebook post noted that more than 1.5 million people had participated in these tests of nationwide mobile Internet access.

Exams

Similar to actions taken a number of times in the past, Internet connectivity in Iraq was shut down repeatedly between September 1-10 to prevent cheating on nationwide student exams. A published report cited a statement from the Iraqi Ministry of Communications announcing plans to suspend Internet service between 06:30 and 08:30 (local time).

As seen in the figures below, multi-hour Internet shutdowns were implemented on nine of the 10 days, with September 7 the only exception. Partial drops seen in each metric indicate that the shutdowns were not complete – that is, Internet access remained available across some parts of the country.
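To make the distinction between a partial and a complete shutdown concrete, here is a minimal sketch. It is not how the Internet Intelligence Map computes anything; the function name, the 80% threshold, and the sample values are all assumptions chosen for illustration.

```python
def classify_drop(samples, baseline, threshold=0.8):
    """Classify one measurement window of a metric.

    samples   -- metric values observed during the window (hypothetical data)
    baseline  -- the value the metric normally sits at
    threshold -- assumed cut-off: anything above 80% of baseline is "normal"
    """
    lowest = min(samples)
    if lowest >= threshold * baseline:
        return "none"
    # A complete shutdown drives the metric to zero; anything else is partial.
    return "complete" if lowest == 0 else "partial"


# Hypothetical traceroute completion counts across a morning.
print(classify_drop([100, 98, 40, 99], baseline=100))  # partial
print(classify_drop([100, 0, 100], baseline=100))      # complete
print(classify_drop([100, 95], baseline=100))          # none
```

Under these assumptions, the Iraqi shutdowns described above would fall into the "partial" bucket, since the metrics dipped without reaching zero.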

Power outage

According to a published report, late in the day on September 6, western and southern regions of Libya, including the capital city of Tripoli, experienced a total blackout. The power outage was reportedly related to the impact of bloody clashes in Tripoli, which prevented repair teams from reaching power stations and grids in the impacted area. The impact of the power outage is evident in the graph below, showing a drop in the traceroute completion rate metric starting late in the day (GMT) on September 6, lasting for approximately half a day. A minor perturbation in the BGP routes metric is evident as well. Ongoing turmoil in the country also impacted Internet availability in Libya several days later, with another multi-hour drop in the traceroute completion rate evident on September 9.

Typhoon Mangkhut

After forming on September 7 as a tropical depression in the Pacific Ocean, Typhoon Mangkhut quickly strengthened and moved west towards Micronesia. On September 10, the typhoon moved across both the Northern Mariana Islands and Guam, causing damage with winds in excess of 100 miles per hour.

As shown in the figure below, the storm impacted Internet connectivity in the Northern Mariana Islands, with the traceroute completion rate metric declining around mid-day local time (the Islands are GMT+10) on September 10, with the DNS Query Rate also lower than normal for that time of day. The following figure shows that Internet connectivity on Guam was impacted several hours later, with the traceroute completion rate metric declining later in the day local time (Guam is also GMT+10) on September 10. It also appears that there was a slight impact to the number of routed networks at around the same time, with a concurrent drop in the DNS query rate metric.
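Because the islands are at GMT+10, lining up local reports with timestamps expressed in GMT takes a small conversion. A minimal sketch, assuming a fixed GMT+10 offset with no daylight saving adjustment; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed GMT+10 offset, as stated in the text for both Guam and the
# Northern Mariana Islands (assumed here to have no daylight saving shift).
ISLAND_TIME = timezone(timedelta(hours=10))


def local_to_gmt(local_naive):
    """Interpret a naive datetime as island local time and convert it to GMT."""
    return local_naive.replace(tzinfo=ISLAND_TIME).astimezone(timezone.utc)


# Mid-day local time on September 10.
print(local_to_gmt(datetime(2018, 9, 10, 12, 0)).isoformat())
# 2018-09-10T02:00:00+00:00
```

So "mid-day local time on September 10" corresponds to roughly 02:00 GMT on the same date.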

By the next morning, the storm had reportedly moved past the islands, although the calculated metrics took several days to return to “normal” levels.

Figures below illustrate the impact that Typhoon Mangkhut had on local network providers. AS7131 (IT&E Overseas) has prefixes that are routed on both Guam and the Northern Mariana Islands. The number of completed traceroutes to endpoints in this autonomous system began to drop mid-day local time, likely due to power outages or damage to local infrastructure alongside the arrival of the storm. Interestingly, a number of traceroutes started to pass through AS9304 (Hutchison) around the same time as well, but it isn’t clear if this is simply coincidental, or if traffic through this provider was increased as part of a disaster recovery process. The number of completed traceroutes to endpoints in AS9246 (Teleguam Holdings) also began to decline later in the day local time on September 10, also likely due to local power outages or infrastructure damage. Interestingly, while some endpoints across both networks became unreachable as a result of Typhoon Mangkhut, there did not appear to be a meaningful impact to measured latency, which remained within the ranges seen during the days ahead of the storm.

Submarine cables

On September 4, Australian telecommunications infrastructure provider Vocus posted an “Incident Summary” regarding a suspected fault in the SeaMeWe-3 (SMW3) cable between Perth, Australia and Singapore.

The figure below (from one of Oracle’s commercial Internet Intelligence tools) illustrates the impact of the cable failure on the median latency of paths between Singapore and Perth – specifically, from a measurement in cloud provider Digital Ocean’s Singapore location to endpoints in Perth on selected Internet service providers. Among the measured providers, latencies increased 3-4x on September 2/3, stabilizing by the 4th.
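A "3-4x" increase of this kind is a ratio of median latencies before and during the fault. A minimal sketch of that arithmetic follows; the millisecond values are invented for illustration and are not Oracle's measurements, and the function name is hypothetical.

```python
from statistics import median


def latency_ratio(before_ms, during_ms):
    """Return how many times higher the median latency is during an event."""
    return median(during_ms) / median(before_ms)


# Invented round-trip samples (milliseconds) for a single provider.
before = [48, 50, 52]
during = [170, 180, 190]

print(round(latency_ratio(before, during), 1))  # 3.6
```

Using the median rather than the mean keeps a handful of outlier probes from distorting the comparison.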

The initial incident summary published by Vocus noted that similar faults seen in the past have taken upwards of 4 weeks to restore. However, on September 5, an article in ZDNet revealed that Vocus pressed the new Australia Singapore Cable (ASC) into service two weeks ahead of schedule, shifting customer traffic onto it from the damaged SMW3. The figures below, generated by internal Internet Intelligence tools, illustrate how failure of the SMW3 cable caused measured latencies to increase, and how they returned to previous levels when the ASC cable was activated and traffic was shifted onto it.

On September 10, @RightsCon, a Twitter account associated with Internet advocacy group AccessNow, posted a Tweet looking for verification of Internet disruptions in several countries.

Doug Madory, director of Internet analysis on Oracle’s Internet Intelligence team, replied, noting that “Internet connectivity issues in Angola was due to problems on the WACS submarine cable.” The figure below shows the impact of the submarine cable issues that occurred several days earlier, with disruptions evident in both the traceroute completion ratio and BGP routes metrics on September 7.

The disruptions reviewed above were caused by known issues with the SMW3 and WACS submarine cables. However, September also saw a number of additional disruptions that may have been related to issues with submarine cable connectivity, but such correlations were not definitively confirmed.

On September 5-6, a significant Internet disruption was observed in Comoros, impacting all three metrics as seen in the figure below. A complete outage was observed at Comores Telecom, with the number of completed traceroutes to endpoints in that network dropping to zero during the disruption. As the figure below shows, prior to the outage, traceroutes reached Comores Telecom primarily through Level 3 and BICS, but went through West Indian Ocean Cable Company for approximately three days after the outage, before transiting Level 3 and BICS once again.

International connectivity to Comoros is carried over both the Eastern Africa Submarine System (EASSy) as well as FLY-LION3, although the latter only connects Comoros to Mayotte. The observed shift in upstream providers could be indicative of a problem on one submarine cable, forcing traffic onto the other until issues with the primary cable were resolved.
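A shift in upstream providers like this can be spotted by counting which network appears immediately before the destination in each traceroute path. The sketch below assumes paths have already been reduced to ordered lists of provider names; the sample paths and the function name are hypothetical, not data from Oracle's tools.

```python
from collections import Counter


def upstream_share(paths, destination):
    """Return the fraction of paths entering `destination` via each upstream."""
    counts = Counter()
    for path in paths:
        if destination in path:
            position = path.index(destination)
            if position > 0:  # ignore paths that start inside the destination
                counts[path[position - 1]] += 1
    total = sum(counts.values())
    return {upstream: n / total for upstream, n in counts.items()} if total else {}


# Hypothetical provider-level paths before and after the outage.
before = [["Probe", "Level 3", "Comores Telecom"],
          ["Probe", "BICS", "Comores Telecom"]]
after = [["Probe", "West Indian Ocean Cable Company", "Comores Telecom"],
         ["Probe", "West Indian Ocean Cable Company", "Comores Telecom"]]

print(upstream_share(before, "Comores Telecom"))
# {'Level 3': 0.5, 'BICS': 0.5}
print(upstream_share(after, "Comores Telecom"))
# {'West Indian Ocean Cable Company': 1.0}
```

A sudden change in these shares does not identify the failed cable on its own, but it shows when traffic moved and which provider picked it up.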

Later in the month, Caribbean islands Saint Martin and Saint Barthelemy both saw disruptions that lasted for approximately 24 hours across September 28-29, as evident in the declines seen in the traceroute completion rate and BGP routes metrics shown in the figures below. (Because the disruption occurred on Friday night/Saturday, DNS query rates were lower anyway, so the evidence of the disruption in that metric would be harder to see.) Both islands are connected to Southern Caribbean Fiber, with a spur running from Saint Martin to Saint Barthelemy.

On September 25, 26, and 30, disruptions to Internet connectivity in American Samoa were evident in the Internet Intelligence Map, as shown in the figure below. Brief drops across all three metrics were observed on the 25th and 26th, while multiple drops were observed on the 30th. Internal tools indicated that the underlying issues impacted BlueSky Communications/SamoaTel, and the issues can be seen in the traceroutes going through Hurricane Electric at times that align with the issues seen in the American Samoa graph. The territory has been connected to the American Samoa-Hawaii (ASH) submarine cable for nearly a decade, but was also connected to the Hawaiki cable earlier this year. BlueSky appears to connect with Hurricane Electric in San Jose, California, but it isn’t clear which cable carries traffic from that exchange point to the island.

Conclusion

Associating Internet disruptions with an underlying cause can be easy to do when related events are publicly known – severe weather, power outages, civil unrest, and even school exams. In many cases, these disruptions last for hours or days, making it more likely that they will impact Internet connectivity for users in the impacted country. However, for each well-understood disruption, there are dozens more that we observe each month that are brief, partial (not dropping the calculated metrics to zero), and unexplained. Due to their nature, these disruptions may not have a significant impact on user connectivity, which makes finding public commentary (such as news articles or Twitter posts) on them all the more challenging. Using internal Internet infrastructure analysis tools and public tools like Telegeography’s Submarine Cable Map, we can surmise what may have caused the disruption, but the actual root cause remains unknown.

by David Belson at October 08, 2018 05:00 PM

My Etherealmind

Unconscious Competence and Imposter Syndrome In A Spa in Silicon Valley

I have reached the conclusion that “imposter syndrome” is healthy. You know you have more to learn and will act correctly to ensure that any weakness is addressed and managed. The problems start at unconscious incompetence however – I attempt to explain it here:  

The post Unconscious Competence and Imposter Syndrome In A Spa in Silicon Valley appeared first on EtherealMind.

by Greg Ferro at October 08, 2018 02:00 PM

ipSpace.net Blog (Ivan Pepelnjak)

VXLAN and EVPN on Hypervisor Hosts

One of my readers sent me a series of questions regarding a new cloud deployment where the cloud implementers want to run VXLAN and EVPN on the hypervisor hosts:

I am currently working on a leaf-and-spine VXLAN+ EVPN PoC. At the same time, the systems team in my company is working on building a Cloudstack platform and are insisting on using VXLAN on the compute node even to the point of using BGP for inter-VXLAN traffic on the nodes.

Using VXLAN (or GRE) encap/decap on the hypervisor hosts is nothing new. That’s how NSX and many OpenStack implementations work.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 08, 2018 07:53 AM

XKCD Comics

October 06, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Worth Reading: The Fragile Engineers

Ethan Banks wrote an awesome blog post on the characteristics of fragile engineers (most of them probably being expert beginners). I can’t help but ponder how often I behave like one…

by Ivan Pepelnjak (noreply@blogger.com) at October 06, 2018 02:34 PM

October 05, 2018

The Networking Nerd

The Why of Security

Security is a field of questions. We find ourselves asking
all kinds of them all the time. Who is trying to get into my network? What are
they using? How can I stop them? But I feel that the most important question is
the one we ask the least. And the answer to that question provides the
motivation to really fix problems as well as conserving the effort necessary to
do so.

The Why’s Old Sage

If you’re someone with kids, imagine a conversation like
this one for a moment:

Your child runs into the kitchen with a lit torch in their hands and asks “Hey, where do we keep the gasoline?”

Now, some of you are probably laughing. And some of you are
probably imagining all kinds of crazy going on here. But I’m sure that most of
you probably started asking a lot of questions like:

  • Why does my child have a lit torch in the house?
  • Why do they want to know where the gasoline is?
  • Why do they want to put these two things together?
  • Why am I not stopping this right now?

Usually, the rest of the Five Ws follow soon afterward. But Why is the biggest question. It provides motivation and understanding. If your child had walked in with a lit torch it would have triggered one set of responses. Or if they had asked for the location of combustible materials it might have elicited another set. But Why is so often overlooked in a variety of different places that we often take it for granted.

Imagine this scenario:

An application developer comes to you and says, “I need you to open all the ports on the firewall and turn off the AV on all the machines in the building.”

You’d probably react with an immediate “NO”. You’d
get cursed at and IT would live another day as the obstruction in “real
development” at your company. As security pros, we are always trying to
keep things safe. Sometimes that safety means we must prevent people from
hurting themselves, as in the above example. But, let’s apply the Why here:

  • Why do they need all the firewall ports opened?
  • Why does the AV need to be disabled on every machine?
  • Why didn’t they tell me about this earlier instead of coming to me right now?

See how each Why question has some relevance to things? If
you start asking, I’d bet you would figure some interesting things out very
quickly. Such as why the developer doesn’t know what ports their application
uses. Or why they don’t understand how AV heuristics are triggered by software
that appears to be malicious. Or the value of communicating to the security team
ahead of time for things that are going to be big requests!

Digging Deeper

It’s always a question of motivation. More than networking
or storage or any other facet of IT, security must understand Why. Other
disciplines are easy to figure out. Increased connectivity and availability.
Better data retention and faster recall.
But security focuses on safety. On restriction. And allowing people to
do things against their better nature means figuring out why they want to do
them in the first place.

Too much time is spent on the How and the What. If you look
at the market for products, they all focus on that area. It makes sense at a
basic level. Software designed to stop people from stealing your files is
necessarily simple and focused on prevention, not intent. It does the job it
was designed to do and no more. In other cases, the software could be built
into a larger suite that provides other features and still not address the
intent.

And if you’ve been following along in security in the past
few months, you’ve probably seen the land rush of companies talking about artificial
intelligence (AI) in their solutions. RSA’s show floor was full of companies
that took a product that did something last year and now magically does the same
thing this year but with AI added in! Except, it’s not really AI.

AI provides the basis for intent. Well, real AI does at
least. The current state of machine learning and advanced analytics provides a
ton of data (the what and the who) but fails to provide the intent (the why).
That’s because Why is difficult to determine. Why requires extrapolation and
understanding. It’s not as simple as just producing output and correlating.
While machine learning is really good at correlation, it still can’t make the
leap beyond analysis.

That’s why humans are going to be needed for the foreseeable
future in the loop. People provide the Why. They know to ask beyond the data to
figure out what’s going on behind it. They want to understand the challenges.
Until you have a surefire way of providing that capability, you’re never going to
be able to truly automate any kind of security decision making system.


Tom’s Take

I’m a huge fan of Why. I like making people defend their decisions.
Why is the one question that triggers deeper insight and understanding. Why concentrates on things that can’t be programmed or automated. Instead, Why gives us the data we really need to understand the context of all the other decisions that get
made. Concentrating on Why is how we can provide invaluable input into the
system and ensure that all the tools we’ve spent thousands of dollars to
implement actually do the job correctly.

by networkingnerd at October 05, 2018 02:18 PM

XKCD Comics

October 04, 2018

Network Design and Architecture

Orhan Ergun Training and Consultancy Company has CCIE SP Training as well !

Hi everyone! I am glad to announce that we started CCIE SP (Service Provider) Training. The CCIE Service Provider Lab bootcamp has been developed by Orhan Ergun LLC. as a preparation tool for the CCIE Service Provider v4.1 Lab Exam. Each session will start by reviewing the concepts and fundamentals of the related […]

The post Orhan Ergun Training and Consultancy Company has CCIE SP Training as well ! appeared first on Cisco Network Design and Architecture | CCDE Bootcamp | orhanergun.net.

by Orhan Ergun at October 04, 2018 06:04 PM

My Etherealmind

ipSpace.net Blog (Ivan Pepelnjak)

Leaf-and-Spine Fabric Myths (Part 1)

Apart from the “they have no clue what they’re talking about” observation, Evil CCIE left a long list of leaf-and-spine fabric myths he encountered in the wild in a comment on one of my blog posts. He started with:

Clos fabric (aka Leaf And Spine fabric) is a non-blocking fabric

That was obviously true in the days when Mr. Clos designed the voice switching solution that still bears his name. In the original Clos network every voice call would get a dedicated path across the fabric, and the number of voice calls supported by the fabric equaled the number of alternate end-to-end paths.

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 04, 2018 06:49 AM

October 03, 2018

Networking Now (Juniper Blog)

Secondhand IoT Devices, Firsthand Threats to Security

From thermostats and voice assistants to fitness trackers and toys, smart and internet-connected gadgets are now found in nearly every room of most homes. It’s tempting to try and save money on these devices by capitalizing on less expensive secondhand products sold by third-parties like eBay, Craigslist and even friends or family. But, you may want to think twice before doing this.

by lpitt at October 03, 2018 01:00 PM

ipSpace.net Blog (Ivan Pepelnjak)

Network Automation Development Environments

Building the network automation lab environment seems to be one of the early showstoppers on everyone’s network automation journey. These resources might help you get started:

Hint: after setting up your environment, you might want to enroll into the Spring 2019 network automation course ;)

by Ivan Pepelnjak (noreply@blogger.com) at October 03, 2018 06:41 AM

XKCD Comics

October 02, 2018

About Networks

Python script and multiple API calls practice with Google and Twitter

After doing a lot of Python tutorials and spending time “playing” with Postman and API calls during my journey to network programmability and automation, I wanted to make a concrete example, from zero to a visible result. Since I love photography and my country is full of beautiful but not well-known places, I chose to
Read More »

The post Python script and multiple API calls practice with Google and Twitter appeared first on AboutNetworks.net.

by Jerome Tissieres at October 02, 2018 09:54 AM

ipSpace.net Blog (Ivan Pepelnjak)

Network Troubleshooting Guidelines

It all started with an interesting weird MLAG bugs discussion during our last Building Next-Generation Data Center online course. The discussion almost devolved into “when in doubt reload” yammering when Mark Horsfield stepped in saying “while that may be true, make sure to check and collect these things before reloading”.

I loved what he wrote so much that I asked him to turn it into a blog post… and he made it even better by expanding it into generic network troubleshooting guidelines. Enjoy!

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 02, 2018 06:00 AM

October 01, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Don't Make a Total Mess When Dealing with Exceptions

A while ago I had the dubious “privilege” of observing how my “beloved” airline Adria Airways deals with exceptions. A third-party incoming flight was 2.5 hours late and in their infinite wisdom (most probably to avoid financial impact) they decided to delay a half-dozen outgoing flights for 20-30 minutes while waiting for the transfer passengers.

Not surprisingly, when that weird thingy landed and they started boarding the outgoing flights (now all at the same time), the result was a total mess with busses blocking each other (this same airline loves to avoid jet bridges).

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at October 01, 2018 07:21 AM

Potaroo blog

DOH!

If you had the opportunity to re-imagine the DNS, what might it look like? Normally this would be an idle topic of speculation over a beer or two, but maybe there’s a little more to the question these days. We are walking into an entirely new world of the DNS when we start to think about exactly what might be possible when we look at DNS over HTTPS, or DOH.

October 01, 2018 01:59 AM

XKCD Comics

September 28, 2018

Networking Now (Juniper Blog)

New Worm Leverages Open Source Tools and GitHub to Build its Botnet

On September 19, 2018, Juniper Threat Labs discovered a new wave of attacks from a cryptominer worm targeting Linux servers, home networking devices, and IOT devices. These attacks were bundled with a number of exploits to spread rapidly and widely. The attack has three parts: infection, mining, and spreading.

by AsherLangton at September 28, 2018 08:06 PM

ipSpace.net Blog (Ivan Pepelnjak)

Prepare for Job Interview with ipSpace.net Subscription

Did you know that many networking engineers use ipSpace.net webinars (and subscription) to prepare for job interviews?

Here’s one of their success stories (name changed for obvious reasons):

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at September 28, 2018 08:15 AM

Networking Now (Juniper Blog)

Another Cryptomining Campaign

On September 19, 2018, Juniper Threat Labs discovered a new wave of attacks from a cryptominer worm targeting Linux servers, home networking devices, and IOT devices. These attacks were bundled with a number of exploits to spread rapidly and widely. The attack has three parts: infection, mining, and spreading.

by AsherLangton at September 28, 2018 04:01 AM

The Networking Nerd

Outing Your Outages

How are you supposed to handle outages? What happens when everything around you goes upside down in an instant? How much communication is “too much”? Or “not enough”? And is all of this written down now instead of being figured out when the world is on fire?

Team Players

You might have noticed this week that Webex Teams spent most of the week down. Hard. Well, you might have noticed if you used Microsoft Teams, Slack, or any other messaging service that wasn’t offline. Webex Teams went offline about 8:00pm EDT Monday night. At first, most people just thought it was a momentary outage and things would be back up. However, as the hours wore on and Cisco started updating the incident page with more info it soon became apparent that Teams was not coming back soon. In fact, it took until Thursday for most of the functions to be restored from whatever knocked them offline.

What happened? Well, most companies don’t like to admit what exactly went wrong. For every CloudFlare or provider that has full disclosures on their site of outages, there are many more companies that will eventually release a statement with the least amount of technical detail possible to avoid any embarrassment. Cisco is currently in the latter category, with most guesses landing on some sort of errant patch that mucked things up big time behind the scenes.

It’s easy to see when big services go offline. If Netflix or Facebook are down hard then it can impact the way we go about our lives. On the occasions when our work tools like Slack or Google Docs are inoperable it impacts our productivity more than our personal pieces. But each and every outage does have some lessons that we can take away and learn for our own IT infrastructure or software operations. Don’t think that companies that are that big and redundant everywhere can’t be affected by outages regularly.

Stepping Through The Minefield

How do you handle your own outage? Well, sometimes it does involve eating some humble pie.

  1. Communicate – This one is both easy and hard. You need to tell people what’s up. You need to let everyone know things aren’t working right and you’re working to make them right. Sometimes that means telling people exactly what’s affected. Maybe you can log into Facebook but not Chat or Messages. Tell people what they’re going to see. If you don’t communicate, you’re going to have people guessing. That’s not good.
  2. Triage – Figure out what’s wrong. Make educated guesses if nothing stands out. Usually, the culprits are big changes that were just made or perhaps there is something external that is affecting your performance. The key is to isolate and get things back as soon as possible. That’s why big upgrades always have a backout plan. In the event that things go sideways, you need to get back to functional as soon as you can. New features that are offline aren’t as good as tried-and-true stuff that’s reachable.
  3. Honest Post-Mortem – This is the hardest part. Once you have things back in place, you have to figure out why they broke. This is usually where people start running for the hills and getting evasive. Did someone apply a patch at the wrong time? Did a microcode update get loaded to the wrong system? How can this be prevented in the future? The answers to these questions are often hard to get because the people that were affected and innocent often want to find the guilty parties and blame someone so they don’t look suspect. The guilty parties want to avoid blame and hide in public with the rest of the team. You won’t be able to get to the bottom of things unless you find out what went wrong and correct it. If it’s a process, fix it. If it’s a person, help them. If it’s a strange confluence of unrelated events that created the perfect storm, make sure that can never happen again.
  4. Communicate (Again) – This is usually where things fall over for most companies. Even the best ones get really good at figuring out how to prevent problems. However, most of them rarely tell anyone else what happened. They hide it all and hope that no one ever asks about anything. Yet, transparency is key in today’s world. Services that bounce up and down for no reason are seen as unstable. Communicating as to their volatility is the only way you can make sure that people have faith that they’re going to stay available. Once you’ve figured out what went wrong and who did it, you need to tell someone what happened. Because the alternative is going to be second guessing and theories that don’t help anyone.

Tom’s Take

I don’t envy the people at Cisco that spent their entire week working to get Webex Teams back up and running. I do appreciate their work. But I want to figure out where they went wrong. I want to learn. I want to say to myself, “Never do that thing that they did.” Or maybe it’s a strange situation that can be avoided down the road. The key is communication. We have to know what happened and how to avoid it. That’s the real learning experience when failure comes around. Not the fix, but the future of never letting it happen again.

by networkingnerd at September 28, 2018 01:52 AM

XKCD Comics

September 27, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Implications of Valley-Free Routing in Data Center Fabrics

As I explained in a previous blog post, most leaf-and-spine best-practices (as in: what to do if you have no clue) use BGP as the IGP routing protocol (regardless of whether it’s needed) with the same AS number shared across all spine switches to implement valley-free routing.

This design has an interesting consequence: when a link between a leaf and a spine switch fails, they can no longer communicate.

For example, when the link between L1 and C1 in the following diagram fails, there’s no connectivity between L1 and C1 as there’s no valley-free path between them.
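The reason no alternate path exists comes down to eBGP loop prevention: a switch drops any route whose AS path already contains its own AS number. Here is a minimal sketch of that behavior on a two-leaf, two-spine fabric. The AS numbers, the second leaf and spine (L2, C2), and the helper name are assumptions for illustration; the simulation models only AS-path prepending and the loop check, nothing else about BGP.

```python
from collections import deque

# Assumed AS numbering: every spine shares one AS, each leaf has its own.
ASN = {"C1": 65000, "C2": 65000, "L1": 65001, "L2": 65002}
FULL_FABRIC = [("L1", "C1"), ("L1", "C2"), ("L2", "C1"), ("L2", "C2")]


def learners(origin, links, asn=ASN):
    """Return {switch: AS path} for every switch that accepts origin's prefix."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    learned = {origin: []}   # the origin holds its own prefix with an empty path
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        advertised = [asn[sender]] + learned[sender]  # sender prepends its own AS
        for peer in sorted(neighbors.get(sender, ())):
            if peer in learned:
                continue     # already holds a path at least as short
            if asn[peer] in advertised:
                continue     # loop prevention: own AS is already in the path
            learned[peer] = advertised
            queue.append(peer)
    return learned


after_failure = [link for link in FULL_FABRIC if link != ("L1", "C1")]

print("L1" in learners("C1", FULL_FABRIC))    # True: learned over the direct link
print("L1" in learners("C1", after_failure))  # False: C2 rejects the path via L2
print("L1" in learners("L2", after_failure))  # True: leaf-to-leaf still works via C2
```

With the L1-C1 link down, the only physical detour is C1, L2, C2, L1, and C2 refuses the route because the shared spine AS is already in the path. Leaf-to-leaf traffic is unaffected because it never needs to cross two spines.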

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at September 27, 2018 07:05 AM

September 26, 2018

ipSpace.net Blog (Ivan Pepelnjak)

Infrastructure-as-Code Tools

This is the fourth blog post in “thinking out loud while preparing Network Infrastructure as Code presentation for the network automation course” series. Previous posts: Network-Infrastructure-as-Code Is Nothing New, Adjusting System State and NETCONF versus REST API.

Dmitri Kalintsev sent me a nice description on how some popular Infrastructure-as-Code (IaC) tools solve the challenges I described in The CRUD Hell section of Infrastructure-as-Code, NETCONF and REST API blog post:

Read more ...

by Ivan Pepelnjak (noreply@blogger.com) at September 26, 2018 06:56 AM

XKCD Comics

September 25, 2018

Potaroo blog

Measuring the KSK Roll

It has been a trade-off between waiting long enough to have the key sentinel mechanism deployed in sufficient volume in resolvers to generate statistically valid outcomes and starting this measurement prior to the planned roll of the KSK on 11th October 2018. These are early results, and reflect less than one week of measurement, but some strong signals are evident in the data.

September 25, 2018 09:24 PM

ipSpace.net Blog (Ivan Pepelnjak)

Upcoming Webinars and Events: October 2018

The fast pace of webinars continues in October 2018:

There are no on-site events planned until early December:

You can attend all upcoming webinars with an ipSpace.net webinar subscription. Online courses and on-site events require separate registration.

by Ivan Pepelnjak (noreply@blogger.com) at September 25, 2018 01:36 PM