Have you ever run a continuous ping to a network device and seen packet loss, then run another continuous ping through it towards a device behind it and seen no packet loss? This scenario is more common in network troubleshooting than you might think.
Let's look at the below example diagram.
Here's the scenario.
As you can see, we have 2 hosts, the client and the server, with a router named R1 between them.
- Pings from client to R1 (10.1.1.1) have packet loss (orange dotted line)
- Pings from client to server (10.20.20.1) through R1 have no packet loss (green dotted line)
What could be going on here?
Here's the explanation.
The key point to understand is that a router handles traffic differently depending on whether it's sent TO the router or THROUGH the router (transit traffic). This is the reason why you might see different results when pinging TO a router vs pinging THROUGH a router towards an actual destination end host (like the server in this case).
To understand this difference, here's a diagram of the separation of the control plane and the data plane in a router or other network device.
As you can see in this diagram, traffic destined to the router itself goes to the control plane, and traffic destined somewhere else goes to the data plane. Transit traffic makes up most of what the router handles, because that's what routers do, right? They move packets towards their destination. The router is meant to be a transit point along the path of network traffic.
So back to our scenario. There are 2 typical reasons why you might see packet loss when pinging TO a router.
- First is Control Plane Policing (CoPP). This means there's most likely ICMP rate limiting applied to the network device using CoPP. Basically, the network device prevents excessive pings destined to it. This is a best practice measure to make sure that the CPU resources are utilized properly for important traffic such as control plane traffic (routing protocols, ARP, etc). It's also a security measure to protect the CPU from DoS (Denial of Service) attacks. CoPP, of course, is applied to the control plane and so it will not impact transit traffic (traffic going through the network device).
- The second reason is that ICMP is simply not a priority. It's quite common for a router to deprioritize replying to ICMP echo (ping) requests, especially when it's busy handling route updates, managing neighbor relationships, and other control plane functions. This is not real packet loss, but the missing reply shows up as a timeout, so it looks like packet loss. When performing a continuous ping test, a regular pattern of one drop every X pings (e.g. 1 out of every 20) is a typical indication of this and it's relatively easy to spot.
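To make both reasons a bit more concrete, here is a minimal Python sketch. It is not how any particular vendor implements CoPP; it only assumes a simple token-bucket policer in front of the control plane, and the function names (`police`, `loss_period`) and all the numbers are made up for illustration. The first function decides which pings get a reply, and the second checks whether the drops are evenly spaced, which is the kind of pattern described above.

```python
def police(arrival_times_ms, rate_per_sec, burst):
    """Hypothetical token-bucket policer.

    Returns one boolean per packet: True if the device would reply,
    False if the packet is dropped by the rate limit.
    Tokens are tracked in thousandths so the maths stays in integers.
    """
    capacity = burst * 1000
    tokens = capacity                      # bucket starts full
    last = arrival_times_ms[0] if arrival_times_ms else 0
    results = []
    for t in arrival_times_ms:
        # rate_per_sec tokens per second == rate_per_sec milli-tokens per ms
        tokens = min(capacity, tokens + (t - last) * rate_per_sec)
        last = t
        if tokens >= 1000:
            tokens -= 1000                 # spend one token to answer this ping
            results.append(True)
        else:
            results.append(False)          # out of tokens: ping times out
    return results


def loss_period(results):
    """Return the spacing between drops if it is constant, else None."""
    lost = [i for i, ok in enumerate(results) if not ok]
    if len(lost) < 3:
        return None                        # too few drops to call it a pattern
    gaps = {b - a for a, b in zip(lost, lost[1:])}
    return gaps.pop() if len(gaps) == 1 else None


if __name__ == "__main__":
    # 30 pings, one every 100 ms (10 per second), against an assumed
    # limit of 8 replies per second with a burst of 2.
    pings = [i * 100 for i in range(30)]
    replies = police(pings, rate_per_sec=8, burst=2)
    print("dropped:", replies.count(False), "of", len(replies))
    print("drop every N pings:", loss_period(replies))
```

With these made-up numbers the sketch drops one ping in every five once the initial burst is used up. Random, unevenly spaced drops would make `loss_period` return `None`, which is a hint that something other than rate limiting may be going on.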
The general rule is that when testing for packet loss, you can't trust a ping test destined TO the router (or any network device for that matter) if it's showing packet loss. The best way to really test it is to ping THROUGH it. This way you are sure that the test traffic is going through the network device's forwarding path (data plane), so any drops you see are drops in transit traffic.
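The rule of thumb above can be written down as a tiny decision helper. This is only a sketch: the function name `interpret_ping_results` and the wording of the verdicts are my own, and the inputs are simply the loss percentages you read off your two ping tests.

```python
def interpret_ping_results(loss_to_device_pct, loss_through_device_pct):
    """Apply the TO-versus-THROUGH rule of thumb to two loss percentages."""
    if loss_through_device_pct > 0:
        # Transit traffic is being dropped somewhere along the path.
        return "investigate: loss seen on transit traffic"
    if loss_to_device_pct > 0:
        # Only the device's own replies are missing (CoPP or low ICMP priority).
        return "probably not real loss: only pings TO the device are dropping"
    return "clean: no loss either way"


if __name__ == "__main__":
    # The scenario from this post: loss to R1, none to the server behind it.
    print(interpret_ping_results(5.0, 0.0))
```

Note that loss on the THROUGH test only tells you something on the path is dropping packets. It doesn't by itself prove that this particular device is the one at fault.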
This is just a general rule and not meant to be the be-all and end-all. It's good practice to verify whether it's real packet loss or not. The primary function of network devices is to move network traffic, so this scenario makes sense once you consider that fact.
One could argue that with the processors of modern network devices this shouldn't be an issue. However, you never really know because not all of them are made the same. Besides, a faster processor only addresses the second reason (ICMP is not a priority), since CoPP rate limiting is applied by configuration regardless of how fast the CPU is. This kind of testing will always be handy, especially if you don't own or manage the device and all you can present are tests.
I also acknowledge that ping (ICMP echo) tests are not really the most reliable type of test. Ideally, it's better to use probes which simulate real application traffic such as TCP or UDP. But in reality, not everybody has tools with these capabilities on hand all the time. If the ping THROUGH the network device comes back clean then there's really nothing else to prove, so it's a matter of using what you have and not going beyond what is necessary.
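If you do have Python available, a very basic TCP probe is not hard to put together. The sketch below is only an illustration using the standard `socket` module. The function names (`tcp_probe`, `loss_percent`) are my own, and the port in the usage example is an assumption, so use one that the server behind the device actually listens on.

```python
import socket
import time


def tcp_probe(host, port, count=5, timeout=1.0, interval=0.2):
    """Try `count` TCP connections to host:port.

    Returns one entry per attempt: the connect time in milliseconds,
    or None if the attempt failed.
    """
    rtts = []
    for i in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                rtts.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Timeout, refusal, or unreachable. For simplicity all are
            # counted as a failed probe, although a refusal does show
            # that the path itself is working.
            rtts.append(None)
        if i < count - 1:
            time.sleep(interval)
    return rtts


def loss_percent(rtts):
    """Percentage of failed attempts in a tcp_probe() result."""
    if not rtts:
        return 0.0
    return 100.0 * sum(r is None for r in rtts) / len(rtts)


if __name__ == "__main__":
    # 10.20.20.1 is the server from the scenario; port 443 is an assumption.
    results = tcp_probe("10.20.20.1", 443, count=10)
    print("failed probes: %.0f%%" % loss_percent(results))
```

Because the connection is made to the server and not to the router, this is transit traffic from the router's point of view, just like the ping THROUGH it.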