Hacker News
by eudhxhdhsb32 494 days ago
Mtr is indeed nice.

One thing I've not understood is why some hops consistently have higher ping times than hops farther down the chain in the same trace.

Is it indicating that the router is faster at forwarding packets than responding to ping requests?


This is always worth a (re)read to understand traceroute:

https://archive.nanog.org/sites/default/files/traceroute-201...

^ This should be required reading for anyone using traceroute.
> Is it indicating that the router is faster at forwarding packets than responding to ping requests?

Exactly this. In most “real” routers, forwarding (usually) happens in the “data plane”. It’s handled by an ASIC that has a routing table accessible to it in RAM. A packet comes in on an interface, a routing decision is made, and it goes out another interface - all of this happens with dedicated hardware. Pings (ICMP Echo requests), however, get forwarded by this ASIC to a local CPU, where they are handled by software (in the “control plane”).

You’re really seeing different response times from the two control planes: one may be more loaded or less powerful than the other, regardless of the capacity of their data planes.
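A toy model of that split, assuming a simplified router that punts to its CPU only packets addressed to itself or whose TTL is about to expire. Real routers have more punt reasons and usually rate-limit this traffic. `Packet`, `classify` and the addresses are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str  # destination IP address (as text, for simplicity)
    ttl: int  # remaining time-to-live on arrival


# Hypothetical addresses owned by this router's own interfaces.
ROUTER_ADDRESSES = {"192.0.2.1", "198.51.100.1"}


def classify(pkt: Packet) -> str:
    """Decide which plane handles a packet in this simplified model.

    "data" means the ASIC forwards it in hardware; "control" means it is
    punted to the local CPU, where software has to build a reply.
    """
    if pkt.dst in ROUTER_ADDRESSES:
        # Addressed to the router itself, e.g. an ICMP Echo request.
        return "control"
    if pkt.ttl <= 1:
        # TTL would reach zero when forwarded: the CPU must generate an
        # ICMP "Time Exceeded" reply, which is what traceroute/mtr time.
        return "control"
    return "data"


if __name__ == "__main__":
    print(classify(Packet("203.0.113.7", ttl=64)))  # transit traffic
    print(classify(Packet("203.0.113.7", ttl=1)))   # probe expiring here
    print(classify(Packet("192.0.2.1", ttl=64)))    # ping to the router
```

Transit traffic never touches the CPU in this model, so a busy CPU slows the hop's own replies without slowing anything it forwards.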

This is also why you may see packet loss at one particular hop but then responses from hops beyond it. The hop with packet loss in this case probably has an overwhelmed CPU, rather than indicating that a particular network link has packet loss. mtr reporting packet loss at a hop is only reliable if every hop after it has similar packet loss.
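That rule of thumb can be written down as a small check. This is a sketch only: `classify_hop_loss` is a hypothetical helper, and the 2 percentage point tolerance is an arbitrary assumption, not anything mtr itself does.

```python
def classify_hop_loss(loss_pct: list[float], tolerance: float = 2.0) -> list[str]:
    """Label each hop's reported loss using the "every later hop" rule.

    loss_pct[i] is the loss percentage mtr shows for hop i (0-based).
    A hop's loss counts as real only if every later hop shows at least
    roughly the same loss (within `tolerance` percentage points);
    otherwise the router is probably just deprioritising its replies.
    The last hop has nothing after it, so its loss is taken at face value.
    """
    labels = []
    for i, loss in enumerate(loss_pct):
        if loss == 0:
            labels.append("ok")
            continue
        later = loss_pct[i + 1:]
        if all(other >= loss - tolerance for other in later):
            labels.append("likely real loss")
        else:
            labels.append("likely ICMP deprioritisation")
    return labels


if __name__ == "__main__":
    # 40% "loss" at hop 2 that vanishes afterwards: an overwhelmed CPU.
    print(classify_hop_loss([0, 40, 0, 0]))
    # Loss that starts at hop 2 and persists to the end: a real problem.
    print(classify_hop_loss([0, 10, 11, 9]))
```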

Maybe the only thing I've explained more in my career than this is why it's ok that your Linux box has no "free" memory.

It also doesn’t help that mtr's ICMP handling code is just bad: it treats packets that actually arrive as lost.
I retract my previous statement about bad ICMP code (and the other comments where I posted it). I was under the impression that mtr actually sent ICMP echo requests to individual hops with decreasing TTLs, but it just relies on the TTL exceeded messages generated for the end-to-end echo requests. However, this is still a terrible indicator of packet loss: for example, my wifi router heavily deprioritises generating TTL exceeded packets but will respond to a flood of echo requests with no issue. My main contention is that the per-hop loss indicator is a useless and misleading metric, and you should measure these things end to end, with traceroute and ping separately.
Traceroute doesn't use ping requests except with the old Windows binary. Usually it relies on the ICMP "Time-to-live (TTL) exceeded in transit" messages that routers send back when a probe's TTL expires.

Beyond that technicality, your guess is often right... Routers will frequently prioritize forwarding packets over sending the TTL exceeded packets that tools like MTR use to measure response times.
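For anyone curious what the probes and replies look like on the wire, here is a minimal sketch: it builds an ICMP Echo Request with the RFC 1071 checksum and interprets the type field of a reply. Actually sending probes needs a raw socket (usually root), with the TTL raised one hop at a time via `setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)`; that part is left out. The helper names are hypothetical.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo Request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


# ICMP types a traceroute-style tool cares about.
ICMP_MEANING = {
    0: "echo reply: the probe reached the destination",
    3: "destination unreachable (e.g. UDP port unreachable at the target)",
    11: "time exceeded: TTL expired at an intermediate hop",
}


def describe_reply(icmp: bytes) -> str:
    """Interpret the first two bytes (type, code) of a received ICMP message."""
    icmp_type, code = icmp[0], icmp[1]
    return ICMP_MEANING.get(icmp_type, f"other ICMP type {icmp_type}, code {code}")


if __name__ == "__main__":
    probe = build_echo_request(0x1234, 1, b"abc")
    print(probe.hex())
    print(describe_reply(bytes([11, 0, 0, 0])))
```

A correctly checksummed packet sums to zero when the checksum is recomputed over it, which is a handy sanity check.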

Also, the TTL expired message can easily take a different route on the return path. The same applies to your normal connections: asymmetric routing can be a pain, especially in networks with RPF (reverse path forwarding) issues (multicast ones are a particular pain point) and with stateful firewalls, but most of the time it's fine. You just need to be aware of it.
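To see why asymmetric routing and RPF clash, here is a sketch of a strict reverse path check, assuming a made-up two-interface routing table. The router accepts a packet only if its own best route back to the packet's source points out the interface the packet came in on; multicast RPF uses the same kind of lookup to decide which interface a stream must arrive on. `ROUTES`, `eth0` and `eth1` are hypothetical.

```python
import ipaddress

# Hypothetical routing table: prefix -> outgoing interface.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",       # default route via ISP A
    ipaddress.ip_network("203.0.113.0/24"): "eth1",  # more specific route via ISP B
}


def best_interface(addr: str) -> str:
    """Longest-prefix match: return the interface used to reach `addr`."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ROUTES if ip in net]  # default route always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


def strict_rpf_ok(src: str, in_iface: str) -> bool:
    """Accept only if we would route back to `src` via the arrival interface."""
    return best_interface(src) == in_iface


if __name__ == "__main__":
    # Symmetric: traffic from 203.0.113.9 arrives on eth1, the return route uses eth1.
    print(strict_rpf_ok("203.0.113.9", "eth1"))
    # Asymmetric: the same source arrives via ISP A on eth0 and gets dropped.
    print(strict_rpf_ok("203.0.113.9", "eth0"))
```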

Obviously you know, but for anyone else reading: a modern traceroute tool (like mtr) can send ICMP, UDP or TCP probes, using generic or specific ports. Indeed, the default for mtr on my laptop is ICMP.

Most likely, it's as you described: router N forwards packets much faster than it generates ICMP TTL exceeded messages, and router N+1 is nearby and generates its ICMP replies faster.

However, it could also be that the routing back to you is significantly different, so the return path to you from router N can be much longer than the one from router N+1.

This is more likely to happen on routes that cross oceans. Say you're tracing from the US to Brazil. If routers N and N+1 are both in Brazil, but N sends its replies back through Europe while N+1 sends them through Florida, N+1's replies will arrive significantly sooner.
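Both explanations fit one simple model: the time shown for a hop is the forward trip to it, plus the time its CPU takes to build the reply, plus the reply's trip back. The numbers below are made up to mirror the US to Brazil example, and the model ignores queueing and jitter.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    forward_ms: float   # one-way delay from you to this router
    icmp_gen_ms: float  # time the router's CPU takes to build the ICMP reply
    return_ms: float    # one-way delay of that reply back to you


def reported_rtt(hop: Hop) -> float:
    """What the trace shows for a hop: round trip including reply generation."""
    return hop.forward_ms + hop.icmp_gen_ms + hop.return_ms


if __name__ == "__main__":
    # Hypothetical numbers, in milliseconds.
    scenarios = {
        "N, replies via Europe": Hop(120.0, 1.0, 190.0),
        "N, busy CPU": Hop(120.0, 60.0, 120.0),
        "N+1, replies via Florida": Hop(121.0, 1.0, 120.0),
    }
    for name, hop in scenarios.items():
        print(f"{name}: {reported_rtt(hop):.0f} ms")
```

Either a slow reply or a long return path makes hop N look slower than N+1, even though probes to N+1 passed through N at full speed.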

> Is it indicating that the router is faster at forwarding packets than responding to ping requests?

I believe this is indeed the reason most of the time. Generating an ICMP error for a TTL expiration, or answering an echo request, is very low priority.

This latency in error message generation may even be a better signal of the router's load than the latency of the actual trip through it.
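One rough way to pull that signal out of a trace is to compare each hop's median RTT with the fastest later hop. Assuming the forward path is the same for every probe, probes to later hops also crossed this router, so any excess was added after the probe expired there: in reply generation or, as noted above, on a different return path. The heuristic cannot tell those two apart. `reply_overhead` is a hypothetical helper.

```python
from statistics import median
from typing import Optional


def reply_overhead(samples_ms: list[list[float]]) -> list[Optional[float]]:
    """Estimate per-hop delay that did not affect forwarded traffic.

    samples_ms[i] holds the RTT samples for hop i. If a later hop's median
    RTT is lower, the difference is attributed to hop i's reply (CPU load)
    or its return path. The last hop has nothing to compare against.
    """
    medians = [median(s) for s in samples_ms]
    result: list[Optional[float]] = []
    for i, m in enumerate(medians):
        later = medians[i + 1:]
        if not later:
            result.append(None)
        else:
            result.append(max(0.0, m - min(later)))
    return result


if __name__ == "__main__":
    trace = [
        [10.0, 11.0, 10.0],  # hop 1
        [45.0, 50.0, 48.0],  # hop 2: slow replies, yet later hops are fast
        [12.0, 13.0, 12.0],  # hop 3
        [20.0, 21.0, 20.0],  # hop 4
    ]
    print(reply_overhead(trace))
```

Medians are used rather than means so that an occasional slow reply does not dominate the estimate.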