Network Troubleshooting


Network Troubleshooting Tools

by Craig Miller

Technology is great when it works, but what do you do when it doesn't, and you get a helpful message in your browser that says something like "check your connectivity"?

The art of network troubleshooting is breaking the problem in half, and figuring out whether the problem lies in the near half or the far half. Doing a binary search for the issue is usually the quickest path to resolution.
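The binary search can be sketched in a few lines of Python. This is a minimal illustration, not a tool: `first_failure` and the checkpoint names are hypothetical, and `works` stands in for whatever test you run at each checkpoint (ping, traceroute, a DNS lookup). It assumes failures are monotonic, i.e. once something near you is broken, everything beyond it looks broken too.

```python
from typing import Callable, Optional, Sequence


def first_failure(checkpoints: Sequence[str],
                  works: Callable[[str], bool]) -> Optional[str]:
    """Return the nearest checkpoint that fails, or None if every one works.

    Assumes failures are monotonic: once a checkpoint fails, every
    checkpoint farther from you fails too.
    """
    lo, hi = 0, len(checkpoints)
    # Invariant: checkpoints[:lo] work, checkpoints[hi:] fail
    while lo < hi:
        mid = (lo + hi) // 2
        if works(checkpoints[mid]):
            lo = mid + 1      # near half is fine; look in the far half
        else:
            hi = mid          # failure is here or in the near half
    return checkpoints[lo] if lo < len(checkpoints) else None


if __name__ == "__main__":
    # Hypothetical path: pretend everything past the ISP is unreachable
    path = ["interface", "gateway", "isp", "backbone", "server"]
    reachable = {"interface", "gateway", "isp"}
    print(first_failure(path, lambda hop: hop in reachable))  # backbone
```

With five checkpoints this needs at most three tests rather than five, and the saving grows as the path gets longer.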

Using the Modern Internet

In order to use the modern internet, one requires two (2) items: an IP address, and DNS (to translate names into IP addresses).

Sure, there can be much more that is required, but in this day and age of NAT-designed networks, Client/Server is king. More specifically, the client is a web browser, and the server is a web server serving up tons of JavaScript. In this network, only the server must have a real, publicly routable IP address. The client can be behind many layers of NAT, and often is.

Of course, IPv6 doesn't require NAT, since NAT is an address conservation mechanism. And I suspect the pendulum will swing back someday to peer-to-peer sharing, but for now all we need is an IP address and DNS.

Where's the middle?

If the binary search is the fastest path to resolution, where do we start?

Most will start with ping. It is useful, but its usefulness drops sharply when there is no response.

$ ping netsig.makiki.ca
PING netsig.makiki.ca(2607:c000:8011:fd94:216:3eff:fed7:e195 (2607:c000:8011:fd94:216:3eff:fed7:e195)) 56 data bytes
^C
--- netsig.makiki.ca ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5ms

Although it may seem like we don't know anything more than when we started, we actually do: we know that DNS is working! The name was looked up and an IP address was returned. But we don't know much more about why we can't reach the server.

Traceroute shows the path

I find traceroute to be a more useful starting point. traceroute reports the path your packets take to their destination (the server).

$ traceroute6 netsig.makiki.ca
traceroute to netsig.makiki.ca (2607:c000:8011:fd94:216:3eff:fed7:e195) from 2001:470:1f12:79e::2, port 33434, from port 36161, 30 hops max, 60 bytes packets
 1  tunnel367664.tunnel.tserv10.par1.ipv6.he.net (2001:470:1f12:79e::1)  3.344 ms  2.631 ms  1.982 ms 
 2  10ge7-3.core1.par2.he.net (2001:470:0:7b::1)  0.868 ms  0.841 ms  19.875 ms 
 3  100ge11-2.core1.nyc4.he.net (2001:470:0:54::1)  71.357 ms  71.225 ms  81.582 ms 
 4  100ge14-1.core1.tor1.he.net (2001:470:0:2dc::2)  100.836 ms  81.038 ms  100.601 ms 
 5  paix.tor.packetflow.ca (2001:504:d:80::25)  81.450 ms  * 86.664 ms 
 6  ae8-0-bdr01-tor2.teksavvy.com (2607:f2c0:ffff:1:3:1:0:130)  117.044 ms  81.773 ms  81.723 ms 
 7  ae4-0-bdr01-wpg.teksavvy.com (2607:f2c0:ffff:1:26::137)  102.953 ms  102.798 ms  109.870 ms 
 8  2607:f2c0:26:28::148 (2607:f2c0:26:28::148)  117.589 ms  117.359 ms  118.345 ms 
 9  ae0-10-lns01-van2.teksavvy.com (2607:f2c0:ffff:4:4::152)  129.573 ms  129.805 ms  129.651 ms 
10  ae0-10-lns01-van2.teksavvy.com (2607:f2c0:ffff:4:4::152)  128.595 ms  128.564 ms  128.421 ms 
11  * * *         
12  * * *         
13  * * *         
^C

traceroute not only proves that DNS is working (similar to ping) but also shows the path of the packet to the destination.

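More importantly, it shows that there is IP connectivity to the internet, including backbone networks such as he.net. The last router to answer is hop 10 (teksavvy.com); nothing replies after that, so the problem most likely lies beyond that point, toward the server's end of the path.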

The three stars (* * *) mean that no response was received.

To understand how traceroute works, you must understand that the IP header has a TTL (Time to Live, in IPv4) or Hop Limit (in IPv6) field, designed to keep packets from circling forever in routing loops. Each router that forwards a packet decrements the TTL/Hop Limit, and when it reaches zero, the packet is dropped.

traceroute sends a packet toward the destination with an unusually small TTL/Hop Limit. Initially it is set to one (1), so the closest router decrements it to zero, drops the packet, and sends an ICMP (or ICMPv6) Time Exceeded message back to the source. traceroute listens for these ICMP(v6) messages, notes the IP address of the router that sent each one, and prints it as hop number 1. Then it increases the TTL/Hop Limit to two (2) and repeats. It actually sends 3 packets at each TTL/Hop Limit and measures the round-trip time of each (much like ping). When you see the three stars * * *, no reply came back, so there are no times to print.
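To make the mechanism concrete, here is a toy simulation in Python. No packets are sent: the path is a hypothetical list of router names, `send_probe` plays the part of the network (each router decrements the TTL and answers when it hits zero), and a `None` entry stands for a router that never sends Time Exceeded, which is what produces * * *.

```python
from typing import List, Optional


def send_probe(routers: List[Optional[str]], destination: Optional[str],
               ttl: int) -> Optional[str]:
    """Return who replies to a probe sent with this TTL/Hop Limit (None = no reply)."""
    for router in routers:
        ttl -= 1                 # every router decrements the TTL/Hop Limit
        if ttl == 0:
            return router        # dropped here; router sends Time Exceeded (None = silent)
    return destination           # probe got through; destination answers (None = silent)


def traceroute(routers: List[Optional[str]], destination: Optional[str],
               max_hops: int = 30) -> List[str]:
    lines = []
    for ttl in range(1, max_hops + 1):
        # Real traceroute sends 3 probes per TTL and times each; one is enough here
        responder = send_probe(routers, destination, ttl)
        lines.append(f"{ttl:2d}  {responder if responder is not None else '* * *'}")
        if responder is not None and responder == destination:
            break                # reached the destination
    return lines


if __name__ == "__main__":
    # Hypothetical path: the third router never sends Time Exceeded
    for line in traceroute(["gateway", "isp-core", None], "server", max_hops=10):
        print(line)
```

If the destination itself never answers (pass `None`), the sketch keeps printing * * * until `max_hops`, just like the trace above that had to be stopped with ^C.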

Looking at your own host

OK, you can't ping, you can't even traceroute, then what? It is time to look at your own host. Is the interface (wired or wireless) up? Did you get an IP address from the DHCP server (or RA from the router for SLAAC)? The ip command can provide you quite a bit of information about your host's networking status.

The IP command

ip is the successor to the venerable ifconfig, and with good reason, as ip can tell you much more about your configuration. It is installed by default on most distros, and usually lives at /sbin/ip.

Note: BSD & MacOS X still use ifconfig. If you are using Windows, open a command window and use the ipconfig command.

link status

ip can display the status of the link (Layer 2 in the OSI model), as well as configure VLAN-based interfaces. This will show whether your wired/wireless interface is up. To display the link status, use:

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 10:9a:dd:54:f6:34 brd ff:ff:ff:ff:ff:ff
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DORMANT qlen 1000
    link/ether 10:9a:dd:ae:81:77 brd ff:ff:ff:ff:ff:ff
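The flags in angle brackets are what matter: UP means the interface is administratively enabled, LOWER_UP means the physical layer reports a link, and NO-CARRIER means no link was detected. In the output above, eth0 is UP with state UP, while eth1 reports NO-CARRIER and state DORMANT. If you want to check this from a script, a rough Python sketch might look like the following; it assumes the text format shown above, and `parse_ip_link` and `has_carrier` are hypothetical helpers. Newer iproute2 releases can also print JSON (ip -j link), which would be sturdier than a regex.

```python
import re
from typing import Dict, Set, Tuple

# First line of each interface, e.g.
# "2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000"
IFACE_RE = re.compile(r"^\d+:\s+(\S+?):\s+<([^>]*)>.*\bstate\s+(\S+)")

# Copied from the `ip link` output above
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 10:9a:dd:54:f6:34 brd ff:ff:ff:ff:ff:ff
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DORMANT qlen 1000
    link/ether 10:9a:dd:ae:81:77 brd ff:ff:ff:ff:ff:ff
"""


def parse_ip_link(text: str) -> Dict[str, Tuple[Set[str], str]]:
    """Map interface name -> (set of flags in <...>, operational state)."""
    links: Dict[str, Tuple[Set[str], str]] = {}
    for line in text.splitlines():
        m = IFACE_RE.match(line)
        if m:                            # skips the indented link/... lines
            name, flags, state = m.groups()
            links[name] = (set(flags.split(",")), state)
    return links


def has_carrier(flags: Set[str]) -> bool:
    """Administratively UP, lower layer up, and no NO-CARRIER flag."""
    return {"UP", "LOWER_UP"} <= flags and "NO-CARRIER" not in flags


if __name__ == "__main__":
    for name, (flags, state) in parse_ip_link(SAMPLE).items():
        print(f"{name:5} state={state:8} link={'ok' if has_carrier(flags) else 'DOWN'}")
```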

IP Address

The ip command will also show whether your host has any IP addresses. Adding -4 or -6 (for example, ip -6 addr) will limit the output to just that address family.

For Windows: use ipconfig /all

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 10:9a:dd:54:f6:34 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.15/24 brd 10.1.1.255 scope global eth0
    inet6 2607:c000:815f:5600:e4a1:4d5f:961a:4973/64 scope global temporary dynamic 
       valid_lft 6951sec preferred_lft 1551sec
    inet6 2001:470:1d:489:fd2f:ea14:d171:c541/64 scope global temporary dynamic 
       valid_lft 6951sec preferred_lft 1551sec
    inet6 2607:c000:815f:5600:fd2f:ea14:d171:c541/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2001:470:1d:489:487d:35e:3834:c9a/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2607:c000:815f:5600:487d:35e:3834:c9a/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2001:470:1d:489:a0f0:7c93:4135:b344/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2607:c000:815f:5600:a0f0:7c93:4135:b344/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2607:c000:815f:5600:129a:ddff:fe54:f634/64 scope global dynamic 
       valid_lft 6951sec preferred_lft 1551sec
    inet6 2001:470:1d:489:4d85:44b3:3b87:1513/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2001:470:1d:489:5cd0:431a:b989:4517/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2001:470:1d:489:a121:bf93:87b8:c125/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2001:470:1d:489:c8c8:e6c4:ed49:e502/64 scope global temporary deprecated dynamic 
       valid_lft 6951sec preferred_lft 0sec
    inet6 2001:470:1d:489:129a:ddff:fe54:f634/64 scope global dynamic 
       valid_lft 6951sec preferred_lft 1551sec
    inet6 fe80::129a:ddff:fe54:f634/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DORMANT qlen 1000
    link/ether 10:9a:dd:ae:81:77 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::129a:ddff:feae:8177/64 scope link 
       valid_lft forever preferred_lft forever

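As you can see, it is possible to have many IPv6 addresses on a single interface. Addresses marked temporary are privacy addresses that are regenerated periodically, and those marked deprecated (preferred_lft 0sec) remain valid for existing connections but should not be used for new ones. It is also possible to have multiple IPv6 prefixes active on the same interface (here 2607:c000:815f:5600::/64 and 2001:470:1d:489::/64). You don't need to worry about which address will be used: the source address selection algorithm (RFC 6724) takes care of that for you.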
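A rough Python sketch of the first filtering step, applied to a few lines copied from the lo and eth0 output above: it keeps global-scope addresses that are not deprecated, roughly RFC 6724's "prefer appropriate scope" and "avoid deprecated addresses" rules. It is an illustration under those assumptions, not the full algorithm, which goes on to compare the remaining candidates (temporary vs. public, longest matching prefix, and so on). `parse_inet6` and `new_connection_candidates` are hypothetical helpers.

```python
import ipaddress
import re
from typing import List, NamedTuple

# "inet6 <addr>/<len> scope <scope> [flags...]"; lifetimes follow on the next line
INET6_RE = re.compile(r"^\s*inet6\s+(\S+)/(\d+)\s+scope\s+(\S+)\s*(.*)$")
PREF_RE = re.compile(r"preferred_lft\s+(\S+)")

SAMPLE = """\
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
    inet6 2607:c000:815f:5600:e4a1:4d5f:961a:4973/64 scope global temporary dynamic
       valid_lft 6951sec preferred_lft 1551sec
    inet6 2607:c000:815f:5600:fd2f:ea14:d171:c541/64 scope global temporary deprecated dynamic
       valid_lft 6951sec preferred_lft 0sec
    inet6 2607:c000:815f:5600:129a:ddff:fe54:f634/64 scope global dynamic
       valid_lft 6951sec preferred_lft 1551sec
"""


class Addr6(NamedTuple):
    address: ipaddress.IPv6Address
    prefixlen: int
    scope: str
    flags: frozenset
    preferred_lft: str


def parse_inet6(text: str) -> List[Addr6]:
    lines = text.splitlines()
    result = []
    for i, line in enumerate(lines):
        m = INET6_RE.match(line)
        if not m:
            continue
        addr, plen, scope, rest = m.groups()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""   # the lifetimes line
        pref = PREF_RE.search(nxt)
        result.append(Addr6(ipaddress.IPv6Address(addr), int(plen), scope,
                            frozenset(rest.split()), pref.group(1) if pref else ""))
    return result


def new_connection_candidates(addrs: List[Addr6]) -> List[ipaddress.IPv6Address]:
    """Global-scope, non-deprecated addresses: a first filter, not full RFC 6724."""
    return [a.address for a in addrs
            if a.scope == "global" and "deprecated" not in a.flags]


if __name__ == "__main__":
    for addr in new_connection_candidates(parse_inet6(SAMPLE)):
        print(addr)
```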

IP Routing

OK, so your link is up, you have an IP address, but you still can't ping or traceroute. What now?

It is possible that your host does not have a default route. A default route is the route of last resort: it is used for any destination that no more specific route matches. Most hosts don't have complex routing tables (unless you are running VMs), so the routing table is usually just a few lines. There is one table for IPv4 and one for IPv6.

$ ip -4  route
default via 10.1.1.1 dev eth0  metric 100 
10.1.1.0/24 dev eth0  proto kernel  scope link  src 10.1.1.15 

The IPv6 route table displays the default route with a link-local next hop (the router's fe80:: address):

$ ip -6  route
2001:470:1d:489::/64 dev eth0  proto kernel  metric 256  expires 6865sec mtu 1280
2607:c000:815f:5600::/64 dev eth0  proto kernel  metric 256  expires 6865sec
fe80::/64 dev eth0  proto kernel  metric 256  mtu 1280
fe80::/64 dev eth1  proto kernel  metric 256 
default via fe80::224:a5ff:fee1:7ca dev eth0  proto kernel  metric 1024  expires 1464sec mtu 1280 hoplimit 64
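Why is the default route the route of last resort? Routing uses longest-prefix match: the most specific matching prefix wins, and default is simply the shortest possible prefix (0.0.0.0/0 or ::/0), so it only wins when nothing else matches. Here is a minimal Python sketch using the standard ipaddress module, with the routes above hand-copied into a list. It deliberately ignores metrics, the fe80::/64 link-local routes (which need an interface to be meaningful) and policy routing; `lookup` is a hypothetical helper, not how the kernel is implemented.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, Optional[str], str]   # (prefix, next hop or None if on-link, device)

# Copied from the `ip -4 route` and `ip -6 route` output above; "default" is written as /0
ROUTES: List[Route] = [
    ("10.1.1.0/24", None, "eth0"),
    ("0.0.0.0/0", "10.1.1.1", "eth0"),
    ("2001:470:1d:489::/64", None, "eth0"),
    ("2607:c000:815f:5600::/64", None, "eth0"),
    ("::/0", "fe80::224:a5ff:fee1:7ca", "eth0"),
]


def lookup(dest: str, routes: List[Route] = ROUTES):
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, via, dev in routes:
        net = ipaddress.ip_network(prefix)
        if net.version == addr.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, via, dev)
    return best   # None is the "Network is unreachable" case


if __name__ == "__main__":
    for dest in ("10.1.1.20", "1.1.1.1", "2607:c000:8011:fd94:216:3eff:fed7:e195"):
        net, via, dev = lookup(dest)
        print(f"{dest} -> {net} via {via or 'direct'} dev {dev}")
```

The netsig.makiki.ca address from earlier only matches ::/0, so it is sent via the router's link-local address. Remove the two /0 entries and the same lookup returns None: no default route, no internet.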

Checking out your Router's RAs

While ip will tell you the configuration of the host, rdisc6 will tell you the configuration of your router, or at least what it is sending out in its Router Advertisements (RAs). RAs carry prefixes, and their A (autonomous), M (managed) and O (other configuration) flags control whether hosts use SLAAC and whether they start a DHCPv6 client (RFC 8415). In the rdisc6 output, "Stateful address conf." is the M flag and "Stateful other conf." is the O flag. rdisc6 sends a Router Solicitation (RS) and prints the RA it receives in response.

rdisc6 is part of the ndisc6 package. Install it as you would any package on your Linux Distro.

$ rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...

Hop limit                 :           64 (      0x40)
Stateful address conf.    :           No
Stateful other conf.      :          Yes
Router preference         :       medium
Router lifetime           :         1800 (0x00000708) seconds
Reachable time            :  unspecified (0x00000000)
Retransmit time           :  unspecified (0x00000000)
 Source link-layer address: 00:24:A5:E1:07:CA
 MTU                      :         1280 bytes (valid)
 Prefix                   : 2607:c000:815f:5600::/64
  Valid time              :         7200 (0x00001c20) seconds
  Pref. time              :         1800 (0x00000708) seconds
 Prefix                   : 2001:470:1d:489::/64
  Valid time              :         7200 (0x00001c20) seconds
  Pref. time              :         1800 (0x00000708) seconds
 Route                    : 2607:c000:815f:5600::/56
  Route preference        :       medium
  Route lifetime          :         7200 (0x00001c20) seconds
 Recursive DNS server     : 2607:c000:815f:5600::1
  DNS server lifetime     :         1800 (0x00000708) seconds
 from fe80::224:a5ff:fee1:7ca

RAs can include not only prefixes for the host to use with SLAAC (Stateless Address Auto Configuration), but also the IP address of a DNS server (RDNSS) and a DNS search domain (DNSSL) (RFC 8106), freeing the host from the bondage of a DHCP server. The RA above carries an RDNSS option (2607:c000:815f:5600::1) but no DNSSL.
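The effect of the flags can be summarized as a small decision table, sketched here in Python. This is a simplified model under stated assumptions: the rdisc6 output above does not print the per-prefix A flag, so it is assumed to be set for both prefixes (the host did form SLAAC addresses in both), and real clients (dhcpcd, NetworkManager, systemd-networkd) differ in detail. `ra_actions` is a hypothetical helper.

```python
from typing import Dict, List


def ra_actions(managed: bool, other: bool, autonomous: Dict[str, bool]) -> List[str]:
    """Simplified model of what a host does with an RA's M, O and per-prefix A flags."""
    actions = [f"SLAAC address in {prefix}"
               for prefix, a_flag in autonomous.items() if a_flag]
    if managed:        # M=1: addresses (and other settings) come from DHCPv6
        actions.append("stateful DHCPv6: addresses and other configuration")
    elif other:        # M=0, O=1: only other settings, e.g. DNS, via DHCPv6
        actions.append("stateless DHCPv6: other configuration only (e.g. DNS)")
    else:              # M=0, O=0: no DHCPv6 at all
        actions.append("no DHCPv6: rely on RA options such as RDNSS and DNSSL")
    return actions


if __name__ == "__main__":
    # M and O from the rdisc6 output above; A=1 is assumed for both prefixes
    for action in ra_actions(managed=False, other=True, autonomous={
            "2607:c000:815f:5600::/64": True,
            "2001:470:1d:489::/64": True}):
        print(action)
```

With M = No and O = Yes, as in the output above, the host forms SLAAC addresses in both prefixes and would ask DHCPv6 only for other settings such as DNS.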


1. Hands On - IP Troubleshooting

Use the above information to determine the following:


DNS Troubleshooting

Continuing the troubleshooting theme: you can ping by IP address, and you have an IP address, but you can't resolve human-readable host names into IP addresses.

This can appear as if you don't have connectivity to the internet, but in fact, you just can't resolve host names.

What DNS server is your host using?

On a Linux system, traditionally the /etc/resolv.conf file contains the IP address of the DNS server.

$ cat /etc/resolv.conf
# Generated by dhcpcd from br0.dhcp, br0.dhcp6, br0.ra
domain hoomaha.net
nameserver 10.1.1.1
nameserver 2001:db8:8011:fd11::1

However, systemd has other ideas about resolving hosts: if you see nameserver 127.0.0.53 in /etc/resolv.conf, then systemd-resolved, a local stub resolver, has taken over your DNS resolution.
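If you are scripting this check, a small Python sketch could read the nameserver lines and flag the systemd-resolved stub address. It is illustrative only: `parse_resolv_conf` and `uses_systemd_stub` are hypothetical helpers, they ignore options and sortlist lines, and the sample is the dhcpcd-generated file above.

```python
from typing import Dict, List

# The dhcpcd-generated /etc/resolv.conf shown above
SAMPLE = """\
# Generated by dhcpcd from br0.dhcp, br0.dhcp6, br0.ra
domain hoomaha.net
nameserver 10.1.1.1
nameserver 2001:db8:8011:fd11::1
"""


def parse_resolv_conf(text: str) -> Dict[str, List[str]]:
    """Extract nameserver and search-domain entries from resolv.conf text."""
    conf: Dict[str, List[str]] = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":    # comment lines start with # or ;
            continue
        key, *values = line.split()
        if key == "nameserver" and values:
            conf["nameserver"].append(values[0])
        elif key in ("domain", "search"):
            conf["search"] = values        # the last domain/search line wins
    return conf


def uses_systemd_stub(conf: Dict[str, List[str]]) -> bool:
    """True if queries go to systemd-resolved's local stub listener."""
    return "127.0.0.53" in conf["nameserver"]


if __name__ == "__main__":
    conf = parse_resolv_conf(SAMPLE)
    print(conf["nameserver"], "systemd stub:", uses_systemd_stub(conf))
```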

If you are using a Linux distro with systemd, you will have to use the systemd-resolve command (resolvectl status on newer releases) to determine which DNS server it is using:

$ systemd-resolve --status
Global
       LLMNR setting: no                    
MulticastDNS setting: no                    
  DNSOverTLS setting: no                    
      DNSSEC setting: no                    
    DNSSEC supported: no                    
  Current DNS Server: 192.168.215.1         
         DNS Servers: 192.168.215.1         
                      2001:db8:8011:fd11::1
          DNS Domain: hoomaha.net           
          DNSSEC NTA: 10.in-addr.arpa       
                      16.172.in-addr.arpa   
                      168.192.in-addr.arpa  
                      17.172.in-addr.arpa   
                      18.172.in-addr.arpa 
                      ...

DNS not working, use host and dig

Does the host have IP connectivity to the DNS server? Try pinging the DNS server. If that doesn't work, go back and work through the IP connectivity troubleshooting above. If it does...

Is the DNS server (as seen above) working? A quick check with host will determine whether your DNS server is working:

$ host one.one.one.one
one.one.one.one has address 1.0.0.1
one.one.one.one has address 1.1.1.1
one.one.one.one has IPv6 address 2606:4700:4700::1111
one.one.one.one has IPv6 address 2606:4700:4700::1001

If there is no answer, add the IP address of another known DNS server, such as 1.1.1.1, to the command:

$ host one.one.one.one 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

one.one.one.one has address 1.0.0.1
one.one.one.one has address 1.1.1.1
one.one.one.one has IPv6 address 2606:4700:4700::1111
one.one.one.one has IPv6 address 2606:4700:4700::1001

Note that in the second example, the host command sends its DNS request directly to 1.1.1.1 rather than to your local DNS server. If the second query works but the first one failed, your Linux host can reach the internet and resolve names, but your local DNS server can't. If you are in a hurry, just change your /etc/resolv.conf to point to 1.1.1.1 (Cloudflare's DNS server), or to 2606:4700:4700::1111 if you are on an IPv6-only network.
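Under the hood, host asks a specific server by sending it a DNS query over UDP port 53. The Python sketch below builds an RFC 1035 query by hand with only the standard library and sends it to whichever server you name, so it can tell a dead server (timeout) from a working one. It is a minimal sketch, not a resolver: it does not parse the answer records, the query ID is fixed, and `build_query` and `answer_count` are hypothetical names.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234, qtype: int = 1) -> bytes:
    """RFC 1035 query: 12-byte header plus one question (QTYPE A=1, QCLASS IN=1)."""
    # ID, flags (0x0100 = RD, recursion desired), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def answer_count(server: str, name: str, timeout: float = 2.0) -> int:
    """Send an A query over UDP port 53 and return ANCOUNT from the reply header.

    Raises socket.timeout (an alias of TimeoutError since Python 3.10)
    if the server never answers.
    """
    qid = 0x1234   # fixed for simplicity; real resolvers randomize the ID
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        reply, _ = sock.recvfrom(512)
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if rid != qid:
        raise ValueError("reply ID does not match query ID")
    if flags & 0x000F:   # RCODE != 0, e.g. 2 = SERVFAIL, 3 = NXDOMAIN
        raise ValueError(f"server returned RCODE {flags & 0x000F}")
    return ancount
```

Run answer_count once against your local DNS server and once against "1.1.1.1" with the same name: a timeout or error only against the local server points the finger at that server.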

Using dig trace

A few months ago, we learned the ins and outs of dig. I find that dig usually provides more information than I want. However, in the scenario where your local DNS server resolves your local names but not names on the internet (perhaps a firewall rule is blocking it), the dig +trace command is useful: dig performs the resolution itself, starting at the root servers and following each referral, and displays every step.

$ dig +trace one.one.one.one

; <<>> DiG 9.16.1-Ubuntu <<>> +trace one.one.one.one
;; global options: +cmd
.           7184    IN  NS  l.root-servers.net.
.           7184    IN  NS  a.root-servers.net.
.           7184    IN  NS  c.root-servers.net.
.           7184    IN  NS  k.root-servers.net.
.           7184    IN  NS  e.root-servers.net.
.           7184    IN  NS  d.root-servers.net.
.           7184    IN  NS  b.root-servers.net.
.           7184    IN  NS  m.root-servers.net.
.           7184    IN  NS  j.root-servers.net.
.           7184    IN  NS  h.root-servers.net.
.           7184    IN  NS  f.root-servers.net.
.           7184    IN  NS  i.root-servers.net.
.           7184    IN  NS  g.root-servers.net.
;; Received 262 bytes from 127.0.0.53#53(127.0.0.53) in 0 ms

one.            172800  IN  NS  a.nic.one.
one.            172800  IN  NS  b.nic.one.
one.            172800  IN  NS  c.nic.one.
one.            172800  IN  NS  d.nic.one.
one.            86400   IN  DS  14131 8 1 8C04B443EE763B8B67CDF0DB0BBC832E24F560EE
one.            86400   IN  DS  14131 8 2 8D11FF81A0E9BCC2719695CBE4D585B47AA3BDE6CD28C5AC6E02BD91 9CA9B9E0
one.            86400   IN  RRSIG   DS 8 1 86400 20210223210000 20210210200000 42351 . iNKofj5F/gdOI4s/eMeRHym9RARgdtaPzxDChhgxti6xc5x1P/TNAkQH RHZvH5wxULDzfPVm0mJ8NAOd0egJDgQs1OPGgjaxA535Vhlo7lq/Onb+ SkGsTUkPeiUhM13APhXBcTyWVbE8kPfy8g7bmPi7Ioi3RrotNVpp6Hjs m4wLt1iNSHUDznrfD/7GbkiNZuuNptapm48Rr64oEsJiybsecsP8r9rm 8FDbKrdhHDEYp/sytI5RIvr0yJlCb9TEao8G1nck8wcMeg6Wsxn8kKHs IdxDLTBwshNN9ppdwzb+sNyNpeNUQsAbMykYSsBjUVyZP6Lv1ac5vieK 2/Leww==
;; Received 659 bytes from 2001:500:a8::e#53(e.root-servers.net) in 35 ms

one.one.        3600    IN  NS  a.b-one-dns.net.
one.one.        3600    IN  NS  b.b-one-dns.net.
one.one.        3600    IN  DS  53074 13 2 86F2929EE3E5E501032B6DC94841A4A056A2D2876CABCF46A5F8907E B4917782
one.one.        3600    IN  RRSIG   DS 8 2 3600 20210308200131 20210206190132 29100 one. m2h/vPZl2vOLkqXqOZaWELFNMGM2bMEagv/gOwUo3Vg1gWD7fpP4w81y L1UJydVYMPdU/Ir+zcusiiIXK6KcygDwJDl2yyH1CdVxmbl0oSCkqK3I QSzuj+MmKQNKuF9eOywSJRfEynmPsWmOkXz29s54HrTxAjkgoWpxP2PS YLRwTSkI0N5vFH6M1oiRYWIP8bztuze00vkLRLXuuTkMXg==
;; Received 339 bytes from 37.209.194.9#53(b.nic.one) in 67 ms

one.one.one.        3600    IN  NS  jean.ns.cloudflare.com.
one.one.one.        3600    IN  NS  fred.ns.cloudflare.com.
70hf8peef8el004f0mr69pinmkfq0qsu.one.one. 300 IN NSEC3 1 0 1 AB 03VKHA5O24MFR8QICMVNJ8C9K3GDOHP3 NS
70hf8peef8el004f0mr69pinmkfq0qsu.one.one. 300 IN RRSIG NSEC3 13 3 300 20210218000000 20210128000000 53074 one.one. NNRzCeacwWqXvu7SIHFhmQiQHXe+I/Zn8bXnPRzjt/rqRUddhtZt/Gal lj7m+3ePbGi0b7/H0GMSvUlLxSnuaQ==
;; Received 277 bytes from 195.206.121.11#53(a.b-one-dns.net) in 171 ms

one.one.one.one.    300 IN  A   1.0.0.1
one.one.one.one.    300 IN  A   1.1.1.1
;; Received 76 bytes from 172.64.33.113#53(fred.ns.cloudflare.com) in 31 ms

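Here you can see that dig first asked systemd-resolved (127.0.0.53) for the list of root servers. It then queried a root server (2001:500:a8::e, e.root-servers.net), which referred it to the servers for the one top-level domain; a one server (37.209.194.9, b.nic.one) referred it to the one.one servers; a one.one server (195.206.121.11, a.b-one-dns.net) referred it to Cloudflare's servers for one.one.one; and finally Cloudflare's name server (172.64.33.113, fred.ns.cloudflare.com) returned the A records.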
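The referral chain that dig +trace walks can be imitated with a toy model in Python. The server names and records are hand-transcribed from the trace above; `REFERRALS`, `ANSWERS` and `iterate` are hypothetical, and no network traffic is involved. Each server either answers or refers the query to a server for a more specific zone, and the walk stops with an error where a referral is missing, much as a real trace stops at the server that doesn't respond.

```python
from typing import Dict, List, Tuple

# Toy data hand-transcribed from the trace above (hypothetical model, no network)
REFERRALS: Dict[str, Dict[str, str]] = {
    "e.root-servers.net": {"one.": "b.nic.one"},
    "b.nic.one": {"one.one.": "a.b-one-dns.net"},
    "a.b-one-dns.net": {"one.one.one.": "fred.ns.cloudflare.com"},
}
ANSWERS: Dict[str, Dict[str, List[str]]] = {
    "fred.ns.cloudflare.com": {"one.one.one.one.": ["1.0.0.1", "1.1.1.1"]},
}


def iterate(name: str, server: str = "e.root-servers.net",
            max_steps: int = 10) -> Tuple[List[str], List[str]]:
    """Follow referrals starting at a root server; return (servers asked, addresses)."""
    fqdn = name.rstrip(".") + "."
    asked: List[str] = []
    for _ in range(max_steps):
        asked.append(server)
        answers = ANSWERS.get(server, {})
        if fqdn in answers:
            return asked, answers[fqdn]              # authoritative answer
        zones = [z for z in REFERRALS.get(server, {})
                 if fqdn == z or fqdn.endswith("." + z)]
        if not zones:
            raise LookupError(f"{server}: no answer or referral for {fqdn}")
        server = REFERRALS[server][max(zones, key=len)]   # most specific zone
    raise LookupError("too many referrals")


if __name__ == "__main__":
    servers, addresses = iterate("one.one.one.one")
    print(" -> ".join(servers), addresses)
```

In this example every label happens to be a separate zone; in general a zone can span many labels, so a trace may take fewer steps than the name has labels.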

Note that systemd didn't even bother querying my local DNS server. This leads us to another DNS scenario: your local DNS server is broken, but you can still resolve names on the internet.

Other DNS considerations, VPNs, DoH and /etc/hosts

Are you using a VPN, which typically uses its own DNS servers? Or do you have DoH (DNS over HTTPS) enabled in your web browser? Either can leave your host unable to resolve local names on your network, while it resolves names on the internet just fine.

Can't get to just one host? If you have put a static mapping for that host in your /etc/hosts file and its IP address changes, you will no longer be able to reach that host, but you will still have connectivity to other hosts. On a Linux system, /etc/hosts is usually consulted before DNS (the order is set by the hosts: line in /etc/nsswitch.conf).
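The effect of a stale entry is easy to model. In the Python sketch below, the host names and addresses are hypothetical, `dns` stands in for a real DNS lookup, and the lookup order assumes the common hosts: files dns setting, so an /etc/hosts match wins even when DNS has a newer address.

```python
from typing import Callable, Dict, Optional


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each host name or alias to the first address listed for it."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()        # '#' starts a comment
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)   # first match wins
    return table


def resolve(name: str, hosts_text: str,
            dns: Callable[[str], Optional[str]]) -> Optional[str]:
    """Assumes 'hosts: files dns': /etc/hosts first, DNS only as a fallback."""
    hit = parse_hosts(hosts_text).get(name.lower())
    return hit if hit is not None else dns(name)


if __name__ == "__main__":
    # Hypothetical data: "nas" moved from .50 to .60, but /etc/hosts wasn't updated
    hosts = "127.0.0.1 localhost\n10.1.1.50 nas nas.example.net\n"
    dns = {"nas": "10.1.1.60", "web": "10.1.1.70"}.get
    print(resolve("nas", hosts, dns))   # stale 10.1.1.50, not 10.1.1.60
    print(resolve("web", hosts, dns))   # not in hosts, so DNS answers 10.1.1.70
```

The fix is simply to update or delete the stale line in /etc/hosts.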


2. Hands On - Troubleshooting DNS

Use the above information to determine the following:


Summary

Often the problem is a simple Ethernet cable that has come loose or unplugged. Or perhaps your neighbour just fired up a Wi-Fi access point right on top of the channel you were using, causing massive interference. In both cases the physical layer is the problem, but one is much easier to diagnose and fix than the other.

Sometimes the problem is beyond your control, but a binary search will determine that quickly enough. Now if you can just find someone at your ISP who is knowledgeable and will answer the phone.

Summary of Tools

Tools used


* some info from ipv6-net.blogspot.ca

16 February 2021