Technology is great, when it works, but what to do when it doesn't, and you get a helpful message in your browser that says something like "check your connectivity"?
The art of network troubleshooting is breaking the problem in half, and figuring out whether the problem lies in the near or far half. Doing a binary search for the issue is usually the quickest path to resolution.
In order to use the modern internet, one requires two (2) items: an IP address and DNS.
Sure, there can be much more that is required, but in this day and age of NAT-designed networks, Client/Server is king. More specifically, the client is a Web Browser, and the Server is a Web Server serving up tons of JavaScript. In this network, only the Server must have a real routable IP address. The Client can be behind many layers of NAT, and often is.
Of course, IPv6 doesn't require NAT, since NAT is an address conservation mechanism. And I suspect the pendulum will swing back someday to peer-to-peer sharing, but for now all we need is an IP address and DNS.
If the binary search is the fastest path to resolution, where do we start?
Most will start with ping. And it is useful, but the usefulness decreases immensely when there is no response.
$ ping netsig.makiki.ca
PING netsig.makiki.ca(2607:c000:8011:fd94:216:3eff:fed7:e195 (2607:c000:8011:fd94:216:3eff:fed7:e195)) 56 data bytes
^C
--- netsig.makiki.ca ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5ms
Although it may seem like we don't know anything more than when we started, we actually do. We know that DNS is working! The name was looked up and an IP address was returned, but we don't know much more about why we can't get to the server.
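One way to capture that observation in a script is to look at the exit status of ping. The sketch below assumes the Linux iputils ping, which exits with 0 when replies arrive, 1 when none do, and 2 on other errors. The function names explain_ping_status and check_host are my own, not part of any tool.

```shell
# Translate a ping exit status into a troubleshooting hint.
# Assumption: iputils ping (Linux): 0 = replies, 1 = no replies, 2 = other error.
explain_ping_status() {
    case "$1" in
        0) echo "replies received: DNS and the IP path both work" ;;
        1) echo "no replies: the name resolved, but nothing answered" ;;
        2) echo "ping error: often a failed name lookup or no route" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Send three probes quietly, then explain the result.
check_host() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
    explain_ping_status "$?"
}
```

Running check_host netsig.makiki.ca against the example above would be expected to report the "no replies" case, since the name resolved but every packet was lost.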
I find traceroute to be a more useful starting point. Traceroute reports the path that your packets will take to their destination (aka the server).
$ traceroute6 netsig.makiki.ca
traceroute to netsig.makiki.ca (2607:c000:8011:fd94:216:3eff:fed7:e195) from 2001:470:1f12:79e::2, port 33434, from port 36161, 30 hops max, 60 bytes packets
1 tunnel367664.tunnel.tserv10.par1.ipv6.he.net (2001:470:1f12:79e::1) 3.344 ms 2.631 ms 1.982 ms
2 10ge7-3.core1.par2.he.net (2001:470:0:7b::1) 0.868 ms 0.841 ms 19.875 ms
3 100ge11-2.core1.nyc4.he.net (2001:470:0:54::1) 71.357 ms 71.225 ms 81.582 ms
4 100ge14-1.core1.tor1.he.net (2001:470:0:2dc::2) 100.836 ms 81.038 ms 100.601 ms
5 paix.tor.packetflow.ca (2001:504:d:80::25) 81.450 ms * 86.664 ms
6 ae8-0-bdr01-tor2.teksavvy.com (2607:f2c0:ffff:1:3:1:0:130) 117.044 ms 81.773 ms 81.723 ms
7 ae4-0-bdr01-wpg.teksavvy.com (2607:f2c0:ffff:1:26::137) 102.953 ms 102.798 ms 109.870 ms
8 2607:f2c0:26:28::148 (2607:f2c0:26:28::148) 117.589 ms 117.359 ms 118.345 ms
9 ae0-10-lns01-van2.teksavvy.com (2607:f2c0:ffff:4:4::152) 129.573 ms 129.805 ms 129.651 ms
10 ae0-10-lns01-van2.teksavvy.com (2607:f2c0:ffff:4:4::152) 128.595 ms 128.564 ms 128.421 ms
11 * * *
12 * * *
13 * * *
^C
traceroute not only proves that DNS is working (similar to ping) but also shows the path of the packet to the destination.
More importantly, it shows that there is IP connectivity to the internet, and to backbone networks such as he.net.
The three stars * * * mean no response was received from that hop.
To understand how traceroute works, you must understand that the IP header has a TTL (Time to Live, in IPv4) or Hop Limit (in IPv6) field that is designed to keep packets from cycling forever in loops on the internet. As a packet passes a router, the TTL or Hop Limit is decremented, and when it reaches zero, the packet is dropped.
traceroute sends out a packet toward the destination with an unusually small TTL/Hop Limit. Initially it is set to one (1), so the closest router will decrement it, drop it, and send back an ICMP(6) Time Exceeded message to the source stating that it dropped the packet. traceroute listens for the ICMP(6) packets, notes the IP address of the router, and prints it on the list as hop number 1. Then it increases the TTL/Hop Limit to two (2) and repeats. It actually sends 3 packets at each TTL/Hop Limit and measures the return time (much like ping). When you see the three stars * * * there are no times to be printed.
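The same trick can be sketched with nothing but ping, raising the TTL by one on each pass. This is a rough illustration, not how traceroute is actually implemented: it assumes the Linux iputils ping (the -4, -n, -t and -W options, and its "Time to live exceeded" message), handles IPv4 only, and the names parse_hop and ping_traceroute are made up for this example.

```shell
# Pull the interesting address out of one ping run (reads ping output on stdin).
# Assumption: iputils-style IPv4 output, with -n so addresses stay numeric.
parse_hop() {
    awk '
        /Time to live exceeded/ { print "router " $2; found = 1; exit }
        /bytes from/            { sub(/:$/, "", $4); print "reached " $4; found = 1; exit }
        END                     { if (!found) print "* no response" }
    '
}

# Send one probe per TTL value, starting at 1, until the target answers.
ping_traceroute() {
    target="$1"
    max_hops="${2:-30}"
    ttl=1
    while [ "$ttl" -le "$max_hops" ]; do
        # -t sets the TTL, so the router at that distance drops the probe
        hop=$(ping -4 -n -c 1 -t "$ttl" -W 2 "$target" 2>/dev/null | parse_hop)
        echo "$ttl $hop"
        case "$hop" in
            reached*) break ;;
        esac
        ttl=$((ttl + 1))
    done
}
```

Calling ping_traceroute one.one.one.one prints one line per hop. It sends a single probe per TTL rather than three, so expect it to be less informative than the real thing.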
OK, you can't ping, you can't even traceroute, then what? It is time to look at your own host. Is the interface (wired or wireless) up? Did you get an IP address from the DHCP server (or RA from the router for SLAAC)? The ip command can provide you quite a bit of information about your host's networking status.
ip is the successor to the venerable ifconfig. And with good reason, as ip can tell you much more about your configuration. It is installed by default on most distros, and usually lives at /sbin/ip.
Note: BSD & MacOS X still use ifconfig. If you are using Windows, then open a command window and use the ipconfig command.
ip can display the status of the link (Layer 2 in the OSI model), as well as allow configuration of a VLAN-based interface. This will show whether your wired/wireless interface is up. To display the link status use:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 10:9a:dd:54:f6:34 brd ff:ff:ff:ff:ff:ff
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DORMANT qlen 1000
link/ether 10:9a:dd:ae:81:77 brd ff:ff:ff:ff:ff:ff
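When there are many interfaces, it can help to boil that output down to one line per interface. The following is a small sketch that reads ip link output on stdin and prints each interface name with the value after the word "state". The function name link_states is my own invention.

```shell
# Print "<interface> <state>" for every interface in `ip link` output (stdin).
link_states() {
    awk '/^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)            # drop the trailing colon from "eth0:"
        state = "?"
        for (i = 3; i < NF; i++)
            if ($i == "state") state = $(i + 1)
        print name, state
    }'
}
```

Used as ip link | link_states, the example output above would be reduced to lo UNKNOWN, eth0 UP and eth1 DORMANT.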
The ip command will also show if your host has any IP addresses. Using the -4 or -6 option will limit the output to just that address family.
For Windows: use ipconfig /all
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 10:9a:dd:54:f6:34 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.15/24 brd 10.1.1.255 scope global eth0
inet6 2607:c000:815f:5600:e4a1:4d5f:961a:4973/64 scope global temporary dynamic
valid_lft 6951sec preferred_lft 1551sec
inet6 2001:470:1d:489:fd2f:ea14:d171:c541/64 scope global temporary dynamic
valid_lft 6951sec preferred_lft 1551sec
inet6 2607:c000:815f:5600:fd2f:ea14:d171:c541/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2001:470:1d:489:487d:35e:3834:c9a/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2607:c000:815f:5600:487d:35e:3834:c9a/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2001:470:1d:489:a0f0:7c93:4135:b344/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2607:c000:815f:5600:a0f0:7c93:4135:b344/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2607:c000:815f:5600:129a:ddff:fe54:f634/64 scope global dynamic
valid_lft 6951sec preferred_lft 1551sec
inet6 2001:470:1d:489:4d85:44b3:3b87:1513/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2001:470:1d:489:5cd0:431a:b989:4517/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2001:470:1d:489:a121:bf93:87b8:c125/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2001:470:1d:489:c8c8:e6c4:ed49:e502/64 scope global temporary deprecated dynamic
valid_lft 6951sec preferred_lft 0sec
inet6 2001:470:1d:489:129a:ddff:fe54:f634/64 scope global dynamic
valid_lft 6951sec preferred_lft 1551sec
inet6 fe80::129a:ddff:fe54:b634/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DORMANT qlen 1000
link/ether 10:9a:dd:ae:81:77 brd ff:ff:ff:ff:ff:ff
inet6 fe80::129a:ddff:feae:8177/64 scope link
valid_lft forever preferred_lft forever
As you can see, it is possible to have many IPv6 addresses on a single interface. It is also possible to have multiple different IPv6 prefixes active on the same interface, but you don't need to worry about which one is going to be used: there is a source address selection algorithm (RFC 6724) that will do that for you.
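With that many addresses, counting by eye gets tedious. Here is a minimal sketch that tallies the inet and inet6 lines from ip addr output on stdin, and notes how many of the IPv6 addresses are marked deprecated. The name count_addrs is hypothetical.

```shell
# Count IPv4 and IPv6 addresses in `ip addr` output read from stdin.
count_addrs() {
    awk '
        $1 == "inet"  { v4++ }
        $1 == "inet6" { v6++; if ($0 ~ /deprecated/) dep++ }
        END { printf "IPv4=%d IPv6=%d deprecated=%d\n", v4, v6, dep }
    '
}
```

For a single interface, something like ip addr show dev eth0 | count_addrs keeps the loopback addresses out of the totals.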
OK, so your link is up, you have an IP address, but you still can't ping or traceroute. What now?
It is possible that your host does not have a default route. A default route is the route of last resort. Most hosts don't run complex routing tables (unless you are running VMs). So usually the routing table will be just a few lines, and there is one table each for IPv4 and IPv6.
$ ip -4 route
default via 10.1.1.1 dev eth0 metric 100
10.1.1.0/24 dev eth0 proto kernel scope link src 10.1.1.15
The IPv6 route table displays the default route with a link-local next hop:
$ ip -6 route
2001:470:1d:489::/64 dev eth0 proto kernel metric 256 expires 6865sec mtu 1280
2607:c000:815f:5600::/64 dev eth0 proto kernel metric 256 expires 6865sec
fe80::/64 dev eth0 proto kernel metric 256 mtu 1280
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::224:a5ff:fee1:7ca dev eth0 proto kernel metric 1024 expires 1464sec mtu 1280 hoplimit 64
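A quick scripted check for a missing default route might look like the sketch below. It reads ip route output on stdin and prints the next hop of any default route, or says so if there is none. The function name default_gateway is my own.

```shell
# Print the next hop of each default route in `ip route` output (stdin).
default_gateway() {
    awk '
        $1 == "default" {
            gw = "(on-link)"           # e.g. point-to-point links have no "via"
            for (i = 2; i < NF; i++)
                if ($i == "via") gw = $(i + 1)
            print gw
            found = 1
        }
        END { if (!found) print "no default route" }
    '
}
```

Run it once per address family, for example ip -4 route | default_gateway and ip -6 route | default_gateway.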
While ip will tell you the configuration of the host, rdisc6 will tell you the configuration of your router, or at least what it is sending out as Router Advertisements (RAs). RAs carry prefixes and, with the A, M, and O flags, control whether clients use SLAAC (A) or start a DHCPv6 client (M and O, RFC 8415). rdisc6 will send a router solicitation (RS), and print out the RA in response.
rdisc6 is part of the ndisc6 package. Install it as you would any package on your Linux distro.
$ rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Hop limit : 64 ( 0x40)
Stateful address conf. : No
Stateful other conf. : Yes
Router preference : medium
Router lifetime : 1800 (0x00000708) seconds
Reachable time : unspecified (0x00000000)
Retransmit time : unspecified (0x00000000)
Source link-layer address: 00:24:A5:E1:07:CA
MTU : 1280 bytes (valid)
Prefix : 2607:c000:815f:5600::/64
Valid time : 7200 (0x00001c20) seconds
Pref. time : 1800 (0x00000708) seconds
Prefix : 2001:470:1d:489::/64
Valid time : 7200 (0x00001c20) seconds
Pref. time : 1800 (0x00000708) seconds
Route : 2607:c000:815f:5600::/56
Route preference : medium
Route lifetime : 7200 (0x00001c20) seconds
Recursive DNS server : 2607:c000:815f:5600::1
DNS server lifetime : 1800 (0x00000708) seconds
from fe80::224:a5ff:fee1:7ca
RAs not only include prefixes to be used by the host in SLAAC (Stateless Address Auto Configuration), but also the IP address of a DNS server to use (RDNSS), as well as a DNS search domain (DNSSL), freeing the host from the bondage of a DHCP server.
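To make the M and O flags a little less abstract, here is a sketch that reads rdisc6 output on stdin and summarizes what the router is asking clients to do. It assumes the "Stateful address conf." and "Stateful other conf." lines look like the output shown above (those correspond to the M and O flags). The function name ra_mode is made up.

```shell
# Summarize the M and O flags from rdisc6 output read on stdin.
ra_mode() {
    awk -F': *' '
        /Stateful address conf/ { m = $2; gsub(/[ \t\r]/, "", m) }
        /Stateful other conf/   { o = $2; gsub(/[ \t\r]/, "", o) }
        END {
            if (m == "Yes")
                print "M=1: addresses from DHCPv6"
            else if (o == "Yes")
                print "M=0 O=1: SLAAC addresses, other settings from DHCPv6"
            else
                print "M=0 O=0: SLAAC only"
        }
    '
}
```

With the RA shown above, rdisc6 eth0 | ra_mode would be expected to report the M=0 O=1 case.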
Use the above information to determine the following:
Use traceroute to determine the number of hops to the host one.one.one.one
Use traceroute6 to determine the number of hops to the host one.one.one.one. Is the number of hops different from the previous traceroute?
Using the ip command, determine how many IP addresses are on your active interface.
Using the ip command, note your default route for both IPv4 and IPv6.
Extra Credit: Use rdisc6 to determine the IP address of your DNS server.
Continuing the troubleshooting theme, you can ping, you have an IP address, but you can't resolve human-readable host names into IP addresses.
This can appear as if you don't have connectivity to the internet, but in fact, you just can't resolve host names.
On a Linux system, traditionally the /etc/resolv.conf file contains the IP address of the DNS server.
$ cat /etc/resolv.conf
# Generated by dhcpcd from br0.dhcp, br0.dhcp6, br0.ra
domain hoomaha.net
nameserver 10.1.1.1
nameserver 2001:db8:8011:fd11::1
However, systemd has other ideas about resolving hosts, and if you see 127.0.0.53 in /etc/resolv.conf then systemd has taken over your DNS resolution.
If you are using a Linux distro with systemd then you will have to use the systemd-resolve command (resolvectl on newer releases) to determine what DNS server it is using:
$ systemd-resolve --status
Global
LLMNR setting: no
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Current DNS Server: 192.168.215.1
DNS Servers: 192.168.215.1
2001:db8:8011:fd11::1
DNS Domain: hoomaha.net
DNSSEC NTA: 10.in-addr.arpa
16.172.in-addr.arpa
168.192.in-addr.arpa
17.172.in-addr.arpa
18.172.in-addr.arpa
...
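The check for a systemd takeover is easy to script. The sketch below lists the nameserver entries in a resolv.conf-style file and tests whether 127.0.0.53 is among them. The function names list_nameservers and uses_systemd_stub are my own, and the optional file argument exists only so the functions can be tried on a copy.

```shell
# Print the nameserver addresses from a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed (exit 0) if the file points at the systemd stub resolver.
uses_systemd_stub() {
    list_nameservers "$1" | grep -qx '127\.0\.0\.53'
}
```

For example, uses_systemd_stub && systemd-resolve --status would only ask systemd for details when it is actually in charge.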
host and dig
Does the host have IP connectivity to the DNS server? Try to ping the DNS server. If that doesn't work, go back and check your IP connectivity troubleshooting. If it does, carry on.
Is the DNS server (as seen above) working? A quick check with host will determine if your DNS server is working.
$ host one.one.one.one
one.one.one.one has address 1.0.0.1
one.one.one.one has address 1.1.1.1
one.one.one.one has IPv6 address 2606:4700:4700::1111
one.one.one.one has IPv6 address 2606:4700:4700::1001
If there is no answer, then add the IP address of another known DNS server, such as 1.1.1.1, to the command:
$ host one.one.one.one 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases:
one.one.one.one has address 1.0.0.1
one.one.one.one has address 1.1.1.1
one.one.one.one has IPv6 address 2606:4700:4700::1111
one.one.one.one has IPv6 address 2606:4700:4700::1001
Note that in the second example, the host command is making a DNS request directly to 1.1.1.1 rather than to your local DNS server. If this works when the first command fails, it proves that your Linux host can resolve DNS names, but your local DNS server can't. If you are in a hurry, just change your /etc/resolv.conf to point to 1.1.1.1 (Cloudflare's DNS server) or 2606:4700:4700::1111 if you are on an IPv6-only network.
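The two host checks can be combined into one small test. This is only a sketch: it assumes host exits with status 0 when it gets an answer and non-zero otherwise, and the function names dns_verdict and compare_resolvers are hypothetical.

```shell
# Turn two exit statuses (local resolver, public resolver) into a verdict.
dns_verdict() {
    local_rc="$1"
    public_rc="$2"
    if [ "$local_rc" -eq 0 ]; then
        echo "local DNS server answers"
    elif [ "$public_rc" -eq 0 ]; then
        echo "local DNS server is broken, public resolver works"
    else
        echo "no DNS answers at all: recheck IP connectivity"
    fi
}

# Ask the configured resolver first, then 1.1.1.1 directly.
compare_resolvers() {
    name="${1:-one.one.one.one}"
    host -W 2 "$name" >/dev/null 2>&1
    l=$?
    host -W 2 "$name" 1.1.1.1 >/dev/null 2>&1
    p=$?
    dns_verdict "$l" "$p"
}
```

Running compare_resolvers with no argument tests the same name used in the examples above.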
dig trace
A few months ago, we learned the ins and outs of dig. I find that dig usually provides more information than I want. However, in the scenario where your local DNS server is resolving your local DNS names, but not resolving names on the internet (perhaps a firewall rule is blocking it), the dig +trace command will display each step of the lookup, from the root servers down, much as a recursive DNS server would do it.
$ dig +trace one.one.one.one
; <<>> DiG 9.16.1-Ubuntu <<>> +trace one.one.one.one
;; global options: +cmd
. 7184 IN NS l.root-servers.net.
. 7184 IN NS a.root-servers.net.
. 7184 IN NS c.root-servers.net.
. 7184 IN NS k.root-servers.net.
. 7184 IN NS e.root-servers.net.
. 7184 IN NS d.root-servers.net.
. 7184 IN NS b.root-servers.net.
. 7184 IN NS m.root-servers.net.
. 7184 IN NS j.root-servers.net.
. 7184 IN NS h.root-servers.net.
. 7184 IN NS f.root-servers.net.
. 7184 IN NS i.root-servers.net.
. 7184 IN NS g.root-servers.net.
;; Received 262 bytes from 127.0.0.53#53(127.0.0.53) in 0 ms
one. 172800 IN NS a.nic.one.
one. 172800 IN NS b.nic.one.
one. 172800 IN NS c.nic.one.
one. 172800 IN NS d.nic.one.
one. 86400 IN DS 14131 8 1 8C04B443EE763B8B67CDF0DB0BBC832E24F560EE
one. 86400 IN DS 14131 8 2 8D11FF81A0E9BCC2719695CBE4D585B47AA3BDE6CD28C5AC6E02BD91 9CA9B9E0
one. 86400 IN RRSIG DS 8 1 86400 20210223210000 20210210200000 42351 . iNKofj5F/gdOI4s/eMeRHym9RARgdtaPzxDChhgxti6xc5x1P/TNAkQH RHZvH5wxULDzfPVm0mJ8NAOd0egJDgQs1OPGgjaxA535Vhlo7lq/Onb+ SkGsTUkPeiUhM13APhXBcTyWVbE8kPfy8g7bmPi7Ioi3RrotNVpp6Hjs m4wLt1iNSHUDznrfD/7GbkiNZuuNptapm48Rr64oEsJiybsecsP8r9rm 8FDbKrdhHDEYp/sytI5RIvr0yJlCb9TEao8G1nck8wcMeg6Wsxn8kKHs IdxDLTBwshNN9ppdwzb+sNyNpeNUQsAbMykYSsBjUVyZP6Lv1ac5vieK 2/Leww==
;; Received 659 bytes from 2001:500:a8::e#53(e.root-servers.net) in 35 ms
one.one. 3600 IN NS a.b-one-dns.net.
one.one. 3600 IN NS b.b-one-dns.net.
one.one. 3600 IN DS 53074 13 2 86F2929EE3E5E501032B6DC94841A4A056A2D2876CABCF46A5F8907E B4917782
one.one. 3600 IN RRSIG DS 8 2 3600 20210308200131 20210206190132 29100 one. m2h/vPZl2vOLkqXqOZaWELFNMGM2bMEagv/gOwUo3Vg1gWD7fpP4w81y L1UJydVYMPdU/Ir+zcusiiIXK6KcygDwJDl2yyH1CdVxmbl0oSCkqK3I QSzuj+MmKQNKuF9eOywSJRfEynmPsWmOkXz29s54HrTxAjkgoWpxP2PS YLRwTSkI0N5vFH6M1oiRYWIP8bztuze00vkLRLXuuTkMXg==
;; Received 339 bytes from 37.209.194.9#53(b.nic.one) in 67 ms
one.one.one. 3600 IN NS jean.ns.cloudflare.com.
one.one.one. 3600 IN NS fred.ns.cloudflare.com.
70hf8peef8el004f0mr69pinmkfq0qsu.one.one. 300 IN NSEC3 1 0 1 AB 03VKHA5O24MFR8QICMVNJ8C9K3GDOHP3 NS
70hf8peef8el004f0mr69pinmkfq0qsu.one.one. 300 IN RRSIG NSEC3 13 3 300 20210218000000 20210128000000 53074 one.one. NNRzCeacwWqXvu7SIHFhmQiQHXe+I/Zn8bXnPRzjt/rqRUddhtZt/Gal lj7m+3ePbGi0b7/H0GMSvUlLxSnuaQ==
;; Received 277 bytes from 195.206.121.11#53(a.b-one-dns.net) in 171 ms
one.one.one.one. 300 IN A 1.0.0.1
one.one.one.one. 300 IN A 1.1.1.1
;; Received 76 bytes from 172.64.33.113#53(fred.ns.cloudflare.com) in 31 ms
Here you can see that dig asked the local resolver (127.0.0.53) for the list of root servers, then queried a root server (2001:500:a8::e) for the one top-level domain, then a one name server (37.209.194.9) for the one.one domain, then a one.one name server (195.206.121.11) for one.one.one, and finally Cloudflare's DNS server (172.64.33.113), which returned the addresses.
Note that, apart from fetching the root server list, the trace didn't even bother querying my local DNS server. Which leads us to another DNS scenario: your local DNS server is broken, but you can still resolve names on the internet.
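Since the trace output is rather noisy, here is a sketch that keeps only the ";; Received" lines and prints which server answered at each step, along with how long it took. It assumes the line format shown in the output above. The name trace_servers is my own.

```shell
# List the servers that answered in `dig +trace` output read from stdin.
# Output format: <server name> <address> <time>ms
trace_servers() {
    sed -n 's/^;; Received [0-9]* bytes from \([^#]*\)#[0-9]*(\([^)]*\)) in \([0-9]*\) ms.*/\2 \1 \3ms/p'
}
```

Used as dig +trace one.one.one.one | trace_servers, this makes it easy to see whether your local DNS server appears anywhere after the first line.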
/etc/hosts
Are you using a VPN, which typically uses its own DNS servers? Or do you have DoH (DNS over HTTPS) enabled in your web browser? Both will result in your local host not being able to resolve local names on your network, while still resolving names on the internet just fine.
Can't get to just one host? If you have put a static DNS mapping in your /etc/hosts file, and the IP address of that host changes, you will no longer be able to get to that host. But you will have connectivity to other hosts. On a Linux system, the /etc/hosts file is usually looked at before a DNS host lookup is made.
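To see whether a stale static entry is the culprit, you can look the name up in the hosts file yourself and compare it with what DNS says. A minimal sketch follows. The function name hosts_entry is hypothetical, and the optional second argument exists only so it can be tried on a copy of the file.

```shell
# Print the address(es) that a hosts-style file maps to the given name.
hosts_entry() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                    # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # ignore trailing comments
                if ($i == name) { print $1; break }
            }
        }
    ' "${2:-/etc/hosts}"
}
```

If hosts_entry myserver prints an address that differs from what host myserver returns, the static mapping is probably out of date (myserver is a placeholder name).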
Use the above information to determine the following:
Use host to determine the address(es) of the host one.one.one.one
Use dig +trace one.one.one.one to determine if your local DNS server is queried. What is the IP address of your local DNS server?
Often the problem is a simple ethernet cable that is loose or has become unplugged. Or perhaps your neighbour just fired up a Wifi Access Point right on top of the channel you were using, causing massive interference. In both cases, it is the physical layer that is the problem, though the first is easier to spot and fix.
Sometimes the problem is beyond your control, but using a binary search will determine that quickly enough. Now if you can just find someone at your ISP who is knowledgeable and will answer the phone.
Tools used
* some info from ipv6-net.blogspot.ca
16 February 2021