by simoncion 60 days ago
E_NOREPRO

  user@ubuntu-server:~$ lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 25.10
  Release:        25.10
  Codename:       questing
  user@ubuntu-server:~$ uname -a
  Linux ubuntu-server 6.17.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 18 10:10:29 UTC 2025 x86_64 GNU/Linux
  user@ubuntu-server:~$ getent ahosts us.archive.ubuntu.com
  91.189.91.82    STREAM us.archive.ubuntu.com
  91.189.91.82    DGRAM  
  91.189.91.82    RAW    
  91.189.91.81    STREAM 
  91.189.91.81    DGRAM  
  91.189.91.81    RAW    
  91.189.91.83    STREAM 
  91.189.91.83    DGRAM  
  91.189.91.83    RAW    
  2620:2d:4002:1::102 STREAM 
  2620:2d:4002:1::102 DGRAM  
  2620:2d:4002:1::102 RAW    
  2620:2d:4002:1::101 STREAM 
  2620:2d:4002:1::101 DGRAM  
  2620:2d:4002:1::101 RAW    
  2620:2d:4002:1::103 STREAM 
  2620:2d:4002:1::103 DGRAM  
  2620:2d:4002:1::103 RAW    
  user@ubuntu-server:~$ ip --oneline link | grep -v lo: | awk '{ print $2 }'
  enp0s3:
  user@ubuntu-server:~$ ip addr | grep inet6
      inet6 ::1/128 scope host noprefixroute 
      inet6 fe80::5054:98ff:fe00:64a9/64 scope link proto kernel_ll 
  user@ubuntu-server:~$ fgrep -r -e us.archive /etc/apt/
  /etc/apt/sources.list.d/ubuntu.sources:URIs: http://us.archive.ubuntu.com/ubuntu/
  user@ubuntu-server:~$ sudo apt-get update
  Hit:1 http://us.archive.ubuntu.com/ubuntu questing InRelease                            
  Get:2 http://security.ubuntu.com/ubuntu questing-security InRelease [136 kB]            
  <snip>
  Get:43 http://security.ubuntu.com/ubuntu questing-security/multiverse amd64 c-n-f Metadata [252 B]
  Fetched 2,602 kB in 3s (968 kB/s) 
  Reading package lists... Done
I didn't think to wrap that in 'time', but it only took a few seconds to run... more than two and less than thirty. The IPv6 packet capture running during all that reveals that it never tried to reach out over v6 (but that my multicast group querier is happily running):

  user@ubuntu-server:~$ sudo tcpdump -i enp0s3 -s 0 -n 'ip6 or icmp6'
  tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
  listening on enp0s3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
  22:16:44.327503 IP6 fe80::5054:98ff:fe00:64a9 > ff02::2: ICMP6, router solicitation, length 16
  22:17:35.823917 IP6 fe80::<REDACTED>          > ff02::1: HBH ICMP6, multicast listener query v2 [gaddr ::], length 28
  22:17:41.706930 IP6 fe80::5054:98ff:fe00:64a9 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
I even manually ran unattended-upgrade, which looks to have succeeded. Other than unanswered router solicitations and multicast group membership query chatter, there continued to be no IPv6 communication at all, and none of the messages you reported appeared either in /var/log/syslog or on the terminal.

  user@ubuntu-server:~$ sudo /usr/bin/unattended-upgrade
  user@ubuntu-server:~$ sudo grep -e 'Tried to start delayed item' /var/log/syslog
  user@ubuntu-server:~$ 
What am I doing wrong?

You aren't running it during an external transitive failure that happened on April 15th.

The problem isn't the happy path. The problem is what happens when things fail, and that linux, in particular made it really hard to reliably disable [0]

Once that hits someone's vagrant or ansible code, it tends to stick forever, because they don't see the value until they try to migrate, then it causes a mess.

The last update on the original post link [1] explains this. The ipv4 host being down, not having a response, it being the third Tuesday while Aquarius is rising into whatever, etc... can invoke it. It causes pain and is complex and convoluted to disable when you aren't using it, so people are afraid to re-enable it.

[0] https://wiki.archlinux.org/title/IPv6#Disable_IPv6

[1] https://tailscale.com/blog/two-internets-both-flakey

> ...linux, in particular made it really hard to reliably disable

Section 10.1 of that Arch Wiki page says that adding 'ipv6.disable=1' to the kernel command line disables IPv6 entirely, and 'ipv6.disable_ipv6=1' keeps IPv6 running, but doesn't assign any addresses to any interfaces. If you don't like editing your bootloader config files, you can also use sysctl to do what it looks like 'ipv6.disable_ipv6=1' does by setting the 'net.ipv6.conf.all.disable_ipv6' sysctl knob to '1'.
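
For anyone who wants to check that knob from a script, here is a minimal sketch. The function name and the optional path argument are made up for illustration (the path is only overridable so the function can be pointed at a scratch file), and it assumes the usual mapping of sysctl names onto files under /proc/sys.

```shell
# Hypothetical helper: report the state of the disable_ipv6 sysctl knob.
# $1 is optional and only exists so a scratch file can stand in for the
# real knob when experimenting.
ipv6_knob_state() {
    knob="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$knob" ]; then
        # No readable knob at all; this sketch does not try to guess why.
        echo "knob-missing"
        return 0
    fi
    case "$(cat "$knob")" in
        1) echo "disabled" ;;
        0) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Usage:
#   ipv6_knob_state
# Flipping the knob at runtime needs root and does not survive a reboot:
#   sysctl -w net.ipv6.conf.all.disable_ipv6=1
```

To make the setting persistent you would normally put the same key in a file under /etc/sysctl.d/, but which file is a local choice.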

> You aren't running it during an external transitive failure...

I'll assume you meant "transient". Given that I've already demonstrated that the only relevant traffic that is generated is IPv4 traffic, let's see what happens when we cut off that traffic on the machine we were using earlier, restored to its state prior to the updates.

We start off with empty firewall rules:

  root@ubuntu-server:~# iptables-save
  root@ubuntu-server:~# ip6tables-save
  root@ubuntu-server:~# nft list ruleset
  root@ubuntu-server:~# 
We prep to permit DNS queries and ICMP and reject all other IPv4 traffic:

  root@ubuntu-server:~# iptables -A OUTPUT -o enp0s3 -p udp --dport 53 -j ACCEPT
  root@ubuntu-server:~# iptables -A OUTPUT -o enp0s3 -p tcp --dport 53 -j ACCEPT
  root@ubuntu-server:~# iptables -A OUTPUT -o enp0s3 -p icmp -j ACCEPT
  root@ubuntu-server:~# iptables -A INPUT  -i enp0s3 -p udp --sport 53 -j ACCEPT
  root@ubuntu-server:~# iptables -A INPUT  -i enp0s3 -p tcp --sport 53 -j ACCEPT
  root@ubuntu-server:~# iptables -A INPUT  -i enp0s3 -p icmp -j ACCEPT
  root@ubuntu-server:~# iptables -A OUTPUT -o enp0s3 -j REJECT
  root@ubuntu-server:~# iptables -A INPUT  -i enp0s3 -j REJECT
  root@ubuntu-server:~#
And we do an apt-get update, which fails in less than ten seconds:

  root@ubuntu-server:~# apt-get update
  Ign:1 http://security.ubuntu.com/ubuntu questing-security InRelease
  Ign:2 http://us.archive.ubuntu.com/ubuntu questing InRelease
  <snip>
  Could not connect to security.ubuntu.com:80 (91.189.92.23). - connect (111: Connection refused) Cannot initiate the connection to security.ubuntu.com:80 (2620:2d:4000:1::102). - connect (101: Network is unreachable) <long line snipped>
  <snip>
  W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/questing-security/InRelease  Cannot initiate the connection to security.ubuntu.com:80 (2620:2d:4000:1::102). - connect (101: Network is unreachable) <long line snipped>
  W: Some index files failed to download. They have been ignored, or old ones used instead.
  root@ubuntu-server:~# 
In this case, the IPv6 traffic I see is... an unanswered router solicitation, and the multicast querier chatter that I saw before. [0] What happens when we change those REJECTs into DROPs

  root@ubuntu-server:~# iptables -D OUTPUT -o enp0s3 -j REJECT
  root@ubuntu-server:~# iptables -D INPUT  -i enp0s3 -j REJECT
  root@ubuntu-server:~# iptables -A OUTPUT -o enp0s3 -j DROP
  root@ubuntu-server:~# iptables -A INPUT  -i enp0s3 -j DROP
  root@ubuntu-server:~# 
...and then re-run 'apt-get update'?

  root@ubuntu-server:~# apt-get update
  Ign:1 http://security.ubuntu.com/ubuntu questing-security InRelease
  Ign:1 http://security.ubuntu.com/ubuntu questing-security InRelease
  Ign:1 http://security.ubuntu.com/ubuntu questing-security InRelease
  Err:1 http://security.ubuntu.com/ubuntu questing-security InRelease
  Cannot initiate the connection to security.ubuntu.com:80 (2620:2d:4002:1::103). - connect (101: Network is unreachable) <v6 addrs snipped> Could not connect to security.ubuntu.com:80 (91.189.92.24), connection timed out <long line snipped>
  <redundant output snipped>
  W: Some index files failed to download. They have been ignored, or old ones used instead.
  root@ubuntu-server:~#
Exactly the same thing, except it takes like two minutes to fail, rather than ~ten seconds, and the error for IPv4 hosts is "connection timed out", rather than "Connection refused". Other than the usual RS and multicast querier traffic, absolutely no IPv6 traffic is generated.
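
The REJECT versus DROP difference can be seen without apt-get at all. This is a rough sketch, not what apt-get does internally: the function names are hypothetical, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils 'timeout' command, which exits with status 124 when the time limit is hit.

```shell
# Hypothetical helper: map the exit status of a time-limited connect
# attempt onto the two failure modes seen in the transcripts.
classify_probe_status() {
    case "$1" in
        0)   echo "connected" ;;
        124) echo "timed out (what DROP looks like)" ;;
        *)   echo "failed fast (what REJECT looks like)" ;;
    esac
}

# Hypothetical helper: attempt one TCP connect with a time limit.
# $1 = host, $2 = port, $3 = limit in seconds (defaults to 5).
probe_tcp() {
    host="$1"; port="$2"; limit="${3:-5}"
    # Inside bash -c, $0 is the host and $1 is the port.
    timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
    classify_probe_status "$?"
}

# Usage:
#   probe_tcp security.ubuntu.com 80 10
```

Note that "failed fast" lumps together every quick failure (refused, unreachable, name resolution errors), so it only hints at a REJECT rule rather than proving one.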

However. The output of 'apt-get' sure makes it seem like an IPv6 connection is what's hanging, because the last thing that its "Connecting to..." line prints is the IPv6 address of the host that it's trying to contact... despite the fact that it immediately got a "Network is unreachable" back from the IPv6 stack.

To be certain that my tcpdump filter wasn't excluding IPv6 traffic of a type that I should have accounted for but did not, I re-ran tcpdump with no filter and kicked off another 'apt-get update'. I -again- got exactly zero IPv6 traffic other than unanswered router solicitations and multicast group membership querier chatter.
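
If you want to repeat that check without eyeballing the capture, something like the following sketch works on tcpdump's text output. The function name is made up, and the match strings are simply the ones that appear in the capture earlier in this thread.

```shell
# Hypothetical helper: read `tcpdump -n` text output on stdin and print only
# the IPv6 lines that are NOT router solicitations or multicast listener
# (MLD) queries/reports.
interesting_ip6() {
    grep ' IP6 ' | grep -v -e 'router solicitation' -e 'multicast listener' || true
}

# Usage (tcpdump's -l flag makes its output line-buffered):
#   sudo tcpdump -i enp0s3 -n -l | interesting_ip6
```

No output from the filter means the only IPv6 packets on the wire were the housekeeping kind.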

I'm pretty damn sure that what you were seeing was misleading output from apt-get, rather than IPv6 troubles. Why? When you combine these facts:

* REJECTing all non-DNS IPv4 traffic caused apt-get to fail within ten seconds

* DROPping all non-DNS IPv4 traffic caused apt-get to fail after like two minutes.

* In both cases, no relevant IPv6 traffic was generated.

the conclusion seems pretty clear.

But, did I miss something? If so, please do let me know.

[0] I can't tell you why the last line in the 'apt-get update' output is only IPv6 hosts. But everywhere there were IPv6 hosts, the reported error was "Network is unreachable" and for IPv4 the error was "Connection refused".

This part is exactly the problem I was talking about:

  root@ubuntu-server:~# apt-get update
  ...
  Could not connect to security.ubuntu.com:80 (91.189.92.23). - connect (111: Connection refused) Cannot initiate the connection to security.ubuntu.com:80 (2620:2d:4000:1::102). - connect (101: Network is unreachable) <long line snipped>
  <snip>
  W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/questing-security/InRelease  Cannot initiate the connection to security.ubuntu.com:80 (2620:2d:4000:1::102). - connect (101: Network is unreachable) <long line snipped>
  W: Some index files failed to download. They have been ignored, or old ones used instead.
Well... in this case the output does show the failure to connect to 91.189.92.23, but that looks like a different kind of message to the "W:" lines, so maybe it doesn't show up on all setups or didn't make it into the logs on disk, or got buried under other output.

If you look at just the W: lines, it mentions a v6 address but the machine doesn't have v6 and the actual problem is the Connection Refused to the v4 address. The output is understandably misleading but ultimately the problem here has nothing to do with v6.
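
One way to stop the W: lines from hiding the v4 failure is to pull every "address, errno, message" triple out of the output. This is only a sketch against the message format quoted above; the function name is made up, and it won't pick up messages phrased differently, such as the "connection timed out" variant.

```shell
# Hypothetical helper: read apt-get output on stdin and print one line per
# distinct "<address> <errno> <message>" found in "- connect (...)" errors.
apt_connect_errors() {
    grep -oE '\([0-9A-Fa-f:.]+\)\. - connect \([0-9]+: [^)]+\)' |
        sed -E 's/^\(([^)]+)\)\. - connect \(([0-9]+): ([^)]+)\)$/\1 \2 \3/' |
        sort -u
}

# Usage:
#   sudo apt-get update 2>&1 | apt_connect_errors
```

Run against the output above, the v4 "Connection refused" shows up right next to the v6 "Network is unreachable", which makes it much harder to blame the wrong one.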

> ...ultimately the problem here has nothing to do with v6.

I agree... more or less. The remainder of this message is a reply to nyrikki, but I'm sticking it under your comment because you might also appreciate how weird it looks like this guy's setup is.

nyrikki: The rest of this message is addressed directly to you:

============================

Actually, what's up with your link-local addresses? They have really odd flags on them.

The only way I can figure that you got into that configuration was to remove the kernel-generated link-local address and add a new one with the arguments 'scope link noprefixroute'. Even if a router on your network advertised a fe80::/64 prefix, that does nothing at all, as hosts are supposed to [0] ignore advertised prefixes that are link-local.
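
A quick way to check a host for this is to look for link-local addresses carrying that flag. A minimal sketch, assuming the multi-line 'ip addr' output format shown earlier in the thread; the function name is made up.

```shell
# Hypothetical helper: read `ip -6 addr` output on stdin and print any
# link-local (fe80:) address that carries the noprefixroute flag.
flagged_link_locals() {
    awk '$1 == "inet6" && $2 ~ /^fe80:/ && /noprefixroute/ { print $2 }'
}

# Usage:
#   ip -6 addr | flagged_link_locals
```

On the stock Ubuntu VM used above this prints nothing, since the only noprefixroute address there is ::1.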

Yeah. After playing around with this for a bit, I can see that your network is either at least as misconfigured as one would be if -say- your DHCP server was giving leases with an invalid default gateway, or it is very, very specially configured for very special reasons.

Starting with the ubuntu-server host in the "IPv4 traffic is REJECTed" configuration from my last comment, we do this on the host to delete the kernel-supplied link-local address and instruct the OS to create an address in the link-local address space that can be used for global addresses.

  root@ubuntu-server:~# ip addr del fe80::5054:98ff:fe00:64a9/64 dev enp0s3
  root@ubuntu-server:~# ip addr add fe80::5054:98ff:fe00:64aa/64 noprefixroute dev enp0s3
  root@ubuntu-server:~# 
We then configure our upstream router to either

* Send RAs on the local link without a prefix

or

* Send RAs on the local link with a link-local prefix (so they're ignored by the Ubuntu host)

or we hard-code the address of a next-hop router on our host. One (or more) of these three things sets up the host with a default route. If you do none of them, you don't get a default route, and global traffic goes nowhere.

Then -because either you or something running on the host deleted the kernel-provisioned link-local address, and then explicitly instructed the kernel to create a link-local address that can be used to reach global addresses- the local host starts emitting IPv6 traffic with a link-local source address and a global destination address.
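
To spot that kind of traffic in a capture, a sketch along these lines should do. The function name is made up, and it treats anything in fe80::/10 as link-local and anything starting with "ff" as multicast, working purely on tcpdump's text output.

```shell
# Hypothetical helper: read `tcpdump -n` text output on stdin and print
# packets whose source is link-local but whose destination is neither
# link-local nor multicast.
link_local_to_global() {
    awk '$2 == "IP6" && $4 == ">" {
        src = tolower($3); dst = tolower($5)
        if (src ~ /^fe[89ab][0-9a-f]:/ &&
            dst !~ /^fe[89ab][0-9a-f]:/ &&
            dst !~ /^ff/) print
    }'
}

# Usage:
#   sudo tcpdump -i enp0s3 -n -l ip6 | link_local_to_global
```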

When presented with this sort of traffic, my router immediately sends back an ICMP6 "destination unreachable, beyond scope", which immediately terminates the connection attempt on the host, so the behavior ends up being exactly the same as when the host didn't have a misconfigured link-local address. But. You claim to be having trouble.

So, there are one or more things that might be going on that explain your trouble.

1) You have a firewall on this host that is dropping important ICMP6 traffic, causing it to miss the "this destination address is beyond your scope" message from the router. Do. Not. Do. This. ICMP is network-management traffic which tells you important things. Dropping important ICMP traffic is how you have mysterious and annoying failures.

2) Your router is configured to ignore link-local traffic with non-link-local destination addresses, rather than replying that the destination is out of scope. On the one hand, this seems stupid to me, but on the other hand, we got here through a misconfiguration that seems very unlikely to me to happen often, [1] so the router admin might not have thought about it when making "locked down" firewall rules.

3) There's some middlebox on the path to the router that's dropping your traffic because not all that many folks would expect to see link-local source and global destination, and middleboxes are widely known for dropping stuff that's even a little bit abnormal.
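
For possibility 1, a first-pass check is to look for explicit ICMPv6 DROP rules in the host's saved ruleset. This is only a sketch with a made-up function name: it reads 'ip6tables-save' style text, it will not notice a blanket DROP rule or a default DROP policy, and it knows nothing about native nft rulesets.

```shell
# Hypothetical helper: read `ip6tables-save` output on stdin and print
# rules that explicitly DROP ICMPv6.
icmp6_drop_rules() {
    grep -E -e '-p (ipv6-icmp|icmpv6) .*-j DROP' || true
}

# Usage:
#   sudo ip6tables-save | icmp6_drop_rules
```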

Investigating your misconfigured host (and maybe also connected network) has been interesting. I'd love to try to figure out if SystemD can be misconfigured to produce the host configuration that we're seeing (or if this misconfiguration is 100% bespoke), but I hear a hot burrito calling my name. Maybe I'll get bored and do more investigation later.

Also, you might object to my conclusion with "But this couldn't happen on IPv4! Clearly IPv6 is too complicated!". I would reply with "What would happen if your host couldn't get a lease from a DHCPv4 server, autoconfigured an address in the IPv4 link-local (169.254.0.0/16) address range, and the network's upstream router was configured to silently drop traffic from that subnet? At least the IPv6 link-local address range is prohibited from sending traffic off the local link [2] and fails the transmission attempt immediately."

[0] ...and Ubuntu questing does ignore such prefixes...

[1] ...that is, a link-local address that has been configured to handle global traffic...

[2] ...unless -as we've discovered- you specifically tell the OS otherwise...

> Actually, what's up with your link-local addresses? They have really odd flags on them.

They were probably configured by one of the fancy network config daemons (systemd-networkd, dhcpcd or similar). They like to take over RA processing, and they add IPs with "noprefixroute" so they can add the route themselves separately.

RAs have nothing to do with link-locals, but I bet one or the other of those daemons also takes over configuring link-local addresses and does the same thing there. If you looked in the routing table, there'll be a prefix route for fe80::/64 that was added by the daemon.
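
Checking that is cheap. A small sketch (function name made up) that reads 'ip -6 route' output and reports which device and "proto" each fe80::/64 route has; a value other than "kernel" would suggest that something in userspace added the route, though I haven't verified which value any particular daemon uses.

```shell
# Hypothetical helper: read `ip -6 route show` output on stdin and print
# "<device> <proto>" for every fe80::/64 prefix route.
link_local_route_protos() {
    awk '$1 == "fe80::/64" {
        dev = "?"; proto = "(none shown)"
        for (i = 2; i < NF; i++) {
            if ($i == "dev")   dev = $(i + 1)
            if ($i == "proto") proto = $(i + 1)
        }
        print dev, proto
    }'
}

# Usage:
#   ip -6 route show | link_local_route_protos
```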

This wouldn't affect how DNS replies are sorted though. On machines without non-link-local v6, AAAA records aren't handled by trying them first and then expecting them to quickly fail. They're handled by pushing them to the bottom of the list so that the A records are tried first.
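
You can see the sorted order directly with 'getent ahosts', which goes through getaddrinfo. A tiny sketch (function name made up) that reports the family of the first address returned:

```shell
# Hypothetical helper: read `getent ahosts <name>` output on stdin and
# report the address family of the first entry, i.e. the one a client
# walking the list in order would try first.
first_family() {
    awk 'NR == 1 { if ($1 ~ /:/) print "ipv6"; else print "ipv4" }'
}

# Usage:
#   getent ahosts us.archive.ubuntu.com | first_family
```

On the test VM earlier in the thread, which has only a link-local v6 address, the getent output lists the 91.189.91.x addresses first, so this would print "ipv4".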

> They were probably configured by one of the fancy network config daemons (systemd-networkd, dhcpcd or similar). They like to take over RA processing, and they add IPs with "noprefixroute" so they can add the route themselves separately.

Makes sense, yeah.

While I don't see a way to do this with dhcpcd, I have no clue what Lovecraftian horrors systemd-networkd generates, so maybe it's the culprit. And whatever is doing this, this behavior is not configured by default on Ubuntu Server version Questing. Out of the box, I get regular kernel-assigned link-local addresses.

But I don't understand why you'd want to do this for link-local addresses... not automatically, anyway. It looks like doing this has the disadvantage that it erases the baked-in "This shouldn't be used for global-scope transmissions. Send back 'Network is unreachable' in those cases." rule that you get for free with the kernel-generated address. Sheesh. I wonder if there's some additional logic in a stupid daemon somewhere that manages a firewall rule that restores the "Network is unreachable" ICMPv6 response to outbound global-scope packets that come from the link-local address... just to add more moving parts that can get out-of-sync.

> This wouldn't affect how DNS replies are sorted though.

Yeah.

It's a pity that I don't work with OP. I'd rather like to take a look at this system and the network it's hooked to.

> So, there are one or more things that might be going on that explain your trouble.

Ah, there's secret option #4:

4) This rather weird configuration has been deliberately set up by the sysadmin that manages this system and network and ordinarily works fine, but the "external transitive failure that happened on April 15th." affected both IPv4 and IPv6 traffic (which, duh, happens frequently)... but it was an intermittent failure, so unrelated changes made by OP caused him to come to the wrong conclusions and point the blame cannon at the wrong part of the system.

Okay. Burrito time!