There is a way to build an even smaller version of Tailscale for embedded systems, though there are some trade-offs in ease of use and security. https://tailscale.com/kb/1207/small-tailscale/ has more details.
The method described there gets most of its size reduction from UPX.
That's not applicable on OpenWrt, as it already uses LZMA compression at the filesystem level. Switching to UPX would cause excessive RAM usage: the unpacked binary would sit in anonymous memory and no longer be backed by filesystem pages, which the kernel can discard and re-read under memory pressure.
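If you want to check this on an actual device, here is a rough sketch that splits a process's resident memory into anonymous and file-backed parts. It assumes a Linux kernel that exposes `/proc/<pid>/smaps` with `Rss:` and `Anonymous:` fields; `summarize_smaps` is just a name made up for this example, not part of any tool.

```python
import re

# Matches only the per-mapping "Rss:" and "Anonymous:" lines of smaps.
# Fields such as "Pss:" or "AnonHugePages:" are deliberately not matched.
_FIELD = re.compile(r"(Rss|Anonymous):\s+(\d+) kB")


def summarize_smaps(text):
    """Sum Rss and Anonymous over all mappings in an smaps dump (values in kB).

    The file-backed figure is an approximation: resident pages that are
    not counted as anonymous.
    """
    totals = {"Rss": 0, "Anonymous": 0}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return {
        "rss_kb": totals["Rss"],
        "anon_kb": totals["Anonymous"],
        "file_backed_kb": totals["Rss"] - totals["Anonymous"],
    }
```

To use it, pass in the contents of `/proc/<pid>/smaps` for the running daemon, for example `summarize_smaps(open("/proc/1234/smaps").read())` with the real PID. If Python is not available on the router, copying the smaps file to another machine works just as well. I would expect a UPX-packed binary to show most of its code under `anon_kb`, but I have not measured this for Tailscale specifically.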