Yes, that would work. They don't need to be connected to the router over Wi-Fi; Ethernet works too. The ESPs will connect with each other peer-to-peer.
Yes, if I understand you correctly, it already works with that setup.
So you can connect the ESP32s over Ethernet and it still works, as long as they're on the same network as TOMMY (Home Assistant Add-on or Docker). The only thing to keep in mind is that the ESP32s need built-in Wi-Fi with an antenna (either PCB or external).
Does that mean that TOMMY is not using an SSID at all for its motion detection? Where is the Wi-Fi network then? Hidden? What about bands and channels? Overlaps? Interference?
Actually, you are right. I confused myself. You would need to have them connected to the same Wi-Fi network, as that determines the band and channel they communicate on, even though the communication itself is peer-to-peer.
How are your devices connected exactly? Over Ethernet, on the subnet your HA instance is on? And are you then also able to connect to a separate Wi-Fi SSID that you create for those devices?
Also, are you able to join the Discord channel? Then we can create a thread and go into a bit more depth about your setup.