A simple GUI would be great to let the user select 3-4 high-quality matches (reference points) between each image pair. Then the rest of the stitching pipeline could stay the same.
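To make the idea concrete, here is a rough sketch of what could happen with the clicked points once the GUI hands them over. It assumes the stitcher models each image pair with a homography, which needs at least 4 correspondences (3 would only pin down an affine transform). The function names `homography_from_points` and `apply_homography` are made up for illustration, and the plain DLT below skips the coordinate normalization a production version would probably want.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src from >= 4 point pairs (plain DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need at least 4 matching (x, y) pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    # Fix the scale; assumes H[2, 2] is nonzero, which holds for ordinary overlaps.
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical clicks: four points in image A and their matches in image B.
clicked_a = [(0, 0), (100, 0), (100, 100), (0, 100)]
clicked_b = [(10, 20), (110, 25), (105, 130), (5, 120)]
H_manual = homography_from_points(clicked_a, clicked_b)
```

The resulting matrix could then be fed to the warping and blending steps in place of the one estimated from automatic feature matches.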
I'm convinced after the work I did on panoramas that one shouldn't need to involve user input. Maybe for very strange situations... but generally it shouldn't be necessary. This is especially true if you have access to the IMU on a phone, which helps constrain the potential angles of each of your images.
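As a rough illustration of how an IMU reading can narrow things down: if the camera only rotates between shots (no translation) and a pinhole model applies, the relative rotation predicts a homography H = K R K^-1 that can seed or bound the feature search. Everything specific below is an assumption for the sketch: the focal length, the principal point, the rotation convention (R takes ray directions from the first camera frame to the second), and the name `rotation_homography`.

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R, given intrinsics K."""
    return K @ R @ np.linalg.inv(K)

# Assumed intrinsics: focal length 1000 px, principal point at (640, 360).
f, cx, cy = 1000.0, 640.0, 360.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Assumed IMU reading: a 10 degree rotation about the camera's vertical (y) axis.
theta = np.deg2rad(10.0)
c, s = np.cos(theta), np.sin(theta)
R_yaw = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])

H_pred = rotation_homography(K, R_yaw)

# Where the centre pixel of the first image is predicted to land in the second.
p = H_pred @ np.array([cx, cy, 1.0])
predicted_center = p[:2] / p[2]
```

With a prediction like this, candidate matches far from where the IMU says they should be can be rejected early, which is one way the sensor reduces the need for manual reference points.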
If your images are just a random bag of jpegs that came in from the cold, then it's harder for sure.