The most widely distributed embedded ARM system software in the world, Android, offers this. They use mini debug info (normal debug info sections compressed, IIRC) and then you can signal a daemon in the background to do a stacktrace on your application.
I've done it, but it increases binary size by a whole lot.
What worked better was a last chance hardware fault handler that wrote a partial core dump out to flash, and a tool to create an ELF core dump file from that that GDB can accept.
Not on production images, though.