For 2), it's actually stated in the description ("phrase-level timestamps"), so it should be possible. Phrase-level granularity is neat for jumping to a specific point in a video, but it's probably not fine-grained enough for audio editing.