Can you elaborate? Why would a transform be needed only on mobile devices? Why wouldn't the OS handle whatever's needed to produce audio in a platform-agnostic way, especially on the web? Very interested to know more.
I suspect the commenter meant that when the buffer runs empty because the device can't compute new samples quickly enough, the output briefly holds the last seen power spectrum (power versus frequency) instead of going silent. This filler is generated by a cheaper process than the normal audio computation.
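To make that guess concrete, here is a minimal sketch of one way such filler could be produced: keep the magnitude of each frequency bin from the last good frame, randomize the phases, and transform back to samples. This is only an illustration of "holding the power spectrum", not a description of what any particular browser, OS, or device actually does. The names `dft`, `idft_real`, and `filler_frame` are hypothetical, and the naive O(n^2) transform is there only to keep the example dependency-free; a real implementation would use an FFT or something cheaper still.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def filler_frame(last_frame, rng):
    """Hypothetical concealment step: same power spectrum, random phases.

    Assumes last_frame is a list of real samples from the last good frame.
    """
    n = len(last_frame)
    spectrum = dft(last_frame)
    out = [0j] * n
    out[0] = spectrum[0]  # keep the DC bin unchanged
    for k in range(1, n // 2 + 1):
        mag = abs(spectrum[k])
        if 2 * k == n:
            # The Nyquist bin must stay real so the output is real.
            out[k] = complex(mag if rng.random() < 0.5 else -mag, 0.0)
        else:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            out[k] = cmath.rect(mag, phase)
            # Mirror as the complex conjugate so the output is real.
            out[n - k] = out[k].conjugate()
    return idft_real(out)


# Usage: pretend the last good frame was a mix of two sine tones.
rng = random.Random(0)
last = [
    math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.sin(2 * math.pi * 5 * t / 16)
    for t in range(16)
]
fill = filler_frame(last, rng)
```

The filler has the same energy at every frequency as the last good frame, so it sounds like a continuation of the same timbre, but the waveform itself is different because the phases are new.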