I don't know of a paper, but the Tcl wiki has some discussion of the "non-recursive evaluation" (NRE) engine that enables these features [1]. The new coroutine [2] and tailcall [3] commands are described more formally in their TIPs, which are listed, among others, here [4].
Tcl's "minor" (point) releases are bigger than most languages'. 8.6 has been in development for about four years, and 8.5 took at least that long. By contrast, Python does a point release every 18 months or so.
1. http://wiki.tcl.tk/37253#pagetocbefb5a57
2. http://www.tcl.tk/cgi-bin/tct/tip/328
3. http://www.tcl.tk/cgi-bin/tct/tip/327
4. http://wiki.tcl.tk/21276
Update: more details on the NRE implementation are available at the contributing author's site [5] (anonymous login required).
5. http://msofer.com:8080/wiki?name=NRE