by cscottnet 2321 days ago
If you want to dig through the history a bit, https://github.com/wikimedia/parsoid/blame/6eb00df3e090b20cc... is a pretty good example of the porting technique. You'll see that a fair number of lines are still unchanged from the "automatic conversion from JS". https://github.com/wikimedia/parsoid/commit/6eb00df3e090b20c... shows what the initial port process was like. It was still quite a bit of work, but almost all of it was "real" work that needed a human to think about things, not mechanical syntax translation. The syntax translation part was done automatically.

Then https://github.com/wikimedia/parsoid/commits/master/src/Ext/... is a fairly typical view of the process after the "initial working port" was done (post Aug 2019). You'll see some nasty bugs fixed (https://github.com/wikimedia/parsoid/commit/34fcb4241aa0f3a0..., a GC bug in PHP!), some more subtle ones (PHP's crazy behavior of '$' at the end of a regexp, where it also matches just before a trailing newline unless you use the 'D' flag), etc.
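To illustrate the '$' gotcha: PHP's preg_* functions use PCRE, where '$' (without the 'm' modifier) matches at the end of the subject or just before a final newline, while JavaScript's '$' only matches at the absolute end. Here is a minimal sketch in Python, whose `re` module happens to share the PCRE default, with `\Z` standing in for the end-only behavior. The pattern and the `check` helper are hypothetical illustrations, not code from Parsoid.

```python
import re

# Python's `re` shares PCRE's default semantics for '$': it matches at the
# end of the subject OR just before a single trailing "\n". That is what
# PHP's preg_* functions do unless the 'D' modifier is given.
PCRE_DEFAULT = re.compile(r"^[a-z]+$")

# `\Z` in Python matches only at the absolute end of the string, which is
# what JavaScript's `$` does (without the 'm' flag) and what PHP's `$`
# does once the 'D' (dollar-end-only) modifier is added.
END_ONLY = re.compile(r"^[a-z]+\Z")


def check(subject: str) -> tuple[bool, bool]:
    """Return (matches with PCRE-default '$', matches with end-only anchor)."""
    return (PCRE_DEFAULT.search(subject) is not None,
            END_ONLY.search(subject) is not None)


if __name__ == "__main__":
    print(check("gallery"))    # (True, True)
    print(check("gallery\n"))  # (True, False): the trailing newline slips through
```

In PHP terms the fix is either adding the modifier (`/^[a-z]+$/D`) or anchoring with `\z`. Note that Python's `\Z` corresponds to PCRE's lowercase `\z`; PCRE's uppercase `\Z` behaves like the default '$'.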

If you look through the history earlier in 2019, you'll even see commits like https://github.com/wikimedia/parsoid/commit/2853a90ceda7cdfa... to the JS code (in production at the time) that prepared the way for the PHP port. In that particular case, our tooling was converting offsets between JS's UTF-16 and PHP's UTF-8 as part of the output-testing-and-comparison QA framework we'd built for the port, and it was getting hugely confused by Gallery, since Gallery was using "bogus" offsets into the source text. Fixing the offsets was rather involved (the patchset for this commit in Gerrit went through 56 revisions: https://gerrit.wikimedia.org/r/505319), so the change was first made on the JS side, thoroughly tested, and deployed to production to ensure it had no inadvertent effects, before that now-better JS code was ported to PHP. It would have been a disaster to try to make this change directly in the PHP version during the port.
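For the UTF-16/UTF-8 offset conversion mentioned above, the core arithmetic is that each code point costs 1 UTF-16 code unit (2 if it lies outside the BMP) but 1 to 4 UTF-8 bytes. Below is a hypothetical sketch in Python; the function name and error handling are illustrative assumptions, not Parsoid's actual tooling. It also shows one way an offset can be "bogus": if it lands between the two halves of a surrogate pair, it has no UTF-8 equivalent.

```python
def utf16_to_utf8_offset(text: str, utf16_offset: int) -> int:
    """Convert a JS-style UTF-16 code-unit offset into a UTF-8 byte offset.

    Hypothetical helper. Raises ValueError if the offset is negative, past
    the end of the text, or falls inside a surrogate pair (i.e. in the
    middle of a single astral character).
    """
    if utf16_offset < 0:
        raise ValueError("negative offset")
    units = 0      # UTF-16 code units consumed so far
    byte_pos = 0   # UTF-8 bytes consumed so far
    for ch in text:  # Python iterates over code points, not code units
        if units == utf16_offset:
            return byte_pos
        units += 2 if ord(ch) > 0xFFFF else 1  # astral chars need a surrogate pair
        byte_pos += len(ch.encode("utf-8"))    # 1 to 4 bytes per code point
        if units > utf16_offset:
            raise ValueError("offset splits a surrogate pair")
    if units == utf16_offset:
        return byte_pos
    raise ValueError("offset past end of text")


if __name__ == "__main__":
    s = "a\U0001F600\u00e9 b"  # 'a', U+1F600 (astral), U+00E9, ' ', 'b'
    for off in (0, 1, 3, 4, 6):
        print(off, "->", utf16_to_utf8_offset(s, off))
    # prints 0 -> 0, 1 -> 1, 3 -> 5, 4 -> 7, 6 -> 9 (one per line)
```

A linear scan keeps the sketch simple; a tool converting many offsets in the same document would more likely precompute a lookup table once per document.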