Hacker News
by eperfa 5842 days ago
Sure, but we're talking about harvesting e-mail addresses from basically any site you can find. If it's a matter of writing a regexp for the HTML source, it's fine. If it's a matter of running some complicated software with complex rules for each and every site on the net, I think it's much less practical.

By the way: if you want to write such a script yourself (I mean one that strips out invisible or off-screen tags), I think it can get really difficult pretty fast because of the cascading nature of CSS and how complex the rules can become. Basically you'd have to interpret the whole CSS cascade and compute the position of every element, taking all the CSS rules into account. To me that feels like implementing half of a browser's rendering engine. Oh, and have we talked about setting CSS properties from JS? (e.g. hiding every tag and then showing only the 'true' e-mail address once the JS has loaded)