by kqr
78 days ago
I have a hypothesis that email scrapers don't parse HTML at all. I suspect they search the raw bytestring for @ characters and take whatever is on either side. That probably gets them as many addresses as they can realistically use at a fraction of the cost, given how expensive HTML parsing can be. (Similarly, I'm sure most links can be found by searching the bytestring for "href" and taking what's to the right of it.) This would explain why HTML entities are so effective: an address written with &#64; contains no literal @ byte for such a search to find. On the other hand, surely the TLS handshake is far more expensive than HTML parsing? Maybe it's to avoid parser failure modes that consume a lot of resources?
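To make the hypothesis concrete, here is a rough Python sketch of what such a scraper might look like. It is only a guess at the approach, not a description of any real scraper: the regex, the function name `scrape_addresses`, and the sample markup are all made up for illustration.

```python
import re

# Hypothetical pattern: a run of address-like bytes, a literal '@',
# then a dotted domain. Real scrapers may use something looser or stricter.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scrape_addresses(raw: bytes) -> list[bytes]:
    """Scan the raw response bytes around '@' without parsing any HTML."""
    return EMAIL_RE.findall(raw)


# Made-up sample markup: the same link, once plain and once with the
# '@' written as the numeric character reference &#64;.
plain = b'<a href="mailto:alice@example.com">mail me</a>'
obfuscated = b'<a href="mailto:alice&#64;example.com">mail me</a>'

found_plain = scrape_addresses(plain)            # one match
found_obfuscated = scrape_addresses(obfuscated)  # no '@' byte, so no match

print(found_plain)
print(found_obfuscated)
```

A browser decodes &#64; back to @ when it renders the page, so a human still sees a working address, while a byte-level search like this one never does.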
Could also be that they learned that sending spam to obfuscated addresses doesn't get much response. Such messages might get filtered out more, and/or the addressees might be less inclined to reply to them.