by madeofpalk 3035 days ago
One question I've had recently is how to scrape a JavaScript object out of HTML source. With server-side React + Redux, I've wanted to be able to scrape the serialised `var __STATE__ = {...}` object out to JSON from Node.js. The best solution I cobbled together was to basically eval() the JS source, which I know is far from ideal.
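If the server happened to write the state with `JSON.stringify` (an assumption; the question doesn't say), the initializer is already strict JSON and no evaluation is needed at all. Here is a minimal regex-based sketch. `extractStateJson` is a hypothetical name, and the pattern assumes the assignment is the last statement in its `<script>` element:

```javascript
// Sketch: pull the initializer of `var __STATE__ = ...` out of raw HTML and
// try JSON.parse on it. Assumes (hypothetically) the server serialised the
// state as strict JSON and the assignment ends its <script> element.
function extractStateJson(html) {
  // Capture everything between `var __STATE__ =` and the closing </script>,
  // dropping an optional trailing semicolon and whitespace.
  const match = html.match(/var\s+__STATE__\s*=\s*([\s\S]*?)\s*;?\s*<\/script>/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    // Not strict JSON (unquoted keys, functions, etc.): use a parser or vm instead.
    return null;
  }
}
```

When it returns `null`, the state isn't plain JSON and one of the approaches in the replies is needed.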

You could instead run a parser such as esprima (or its equivalent from the Babel ecosystem) over the JS source, find the global variable named `__STATE__`, and eval only its init expression. That's cheaper, more secure, and more direct than actually running the whole script.
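A minimal sketch of the "find the global variable" step. It assumes an ESTree-shaped AST such as the one esprima's `parseScript` returns; the parse call itself is left out so the example runs without npm packages, and `findGlobalInit` is a hypothetical helper name:

```javascript
// Sketch: given an ESTree Program node, find the top-level declarator for
// `name` and return its init expression node (or null if it isn't found).
// Only top-level var/let/const declarations are checked; an assignment such
// as `window.__STATE__ = ...` would need an extra ExpressionStatement case.
function findGlobalInit(program, name) {
  for (const stmt of program.body) {
    if (stmt.type !== "VariableDeclaration") continue;
    for (const decl of stmt.declarations) {
      if (decl.id.type === "Identifier" && decl.id.name === name) {
        return decl.init; // null for a bare `var __STATE__;`
      }
    }
  }
  return null;
}
```

With esprima installed, the `program` argument would come from something like `esprima.parseScript(scriptSource)`.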
I actually looked into this (only by reading docs; I never wrote any code) and I wasn't able to find a way to convert the AST for the ObjectExpression into JSON or an actual JavaScript object.
What you need is a code generation library that turns the AST back into JS code once you've identified the part of the syntax tree you're interested in; that generated code is what you then eval(). For Esprima ASTs, escodegen serves that purpose. I'm not sure what the counterparts are in the Babel world. Feel free to shoot me an email with specifics of where you're getting stuck (my email should be visible from my profile, I think), and I'll be glad to help.
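If the state only contains JSON-like values, one alternative to codegen plus eval() (a swapped-in technique, not the one described above) is to walk the ObjectExpression directly and build the value yourself. A sketch assuming ESTree node shapes; `astToValue` is a hypothetical name:

```javascript
// Sketch: convert the JSON-like subset of ESTree expression nodes straight into
// a plain value, with no code generation and no eval(). Anything outside that
// subset (functions, identifiers, spreads, computed keys, templates) throws.
function astToValue(node) {
  switch (node.type) {
    case "Literal":
      // Strings, numbers, booleans and null; regex literals also use Literal.
      if (node.regex) throw new Error("regex literals are not supported");
      return node.value;
    case "ArrayExpression":
      return node.elements.map((el) => {
        if (el === null) return null; // a hole like [1,,2]; JSON.stringify emits null too
        if (el.type === "SpreadElement") throw new Error("spread is not supported");
        return astToValue(el);
      });
    case "ObjectExpression": {
      const out = {};
      for (const prop of node.properties) {
        // Reject spreads, getters/setters and computed keys like {[k]: v}.
        if (prop.type !== "Property" || prop.kind !== "init" || prop.computed) {
          throw new Error("unsupported object property");
        }
        // Keys are identifiers ({a: 1}) or literals ({"a": 1}, {1: 2}).
        const key = prop.key.type === "Identifier" ? prop.key.name : String(prop.key.value);
        // defineProperty creates an own property even for the key "__proto__".
        Object.defineProperty(out, key, {
          value: astToValue(prop.value),
          enumerable: true,
          writable: true,
          configurable: true,
        });
      }
      return out;
    }
    case "UnaryExpression":
      // Negative numbers parse as unary minus applied to a numeric Literal.
      if (node.operator === "-" && node.argument.type === "Literal" &&
          typeof node.argument.value === "number") {
        return -node.argument.value;
      }
      throw new Error(`unsupported unary operator: ${node.operator}`);
    default:
      throw new Error(`unsupported node type: ${node.type}`);
  }
}
```

Throwing on anything outside the literal subset is deliberate: if the state contains functions or references to other variables, that's the signal to fall back to escodegen plus eval() (or vm).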
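You can use the vm module [0] to execute the code in a separate context instead of calling eval() directly (though Node's docs note that vm is not a security mechanism for running untrusted code).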

[0] https://nodejs.org/api/vm.html
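A minimal sketch of the vm approach, assuming the inline script declares the state with a top-level `var` as in the question. `readStateViaVm` is a hypothetical name, and the one-second timeout is an arbitrary choice:

```javascript
const vm = require("vm");

// Sketch: run the page's inline script in a fresh vm context and read back the
// global it declares. A top-level `var` (not let/const) in a script run with
// runInNewContext becomes a property of the context object.
// vm isolates globals but is not a security boundary for hostile code.
function readStateViaVm(scriptSource, name = "__STATE__") {
  const context = {}; // no window, document or require unless you add them here
  vm.runInNewContext(scriptSource, context, { timeout: 1000 }); // abort runaway loops
  // JSON.stringify works on objects created in the other context.
  return JSON.stringify(context[name]);
}
```

If the script references browser globals such as `window`, stub them on `context` before running it.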