How much did I screw up?
4 points
by LELISOSKA
3636 days ago
I know there are a lot of programmers on this board, so tell me, on a scale of 1 to 10, how much of a moron I am for using Node.js and MongoDB to do the following:
scrape 8 different social outlet / portfolio APIs
sync the scraped data with 3 different post model schemas as well as a user model, using a document store database (MongoDB)
all of that wrapped in a promise library to avoid callbacks
scrape data live based on certain queries, save, sync, etc.
Everything is wrapped in promises and I don't even know what will happen if two users run the same query at the same time.
Why would I ever use Node.js instead of something like Python or Ruby or whatever and just split the stuff into threads instead of having to deal with frustrating JavaScript async? The only reason I did it is because JavaScript was the first thing I learned besides some C++ or whatever in a few semesters of college. Should I be using a normal synchronous language like C# or Java or something?
|
So let's talk about solving this with Node.js. First of all, this has gotten overwhelming because you're treating the whole thing as one monolithic service, when it's really time to break it into modules. It sounds like you have a scraper, a parser, an application that needs the scraped data, and a web API for that application. Cool, easy enough.
So the first thing I would do is pull your scraper out into a completely separate service that has nothing to do with your API. Just make it a JavaScript class that takes some configuration to manage the scraping settings, and that's about it. All it really needs is a method that takes an input telling it which site to scrape and gives you back the output.
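To make that concrete, here is a rough sketch of what such a class could look like. The class name, the `sources` config field, the `?q=` query format, and the example URL are all made up for illustration, and the default relies on the global `fetch` that ships with Node 18 and later. Passing the fetch function in through the config keeps the scraper easy to test without hitting the network.

```javascript
// Hypothetical scraper module. Names, config fields and URL format are
// illustrative assumptions, not taken from the original application.
class Scraper {
  constructor({ sources = {}, fetchFn = globalThis.fetch } = {}) {
    this.sources = sources; // map of source name -> base URL
    this.fetchFn = fetchFn; // injectable so tests can supply a fake
  }

  // Resolves with the raw response body for one configured source.
  async scrape(sourceName, query = '') {
    const baseUrl = this.sources[sourceName];
    if (!baseUrl) {
      throw new Error(`Unknown source: ${sourceName}`);
    }
    const url = `${baseUrl}?q=${encodeURIComponent(query)}`;
    const response = await this.fetchFn(url);
    if (!response.ok) {
      throw new Error(`Request to ${url} failed with status ${response.status}`);
    }
    return response.text();
  }
}
```

Usage would be something like `new Scraper({ sources: { example: 'https://example.com/api' } }).scrape('example', 'cats')`, which returns a promise for the response body.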
Cool, now you need a parser. You need to go through the HTML returned by your scraper and format it as JSON. Fortunately, JavaScript has a lot of tools well suited to dealing with HTML. One I like is Cheerio, which makes it very easy to parse HTML and pull out the data you need. So you have a parser now.
All that's left now is integrating with your application. Since you said you need to "scrape data live and based on certain queries", what we want is to expose some routes in your application that make that possible.
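Here is one way such a route could be sketched using only Node's built-in `http` module. The `/search` path, the `q` parameter, and the `runQuery` function (standing in for "scrape, parse, save") are assumptions for illustration. The `dedupe` helper also speaks to the worry about two users running the same query at once: concurrent requests for the same query share one in-flight promise instead of kicking off duplicate scrapes.

```javascript
const http = require('http');

// Wraps a promise-returning function so that concurrent calls with the
// same key share a single in-flight promise.
function dedupe(fn) {
  const inFlight = new Map();
  return (key) => {
    if (inFlight.has(key)) {
      return inFlight.get(key);
    }
    const promise = Promise.resolve()
      .then(() => fn(key))
      .finally(() => inFlight.delete(key)); // allow fresh runs afterwards
    inFlight.set(key, promise);
    return promise;
  };
}

// Builds a server exposing GET /search?q=... (hypothetical route).
// runQuery must return a promise of JSON-serializable data.
function createServer(runQuery) {
  const search = dedupe(runQuery);
  return http.createServer(async (req, res) => {
    const url = new URL(req.url, 'http://localhost');
    if (req.method !== 'GET' || url.pathname !== '/search') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'not found' }));
      return;
    }
    try {
      const results = await search(url.searchParams.get('q') || '');
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(results));
    } catch (err) {
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: err.message }));
    }
  });
}
```

You would start it with something like `createServer(runQuery).listen(3000)`. In a real application a framework such as Express would probably replace the hand-rolled routing, but the shape stays the same.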
If you tell me a little bit more about what your application does, I can give you some idea of how I would structure this from here.