| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by voodooEntity 1098 days ago

Well, you could say it's for "data driven processing", and it's probably best suited for any kind of data processing, especially data gathering/enrichment. What you could build with it is only limited by your imagination. Still, I'll give a simple example (the one I'm going to use for the example project I'll provide).

A webcrawler. What does a webcrawler do? It takes a domain (data), crawls for more data, analyzes it, and enriches your collected data. You might end up writing multiple plugins like:

- resolveIpFromDomain (takes domain, returns domain->ip)

- detectWebserver (takes ip, uses for example nmap to scan ports and banners, returns ip->port->software->(banner,state))

- detectVhost (takes ip->port->(software[webserver],state[open]) || domain->ip->(software[webserver],state[open]), returns ip->port->(software[webserver])->[]vhost[]->page[/])

- loadPage (takes page, loads it with curl, returns page->content)

- extractLinks (takes page->content, returns page->content->[]link)

- loadLink (takes vhost->page->link, returns vhost->[]page)

- extractMedia (takes page->content, returns page->content->[]media)

- analyzeMedia (takes page->content, returns page->content->media->[]attribute)
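To make the chaining concrete, here is a minimal Python sketch. It is not the project's actual API: `Plugin`, `consumes`, `produces` and `downstream` are hypothetical names. It declares the plugins above by the data type each one needs and the types it can produce, then derives which plugin feeds which. Each nested shape is simplified to its innermost type, and the webserver/open-state filter on detectVhost is left out.

```python
from dataclasses import dataclass


# Hypothetical plugin declaration: the one data type a plugin needs as
# input, and the data types it can add to the collected data.
@dataclass(frozen=True)
class Plugin:
    name: str
    consumes: str
    produces: tuple[str, ...]


# Simplified to the innermost type of each shape from the list above.
PLUGINS = [
    Plugin("resolveIpFromDomain", "domain", ("ip",)),
    Plugin("detectWebserver", "ip", ("port",)),
    Plugin("detectVhost", "port", ("vhost", "page")),
    Plugin("loadPage", "page", ("content",)),
    Plugin("extractLinks", "content", ("link",)),
    Plugin("loadLink", "link", ("page",)),
    Plugin("extractMedia", "content", ("media",)),
    Plugin("analyzeMedia", "media", ("attribute",)),
]


def downstream(plugins):
    """Map each plugin name to the plugins whose input it can produce."""
    return {
        p.name: sorted(q.name for q in plugins if q.consumes in p.produces)
        for p in plugins
    }


for name, targets in downstream(PLUGINS).items():
    print(f"{name} -> {targets}")
```

Note the loadPage -> extractLinks -> loadLink -> loadPage cycle: that loop is what makes this a crawler. In a sketch like this, the cycle only terminates if data that is already known is not allowed to trigger plugins a second time.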

So what you do is provide a domain, which triggers resolveIpFromDomain. Its result is mapped back into the datahive, and the new IP in that data triggers detectWebserver. That returns any webservers it finds, which satisfy the input requirement of detectVhost. At this point you can probably see where this is going.
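The trigger flow described above can be sketched as a small event loop, assuming each datum is stored as a `(type, value, parent)` record so every result stays linked to the data it was derived from. A flat list stands in for the datahive, and the plugin functions are hypothetical stand-ins that return canned values (192.0.2.10 is a documentation-only address) instead of doing real DNS lookups or port scans.

```python
from collections import deque


# Hypothetical stand-ins for the plugins; they return canned results.
def resolve_ip_from_domain(domain):
    return [("ip", "192.0.2.10")]


def detect_webserver(ip):
    return [("port", f"{ip}:80")]


def detect_vhost(port):
    return [("vhost", "example.com")]


# Trigger table: data type -> plugins whose input requirement it meets.
TRIGGERS = {
    "domain": [resolve_ip_from_domain],
    "ip": [detect_webserver],
    "port": [detect_vhost],
}


def run(seed_type, seed_value):
    hive = []  # every (type, value, parent) record produced so far
    queue = deque([(seed_type, seed_value, None)])
    while queue:
        record = queue.popleft()
        hive.append(record)
        dtype, value, _ = record
        # New data triggers every plugin that consumes its type.
        for plugin in TRIGGERS.get(dtype, []):
            for out_type, out_value in plugin(value):
                # Link the result back to the datum it came from.
                queue.append((out_type, out_value, (dtype, value)))
    return hive


for record in run("domain", "example.com"):
    print(record)
```

Seeding it with a single domain walks domain -> ip -> port -> vhost without the caller ever invoking a plugin directly; the data alone decides what runs next.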

Because of how the architecture works, it always parallelizes the work as much as possible, always maps the data into one big structure without you having to care about it, and only executes what is necessary/useful.
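One way to get both properties (run everything that is ready in parallel, and skip work that isn't needed) is sketched below, assuming a simple wave-based scheduler: all plugins triggered by new data in one step run concurrently on a thread pool, and a result only triggers further plugins if it isn't already in the collected set. This is an illustration, not the project's scheduler. The `LINKS` graph and the `load_page`/`load_link` functions are hypothetical stand-ins for real HTTP fetches; the cycle between "/" and "/a" shows why the "only new data" rule matters.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link graph instead of real HTTP fetches.
# Note the cycle "/" -> "/a" -> "/".
LINKS = {"/": ["/a", "/b"], "/a": ["/", "/c"], "/b": [], "/c": []}


def load_page(page):
    return [("link", target) for target in LINKS.get(page, [])]


def load_link(link):
    return [("page", link)]


TRIGGERS = {"page": [load_page], "link": [load_link]}


def run_parallel(seed, max_workers=8):
    hive = {seed}   # all (type, value) data collected so far
    executed = []   # (plugin name, input) pairs actually run
    wave = [seed]   # data that was new in the previous step
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while wave:
            # Every plugin triggered by the new data in this wave.
            jobs = [(plugin, value)
                    for dtype, value in wave
                    for plugin in TRIGGERS.get(dtype, [])]
            executed.extend((p.__name__, v) for p, v in jobs)
            # Run the whole wave concurrently; map keeps job order.
            results = pool.map(lambda job: job[0](job[1]), jobs)
            wave = []
            for outputs in results:
                for item in outputs:
                    # Only data not seen before triggers more work.
                    if item not in hive:
                        hive.add(item)
                        wave.append(item)
    return hive, executed


hive, executed = run_parallel(("page", "/"))
print(sorted(v for t, v in hive if t == "page"))
```

Each page is loaded exactly once even though "/" is linked from "/a". Lockstep waves are the simplest version to reason about; a scheduler could instead start each job as soon as its input arrives, so one slow fetch does not hold up the rest of its wave.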

So the more your processing branches out and can be parallelized, the more you gain.

Though, as I mentioned in my original post, I'll be releasing the first alpha, so there are still things that can be extended and improved. Right now I'm spending time writing the docs, which will probably take me a few more weeks to get good enough for people to understand how to use it by themselves.

I'm mostly releasing it because I think it's a great showcase of how you can do optimized data driven processing with an architecture that takes care of the most painful things like data mapping / parallelization / etc. I don't expect it to be the next "big thing" or even to be used by a lot of people, but if it inspires people, or someone writes an even better version based on the idea, I'd already be happy :)

So to come back to your original question: can it host a website? Probably, but it's not really meant for that, and nginx would serve you better.