Hacker News
by voodooEntity 1102 days ago
Well, I'm about to release the alpha, but basically it's a project I've worked on for about 4 years in my private time, and it consists of two parts.

1. A custom in-memory graph storage/database which is thread-safe and designed for fast multithreaded use. It also comes with a custom query builder/language whose queries can be transported as JSON, so it can be used from any language. You can either import it directly as a dependency in your Go project, or build it as a server with an HTTP API package I wrote and use it through that API.

2. An architecture/framework which lets you create completely self-supervising software that basically consists only of you defining a set of "abilities", which you do in the form of Go plugins. Each plugin defines the parameter structure it needs in order to be executed. The architecture uses the graph backend described in (1) to store all the data. Whenever data gets mapped into the storage, the architecture efficiently checks whether the new data can supply any of the registered plugins, and if so, executes them. Since the architecture is multithreaded by default, all jobs are parallelized across worker/runner threads. A scheduler automatically checks their results to see whether they can supply new jobs with the new knowledge (optionally combined with already existing knowledge). Data returned from a plugin is also automatically mapped into the existing storage. Since all the runners do their scheduling based on new data, there is no central supervision, but rather thread-distributed self-supervision.
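The self-scheduling loop described in (2) can be sketched roughly like this, assuming each plugin simply declares the single data type it consumes. Engine, Plugin, Datum and Map are hypothetical names, the plugins are stubs, and one goroutine per job stands in for the worker/runner threads; this is an illustration of the idea, not how the actual framework is implemented.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Datum is one piece of knowledge: a type plus a value.
type Datum struct {
	Type  string
	Value string
}

// Plugin is a hypothetical "ability": it consumes one data type and returns derived data.
type Plugin struct {
	Name     string
	Requires string
	Run      func(in Datum) []Datum
}

// Engine holds the registered plugins and the shared knowledge store.
type Engine struct {
	plugins []Plugin
	mu      sync.Mutex
	store   []Datum
	jobs    sync.WaitGroup
}

// Map stores new data and schedules every plugin whose requirement it satisfies.
// Each job maps its own results through Map again, so scheduling is driven by new data.
func (e *Engine) Map(data []Datum) {
	for _, d := range data {
		e.mu.Lock()
		e.store = append(e.store, d)
		e.mu.Unlock()
		for _, p := range e.plugins {
			if p.Requires != d.Type {
				continue
			}
			e.jobs.Add(1) // registered before the current job (if any) finishes
			go func(p Plugin, d Datum) {
				defer e.jobs.Done()
				e.Map(p.Run(d))
			}(p, d)
		}
	}
}

// Wait blocks until no job is running, i.e. no new data can trigger more work.
func (e *Engine) Wait() { e.jobs.Wait() }

// Snapshot returns the stored data as sorted "Type:Value" strings.
func (e *Engine) Snapshot() []string {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := make([]string, 0, len(e.store))
	for _, d := range e.store {
		out = append(out, d.Type+":"+d.Value)
	}
	sort.Strings(out)
	return out
}

func main() {
	e := &Engine{plugins: []Plugin{
		{Name: "resolve", Requires: "Domain", Run: func(in Datum) []Datum {
			return []Datum{{Type: "IP", Value: "192.0.2.1"}} // stub instead of a DNS lookup
		}},
		{Name: "scan", Requires: "IP", Run: func(in Datum) []Datum {
			return []Datum{{Type: "Port", Value: in.Value + ":80"}, {Type: "Port", Value: in.Value + ":443"}}
		}},
	}}
	e.Map([]Datum{{Type: "Domain", Value: "example.com"}})
	e.Wait()
	fmt.Println(e.Snapshot())
}
```

There is no supervisor loop here: every job maps its own results, and mapping is what schedules the next jobs. The WaitGroup never drops to zero early, because a job registers its follow-up jobs before it finishes.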

This way you basically just define certain abilities as plugins and then insert the starting data. The architecture/framework takes over from there.

This was a strongly simplified explanation of what it does. I've coded a lot of different stuff in the last 15-20 years, but this project was by far the most complex in terms of dynamic data mapping, scheduling, etc. Complex in terms of logic rather than size.


>This was a strongly simplified explaination for what it does

You missed the part about what the end result is supposed to be. What is the purpose? Can I use it to run a website?

Well, you could say it's for "data-driven processing", and it's probably best suited for any kind of data processing, especially data gathering/enrichment. What you can build with it is only limited by your imagination. Still, I'll give a simple example (the one I'm going to use for the example project I'll provide).

A web crawler. What does a web crawler do? It takes a domain (data), crawls it for more data, analyzes that data and enriches what you've collected. You might end up writing multiple plugins like:

- resolveIpFromDomain (takes Domain, returns Domain->IP)

- detectWebserver (takes ip, uses for example nmap to scan ports + banners, returns ip->port->software->(banner,state))

- detectVhost (takes ip->port->(software[webserver],state[open]) || domain->ip->(software[webserver],state[open]) and returns ip->port->(software[webserver])->[]vhost[]->page[/])

- loadPage (takes page, loads it with curl and returns page->content)

- extractLinks (takes page->content, returns page->content->[]link)

- loadLink (takes vhost->page->link, returns vhost->[]page)

- extractMedia (takes page->content, returns page->content->[]media)

- analyzeMedia (takes page->content, returns page->content->media->[]attribute)
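Requirements like the one for detectVhost (a path through the graph plus attribute constraints, with "||" for alternatives) could be matched roughly as follows. This is a hypothetical sketch: Node, Pattern and Requirement are invented names, software/state are assumed to be properties of the port node, and the second alternative (written domain->ip->(...) in the list) is assumed to pass through a port node as well.

```go
package main

import "fmt"

// Node is a piece of data in the graph with properties and child nodes.
type Node struct {
	Type     string
	Props    map[string]string
	Children []*Node
}

// Pattern is one alternative: a chain of node types whose last node must carry Props.
type Pattern struct {
	Path  []string          // e.g. ["IP", "Port"]
	Props map[string]string // constraints on the last node, e.g. state=open
}

// Requirement is satisfied if any alternative matches (the "||" in the plugin notation).
type Requirement []Pattern

// matchPath walks down from n following path and checks props on the final node.
func matchPath(n *Node, path []string, props map[string]string) bool {
	if n == nil || len(path) == 0 || n.Type != path[0] {
		return false
	}
	if len(path) == 1 {
		for k, v := range props {
			if n.Props[k] != v { // reading a nil map yields "", which simply fails the check
				return false
			}
		}
		return true
	}
	for _, c := range n.Children {
		if matchPath(c, path[1:], props) {
			return true
		}
	}
	return false
}

// Satisfied reports whether data rooted at root can supply the plugin.
func (r Requirement) Satisfied(root *Node) bool {
	for _, p := range r {
		if matchPath(root, p.Path, p.Props) {
			return true
		}
	}
	return false
}

func main() {
	open := map[string]string{"software": "webserver", "state": "open"}
	detectVhost := Requirement{
		{Path: []string{"IP", "Port"}, Props: open},
		{Path: []string{"Domain", "IP", "Port"}, Props: open},
	}
	port := &Node{Type: "Port", Props: map[string]string{"software": "webserver", "state": "closed"}}
	domain := &Node{Type: "Domain", Children: []*Node{{Type: "IP", Children: []*Node{port}}}}
	fmt.Println(detectVhost.Satisfied(domain)) // false: the port is closed
	port.Props["state"] = "open"
	fmt.Println(detectVhost.Satisfied(domain)) // true: the second alternative matches
}
```

A real engine would run this check against newly mapped data inside the shared graph rather than against a standalone tree, but the shape of the test (path plus constraints, any alternative wins) stays the same.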

...and so on. So what you do is provide a domain, which triggers resolveIpFromDomain. Its result is mapped back into the datahive, and the IP in the new data triggers detectWebserver. That returns the webservers it found, which satisfy the requirements of detectVhost. At this point you can probably see where this is going.

Due to how the architecture works, it always parallelizes the work as much as possible, it always maps the data into one big structure without you having to care about it, and it only executes things that are necessary/useful.
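One way to read "only executes things that are necessary" is that a plugin never runs twice on the same input, even when several runners discover that input at the same time. A minimal Go sketch of such a guard, assuming work is keyed by plugin name plus input value (onceSet and TryClaim are hypothetical names; sync.Map.LoadOrStore is the standard library call doing the atomic check):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceSet remembers which (plugin, input) pairs have already been claimed.
type onceSet struct {
	seen sync.Map // key: [2]string{plugin, input}
}

// TryClaim reports whether the caller is the first to claim this plugin/input pair.
// LoadOrStore is atomic, so two concurrent callers can never both win.
func (o *onceSet) TryClaim(plugin, input string) bool {
	_, loaded := o.seen.LoadOrStore([2]string{plugin, input}, struct{}{})
	return !loaded
}

func main() {
	var claims onceSet
	var runs int32
	var wg sync.WaitGroup
	// Ten runners discover the same link at once; only one may execute loadLink on it.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if claims.TryClaim("loadLink", "https://example.com/about") {
				atomic.AddInt32(&runs, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt32(&runs)) // prints 1
}
```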

So the more your software branches/parallelizes, the more you gain.

Though, as I mentioned in my original post, I'll release the first alpha, so there are still things that can be extended and improved. Right now I'm spending time writing the docs, which will probably take me a few more weeks to get good enough for people to understand how to use it by themselves.

I'm mostly releasing it because I think it's a great showcase of how you can do optimized data-driven processing while having an architecture that takes care of the most painful things like data mapping, parallelization, etc. I don't expect it to be the next "big thing" or even to be used by a lot of people, but if it inspires people, or someone writes an even better version based on the idea, I'd already be happy :)

So to come back to your original question: can it host a website? Probably, but it's not really meant for that, and nginx would serve you better.