Hacker News
Show HN: SuperDuperDB – Open-source framework for integrating AI with databases (github.com)
49 points by timwitho 930 days ago
Hi everyone, I’m Timo, one of the creators of SuperDuperDB!

Today we are officially launching SuperDuperDB, an open-source framework for integrating AI directly with major databases, including streaming inference, scalable model training, and vector search, with the release of v0.1 on GitHub and on ProductHunt.

SuperDuperDB is not a database. It transforms your favorite database into an AI development and deployment environment (making it super-duper).

SuperDuperDB eliminates complex MLOps pipelines, specialized vector databases, and the need to migrate and duplicate data by integrating AI at the data's source, directly on top of your existing data infrastructure. This massively simplifies building and managing AI applications.

SuperDuperDB provides a simple Python interface but lets experts drill down to any level of implementation detail, such as model weights or training details.

Today’s release comes with full integration of major SQL databases and dataframe libraries, alongside further MongoDB support: PostgreSQL, MySQL, SQLite, DuckDB, Snowflake, BigQuery, ClickHouse, DataFusion, Druid, Impala, MSSQL, Oracle, pandas, Polars, PySpark, and Trino.

Currently supported AI: any model from PyTorch, Sklearn, or HuggingFace, as well as AI APIs such as OpenAI, Anthropic, and Cohere.

A few useful links:
- Our website: https://superduperdb.com
- Getting started docs: https://docs.superduperdb.com/docs/category/get-started/
- Our repo on GitHub: https://github.com/SuperDuperDB/superduperdb

Check out the use cases we have already implemented at https://docs.superduperdb.com/docs/category/use-cases and the apps built by the community at https://github.com/SuperDuperDB/superduper-community-apps, and try all of them with Jupyter in your browser at https://demo.superduperdb.com/

For more information about SuperDuperDB and why we believe it is much needed, read the blog post https://docs.superduperdb.com/blog/superduperdb-the-open-sou...

We are keen to hear your feedback!

All the best, Timo

13 comments

This looks like a decent project, but I really didn't appreciate waking up to messages from the mods of my community telling me that you joined last night, ignored my welcome message and my request to introduce yourself and what you're working on, didn't bother reading the community rules, and then blatantly started spamming my users to upvote you on ProductHunt.
Like others complaining here about spam from this project, I too just got a spam email from them at my personal address...

SuperDuperDB fernando@superduperdb.com via sendinblue.com

and "Fernando" is Fernando Guerra, their "Business, Marketing and Growth"

Not cool, and it immediately put me off.

Fernando already replied to this thread acting as if he has no connection to the project. https://news.ycombinator.com/item?id=38530381
Which is even sillier, since he wrote about working there in another post of his 29 days ago: https://news.ycombinator.com/item?id=38160306
We are really sorry for the inconvenience. We contacted you on the assumption that you would be interested in the project. Again, we are very sorry for bothering you; this will never happen again.
This project just sent me unsolicited DM spam on a Slack for a tangentially related project. I strongly encourage avoiding this project, as they are using spam and harassment to promote it. Judging by another comment on this thread, the community where this project harassed me is not the only one they have targeted this morning.
Congrats on the launch, it looks like you worked very hard on it.

But I’m an engineer, I read the README and the website, and I still don’t know what Super-duper is.

Is it just a Python library? Does it have its own persistence (it must)? It doesn’t appear to be a set of plugins for various DBs, but I could be wrong.

As such I don’t know how I’d use it. It might be helpful to describe the product in more concrete terms.

One of the phrases used in YC is: ACME makes soup taste better. We do it with a seasoning that chefs add to their broth.

Maybe that’s helpful. Explaining a product can be hard!

All things aside, it's a framework for building data workflows in Python.

For example: taking data from a source (e.g., SQL), processing it (e.g., with PyTorch or OpenAI), and storing the results somewhere (e.g., data in MongoDB and metadata in SQL).

It actually consists of the following:
1. Nifty abstractions for data (e.g., sources, encoders, listeners), metadata (e.g., vector indexes), and compute (e.g., sync, async, parallel).
2. A glue engine that transparently handles the interaction between components.
3. Out-of-the-box integrations with established tools (databases, AI models and APIs, compute engines).

This way, you can build customized data layers that sit on top of your database and save you from moving data to dedicated systems (e.g., vector databases or MLOps tools).
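The source, process, store workflow described above can be sketched in plain Python with the standard-library `sqlite3` module. This is not SuperDuperDB's API: the `posts` and `post_scores` tables, `toy_model`, and `run_pipeline` are hypothetical, and `toy_model` only stands in for a real PyTorch or OpenAI call. The `LEFT JOIN` picks up only rows that have no output yet, which is an assumed, simplified version of what a listener might do when new data arrives.

```python
import sqlite3


def toy_model(text: str) -> float:
    """Hypothetical stand-in for a real model: fraction of letters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def run_pipeline(conn: sqlite3.Connection) -> int:
    """Score every post that has no output yet; return how many rows were processed."""
    # Store: model outputs live in the same database as the source data.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS post_scores (post_id INTEGER PRIMARY KEY, score REAL)"
    )
    # Source: read only rows that have not been processed yet.
    rows = conn.execute(
        "SELECT p.id, p.body FROM posts AS p "
        "LEFT JOIN post_scores AS s ON s.post_id = p.id "
        "WHERE s.post_id IS NULL"
    ).fetchall()
    # Compute: apply the model to each new row.
    outputs = [(post_id, toy_model(body)) for post_id, body in rows]
    conn.executemany("INSERT INTO post_scores (post_id, score) VALUES (?, ?)", outputs)
    conn.commit()
    return len(outputs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (body) VALUES (?)", [("Show HN",), ("hello world",)])
print(run_pipeline(conn))  # -> 2
conn.execute("INSERT INTO posts (body) VALUES (?)", ("ABC",))
print(run_pipeline(conn))  # -> 1 (only the new row is scored)
print(conn.execute("SELECT post_id, score FROM post_scores ORDER BY post_id").fetchall())
# -> [(1, 0.5), (2, 0.0), (3, 1.0)]
```

Keeping the outputs in a table next to the source data is the point being made in the thread: nothing is copied into a separate system.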

For further discussion, feel free to join our Slack https://join.slack.com/t/superduperdb/shared_invite/zt-1zuoj...

Ah, it’s a langchain competitor, possibly with better DB support.

One of the nice things about langchain is the code examples, making it easy to get simple services up and running. And because it’s a toolkit I can take what I need and leave the rest.

However, the ecosystem around langchain is really exploding, is there some way you can retool what you have to extend langchain with better DB support, rather than build your own thing?

Indeed both frameworks support model chaining.

However, achieving goals like "training your LLM" or enabling "real-time inference" requires more than just pipelines. For that, we have invested in enhancing compatibility with databases and facilitating parallel computing.

About your last point, I'm not sure I fully understand. Do you mean writing a guide for moving LangChain models to SuperDuperDB? Creating SuperDuperDB wrappers for LangChain? Or moving the core functionality of SuperDuperDB into LangChain?

The guide is in our immediate plans. The wrappers are under discussion. I don't think the last option is possible due to architectural differences; for example, SuperDuperDB is designed with multi-node environments in mind.

By connecting AI models with the data's source (the database) we make it very easy to bring AI to your end-user-facing applications.

SuperDuperDB is really an end-to-end AI development and deployment framework that wraps and integrates your existing data infrastructure. It replaces MLOps entirely, as it covers both inference and model training.

I think this is very smart. Instead of a big ML ops stack, do your model stuff where your data is.

For us, the fact it supports Postgres and SQLite out of the box is awesome. It means we can run a model on the server to generate embeddings, sync these to a local SQLite and then run local inference on the same data structure.
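One way to picture this workflow: embeddings computed on a server are copied into a local SQLite file as compact float32 blobs, and nearest-neighbour lookups then run locally against the same table. The sketch uses only the standard library; the `embeddings` table, the helper names, and the 3-dimensional vectors are hypothetical, and a brute-force scan like this is only reasonable for modest row counts.

```python
import math
import sqlite3
from array import array


def to_blob(vec: list[float]) -> bytes:
    """Pack a vector as 32-bit floats (4 bytes per dimension)."""
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> list[float]:
    """Unpack a float32 blob back into a Python list."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(conn: sqlite3.Connection, query: list[float], k: int = 1) -> list[int]:
    """Brute-force cosine search over every stored embedding."""
    rows = conn.execute("SELECT id, vec FROM embeddings").fetchall()
    scored = sorted(
        ((cosine(query, from_blob(blob)), row_id) for row_id, blob in rows),
        reverse=True,
    )
    return [row_id for _, row_id in scored[:k]]


# Hypothetical vectors "synced" from a server into a local SQLite database.
server_vectors = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.7, 0.7, 0.0]}

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, vec BLOB)")
local.executemany(
    "INSERT INTO embeddings VALUES (?, ?)",
    [(row_id, to_blob(vec)) for row_id, vec in server_vectors.items()],
)
print(nearest(local, [0.9, 0.1, 0.0], k=2))  # -> [1, 3]
```

Storing raw float32 bytes keeps the on-disk format identical wherever the vectors were produced, so the server and the local client only need to agree on the dimension and byte layout.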

Thanks for commenting :)

Sure, you can have a mesh of databases: keep your input data in one data technology, run models on distributed compute, save models in a different database technology, and manage the jobs in yet another!

Looks pretty cool. I haven't yet gotten into training any models at scale, but this seems to reduce the cognitive load associated with MLOps. Will start hacking on it this weekend.

The upsides to this are pretty compelling. Any downsides vs traditional MLOps?

The idea is to have a single scalable deployment in one environment directly on top of your existing database, containing all your AI that you use in different use-cases and applications.
So I have 1.6M rows of title, body and author. Can I use SDDB to embed all rows with a local model and then do semantic search as a query? And how can this scale past a single machine?
Thanks for commenting :)

Absolutely, you can query the embeddings of 1.6M rows; you can try the lance or in-memory vector search type. Scalability will depend on the embedding size, machine configuration, etc. Thanks
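To put "scalability depends on embedding size" in numbers: a dense float32 matrix needs rows × dimensions × 4 bytes before any index overhead. The sketch below computes that for 1.6M rows at a few common embedding sizes (the dimensions are illustrative assumptions, not what SuperDuperDB uses) and shows the simplest possible in-memory search, an exhaustive top-k scan. It is not a description of how SuperDuperDB's lance or in-memory backends work internally.

```python
import heapq


def matrix_bytes(n_rows: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense float matrix (float32 by default), ignoring index overhead."""
    return n_rows * dim * bytes_per_value


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exhaustive dot-product search; assumes vectors are L2-normalised, so dot == cosine."""
    scores = ((sum(q * v for q, v in zip(query, vec)), i) for i, vec in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scores)]


for dim in (384, 768, 1536):
    print(f"{dim:>4} dims: {matrix_bytes(1_600_000, dim) / 1e9:.2f} GB")
# -> 384 dims: 2.46 GB, 768 dims: 4.92 GB, 1536 dims: 9.83 GB

print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], k=2))  # -> [0, 2]
```

Halving either the dimension or the bytes per value (e.g. float16 instead of float32) halves the footprint, which is why embedding size dominates how far a single machine goes.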

Looks cool, but I have a question. I was checking the QnA example: why do I need OpenAI? Can't I chat with my data directly using SuperDuperDB?
SuperDuperDB allows you to integrate AI models and APIs. In this case we use the OpenAI API as an example. You can switch to other supported APIs or models, and even bring your own models and APIs.
Pardon my ignorance, but I thought it would let me query my data without using anything else.
No, no, you will always need to bring the right models or APIs for that. SuperDuperDB is just the framework that lets you integrate them with your database seamlessly and easily.
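The division of labour described here (the database supplies the data, a model you choose produces the answer) can be sketched with a `typing.Protocol`. Everything below is hypothetical and is not SuperDuperDB's API: `AnswerModel`, `KeywordModel`, and `answer` are illustrative names, and `KeywordModel` is a trivial word-overlap heuristic standing in for OpenAI, a Hugging Face model, or your own.

```python
from typing import Protocol


class AnswerModel(Protocol):
    """Anything with this method can be plugged in, e.g. an API client or a local model."""

    def predict(self, question: str, context: list[str]) -> str: ...


class KeywordModel:
    """Hypothetical stand-in for an LLM: returns the context line sharing the most words with the question."""

    def predict(self, question: str, context: list[str]) -> str:
        words = set(question.lower().split())
        return max(context, key=lambda line: len(words & set(line.lower().split())))


def answer(question: str, rows: list[str], model: AnswerModel) -> str:
    # The database side only supplies the rows; generating the answer is the model's job.
    return model.predict(question, rows)


rows = ["PostgreSQL and MySQL databases are supported", "Models come from PyTorch"]
print(answer("which databases are supported", rows, KeywordModel()))
# -> PostgreSQL and MySQL databases are supported
```

Swapping OpenAI for another model then means passing a different object to `answer`; the data side stays unchanged.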
Can you suggest a free LLM that I can use with it directly?

Also, does it support Bard APIs?

Perhaps a starting point would be the models on huggingface.co
This is great; we will try it out. I will get on the Slack in the coming days to have a chat!
That's amazing! Please try out the Jupyter demos :)
Looks like it uses pylance for vector database.
We use PyLance for performing searches in databases which do not have a vector-comparison engine. However we use the primary database as the store of vectors and model outputs.
Cool project, I will test it in the future.
Awesome!
That's very cool, you can create several projects with it.
Your comment history would suggest that you work on the project?