by paddy_m 607 days ago
This looks good, especially the data ingest.

A couple of questions:

What type of data person is this tool aimed at: data analyst, data scientist, or data engineer? I'm guessing data analysts who want to use PyData instead of Tableau?

How data aware is your system? Is it sending dtypes of columns, and maybe actual values to the LLM?

How do you deal with Python code gen and hallucinations?

Do you plan to make a Jupyter extension with this tech?

1 comments

1. when we started out, we had Tableau/PowerBI users in mind as the main audience, hoping to give them the power of data analysts who program in Python to create charts that require data transformation. but as we built the tool, Data Formulator turned out to be most powerful for people who work with Python data, as you mentioned, since they can more easily issue instructions, verify results, and follow up.

from our user study, experience with data analysis (i.e., knowing how to explore data) seems to be the most important skill, whether or not users know programming; software engineers without data analysis experience sometimes struggle to use it.

2. check out how it's built in our paper (https://arxiv.org/abs/2408.16119)! but generally, it's data types and sample values.

3. hallucinations are unfortunately a big challenge! our main strategy is to (1) restrict the code space in which the AI can transform data so it's less likely to go wild, (2) provide as much information to the user as possible (code + data output + chart) so they can discover problems, and (3) let them easily refine using data threads.

4. yes! we have some plans in the bag and are about to figure them out!
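
to make point 2 a bit more concrete, here is a rough sketch of what "data types and sample values" could look like as prompt context. this is an illustration only, not the code from the tool or the paper: the function names (`summarize_table`, `build_prompt_context`), the JSON layout, and the toy rows are all made up for the example, and it uses plain Python rows rather than a dataframe to stay self-contained.

```python
import json


def summarize_table(rows, n_samples=3):
    """Collect, per column, the Python type names seen and a few distinct sample values.

    `rows` is a list of dicts (one dict per record). Hypothetical helper for illustration.
    """
    summary = {}
    for row in rows:
        for col, value in row.items():
            info = summary.setdefault(col, {"types": [], "samples": []})
            type_name = type(value).__name__
            if type_name not in info["types"]:
                info["types"].append(type_name)
            # Keep only the first few distinct values so the prompt stays small.
            if value not in info["samples"] and len(info["samples"]) < n_samples:
                info["samples"].append(value)
    return summary


def build_prompt_context(rows, n_samples=3):
    """Serialize the column summary as JSON text that could be placed in a prompt."""
    return json.dumps(summarize_table(rows, n_samples), indent=2)


# Toy data, invented for this example.
rows = [
    {"city": "Seattle", "year": 2020, "temp_c": 11.5},
    {"city": "Austin", "year": 2020, "temp_c": 20.9},
    {"city": "Seattle", "year": 2021, "temp_c": 11.9},
]
print(build_prompt_context(rows))
```

the point of a summary like this is that the model sees column names, types, and a handful of values without the full table being sent.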
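
for point 3, one generic way to "restrict the code space" is to statically check generated code before running it. the sketch below is an assumption about how such a check might look, not a description of what Data Formulator does: the allowlist, the blocklist, and the name `check_generated_code` are hypothetical, and a check like this reduces risk but is not a sandbox.

```python
import ast

# Hypothetical policy, chosen for illustration only.
ALLOWED_IMPORTS = {"pandas", "numpy", "math"}
FORBIDDEN_NAMES = {"eval", "exec", "open", "compile", "__import__"}


def check_generated_code(source):
    """Return a list of policy problems found in `source`; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Compare the top-level package, e.g. "numpy" for "numpy.linalg".
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected as well.
            if node.level > 0 or root not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {node.module}")
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            problems.append(f"name not allowed: {node.id}")
    return problems


generated = "import pandas as pd\ndef transform(df):\n    return df.groupby('city').mean()\n"
print(check_generated_code(generated))
print(check_generated_code("import os\nos.system('ls')\n"))
```

code that fails the check can be rejected or sent back to the model with the problem list, while code that passes still gets shown to the user alongside its output and chart, as in (2).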