by alchemist1e9 761 days ago
This is a step forward! Forget the detractors and any negative comments; this is a small peek into a future that will include automated research, automated engineering, and all sorts of tangible ways to automate progress. Obviously the road will be bumpy, with many detractors and complaints.

Here is a small idea for taking it one step further. Perhaps there could be an additional stage: once the initial data is analyzed and some candidate research ideas are generated, a domain knowledge stage is incorporated. Currently the Semantic Scholar API helps generate a set of reference papers. Instead, those papers could be downloaded in full and put into a local RAG. Agents would then read each paper in detail with the summary of the current data in context, effectively doing research, and store their summaries and ideas in the same RAG. All of that context-specific research could then be combined into the material for further development of the paper.
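
To make the proposed stage a bit more concrete, here is a rough sketch of the loop in Python. Everything in it is hypothetical and only for illustration: `LocalStore` is a toy bag-of-words index standing in for a real local RAG (it is not txtai's API), `fake_llm` is a placeholder for an actual model call, and `research_stage` is just a name I made up. It is not how the project works today.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalStore:
    """Toy stand-in for a local RAG index (hypothetical).

    A real setup would use embeddings; this uses bag-of-words cosine
    similarity so the example runs with the standard library only.
    """

    def __init__(self):
        self.docs = []  # entries are (doc_id, kind, text, token_counts)

    def add(self, doc_id, kind, text):
        self.docs.append((doc_id, kind, text, Counter(tokenize(text))))

    def search(self, query, limit=3):
        q = Counter(tokenize(query))
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        scored = []
        for doc_id, kind, text, vec in self.docs:
            dot = sum(q[t] * vec[t] for t in q)
            norm = q_norm * math.sqrt(sum(v * v for v in vec.values()))
            score = dot / norm if norm else 0.0
            if score > 0:
                scored.append((score, doc_id, kind, text))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:limit]


def fake_llm(prompt):
    """Placeholder for a real LLM call: returns the paper's first sentence."""
    paper = prompt.split("PAPER:\n", 1)[1]
    return paper.split(".")[0].strip() + "."


def research_stage(papers, data_summary, store, llm=fake_llm):
    """Read each full paper with the data summary in context, store the
    resulting notes in the same store, then retrieve the relevant notes."""
    for paper_id, full_text in papers.items():
        store.add(paper_id, "paper", full_text)
        prompt = f"DATA SUMMARY:\n{data_summary}\n\nPAPER:\n{full_text}"
        store.add(f"{paper_id}#note", "note", llm(prompt))
    hits = store.search(data_summary, limit=5)
    return [text for _, _, kind, text in hits if kind == "note"]
```

The notes go back into the same store as the papers, so later stages can retrieve either the original text or the agent's reading of it.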

There is a link to awesome-agents and I’d be curious what their opinion is of various other agent frameworks, especially as I don’t think they actually used any.

For my proposed idea above I think txtai could provide a lot of the tools needed.

2 comments

This is a super cool idea! We have considered implementing a variation of what you suggested, with the additional feature of linking each factual statement directly to the relevant lines in the literature. Imagine that in each scientific paper, you could click on any factual or semi-factual statement to be led to the exact source—not just the paper, but the specific relevant lines. From there, you could continue clicking to trace the origins of each fact or idea.
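
A minimal sketch of what such a click-through chain could look like as a data structure, assuming each statement records the paper and line range it came from plus an optional pointer to the statement it relies on. The names `Statement` and `trace` are hypothetical and are not taken from our code base.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Statement:
    statement_id: str
    text: str
    paper_id: str
    lines: Tuple[int, int]       # inclusive line range in the source paper
    cites: Optional[str] = None  # id of the statement this one relies on


def trace(statement_id: str, index: Dict[str, Statement]) -> List[Statement]:
    """Follow the 'cites' links from one statement back to its origin.

    Stops at a statement with no source, at an unknown id, or when a
    citation cycle is detected.
    """
    chain: List[Statement] = []
    seen = set()
    current: Optional[str] = statement_id
    while current is not None and current not in seen:
        seen.add(current)
        stmt = index.get(current)
        if stmt is None:
            break
        chain.append(stmt)
        current = stmt.cites
    return chain
```

Each "click" in the interface would correspond to one step along the chain returned by `trace`.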
> This is a super cool idea!

Thank you. I’m honored you found it useful.

> From there, you could continue clicking to trace the origins of each fact or idea.

Exactly! I think you would like the automated semantic knowledge graph building example in txtai.

Imagine how much could be done when price/token drops by another few orders of magnitude! I can envision a world with millions of research agents doing automated research on many thousands of data sets simultaneously and then pooling their research together for human scientists to study, interpret and review.

thanks! indeed currently we only provide the LLM with a short tldr created by Semantic Scholar for each paper. Reading the whole thing and extracting and connecting to specific findings and results will be amazing to do. Especially as it can start creating a network of logical links between statements in the vast scientific literature. txtai indeed looks extremely helpful for this.
Excellent! I’m glad my input was interesting.

txtai has some demos of automated semantic graph building that might be relevant.

I noticed you didn’t really use any existing agent frameworks, which I find very understandable as their value added can be questionable over DIY approaches. However txtai might fit better with your overall technology style and philosophy.

Has your team studied the latest CoT, OPA, or research into cognitive architectures?

thanks. will certainly look deeper into txtai. our project is now open and you are more than welcome to give a hand if you can! yes you are right - it is built completely from scratch. It does have some similarities to other agent packages, but we have some unique aspects, especially in terms of tracing information flow between many steps and thereby creating the idea of "data-chained" manuscripts (where you can click each result and go all the way back to the specific code lines). also, we have a special code-running environment that catches many common types of improper use of imported statistical packages.
“data-chained” will be very valuable, especially for the system to evaluate itself and verify the work it’s performed.

this is obviously just my initial impression on a distracted Sunday but I’m very encouraged by your project and I will absolutely be following it and looking at your source code.

The detractors don’t understand LLMs and probably haven’t used them in the way you and I have. They don’t understand that with CoT and OPA, LLMs can be used to reason and think themselves.

I’ve used them for fully automated script writing, performing the job of a software developer. I’ve also used them to create study guides and practice tests, and then grade those tests. Implementing automated systems first hand, with agent frameworks using the APIs, gives a deeper understanding of their power than the basic chat usage most people are familiar with.

The people arguing that your system can’t do real science are silly, as if the tedious process and logical thinking were something so complex and human that LLMs can’t do it when used within a cognitive framework. Of course they can!

Anyway, I’m very excited by your project. I hope to spend at least a week this summer dedicated to setting it up and exploring potential integrations with txtai for use on private knowledge bases, in addition to publicly published scholarly papers.

and yes we are implementing CoT and OPA - but surely there is a ton of room for improvement!