by axegon_
804 days ago
Overall great idea though, I'll definitely be checking back in the future. A few things that hit me out of the box:

* The idea behind using Serper is great, but it would be cool if other search engines/data sources could be used instead, e.g. Kagi or some private search engine/data. Reason for the latter: tons of people are sourcing all sorts of information that will not immediately show up on Google, and some of it might never do. For context: I have roughly 60GB (and growing) of cleaned news articles, along with where I got them from, and with a good amount of pre-processing done on the fly (I collect those all the time).

* Relying heavily on OpenAI. Yes, OpenAI is great, but there's always the thing at the back of our minds that is "where are all those queries going and do we trust that shit won't hit the fan some day". It would be nice to have the ability to use a local LLM, given how many and how good there are around.

* The installation can be improved massively: setuptools + entry_points + console_scripts to avoid all the hassle of having to manage dependencies, where your scripts are located and all that. The "cp factcheck/config/secret_dict.template factcheck/config/secret_dict.py" step is a bit.... Uuuugh... pydantic[dotenv] + .env? That would also make containerizing the application so much easier.
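To make the last point concrete, here is a minimal sketch of reading secrets from the environment (optionally seeded from a .env file) instead of copying a template into a Python module. It deliberately uses only the standard library as a stand-in for what pydantic's settings support would do. The `Secrets` class, the `load_dotenv` helper and the variable names `OPENAI_API_KEY` / `SERPER_API_KEY` are assumptions made up for illustration, not the project's actual configuration keys.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def load_dotenv(path=".env", environ=os.environ):
    """Copy KEY=VALUE pairs from a .env file into `environ`.

    Keys that are already set are left alone, so real environment
    variables (e.g. those passed to a container) take precedence.
    """
    file = Path(path)
    if not file.is_file():
        return
    for raw in file.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip().strip("'\""))


@dataclass(frozen=True)
class Secrets:
    # Hypothetical field names, chosen only for this example.
    openai_api_key: str
    serper_api_key: str

    @classmethod
    def from_env(cls, environ=os.environ):
        required = ("OPENAI_API_KEY", "SERPER_API_KEY")
        missing = [name for name in required if not environ.get(name)]
        if missing:
            raise RuntimeError(f"missing settings: {', '.join(missing)}")
        return cls(environ["OPENAI_API_KEY"], environ["SERPER_API_KEY"])
```

With something along these lines, a container only needs the two variables passed in at run time, and a local checkout only needs a .env file next to the code.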
Regarding the first point, we are currently working on enabling customized evidence retrieval, including local files. Our plan is to integrate existing tools like LlamaIndex. Any suggestions are greatly appreciated!
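As a rough illustration of what pluggable evidence retrieval could look like, the sketch below defines a small retriever interface and a naive keyword-overlap retriever over local documents. Everything here (`Evidence`, `Retriever`, `LocalCorpusRetriever`, `gather_evidence`) is hypothetical and invented for the example; it is not the project's design and not LlamaIndex's API, and a real implementation would likely use proper indexing rather than word overlap.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Evidence:
    text: str
    source: str  # where the snippet came from (URL, file path, ...)


class Retriever(Protocol):
    """Anything with a matching `search` method can act as a backend."""

    def search(self, query: str, limit: int = 5) -> list[Evidence]: ...


class LocalCorpusRetriever:
    """Naive keyword-overlap search over documents held in memory."""

    def __init__(self, documents: list[Evidence]):
        self._documents = documents

    def search(self, query: str, limit: int = 5) -> list[Evidence]:
        terms = set(query.lower().split())
        scored = []
        for doc in self._documents:
            # Score = number of query words that also occur in the document.
            overlap = len(terms & set(doc.text.lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Highest overlap first; ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:limit]]


def gather_evidence(claim: str, retrievers: list[Retriever]) -> list[Evidence]:
    """Query every configured backend and concatenate the results."""
    results: list[Evidence] = []
    for retriever in retrievers:
        results.extend(retriever.search(claim))
    return results
```

A web search backend (Serper, Kagi, or anything else) would then be one more class with the same `search` method, and the fact-checking pipeline would only ever talk to `gather_evidence`.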
Regarding the second point, we have found OpenAI's JSON mode to be greatly helpful, and have optimized our prompts to fully utilize these advances. However, we agree that it would be beneficial to enable the use of other models. As promised, we will add this feature soon.
Lastly, we appreciate your suggestion and will work on improving the installation process for the next version.