by axegon_ 804 days ago
Overall great idea though, I'll definitely be checking back on it in the future. A few things that hit me out of the box:

* The idea behind using Serper is great, however it would be cool if other search engines/data sources could be used instead, i.e. Kagi or some private search engine/data. Reason for the latter: there are tons of people who are sourcing all sorts of information which will not immediately show up on Google and some might never do. For context: I have roughly 60GB (and growing) of cleaned news articles, along with where I got them from and with a good amount of pre-processing done on the fly (I collect those all the time).

* Relying heavily on OpenAI. Yes, OpenAI is great but there's always the thing at the back of our minds that is "where are all those queries going and do we trust that shit won't hit the fan some day". It would be nice to have the ability to use a local LLM, given how many and how good there are around.

* The installation can be improved massively: setuptools + entry_points + console_scripts to avoid all the hassle behind having to manage dependencies, where your scripts are located and all that. The cp factcheck/config/secret_dict.template factcheck/config/secret_dict.py is a bit.... Uuuugh... pydantic[dotenv] + .env? That would also make containerizing the application so much easier.
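
To make the .env suggestion concrete, here is a rough stdlib-only sketch of the idea. It is a stand-in for what a pydantic settings class would give you, not the project's actual code: the field names, the variable names OPENAI_API_KEY and SERPER_API_KEY, and the functions read_dotenv and load_secrets are all made up for illustration.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def read_dotenv(path):
    """Parse KEY=VALUE lines from a .env file; a missing file yields {}."""
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for raw in p.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Secrets:
    # Hypothetical field names, chosen only for illustration.
    openai_api_key: str
    serper_api_key: str


def load_secrets(dotenv_path=".env", environ=None):
    """Build Secrets from the environment, falling back to the .env file."""
    environ = os.environ if environ is None else environ
    merged = {**read_dotenv(dotenv_path), **environ}  # real env vars win
    required = ("OPENAI_API_KEY", "SERPER_API_KEY")
    missing = [k for k in required if not merged.get(k)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return Secrets(merged["OPENAI_API_KEY"], merged["SERPER_API_KEY"])
```

With something like this there is no template file to copy: a container just gets the variables passed in, and a local checkout keeps them in a .env that stays out of version control.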


Thank you for your suggestions, axegon!!! We will definitely consider them and add the features in a future version shortly.

Regarding the first point, we are currently working on enabling customized evidence retrieval, including local files. Our plan is to integrate existing tools like LlamaIndex. Any suggestion is greatly appreciated!

Regarding the second point, we have found OpenAI's JSON mode to be greatly helpful, and have optimized our prompts to fully utilize these advances. However, we agree that it would be beneficial to enable the use of other models. As promised, we will add this feature soon.
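
One possible shape for supporting other models is a small backend interface with a name-based registry. The sketch below is only an assumption about how that could look, not the current implementation: ChatBackend, EchoBackend, BACKENDS and get_backend are hypothetical names, and the echo backend is a test stand-in rather than a real model.

```python
import json
from typing import Callable, Dict, Protocol


class ChatBackend(Protocol):
    """Anything that turns a prompt into a JSON string can be a backend."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for tests; returns a canned JSON document."""

    def complete(self, prompt: str) -> str:
        return json.dumps({"prompt": prompt, "verdict": "unknown"})


# Registry of backend factories, keyed by the name used in configuration.
# An OpenAI or local-model backend would be registered the same way.
BACKENDS: Dict[str, Callable[[], ChatBackend]] = {"echo": EchoBackend}


def get_backend(name: str) -> ChatBackend:
    """Look up a backend by name, with a readable error for typos."""
    try:
        factory = BACKENDS[name]
    except KeyError:
        known = ", ".join(sorted(BACKENDS))
        raise ValueError(f"unknown backend {name!r}; choose from: {known}") from None
    return factory()
```

The rest of the pipeline would only ever call complete(), so swapping the hosted model for a local one becomes a configuration change.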

Lastly, we appreciate your suggestion and will work on improving the installation process for the next version.

Dead internet.
Have to agree with you, every comment from the product creator reads like a chatGPT response.
To me it sounds like someone who speaks English as a second language, writing well and clearly and in a formal style. It's just unlucky for them that that's the style GPT is so good at too.

I'm a native English speaker, and traditionally when it comes to formal/professional written English (emails etc.) my instincts take me to sounding quite GPTish - luckily I've got a good grasp of the language and have found it fairly easy to alter my formal writing style to be a bit less traditional and a bit less formal too, but if it wasn't my first language and I wasn't a fair bit above average in writing ability even for native speakers, I suspect it wouldn't be nearly as easy to go against how I was taught at school to write in formal situations.

It's really not enough to see that somebody writes roughly in that style to assume they're using LLMs, because the reason LLMs so often sound like that is because they've learned from humans very often sounding like that.

In an example such as this particular case, it may have set off your LLM suspicions because culturally you wouldn't expect somebody to sound so formal in comments on a site like HN, and choosing the wrong tone of voice for the context is something an LLM is likely to do - but actually, if a) English isn't your first language nor part of your primary culture, and b) you're wanting to make a good impression as the subject of the thread is something you've created and are therefore essentially acting as a spokesperson for in the comments, then all of a sudden writing formally rather than as if writing throwaway forum comments makes sense rather than looking like an indication that AI wrote it.

+1 reads like a non native speaker writing very polite and formal prose to a customer. ChatGPT has a very peculiar way of speaking that belies a psychotic mind plotting your enslavement in a global labeling farm.
I wouldn't say it's formal. It's the overly optimistic tone and 100% coverage of the parent post. The fact we can't tell for sure emphasizes my point.
I will take it as a compliment, lol. But I do hope ChatGPT or some agents could help me with this. Btw, our recent study on machine-generated text detection might be interesting to you.

https://arxiv.org/abs/2305.14902 https://arxiv.org/abs/2402.11175

I fully expect some sort of enshittification of openai at some point.
That's assuming it's not done already, with their mission of being open completely forgotten.