by aarnphm 1095 days ago
Hi all, I'm the main maintainer from the OpenLLM team here. I'm actively developing the fine-tuning feature and will release a PR soon. Stay tuned. In the meantime, the best way to track development is on our Discord, so feel free to join!

Thanks for the great project! Any chance your team might consider a more open platform than Discord for posting updates? I personally find Discord hard to use, and there's no way to have a sensible subscription (like RSS). I usually keep Discord muted.
Discord is a black hole where information goes to die. Its search and scrollback are awful. It's awful at being an archive, as finding anything that was asked more than a day or two ago is impractical.

To use Discord in good faith and with open eyes, you have to prioritize communication in the present, and give up hope of archiving anything that was said for people who might need the information in the future.

Discord is just a rich IRC replacement. You can log and search in IRC too, but nobody seriously tries to archive information for research later. And the big difference is that it's all closed and operated by one entity that can change conditions at will. Don't even try to use it for anything other than real-time chat.
"Discord is just a rich IRC replacement"

That's only half true. Yes, Discord does allow a "rich" chat experience, with channels and servers, but there the similarities end.

IRC is based on an open protocol, with many open source clients available for it, and a decentralized server infrastructure.

Discord is closed and centralized, with only a single client available for it.

You can easily log IRC channels, but there is no easy way to do that on Discord, if it can be done at all.

I've logged every channel I've ever visited on IRC, and I can use powerful text tools to regex search through all of my conversations on IRC and have the results appear instantly. Nothing remotely like that is possible with Discord.
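For illustration, here is a minimal sketch of that kind of regex search over local logs (roughly what `grep -rE` does). It assumes the logs are plain-text `*.log` files with one message per line under a directory; the directory name `irc-logs` and the function `search_logs` are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def search_logs(root: str, pattern: str, ignore_case: bool = True) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for every log line matching `pattern`.

    Assumes each log is a plain-text file with one message per line.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # Walk every *.log file under root (recursively) in a stable, sorted order.
    for path in sorted(Path(root).rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical log directory; prints nothing if it doesn't exist.
    for path, lineno, line in search_logs("irc-logs", r"\bfine[- ]?tun(e|ing)\b"):
        print(f"{path}:{lineno}: {line}")
```

Because the logs are just local files, the whole history is searchable with any regex, with no server round trips.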

Paging through IRC logs is virtually instant on a modern terminal, while Discord makes you wait a long time between every other page load, so if you need to look through more than a handful of pages it's incredibly slow and painful.

Some IRC channels have their logs published on the web, making them fully searchable through web search engines, but to my knowledge no Discord channels do that.

What happens in Discord stays in Discord.

Grepping through IRC logs has a 10x better UX.
Furthermore, you risk getting banned for deleting messages you wrote in the past
For gaming communities (where you'd use voice chat), Discord was great: easy to set up, free as in beer, and hosted in the cloud. The alternatives back in the day did not have these features. They were either expensive (Ventrilo), poor quality (Ventrilo and Skype latency/quality), proprietary (only Mumble wasn't; TeamSpeak, Ventrilo, etc. were), lacking community features (Ventrilo) or offering very archaic ones (TeamSpeak, Mumble), or required self-hosting (all but Ventrilo). It was also before GDPR existed. So Discord happily used and abused that unique position.

It's a shame it's being used for general communities that don't use or need the voice chat feature, especially when it's an official community for a place, given their stance on third-party clients and privacy issues.

If you don't need voice chat, Zulip, Mattermost, Revolt, Discourse, and many others would suffice (Linen was recently featured on HN). If you do, I think even Signal would suffice these days.

For Discord search, Answer Overflow was recently featured on HN [1].

[1] https://news.ycombinator.com/item?id=36383773

agreed.
I find their search amazing. What's your issue with it?
Here's just one issue:

They stem words aggressively, so searching for "repeater", which is a less common, specific term, gives you results including "repeat", a commonly used word. And there's no way to do an exact word search.
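To make the complaint concrete, here is a toy comparison of stemmed search against exact whole-word search. The suffix-stripping stemmer is hypothetical and deliberately crude (Discord's actual algorithm isn't described here); real stemmers such as Porter's are more careful, but the conflation of "repeater" with "repeat" is the same effect.

```python
import re

# Hypothetical, deliberately crude suffix-stripping stemmer (not Discord's).
SUFFIXES = ("ers", "er", "ing", "ed", "s")


def crude_stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping a stem of at least 3 chars.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stemmed_search(messages, query):
    """Match messages containing any word whose stem equals the query's stem."""
    key = crude_stem(query)
    return [m for m in messages if any(crude_stem(t) == key for t in re.findall(r"\w+", m))]


def exact_word_search(messages, query):
    """Match messages containing the query as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
    return [m for m in messages if pattern.search(m)]


messages = [
    "Which repeater are you using on the roof?",
    "Can you repeat that?",
    "It repeats every hour.",
]
print(stemmed_search(messages, "repeater"))     # all three messages
print(exact_word_search(messages, "repeater"))  # only the first message
```

The stemmed search reduces "repeater", "repeat", and "repeats" to the same key, so the rare term drowns in matches for the common one; an exact-word mode avoids that.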

The issue is it's not indexed by Google
There was a recent post about an open source tool for indexing Discord content and making it available for Google search:

https://news.ycombinator.com/item?id=36383773

have you used google lately? might as well not be indexed with all the seo spam you get as top results
> have you used google lately? might as well not be indexed with all the seo spam you get as top results

I just googled "how to use openllm" as an example to test your thesis, and the results look very relevant to me.

https://www.google.com/search?client=safari&rls=en&q=how+to+...

Related: As an operator/mod/admin it's fairly straightforward to bridge a Discord channel to Matrix (and, if one so desires, from there to IRC), allowing users not on Discord to participate. Conservative mods concerned about spam can start with an allowlist for which servers can join.

https://github.com/matrix-org/matrix-appservice-discord

I know this isn't a great time for reddit, but I just made this on your behalf:

https://www.reddit.com/r/OpenLLM/

I much prefer the HN/Reddit discussion format to Discord and even Stack Overflow.

plugging the open source and self hostable https://revolt.chat which i've found to have great UX and be very performant compared to discord.
I'm liking revolt. Thanks for the suggestion.
good alternative: https://www.linen.dev/
s/rd/urse/g
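The reply is sed-style substitution syntax: replace `rd` with `urse`, so "Discord" becomes "Discourse". A rough Python equivalent using `re.sub` (the helper name `sed_substitute` is hypothetical, and Python's regex syntax differs from sed's in places):

```python
import re


def sed_substitute(text: str, pattern: str, replacement: str, global_flag: bool = True) -> str:
    # count=0 replaces every match (like sed's trailing g); count=1 only the first.
    return re.sub(pattern, replacement, text, count=0 if global_flag else 1)


print(sed_substitute("Use Discord for updates", "rd", "urse"))  # Use Discourse for updates
```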
HAHA this was one of my panel interview questions at Goooog'

Q: "How do you do a search and replace for a string in VI"

Me: "I can't recall right now, I'd just google it."

What an insulting interview question. I hope it was just in jest, or asked at the end to pad the time.

However, it did make me realize hidden therein is an actual interesting interview question, similar to the "describe what happens when you type an address into the browser's URL bar and hit enter": describe what happens after you type `:s/foo/bar` and hit enter. Followup version: what about `:%s/foo/bar`? The kind of thing that can be interesting to watch them reason through even if they don't know the answer, or even know what those syntaxes do.
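For reference, with default options ('gdefault' off), `:s/foo/bar` acts only on the cursor line and replaces the first match there; `:%s/foo/bar` applies the same substitution to every line (`%` is the range `1,$`), still only the first match per line; adding a trailing `g` replaces every match on each affected line. The sketch below models just those line-range and per-line-count rules in Python. The function `vim_substitute` is hypothetical, and Python regex syntax differs from Vim's in many details.

```python
import re
from typing import List


def vim_substitute(lines: List[str], cursor: int, pattern: str, replacement: str,
                   whole_file: bool = False, global_flag: bool = False) -> List[str]:
    """Approximate Vim's :s (cursor line only) and :%s (every line).

    Without the g flag, only the first match on each affected line is replaced.
    Returns a new list; the input buffer is not modified.
    """
    regex = re.compile(pattern)
    count = 0 if global_flag else 1  # re.sub: 0 means "replace all"
    targets = range(len(lines)) if whole_file else [cursor]
    result = list(lines)
    for i in targets:
        result[i] = regex.sub(replacement, result[i], count=count)
    return result


buffer = ["foo foo", "foo", "bar"]
print(vim_substitute(buffer, 0, "foo", "bar"))                   # :s/foo/bar
print(vim_substitute(buffer, 0, "foo", "bar", whole_file=True))  # :%s/foo/bar
print(vim_substitute(buffer, 0, "foo", "bar", whole_file=True, global_flag=True))  # :%s/foo/bar/g
```

The three calls give `['bar foo', 'foo', 'bar']`, `['bar foo', 'bar', 'bar']`, and `['bar bar', 'bar', 'bar']`, which is exactly the distinction a candidate could reason their way toward.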

Alt proposed answer "I'd install emacs".
Side question: why are people working on open source projects communicating through Discord so much nowadays?

are discord conversations persisted and indexed on search engines?

I find Discord quite versatile and a bit overwhelming at the same time. As to SEO, see https://news.ycombinator.com/item?id=36383773

AFAIK most of the gamers choose it for voice chat (Anyone remember TeamSpeak?)

In Europe, TeamSpeak is still very popular.
I used to play EVE Online a fair bit, and always thought it interesting how some of the groups used Discord but only for text communications. Voice was still done over Teamspeak or Mumble.
When I played EVE, Mumble was the de facto voice comms since it supported hundreds of pilots, which happened many times during joint ops, and XMPP was used for text chat and pings.
My understanding from asking several people, since I hate discord and want to know why people insist on using it, is that it’s a free alternative to Slack. Simple as that.

But it’s crazy, people are aggressive about Discord for some reason. I maintain an OpenAI SDK package for .NET, and some random person decided they wanted it to be a Discord community, so they created a Discord server claiming it was the official community Discord for my library, and submitted a PR updating my README to say that it’s my project’s official Discord. They also replied to several issues and pull requests telling people to discuss them on that Discord. If Discord isn’t paying this person as some guerrilla marketing tactic, they should be...

Because it's easy, free and it just works.

Very few people actually care about indexing the conversations.

So all knowledge is lost and questions have to be asked and answered again and again?
That didn't stop IRC being popular in the 1990s.

There has long been a place in the ecosystem for ephemeral chat. Often alongside non-ephemeral things like written documentation.

People didn't put documentation in IRC channels because they didn't want to answer the same questions over and over. Info went into a wiki, and you would get flamed for asking a question on IRC that was answered on the wiki. Discord is not a good place to stash documentation.
It's OK; you get scolded for asking an FAQ in many Discord "servers" as well.
> That didn't stop IRC being popular in the 1990s.

IRC chats, especially in opensource projects channels, could and would be archived, published over the web and indexed by search engines.

In my experience, I don’t think I’ve ever seen an IRC log in a search result.

#haskell on Libera is publicly logged, but I couldn’t get Google to return a quoted phrase from a message a few weeks ago.

Many people on IRC don’t enjoy being in logged channels. I’ve also heard that there are GDPR implications to publicly logging people’s messages without their consent.

Discussion of the difficulty and downsides of IRC logging, from a couple of years ago:

=> https://news.ycombinator.com/item?id=22892015

=> https://web.archive.org/web/20200417001532/https://echelog.c...

The HN blowback to developers choosing to use Discord is just wildly out of proportion.

So just like Discord, then.
Also, monks being the only ones who could read and write didn't stop religion from being popular in the Middle Ages.

/s

C'mon

Isn't there something really nice about it though? It seems to me that most every community gradually evolves into one where every new message from a new-ish member is answered by something like "Duplicate, please search first!". And this in turn makes those newcomers either go away, become passive lurkers, or become part of the "hive-mind" (as only likeminded questions get answered).

On the other hand, if people have to actually converse to get an answer to their questions (like back in the real world), newcomers can more rapidly become part of the community, and help make it more diverse.

I just recently saw a post where someone said something similar about Reddit versus traditional forums.

There's a balance between engaging with new members and not turning it into a time sink for older members. This is probably a good use case for LLMs.

LLMs could indeed address the first part, but not the second, of bringing the newcomers in via actual conversation with the older members. The only good solution I encountered to this is of having some (preferably not too experienced) member(s) actively take upon themselves the role of welcoming newcomers and answering their questions, whether that's in an official or unofficial capacity.

This to me is the real way through this "Eternal September", where in every "cohort" of newcomers, one or more choose to stay close to the doorway to welcome and guide the next cohort.

The best of both worlds - a friendly community that welcomes newbies, with a searchable archive - is possible. Limiting to only chat-based support means that support is bottle-necked by the folks who are available and engaged at the time of the question, and that knowledge will "drop out" of the community as people forget it.
Apologies for my skepticism, but is it just "possible", or do you actually have an example of a long-lived community that remained fully welcoming to newbies while utilizing a searchable archive?

In any case, I'm not arguing that it's impossible, but rather that the more comprehensive the archive, the less welcoming the community would tend to be, all other things being equal. To take it to the extreme, I'll posit the following law: "A well-curated archive is the grave of a community"

No, questions don't have to be asked and answered again and again, because all the knowledge is lost, full stop. No one would know anything.
Not lost enough to use as a transient space for sharing secret intelligence reports.
Maybe it doesn't matter because this type of knowledge is relevant for current week only?
Indexing conversations is secondary for gaming but primary for FOSS projects, and Discord sucks at that. It's like wiping your ass with a fork.
short answer: because it's one of the options with least friction to get running

a lot of people who are into tech stuff already have a discord account making joining the community a one click process, the instant nature of it seems to appeal to younger users more than async forums, it's a fairly mature platform so it has a bunch of moderation/customization/integration features you might want, etc.

> are discord conversations persisted and indexed on search engines?

nope (and that is a drawback many point out)

Isn't it a generation thing? If I had the choice everyone would be on IRC still.
I've used IRC for a long time and still do, but I do think Discord has a nicer UX for most use cases. In particular, building communities around clusters of channels ("servers") and support for rich media (yes, some old people might call that a downside) increase the appeal for most people. It's also a lot more work to have a persistent connection on IRC (bouncers).

My main problem with Discord is that it's someone else's centralized, for-profit company and has no apparent barriers to enshittification[0]. As Reddit recently demonstrated, it's probably a mistake to build communities on top of something like that.

Matrix is a good candidate for a modern successor to IRC. It's not quite as slick a UX as Discord, but it addresses the main advantages Discord has over IRC.

[0] https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys

Matrix would be my choice as well but good luck getting people to use the uglier alternative. Discord is great.
The problem with IRC is that these days it's crucial to have really robust read-state synchronization across desktop and mobile.

Slack was the first to really get that right, and Discord effectively emulated them and made it available for free.

IRC users could get there with bouncers, but those were always a lot harder to get going with.

Practically all of my friends grew up with IRC, we are in our late 30s, 40s, early 50s.

We might reminisce about irc but we all prefer discord.

Even the searchability of indexed IRC has been surpassed by other knowledge sites. It would have to be something extremely niche these days where the only source of info is an IRC chat log.

IRC doesn't even have history, one of the most basic requirements for a modern rudimentary chat app. It's ridiculous to suggest using it in 2023 when it doesn't have features a freshman homework assignment chat app has.
1. People like talking to each other on Discord. 2. No. :/
What's the rationale for telemetry tracking?

https://github.com/bentoml/OpenLLM/blob/main/src/openllm/uti...

They have a section about it in the README:

https://github.com/bentoml/OpenLLM#-telemetry

Very cool! By the way, it's not mentioned in the README, so I assume it's only for running full-precision models. Or do quantized GGML/GPTQ/etc. models also work with it?
Hi there, 8-bit and 4-bit are currently supported on main. GPTQ is a work in progress, as is GGML.
GPTQ support would be amazing (AutoGPTQ is an easy way to integrate GPTQ support - it's basically just importing autogptq and switching out 1 line in the model loading code).
How can we stay tuned if we can't do tuning? :P
Fine-tuning is coming up in the next release!

You can actually try it out on the main branch :P