by pants2 830 days ago
It will be difficult to differentiate from Otter, Fireflies, Sembly, Noty, Krisp, Tl;Dv, Supernormal, Spinach, Fathom, Airgram, Noota, Tactiq, Vowel, Jamie, and whatever's built in to Zoom, Google Meet, and Teams already.

I evaluated many of these for my company and ultimately found that they were pretty much all bad due to lack of customizability. I would suggest anyone here looking for an AI meeting note taker to write their own:

1. Pull down the raw recording (using built-in recording or Recall.ai).

2. Run it through Whisper or Deepgram with a custom prompt and custom vocabulary.

3. Run it through a basic LLM to correct common transcription errors specific to your company's vocabulary and remove filler words.

4. Run that through GPT-4 or your favorite powerful LLM to generate notes, iterating on the prompt to tailor it to your industry and meeting style.

All of my coworkers agreed that the custom solution (which took me a day to slap together and is close to free) is dramatically better than any of the off-the-shelf meeting note takers we have used, no contest. Being able to customize the transcription to capture industry-specific terms, acronyms, or internal codenames is an absolute must.
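For anyone who wants a starting point, the four steps above could be wired together roughly like this. It is only a sketch: `transcribe`, `cheap_llm`, and `strong_llm` are hypothetical callables you would back with your own Whisper/Deepgram and LLM clients, and the prompt wording is made up for illustration.

```python
from typing import Callable

# Hypothetical stage signatures (assumptions, not any vendor's API):
#   Transcriber: (audio_path, prompt, vocabulary) -> raw transcript
#   LLM:         prompt -> completion text
Transcriber = Callable[[str, str, list[str]], str]
LLM = Callable[[str], str]


def make_notes(
    audio_path: str,
    transcribe: Transcriber,
    cheap_llm: LLM,
    strong_llm: LLM,
    vocabulary: list[str],
    notes_instructions: str,
) -> str:
    """Run recording -> transcript -> cleaned transcript -> notes."""
    vocab = ", ".join(vocabulary)

    # Step 2: transcription with a custom prompt and custom vocabulary.
    raw = transcribe(audio_path, f"Terms that may appear: {vocab}", vocabulary)

    # Step 3: a basic LLM fixes company-specific errors and filler words.
    cleaned = cheap_llm(
        "Correct transcription errors and remove filler words. "
        f"Known terms: {vocab}\n\n{raw}"
    )

    # Step 4: a stronger LLM writes the notes from the cleaned transcript.
    return strong_llm(f"{notes_instructions}\n\nTranscript:\n{cleaned}")
```

Keeping each stage as a plain function argument makes it easy to swap providers or iterate on one prompt without touching the rest.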

8 comments

We provide a compelling alternative to those considering building vs. buying in this space but there's nothing more customizable than building your own solution.

Right now, we provide the following customization levers: custom vocabulary (configurable from Settings → Account) and custom prompts (via insights you can define in workflows). We're also working on adding more control over the verbosity of the notes.

If you're looking to have full control over the outcomes generated, I would also suggest building your own solution if the building/maintenance costs make sense. We're focused on providing an out-of-the-box solution that works really well with minimal setup (with customizations available for power users) and providing value beyond just summarization with search, workflow automation, and collaboration features (e.g. sharing, commenting) for teams.

If you do decide to try out Circleback, I'd love to get your thoughts as someone who's very well-informed about this space!

Is this satire? It really reads like that one infamous post on the Dropbox HN announcement...

https://news.ycombinator.com/item?id=9224

For every one person who'd want to do it the better DIY way, there are 10000s that can't (or won't), but would still find value in the service.

For the record I sync my files using a method similar to that user, rather than using Dropbox. I imagine plenty of HN users do the same. I'm making this recommendation to other hackers on here who don't mind getting their hands dirty.
"the dropbox comment" is noteworthy because it mischaracterized the business value of a more accessible solution.

nonetheless, there are ppl whose needs do not exceed their skills

For what it's worth, the new "Meet" add-in from Microsoft for Teams is remarkably pragmatic, pulling together transcription, speaker tagging, semantic content timeline, meeting assistant notes, more, into a common workspace for the meeting.

People are just discovering it this past quarter or so, but when they see the scope and utility of what it can do, they start recording everything.

Looks like you're talking about this one. Do you know if it is available to everyone or if admins have to turn it on?

https://support.microsoft.com/en-us/office/stay-on-top-of-me...

> which took me a day to slap together and is close to free

An engineer who can slap together a good meeting transcriber in a day is probably paid $2,000 a day or more. The opportunity cost of having them work on a random project is far more than the cost of their salary. The ongoing maintenance cost of the custom meeting transcriber is probably 5-10 engineer days over its lifetime. The added utility from slightly-better meeting transcripts does not compensate for the staggering engineering cost of the custom solution.

Circleback and other meeting transcribers will surely get better and surpass the quality of the custom transcriber. Then the team must spend more time switching to Circleback and deleting the custom transcriber.
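Taking the figures in this comment at face value, the salary-only cost works out as below. This is just the arithmetic on the stated numbers (one day to build, 5-10 days of maintenance, $2,000 a day); it does not try to price the opportunity cost the comment argues is larger.

```python
DAILY_RATE_USD = 2_000        # figure quoted in the comment
BUILD_DAYS = 1                # "took me a day to slap together"
MAINTENANCE_DAYS = (5, 10)    # lifetime maintenance estimate from the comment


def salary_cost_range(daily_rate, build_days, maintenance_days):
    """Return (low, high) salary cost in USD for build plus maintenance."""
    low_days, high_days = maintenance_days
    return (
        (build_days + low_days) * daily_rate,
        (build_days + high_days) * daily_rate,
    )


print(salary_cost_range(DAILY_RATE_USD, BUILD_DAYS, MAINTENANCE_DAYS))
# (12000, 22000)
```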

I'm tending to agree with you... you have better data security, and nothing beats the best Whisper model.

However, there are a lot of manual steps here, and it would definitely be more convenient to have a tool that just does all that straight away. Also, Whisper unfortunately does not (yet?) support diarisation (identifying which speaker is talking).

Totally agree on the baseline. We’ve found that adding multimodal data, like what was onscreen, is a big help in improving on this, though it’s a little more complex. It helps even more to add action data, like who was typing in what, where the mouse was, etc.

I’ve also been playing with pulling in knowledge base context or reading relevant web pages for unique words to create that initial prompt and custom vocab automatically.

Gotta add facial recognition too so the notes can include "Bob rolled his eyes slightly when Alice mentioned the new reporting procedures."
Lol. I should add that as an option you can toggle
Can you elaborate on the custom prompting? As in "You are a scribe attending a meeting of engineers discussing how to implement ElasticSearch. Please take this recording and create detailed notes of the discussion?"
Depending on your use-case it might include things like, "be sure to highlight the contributions and points of view of each participant", or "come up with a list of action items and who's assigned to each one", or "ensure the summary doesn't include confidential details about our new product".
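One way to keep this kind of prompt easy to iterate on is to build it from a meeting description plus a list of requirements. A minimal sketch; `build_notes_prompt` and its wording are hypothetical, and the example strings are the ones quoted in this thread.

```python
def build_notes_prompt(meeting_context: str, requirements: list[str]) -> str:
    """Assemble a note-taking prompt from a context line and a requirement list."""
    lines = [
        f"You are a scribe attending {meeting_context}.",
        "Create detailed notes of the discussion.",
    ]
    if requirements:
        lines.append("Requirements:")
        lines += [f"- {req}" for req in requirements]
    return "\n".join(lines)


prompt = build_notes_prompt(
    "a meeting of engineers discussing how to implement ElasticSearch",
    [
        "be sure to highlight the contributions and points of view of each participant",
        "come up with a list of action items and who's assigned to each one",
    ],
)
print(prompt)
```

Adding or dropping a requirement is then a one-line change, which helps when tailoring the prompt to different meeting types.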
> 3. Run it through a basic LLM to correct common transcription errors specific to your company's vocabulary and remove filler words.

Do you just include a list of common errors in the prompt? Or have you trained a model for this?

I would love to train a model, but no, it's just common errors that I've encountered. It can also fix it in context. For example if you're talking about the AI 'Groq' but the transcription always says 'grok', then you can, in this step, ask the LLM to fix that error if the context is appropriate (i.e. it's being used as a noun).
I do this with just a prompt that sets the context of the meeting (if we have it), explains that this is a spoken meeting between n people, and asks for corrections based on the entire context. It actually works decently out of the box.
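A rough sketch of the approach described in the replies above: a hand-maintained list of known errors plus some meeting context, folded into one correction prompt. The function name, the prompt wording, and the `(right, hint)` structure are all assumptions for illustration.

```python
def build_correction_prompt(
    transcript: str,
    known_errors: dict[str, tuple[str, str]],
    n_speakers: int,
    meeting_context: str = "",
) -> str:
    """Build a transcript-correction prompt.

    known_errors maps a mis-transcribed term to (correct term, hint about
    when the correction applies), so the LLM can fix it in context.
    """
    parts = [
        f"The following is a transcript of a spoken meeting between {n_speakers} people."
    ]
    if meeting_context:
        parts.append(f"Meeting context: {meeting_context}")
    parts.append(
        "Correct transcription errors using the entire transcript as context, "
        "and remove filler words. Do not add or remove any other content."
    )
    if known_errors:
        parts.append("Known mis-transcriptions (fix only where the context fits):")
        parts += [
            f'- "{wrong}" should be "{right}" when {hint}'
            for wrong, (right, hint) in known_errors.items()
        ]
    parts.append("Transcript:")
    parts.append(transcript)
    return "\n".join(parts)


prompt = build_correction_prompt(
    "Person A: we could run it on grok instead.",
    {"grok": ("Groq", "it is used as a noun naming the AI")},
    n_speakers=2,
)
print(prompt)
```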
Damn, I just tried this with ChatGPT (warning: contrived example) and it works surprisingly well:

PROMPT:

The following is a partial transcription of a meeting at a tech company. Please correct any transcription errors, using clues from context as well as the fact that this is a technical meeting.

Person A: we shouldn't use a sequel database to store this data, we probably want to use something Callum nor instead since it's more efficient than row-based storage. Something like a patchy park or arrow.

Person B: What's park, eh?

Person A: It's just a column-based storage format with efficient Jesus or snappy compression.

RESPONSE:

Person A: "we shouldn't use a SQL database to store this data, we probably want to use something columnar instead since it's more efficient than row-based storage. Something like Apache Parquet or Arrow."

Person B: "What's Parquet?"

Person A: "It's just a column-based storage format with efficient GZIP or Snappy compression."

EDIT: mistral 7B on the other hand botched this terribly, and made up a lot of nonsense unrelated to the transcript.

Yeah it’s pretty neat and can be improved with more context for technical terms. You do risk some hallucinations.
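One cheap guard against that risk, offered as a suggestion rather than something anyone in this thread described: compare the corrected text with the raw transcript and flag outputs that changed too many words. The sketch below uses Python's standard `difflib`; the 0.3 threshold is an arbitrary assumption you would tune.

```python
from difflib import SequenceMatcher


def changed_fraction(original: str, corrected: str) -> float:
    """Return 1 minus the word-level similarity ratio (0.0 means identical)."""
    a = original.lower().split()
    b = corrected.lower().split()
    # autojunk=False so frequent words in long transcripts are not ignored.
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def looks_suspicious(original: str, corrected: str, max_changed: float = 0.3) -> bool:
    """Flag a correction that rewrote more of the transcript than expected."""
    return changed_fraction(original, corrected) > max_changed


raw = "we shouldn't use a sequel database to store this data"
fixed = "we shouldn't use a SQL database to store this data"
print(looks_suspicious(raw, fixed))  # False
```

A flagged output is not necessarily wrong, but it is a signal to fall back to the raw transcript or have someone review it.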
Fantastic example.