Hacker News
Show HN: PennyWhale – Use natural language to search for any financial data (pennywhale.com)
91 points by coffeejay 4359 days ago
24 comments

This is cool!

We've been working on something similar, but the opposite direction (for world trade data). Instead of trying to NLP our way out of the problem, we pre-generate and index a bunch of possible questions, and let full text search handle the rest.

It's interesting, because theoretically NLP should be able to "understand" what you mean but in reality I find that even if you parse sentence structure and extract some meaning, you're still at some level hardcoding the possible things that can be queried into the code.

So it's a neat tradeoff of whether it's more worth it to create a mini query language, or go full natural language, or go somewhere in between.
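
A rough sketch of the pre-generate-and-index idea described above, in Python. The vocabulary, the templates, and the scoring are all made up for illustration; this is not the Atlas code, just one way the approach could look.

```python
from collections import defaultdict
from itertools import product

# Hypothetical vocabulary; the real Atlas index is not shown in this thread.
COUNTRIES = ["italy", "france", "germany"]
PRODUCTS = ["wine", "cars"]
TEMPLATES = ["what does {country} export", "{country} export {product}"]


def generate_questions():
    """Enumerate every question the system knows how to answer."""
    questions = set()
    for template, country, product_name in product(TEMPLATES, COUNTRIES, PRODUCTS):
        # str.format ignores keyword arguments the template does not use.
        questions.add(template.format(country=country, product=product_name))
    return sorted(questions)


def build_index(questions):
    """Map each word to the ids of the questions that contain it."""
    index = defaultdict(set)
    for qid, question in enumerate(questions):
        for word in question.split():
            index[word].add(qid)
    return index


def search(query, questions, index):
    """Rank questions by how many query words they share."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for qid in index.get(word, ()):
            scores[qid] += 1
    ranked = sorted(scores, key=lambda qid: (-scores[qid], questions[qid]))
    return [questions[qid] for qid in ranked]


QUESTIONS = generate_questions()
INDEX = build_index(QUESTIONS)
print(search("wine italy", QUESTIONS, INDEX)[:3])
```

The set of answerable questions is fixed up front, which is exactly the hardcoding tradeoff discussed above: nothing outside the templates can match.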

...

Anyway, try it out by clicking on the title (keeping it a bit hidden for now for testing purposes): http://atlas.cid.harvard.edu/explore/tree_map/export/usa/all...

Things you can try (mix and match too!):

- "wine italy"
- "france"
- "germany spain"
- "germany export wine 2002 to 2012"
- "turkey feasible"

...

If you want to see the code, check out our github:

https://github.com/cid-harvard/atlas-economic-complexity/blo... (search view)

https://github.com/cid-harvard/atlas-economic-complexity/blo... (indexer)

Apologies for any mess, I recently joined and we're undergoing a huge overhaul right now.

Couple of years back, I implemented a natural-language-ish solution for querying ERP data. My approach was to

a. have structured queries that can be precisely parsed

b. provide query completion to guide the user while entering the query

It turned out quite well, if I say so myself. But I never got the time to market it.

It can be seen in action here: http://nlq.lavadip.com/servlet/demo

A cool system but I'd suggest this is NLP to the extent SQL or a similar (or somewhat better) query language is NLP. You've made queries in your query language easy but it's more like interactive programming than free-form NLP.

Not that it's a bad application, it's nice, it's just that if one extended this model, one would wind up with a query language, not something new.

Thanks for the praise!

Totally agree; it's not true NLP.

Moreover, true NLP is currently not achievable. We would need an algorithm that passes the Turing test to infer the meaning of a free-form statement.

Like my parent post said, every current NLP system uses some hard-coded assumptions. They just differ in the amount of assumptions.

This is pretty cool! Could you talk about the parser generator and ER relationships? I was looking to build something similar for a webapp and was stuck at how to generate the autocompletes.
The autocomplete is provided by my parser-combinator library which is designed in a very generic way. It can be used for any application, not just natural language queries.

About the parser generator: the ER description provides natural language phrases for every entity and relationship. From this the parser builder is able to create parser combinators. There are some hard-coded assumptions and parsers for common data types such as dates and numerical figures.
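
To make the idea of completions driven by an ER description concrete, here is a minimal Python sketch. It is not the parser-combinator library mentioned above: it swaps in a much simpler technique (enumerating every valid sentence and matching word prefixes), and the ER model, the phrases, and the "show ..." query shape are all hypothetical.

```python
# Hypothetical ER description: every entity and relationship carries a phrase.
ER_MODEL = {
    "entities": {"customer": "customers", "invoice": "invoices"},
    "relationships": {("invoice", "customer"): "billed to"},
}


def build_sentences(model):
    """Enumerate the valid queries implied by the ER description."""
    entities = model["entities"]
    sentences = [f"show {phrase}" for phrase in entities.values()]
    for (source, target), phrase in model["relationships"].items():
        sentences.append(f"show {entities[source]} {phrase} {entities[target]}")
    return sentences


def complete(prefix, sentences):
    """Return the words that may follow the words typed so far."""
    typed = prefix.lower().split()
    suggestions = set()
    for sentence in sentences:
        words = sentence.split()
        if words[: len(typed)] == typed and len(words) > len(typed):
            suggestions.add(words[len(typed)])
    return sorted(suggestions)


SENTENCES = build_sentences(ER_MODEL)
print(complete("show", SENTENCES))
print(complete("show invoices", SENTENCES))
```

Enumeration only works for tiny grammars; combinators avoid materialising every sentence, which is presumably why a real system would use them.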

It's interesting, because theoretically NLP should be able to "understand" what you mean but in reality I find that even if you parse sentence structure and extract some meaning, you're still at some level hardcoding the possible things that can be queried into the code.

This is a good point.

One thing that I'd claim is that NLP is less useful than you'd think for single queries. I'd just suggest the thought experiment of being able to ask a highly knowledgeable person one question with no follow-ups. Even someone with a mastery of English and the data probably won't start out with the same idioms and approaches, so you'd have to phrase the question carefully and exactly ... just the way you have to spend a lot of effort writing a SQL or similar query.

If you can interact with such a knowledgeable person, you'd learn their expression style and they'd learn yours - after some interaction one has a very powerful effect. Until then, things are rather limited.

Congratulations on launching! Cool idea and a good start. I think since you're targeting the non-professional market here, you need to focus on user experience a bit. Some feedback:

- Found it a bit slow, but guess that's partially the HN effect.

- As per above, I'd make the queries ajax rather than refreshing the whole page every search, should improve the UX a fair bit.

- I expected autopredict when I started to type $... to predict a ticker, would be nice to guess company names as well if no ticker is entered. Probably an essential feature for non-pros.

- You need a clear signup link. And the redirect when you run out of free credits should go to the signup page with small link to login, not the other way around.

- Also look at highcharts.com, they have a stock chart product as well. Hands down the best JS charts, mobile friendly, highly customisable etc.

Thanks so much, this is great advice. Definitely all things we plan on doing.
+1 I like these suggestions given that I was expecting the exact things :)
It's a pity you have to specify a ticker. I was really hoping it can do the hard work for me.

https://www.pennywhale.com/app/queries/execute?utf8=%E2%9C%9...

hahaha we'll add an easter egg just for you ;)

perhaps check out http://www.premise.com/ or https://kensho.com/

Congrats! Looks good!

Small issue: you're calling Google fonts over HTTP, but your site uses HTTPS, so Chrome is blocking the request. This leads to your fonts rendering in the default (Times New Roman here).

[blocked] The page at 'https://www.pennywhale.com/' was loaded over HTTPS, but ran insecure content from 'http://fonts.googleapis.com/css?family=Montserrat:400,700': this content should also be loaded over HTTPS.
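
The usual fix for this kind of mixed-content warning is to request the stylesheet over HTTPS. A small Python sketch of that rewrite follows; the function name is made up, and it assumes every rewritten host actually serves HTTPS (fonts.googleapis.com does).

```python
import re

# Matches href= or src= attributes whose URL starts with plain http://
INSECURE_ATTR = re.compile(r"""\b(href|src)=(["'])http://""", re.IGNORECASE)


def upgrade_to_https(html):
    """Rewrite http:// URLs in href/src attributes to https://.

    Assumption: the hosts involved support HTTPS. Ordinary <a href> links
    are rewritten too, which a production version might want to exclude.
    """
    return INSECURE_ATTR.sub(r"\1=\2https://", html)


page = '<link href="http://fonts.googleapis.com/css?family=Montserrat:400,700" rel="stylesheet">'
print(upgrade_to_https(page))
```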

Does indeed look good.

Have to say, love the logo. Think it would drop into a favicon nicely too..

Congrats

I don't know about the logo. It immediately made me think of docker. That could be an issue.
Thanks for the catch!
Honest feedback from a quant working for an asset manager: This is not (yet) useful. None of these queries even come close to being a real question a financial researcher would ask.

What's a real question like? Here are examples which your system doesn't even try to handle.

* companies in S&P500 with P/E less than 15

* companies in S&P500 with more than 20% of their revenue coming from China

* $TSLA Beta with technology sector (returns the wrong answer).

* https://www.pennywhale.com/app/queries/execute?utf8=%E2%9C%9... (this produced a serious error on your server)

If you want to answer trivial questions your product will fail. Nothing you've done isn't done better by Google RIGHT NOW.

Not trying to discourage you but this just doesn't solve any existing problem. All you have is a pretty interface to a tool that does one thing poorly.

If you can actually solve 15% of the real problem with converting natural language questions into financial queries and return accurate, well formatted data, you will have a serious product. Right now you are nowhere close.
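
For a sense of what handling a screening question like "companies in S&P500 with P/E less than 15" involves, here is a deliberately narrow Python sketch. The company records are placeholders, not real data, and the single hardcoded pattern is an assumption for illustration; it says nothing about how PennyWhale parses queries.

```python
import operator
import re

# Placeholder records; real values would come from a dated, attributed feed.
COMPANIES = [
    {"ticker": "AAA", "index": "S&P500", "P/E": 12.0},
    {"ticker": "BBB", "index": "S&P500", "P/E": 22.5},
    {"ticker": "CCC", "index": "OTHER", "P/E": 9.0},
]

COMPARATORS = {"less than": operator.lt, "more than": operator.gt}

# One hardcoded query shape: "companies in <index> with <metric> <cmp> <number>"
SCREEN = re.compile(
    r"companies in (?P<index>\S+) with (?P<metric>\S+) "
    r"(?P<comparator>less than|more than) (?P<value>\d+(?:\.\d+)?)"
)


def run_screen(query, companies):
    """Return tickers matching the screen, or raise if the query is unsupported."""
    match = SCREEN.fullmatch(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    compare = COMPARATORS[match["comparator"]]
    threshold = float(match["value"])
    metric = match["metric"]
    return [
        company["ticker"]
        for company in companies
        if company["index"] == match["index"]
        and metric in company
        and compare(company[metric], threshold)
    ]


print(run_screen("companies in S&P500 with P/E less than 15", COMPANIES))
```

Anything outside that one sentence shape raises an error, which is the gap between a demo and the "15% of the real problem" mentioned above.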

Pretty interesting effort, but a bit of constructive criticism:

- none of your query results have a date attached! While pretty, an earnings number without a date is like a chart without an axis.

- Just how natural can the queries be? "$AAPL sales estimates" did not return anything. Neither did "$CAT cost of capital". "quarterly $CAT earnings growth" gave me a single EBITDA number, i.e. not what I asked. Is it easier to start with simple dropdowns?

- Why the StockTwits $ prefix? Why can't a company name be used?

- Another "sad face" feature - every not found/no results query takes away a "free query" :(

On a more positive note, things look really clean - good job on the design! A great start. Is the long-term idea here to build an accessible place for fundamental data research?

What value added are you providing in your premium product (aside from heatmap) over just going over to EDGAR archive?

Ping me directly - would love to provide more feedback!

Edit: fix formatting :)

Would love your email for feedback :) Or just shoot me an email at chintan@pennywhale.com. Thanks!
Will shoot you an email! One more thing: you seem to be hotlinking to images on nasdaq, e.g. apple pe ratio shows http://www.nasdaq.com//charts/aapl_per.jpeg - what determines whether a datapoint is presented as a hotlink/image vs a number (like apple's price to book)?

Aside from as-of date on each datapoint, ideally you should provide attribution as well (anyone who worked in finance knows that all data sources provide different data for same query at any given time :)
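
One way to carry the as-of date and attribution alongside every value is to make them required fields of the datapoint itself. A minimal Python sketch; the class name, the field names, and the sample values are hypothetical placeholders, not real figures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPoint:
    """A single figure that cannot exist without its date and source."""

    ticker: str
    metric: str
    value: float
    as_of: date
    source: str

    def describe(self):
        return (
            f"{self.ticker} {self.metric}: {self.value} "
            f"(as of {self.as_of.isoformat()}, source: {self.source})"
        )


# Placeholder values for illustration only.
point = DataPoint("AAPL", "P/E", 15.0, date(2014, 1, 2), "ExampleFeed")
print(point.describe())
```

Because two sources can disagree on the same metric at the same time, keeping `source` on the record also lets conflicting values sit side by side instead of silently overwriting each other.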

Hey Jay,

This is really cool. One idea is it would be great if you could interface with EDGAR and gather the links to the various financial statements. The EDGAR database search/navigation is horrendous.

As another commenter said, it would be great if it could detect the company name or at least the ticker without the $.

Also, a failed query subtracts from the guest query allotment. That doesn't seem ideal as I'm first learning how to phrase my requests.
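
A possible shape for a guest counter that only charges for queries that return something, sketched in Python. `GuestQuota` and the `execute` callback are hypothetical names; the site's actual implementation is not shown in this thread.

```python
class GuestQuota:
    """Tracks free guest queries, charging only for successful ones."""

    def __init__(self, free_queries):
        self.remaining = free_queries

    def run(self, query, execute):
        """Run `execute(query)`; decrement the quota only if it returns a result."""
        if not query.strip():
            raise ValueError("empty query")  # never costs a credit
        if self.remaining <= 0:
            raise PermissionError("no free queries left")
        result = execute(query)  # an exception here also leaves the quota intact
        if result is not None:
            self.remaining -= 1
        return result


quota = GuestQuota(free_queries=2)
quota.run("unparseable question", lambda q: None)  # no result, no charge
quota.run("cash for $GOOG", lambda q: {"answer": "placeholder"})
print(quota.remaining)
```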

Lastly, mutual fund data would be awesome. Morningstar rating, fund performance, fund assets, etc.

Thanks for the feedback!

We're definitely looking into interfacing with EDGAR as well as getting rid of the 'cashtag'. Mutual fund data is a great idea too - shouldn't be too difficult to tie in.

We have an API for a lot of EDGAR stuff & lots of it is free so feel free to give it a try: www.jivedata.com
This is pretty great. I am convinced there is lots of room in this space for services that cater to people who aren't quite in the Bloomberg terminal crowd, and yet would benefit from easy access to financial data for whatever reason. I've been working on a service to open up EDGAR data to more people: www.easyedgar.com (pre-launch). Not quite as fancy as Penny Whale but I think a lot of people just want the data in their spreadsheet and don't know how to do that.
Do you have to use one of their example questions for a free demonstration? I asked one of the questions and it asked me to log in.
What did you ask? There are a few premium features right now (holdings data, interactive financials). If you want to make an account I'll give you a free upgrade :)
I did the same thing as my first query and I assumed you needed login for everything and gave up. I suspect you will bounce traffic if it's not obvious what's premium.
I put in What are the holdings for $WF as an example. I wanted to see one of the big banks and how it showed versus yahoo finance/bloomberg sources.
I asked "Show me the Cash Flow Statement for $INTC". (I didn't expect it to work, but it would have been impressive.)
That would actually work in the premium version. Again, if you'd like, I'll hook you up with a free premium account - just shoot me an email at chintan@pennywhale.com
This is intriguing. Large implications when applied to private datasets ---- think Factiva, etc. There the data schema differ so wildly that natural language can really make it more useful. Publicly traded equities is the facile, simple place to start, obviously.

Can I spin off a search and get pinged on updates?

Love to learn more.

Private datasets is brilliant, great idea. As of now, updates aren't available but definitely could be in the future.

We'd love to chat more and get some feedback! My email is jay@pennywhale.com

It's slow to load at the moment, so I'll describe what it does.

PennyWhale lets you search, manipulate, and compare financial data with natural language queries along the lines of "how much cash does $GOOG have" and "show me the PE ratio for $AAPL".

Seems like a really good research tool!

I suggest including parameters mentioned in the book "The Intelligent Investor", praised by Buffett. For example, the earnings over the last 20 years, total value of tangible assets, etc. Company information for value investing is hard to find, and this could be a great help.
Still very early stage, so we're working on collecting more data. Meanwhile, you can inspect sales growth on the financials - ex. 'Income statement for $TSLA'. If you're interested, shoot me an email at jay@pennywhale.com we'd love to make you a free premium account.
Neat. I can see myself using this.

Two bugs: "$brkb book value per share" gives a number that's way off. "$brk.a book value per share" and "$_x EV/EBITDA" yield apparent server side errors.

Thanks for the catches. We're still refining our data sources so we'll look into these.
When I tried the query: "show me the ratio of $SLV vs $GLD"

I get:

"We're sorry, but something went wrong.

If you are the application owner check the logs for more information."

Is this due to the usual HN overload? or did I hit a bad spot?

This looks extremely interesting!

Yeah, definitely HN overload. That query may not work though - you'll want to specify what ratio you're looking for (PE, PEG, EPS, etc)
Oh, I thought I could get just "the ratio", like, SLV price at date X / GLD price at date X.

Would that be too hard to implement? I would love being able to pull ratios out of (almost) any arbitrary data.
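
The ratio asked for here (price of one series divided by price of another on the same date) is simple once both series are keyed by date. A Python sketch, assuming each series is a plain `{date: price}` dict; the prices below are placeholders, not real quotes.

```python
from datetime import date


def price_ratio(numerator, denominator):
    """Divide two {date: price} series on the dates that both contain."""
    shared_days = sorted(numerator.keys() & denominator.keys())
    return {
        day: numerator[day] / denominator[day]
        for day in shared_days
        if denominator[day] != 0  # skip days where the ratio is undefined
    }


# Placeholder prices for illustration only.
slv = {date(2014, 1, 2): 25.0, date(2014, 1, 3): 26.0}
gld = {date(2014, 1, 2): 100.0}
print(price_ratio(slv, gld))
```

Dates present in only one series are dropped rather than filled, so the caller can see exactly which days the ratio covers.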

We definitely plan on doing that soon
How do you guys plan to compete with Google Spreadsheets?

I can already do this stuff in it (free) and just use the built in GoogleFinance calls (and their limit is 1000 per sheet).

Plus I have all the other advantages of a spreadsheet program.

I wasn't aware you could do this and I'm sure many casual investors aren't. Plus this is not as user friendly: https://support.google.com/docs/answer/3093281?hl=en
Possibly. But I suspect many casual investors (on HN anyway) might be familiar with Excel. In which case

  =GoogleFinance("GOOG", "price")
may not be all that complicated. Plus it helps that it's free.

Examples:

Function list - https://docs.google.com/spreadsheet/pub?key=0Ault2FD3uBwydEV...

Watch list - https://docs.google.com/spreadsheet/pub?key=0Ault2FD3uBwydG5...

Watchlist + holdings - https://docs.google.com/spreadsheet/pub?key=0Ault2FD3uBwydDl...

This is very interesting - thanks for pointing it out. I think one way we're working on being competitive is the breadth and depth of our research. Rather than just fundamental data points, we're working on integrating more complicated queries such as 'What's the weighted average cost of capital for $AAPL'. We view it as a great opportunity for automation of financial research.
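
For reference, the textbook weighted average cost of capital is WACC = E/V * Re + D/V * Rd * (1 - Tc), with V = E + D. A Python sketch of that formula; the function and parameter names are illustrative, the inputs in the example are placeholders, and this makes no claim about how PennyWhale computes it.

```python
def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate):
    """Textbook WACC: equity and debt costs weighted by market value,
    with the debt cost reduced by the corporate tax shield."""
    total = equity + debt
    if total <= 0:
        raise ValueError("equity + debt must be positive")
    equity_weight = equity / total
    debt_weight = debt / total
    return equity_weight * cost_of_equity + debt_weight * cost_of_debt * (1 - tax_rate)


# Placeholder inputs: 60% equity at 10%, 40% debt at 5%, 20% tax rate.
print(round(wacc(600.0, 400.0, 0.10, 0.05, 0.20), 4))
```

The arithmetic is the easy part; the hard part for an automated tool is sourcing the inputs, such as the cost of equity and the market value of debt.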
How does this compare to Bloomberg?

From what I've seen, people in the financial services industry have gotten very good at using the Bloomberg Terminal commands to examine this kind of data.

Fundamentally, PennyWhale is targeted at a different market. While Bloomberg is tremendously powerful, it's also very expensive. We think there's an opportunity for the retail investors here.
FYI You can literally ask the same question to the terminal -- we process English language queries (and can resolve textual company names to tickers :)).
Searching for "pe ratio for google" in pennywhale doesn't return anything, while the same query in google actually returns the PE ratio for GOOG.
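
Resolving company names (and tickers typed without the $) can start as a plain lookup before the query is parsed. A Python sketch; the alias table is a hypothetical stand-in for a real securities master, and the function name is made up.

```python
import re

# Hypothetical alias table; a real one would come from a securities master.
ALIASES = {"google": "GOOG", "apple": "AAPL", "tesla": "TSLA"}
TICKERS = set(ALIASES.values())


def resolve_tickers(query):
    """Return tickers referenced by name, bare symbol, or $cashtag, in order."""
    found = []
    for token in re.findall(r"\$?[A-Za-z.]+", query):
        word = token.lstrip("$")
        ticker = ALIASES.get(word.lower())
        if ticker is None and word.upper() in TICKERS:
            ticker = word.upper()
        if ticker is not None and ticker not in found:
            found.append(ticker)
    return found


print(resolve_tickers("pe ratio for google"))
print(resolve_tickers("$AAPL vs goog"))
```

A lookup like this breaks down on ambiguous names and on words that happen to be tickers, so a real resolver would need ranking or a confirmation step.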
What is the source of the data? It seems crazy to provide EPS estimates (etc.) without the source.
We're pooling a lot of disparate sources (Yahoo, Morningstar, NASDAQ, etc.)
This issue has come up multiple times. Anyone that will pay you money for your tool will need the source of the information and the date associated with it. On top of this a link to where the data originated is likely going to be essential. Your tool _still_ hasn't broken past the minimum requirement of doing a single thing better than any tool that kind of solves this problem.
Looks pretty cool but I get the rails error page with this query:

how much cash does $APPL have vs $GOOG

not sure if this is the right place to report bugs (and/or if this is intended behavior), but if one clicks the COMPUTE button without a query entered, it decrements the guest query counter
"Show me the UK's GDP" doesn't.
Wow this looks great! Congratulations on launch.
Keep reppin' GT :)
Always.