We've been working on something similar, but the opposite direction (for world trade data). Instead of trying to NLP our way out of the problem, we pre-generate and index a bunch of possible questions, and let full text search handle the rest.
It's interesting, because theoretically NLP should be able to "understand" what you mean but in reality I find that even if you parse sentence structure and extract some meaning, you're still at some level hardcoding the possible things that can be queried into the code.
So it's a neat tradeoff of whether it's more worth it to create a mini query language, or go full natural language, or go somewhere in between.
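To make the pre-generation idea concrete, here is a minimal sketch, not the commenter's actual code. It assumes a tiny hypothetical vocabulary (`COUNTRIES`, `PRODUCTS`) and two made-up question templates, pre-generates every question, builds an inverted index over the words, and ranks questions by token overlap with whatever the user typed.

```python
from collections import defaultdict
from itertools import product

# Hypothetical vocabulary; a real system would pull these from its database.
COUNTRIES = ["italy", "france", "germany"]
PRODUCTS = ["wine", "cars"]

def generate_questions():
    """Pre-generate every question the system knows how to answer."""
    questions = []
    for country, prod in product(COUNTRIES, PRODUCTS):
        questions.append(f"how much {prod} did {country} export")
    for country in COUNTRIES:
        questions.append(f"what did {country} export")
    return questions

def build_index(questions):
    """Inverted index: token -> set of question ids."""
    index = defaultdict(set)
    for qid, text in enumerate(questions):
        for token in text.split():
            index[token].add(qid)
    return index

def search(query, questions, index):
    """Rank questions by how many query tokens they contain."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for qid in index.get(token, ()):
            scores[qid] += 1
    ranked = sorted(scores, key=lambda qid: (-scores[qid], qid))
    return [questions[qid] for qid in ranked]

QUESTIONS = generate_questions()
INDEX = build_index(QUESTIONS)
```

With this toy setup, `search("wine italy", QUESTIONS, INDEX)` ranks "how much wine did italy export" first. The set of answerable questions is still enumerated by hand; the work just moves from the parser into the templates.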
A cool system but I'd suggest this is NLP to the extent SQL or a similar (or somewhat better) query language is NLP. You've made queries in your query language easy but it's more like interactive programming than free-form NLP.
Not that it's a bad application; it's nice. It's just that if one extended this model, one would wind up with a query language, not something new.
This is pretty cool! Could you talk about the parser generator and ER relationships? I was looking to build something similar for a webapp and was stuck on how to generate the autocompletes.
The autocomplete is provided by my parser-combinator library which is designed in a very generic way. It can be used for any application, not just natural language queries.
About the parser generator: the ER description provides natural language phrases for every entity and relationship. From this the parser builder is able to create parser combinators. There are some hard-coded assumptions and parsers for common data types such as dates and numerical figures.
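As a rough illustration of how autocomplete can fall out of parser combinators built from an ER description, here is a minimal sketch. It is not the author's library: the `SCHEMA` layout, the "<attribute> of <entity>" grammar, and all function names are assumptions made for this example. Each parser returns the positions where it succeeded plus the words it would have accepted at the end of the input, and the latter are the completions.

```python
def word(w):
    """Match one literal word; at the end of input, offer it as a completion."""
    def parse(tokens, pos):
        if pos == len(tokens):
            return set(), {w}
        if tokens[pos] == w:
            return {pos + 1}, set()
        # The last token may be a partially typed word.
        if pos == len(tokens) - 1 and w.startswith(tokens[pos]):
            return set(), {w}
        return set(), set()
    return parse

def seq(*parsers):
    """Run parsers one after another, collecting completions along the way."""
    def parse(tokens, pos):
        positions, expected = {pos}, set()
        for p in parsers:
            next_positions = set()
            for at in positions:
                ok, exp = p(tokens, at)
                next_positions |= ok
                expected |= exp
            positions = next_positions
        return positions, expected
    return parse

def alt(*parsers):
    """Try every alternative at the same position."""
    def parse(tokens, pos):
        positions, expected = set(), set()
        for p in parsers:
            ok, exp = p(tokens, pos)
            positions |= ok
            expected |= exp
        return positions, expected
    return parse

# Hypothetical ER description: natural-language phrases per entity.
SCHEMA = {
    "company": {"phrases": ["company"], "attributes": ["revenue", "cash"]},
    "fund": {"phrases": ["fund"], "attributes": ["rating", "assets"]},
}

def build_parser(schema):
    """One branch per entity: '<attribute> of <entity phrase>'."""
    branches = []
    for entity in schema.values():
        attrs = alt(*(word(a) for a in entity["attributes"]))
        names = alt(*(word(p) for p in entity["phrases"]))
        branches.append(seq(attrs, word("of"), names))
    return alt(*branches)

def complete(parser, text):
    """Return the sorted list of words that could come next."""
    _, expected = parser(text.lower().split(), 0)
    return sorted(expected)

PARSER = build_parser(SCHEMA)
```

For example, `complete(PARSER, "r")` suggests `rating` and `revenue`, while `complete(PARSER, "rating of")` suggests only `fund`, because the company branch has already failed on the first word.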
> It's interesting, because theoretically NLP should be able to "understand" what you mean but in reality I find that even if you parse sentence structure and extract some meaning, you're still at some level hardcoding the possible things that can be queried into the code.
This is a good point.
One thing that I'd claim is that NLP is less useful than you'd think for single queries. I'd just suggest the thought experiment of being able to ask a highly knowledgeable person one question with no follow-ups. Even someone with a mastery of English and the data probably won't start out with the same idioms and approaches as you, so you'd have to phrase the question carefully and exactly ... just the way you have to spend a lot of effort writing a SQL or similar query.
If you can interact with such a knowledgeable person, you'd learn their expression style and they'd learn yours - after some interaction one has a very powerful effect. Until then, things are rather limited.
Congratulations on launching! Cool idea and a good start. I think since you're targeting the non-professional market here, you need to focus on user experience a bit. Some feedback:
- Found it a bit slow, but guess that's partially the HN effect.
- As per above, I'd make the queries AJAX rather than refreshing the whole page on every search; that should improve the UX a fair bit.
- I expected autocomplete to predict a ticker when I started to type $...; it would be nice to guess company names as well if no ticker is entered. Probably an essential feature for non-pros.
- You need a clear signup link. And the redirect when you run out of free credits should go to the signup page with a small link to login, not the other way around.
- Also look at highcharts.com, they have a stock chart product as well. Hands down the best JS charts, mobile friendly, highly customisable etc.
Small issue: you're calling Google fonts over HTTP, but your site uses HTTPS, so Chrome is blocking the request. This leads to your fonts rendering in the default (Times New Roman here).
Honest feedback from a quant working for an asset manager: This is not (yet) useful. None of these queries even come close to being a real question a financial researcher would ask.
What's a real question like? Here are examples that your system doesn't even try to handle.
* companies in S&P500 with P/E less than 15
* companies in S&P500 with more than 20% of their revenue coming from China
* $TSLA Beta with technology sector (returns the wrong answer).
If you only want to answer trivial questions, your product will fail. Everything you've done is already done better by Google RIGHT NOW.
Not trying to discourage you but this just doesn't solve any existing problem. All you have is a pretty interface to a tool that does one thing poorly.
If you can actually solve 15% of the real problem with converting natural language questions into financial queries and return accurate, well formatted data, you will have a serious product. Right now you are nowhere close.
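For a sense of what the simplest of those example questions involves, a screen such as "companies in S&P500 with P/E less than 15" amounts to parsing an index, a metric, a comparison, and a threshold, then filtering a universe of companies. The sketch below assumes a single hypothetical sentence pattern and a toy `UNIVERSE` whose tickers and values are placeholders, not market data; it says nothing about how PennyWhale works.

```python
import operator
import re

# Hypothetical toy rows; tickers and values are placeholders, not market data.
UNIVERSE = [
    {"ticker": "AAA", "index": "S&P500", "P/E": 12.0},
    {"ticker": "BBB", "index": "S&P500", "P/E": 31.5},
    {"ticker": "CCC", "index": "OTHER", "P/E": 9.0},
]

# One assumed sentence shape: "companies in <index> with <metric> less|more than <n>".
SCREEN = re.compile(
    r"companies in (?P<index>\S+) with (?P<metric>\S+) "
    r"(?P<cmp>less|more) than (?P<value>\d+(?:\.\d+)?)$",
    re.IGNORECASE,
)
COMPARATORS = {"less": operator.lt, "more": operator.gt}

def run_screen(question, universe):
    """Return matching tickers, or None if the question is not a known screen."""
    match = SCREEN.match(question.strip())
    if match is None:
        return None
    index = match["index"].lower()
    metric = match["metric"].upper()
    compare = COMPARATORS[match["cmp"].lower()]
    threshold = float(match["value"])
    return [
        row["ticker"]
        for row in universe
        if row["index"].lower() == index
        and metric in row
        and compare(row[metric], threshold)
    ]
```

The second example ("more than 20% of their revenue coming from China") does not fit this pattern at all, which illustrates the gap the comment describes between simple lookups and real research questions.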
Pretty interesting effort, but a bit of constructive criticism:
- none of your query results have a date attached! While pretty, an earnings number without a date is like a chart without an axis.
- Just how natural can the queries be? "$AAPL sales estimates" did not return anything. Neither did "$CAT cost of capital". "quarterly $CAT earnings growth" gave me a single EBITDA number, i.e. not what I asked for. Is it easier to start with simple dropdowns?
- Why the StockTwits $ prefix? Why can't a company name be used?
- Another "sad face" feature - every not found/no results query takes away a "free query" :(
On a more positive note, things look really clean - good job on the design! A great start. Is the long-term idea here to build an accessible place for fundamental data research?
What added value are you providing in your premium product (aside from the heatmap) over just going to the EDGAR archive?
Ping me directly - would love to provide more feedback!
Will shoot you an email! One more thing: you seem to be hotlinking to images on nasdaq, e.g. apple pe ratio shows http://www.nasdaq.com//charts/aapl_per.jpeg - what determines whether a datapoint is presented as a hotlink/image vs. a number (like apple's price to book)?
Aside from an as-of date on each datapoint, ideally you should provide attribution as well (anyone who has worked in finance knows that different data sources provide different data for the same query at any given time :)
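One way to honor the as-of date and attribution requests is to make them mandatory fields on every value returned. This is only a sketch under assumed names (`DataPoint`, `describe`, the "example-provider" source, and the placeholder ticker and value are all hypothetical):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class DataPoint:
    """A single figure that cannot exist without its date and source."""
    ticker: str
    metric: str
    value: float
    as_of: date                       # date the figure refers to
    source: str                       # who published the figure
    source_url: Optional[str] = None  # link to the original, if available

def describe(point):
    """Render the value together with its date and attribution."""
    text = (
        f"{point.ticker} {point.metric}: {point.value} "
        f"(as of {point.as_of.isoformat()}, source: {point.source})"
    )
    if point.source_url:
        text += f" <{point.source_url}>"
    return text

# Placeholder values for illustration only.
EXAMPLE = DataPoint("XYZ", "P/E", 12.5, date(2014, 1, 31), "example-provider")
```

Because `as_of` and `source` have no defaults, constructing a `DataPoint` without them raises a `TypeError`, so an undated or unattributed number never reaches the display layer.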
This is really cool. One idea is it would be great if you could interface with EDGAR and gather the links to the various financial statements. The EDGAR database search/navigation is horrendous.
As another commenter said, it would be great if it could detect the company name or at least the ticker without the $.
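Detecting a company without the `$` prefix could start as a plain lookup over the words in the query. The sketch below assumes a tiny hypothetical name table (`NAME_TO_TICKER`); a real service would load its full symbol list and would need to handle ambiguous words such as "cat".

```python
import re

# Tiny hypothetical lookup; a real service would load its full symbol table.
NAME_TO_TICKER = {
    "apple": "AAPL",
    "google": "GOOG",
    "tesla": "TSLA",
    "caterpillar": "CAT",
}
TICKERS = set(NAME_TO_TICKER.values())

def resolve_company(query):
    """Return (ticker, rest_of_query), or (None, query) if nothing matches."""
    words = query.split()
    for i, raw in enumerate(words):
        # Drop an optional cashtag, trailing punctuation, and a possessive 's.
        token = re.sub(r"'s$", "", raw.strip("$?.,").lower())
        ticker = NAME_TO_TICKER.get(token)
        if ticker is None and token.upper() in TICKERS:
            ticker = token.upper()
        if ticker:
            rest = " ".join(words[:i] + words[i + 1:])
            return ticker, rest
    return None, query
```

With this table, "apple pe ratio", "$AAPL pe ratio", and "aapl pe ratio" all resolve to `("AAPL", "pe ratio")`.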
Also, a failed query subtracts from the guest query allotment. That doesn't seem ideal as I'm first learning how to phrase my requests.
Lastly, mutual fund data would be awesome. Morningstar rating, fund performance, fund assets, etc.
We're definitely looking into interfacing with EDGAR as well as getting rid of the 'cashtag'. Mutual fund data is a great idea too - shouldn't be too difficult to tie in.
This is pretty great. I am convinced there is lots of room in this space for services that cater to people who aren't quite in the Bloomberg terminal crowd, and yet would benefit from easy access to financial data for whatever reason. I've been working on a service to open up EDGAR data to more people: www.easyedgar.com (pre-launch). Not quite as fancy as PennyWhale, but I think a lot of people just want the data in their spreadsheet and don't know how to get it there.
What did you ask? There are a few premium features right now (holdings data, interactive financials). If you want to make an account I'll give you a free upgrade :)
I did the same thing as my first query and I assumed you needed login for everything and gave up. I suspect you will bounce traffic if it's not obvious what's premium.
That would actually work in the premium version. Again, if you'd like, I'll hook you up with a free premium account - just shoot me an email at chintan@pennywhale.com
This is intriguing. Large implications when applied to private datasets: think Factiva, etc. There the data schemas differ so wildly that natural language can really make them more useful. Publicly traded equities are the facile, simple place to start, obviously.
Can I spin off a search and get pinged on updates?
It's slow to load at the moment, so I'll describe what it does.
PennyWhale lets you search, manipulate, and compare financial data with natural language queries along the lines of "how much cash does $GOOG have" and "show me the PE ratio for $AAPL".
I suggest including the parameters mentioned in the book "The Intelligent Investor", praised by Buffett. For example, the earnings over the last 20 years, total value of tangible assets, etc. Company information for value investing is hard to find, and this could be a great help.
Still very early stage, so we're working on collecting more data. Meanwhile, you can inspect sales growth on the financials, e.g. 'Income statement for $TSLA'. If you're interested, shoot me an email at jay@pennywhale.com; we'd love to make you a free premium account.
Two bugs: "$brkb book value per share" gives a number that's way off. "$brk.a book value per share" and "$_x EV/EBITDA" yield apparent server side errors.
This is very interesting - thanks for pointing it out. I think one way we're working on being competitive is the breadth and depth of our research. Rather than just fundamental data points, we're working on integrating more complicated queries such as 'What's the weighted average cost of capital for $AAPL'. We view it as a great opportunity for automation of financial research.
From what I've seen, people in the financial services industry have gotten very good at using the Bloomberg Terminal commands to examine this kind of data.
Fundamentally, PennyWhale is targeted at a different market. While Bloomberg is tremendously powerful, it's also very expensive. We think there's an opportunity for the retail investors here.
FYI You can literally ask the same question to the terminal -- we process English language queries (and can resolve textual company names to tickers :)).
This issue has come up multiple times. Anyone who will pay you money for your tool will need the source of the information and the date associated with it. On top of this, a link to where the data originated is likely going to be essential. Your tool _still_ hasn't broken past the minimum requirement of doing a single thing better than any tool that kind of solves this problem.
Not sure if this is the right place to report bugs (or if this is intended behavior), but if one clicks the COMPUTE button without a query entered, it decrements the guest query counter.
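The counter complaints in this thread (empty submits and failed queries both using up a free query) suggest charging only for queries that actually return something. A minimal sketch, with hypothetical names (`GuestQuota`, `run_guest_query`) and an `execute` callable standing in for whatever runs the search:

```python
class GuestQuota:
    """Tracks how many free queries a guest has left."""
    def __init__(self, free_queries):
        self.remaining = free_queries

def run_guest_query(quota, query, execute):
    """Charge a guest only for non-empty queries that return results."""
    query = query.strip()
    if not query:
        return None  # empty submit: nothing ran, nothing charged
    if quota.remaining <= 0:
        raise PermissionError("free queries used up")
    results = execute(query)
    if results:
        quota.remaining -= 1  # decrement only after a successful answer
    return results
```

Decrementing after the query runs, rather than before, is what keeps "no results" and server-side errors from costing the user a credit.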
We've been working on something similar, but the opposite direction (for world trade data). Instead of trying to NLP our way out of the problem, we pre-generate and index a bunch of possible questions, and let full text search handle the rest.
It's interesting, because theoretically NLP should be able to "understand" what you mean but in reality I find that even if you parse sentence structure and extract some meaning, you're still at some level hardcoding the possible things that can be queried into the code.
So it's a neat tradeoff of whether it's more worth it to create a mini query language, or go full natural language, or go somewhere in between.
...
Anyway, try it out by clicking on the title (keeping it a bit hidden for now for testing purposes): http://atlas.cid.harvard.edu/explore/tree_map/export/usa/all...
Things you can try (mix and match too!):
- "wine italy"
- "france"
- "germany spain"
- "germany export wine 2002 to 2012"
- "turkey feasible"
...
If you want to see the code, check out our github:
https://github.com/cid-harvard/atlas-economic-complexity/blo... (search view)
https://github.com/cid-harvard/atlas-economic-complexity/blo... (indexer)
Apologies for any mess, I recently joined and we're undergoing a huge overhaul right now.