Hacker News
by worik 1245 days ago
For modern computing/finance types (I was one, but have since reformed), the lack of financial data is a problem. Even if you can get access to every trade, which is hard, there is far less data than modern machine learning approaches require.

The EMH (efficient-market hypothesis) is a hard mistress, too. No amount of data can help you solve unsolvable equations.

So a lot of people fall into this trap: synthetic data. Some of the best statisticians on the planet have. It is so tempting to believe that there is money to be made by being a clever trader in markets.

Generally, there is not. Buy and hold is not a shibboleth; it is a strategy. It is the only strategy that can be replicated.

Synthesizing data to disprove buy and hold is wishful thinking. It is data snooping.
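A minimal sketch of why data snooping fools people, under loudly stated assumptions: the "market" here is hypothetical i.i.d. Gaussian daily returns with zero mean, and every "strategy" is a random long/short position with no skill at all. Choosing the best of many such strategies on the first half of the data produces an in-sample number that looks like skill; the same strategy on the second half has an expected return of zero. The function name `snooped_backtest` and all parameters are made up for illustration.

```python
import random
import statistics


def snooped_backtest(n_strategies: int = 200, n_days: int = 500, seed: int = 42):
    """Pick the best of many skill-free strategies on the first half of the data,
    then evaluate that same strategy on the second half (hypothetical setup)."""
    rng = random.Random(seed)
    # Assumed market: i.i.d. Gaussian daily returns with zero mean (nothing to predict).
    market = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    half = n_days // 2

    results = []  # (in-sample mean daily P&L, out-of-sample mean daily P&L) per strategy
    for _ in range(n_strategies):
        # Each strategy holds a random long (+1) or short (-1) position each day: no skill.
        positions = [rng.choice((-1, 1)) for _ in range(n_days)]
        pnl = [pos * ret for pos, ret in zip(positions, market)]
        results.append((statistics.fmean(pnl[:half]), statistics.fmean(pnl[half:])))

    # The snooping step: keep whichever strategy looked best in-sample.
    best_in, best_out = max(results, key=lambda r: r[0])
    return best_in, best_out, results


best_in, best_out, _ = snooped_backtest()
print(f"best strategy, in-sample mean daily return:     {best_in:.5f}")
print(f"same strategy, out-of-sample mean daily return: {best_out:.5f}")
```

The in-sample figure is the maximum of many noise draws, so it is biased upward; nothing about how the winner was chosen carries into the second half.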


There are many different types of inefficiencies that may arise in the short term, and there are plenty of funds that very skillfully capture them. But they're all constrained by how much of their capital they can trade before they're the ones moving the price.

Jim Simons makes 80% a year from his fund, but he still has to find boring ways to invest the extra money because it can't go back into the fund.

The beauty of diversified buy and hold is that it allows investors to stay invested to reap the benefits of compound growth. Over a long period of time EMH does hold up pretty well.
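To make "compound growth" concrete, here is a small arithmetic sketch comparing staying invested (gains reinvested every year) with withdrawing the gains each year. The 7% annual return, the 10-year horizon, and the function names are hypothetical illustration values, not figures from the thread.

```python
def compound(principal: float, annual_return: float, years: int) -> float:
    """Value after `years` when every year's gains stay invested (buy and hold)."""
    return principal * (1 + annual_return) ** years


def simple(principal: float, annual_return: float, years: int) -> float:
    """Value when each year's gains are taken out instead of reinvested."""
    return principal * (1 + annual_return * years)


# Hypothetical: 1,000 invested at 7% a year for 10 years.
print(round(compound(1000, 0.07, 10), 2))
print(round(simple(1000, 0.07, 10), 2))
```

With these assumed numbers, 1,000 grows to about 1,967 when the gains stay invested versus 1,700 when they are withdrawn, and the gap widens the longer you hold.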

Getting access to every trade is trivial if you're able to fork over $$$:

https://www.nyse.com/market-data/historical/taq-trades

There are way more trades than those conducted on NYSE, or even the other lit markets. A multitude of dark pools account for approximately 13% of all consolidated trading volume, and internal matching accounts for another 18%.

Dark pools do report their trades, so that data can be accessed, but internal matching goes unreported; that data is kept by each respective broker.

The Medallion Fund by Renaissance Technologies has had an average annual return of 71.8% from 1994 to 2014.

What is not known, however, is whether those returns were achieved through legal means…

Are you referring to the tax evasion? I think they got off with a slap on the wrist. Otherwise, I am interested.

Bernie Madoff promised high returns while obfuscating his business model as well. It went well until it no longer did.

Innocent until proven guilty.

Presumably the SEC has software sifting through transactions to detect insider trading (which is one illegal way to make money).

> innocent until proven guilty.

This is not a court of law.

This is finance.

If it is too good to be true, it is "guilty". What was the other option? Irrelevant.

If you are interested in the machine learning part, you can try the Numerai tournament ( https://numer.ai ). They provide obfuscated, high-quality hedge fund data that participants train their models on. Participants send back only their predictions, which Numerai combines into the market-neutral meta model that it actively trades. So far the fund's returns look promising in its category (market-neutral funds), especially in recent months: https://numer.ai/fund

You can also earn rewards by staking their own "made-up" crypto token on your predictions, but that comes with the usual crypto volatility risk, so I do not recommend staking as a beginner unless you have a top model and are OK with the crypto risk. Also, they use only staked predictions in their meta model, because they treat stake size as an indicator of the user's confidence in their model (they are building a stake-weighted meta model).
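A rough sketch of what "stake-weighted meta model" means, assuming the simplest possible version: unstaked models are dropped, and each remaining model's per-stock predictions are averaged with weights proportional to its stake. This is a hypothetical illustration only; Numerai's actual combination (normalization, ranking, other adjustments) is not reproduced here, and `stake_weighted_meta_model` is a made-up name, not part of any Numerai API.

```python
from typing import Dict, List


def stake_weighted_meta_model(
    predictions: Dict[str, List[float]],
    stakes: Dict[str, float],
) -> List[float]:
    """Hypothetical sketch: combine per-stock predictions, weighting each model by its stake."""
    # Only staked models contribute; unstaked predictions are ignored entirely.
    staked = {name: s for name, s in stakes.items() if s > 0 and name in predictions}
    total_stake = sum(staked.values())
    if total_stake == 0:
        raise ValueError("no staked predictions to combine")

    n_stocks = len(predictions[next(iter(staked))])
    meta = [0.0] * n_stocks
    for name, stake in staked.items():
        preds = predictions[name]
        if len(preds) != n_stocks:
            raise ValueError(f"model {name!r} has {len(preds)} predictions, expected {n_stocks}")
        weight = stake / total_stake  # bigger stake = more stated confidence = more weight
        for i, p in enumerate(preds):
            meta[i] += weight * p
    return meta


preds = {"a": [0.2, 0.8], "b": [0.6, 0.4], "c": [1.0, 0.0]}
stakes = {"a": 3.0, "b": 1.0, "c": 0.0}
print(stake_weighted_meta_model(preds, stakes))
```

In this made-up example, model "c" has no stake and so has no influence at all, while "a" gets three times the weight of "b"; the result is approximately [0.3, 0.7].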

It's not easy to be good at it, and it's getting harder, because they want not only good predictions but a diverse set of models that help each other improve the meta model (that's the TC metric on their model leaderboard). If you have some machine learning experience, it's easy to get started with their example script and see how far you can get with hedge-fund-quality data. Boosted tree models get good results, but neural nets are more customizable, so you can try more exotic models with them to provide the diversity they are looking for.

Also, one of their early investors is Howard Morgan, a co-founder of Renaissance Technologies.

Have you heard of Citadel?