Hacker News
by a904guy 5682 days ago
I've been working on an algorithmic trading system using machine learning. It is not HFT currently: it trades daily (equities held 24h+), and the intra-day side (equities held ~5-60 minutes) will come very quickly once I feel comfortable with the machine learning side of things. The source of data will change, and with a few tweaks to the actual trading system it will be running intra-day. The HFT will only come around once I can get a small colo that can achieve the necessary <40ms transactions to get the benefit of the pre-window before orders actually hit the open market.

http://edwardworthington.com/

Sorry, the interface was thrown together over a weekend. (The actual back-end application was the primary focus for the last year, as it was just me looking at it via command line.) I quickly designed it with a large AJAX load at the beginning; I'll eventually change it to a static load and then do AJAX polling to update the data.

I cannot recommend any particular reading sources, as I've been working with my financial buddies, who have been feeding me tips, and doing my own discovery on the internet.

This was just a side project of mine but has turned into a really nice application. It is always calculating the better strategies (out of over 50 possible different methods/functions with variables ranging from 0-260 that are used to indicate open and close signals in any number of combinations). It has improved its strategy over the last week taking it from estimating ~60% to ~70% gains YTD. I have no doubt it will eventually get over 100%.

I'd love to collaborate with anyone wanting to get into this stuff as I'm flying solo.

4 comments

Where do you get the data? I've found access to (inexpensive) sources of data to be a problem. For backtesting, I'd love to get historical data; even a sample would do. A long time ago, Island used to make their data available. Then they were bought out by NASDAQ, and no more data :-(
I've been collecting data from various places. Originally, while I was building the framework, I simply downloaded Yahoo daily data to test against.

Now I've been downloading and recording tick changes from my broker, Optionshouse.com.

It looks very interesting! If you need/want help with interface design and front-end of the application, shoot me an e-mail (my username at gmail).
Awesome! I am more than interested in stepping in. Could you please shoot me an email at aliengeek4u at gmail dot com?
I will shoot you an email in the morning.
I used to trade Forex (had a small startup). I'm looking to get back into the game. Currently working on a few algorithms that I'd like to test out, soon. I'd love to bounce some ideas around with you. Email is in my profile!
I've been tinkering with Forex at eToro. I can see A LOT of potential there, and I would like to chat about it with you as well. I will contact you shortly.
Waiting for your email!
I apologize, I've been in the middle of a move.

I posted my contact info @ HNofficeHours : http://hnofficehours.com/profile/a904guy/

Can you go more into the machine learning parts? I love such uses of ML, so futuristic. I'm guessing you use either some kind of genetic algorithm or neural networks, as is most common? Or something else entirely?
Absolutely

So the system is built around the ~50 different indicators and oscillators that I've found the formulas for generating.

These "methods" are variable driven, meaning each method could have 1 to 4 variables ranging from 0.0001 to 260.
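
A minimal sketch of what one such variable-driven method could look like, assuming a simple moving average crossover. The indicator choice and the names `sma`, `sma_cross_signal`, and `Method` are hypothetical and only for illustration; the thread does not say which formulas the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


def sma(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; the first window-1 entries are None."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def sma_cross_signal(prices: Sequence[float], fast: float, slow: float) -> List[bool]:
    """True on bars where the fast average is above the slow average."""
    f, s = sma(prices, int(fast)), sma(prices, int(slow))
    return [a is not None and b is not None and a > b for a, b in zip(f, s)]


@dataclass(frozen=True)
class Method:
    """One indicator plus its tunable variables (assumed shape, 1 to 4 values)."""
    name: str
    func: Callable[..., List[bool]]
    params: Tuple[float, ...]

    def signals(self, prices: Sequence[float]) -> List[bool]:
        return self.func(prices, *self.params)


prices = [10, 11, 12, 11, 10, 9, 10, 12, 14, 15]
method = Method("sma_cross", sma_cross_signal, (2, 4))
print(method.signals(prices))
```

Keeping the variables in a tuple next to the function makes it cheap to generate many variants of the same method by changing only `params`.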

The current testing platform, which it manages itself, has the end goal of finding the highest gain after commissions using a combination of any of these methods for opens and closes.

Considering the __massive__ amount of possible combinations of these methods, the system runs two testing suites. The first round is a simple test suite that only sums the gain it would get using those methods over the year to date. This doesn't mean that the highest here will be the best combination in the end, as commissions will bite you if you don't keep an eye on them. We collect only the sum of gains because of the MASSIVE amount of tests we have to run. This cuts down on days/weeks/months of computation time.
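
The cheap first pass described above can be sketched roughly as follows. This assumes long-only trades with one position at a time and per-share gains; `raw_gain` and its arguments are hypothetical names, not the system's code.

```python
from typing import Optional, Sequence


def raw_gain(prices: Sequence[float],
             open_signal: Sequence[bool],
             close_signal: Sequence[bool]) -> float:
    """Sum of per-share gains for long trades, ignoring commissions and sizing."""
    total = 0.0
    entry: Optional[float] = None  # price at which the current position was opened
    for price, do_open, do_close in zip(prices, open_signal, close_signal):
        if entry is None and do_open:
            entry = price
        elif entry is not None and do_close:
            total += price - entry
            entry = None
    return total


prices = [10.0, 11.0, 12.0, 11.0, 13.0]
opens = [True, False, False, True, False]
closes = [False, False, True, False, True]
print(raw_gain(prices, opens, closes))  # 4.0
```

Because nothing but a running sum is tracked, this is far cheaper per combination than a full replay with share counts and fees.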

So it will increment through the blocks of functions (two sets of combinations, one for open, one for close), typically by 1000: it will start testing at #1, then skip 1000 of them, test #1001, rinse and repeat until it has run through the whole list. (This is broken out across 5 machines around my condo, and multi-threaded to utilize every core minus 1 on each machine; I have roughly 40 cores working on this system.)
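
A rough sketch of that strided sampling, assuming every open/close combination can be addressed by a flat index. The helpers `decode`, `coarse_points`, and `shard` are hypothetical, the indices are 0-based rather than starting at #1, and the 50 x 50 grid is only an illustrative size.

```python
from typing import List, Sequence, Tuple


def decode(index: int, sizes: Sequence[int]) -> Tuple[int, ...]:
    """Map a flat combination index to one choice per slot (mixed radix)."""
    choice = []
    for size in reversed(sizes):
        index, digit = divmod(index, size)
        choice.append(digit)
    return tuple(reversed(choice))


def coarse_points(total: int, stride: int = 1000) -> range:
    """Every stride-th index below total: 0, stride, 2 * stride, ..."""
    return range(0, total, stride)


def shard(points: Sequence[int], worker: int, workers: int) -> List[int]:
    """Round-robin split of the sample points across workers."""
    return list(points[worker::workers])


sizes = (50, 50)           # assumed: 50 open methods x 50 close methods
total = sizes[0] * sizes[1]
for idx in coarse_points(total):
    print(idx, decode(idx, sizes))
```

Addressing combinations by index means each machine only needs its list of integers, not the combinations themselves.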

Once it has identified the best ~25% of raw gain increment points, it will then start incrementing forward and backward around those starting points as long as the raw gain is increasing and not showing a change in direction. (I use a mixture of a momentum formula and P SAR to determine the change in direction, as you will get some noise, and a quick change could make you miss gold on the other side because of a slight fluctuation.) So while all these are being run, the results get funneled back into the job queue for the second half of the testing suite.
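
The refinement step could be sketched like this. Note the swap: instead of the momentum and P SAR check mentioned above, this sketch uses a plain `patience` counter to tolerate small dips. `top_quarter`, `climb`, and the toy scores are hypothetical.

```python
from typing import Callable, Dict, List


def top_quarter(scores: Dict[int, float]) -> List[int]:
    """Indices of the best ~25% of coarse sample points (at least one)."""
    keep = max(1, len(scores) // 4)
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def climb(seed: int, score: Callable[[int], float], total: int,
          patience: int = 2) -> int:
    """Walk forward, then backward, from seed and return the best index seen.

    A direction is abandoned after `patience` consecutive steps without a
    new best, so a single noisy dip does not end the search.
    """
    best_idx, best = seed, score(seed)
    for step in (1, -1):
        idx, misses = seed + step, 0
        while 0 <= idx < total and misses < patience:
            value = score(idx)
            if value > best:
                best_idx, best, misses = idx, value, 0
            else:
                misses += 1
            idx += step
    return best_idx


toy = {1000: 1.0, 1001: 2.0, 1002: 1.5, 1003: 3.0, 1004: 2.5, 1005: 2.0}
print(climb(1000, lambda i: toy.get(i, 0.0), total=2500))  # 1003
```

In the toy data the dip at 1002 would stop a strict "stop at first decrease" walk before it reaches the better point at 1003.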

This is also distributed across all the machines.

The second half then runs the full test suite on each combination of methods to recreate the market verbatim for the year to date and determine everything: gains, share quantity, P/L, commissions spent, highest equity, etc.
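
A minimal sketch of such a full replay, assuming long-only trading, whole shares, and a flat fee per order. The `Report` fields, the starting cash, and the $5 commission are illustrative assumptions, not figures from the system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Report:
    net_gain: float = 0.0
    commissions: float = 0.0
    trades: int = 0          # completed round trips
    peak_equity: float = 0.0


def backtest(prices: Sequence[float],
             open_signal: Sequence[bool],
             close_signal: Sequence[bool],
             cash: float = 10_000.0,
             commission: float = 5.0) -> Report:
    """Replay the signals bar by bar, tracking cash, shares, and fees."""
    start, shares = cash, 0
    report = Report(peak_equity=cash)
    for price, do_open, do_close in zip(prices, open_signal, close_signal):
        if shares == 0 and do_open:
            qty = int((cash - commission) // price)  # whole shares we can afford
            if qty > 0:
                cash -= qty * price + commission
                shares = qty
                report.commissions += commission
        elif shares > 0 and do_close:
            cash += shares * price - commission
            shares = 0
            report.commissions += commission
            report.trades += 1
        report.peak_equity = max(report.peak_equity, cash + shares * price)
    # Any position still open is valued at the last price.
    final_equity = cash + (shares * prices[-1] if prices else 0.0)
    report.net_gain = final_equity - start
    return report


prices = [10.0, 11.0, 12.0, 11.0, 13.0]
opens = [True, False, False, True, False]
closes = [False, False, True, False, True]
print(backtest(prices, opens, closes, cash=1000.0))
```

Sorting a list of such reports by `net_gain` gives the ranking after commissions.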

Finally all this data is put back into the job queue once again to be sorted to find the highest net gains after commissions.

The winning methods are finally stored in the datastore to be accessed by the actual stock trading platform that will use this data during trading hours to execute trades accordingly.

So from start to finish, the application decides which tests it wants to run, determines the best strategy to use, and executes the strategies on its own.

I tried to keep this a very basic explanation; there are a lot of other things that go on to make this all work flawlessly. I do want to thank 0MQ and Gearman for playing vital roles in work distribution and message queuing amongst the worker threads.

I highly recommend that anyone who ever wants to truly learn how to scale an application build an algo trading system on limited hardware and try to squeeze EVERY millisecond out of it. (I've rebuilt it from the ground up numerous times.)

Actually typing this out makes me see some similarities to a map and reduce method in some ways as well.
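
The map and reduce shape can be sketched in a few lines. This swaps in a local thread pool from the standard library in place of the 0MQ and Gearman setup, purely to show the pattern; `best_candidate` is a hypothetical name.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


def best_candidate(candidates: Iterable[int],
                   score: Callable[[int], float]) -> Tuple[int, float]:
    """Map: score every candidate. Reduce: keep the pair with the highest score."""
    items = list(candidates)
    workers = max(1, (os.cpu_count() or 2) - 1)  # leave one core free
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = zip(items, pool.map(score, items))
        return max(scored, key=lambda pair: pair[1])


print(best_candidate(range(10), lambda i: -(i - 7) ** 2))  # (7, 0)
```

For CPU-bound scoring in CPython, a process pool or separate worker machines would be the more realistic choice, since threads share one interpreter lock.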

Awesome. It sounds like you are running some kind of stochastic optimization algorithm on the data points and also doing so across a cluster! Are you eventually gonna leverage GPUs on the machines as well? Also, do you have a blog or something?

>I tried to keep this a very basic explanation; there are a lot of other things that go on to make this all work flawlessly.

nope, was more than enough. Don't want you to give away your secret sauce :D. Is the system in production yet?

I have been tinkering with PyCUDA and OpenCL. I will have to convert all my algorithms to kernel code for CUDA, but it shows VERY promising results. I can see the application being run across GPUs to keep costs low. I have a handful of nice nVidia cards with ~200 cores each.

I tried to write up a little about Edward and his backend process a while back on my personal website, but I find that it changes too much currently to keep anything on paper. I code more than I can write about the code ;]

The system is being run against other virtual systems now. Everything has been trending almost exactly along my estimates, so I'm very excited to get it started. The main testing ground is Optionshouse.com, whose interface is nice. All interactions are XML requests, so I've been able to easily write an API that is VERY solid and does everything I need.

I'm moving the system this week so that I can get it into a more highly available setup. After that I will begin working it against my personal IRA account to see how it goes on the long side at least.

Also interested. I have wanted to try GAs with this same general approach for a while, but I have a trading background, not a coding one. logicwins atthe gmail dot com.
I'll shoot you an email today.
Hey I'm interested in this and have tried to code some stuff in C# for options trading and using candlestick patterns, support/resistance, etc. I'm now sort of looking into options pinning on expiration and exploiting that - just need to find a good cheap source for options data. Anyhow can you send me an email - maybe we can help each other out? carbtrader @ gmail dot com