Hacker News
by a904guy 5677 days ago
Absolutely

So the system is built around the ~50 different indicators and oscillators that I've found the formulas for generating.

These "methods" are variable driven, meaning each method can have 1 to 4 variables ranging from 0.0001 to 260.
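A minimal sketch of how one such variable-driven method could be represented. The `Method` class, its field names, and the example indicator names are hypothetical; only the 1-to-4 variable count and the 0.0001 to 260 range come from the post.

```python
from dataclasses import dataclass
from typing import Tuple

PARAM_MIN = 0.0001   # lower bound quoted in the post
PARAM_MAX = 260.0    # upper bound quoted in the post


@dataclass(frozen=True)
class Method:
    """One indicator/oscillator plus the variables that drive it (hypothetical layout)."""
    name: str
    params: Tuple[float, ...]

    def __post_init__(self):
        # Enforce the "1 to 4 variables" rule.
        if not 1 <= len(self.params) <= 4:
            raise ValueError("a method takes 1 to 4 variables")
        # Enforce the quoted value range for every variable.
        for p in self.params:
            if not PARAM_MIN <= p <= PARAM_MAX:
                raise ValueError(f"parameter {p} is outside [{PARAM_MIN}, {PARAM_MAX}]")


# Illustrative instances; the names and values are made up.
sma = Method("sma", (20.0,))
macd = Method("macd", (12.0, 26.0, 9.0))
```

Making the class frozen keeps instances hashable, so they could be used as keys when caching test results.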

The current testing platform, which the system manages itself, has the end goal of finding the highest gain after commissions using a combination of any of these methods for opens and closes.

Considering the __massive__ number of possible combinations of these methods, the system runs two testing suites. The first round is a simple test suite that only sums the gain it would get using those methods over the last YTD. The highest raw gain here won't necessarily be the best combination in the end, as commissions will bite you if you don't keep an eye on them. We collect only the sum of gains because of the MASSIVE number of tests we have to run; this cuts days/weeks/months off the computation time.
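To illustrate why the cheap first pass can rank combinations differently from the final result, here is a rough sketch. The function names, the flat per-order commission, and the price data are all assumptions made for the example, not the author's code.

```python
def raw_gain(prices, trades):
    """Sum of per-share price moves over (open_index, close_index) pairs.

    This is the cheap first-pass score: no commissions, no position sizing.
    """
    return sum(prices[close] - prices[open_] for open_, close in trades)


def net_gain(prices, trades, commission_per_order):
    """Raw gain minus a flat commission (assumed) on each open and each close."""
    return raw_gain(prices, trades) - 2 * commission_per_order * len(trades)


# Made-up prices and two made-up strategies.
prices = [10.0, 10.5, 10.2, 11.0, 10.8, 11.5]
frequent = [(0, 1), (2, 3), (4, 5)]   # three short trades
patient = [(0, 5)]                    # one long trade

print(raw_gain(prices, frequent), raw_gain(prices, patient))
print(net_gain(prices, frequent, 0.2), net_gain(prices, patient, 0.2))
```

With these numbers the frequent trader wins on raw gain but loses once each order costs 0.2 per share, which is the effect the post warns about.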

So it increments through the blocks of functions (two sets of combinations, one for open, one for close), typically by 1000: it starts testing at #1, then skips 1000 of them, tests #1001, and rinses and repeats until it has run through the whole list. (This is broken out across 5 machines around my condo, and multithreaded to use every core minus 1 on each machine; I have roughly 40 cores working on this system.)
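The strided first pass could look something like the sketch below. It assumes the combinations are addressed by a flat 0-based index (so the post's #1, #1001 become 0, 1000), and `coarse_indices` and `combo_at` are hypothetical helper names.

```python
def coarse_indices(total, stride=1000):
    """0-based indices of the combinations tested in the coarse pass."""
    return range(0, total, stride)


def combo_at(index, open_methods, close_methods):
    """Map a flat index to one (open, close) pair without building the full list."""
    open_i, close_i = divmod(index, len(close_methods))
    return open_methods[open_i], close_methods[close_i]


# Tiny illustrative method lists; the real ones would be far larger.
opens = ["a", "b", "c"]
closes = ["x", "y"]
total = len(opens) * len(closes)

for i in coarse_indices(total, stride=2):
    print(i, combo_at(i, opens, closes))
```

Computing the pair from the index on demand avoids materialising every combination in memory, which matters when the list is as large as described.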

Once it has identified the best ~25% of raw gain increment points, it starts incrementing forward and backward around those starting points for as long as the raw gain is increasing and not showing a change in direction. (I use a mixture of a momentum formula and Parabolic SAR to determine the change in direction, since you will get some noise, and reacting to a slight fluctuation could make you miss gold on the other side.) While all of these are being run, the results get funneled back into the job queue for the second half of the testing suite.
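The refinement step reads like a noise-tolerant local search. The sketch below swaps in a simple `patience` counter for the momentum/Parabolic SAR check described in the post, so it is only a stand-in for the idea; `top_quartile`, `climb`, and the score list are hypothetical.

```python
def top_quartile(scored):
    """Keep the best ~25% of (index, raw_gain) pairs, and at least one."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:max(1, len(ranked) // 4)]


def climb(start, score, lo, hi, patience=3):
    """Walk right, then left, from start while the score keeps improving.

    patience is how many non-improving steps are tolerated in one direction
    before giving up. It is a simplified substitute for the post's
    momentum/Parabolic SAR direction check.
    """
    best_i, best = start, score(start)
    for step in (1, -1):
        i, misses = start + step, 0
        while lo <= i <= hi and misses < patience:
            s = score(i)
            if s > best:
                best_i, best, misses = i, s, 0
            else:
                misses += 1
            i += step
    return best_i, best


# Made-up raw gains with a small dip (indices 4-5) before the real peak at 6.
gains = [0, 1, 2, 3, 2.5, 2.8, 5, 4, 3, 2, 1]
score = gains.__getitem__

print(climb(2, score, 0, len(gains) - 1, patience=3))
print(climb(2, score, 0, len(gains) - 1, patience=1))
```

With `patience=1` the search stops at the dip and never reaches the higher peak on the other side, which is the failure mode the direction check is meant to avoid.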

This is also distributed across all the machines.

The second half then runs the full test suite on each combination of methods to recreate the market verbatim for the last YTD and determine everything: gains, share quantity, P/L, commissions spent, highest equity, etc.
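A heavily simplified sketch of what the fuller second-pass report might collect. It assumes long-only trades, a fixed share count, and a flat per-order commission; the `backtest` function and its dictionary keys are hypothetical, and a real replay of the market would track far more.

```python
def backtest(prices, trades, shares, commission_per_order, starting_cash):
    """Replay (open_index, close_index) long trades and collect summary stats."""
    cash = starting_cash
    peak_equity = starting_cash
    commissions = 0.0
    gross = 0.0
    for open_i, close_i in trades:
        pnl = (prices[close_i] - prices[open_i]) * shares
        fees = 2 * commission_per_order      # one order to open, one to close
        gross += pnl
        commissions += fees
        cash += pnl - fees
        # Equity is only sampled after each closed trade in this sketch.
        peak_equity = max(peak_equity, cash)
    return {
        "gross_pl": gross,
        "commissions": commissions,
        "net_pl": gross - commissions,
        "shares_per_trade": shares,
        "peak_equity": peak_equity,
    }


# Made-up data for illustration.
prices = [10.0, 10.5, 10.2, 11.0, 10.8, 11.5]
report = backtest(prices, [(0, 1), (2, 3), (4, 5)], shares=100,
                  commission_per_order=5.0, starting_cash=10_000.0)
print(report)

# Ranking candidates by net P/L after commissions, as the final sort would.
candidates = {"frequent": report,
              "patient": backtest(prices, [(0, 5)], 100, 5.0, 10_000.0)}
ranked = sorted(candidates, key=lambda k: candidates[k]["net_pl"], reverse=True)
print(ranked)
```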

Finally all this data is put back into the job queue once again to be sorted to find the highest net gains after commissions.

The winning methods are finally stored in the datastore to be accessed by the actual stock trading platform that will use this data during trading hours to execute trades accordingly.

So from start to finish, the application manages which tests it wants to run, determines the best strategy to use, and executes the strategies on its own.

I tried to keep this a very basic explanation; there are a lot of other things that go on to make this all work flawlessly. I do want to thank 0MQ and Gearman for playing vital roles in work distribution and message queuing amongst the worker threads.

I do highly recommend that anyone who truly wants to learn how to scale an application build an algo trading system on limited hardware and try to squeeze EVERY millisecond they can out of it. (I've rebuilt it from the ground up numerous times.)

Actually, typing this out makes me see some similarities to a map-and-reduce approach in some ways as well.
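The map-and-reduce resemblance can be sketched as follows: score every combination independently (map), then pick the best one (reduce). The `evaluate` function is a hypothetical stand-in for a real backtest. A thread pool is used only to keep the example self-contained; for CPU-bound pure-Python work, threads in CPython would not give real parallelism, so separate processes or a job queue such as the Gearman workers mentioned above would do the actual distribution.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def evaluate(combo_id):
    """Hypothetical stand-in for one backtest; returns (combo_id, net_gain)."""
    return combo_id, -(combo_id - 7) ** 2


def best_combo(combo_ids):
    # "Every core minus 1", as described in the post.
    workers = max(1, (os.cpu_count() or 2) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(evaluate, combo_ids)          # map step
        return max(results, key=lambda r: r[1])          # reduce step


print(best_combo(range(20)))
```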

2 comments

Awesome. It sounds like you are running some kind of stochastic optimization algorithm on the data points and also doing so across a cluster! Are you eventually gonna leverage GPUs on the machines as well? Also, do you have a blog or something?

>I tried to keep this a very basic explanation; there are a lot of other things that go on to make this all work flawlessly.

nope, was more than enough. Don't want you to give away your secret sauce :D. Is the system in production yet?

I have been tinkering with PyCUDA and OpenCL. I will have to convert all my algorithms to kernel code for CUDA, but it shows VERY promising results. I can see the application being run across GPUs to keep costs low. I have a handful of nice NVIDIA cards with ~200 cores each.

I tried to write up a little about Edward and his backend process a while back on my personal website, but I find that it changes too much currently to keep anything on paper. I code more than I can write about the code ;]

The system is being run against other virtual systems now. Everything has been trending almost exactly along my estimates, so I'm very excited to get it started. The main testing ground is my Optionshouse.com account; their interface is nice. All interactions are XML requests, so I've been able to easily write an API that is VERY solid and does everything I need.

I'm moving the system this week so that I can get it into a more highly available setup. After that I will begin working it against my personal IRA account to see how it goes on the long side at least.

Also interested. I have wanted to try GAs with this same general approach for a while, but I have a trading background, not a coding one. logicwins atthe gmail dot com.
I'll shoot you an email today.
Hey I'm interested in this and have tried to code some stuff in C# for options trading and using candlestick patterns, support/resistance, etc. I'm now sort of looking into options pinning on expiration and exploiting that - just need to find a good cheap source for options data. Anyhow can you send me an email - maybe we can help each other out? carbtrader @ gmail dot com