| HN Mirror



	by CodiePetersen 2501 days ago
	Yeah, that's a good example. Take AlphaGo, for instance: without the DL network, a tabular Q-learner would be massive, and that's just for a board game. Imagine trying to do that for a system like moving human body parts. Every single body configuration would be a state, and then you have every possible action from each state.

1 comments

jsjolen 2501 days ago

The table becomes unwieldy on much simpler tasks than that.

Consider a 3x3 board where each cell holds 3 bits of information (each cell can be in 2^3 states). Then for the board you have (2^3)^9 = 2^27 different states.
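As a quick check of that count, here is a minimal Python sketch. The constant names are made up for illustration; the only inputs are the numbers in the comment above (9 cells, 3 bits per cell).

```python
# Count the distinct configurations of a 3x3 board where each cell
# holds 3 bits of information. All names here are illustrative.
CELLS = 3 * 3                  # 9 cells on the board
BITS_PER_CELL = 3              # each cell stores 3 bits

states_per_cell = 2 ** BITS_PER_CELL      # 8 possible values per cell
num_states = states_per_cell ** CELLS     # (2^3)^9 = 2^27

print(states_per_cell)   # 8
print(num_states)        # 134217728
```

So the board alone has roughly 134 million distinct states before any actions are considered.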


CodiePetersen 2501 days ago

Then multiply that by the number of actions per state. Suppose 9, because you can only change one tile at a time. Then multiply by 4 bytes, assuming we use a float instead of a double, and you get 2^27 x 9 x 4 = 4,831,838,208 bytes, or about 4.8 GB of memory, for whatever this simple problem is.
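The same arithmetic as a small Python sketch. The function name `q_table_bytes` and its parameters are hypothetical, and it assumes a dense table with one 4-byte float per (state, action) pair, as in the comment above.

```python
def q_table_bytes(num_states: int, num_actions: int, bytes_per_value: int = 4) -> int:
    """Memory for a dense Q-table: one value per (state, action) pair."""
    return num_states * num_actions * bytes_per_value


# 2^27 board states, 9 actions per state, 4-byte floats (assumed).
size = q_table_bytes(2 ** 27, 9)

print(size)                      # 4831838208
print(f"{size / 1e9:.2f} GB")    # 4.83 GB  (decimal gigabytes)
print(f"{size / 2**30:.2f} GiB") # 4.50 GiB (binary gibibytes)
```

Switching to 8-byte doubles would simply double the figure, since the table size is linear in the bytes per value.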


wannabesrevenge 2501 days ago

You mathed wrong. A 3x3 board with 8 states per cell is 72 total states: 9x8, not 8^9.

Edit: I just reconsidered. Unless you mean that the state is the combination of all the cells, in which case you are right.


CodiePetersen 2501 days ago

Yes, the state is the entirety of the board's configuration.
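To make the distinction concrete, here is a rough Python sketch of one way to turn a whole board configuration into a single table index, by reading the 9 cell values as digits of a base-8 number. The function `board_to_state` is a hypothetical helper for illustration, not something from the thread.

```python
def board_to_state(cells, states_per_cell=8):
    """Encode a full board (a sequence of cell values) as one integer index.

    Each cell value is treated as a digit in base `states_per_cell`,
    so every distinct board configuration maps to a distinct index.
    """
    index = 0
    for value in cells:
        if not 0 <= value < states_per_cell:
            raise ValueError(f"cell value {value} out of range")
        index = index * states_per_cell + value
    return index


lowest = board_to_state([0] * 9)    # all cells at their minimum value
highest = board_to_state([7] * 9)   # all cells at their maximum value

print(lowest)        # 0
print(highest)       # 134217727, i.e. 8**9 - 1
print(9 * 8)         # 72: the per-cell values added up, not the number of boards
```

The indices run from 0 to 8^9 - 1, which is why the table needs 8^9 rows rather than 9x8.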
