awm, on 2016-March-15, 16:29, said:
I somewhat disagree. I realize this wasn't the point you were making, but assuming that this approach is generalizable to all sorts of problems (Mr. Hassabis' favourite example being "healthcare") seems dangerous to me.
1. The way the two neural networks are connected is already an encoding of expert knowledge, because it mirrors exactly what an expert Go player does: read out a few key variations and judge the board at the leaf positions (see the first sketch after this list). While Google claims that even the Policy Network alone plays moderately well, a policy-only program would obviously not have been impressive enough to create all this hype.
2. Google found that a mixed evaluation between the Value Network and Monte Carlo rollouts was more successful than the Value Network alone - this is the blend shown in the first sketch below. The rollouts, at a minimum, encode human expert knowledge about "eyeshape" to make sure that a playout comes to a reasonable end rather than the players suiciding all their stones. Previous state-of-the-art programs included even more hand-crafted patterns in their Monte Carlo playouts, and AlphaGo may do this as well.
3. The training data builds on 1000+ years of accumulated human knowledge. Consider that, for the very first move of the game, there are 55 possibilities after accounting for the board's symmetry (verified in the second sketch below). Only 2 of those are commonly played in professional games, with a further 2-4 considered potentially viable. So far AlphaGo has not deviated from those top 2 moves. While it would be very exciting for Go players to see what AlphaGo came up with if trained purely on self-play, there is no guarantee that this would produce a strong program in a reasonable timeframe.
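To make points 1 and 2 concrete, here is a toy sketch (not AlphaGo's actual MCTS) of that structure: the policy network proposes a handful of candidate moves to "read", and each leaf is scored by mixing the value network with a Monte Carlo rollout. `policy_net`, `value_net`, `play`, and `rollout` are hypothetical stand-ins for the real components; the 0.5 mixing weight is the one reported in the Nature paper.

```python
LAMBDA = 0.5  # AlphaGo's reported mixing weight between rollout and value net

def evaluate_leaf(position):
    """Blend the two evaluators: (1 - lambda) * value_net + lambda * rollout."""
    v = value_net(position)   # learned positional judgement, in [-1, 1]
    z = rollout(position)     # fast playout to the end of the game, in {-1, +1}
    return (1 - LAMBDA) * v + LAMBDA * z

def read_variation(position, depth, branching=3):
    """Read a few policy-suggested candidate moves, evaluating the leaves."""
    if depth == 0:
        return evaluate_leaf(position)
    candidates = policy_net(position)[:branching]  # top moves only, like a human
    # Negamax: the best reply for the opponent is the worst outcome for us.
    return max(-read_variation(play(position, move), depth - 1)
               for move in candidates)
```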
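And the "55" in point 3 is easy to check for yourself: count the orbits of the 361 points of a 19x19 board under its 8 symmetries (4 rotations times 2 reflections).

```python
N = 19

def transforms(x, y):
    """All 8 dihedral images of the point (x, y) on an N x N board."""
    pts = set()
    for _ in range(4):
        x, y = y, N - 1 - x   # rotate 90 degrees
        pts.add((x, y))
        pts.add((y, x))       # reflect the rotated point across the diagonal
    return pts

# Canonicalize every point to the smallest of its 8 images, then count.
distinct = {min(transforms(x, y)) for x in range(N) for y in range(N)}
print(len(distinct))  # 55
```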
Most importantly, it seems to me that the parametrization of the neural network - above all the choice of input features - is very significant. Note that the inputs to AlphaGo's networks include whether a stone/group can be captured in a ladder. This is pretty Go-specific, obviously! And Go is a very well-bounded problem - finding the right parametrization of a neural network for a fuzzier problem is going to be quite nontrivial.
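As a rough illustration of what that parametrization looks like (my guess at the shape, not the actual 48-plane feature set from the Nature paper): the network is not fed a raw board but a stack of binary 19x19 feature planes, one of which explicitly marks ladder status. `is_ladder_capture` is a hypothetical helper standing in for the Go-specific ladder reading.

```python
import numpy as np

N = 19

def encode(board, to_move):
    """Encode a position as binary feature planes; board holds +1/-1/0."""
    planes = np.zeros((4, N, N), dtype=np.float32)
    for x in range(N):
        for y in range(N):
            stone = board[x][y]
            if stone == to_move:
                planes[0, x, y] = 1.0   # own stones
            elif stone == -to_move:
                planes[1, x, y] = 1.0   # opponent stones
            else:
                planes[2, x, y] = 1.0   # empty points
            if stone != 0 and is_ladder_capture(board, x, y):
                planes[3, x, y] = 1.0   # hand-built, Go-specific ladder knowledge
    return planes
```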