ewjordan
Ex Member
Re: pattern recognition and OCR
Reply #9 - Jun 19th , 2007, 10:53pm
First, I agree with all of st33d's posts 100% - neural nets are no magic bullet here, but they can help as a fallback if you can't figure the character out using other means.

To understand why higher-resolution data is difficult to use with neural networks, consider this: a neural net with M input nodes, N intermediate nodes, and O output nodes has MxN + NxO connection weights to train. That's a whole crapload of parameters to vary, and the more parameters you have to set, the more independent input data you need. The problem? Think of polynomial curves - an N-th degree polynomial can exactly fit N+1 data points, but the resulting polynomial will tell you nothing at all about the underlying pattern in the points. This phenomenon is known as overfitting, and it's especially prevalent in neural networks because the number of connections is so huge.

Doubling the resolution of an image multiplies the number of pixels by 4, which quickly pushes up the number of input nodes. To get anything out of those, you'll likely have to increase the number of hidden nodes too, so the number of connections to train grows VERY quickly. To get meaningful results, you're then forced to increase the training set even more dramatically, and if I recall correctly, the total training time increases even faster than any of these values. So that's why it's not easy to scale - you're dealing with an O(N^p) process where p is some fairly large number.

From what I remember, Kohonen or RBF nets are sometimes well suited to classification tasks, and I think they scale a bit better than your standard feedforward net, so you might have a look there. Also, if you need to take high-res input, consider filtering it down to a lower, more usable resolution first, perhaps several times in several different ways.
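To make the weight-count arithmetic concrete, here's a quick sketch (mine, not from the original post) of how the connection count blows up as resolution doubles. It assumes a single hidden layer whose size is scaled with the input - that scaling rule is just an illustrative assumption, as is the choice of 26 output nodes:

```python
def weight_count(m, n, o):
    """Connection weights in an M-N-O feedforward net: MxN + NxO (biases ignored)."""
    return m * n + n * o

OUTPUTS = 26  # say, one output node per letter (assumption for illustration)

for side in (8, 16, 32, 64):
    inputs = side * side      # doubling resolution quadruples the pixel count
    hidden = inputs // 2      # hidden layer scaled with the input (assumption)
    print(f"{side}x{side}: {weight_count(inputs, hidden, OUTPUTS)} weights")
```

Each doubling of the image side multiplies the weight count by roughly 16 here (4x from the inputs, 4x from the proportionally scaled hidden layer), which is the "VERY quickly" part in practice.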
If your classifier gives the same answer on each different filtering, you can probably be fairly confident that it's correct; if the predicted character differs between filterings, that's a sign the confidence level is fairly low.

One other thing I would recommend - if you're dealing exclusively with English text, consider recognizing characters in context. For instance, if the last three letters were 't', 'h', and 'i', there's a much higher probability that the next character is 's' than 'a', given the 4-character probabilities for English text (though if the next letter is 'm', you might have "thiamine," so the possibility of 'a' has to be kept open...). You can often make do with a less accurate character recognizer if you employ tricks like that to post-filter your strings, though robust implementations of these tricks can be difficult.
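The context trick above can be sketched like this (my illustration, not code from the post): multiply the classifier's per-character scores by 4-gram conditional probabilities and pick the best combined score. The tiny probability table and the classifier scores below are made up; a real table would be estimated from an English corpus:

```python
# P(next letter | previous three letters) - illustrative made-up values
NGRAM = {
    ("t", "h", "i"): {"s": 0.55, "n": 0.25, "a": 0.05, "m": 0.02},
}

def rescore(prev3, classifier_scores):
    """Combine classifier scores with 4-gram context; return the best character."""
    context = NGRAM.get(tuple(prev3), {})
    combined = {
        ch: score * context.get(ch, 0.01)  # small floor keeps rare letters open
        for ch, score in classifier_scores.items()
    }
    return max(combined, key=combined.get)

# An ambiguous reading where the raw classifier slightly favors 'a':
scores = {"s": 0.40, "a": 0.45}
print(rescore("thi", scores))  # context flips the decision to 's'
```

The small floor probability is the "keep 'a' open" part - without it, any character missing from the table would be ruled out entirely, which is exactly the kind of brittleness that makes robust versions of this trick hard.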