Convolutional Neural Networks have driven performance improvements across many areas of computer vision in recent years, but their performance on Handshape Recognition in the context of Sign Language Recognition has not been thoroughly studied. We evaluated several convolutional architectures to determine their applicability to this problem.
Using the LSA16 and RWTH-PHOENIX-Weather handshape datasets, we performed experiments with the LeNet, VGG16, ResNet-34 and All Convolutional architectures, as well as Inception trained both from scratch and via transfer learning, and compared them to the state of the art on these datasets. We included a feedforward neural network as a baseline, and also explored various preprocessing schemes to analyze their impact on recognition accuracy.
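As a rough illustration of the transfer-learning setup, the following is a minimal Keras sketch: the Inception backbone pre-trained on ImageNet is the technique named above, but the input size, frozen base, classification head, and optimizer are our assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of transfer learning with InceptionV3 in Keras.
# NUM_CLASSES = 16 matches the 16 handshape classes of LSA16; everything
# else (input size, head, optimizer) is an assumed configuration.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

NUM_CLASSES = 16

# Load Inception pre-trained on ImageNet, dropping its classification head.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(139, 139, 3))
base.trainable = False  # freeze the convolutional base; train only the head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```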
We found that while all models performed reasonably well on both datasets (with accuracy comparable to hand-engineered methods), VGG16 produced the best results, closely followed by the traditional LeNet architecture.
In addition, pre-segmenting the hands from the background provided a substantial improvement in accuracy.
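One common way such pre-segmentation can be performed is skin-color thresholding; the sketch below is illustrative only, and the HSV bounds are assumed values rather than the segmentation procedure actually used in our experiments.

```python
# Illustrative sketch: segmenting a hand from the background via
# skin-color thresholding in HSV space with OpenCV. The threshold
# values are rough assumptions, not the paper's method.
import cv2
import numpy as np

def segment_hand(image_bgr: np.ndarray) -> np.ndarray:
    """Zero out non-skin pixels, keeping only the approximate hand region."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)    # assumed skin-tone bounds
    upper = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Clean small holes and speckle noise with morphological open/close.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
```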