Taken from the Intel site:
"Devices with Intel RealSense Camera have three lenses: a conventional camera, an infrared camera, and an infrared laser projector. The three lenses allow the device to infer depth by detecting infrared light that has bounced back from objects in front of it.
This visual data, taken in combination with Intel RealSense motion-tracking software, create a touch-free interface that responds to hand, arm, and head motions as well as facial expressions."
They never state which facial expressions and head motions can actually be detected.