How Does Viziware Work?

Viziware requires only two standard low-cost cameras. Our continuous alignment algorithm doesn't need a specific light source, only enough ambient light to capture images. No predefined body model is used, which allows nearly unlimited gesture scenarios. Each camera detects reference points, and each point is matched to the corresponding point as seen from the other camera. Even if the cameras aren't precisely aligned, an algorithm automatically aligns the images, and distances are then calculated using triangulation. The cost of such a stereo camera system can thus be kept low.
A gesture is the motion of a human limb. The Viziware system directly locates several reference points on the limb. Only a few moving points are needed to confidently detect limb motion without the use of pose models, so the system can detect gestures directly.
Automatic camera alignment, distance measurement using the stereo vision principle, motion analysis, and gesture recognition by tracking swarms of features: together, these define the Viziware technology.

Automatic Alignment

When similar features are found in the images produced by the two cameras of a Viziware system, a correlation vector is determined for each matching pair. If both cameras have the same pitch and tilt angles (rotation around the horizontal and optical axes), these correlation vectors are horizontal. The angular deviation of the correlation vectors from the horizon therefore reflects the rotation of each camera relative to the other.
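
As a minimal sketch of this idea, assuming matched feature co-ordinates from the left and right images are already available as NumPy arrays (the function name and inputs are illustrative, not Viziware's actual API):

    import numpy as np

    def estimate_relative_roll(left_pts, right_pts):
        # left_pts, right_pts: N x 2 arrays of matched (x, y) pixel co-ordinates.
        vectors = right_pts - left_pts  # correlation vectors between the views
        # Angle of each vector's line to the horizon; arctan of the slope is
        # used so that the direction (sign) of the disparity doesn't matter.
        angles = np.arctan(vectors[:, 1] / vectors[:, 0])
        # For aligned cameras these angles are near zero, so their typical
        # deviation from zero estimates the relative rotation; the median
        # suppresses the influence of mismatched features.
        return np.median(angles)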

Distance Measurement

Distance is measured using the classic triangulation method. The offset (disparity) between a reference point's 2D co-ordinates in the two stereo images is inversely proportional to its distance in 3D space. A reference point's 2D co-ordinates directly reflect its direction in 3D space, relative to the camera's optical axis. The reference point's 3D co-ordinates can thus be determined.
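
The following sketch assumes a rectified stereo pair and a simple pinhole model; the focal length, baseline, and all names are illustrative assumptions, not Viziware parameters. Pixel co-ordinates are measured from the image centre (principal point).

    import numpy as np

    def triangulate(x_left, x_right, y, focal_px, baseline_m):
        # x_left, x_right: horizontal pixel co-ordinates of the same reference
        # point in the left and right images; y: its vertical pixel co-ordinate;
        # focal_px: focal length in pixels; baseline_m: camera separation in metres.
        disparity = x_left - x_right           # offset between the stereo images
        z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
        x = x_left * z / focal_px              # the pixel position gives the direction,
        y3d = y * z / focal_px                 # which, scaled by depth, gives 3D co-ordinates
        return np.array([x, y3d, z])

For example, with a 700-pixel focal length and a 0.10 m baseline, a disparity of 35 pixels corresponds to a depth of 700 * 0.10 / 35 = 2.0 m.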

Motion Analysis

If similar features are found in two succeeding frames, their correlation vector is equivalent to their optical flow. Knowing a feature's 3D position in the first and second frames, the difference between these two sets of co-ordinates, divided by the time elapsed between the frames, gives the 3D velocity of that feature.
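
A minimal sketch, assuming the feature's 3D co-ordinates in each frame have already been obtained by triangulation (names and values are illustrative):

    import numpy as np

    def feature_velocity(pos_prev, pos_curr, dt):
        # pos_prev, pos_curr: 3D co-ordinates of the same feature in two
        # succeeding frames; dt: time elapsed between the frames, in seconds.
        return (np.asarray(pos_curr) - np.asarray(pos_prev)) / dt

    # At 30 frames per second, a feature moving from (0.10, 0.25, 1.00) to
    # (0.13, 0.25, 0.98) has a velocity of roughly (0.9, 0.0, -0.6) m/s:
    v = feature_velocity([0.10, 0.25, 1.00], [0.13, 0.25, 0.98], 1 / 30)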

Gesture Recognition

Neighbouring features with similar velocities form a swarm of features. Typical velocity characteristics of swarms within a certain size range are interpreted as a gesture. In the end, it doesn't really matter which limb the swarm represents, or even whether it is a body part at all (for example, gesturing with a pencil). All that counts are the velocity characteristics.
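
The grouping step could look like the following greedy clustering sketch; the thresholds are assumed values for illustration, not Viziware's published criteria:

    import numpy as np

    def find_swarms(positions, velocities, pos_tol=0.10, vel_tol=0.05):
        # positions, velocities: N x 3 arrays of feature co-ordinates (m) and
        # velocities (m/s); pos_tol and vel_tol are illustrative thresholds.
        n = len(positions)
        labels = -np.ones(n, dtype=int)  # -1 means "not yet in a swarm"
        swarm = 0
        for seed in range(n):
            if labels[seed] >= 0:
                continue
            labels[seed] = swarm
            grew = True
            while grew:  # grow the swarm until no more features join
                grew = False
                members = np.flatnonzero(labels == swarm)
                for j in np.flatnonzero(labels < 0):
                    dp = np.linalg.norm(positions[j] - positions[members], axis=1)
                    dv = np.linalg.norm(velocities[j] - velocities[members], axis=1)
                    # a feature joins if it is close to some member in both
                    # position and velocity
                    if np.any((dp < pos_tol) & (dv < vel_tol)):
                        labels[j] = swarm
                        grew = True
            swarm += 1
        return labels  # swarm index for each feature

A gesture classifier would then look only at each swarm's size and the time course of its velocity, consistent with the point above that the limb's identity is irrelevant.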