Autonomous underwater vehicles (AUVs) operating in cooperation with human divers are often used for marine inspection, surveillance, manipulation, and navigation. In each of these scenarios, it is highly likely that a robot will be directed to navigate along a particular direction or to inspect a specific object. This project leverages pointing, a natural form of human communication for indicating location and interest, to direct an AUV to a location or object of interest.
In this project, we present the Diver Interest via Pointing (DIP) algorithm, a highly modular method for conveying a diver's area of interest to an AUV using pointing gestures. DIP uses a single monocular camera and exploits human body pose, even when the diver is wearing full dive gear, to extract underwater pointing gesture poses and their directions. By extracting 2D scene geometry from the diver's body pose and computing the density of salient feature points, obtained with a low-level feature detector, along the pointing direction, the DIP algorithm is able to locate objects of interest as indicated by the diver.
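The core geometric idea can be illustrated with a minimal sketch. The following is not the DIP implementation itself but an assumed simplification: a pointing ray is formed from two 2D arm keypoints (here, elbow to wrist), and the object of interest is taken to be the densest cluster of salient feature points along that ray. In a real pipeline the keypoints would come from a pose estimator and the features from a low-level detector (e.g. FAST or ORB); here both are supplied as plain 2D coordinates, and the `band` and `window` thresholds are illustrative values.

```python
import math

def pointing_ray(elbow, wrist):
    """Unit direction of the pointing ray from elbow through wrist (2D image coords)."""
    dx, dy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm)

def locate_interest(wrist, direction, features, band=15.0, window=40.0):
    """Return the feature-density peak along the ray: the midpoint of the
    sliding window (of length `window` pixels along the ray) that contains
    the most features lying within `band` pixels of the ray."""
    # Keep only features near the ray, recording their along-ray coordinate t.
    on_ray = []
    for px, py in features:
        vx, vy = px - wrist[0], py - wrist[1]
        t = vx * direction[0] + vy * direction[1]        # projection along ray
        d = abs(vx * direction[1] - vy * direction[0])   # perpendicular distance
        if t > 0 and d <= band:                          # in front of wrist, close to ray
            on_ray.append(t)
    if not on_ray:
        return None
    on_ray.sort()
    # Sliding window over sorted t values: find the densest stretch of features.
    best_count, best_t, j = 0, None, 0
    for i in range(len(on_ray)):
        while on_ray[i] - on_ray[j] > window:
            j += 1
        if i - j + 1 > best_count:
            best_count = i - j + 1
            best_t = 0.5 * (on_ray[i] + on_ray[j])
    return (wrist[0] + best_t * direction[0], wrist[1] + best_t * direction[1])
```

For example, with the wrist at (100, 100) pointing along +x and a cluster of features near (200, 100), `locate_interest` returns a point at that cluster while ignoring off-ray distractors. The density-based selection is what lets the method work without object detectors: any region that produces many low-level features along the indicated direction is a candidate object of interest.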