Semantic Risk Assessment in Visual Scenes for AUV-Assisted Marine Debris Removal

Underwater debris is a growing challenge that autonomous underwater vehicles (AUVs) can help alleviate, but robot-guided debris search and removal can also harm the aquatic ecosystem, or other humans engaged in cleanup missions, if the AUVs cannot assess the risks associated with their actions. We introduce a method for identifying such risks in an underwater scene in the context of AUV debris search and removal tasks. Our approach integrates a vision-language model (VLM) with monocular depth estimation to classify and localize objects in a marine scene, specifically submerged marine debris. We use pixel distance and depth difference, computed from the monocular depth map, to identify entities that are sensitive to harm in proximity to the debris. We collect and annotate a custom dataset of images from three different marine and aquatic environments containing debris and such sensitive entities, and compare classification performance across different types of prompts. We observe that prompts describing debris properties (e.g., “eroded trash”) yield a significant increase in accuracy over using object names directly as prompts. Our method successfully identifies debris that is safe to remove in complex scenes and turbid water conditions, highlighting the potential of VLMs for risk assessment in AUV operations across the diverse underwater domain.
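The proximity check described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `is_safe_to_remove` function, the dictionary layout for detections, and the threshold values are all assumptions introduced here for clarity.

```python
import math

def is_safe_to_remove(debris, entities, max_px_dist=150.0, max_depth_diff=0.2):
    """Flag debris as unsafe to remove if any harm-sensitive entity is
    nearby in both image space and estimated depth.

    debris / entities: dicts with 'center' (x, y) in pixels and 'depth'
    (relative monocular depth estimate). Thresholds are illustrative.
    """
    for ent in entities:
        px_dist = math.dist(debris["center"], ent["center"])
        depth_diff = abs(debris["depth"] - ent["depth"])
        # Close in both pixel distance and depth -> removal risks harm.
        if px_dist < max_px_dist and depth_diff < max_depth_diff:
            return False
    return True

debris = {"center": (320, 240), "depth": 0.45}
entities = [
    {"center": (400, 260), "depth": 0.50},  # e.g., coral close to the debris
    {"center": (50, 40), "depth": 0.90},    # e.g., a distant diver
]
print(is_safe_to_remove(debris, entities))  # the nearby entity makes this False
```

Combining image-plane distance with the depth estimate avoids flagging entities that merely overlap the debris in 2D while lying far behind or in front of it.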


Investigator: Sakshi Singh