Jenkin, Michael
Codd-Downey, Robert Frank
2025-04-10
2025-04-10
2024-10-25
2025-04-10
https://hdl.handle.net/10315/42788

Underwater human-to-robot interaction presents significant challenges due to the harsh environment, including reduced visibility from suspended particulate matter and high attenuation of light and of electromagnetic waves generally. Divers have developed an application-specific gesture language that has proven effective for diver-to-diver communication underwater. Given the wide acceptance of this language for underwater communication, it would seem an appropriate mechanism for diver-to-robot communication as well. Effective gesture recognition systems must address several challenges. Designing a gesture language involves balancing expressiveness against system complexity. Detection techniques range from traditional computer vision methods, suitable for small gesture sets, to neural networks for larger sets requiring extensive training data. Accurate gesture detection must handle noise and distinguish between repeated gestures and a single gesture held for a longer duration. Reliable communication also necessitates a feedback mechanism that allows users to correct miscommunications. Such systems must also recognize both individual gesture tokens and their sequences, a problem that is hampered by the lack of large-scale labelled datasets of individual tokens and gesture sequences. Here these problems are addressed through weakly supervised learning and a sim2real approach that reduces by several orders of magnitude the effort required to obtain the necessary labelled dataset.
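The hold-versus-repeat ambiguity described above can be illustrated with a simple run-length decoder over per-frame gesture labels. This is a hypothetical sketch, not the method used in the thesis: the function name, the `background` token, and the `min_frames` noise threshold are all assumptions introduced for illustration. The idea is that a gesture held across many frames collapses to one token, while repeating the same gesture requires an intervening background segment.

```python
from itertools import groupby

def decode_tokens(frame_preds, min_frames=3, background="none"):
    """Collapse a per-frame stream of gesture labels into a token sequence.

    Runs shorter than `min_frames` are treated as detector noise and
    discarded. A gesture held for many consecutive frames yields a single
    token; emitting the same token twice requires a background segment
    (e.g. hands at rest) between the two runs.
    """
    tokens = []
    saw_break = True  # start of stream counts as a break
    for label, run in groupby(frame_preds):
        if sum(1 for _ in run) < min_frames:
            continue  # too short: likely a few misclassified frames
        if label == background:
            saw_break = True
        elif saw_break or (tokens and tokens[-1] != label):
            tokens.append(label)
            saw_break = False
    return tokens
```

For example, ten frames of "up", a few background frames, then eight more frames of "up" decode to two "up" tokens, whereas eighteen uninterrupted "up" frames decode to one.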
This work addresses this communication task by (i) developing a traditional diver and diver-part recognition system (SCUBANetV1+), (ii) using this recognition within a weak supervision approach to train SCUBANetV2, a diver hand gesture recognition system, and (iii) feeding the individual gestures recognized by SCUBANetV2 into the Sim2Real-trained SCUBALang LSTM network, which translates temporal gesture sequences into phrases. This neural network pipeline effectively recognizes diver hand gestures in video data, demonstrating success on structured sequences. Each of the individual network components is evaluated independently, and the entire pipeline is evaluated formally using imagery obtained in both open-ocean and pool environments. As a final evaluation, the resulting system is deployed within a feedback structure and evaluated using a custom unmanned underwater vehicle. Although this work concentrates on underwater gesture-based communication, the technology and learning process introduced here can be deployed in other environments for which application-specific gesture languages exist.

Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
Underwater gesture-based human-to-robot communication
Electronic Thesis or Dissertation
2025-04-10
Underwater human robot interaction