Zhuo Wang

HCI Researcher

A Streaming Gesture Recognition Framework


My Role: Concept Design, Writing

Abstract

Gesture recognition in resource-constrained scenarios presents significant challenges, particularly in balancing accuracy, latency, and computational efficiency. This paper proposes Duo Streamers, an event-triggered streaming gesture recognition framework inspired by event-triggered control theory and marked temporal point processes. By adopting a novel three-stage sparse recognition mechanism, a compact RNN-lite architecture with external hidden states, and specialized event-masked training and robust intensity-normalized post-processing pipelines, Duo Streamers achieves substantial advances in real-time performance and lightweight design. Experimental results show that Duo Streamers matches mainstream methods on accuracy metrics while reducing the real-time factor by approximately 88.9%, i.e., delivering a nearly 9-fold speedup. In addition, the framework shrinks parameter counts to 1/38 (idle state) and 1/9 (busy state) of those of mainstream models. A user study further confirms that operators can use the framework in a variety of real-world scenarios, achieving efficient and accurate gesture input. In summary, Duo Streamers not only offers an efficient and practical solution for streaming gesture recognition on resource-constrained devices but also lays a solid foundation for extended applications in multimodal and diverse scenarios. Upon acceptance, we will publicly release all models, code, and demos.
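To make the two core ideas in the abstract concrete, the sketch below shows (a) an RNN cell whose hidden state lives outside the model, so the caller carries it across streaming frames, and (b) an event-triggered loop that only runs the cell when the input changes enough. This is a minimal illustrative sketch, not the paper's implementation: the GRU-style gating, the dimensions, and the norm-based trigger are all assumptions standing in for Duo Streamers' actual mechanism.

```python
import numpy as np

class RNNLiteCell:
    """Minimal GRU-style cell with EXTERNAL hidden state: the cell is
    stateless, and the caller passes the hidden state in and out on
    every step (hypothetical sketch; gating and sizes are illustrative)."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.normal(0, 0.1, (hid_dim, in_dim + hid_dim))  # update gate
        self.Wh = rng.normal(0, 0.1, (hid_dim, in_dim + hid_dim))  # candidate

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = 1.0 / (1.0 + np.exp(-self.Wz @ xh))   # update-gate activation
        h_cand = np.tanh(self.Wh @ xh)            # candidate state
        return (1.0 - z) * h + z * h_cand         # new external state

def stream(cell, frames, hid_dim, threshold=0.1):
    """Event-triggered loop: the cell runs only when the frame changes
    by more than `threshold` (a stand-in for the trigger mechanism);
    otherwise the system stays idle and the state is simply carried over."""
    h = np.zeros(hid_dim)
    prev = None
    for x in frames:
        if prev is None or np.linalg.norm(x - prev) > threshold:
            h = cell.step(x, h)   # busy: update the external state
            prev = x
        # else idle: skip all computation, h is unchanged
    return h
```

Because identical or near-identical frames skip the recurrent update entirely, compute scales with the number of trigger events rather than the frame rate, which is the intuition behind the idle/busy parameter and speedup figures quoted above.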

Publications


Duo Streamers: A Streaming Gesture Recognition Framework


Boxuan Zhu, Sicheng Yang, Zhuo Wang, Haining Liang, Junxiao Shen

2025