[ICLR 2026] SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System

[ICLR 2026] SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

1 State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; 2 Beijing Academy of Artificial Intelligence (BAAI)
Equal Contribution   ✉ Corresponding Author  

Highlights

  • We design and implement a comprehensive robotic table tennis system that systematically addresses high-speed dynamic manipulation through task-specific decomposition and a Fast-Slow architecture.

  • We develop a Fast-Slow perception framework that enables accurate trajectory prediction using conventional cameras through neural error correction, complemented by real-world imitation learning for precise ball striking control.

  • We conduct extensive experimental evaluation demonstrating a 92% success rate in 30 cm zones and 70% accuracy in the more challenging 20 cm precision targeting, validating the effectiveness of our integrated approach.

Abstract

Learning to control high-speed objects in dynamic environments represents a fundamental challenge in robotics. Table tennis serves as an ideal testbed for advancing robotic capabilities in dynamic environments. This task presents two fundamental challenges: it requires a high-precision vision system capable of accurately predicting ball trajectories under complex dynamics, and it necessitates intelligent control strategies to ensure precise ball striking to target regions. High-speed object manipulation typically demands advanced visual perception hardware capable of capturing rapid motion with exceptional temporal resolution. Drawing inspiration from Kahneman's dual-system theory, where fast intuitive processing complements slower deliberate reasoning, there exists an opportunity to develop more robust perception architectures that can handle high-speed dynamics while maintaining accuracy. To this end, we present SpikePingpong, a novel system that integrates spike-based vision with imitation learning for high-precision robotic table tennis. We develop a Fast-Slow system architecture where System 1 provides rapid ball detection and preliminary trajectory prediction with millisecond-level responses, while System 2 employs spike-oriented neural calibration for precise hittable position corrections. For strategic ball striking, we introduce Imitation-based Motion Planning And Control Technology (IMPACT), which learns robotic arm striking policies through demonstration-based learning. Experimental results demonstrate that SpikePingpong achieves a 92% success rate for 30 cm accuracy zones and 70% in the more challenging 20 cm precision targeting. This work demonstrates the potential of Fast-Slow architectures for advancing robotic capabilities in time-critical manipulation tasks.

Method Overview

Framework of SpikePingpong. The system comprises two integrated components: (1) A Fast-Slow perception architecture, where System 1 delivers rapid trajectory prediction using RGB-D data, while System 2 functions as a Spike-Oriented Neural Improvement Calibrator to refine the estimated hittable position; and (2) The IMPACT module, which facilitates strategic motion planning and control, enabling tactical return placement via imitation learning.
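The two-stage perception idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fast stage here is assumed to be a drag-free, spin-free ballistic fit with no table bounce, and the slow stage is a hypothetical `corrector` callable standing in for the learned calibrator. The function names, the coordinate convention (x toward the robot, z up), and the fixed hitting plane `x_hit` are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

G = 9.81  # gravitational acceleration in m/s^2

Observation = Tuple[float, float, float, float]  # (t, x, y, z) in seconds and metres
Correction = Callable[[float, float, float], Tuple[float, float]]


def _fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit v = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    denom = sum((t - t_mean) ** 2 for t in ts)
    if denom == 0.0:
        raise ValueError("need observations at two or more distinct times")
    b = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs)) / denom
    return v_mean - b * t_mean, b


def system1_predict(obs: Sequence[Observation], x_hit: float) -> Tuple[float, float, float]:
    """Fast stage (assumed model): ballistic fit, then intersect the plane x = x_hit.

    Returns (t_hit, y_hit, z_hit). Drag, spin, and bounces are ignored.
    """
    ts = [o[0] for o in obs]
    x0, vx = _fit_line(ts, [o[1] for o in obs])
    y0, vy = _fit_line(ts, [o[2] for o in obs])
    # Adding 0.5 * g * t^2 back removes gravity, so z becomes linear in t.
    z0, vz = _fit_line(ts, [o[3] + 0.5 * G * o[0] ** 2 for o in obs])
    if vx == 0.0:
        raise ValueError("ball is not moving along x; no plane crossing")
    t_hit = (x_hit - x0) / vx
    return t_hit, y0 + vy * t_hit, z0 + vz * t_hit - 0.5 * G * t_hit ** 2


def fast_slow_predict(
    obs: Sequence[Observation],
    x_hit: float,
    corrector: Optional[Correction] = None,
) -> Tuple[float, float, float]:
    """Fast estimate, optionally refined by a slow-stage residual (dy, dz)."""
    t_hit, y, z = system1_predict(obs, x_hit)
    if corrector is None:
        return t_hit, y, z
    dy, dz = corrector(t_hit, y, z)  # hypothetical stand-in for a learned calibrator
    return t_hit, y + dy, z + dz
```

In this sketch the fast estimate is usable on its own, and the correction is applied as an additive residual, so a slow or missing corrector degrades accuracy rather than blocking the arm. Whether the actual system composes the two stages this way is not stated here.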

System Precision and Efficiency

SpikePingpong ensures exceptional interception accuracy by minimizing spatial deviation between the ball and racket at the point of contact. This high-precision capability is complemented by millisecond-level inference speeds, providing the critical responsiveness necessary for effective robotic arm actuation.

Tactical Spatial Control

SpikePingpong exhibits superior interception capabilities across various target regions, significantly outperforming human players and baseline methods in both isolated returns and complex sequential tasks. The system maintains high success rates even under strict precision thresholds, demonstrating its ability to execute consistent, tactical-level gameplay.
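As an illustration of how a success rate under a precision threshold might be computed, the following Python sketch counts returns that land within a given distance of the target centre. This is an assumed evaluation protocol, not one taken from the paper: the function name `success_rate`, the use of Euclidean distance, and the mapping from a "30 cm" or "20 cm" zone to the `radius_m` parameter are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]  # (x, y) landing position on the table, in metres


def success_rate(landings: Sequence[Point2D], target: Point2D, radius_m: float) -> float:
    """Fraction of landing points within radius_m of the target centre.

    Assumption: a return counts as a success when its Euclidean distance
    to the target centre is at most radius_m.
    """
    if not landings:
        raise ValueError("at least one landing point is required")
    hits = sum(
        1
        for x, y in landings
        if math.hypot(x - target[0], y - target[1]) <= radius_m
    )
    return hits / len(landings)
```

Evaluating the same set of landing points at two radii shows how tightening the threshold lowers the measured rate, which is the pattern reported between the 30 cm and 20 cm settings.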

Robustness and Out-of-Distribution Generalization

SpikePingpong demonstrates robust out-of-distribution generalization, maintaining a 74% success rate under novel ball trajectories and showing significant adaptability to complex human playing styles. These results confirm that the system captures generalizable underlying dynamics rather than overfitting, validating its potential for real-world human-robot interaction.

Real-world Demos

We present real-world demonstration videos showing SpikePingpong executing precise shots to target regions A, B, C, and D, validating the system's tactical placement capabilities in actual gameplay scenarios. Additionally, we demonstrate our system's capabilities through a video of SpikePingpong engaging in a table tennis rally against a human player. The demonstration showcases the robot's ability to maintain sustained exchanges, accurately track and return incoming balls, and execute precise shot placements in a dynamic gameplay environment, validating our approach's effectiveness in practical human-robot interaction scenarios.


Target A


Target B


Target C


Target D


Human vs Robot



BibTeX

@inproceedings{wangspikepingpong,
 title={SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System},
 author={Wang, Hao and Hou, Chengkai and Li, Xianglong and Fu, Yankai and Li, Chenxuan and Chen, Ning and Dai, Gaole and Liu, Jiaming and Huang, Tiejun and Zhang, Shanghang},
 booktitle={The Fourteenth International Conference on Learning Representations},
 year={2026}
}