Cooperative perception enables autonomous agents to share encoded representations over wireless communication to enhance each other’s live situational awareness. However, the tension between limited communication bandwidth and rich sensor information hinders its practical deployment. Recent studies have explored selection strategies that share only a subset of features per frame while striving to keep performance on par. Nevertheless, the bandwidth requirement still stresses current wireless technologies.
To fundamentally ease this tension, we take a proactive approach, exploiting temporal continuity to identify features that capture environment dynamics while avoiding repetitive, redundant transmission of static information. By incorporating temporal awareness, agents can dynamically adapt the sharing quantity to the complexity of the environment. We instantiate this intuition in an adaptive selection framework, COOPERTRIM, which introduces a novel conformal temporal uncertainty metric to gauge feature relevance and a data-driven mechanism to dynamically determine the sharing quantity.
To evaluate COOPERTRIM, we take semantic segmentation and 3D detection as example tasks. Across multiple open-source cooperative segmentation and detection models, COOPERTRIM achieves up to 80.28% and 72.52% bandwidth reduction, respectively, while maintaining comparable accuracy. Relative to other selection strategies, COOPERTRIM also improves IoU by as much as 45.54% with up to 72% less bandwidth. Combined with compression strategies, COOPERTRIM can further reduce bandwidth usage to as low as 1.46% without compromising IoU performance. Qualitative results show that COOPERTRIM gracefully adapts to environmental dynamics, localization error, and communication latency, demonstrating its flexibility and paving the way for real-world deployment.
The core challenge in cooperative perception is the mismatch between the richness of sensor data and the limits of wireless bandwidth. CooperTrim addresses this by shifting from static, frame-by-frame sharing to a proactive, temporally aware adaptation strategy.
Instead of treating every frame independently, CooperTrim contextualizes the current features within the ego agent's recent memory. It uses a conformal prediction-inspired quantile gating mechanism to identify features that deviate significantly from that recent history. This lets the system prioritize "uncertain" or "dynamic" features (such as moving vehicles or changing traffic lights) over static, redundant background information.
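As a concrete illustration, the gating step can be sketched roughly as below. This is a minimal sketch assuming BEV-style feature maps of shape [C, H, W]; the class name `TemporalGate`, the memory length, and the quantile level are illustrative assumptions, not the released CooperTrim implementation.

```python
import torch

class TemporalGate:
    """Keeps a short memory of recent feature maps and scores how much each
    spatial cell of the current frame deviates from that memory."""

    def __init__(self, memory_len=5, calib_quantile=0.9):
        self.memory_len = memory_len          # frames of history to keep
        self.calib_quantile = calib_quantile  # conformal quantile level
        self.memory = []                      # recent [C, H, W] feature maps
        self.score_pool = []                  # past deviation scores (calibration pool)

    def select(self, feat):
        """feat: [C, H, W] encoder output.
        Returns (mask, score): a boolean [H, W] mask of temporally 'uncertain'
        cells and the per-cell deviation scores."""
        if not self.memory:
            # No history yet: every cell is novel, so share everything.
            self.memory.append(feat.detach())
            score = feat.norm(dim=0)
            return torch.ones_like(score, dtype=torch.bool), score

        reference = torch.stack(self.memory).mean(dim=0)   # temporal reference
        score = (feat - reference).norm(dim=0)             # per-cell deviation, [H, W]

        # Conformal-style gate: threshold at a quantile of recently observed scores.
        self.score_pool.append(score.flatten().detach())
        pool = torch.cat(self.score_pool[-self.memory_len:])
        tau = torch.quantile(pool, self.calib_quantile)

        self.memory.append(feat.detach())
        self.memory = self.memory[-self.memory_len:]
        return score > tau, score
```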
How much data is "enough"? CooperTrim answers this dynamically by learning two key thresholds from data: an uncertainty threshold that decides which features are temporally relevant, and a sharing-quantity threshold that adapts how much is transmitted to the current environment complexity.
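Continuing the sketch above, one simple way these two quantities could translate into a per-frame payload is shown below. The clamping constants and helper names are placeholders, since CooperTrim determines the sharing quantity through a data-driven mechanism rather than fixed ratios.

```python
import torch

def sharing_budget(mask, min_ratio=0.02, max_ratio=0.30):
    """Number of cells to transmit this frame: grows with the fraction of
    cells the gate flagged as uncertain, clamped to a floor and ceiling
    (placeholder constants; the paper derives the quantity from data)."""
    frac_dynamic = mask.float().mean().item()
    ratio = min(max(frac_dynamic, min_ratio), max_ratio)
    return max(1, int(ratio * mask.numel()))

def select_payload(feat, score, k):
    """Keep the k most temporally uncertain cells of a [C, H, W] feature map."""
    idx = torch.topk(score.flatten(), k).indices
    return feat.flatten(start_dim=1)[:, idx]   # [C, k] payload to transmit

# Example usage with the TemporalGate sketch above (bev_feat: [C, H, W]):
#   gate = TemporalGate()
#   mask, score = gate.select(bev_feat)
#   payload = select_payload(bev_feat, score, sharing_budget(mask))
```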
To better understand how CooperTrim selects features, we visualize two distinct cases from our experiments. These scenarios demonstrate the correlation between temporal uncertainty (what we measure) and information value (what we need).
(Frames 1960-1970) In dynamic scenarios with fast-moving objects (e.g., an incoming blue car) or complex intersections, temporal uncertainty is high. CooperTrim identifies these regions as critical and allocates more bandwidth to share detailed features, ensuring safety and accuracy.
(Frames 940-950) In static or simple environments (e.g., straight roads with no traffic), consecutive frames change very little, so temporal uncertainty stays low. CooperTrim suppresses redundant data sharing here, saving bandwidth.
@inproceedings{coopertrim2026,
  title     = {COOPERTRIM: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception},
  author    = {Shilpa Mukhopadhyay and Amit Roy-Chowdhury and Hang Qiu},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}