WiCompass: Oracle-driven Data Scaling for mmWave Human Pose Estimation
Abstract
Millimeter-wave Human Pose Estimation (mmWave HPE) promises privacy but suffers from poor generalization under distribution shifts. We demonstrate that brute-force data scaling is ineffective for out-of-distribution (OOD) robustness; efficiency and coverage are the true bottlenecks. To address this, we introduce WiCompass, a coverage-aware data-collection framework. WiCompass leverages large-scale motion-capture corpora to build a universal pose space "oracle" that quantifies dataset redundancy and identifies underrepresented motions. Guided by this oracle, WiCompass employs a closed-loop policy to prioritize collecting informative missing samples. Experiments show that WiCompass consistently improves OOD accuracy at matched budgets and exhibits superior scaling behavior compared to conventional collection strategies. By shifting focus from brute-force scaling to coverage-aware data acquisition, this work offers a practical path toward robust mmWave sensing.
Why Data Coverage Matters
While state-of-the-art mmWave HPE models achieve impressive in-distribution accuracy, their performance degrades sharply under even modest distribution shifts. Our pilot study reveals two critical data-level bottlenecks:
Generalization bottleneck. Simply withholding a single action (e.g., "Waving hand (left)") from training causes joint localization error to spike from ~50 mm to over 120 mm, even when semantically related motions remain in the training set. Current models learn to interpolate within known distributions but fail to generalize across the true spectrum of human motion.
Efficiency bottleneck. Naively scaling up data is not only expensive but frequently ineffective. Randomly discarding up to 70% of training samples from existing large-scale datasets results in less than a 2% drop in performance, revealing massive redundancy in current collection practices.
Figure: Model capacity saturates quickly (~10 MB). Larger models yield diminishing returns, ruling out model architecture as the bottleneck.
Figure: Leave-one-out test. Holding out a single action causes error to spike ~2.4x, revealing severe OOD fragility in current datasets.
Figure: Discarding 70% of training data causes less than a 2% performance drop, exposing massive redundancy in existing datasets.
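The leave-one-out protocol and the error metric behind these numbers can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the sample schema (a dict with an "action" key) and the function names are assumptions made for the example, and MPJPE is computed in its standard form as the mean Euclidean distance over joints.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints.

    `pred` and `target` are equally long lists of (x, y, z) joint positions
    expressed in the same unit (e.g. millimetres).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def leave_one_action_out(samples, held_out_action):
    """Split samples so the held-out action appears only in the test set.

    Each sample is a dict with an "action" key (hypothetical schema).
    """
    train = [s for s in samples if s["action"] != held_out_action]
    test = [s for s in samples if s["action"] == held_out_action]
    return train, test


# Toy usage with made-up samples.
samples = [
    {"action": "walk", "frame": 0},
    {"action": "wave_left", "frame": 1},
    {"action": "walk", "frame": 2},
]
train, test = leave_one_action_out(samples, "wave_left")
```

Training on `train` and reporting `mpjpe` on `test` gives the OOD number; repeating the split with a random in-distribution holdout gives the in-distribution reference.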
Method
Universal Pose Tokenizer (VQ-VAE)
We train a VQ-VAE on the AMASS motion capture dataset to learn a universal vocabulary of human motion. The encoder compresses 3D poses into a sequence of discrete tokens from a learned codebook, while the decoder reconstructs poses from these tokens. This quantization maps any pose — whether from optical MoCap or mmWave estimates — into a shared set of tokens, enabling modality-agnostic comparison. The discrete space also makes coverage analysis computationally tractable: instead of comparing continuous distributions, we simply measure token distances.
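The quantization step at the heart of a VQ-VAE is a nearest-neighbour lookup in the codebook. The sketch below shows only that lookup, assuming the encoder has already produced latent vectors; the toy codebook, the function names, and the use of a Hamming-style distance between token sequences are illustrative assumptions, since the page does not specify which token distance WiCompass uses.

```python
def quantize(latent, codebook):
    """Return the index of the codebook entry closest to `latent` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))


def tokenize(latents, codebook):
    """Map a sequence of encoder latents to a sequence of discrete token ids."""
    return [quantize(z, codebook) for z in latents]


def token_hamming(a, b):
    """Fraction of positions where two equally long token sequences differ.

    One possible token distance, chosen here for illustration only.
    """
    if len(a) != len(b) or not a:
        raise ValueError("token sequences must be non-empty and equally long")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy 2-D codebook with three entries (real codebooks are learned).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
mocap_tokens = tokenize([(0.1, -0.1), (0.9, 1.2), (4.0, 6.0)], codebook)
mmwave_tokens = tokenize([(0.2, 0.1), (4.5, 5.5), (5.2, 4.9)], codebook)
```

Because both pose sources end up as ids into the same codebook, comparing them reduces to comparing integer sequences, regardless of the sensing modality that produced the pose.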
k-NN Coverage Analysis
We build a k-Nearest Neighbors framework that provides two complementary views of dataset quality. Cross-dataset analysis compares a mmWave dataset against the large MoCap corpus to identify coverage gaps — poses that are underrepresented in the mmWave data. Intra-dataset analysis uses a Normalized Redundancy Index (NRI) to quantify oversampling within a single dataset. Together, these metrics produce an actionable gap set of missing poses for targeted collection.
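A minimal sketch of the cross-dataset view follows. It assumes poses are embedded as fixed-length vectors and treats a reference pose as covered when its mean distance to its k nearest collected poses is within a chosen radius; that criterion, the `radius` parameter, and the function names are assumptions for illustration. The NRI is not reproduced here because its formula is not given on this page.

```python
import math


def knn_distance(query, points, k):
    """Mean Euclidean distance from `query` to its k nearest neighbours in `points`."""
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    nearest = sorted(math.dist(query, p) for p in points)[:k]
    return sum(nearest) / len(nearest)


def coverage_gaps(reference, collected, k, radius):
    """Return (coverage_fraction, gap_indices) of `reference` w.r.t. `collected`.

    A reference pose counts as a gap when its k-NN distance to the collected
    set exceeds `radius` (an assumed criterion, not the paper's definition).
    """
    gaps = [
        i for i, ref in enumerate(reference)
        if knn_distance(ref, collected, k) > radius
    ]
    return 1.0 - len(gaps) / len(reference), gaps


# Toy 2-D embeddings: two reference poses near the collected data, two far away.
reference = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
collected = [(0.0, 0.1), (0.9, 0.0), (0.5, 0.5)]
coverage, gap_set = coverage_gaps(reference, collected, k=2, radius=1.5)
```

The brute-force search is quadratic in dataset size; at the scale of a large MoCap corpus an approximate nearest-neighbour index would be the practical choice.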
Coverage-Driven Data Collection
Given the identified gap set, WiCompass selects target poses under a fixed budget using Capped Probability-Proportional-to-Size (Capped-PPS) sampling. This strategy prioritizes sparse, underrepresented regions while filtering extreme outliers that may correspond to physically implausible poses. The selected targets are then realized through either real-world capture (a volunteer mimics on-screen visualizations) or simulation (RF-Genesis ray-tracing simulator). After each acquisition round, coverage is recomputed, forming a closed feedback loop.
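The selection step can be approximated as follows. This is a sketch under stated assumptions, not the authors' Capped-PPS procedure: each gap pose is assumed to carry a positive sparsity score, scores are clipped at `cap` so that extreme outliers cannot dominate, and a fixed-size sample is then drawn without replacement using Efraimidis-Spirakis weighted random keys as a stand-in for the PPS draw. Note that clipping down-weights outliers rather than removing them; the page does not say which of the two the authors do.

```python
import random


def capped_pps_sample(scores, budget, cap, seed=None):
    """Pick `budget` distinct indices, favouring high scores clipped at `cap`.

    Uses Efraimidis-Spirakis keys (u ** (1 / w)): the `budget` largest keys
    form a weighted sample without replacement.
    """
    if budget > len(scores):
        raise ValueError("budget exceeds the number of candidates")
    weights = [min(s, cap) for s in scores]
    if any(w <= 0 for w in weights):
        raise ValueError("scores and cap must be positive")
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:budget])


# Toy scores: index 3 is an extreme outlier that the cap pulls back to 5.0.
scores = [0.1, 0.1, 5.0, 100.0]
picked = capped_pps_sample(scores, budget=2, cap=5.0, seed=0)
```

In a closed loop, the scores would be recomputed from the coverage analysis after each acquisition round and the sampler called again with the remaining budget.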
Results
Data Scaling Efficiency
WiCompass yields a consistent monotonic decrease in OOD error as dataset size grows, with an order-of-magnitude larger scaling exponent than the conventional baseline. At matched budgets (2k–40k samples), WiCompass reduces OOD MPJPE by roughly 25–30 mm compared to the mmBody-trace baseline, and the gap widens with more data.
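A scaling exponent of this kind is usually estimated by fitting a power law, error ≈ a · N^(−b), as a straight line in log-log space. The sketch below shows that fit with ordinary least squares; the power-law form is an assumption about how the exponent is defined, and the data points are synthetic values generated from a known curve, not results from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of errors ≈ a * sizes ** (-b) in log-log space.

    Returns (a, b); a larger b means error falls faster as data grows.
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic check: points generated from a = 200, b = 0.25 (not paper data).
sizes = [2000, 8000, 40000]
errors = [200.0 * n ** -0.25 for n in sizes]
a, b = fit_power_law(sizes, errors)
```

Fitting the same function to the OOD error curves of two collection strategies and comparing the two values of `b` is one way to make an "order-of-magnitude larger exponent" claim concrete.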
Dataset Coverage Analysis
Cross-dataset coverage analysis reveals that existing mmWave datasets occupy only a narrow subset of the human motion manifold. mmBody covers just 3.7% of AMASS at k=12, while WiCompass achieves significantly broader coverage per unit of data budget — using roughly one-eighth as many samples.
Figure panels: mmBody vs. AMASS, MMFi vs. AMASS, WiCompass vs. AMASS.
Pose Token Visualization
The learned VQ-VAE tokens correspond to semantically meaningful motion primitives. Modifying individual tokens reveals that different tokens control distinct body parts — for example, one token governs global orientation while another affects the right elbow — demonstrating the interpretability of the codebook.
Real-World Validation
We validate WiCompass with a closed-loop real-world experiment using a 79 GHz FMCW radar paired with a vision-based motion capture system. Under an identical budget of 8k training frames, WiCompass achieves 105.7 mm test MPJPE, 7.2 mm lower than the conventional baseline (112.9 mm), which closes roughly 42% of the gap to the oracle recollection upper bound (95.7 mm).
BibTeX
@inproceedings{liang2026wicompass,
title={WiCompass: Oracle-driven Data Scaling for mmWave Human Pose Estimation},
author={Liang, Bo and Gong, Chen and Wang, Haobo and Liu, Qirui and Zhou, Rungui and Shao, Fengzhi and Wang, Yubo and Zhou, Kaichen and Gao, Wei and Cui, Guolong and Xu, Chenren},
booktitle={Proceedings of the 32nd Annual International Conference on Mobile Computing and Networking},
year={2026}
}