WiCompass: Oracle-driven Data Scaling for mmWave Human Pose Estimation

Bo Liang¹, Chen Gong¹, Haobo Wang¹, Qirui Liu¹, Rungui Zhou¹, Fengzhi Shao³, Yubo Wang³, Wei Gao⁴, Kaichen Zhou⁵, Guolong Cui³, Chenren Xu¹,²

¹Peking University ²Key Laboratory of High Confidence Software Technologies, Ministry of Education (PKU) ³UESTC ⁴University of Pittsburgh ⁵MIT
ACM MobiCom 2026

WiCompass encodes human poses from MoCap and mmWave datasets into a shared latent space via VQ-VAE, uses directional k-NN coverage to identify underrepresented regions, and guides targeted data collection in real or simulated settings.

Abstract

Millimeter-wave Human Pose Estimation (mmWave HPE) promises privacy but suffers from poor generalization under distribution shifts. We demonstrate that brute-force data scaling is ineffective for out-of-distribution (OOD) robustness; efficiency and coverage are the true bottlenecks. To address this, we introduce WiCompass, a coverage-aware data-collection framework. WiCompass leverages large-scale motion-capture corpora to build a universal pose space "oracle" that quantifies dataset redundancy and identifies underrepresented motions. Guided by this oracle, WiCompass employs a closed-loop policy to prioritize collecting informative missing samples. Experiments show that WiCompass consistently improves OOD accuracy at matched budgets and exhibits superior scaling behavior compared to conventional collection strategies. By shifting focus from brute-force scaling to coverage-aware data acquisition, this work offers a practical path toward robust mmWave sensing.

Why Data Coverage Matters

While state-of-the-art mmWave HPE models achieve impressive in-distribution accuracy, their performance degrades sharply under even modest distribution shifts. Our pilot study reveals two critical data-level bottlenecks:

Generalization bottleneck. Simply withholding a single action (e.g., "Waving hand (left)") from training causes joint localization error to spike from ~50 mm to over 120 mm, even when semantically related motions remain in the training set. Current models learn to interpolate within known distributions but fail to generalize across the true spectrum of human motion.

Efficiency bottleneck. Naively scaling up data is not only expensive but frequently ineffective. Randomly discarding up to 70% of training samples from existing large-scale datasets results in less than a 2% drop in performance, revealing massive redundancy in current collection practices.

Model capacity saturates quickly (~10 MB). Larger models yield diminishing returns, ruling out model architecture as the bottleneck.

Leave-one-out test: holding out a single action causes error to spike ~2.4x, revealing severe OOD fragility in current datasets.

Discarding 70% of training data causes less than 2% performance drop, exposing massive redundancy in existing datasets.

Method

Universal Pose Tokenizer (VQ-VAE)

We train a VQ-VAE on the AMASS motion capture dataset to learn a universal vocabulary of human motion. The encoder compresses 3D poses into a sequence of discrete tokens from a learned codebook, while the decoder reconstructs poses from these tokens. This quantization maps any pose — whether from optical MoCap or mmWave estimates — into a shared set of tokens, enabling modality-agnostic comparison. The discrete space also makes coverage analysis computationally tractable: instead of comparing continuous distributions, we simply measure token distances.
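The quantization step can be illustrated with a minimal NumPy sketch of nearest-codebook lookup. This is not the WiCompass tokenizer itself: the function name `quantize`, the toy identity-matrix codebook, and the 4-dimensional latents are assumptions chosen for illustration, and a real encoder and decoder would sit on either side of this step.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent and every code: shape (N, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)
    # Return the discrete tokens and the corresponding code vectors
    return tokens, codebook[tokens]

# Hypothetical toy codebook: 4 codes in a 4-dimensional latent space
codebook = np.eye(4)
# Hypothetical encoder outputs: slightly perturbed copies of codes 2, 0, 2
latents = codebook[[2, 0, 2]] + 0.05
tokens, quantized = quantize(latents, codebook)
print(tokens)
```

Because poses from any modality end up as indices into the same codebook, two datasets can be compared by comparing their tokens rather than their raw sensor formats.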

k-NN Coverage Analysis

We build a k-Nearest Neighbors framework that provides two complementary views of dataset quality. Cross-dataset analysis compares a mmWave dataset against the large MoCap corpus to identify coverage gaps — poses that are underrepresented in the mmWave data. Intra-dataset analysis uses a Normalized Redundancy Index (NRI) to quantify oversampling within a single dataset. Together, these metrics produce an actionable gap set of missing poses for targeted collection.
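A rough sketch of the cross-dataset view follows, assuming poses are already embedded as vectors. The text above does not give the exact coverage rule or the NRI formula, so the criterion used here (a reference pose is a gap if its k-th nearest target neighbor lies beyond a fixed radius) and the names `knn_gap_mask`, `k`, and `radius` are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np

def knn_gap_mask(reference, target, k, radius):
    """Flag reference points that have fewer than k target points within radius."""
    # Pairwise Euclidean distances, shape (n_reference, n_target)
    d = np.linalg.norm(reference[:, None, :] - target[None, :, :], axis=-1)
    # Distance from each reference point to its k-th nearest target point
    kth = np.sort(d, axis=1)[:, k - 1]
    return kth > radius

# Toy 2-D stand-ins for MoCap (reference) and mmWave (target) embeddings
reference = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
target = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.2]])

gaps = knn_gap_mask(reference, target, k=1, radius=0.5)
coverage = 1.0 - gaps.mean()  # fraction of reference poses that are covered
print(gaps, coverage)
```

In this toy example the third reference point has no nearby target sample, so it would land in the gap set. The brute-force distance matrix is only practical for small inputs; a spatial index would be the usual choice at corpus scale.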

Coverage-Driven Data Collection

Given the identified gap set, WiCompass selects target poses under a fixed budget using Capped Probability-Proportional-to-Size (Capped-PPS) sampling. This strategy prioritizes sparse, underrepresented regions while filtering extreme outliers that may correspond to physically implausible poses. The selected targets are then realized through either real-world capture (a volunteer mimics on-screen visualizations) or simulation (RF-Genesis ray-tracing simulator). After each acquisition round, coverage is recomputed, forming a closed feedback loop.
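One plausible reading of capped probability-proportional-to-size selection is sketched below. The details are assumptions: the "size" of each candidate is taken to be a sparsity score, scores are clipped at a hypothetical `cap` so that extreme outliers cannot dominate, and NumPy's weighted draw without replacement stands in for a formal PPS design. The paper's actual procedure may differ, for instance by dropping outliers entirely.

```python
import numpy as np

def capped_probs(scores, cap):
    """Selection probabilities proportional to min(score, cap)."""
    sizes = np.minimum(np.asarray(scores, dtype=float), cap)
    return sizes / sizes.sum()

def capped_pps_sample(scores, budget, cap, rng):
    """Draw `budget` distinct candidate indices using the capped probabilities."""
    probs = capped_probs(scores, cap)
    return rng.choice(len(probs), size=budget, replace=False, p=probs)

rng = np.random.default_rng(0)
# Hypothetical sparsity scores for five gap poses; the last one is an extreme outlier
scores = np.array([0.1, 0.2, 0.3, 4.0, 50.0])
probs = capped_probs(scores, cap=5.0)
picked = capped_pps_sample(scores, budget=3, cap=5.0, rng=rng)
print(probs, picked)
```

Without the cap, the outlier would receive over 90% of the probability mass; with it, sparse regions are still favored but no single candidate crowds out the rest.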

Results

Data Scaling Efficiency

WiCompass yields a consistent monotonic decrease in OOD error as dataset size grows, with an order-of-magnitude larger scaling exponent than the conventional baseline. At matched budgets (2k–40k samples), WiCompass reduces OOD MPJPE by roughly 25–30 mm compared to the mmBody-trace baseline, and the gap widens with more data.
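A scaling exponent of this kind is commonly estimated by fitting a power law, error ≈ a · N^(−b), as a straight line in log-log space. The sketch below shows that fit on a synthetic curve; the numbers are made up for illustration and are not results from the paper, and `fit_scaling_exponent` is a hypothetical helper name.

```python
import numpy as np

def fit_scaling_exponent(n_samples, errors):
    """Fit error ≈ a * n^(-b) by least squares in log-log space; return (a, b)."""
    slope, intercept = np.polyfit(np.log(n_samples), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Dataset sizes spanning the 2k-40k budget range mentioned above
n = np.array([2_000, 5_000, 10_000, 20_000, 40_000], dtype=float)
# Synthetic error curve with a known exponent of 0.1 (NOT measured data)
errors = 300.0 * n ** -0.1

a, b = fit_scaling_exponent(n, errors)
print(a, b)
```

A larger fitted `b` means error falls faster per added sample, which is the sense in which one collection strategy can scale better than another at the same budget.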

Scaling laws comparison on synthetic data

Dataset Coverage Analysis

Cross-dataset coverage analysis reveals that existing mmWave datasets occupy only a narrow subset of the human motion manifold. mmBody covers just 3.7% of AMASS at k=12, while WiCompass achieves significantly broader coverage per unit of data budget — using roughly one-eighth as many samples.

Coverage panels: mmBody vs. AMASS; MMFi vs. AMASS; WiCompass vs. AMASS

Pose Token Visualization

The learned VQ-VAE tokens correspond to semantically meaningful motion primitives. Modifying individual tokens reveals that different tokens control distinct body parts — for example, one token governs global orientation while another affects the right elbow — demonstrating the interpretability of the codebook.

Real-World Validation

We validate WiCompass with a closed-loop real-world experiment using a 79 GHz FMCW radar paired with a vision-based motion capture system. Under an identical budget of 8k training frames, WiCompass achieves a test MPJPE of 105.7 mm, which is 7.2 mm (about 6.4%) lower than the conventional baseline (112.9 mm) and approaches the 95.7 mm reached by oracle recollection, the upper bound on achievable performance.

BibTeX

@inproceedings{liang2026wicompass,
  title={WiCompass: Oracle-driven Data Scaling for mmWave Human Pose Estimation},
  author={Liang, Bo and Gong, Chen and Wang, Haobo and Liu, Qirui and Zhou, Rungui and Shao, Fengzhi and Wang, Yubo and Gao, Wei and Zhou, Kaichen and Cui, Guolong and Xu, Chenren},
  booktitle={Proceedings of the 32nd Annual International Conference on Mobile Computing and Networking},
  year={2026}
}