An international competition spanning lightweight efficient models and frontier multimodal LLMs on depth, IMU, mmWave radar, and skeleton modalities — built on the CUHK-X benchmark for HAR, HAU, and HARn tasks.
The CUHK-X Multimodal Human Activity Challenge is the first large-scale international competition that excludes RGB data entirely: models learn human dynamics from depth, IMU, mmWave radar, and skeleton modalities alone. This privacy-preserving design mirrors the deployment reality of healthcare, smart home, and elderly-care systems, where visual privacy must be preserved at every stage of training, validation, and inference.
The challenge is hosted on Kaggle by the AIoT Lab at The Chinese University of Hong Kong and runs across two parallel tracks, with finals held alongside UbiComp 2026 in Shanghai. The Small Model Track targets efficient, edge-deployable HAR; the Large Model Track pushes multimodal LLMs on VQA covering action understanding and reasoning. Total prize pool: USD $20,000.
Two independent parallel tracks, each with its own Kaggle leaderboard, prize pool, and evaluation criteria.
Efficient multimodal Human Activity Recognition
This track targets resource-constrained edge deployment in smart home and healthcare scenarios, including applications such as Alzheimer's monitoring, fall detection, and elderly care, where models must run on low-power devices with limited memory and compute.
Participants build lightweight multimodal models that fuse depth imagery, IMU streams, mmWave radar, and skeleton keypoints to classify 40 daily activities under strict cross-subject evaluation. Traditional architectures (CNN, RNN, Transformer) are encouraged; large pretrained foundation models are not permitted — placing the spotlight on architecture design, sensor fusion, and inference efficiency.
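For local validation, the cross-subject protocol mentioned above can be approximated by holding out whole subjects rather than individual clips. The following is a minimal sketch only: the `(subject_id, payload)` tuple layout, the 20% hold-out ratio, the seed, and the toy clip names are all illustrative assumptions, not the official split used by the organizers.

```python
import random
from collections import defaultdict


def cross_subject_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no subject appears in both train and test.

    `samples` is a list of (subject_id, payload) tuples. The tuple layout,
    the hold-out ratio, and the seed are assumptions for illustration.
    """
    by_subject = defaultdict(list)
    for subject_id, payload in samples:
        by_subject[subject_id].append(payload)

    # Shuffle subjects (not clips) so the hold-out is defined per person.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train = [(s, p) for s in subjects if s not in test_subjects for p in by_subject[s]]
    test = [(s, p) for s in subjects if s in test_subjects for p in by_subject[s]]
    return train, test, test_subjects


# Toy data: 30 subjects x 40 activities, one hypothetical clip per pair.
samples = [(subj, f"clip_s{subj}_a{act}") for subj in range(30) for act in range(40)]
train, test, held_out = cross_subject_split(samples)
```

Splitting by subject instead of by clip keeps a person's movement style out of the training set, which is what makes a local score comparable in spirit to a cross-subject leaderboard.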
LVLMs on depth & non-RGB modalities
Pushes the frontier of Large Vision-Language Models on non-RGB modalities.
Participants tackle Human Action Understanding (HAU) and Human Action Reasoning (HARn) through privacy-preserving video Visual Question Answering. There is no parameter limit, encouraging exploration of prompt design, modality alignment, and fine-tuning at scale.
Three-Stage Competition · Both Tracks
Each track carries an independent prize pool of USD $10,000. The following prizes apply independently to both the Small Model Track and the Large Model Track.
$10,000 per track · $20,000 total across both tracks
Certificate Awards · Both Tracks
Beyond the prize-money tiers above, every participating team is recognized through a 5-tier certificate system (per track). Tiers are nested — each team receives only their highest-qualifying award.
| Tier | Award | Eligibility |
|---|---|---|
| 🏆 | Outstanding Award | UbiComp Finals Top 6 |
| 🎖️ | Finalist Award | Kaggle Private LB Top 15 |
| 🎗️ | Excellence Award | Kaggle Private LB Top 15% (excl. Top 15) |
| 📜 | Distinction Award | Kaggle Private LB Top 30% (excl. above) |
| ✨ | Successful Participation Award | Teams with ≥ 1 valid submission |
All certificates are issued electronically to the email address provided during team registration.
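The nested tier rule in the table above can be sketched as a simple top-down check that returns the first (highest) tier a team qualifies for. This is an illustration under stated assumptions: the function name, its parameters, and the choice to round percentage cut-offs up are hypothetical, and the organizers' actual tie-breaking and rounding may differ.

```python
def _ceil_percent(percent, n_teams):
    """Integer ceiling of percent% of n_teams (rounding up is an assumption)."""
    return (percent * n_teams + 99) // 100


def certificate_tier(private_rank, n_teams, finals_rank=None, has_valid_submission=True):
    """Return the single highest certificate tier a team qualifies for.

    private_rank: 1-based Kaggle private leaderboard rank, or None if unranked.
    n_teams: number of ranked teams in the track.
    finals_rank: 1-based rank at the UbiComp finals, or None if not a finalist.
    """
    if finals_rank is not None and finals_rank <= 6:
        return "Outstanding Award"
    if private_rank is not None:
        if private_rank <= 15:
            return "Finalist Award"
        if private_rank <= _ceil_percent(15, n_teams):
            return "Excellence Award"
        if private_rank <= _ceil_percent(30, n_teams):
            return "Distinction Award"
    if has_valid_submission:
        return "Successful Participation Award"
    return None
```

Checking tiers from the top down is what enforces the "highest-qualifying award only" rule: a team ranked 10th never reaches the percentage checks.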
Flat travel grant for every finalist team attending in person at UbiComp 2026 Shanghai
In-person attendance is strongly encouraged. Teams unable to travel may join remotely via Zoom — organizers will operate the projector and coordinate live Q&A on their behalf. Remote teams remain eligible for prizes and awards, but travel grants only apply to teams attending in person.
Top 6 teams per track, auto-synced from Kaggle every day at 10:00 HKT.
Official CUHK-X Challenge website goes live with full timeline, track details, and dataset overview.
Both Kaggle competitions open for registration and submissions. Dataset publicly released. Promotion channels go live simultaneously.
Public submissions close. Top 15 teams per track on the Kaggle private leaderboard are notified and required to upload code + checkpoint within 48 hours.
Top 15 teams attend a Zoom verification session where they run live inference on freshly released sample data. Organizers also reproduce Kaggle results offline using the submitted code. Teams whose accuracy differs from their private leaderboard score by more than 10% are disqualified.
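Teams may want to sanity-check their own reproducibility before the session. The sketch below assumes the 10% threshold is an absolute gap in percentage points between two accuracies expressed on a 0 to 100 scale; the source does not say whether the gap is absolute or relative, so treat both the interpretation and the function name as assumptions.

```python
def passes_verification(private_lb_accuracy, reproduced_accuracy, max_gap=10.0):
    """Return True if the reproduced accuracy is within max_gap of the LB score.

    Both accuracies are percentages (0-100). Reading the gap as absolute
    percentage points is an assumption, not the organizers' stated rule.
    """
    gap = abs(private_lb_accuracy - reproduced_accuracy)
    # Only a gap strictly greater than the threshold leads to disqualification.
    return gap <= max_gap
```

For example, under this reading a team with a private leaderboard score of 82.0 that reproduces 70.0 offline would fail the check, while one that reproduces 75.5 would pass.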
Top 6 teams per track passing verification are officially invited to UbiComp 2026 finals. Final Technical Report due by this date.
Finalists run inference on a brand-new private dataset (cross-subject, never released). 15-min technical report presentation + Q&A. Awards ceremony same day. Teams unable to travel may participate via Zoom.
Step 1. Register your team on this official website using the form below. Registration is required to be eligible for prizes, announcements, and finals invitations.
Step 2. Join the competition on Kaggle and create your team there with the same team name.
⚠ Your Kaggle team name must exactly match the team name registered here; otherwise your certificate and shortlist notification cannot reach you.
Submit your team info so we can keep you in the loop on competition news, finals logistics, prize coordination, and last-minute announcements. Takes about 2 minutes to fill out.
Step 2 — Join on Kaggle (use the same team name)
Multimodal action recognition under resource constraints. Build efficient CNN / RNN / Transformer architectures — no large pretrained backbones permitted.
Register on Kaggle →
Vision-language models on depth and non-RGB modalities. Tackle 6,160 VQA questions across five reasoning types, with no parameter limit.
Register on Kaggle →
Registration Steps
Sign up at kaggle.com and verify your email address.
Accept the competition rules on one or both Kaggle pages.
Enter solo or as a team of up to 3 members. Merge teams via Kaggle's UI before the deadline.
Access the dataset, train your model, and submit predictions to the leaderboard.
Most large vision-language models still depend almost entirely on RGB data, while modalities such as depth, thermal imaging, IMU, and millimeter-wave radar remain severely underrepresented. The root cause is a lack of large-scale, high-quality paired multimodal datasets.
CUHK-X, built by the AIoT Lab at CUHK, addresses this gap with 64,267 samples across seven fully synchronized modalities collected from 30 participants performing 40 daily activities across two real-world indoor environments. Annotations follow a Ground-Truth First strategy, combining LLM-generated scene descriptions with human review to ensure temporal and logical consistency. The dataset supports three progressive tasks: HAR (action classification), HAU (action understanding), and HARn (action reasoning).
Modalities — Available in Dataset
Benchmark Tasks
All Top 15 teams per track must pass a Zoom-based verification session before advancing to the UbiComp 2026 finals. The verification involves live inference on freshly released sample data, plus offline reproduction of Kaggle results by organizers.
By Sep 22, 23:59 UTC, Top 15 teams upload:
- code/: full training and inference code
- checkpoints/model.pth: final model weights
- inference.sh: single entry script (data_dir → CSV)
- README.md: reproducibility artifact
- honor_declaration.pdf: signed

A 45-min recorded session per team. Sample data link released at session start. Teams have ≤ 2 hours to complete inference and submit results.
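Before uploading, a team could check that its package contains every item named in the upload list. The sketch below is a hypothetical helper, not an official validator: it only tests that the listed paths exist under a local directory and assumes `code` is a directory while the other entries are files.

```python
from pathlib import Path

# Items taken from the upload list; "code" is assumed to be a directory.
REQUIRED = [
    "code",
    "checkpoints/model.pth",
    "inference.sh",
    "README.md",
    "honor_declaration.pdf",
]


def missing_items(package_dir):
    """Return the required entries that are absent from package_dir."""
    root = Path(package_dir)
    missing = []
    for rel in REQUIRED:
        path = root / rel
        present = path.is_dir() if rel == "code" else path.is_file()
        if not present:
            missing.append(rel)
    return missing
```

Calling `missing_items("my_submission")` on a hypothetical local folder returns an empty list when everything is in place; it does not verify that `inference.sh` runs or that the declaration is signed.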
For technical questions or general information about the competition and dataset.
Technical questions, team registration help, or anything else about the challenge.
CUHK-X Multimodal Human Activity Challenge · 2026