Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild Demo
Click on any of the examples below and run the demo.
Note that the model initialization, RL planner, and TTA updates are not yet fully optimized for GPU in this Hugging Face demo, so execution may lag. We will improve this in the future.
Project Website
Model Inputs
Live Heatmap Output
Taxonomy
| Satellite Image | Ground-level Image (optional) | Full Taxonomy Name (optional) |
|---|---|---|
The satellite image CLIP encoder is fine-tuned using Sentinel-2 Level 2A satellite images and taxonomy images (with GPS locations) from iNaturalist. The sound CLIP encoder is fine-tuned with a subset of the same taxonomy images and their corresponding sounds from iNaturalist. Although some of the examples above produce poor probability distributions, our test-time adaptation framework improves them during the search process.
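The paragraph above says the fine-tuned encoders yield a probability distribution over the satellite image. As a rough illustration only, the sketch below shows one common way a CLIP-style model can turn embeddings into such a map: cosine similarity between each satellite patch embedding and a query embedding (from a ground-level image, text, or sound), followed by a softmax over the grid. The function names, the random stand-in embeddings, and the temperature of 0.07 are assumptions chosen for illustration; this is not the Search-TTA implementation, and it omits the test-time adaptation step entirely.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def similarity_heatmap(patch_embeds: np.ndarray, query_embed: np.ndarray,
                       temperature: float = 0.07) -> np.ndarray:
    """Hypothetical helper: map satellite patch embeddings (H, W, D) and one
    query embedding (D,) to a probability map over the H x W grid (sums to 1).
    The temperature value is an assumption, not taken from Search-TTA."""
    patches = l2_normalize(patch_embeds)        # (H, W, D), unit-length patches
    query = l2_normalize(query_embed)           # (D,), unit-length query
    logits = (patches @ query) / temperature    # (H, W) scaled cosine similarity
    logits = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()                  # softmax over all grid cells

# Usage with random stand-in embeddings (a real system would use encoder outputs).
rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 4, 8))
query = patches[2, 1].copy()                    # query identical to patch (2, 1)
heat = similarity_heatmap(patches, query)
print(heat.shape, np.unravel_index(heat.argmax(), heat.shape))
```

Using a softmax keeps the map non-negative and normalized, so it can be read directly as a distribution over where the target is likely to be; a lower temperature makes the map more peaked.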
Heatmap Results
Ground Image Query
Text Query
Sound Query
In-Domain Taxonomy
| Satellite Image | Ground-level Image (optional) | Full Taxonomy Name (optional) | Sound Input (optional) |
|---|---|---|---|
Out-of-Domain Taxonomy
| Satellite Image | Ground-level Image (optional) | Full Taxonomy Name (optional) | Sound Input (optional) |
|---|---|---|---|