Rishabh Mondal

Hello, friends! I am Rishabh Mondal (ঋষভ মণ্ডল). In Bengali, “Rishabh” means superior; it also refers to the second note (Re) of the Indian classical music scale, a sound associated with harmony and grace.

My research lies at the intersection of Earth Observation and Computer Vision, with a focus on environmental monitoring, geographical domain generalization, and foundation models for remote sensing. I am currently a Ph.D. scholar at the Sustainability Lab, IIT Gandhinagar, supervised by Prof. Nipun Batra. I hold an M.Tech (2023) in Information Technology from the Indian Institute of Engineering Science and Technology (IIEST), Shibpur, where I worked under the guidance of Dr. Prasun Ghosal in the domain of TinyML, and a B.Tech (2021) in Computer Science and Engineering from The Neotia University, Kolkata.


Technical Writings

Deep dives into machine learning, computer vision, and geospatial AI. From foundational concepts to cutting-edge research.

Coming Soon

Remote Sensing Foundation Models

An overview of foundation models in remote sensing and their applications for environmental monitoring.

Open for Collaboration

Let’s Build Something Meaningful Together

Student-to-Student Research Collaboration in AI for Earth Observation

This opportunity is for students outside IIT Gandhinagar

I believe in peer-to-peer learning and collaborative research. Research is not just about publishing papers. It’s about asking bold questions, learning through failure, and growing together. If you’re a student passionate about solving real-world problems using AI, let’s connect!

How It Works

1. Reach Out: Send me an email or a LinkedIn message with your background and interests.
2. Initial Chat: We discuss ideas and find a good project fit.
3. Collaborate: Work together on research with regular sync-ups.
4. Publish: Aim for a conference or journal publication together.

What I Offer

  • Research Guidance: Methodology, experimentation, paper writing
  • Real Problems: Work on impactful Earth Observation challenges
  • Technical Support: Deep learning, PyTorch, geospatial tools
  • Publication Support: Target top-tier venues together

Ideal Collaborator

  • Passionate about Computer Vision or Earth Observation
  • Self-motivated with 15-20 hrs/week commitment
  • Familiar with deep learning & PyTorch (preferred)
  • Growth mindset: we learn and grow together!

Open Research Projects

3 projects available
Open for Collaboration
Silent Collapse: Why Self-Supervised Learning Fails on Visually Homogeneous Domains

Related Paper

The Problem: Self-supervised learning powers modern computer vision, but it was built for ImageNet: rich colors, clear objects, strong contrast. What happens when the visual world is subtle?

Think Mars terrain, medical tissue, and radar imagery. These are domains where every patch looks nearly identical, contrast is faint, and color barely exists.

We find that SSL methods train normally, loss converges, and checkpoints look healthy, but the learned features are quietly useless. We call this silent collapse.
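
One cheap way to surface this failure is to inspect the singular-value spectrum of a batch of learned features: a healthy encoder spreads variance across many dimensions, while a collapsed one concentrates it in a few. The sketch below is a rough diagnostic under my own assumptions; the `collapse_score` helper is illustrative and not from the project.

```python
import torch

def collapse_score(features):
    """Rough dimensional-collapse check on a batch of SSL features.

    features: (N, D) embeddings from a trained encoder. If a few
    singular values dominate, the features have quietly collapsed
    even though the training loss looked healthy. Illustrative
    heuristic, not the project's actual diagnostic.
    """
    feats = features - features.mean(dim=0)
    s = torch.linalg.svdvals(feats)
    p = s / s.sum()
    # Effective rank: exp of the entropy of the singular-value spectrum.
    eff_rank = torch.exp(-(p * torch.log(p + 1e-12)).sum())
    return eff_rank.item()  # near min(N, D) = healthy, near 1 = collapsed
```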

This project diagnoses exactly where and why standard SSL objectives break on homogeneous imagery. We then propose fixes including frequency-aware masking, contrast-amplified augmentations, and hard-negative mining. These will be validated across Mars science tasks and medical imaging benchmarks.
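
To make the first of those fixes concrete, here is a minimal PyTorch sketch of what frequency-aware masking could look like. The `frequency_aware_mask` helper and its energy-proportional sampling are my assumptions, not the project's final design.

```python
import torch

def frequency_aware_mask(images, mask_ratio=0.6, patch=16):
    """Bias MAE-style patch masking toward high-frequency patches.

    images: (B, C, H, W) tensor with H, W divisible by `patch`.
    Returns a boolean mask of shape (B, num_patches); True = masked.
    Hypothetical helper for illustration only.
    """
    # Split into non-overlapping patches: (B, C, h, w, p, p).
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)
    spec = torch.fft.rfft2(patches).abs()
    # Zero the DC term so flat, homogeneous patches score near zero.
    spec[..., 0, 0] = 0.0
    energy = spec.mean(dim=(-2, -1)).mean(dim=1)  # (B, h, w)
    scores = energy.flatten(1)                    # (B, num_patches)
    n_mask = int(mask_ratio * scores.shape[1])
    # Sample patches without replacement, proportional to HF energy.
    idx = torch.multinomial(scores + 1e-6, n_mask)
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask
```

The intuition is that on homogeneous imagery, uniform random masking mostly hides uninformative flat patches; weighting the mask toward high-frequency regions forces the model to reconstruct the little structure that exists.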

Goal: Understanding when the most popular training recipe in vision silently stops working.

Self-Supervised Learning · Computer Vision · Satellite Imagery

Open for Collaboration
GeoRAG: Retrieval-Augmented Spatio-Temporal Forecasting via Multimodal Climate Analogues

Related Code

The Problem: Current spatio-temporal forecasting models assume that nearby regions behave similarly. But climate doesn’t respect borders: Mumbai’s monsoon dynamics may predict Chennai’s weather better than data from the geographically closer Hyderabad.

We propose GeoRAG, a retrieval-augmented forecasting framework. Instead of memorizing spatial patterns during training, we encode each region’s geographic identity using satellite imagery, geo-images, and LLM-derived text descriptions into a unified embedding.

At inference, we retrieve the k most climatologically similar regions from anywhere on Earth and feed their historical time series as in-context examples to a lightweight transformer forecaster.

The key technical contribution is a learned multimodal geographic similarity metric, trained so that regions with correlated meteorological behavior cluster together in embedding space, regardless of physical distance.
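
As a hedged sketch of the retrieval step, assuming regions are already embedded with the learned similarity metric (the `retrieve_twins` helper and tensor shapes below are illustrative, not from a released GeoRAG codebase):

```python
import torch
import torch.nn.functional as F

def retrieve_twins(query_emb, region_embs, region_series, k=5):
    """Retrieve the k most climatologically similar regions.

    query_emb:     (D,)   multimodal embedding of the target region.
    region_embs:   (N, D) embeddings for a global bank of regions.
    region_series: (N, T) historical time series for each region.
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), region_embs, dim=-1)
    topk = sims.topk(k).indices
    # The retrieved series become in-context examples for the forecaster,
    # and the indices make the forecast attributable to specific regions.
    return region_series[topk], sims[topk], topk
```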

Goal: Replacing spatial adjacency assumptions with retrieval over climatological twins, making every forecast explainable by where on Earth it came from.

Spatio-Temporal Forecasting · Retrieval-Augmented · Multimodal Learning · Remote Sensing · Climate Science

Open for Collaboration
SatBEV: Cross-View Geospatial Foundation Model for HD Map Generation from Satellite Imagery

OpenSatMap

The Problem: HD map construction requires expensive LiDAR-equipped vehicles to drive every road. What if we could generate lane-level HD maps directly from satellite imagery, for any location on Earth?

We propose SatBEV, a cross-view geospatial foundation model that learns dense lane-level correspondence between satellite imagery and street-level driving observations. Instead of treating satellite images as auxiliary features for HD map construction, we train a dual-encoder architecture that aligns overhead and perspective views in a shared geometric embedding space at instance-level lane granularity.

The model is pretrained on OpenSatMap’s 38k satellite images across 60 cities using a masked autoencoder objective, then fine-tuned on the spatially aligned nuScenes and Argoverse 2 subsets with three joint objectives: contrastive satellite-BEV alignment, lane-level cross-view correspondence prediction via GPS-supervised cross-attention, and a map transfer objective that predicts ego-vehicle-frame HD maps from satellite tiles alone.
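
For intuition, the first of those objectives could be implemented as a standard symmetric InfoNCE loss between paired satellite and BEV embeddings. This is a minimal sketch under my own assumptions, not SatBEV’s actual code:

```python
import torch
import torch.nn.functional as F

def satellite_bev_infonce(sat_emb, bev_emb, temperature=0.07):
    """Symmetric InfoNCE aligning satellite and BEV embeddings.

    sat_emb, bev_emb: (B, D) paired embeddings from the two encoders,
    where row i of each tensor covers the same geographic location.
    """
    sat = F.normalize(sat_emb, dim=-1)
    bev = F.normalize(bev_emb, dim=-1)
    logits = sat @ bev.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(sat.shape[0], device=sat.device)
    # Matching pairs sit on the diagonal (positives); all other
    # entries in the batch act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```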

The key technical contribution is a cross-view transformer with GPS-derived positional encoding that produces dense correspondence fields mapping satellite lane pixels to BEV lane pixels, handling projective distortion and partial observability. At inference, this enables HD map generation for any location with satellite coverage — no driving data or onboard sensors required.
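
A minimal sketch of what a GPS-derived positional encoding might look like, assuming simple sinusoidal features over normalized latitude and longitude (the `gps_positional_encoding` helper, its frequencies, and its normalization are hypothetical):

```python
import torch

def gps_positional_encoding(latlon, dim=128):
    """Sinusoidal encoding of (lat, lon) for cross-view attention.

    latlon: (N, 2) coordinates in degrees; dim must be divisible by 4.
    Returns (N, dim) positional features.
    """
    coords = latlon / torch.tensor([90.0, 180.0])           # roughly [-1, 1]
    freqs = 2.0 ** torch.arange(dim // 4, dtype=torch.float32)
    angles = coords.unsqueeze(-1) * freqs * torch.pi        # (N, 2, dim//4)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (N, 2, dim//2)
    return enc.flatten(1)                                   # (N, dim)
```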

We evaluate on zero-shot HD map prediction (IoU on nuScenes val for divider/crossing/boundary), few-shot city adaptation with 10 street-level samples, and ablations on whether OpenSatMap’s instance-level annotations improve transfer over coarse semantic labels. The baseline to beat is SatforHDMap at 50.9 IoU; the target is generalization to cities entirely absent from driving datasets.

Goal: Building HD maps for any road on Earth using only satellite imagery, eliminating the need for expensive ground-truth collection vehicles.

Autonomous Driving · Cross-View Learning · HD Maps · Satellite Imagery · Foundation Models

Have your own idea? I’m open to discussing it!

Ready to Collaborate?

Send me an email with:

  • A brief introduction about yourself
  • Your research interests and background
  • Which project interests you (or propose your own!)

iamr38579@gmail.com