Physically Grounded Multimodal Foundation Models

Event details

Date 29.08.2024
Hour 15:00 - 17:00
Speaker Kunal Pratap Singh
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Caglar Gulcehre
Thesis advisor: Prof. Amir Zamir
Co-examiner: Prof. Mackenzie Mathis

Abstract
Broadly, I am interested in the role that an agent's embodiment and environment play in the emergence of intelligence, viewed through the lens of multimodality and active perception.
More specifically, I study how an agent can bootstrap its visual capabilities solely by observing its environment through its physical sensors.
I am also interested in how the agent's actions affect its perception, thereby closing the action-perception loop. In particular, I want to understand how we can design an agent that takes appropriate actions to learn a given visual property.

Background papers
  1. Elephants Don't Play Chess, by Rodney Brooks.
  2. Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video.
  3. SPIN: Simultaneous Perception, Interaction and Navigation.

Practical information

  • General public
  • Free

Contact

  • edic@epfl.ch
