Multimodal AI • Vision+Language • Generative Models
Incoming Master's student in CSE at the University of Michigan, Ann Arbor. I build multimodal systems and study natural-language supervision for vision tasks.
I'm an incoming Master's student in CSE at the University of Michigan, Ann Arbor. My research interests center on multimodal AI and natural-language supervision for vision tasks. I'm currently a Research Intern at Stanford's PanLab, applying multimodal AI to neuroimaging, and at DREAM:Lab (IISc), working on deep learning for edge accelerators.
Previously, I worked on research and data science projects at Samsung, Upthrust, and AarogyaAI.
ICVGIP 2024
Human-like danger assessment from videos using LLM-based reasoning, with an analysis of failure modes on subtle cues.
IEEE CONECCT 2024
Label-ontology-guided pretext tasks improve audio classification via semi-supervised learning.
AAIMB 2023
A camera-first pipeline for attendance with privacy-preserving data flows.
IEEE CCEM 2020
Graph-based proximity modeling to estimate contagion transfer risk in gatherings.
Parse a screenshot of a partially played Wordle board and predict the best next guess (sketched below).
Semantically search the text of pages in your browsing history to jump straight to the right one (sketched below).
Parse a bill image and auto-create Splitwise entries end-to-end.
Evolve CNN kernel sizes and pooling choices to optimize MNIST classification (sketched below).
One-vs-all and one-vs-one SVMs on FaceNet embeddings, evaluated on LFW (sketched below).
Clean PyTorch/TensorFlow implementations of core vision & audio papers.
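For the Wordle project, a minimal sketch of the guess-ranking step, assuming the screenshot has already been parsed into per-letter feedback ('g' green, 'y' yellow, 'x' gray). The word list, feedback encoding, and scoring heuristic here are illustrative assumptions, not the project's actual implementation.

```python
from collections import Counter

# Tiny stand-in word list; the real solver would use the full Wordle dictionary.
WORDS = ["crane", "slate", "trace", "grace", "crate", "plate"]

def consistent(word, guess, feedback):
    # Keep words that would produce the same feedback for this guess.
    # Simplified: duplicate letters in a guess are not handled exactly.
    for i, (g, f) in enumerate(zip(guess, feedback)):
        if f == "g" and word[i] != g:
            return False
        if f == "y" and (g not in word or word[i] == g):
            return False
        if f == "x" and g in word:
            return False
    return True

def best_next(guesses):
    # Filter to candidates consistent with all feedback so far.
    pool = [w for w in WORDS
            if all(consistent(w, g, f) for g, f in guesses)]
    # Prefer words covering the most common remaining letters.
    freq = Counter(c for w in pool for c in set(w))
    return max(pool, key=lambda w: sum(freq[c] for c in set(w)))

print(best_next([("slate", "xxgxg")]))  # -> a word matching _?a?e, no s/l/t
```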
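For the history-search project, a minimal sketch of the retrieval step, assuming page text has already been captured into a `history` mapping of URL to snippet. The sentence-transformers model named here is an illustrative choice, not necessarily what the extension uses.

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in for whatever store the extension keeps (url -> page text snippet).
history = {
    "https://example.com/attention": "The transformer relies on self-attention...",
    "https://example.com/cnn": "Convolutional networks slide filters over images...",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
urls = list(history)
doc_emb = model.encode([history[u] for u in urls], convert_to_tensor=True)

query = "how does self-attention work"
q_emb = model.encode(query, convert_to_tensor=True)
best = util.cos_sim(q_emb, doc_emb).argmax().item()  # highest cosine similarity
print("jump to:", urls[best])
```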
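For the evolved-CNN project, a minimal sketch of the search loop, assuming the genome is a (kernel size, pooling type) pair and truncation selection plus mutation. The stub `fitness` stands in for training a small CNN with that config and returning validation accuracy; all numbers are illustrative.

```python
import random

random.seed(0)
KERNELS, POOLS = [3, 5, 7], ["max", "avg"]

def fitness(genome):
    # Stand-in for: build a CNN with this kernel/pooling config, train briefly
    # on MNIST, and return validation accuracy.
    k, p = genome
    return 0.9 + 0.01 * (k == 3) + 0.005 * (p == "max") + random.gauss(0, 0.002)

def mutate(genome):
    # Flip either the kernel size or the pooling choice at random.
    k, p = genome
    if random.random() < 0.5:
        return (random.choice(KERNELS), p)
    return (k, random.choice(POOLS))

pop = [(random.choice(KERNELS), random.choice(POOLS)) for _ in range(8)]
for gen in range(10):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                                   # truncation selection
    pop = parents + [mutate(random.choice(parents)) for _ in range(4)]

print("best config:", max(pop, key=fitness))
```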
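For the face-recognition project, a minimal sketch of the one-vs-all vs. one-vs-one comparison, assuming FaceNet embeddings are already extracted. The random arrays below stand in for real LFW embeddings and identity labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))    # stand-in for 512-d FaceNet embeddings
y = rng.integers(0, 5, size=200)   # stand-in identity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit both multiclass decompositions over the same base SVM and compare.
for name, clf in [
    ("one-vs-all", OneVsRestClassifier(SVC(kernel="linear"))),
    ("one-vs-one", OneVsOneClassifier(SVC(kernel="linear"))),
]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))
```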