Projects
Some projects I have worked on.

Fine-Tuning Video Diffusion Models for 3D-Consistent Multi-view Generation
Fine-tuned a video diffusion model (SVD) to generate geometrically consistent, multi-view renderings from a single input image. Demonstrated that a curated high-quality 1% subset (10K objects) of the Objaverse dataset achieved performance comparable to full-scale training (1M+ objects).

Canvas - A Modular Deep Learning Project Template Using PyTorch and Hydra
Designed Canvas, a flexible, modular deep learning project template built on PyTorch and Hydra, intended as a unified starting point for all kinds of machine learning projects.

Scalable CLIP-based Geolocation via Hierarchical Embedding Search
Developed a CLIP-based geolocation model trained on 4M+ images from the MediaEval-16 dataset, achieving 70% country-level prediction accuracy. Engineered a novel hierarchical clustering algorithm to accelerate model inference by ~100x, reducing the search space from 100k+ GPS points to ~1k while maintaining competitive accuracy.
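
The coarse-to-fine idea behind such a search can be sketched roughly as follows. This is a generic two-stage nearest-neighbour lookup over toy 2D vectors, not the project's actual clustering algorithm; the names `hierarchical_search`, `centroid`, and the city labels are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def centroid(members):
    """Unit-length mean of the embeddings in one cluster."""
    dim = len(members[0][1])
    mean = [sum(emb[d] for _, emb in members) / len(members) for d in range(dim)]
    return normalize(mean)

def hierarchical_search(query, centroids, clusters, top_clusters=1):
    """Score the query against cluster centroids first, then only against
    the members of the best-matching clusters (cosine similarity)."""
    q = normalize(query)
    ranked = sorted(range(len(centroids)),
                    key=lambda i: dot(q, centroids[i]), reverse=True)
    best = None
    for i in ranked[:top_clusters]:
        for label, emb in clusters[i]:
            score = dot(q, emb)
            if best is None or score > best[1]:
                best = (label, score)
    return best

# Toy stand-ins for image/GPS embeddings (assumed data, not from the project).
clusters = [
    [("paris", normalize([1.0, 0.1])), ("lyon", normalize([1.0, 0.3]))],
    [("tokyo", normalize([0.1, 1.0])), ("osaka", normalize([0.3, 1.0]))],
]
centroids = [centroid(members) for members in clusters]
label, score = hierarchical_search([0.05, 1.0], centroids, clusters)
```

With `top_clusters=1`, only the members of one cluster are scored instead of every candidate, which is where the reduction in search space comes from; raising `top_clusters` trades speed for recall.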

Continual Credit Assignment with Eligibility Traces
Master's Capstone Project
Developed an online reinforcement learning algorithm by adapting Generalized Advantage Estimation (GAE) with eligibility traces, eliminating memory-intensive data buffers. Proposed a clipped-trace regularization method to address training instability in the online setting, and applied the method to MuJoCo and Atari environments.
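
For context, the textbook mechanism this builds on can be sketched as below: tabular TD(lambda) with accumulating eligibility traces, which spreads each TD error over recently visited states without storing a trajectory buffer. The TD residual and the `gamma * lam` decay are the same quantities that appear in GAE. This is a minimal sketch under those assumptions, not the capstone's algorithm, and the clipped-trace regularization is not shown because its details are not given here. The function name and toy transitions are hypothetical.

```python
def td_lambda_episode(transitions, values, gamma=0.99, lam=0.95, alpha=0.1):
    """One online pass of tabular TD(lambda) with accumulating traces.

    transitions: iterable of (state, reward, next_state, done)
    values: dict mapping state -> value estimate, updated in place
    """
    traces = {}
    for state, reward, next_state, done in transitions:
        bootstrap = 0.0 if done else values.get(next_state, 0.0)
        # TD residual, the same delta_t that GAE sums with weights (gamma*lam)^l
        delta = reward + gamma * bootstrap - values.get(state, 0.0)
        # Decay every existing trace, then bump the trace of the current state
        for s in traces:
            traces[s] *= gamma * lam
        traces[state] = traces.get(state, 0.0) + 1.0
        # Credit the TD error to all recently visited states at once
        for s, e in traces.items():
            values[s] = values.get(s, 0.0) + alpha * delta * e
        if done:
            traces.clear()
    return values

# Toy two-step episode: the reward arrives only on the last step.
episode = [("a", 0.0, "b", False), ("b", 1.0, "end", True)]
values = td_lambda_episode(episode, {}, gamma=1.0, lam=1.0, alpha=0.5)
```

In this toy run, state "a" receives credit for the final reward in the same pass, even though nothing was stored beyond the trace dictionary.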

Solving Optimal Transport using Deep Neural Networks
Project under Prof. Augusto Gerolin, Undergraduate Thesis
Prototyped a deep neural network solver for amortized Wasserstein OT in TensorFlow, accelerating the Sinkhorn algorithm by 2x on MNIST. Simulated atomic dissociation for N-electron systems using an OT solver, predicting potential energy curves within 5% of theoretical values.
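
As a reference point, the standard Sinkhorn iteration for entropy-regularized OT can be sketched in a few lines. This is only the classical baseline on a toy 2x2 problem, written in plain Python for readability; the amortized neural solver and the TensorFlow code are not reproduced here, and the `eps` and `n_iters` values are illustrative assumptions.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between histograms a and b.

    Alternately rescales rows and columns of the Gibbs kernel
    K = exp(-cost / eps) until the plan's marginals match a and b.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem: two points each, moving mass "across" costs 1, staying costs 0.
a = [0.5, 0.5]
b = [0.5, 0.5]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
```

The returned plan keeps almost all mass on the zero-cost diagonal, and its row and column sums match `a` and `b`.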

Multi-Lingual QA with Back-Translation Augmentation
Engineered a multi-lingual QA system supporting 6+ languages using Hugging Face Transformers and the Google Translate API for cross-lingual translation. Fine-tuned a BERT model on the SQuAD dataset, doubling the training data size with back-translation to improve model generalization and robustness.

End-to-End System for Interpretable Diabetic Retinopathy Detection
Implemented an EfficientNet model in TensorFlow to classify diabetic retinopathy, achieving robust performance on a noisy and imbalanced medical dataset of retinal images. Deployed a complete end-to-end system as a Flask web application, integrating Grad-CAM to generate visual heatmaps that assist diagnosis and support model interpretability.