Projects | Akshay Raman

Finetuning Video Diffusion Models for Multi-view Consistency

Fine-tuned a video diffusion model to generate multi-view consistent object renderings from single-view inputs. Demonstrated that training on a curated, high-quality 1% subset (10K objects) of the Objaverse dataset achieves performance comparable to training on the full dataset (1M+ objects).

Canvas - A Template for Deep Learning Projects

Designed a flexible deep learning project template using PyTorch and Hydra. The template is modeled on the agent-environment interface from reinforcement learning and is designed to support a wide range of machine learning tasks.

Hierarchical CLIP-based Image Geolocation Prediction

Trained a CLIP-inspired image geolocation model that predicts the precise location of an image taken anywhere on Earth. Designed a novel inference approach based on hierarchical feature clustering that achieves comparable performance while being ~100x more efficient than previous methods.
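
The idea behind clustered inference can be illustrated with a minimal two-level sketch. This is not the project's actual implementation: the toy gallery, the precomputed cluster labels, and the function names `build_index` and `hierarchical_search` are all assumptions made for illustration. A query is first compared against cluster centroids only, then against the members of the best-matching cluster.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_index(embeddings, labels):
    """Group gallery embeddings by cluster label and compute unit-norm centroids."""
    cluster_ids = np.unique(labels)
    centroids = normalize(
        np.stack([embeddings[labels == c].mean(axis=0) for c in cluster_ids])
    )
    members = [np.flatnonzero(labels == c) for c in cluster_ids]
    return centroids, members

def hierarchical_search(query, embeddings, centroids, members):
    """Coarse step: pick the nearest centroid. Fine step: search only that cluster."""
    best_cluster = int(np.argmax(centroids @ query))
    candidate_ids = members[best_cluster]
    scores = embeddings[candidate_ids] @ query
    best = int(candidate_ids[np.argmax(scores)])
    n_comparisons = len(centroids) + len(candidate_ids)
    return best, n_comparisons

# Toy gallery (hypothetical data): 4 clusters of 50 embeddings in 8 dimensions.
rng = np.random.default_rng(0)
centers = np.eye(8)[:4]
labels = np.repeat(np.arange(4), 50)
gallery = normalize(centers[labels] + 0.05 * rng.normal(size=(200, 8)))

centroids, members = build_index(gallery, labels)
best, n_comparisons = hierarchical_search(gallery[123], gallery, centroids, members)
print(best, n_comparisons)
```

In this toy setup the search touches 4 centroids plus 50 cluster members instead of all 200 gallery items. The savings grow with gallery size and with additional levels in the hierarchy.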

Continual Learning for Policy Gradient Methods

Master's Capstone Project

Developed novel incremental learning algorithms to train reinforcement learning agents on a variety of real-world environments. Modified batch-wise policy gradient methods using eligibility traces to eliminate data buffers, particularly for long-horizon tasks.
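
As a rough illustration of how eligibility traces remove the need for a data buffer, here is a minimal tabular actor-critic that updates after every single transition. It is a textbook-style sketch, not the algorithms developed in the project: the class name `TraceActorCritic`, the corridor environment, and all hyperparameters are assumptions, and the per-step discount factor used in some episodic formulations is omitted for brevity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TraceActorCritic:
    """Tabular actor-critic with eligibility traces; learns from one transition at a time."""

    def __init__(self, n_states, n_actions, gamma=0.95, lam=0.9,
                 alpha_pi=0.1, alpha_v=0.2):
        self.theta = np.zeros((n_states, n_actions))  # policy logits
        self.v = np.zeros(n_states)                   # state-value estimates
        self.gamma, self.lam = gamma, lam
        self.alpha_pi, self.alpha_v = alpha_pi, alpha_v
        self.reset_traces()

    def reset_traces(self):
        self.z_theta = np.zeros_like(self.theta)
        self.z_v = np.zeros_like(self.v)

    def policy(self, s):
        return softmax(self.theta[s])

    def update(self, s, a, r, s_next, done):
        # One-step TD error; no stored trajectory is required.
        target = r if done else r + self.gamma * self.v[s_next]
        delta = target - self.v[s]
        # Decay the traces, then add the gradient for the current step.
        self.z_v *= self.gamma * self.lam
        self.z_v[s] += 1.0
        grad_log_pi = -self.policy(s)
        grad_log_pi[a] += 1.0  # gradient of log softmax w.r.t. the logits of state s
        self.z_theta *= self.gamma * self.lam
        self.z_theta[s] += grad_log_pi
        # The traces assign credit for delta to recently visited states and actions.
        self.v += self.alpha_v * delta * self.z_v
        self.theta += self.alpha_pi * delta * self.z_theta
        return delta

def run_corridor(agent, episodes=200, n_states=5, max_steps=100, seed=0):
    """Hypothetical corridor task: action 1 moves right, action 0 moves left; reward 1 at the far end."""
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        agent.reset_traces()
        s = 0
        for _ in range(max_steps):
            a = int(rng.choice(2, p=agent.policy(s)))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            agent.update(s, a, float(done), s_next, done)
            if done:
                break
            s = s_next
    return agent

agent = run_corridor(TraceActorCritic(n_states=5, n_actions=2))
print(agent.policy(0))
```

Each call to `update` consumes a transition and discards it. The trace vectors carry the information that a batch method would otherwise recover by replaying a stored trajectory.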

Solving Optimal Transport using Deep Neural Networks

Project under Prof. Augusto Gerolin, Undergraduate Thesis

Developed gradient-based DNN approximators to solve the optimal transport problem for high-dimensional data. Aimed to apply OT in Density Functional Theory (DFT) to study the dissociation of atoms.

Multi-lingual Question Answering

Built a multilingual question answering system using the HuggingFace API, drawing on syntactic rules from multiple languages. Fine-tuned BERT on the SQuAD dataset, augmented with multiple question variants generated via back translation.

Diabetic Retinopathy Detection

Trained large-scale CNNs to predict diabetic retinopathy (an eye disease) from a noisy dataset of retinal images. Generated heatmaps using Grad-CAM to identify the parts of each image that had the most impact on the model's prediction.
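
For context, the core Grad-CAM computation can be sketched in a few lines. This assumes the feature maps of a convolutional layer and the gradients of the class score with respect to them have already been extracted from the network (in practice via a deep learning framework's hooks or autograd). The function name `grad_cam` and the toy arrays are illustrative and not taken from the project.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine feature maps into a heatmap.

    activations, gradients: arrays of shape (channels, height, width).
    """
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps, then ReLU to keep only positive evidence.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example with two 2x2 feature maps (hypothetical values).
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 2.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

The resulting map has the spatial resolution of the chosen layer and is typically upsampled to the input image size before being overlaid on the image.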