My research sits at the intersection of hardware-aware AI and embodied intelligence. I focus on learning systems that are mathematically grounded yet deployable on robots and edge devices under strict latency, power, and memory constraints—bridging high-level generative modeling with low-level system execution.
Core Focus Areas
Efficient Diffusion Models as Low-Cost Synthesizers
Vocal-conditioned music generation · IEEE/WI IAT · arXiv

World Models as a Proxy for Robotics
Predictive representations for embodied agents

Robotic Manipulation that Combines DL and Classical Controls
World models + visual foresight for control

DL Microfluidics Simulation as a CFD Surrogate
Efficient simulation of flows where advection and diffusion both matter · arXiv

Industrial Predictive Modeling as a Data-Centric Challenge
SkySense — Predicting from Industrial Data at Scale

Exploring Legacy Neuroevolution Methods for Understanding Trained Behavior
Neuroevolution in Smash Bros · Post

Ensembling NN as a Heuristic for Robust CV
Robust inference for robotic perception

Classical NEAT Implementation for Atari Games
Neuroevolution for Atari games

Pendulum
OpenCV tracking + dynamics modeling

LegoFIKS
3D generation + instruction synthesis · Post

Candidly
Anti-cheat platform for interviews · Devpost
A lightweight latent diffusion model for vocal-conditioned musical accompaniment generation. We present soft alignment attention, which adaptively combines local and global temporal dependencies across diffusion timesteps. With 15M parameters, the model achieves a 220× parameter reduction and 52× faster inference while maintaining competitive quality, enabling real-time deployment on consumer hardware.
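To make the idea of mixing local and global temporal attention concrete, here is a minimal sketch in plain Python. It is not the published model: the linear gate `alpha = t / num_steps`, the window size, and all function names are assumptions chosen for illustration, and the sketch assumes query and key sequences of equal length.

```python
import math

def softmax(scores):
    """Numerically stable softmax; -inf scores get exactly zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, window=None):
    """Scaled dot-product attention weights.

    If `window` is given, position i may only attend to positions j
    with |i - j| <= window (local attention); otherwise it is global.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # masked out
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights.append(softmax(scores))
    return weights

def soft_alignment_weights(queries, keys, t, num_steps, window=2):
    """Blend local and global attention with a timestep-dependent gate.

    Assumption (hypothetical gate): noisy steps (t near num_steps) lean
    on global context, clean steps (t near 0) lean on local alignment.
    """
    alpha = t / num_steps
    local = attention_weights(queries, keys, window)
    glob = attention_weights(queries, keys)
    return [
        [alpha * g + (1.0 - alpha) * l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(glob, local)
    ]

if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [0.2, 0.3]]
    for row in soft_alignment_weights(frames, frames, t=5, num_steps=10, window=1):
        print([round(w, 3) for w in row])
```

Because each row is a convex combination of two softmax rows, it still sums to one, so the blended weights can be applied to value vectors like ordinary attention weights.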
End-to-end predictive maintenance pipelines deployed in manufacturing environments, with emphasis on robustness to noise, missing modalities, and out-of-distribution behavior. Focus on bridging research-grade models with real production constraints.
A comprehensive platform ensuring interview integrity through gaze and typing pattern detection, combined with a video interviewing tool for candidates and a Databricks-style analytics dashboard for interviewers. Features session recording, anomaly flagging, and insights on interviewer performance.
Predictive representations for embodied agents, bridging generative modeling with real-time inference constraints in closed-loop control. Explores latent dynamics learning and planning in compressed state spaces.
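As a rough illustration of planning in a compressed state space, the sketch below runs random-shooting planning over a stand-in latent dynamics function. Everything here is hypothetical: `toy_latent_step` is a hand-written placeholder for a learned model, and random shooting is just one simple planner, not necessarily the one used in this work.

```python
import random

def toy_latent_step(z, action):
    """Placeholder for a learned latent dynamics model (assumption).

    The latent state is a one-element list; the action nudges it.
    """
    return [z[0] + 0.5 * action]

def squared_distance(z, goal):
    return sum((zi - gi) ** 2 for zi, gi in zip(z, goal))

def plan_random_shooting(z0, goal, step, horizon=5, num_candidates=200, seed=0):
    """Sample action sequences, roll each out in latent space, keep the best.

    Returns (best_actions, best_cost), where cost is the squared distance
    between the final predicted latent state and the goal latent.
    """
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        z = list(z0)
        for a in actions:
            z = step(z, a)  # imagined rollout, no environment interaction
        cost = squared_distance(z, goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost

if __name__ == "__main__":
    actions, cost = plan_random_shooting([0.0], [1.0], toy_latent_step)
    print("first action to execute:", actions[0], "predicted cost:", cost)
```

In a closed-loop setting, only the first action would typically be executed before replanning from the newly encoded observation, which is where the real-time inference budget matters.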
Generates 3D LEGO models and step-by-step role-playing instructions from input images using modern generative models and structured outputs. Demonstrates human-centered AI applications.
A semantic matching algorithm built on AWS primitives (Lambda + embeddings + DynamoDB), developed under hackathon constraints. Demonstrates rapid prototyping with cloud services.
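The core of embedding-based matching can be sketched without any cloud services. The example below ranks candidates by cosine similarity to a query vector; the function names and the tiny hand-made vectors are hypothetical, and in the actual project the embeddings would come from an embedding model and be stored externally.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_matches(query_vec, candidates, k=3):
    """Return up to k (name, score) pairs, best match first.

    `candidates` maps a name to its embedding vector.
    """
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in candidates.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]

if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" purely for illustration.
    profiles = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 1.0]}
    print(top_matches([1.0, 0.1], profiles, k=2))
```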
Sequence-to-sequence encoder–decoder implementation for translation tasks with an LSTM bottleneck. Includes a discussion of attention mechanisms as the next iteration. Fully documented with an interactive Colab notebook.
OpenCV-based real-time pendulum tracking with downstream analysis of period and damping behavior. Demonstrates integration of computer vision with system identification.
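Once a tracker has produced a time series of pendulum angles, period and damping can be estimated with standard signal analysis. The sketch below is one plausible approach, not necessarily the project's: it takes the period from interpolated upward zero crossings and the damping rate from the logarithmic decrement of successive peaks, assuming the angle follows roughly `A * exp(-gamma * t) * cos(omega * t)`. The function name and the synthetic data are assumptions.

```python
import math

def estimate_period_and_damping(times, angles):
    """Estimate (period, gamma) from an angle-versus-time series.

    Period: mean spacing of upward zero crossings (linearly interpolated).
    gamma:  exponential decay rate, from the log decrement of positive peaks.
    """
    crossings = []
    for i in range(1, len(angles)):
        a0, a1 = angles[i - 1], angles[i]
        if a0 < 0.0 <= a1:
            frac = -a0 / (a1 - a0)
            crossings.append(times[i - 1] + frac * (times[i] - times[i - 1]))
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)

    # Positive local maxima, one per oscillation cycle.
    peaks = [angles[i] for i in range(1, len(angles) - 1)
             if angles[i] > 0.0 and angles[i - 1] < angles[i] >= angles[i + 1]]
    if len(peaks) < 2:
        raise ValueError("need at least two positive peaks")
    log_decrement = math.log(peaks[0] / peaks[-1]) / (len(peaks) - 1)
    gamma = log_decrement / period
    return period, gamma

if __name__ == "__main__":
    # Synthetic stand-in for tracked angles: period 2 s, gamma 0.1 1/s.
    dt = 0.01
    times = [i * dt for i in range(2001)]
    angles = [0.3 * math.exp(-0.1 * t) * math.cos(math.pi * t) for t in times]
    print(estimate_period_and_damping(times, angles))
```

Real tracking data is noisy, so in practice the angle series would likely need smoothing before peak detection; that step is omitted here.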
NeuroEvolution of Augmenting Topologies (NEAT) implementations, including a self-play Pong demo and evolved Smash Bros agents. Demonstrates evolutionary optimization for agent design.
I enjoy teaching and building technical communities around tools and research: