SUR (Semantic Understanding & Reasoning Adapter) for Text-to-Image Diffusion Models

CS 726: Advanced Machine Learning, Prof. Sunita Sarawagi

Report
Presentation


Semantic Enhancement of Text-to-Image Diffusion Models

In this project, we studied and experimented with two research papers on improving the semantic understanding of text-to-image generators:


  • SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models (LLMs)
  • ELLA: Equipping Diffusion Models with LLMs for Enhanced Semantic Alignment

We devised architectural modifications to these implementations, aiming to improve the semantic understanding of the Stable Diffusion pipeline. We then compared our modified models against the vanilla implementations to evaluate any performance improvements.
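A comparison between a modified model and its vanilla baseline can be summarized along the following lines. This is only a minimal sketch, not the evaluation code used in the project: it assumes each model has already been scored per prompt with some text-image alignment metric where higher is better, and the function name `compare_alignment` and its output keys are hypothetical.

```python
from statistics import mean


def compare_alignment(vanilla_scores, enhanced_scores):
    """Summarize paired per-prompt alignment scores (higher = better).

    Both lists are assumed to be in the same prompt order, so that
    index i in each list refers to the same prompt.
    """
    if len(vanilla_scores) != len(enhanced_scores):
        raise ValueError("scores must be paired per prompt")
    if not vanilla_scores:
        raise ValueError("need at least one prompt")

    # Per-prompt improvement of the modified model over the baseline.
    deltas = [e - v for v, e in zip(vanilla_scores, enhanced_scores)]

    return {
        "mean_vanilla": mean(vanilla_scores),
        "mean_enhanced": mean(enhanced_scores),
        "mean_delta": mean(deltas),
        # Fraction of prompts on which the modified model scores strictly higher.
        "win_rate": sum(d > 0 for d in deltas) / len(deltas),
    }
```

Reporting the win rate alongside the mean difference helps show whether an improvement is spread across prompts or driven by a few of them.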