Speaker:

Tejas Chopra

Memory Wall for AI

Date:

Wednesday, May 6, 2026

Time:

9:35 am

Summary:

Modern generative AI systems—from LLMs to multimodal models—are no longer compute-bound; they are memory-bound. As model sizes soar, inference latency is dominated by memory bandwidth, memory fragmentation, KV-cache bloat, checkpoint restore time, and PCIe/NVLink bottlenecks. This session breaks down the “Memory Wall” limiting generative model performance and shares practical techniques such as model compression, quantization, memory-efficient attention, sharding, and cold-start optimization, offering actionable insights for practitioners building large-scale generative AI infrastructure.
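To give a sense of the scale behind the "KV-cache bloat" problem, the short Python sketch below estimates KV-cache size for a hypothetical decoder-only transformer and shows how lower-precision storage shrinks it. The model dimensions and the quantization comparison are illustrative assumptions, not figures from the talk.

# Illustrative back-of-the-envelope estimate of KV-cache memory for a
# decoder-only transformer. All model dimensions below are assumed for
# illustration; they are not taken from the session.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_value: int) -> int:
    """Size of the KV cache: two tensors (K and V) per layer, each of shape
    [batch, num_kv_heads, seq_len, head_dim], stored at the given precision."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

if __name__ == "__main__":
    # Hypothetical 70B-class configuration (assumed values).
    cfg = dict(num_layers=80, num_kv_heads=8, head_dim=128,
               seq_len=32_768, batch_size=8)

    fp16 = kv_cache_bytes(**cfg, bytes_per_value=2)  # 16-bit cache
    int8 = kv_cache_bytes(**cfg, bytes_per_value=1)  # 8-bit quantized cache

    gib = 1024 ** 3
    print(f"KV cache @ fp16: {fp16 / gib:.1f} GiB")   # ~80 GiB under these assumptions
    print(f"KV cache @ int8: {int8 / gib:.1f} GiB")   # half the footprint

Under these assumed dimensions the cache alone dwarfs a single accelerator's memory, which is why techniques like quantization and memory-efficient attention matter for serving long contexts.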

Ready to attend?

Register now! Join your peers.

Newsletter

Knowledge is everything! Sign up for our newsletter to receive:
  • 10% off your first ticket!
  • Insights, interviews, tips, news, and much more about Machine Learning Week
  • Price break reminders