Posters

Cost & Energy of Everyday LLM Workloads: A Visual Field Guide for Python Developers

Experience Level:

Some experience

Description

Generative AI is everywhere, but the real cost and energy impact of everyday LLM calls are usually buried in dashboards and invoices. This poster turns those hidden numbers into clear visuals that Python developers can explore at a glance.

Using small, reproducible Python scripts, we profile common workloads (chat completions, document summarization, classification, RAG queries, and batch embedding jobs) across different model sizes and configurations. For each scenario, we generate waterfall charts that break total latency into network, tokenization, inference, and post‑processing, plus bar plots that compare cost per request and per 1,000 tokens, along with simple indicators of CPU vs GPU energy use.
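The stage-level timing behind those waterfall charts can be sketched as below. This is a minimal illustration, not the poster's actual profiling code: the stage callables here are hypothetical stand-ins (simple sleeps) for the real network, tokenization, inference, and post‑processing steps, and `profile_stages` is an assumed helper name.

```python
import time

def profile_stages(stages):
    """Time each named stage callable and return {stage_name: seconds}."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()  # in real profiling, this would be the actual stage, e.g. an API call
        timings[name] = time.perf_counter() - start
    return timings

# Hypothetical stand-ins for the real stages measured on the poster.
timings = profile_stages([
    ("network", lambda: time.sleep(0.010)),
    ("tokenization", lambda: time.sleep(0.002)),
    ("inference", lambda: time.sleep(0.050)),
    ("post-processing", lambda: time.sleep(0.005)),
])

total_latency = sum(timings.values())
```

The resulting `timings` dict maps directly onto the segments of a waterfall chart, with `total_latency` as the full bar.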

All measurements are implemented with familiar Python tools such as pandas, matplotlib, and popular LLM client libraries, and the code will be available in an open repository so attendees can plug in their own endpoints and regenerate the figures. The poster is designed for students, educators, and practitioners who already use LLMs but want crisp, visual intuition for “what this call really costs,” leaving them with practical heuristics and ready‑to‑run notebooks for their own projects.
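The per-request and per-1,000-token cost comparisons reduce to a small amount of arithmetic over token counts and per-token prices. The sketch below illustrates the idea under assumed prices; the `PRICE_PER_1K` values are placeholders, and real numbers come from each provider's pricing page or the attendee's own endpoint.

```python
# Assumed USD prices per 1,000 tokens; replace with your provider's real rates.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}

def request_cost(input_tokens, output_tokens, price=PRICE_PER_1K):
    """Cost in USD of a single request, given prompt and completion token counts."""
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]

def cost_per_1k_tokens(input_tokens, output_tokens, price=PRICE_PER_1K):
    """Blended cost per 1,000 tokens across input and output."""
    total_tokens = input_tokens + output_tokens
    return request_cost(input_tokens, output_tokens, price) / total_tokens * 1000

# Example: a request with 1,000 prompt tokens and 1,000 completion tokens.
cost = request_cost(1000, 1000)        # 0.0005 + 0.0015 = 0.002 USD
blended = cost_per_1k_tokens(1000, 1000)  # 0.002 / 2000 * 1000 = 0.001 USD
```

Values like these, collected into a pandas DataFrame across scenarios, are what the poster's bar plots visualize.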
