Modern data-center applications frequently exhibit one-to-many communication patterns and, at the same time, require the fabric to provide sub-millisecond latencies and high throughput. Native IP multicast can meet these requirements. However, native IP multicast solutions have scalability limitations that make them challenging to offer as a service to the hundreds of thousands of tenants typical of cloud environments. Tenants must therefore either add custom multicast support to their applications or rely on services such as overlay multicast. Both are unicast-based approaches that impose their own overhead on throughput and CPU utilization, leading to higher and unpredictable latencies. In this paper, we present Elmo, a mechanism for native multicast in switches that takes advantage of the unique characteristics of data-center topologies and workloads. Specifically, Elmo exploits the symmetric topology and short paths of a data center and the tendency of virtual machines (VMs) from individual tenants to cluster in small portions of the topology. Elmo encodes multicast group information inside the packets themselves, significantly reducing the need to store multicast group information in individual network switches. In a data-center topology with 27K hosts, Elmo supports a million multicast groups using a 325-byte packet header while requiring as few as 1.1K multicast flow-table entries on average in leaf switches, with a traffic overhead as low as 5%.