Hello Joemarr,
The question is not the overhead but the way load balancing is performed.
All methods, whether based on an L2 or L3 bundle or on CEF over parallel L3 links, use flow-based load balancing.
L2 or L3 bundle with IPv4 traffic: on modern IOS switches the member link is chosen from the XOR of the least significant bits of the IP source address (SA) and IP destination address (DA).
On older switches it can be the XOR of the MAC SA and MAC DA.
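To make the hash concrete, here is a minimal Python sketch of how a bundle member could be picked. It is a simplified model, not the exact switch implementation: it assumes the low 3 bits of the XOR select one of 8 buckets and that buckets are mapped to member links with a plain modulo. The function names are hypothetical. The point it shows is that every packet of a given SA/DA pair always lands on the same member link.

```python
import ipaddress


def _low_bits(value: int, bits: int) -> int:
    """Keep only the `bits` least significant bits of an integer."""
    return value & ((1 << bits) - 1)


def ip_bundle_member(src_ip: str, dst_ip: str, n_links: int, bits: int = 3) -> int:
    """Hypothetical model: pick a member from the XOR of the low bits of IP SA and IP DA."""
    sa = int(ipaddress.IPv4Address(src_ip))
    da = int(ipaddress.IPv4Address(dst_ip))
    bucket = _low_bits(sa ^ da, bits)  # one of 2**bits hash buckets
    return bucket % n_links            # simplified bucket-to-link mapping (assumption)


def mac_bundle_member(src_mac: str, dst_mac: str, n_links: int, bits: int = 3) -> int:
    """Older-switch variant of the same idea, using MAC SA and MAC DA."""
    sa = int(src_mac.replace(":", ""), 16)
    da = int(dst_mac.replace(":", ""), 16)
    return _low_bits(sa ^ da, bits) % n_links


# The same host pair always maps to the same link, in either direction.
print(ip_bundle_member("10.0.0.1", "10.0.1.2", 4))  # → 3
print(ip_bundle_member("10.0.1.2", "10.0.0.1", 4))  # → 3
```

Since XOR is symmetric, both directions of a conversation hash to the same member in this model.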
CEF load balancing: IP SA XOR IP DA XOR a hash seed. The hash seed changes each time the device reloads.
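A similarly simplified model of CEF per-flow load sharing with a seed is sketched below. The class and method names are hypothetical, the real CEF hash is more involved than a plain XOR followed by a modulo, and the seed here is just a random 32-bit value regenerated on a simulated reload.

```python
import ipaddress
import random
from typing import Optional


class CefLoadBalancer:
    """Toy model of per-flow CEF load sharing: path = (SA ^ DA ^ seed) mod n."""

    def __init__(self, n_paths: int, seed: Optional[int] = None):
        self.n_paths = n_paths
        # Seed chosen once at "boot"; fixed while the device stays up.
        self.seed = random.getrandbits(32) if seed is None else seed

    def path_for(self, src_ip: str, dst_ip: str) -> int:
        sa = int(ipaddress.IPv4Address(src_ip))
        da = int(ipaddress.IPv4Address(dst_ip))
        return (sa ^ da ^ self.seed) % self.n_paths

    def reload(self) -> None:
        """Simulate a device reload: a new seed may move flows to other paths."""
        self.seed = random.getrandbits(32)


lb = CefLoadBalancer(n_paths=2, seed=0)
print(lb.path_for("10.0.0.1", "10.0.0.2"))  # → 1
lb.seed = 1                                 # a different seed, as after a reload
print(lb.path_for("10.0.0.1", "10.0.0.2"))  # → 0
```

While the device is up a given flow stays on one path; after a reload the same SA/DA pair may be hashed onto a different path.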
If your traffic is not varied enough, or if there are a few very high-volume flows such as database synchronization, you can end up with one member link in the bundle that is full while the other member links are far less used.
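The effect of a few heavy flows can be shown with a small simulation. The traffic mix is hypothetical (one 900 Mbps replication flow plus eight 20 Mbps flows over a 4-link bundle), and it reuses the simplified XOR hash model rather than the real switch algorithm.

```python
import ipaddress


def member(src_ip: str, dst_ip: str, n_links: int) -> int:
    # Simplified flow hash (assumption): low 3 bits of SA ^ DA, modulo link count.
    sa = int(ipaddress.IPv4Address(src_ip))
    da = int(ipaddress.IPv4Address(dst_ip))
    return ((sa ^ da) & 0b111) % n_links


def per_link_load(flows, n_links):
    """Sum the Mbps of every flow on the member link its hash selects."""
    loads = [0] * n_links
    for src, dst, mbps in flows:
        loads[member(src, dst, n_links)] += mbps
    return loads


# Hypothetical traffic: one big replication flow plus eight small flows.
flows = [("10.1.0.10", "10.2.0.20", 900)]
flows += [(f"10.1.0.{i}", "10.2.0.1", 20) for i in range(1, 9)]

print(per_link_load(flows, 4))  # → [40, 40, 940, 40]
```

The small flows spread evenly, but the single big flow cannot be split, so one member link carries 940 Mbps while the others carry 40 Mbps each.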
With L3 CEF you might be able to enable per-packet load balancing, but this requires some thought, because it is a possible source of out-of-order packets that can affect the performance of the TCP/IP stacks of hosts at the two sites.
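The reordering risk can be sketched by sending one flow's packets round-robin over two paths with different delays. The delays and packet spacing are hypothetical values in microseconds; in practice the difference often comes from queueing on the individual links. A receiving TCP stack that sees out-of-order segments sends duplicate ACKs, and three duplicate ACKs trigger fast retransmit, so reordering can cause needless retransmissions and a reduced congestion window.

```python
def arrival_order(n_packets, delays_us, gap_us, per_packet=True):
    """Return sequence numbers in the order the receiver sees them.

    per_packet=True: round-robin each packet over the links (per-packet LB).
    per_packet=False: the whole flow stays on link 0 (per-flow LB).
    """
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(delays_us) if per_packet else 0
        arrive = seq * gap_us + delays_us[link]
        arrivals.append((arrive, seq))
    arrivals.sort()  # ties broken by sequence number
    return [seq for _, seq in arrivals]


def count_reordered(order):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = -1
    reordered = 0
    for seq in order:
        if seq < highest:
            reordered += 1
        else:
            highest = seq
    return reordered


order = arrival_order(6, delays_us=[1000, 1500], gap_us=100)
print(order, count_reordered(order))  # → [0, 2, 4, 1, 3, 5] 2
```

With per-packet round-robin, packets 1 and 3 arrive after packet 4; with `per_packet=False` the flow stays on one link and the order is preserved.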
Hope this helps
Giuseppe