For TCP traffic, the queue depth may be too small to absorb the initial slow-start burst, or it may be much too large. (For optimal TCP performance, the queue should be sized for the link's BDP [bandwidth-delay product].)
For TCP, it's also possible, when using a single FIFO queue, for multiple flows to enter global synchronization. (See http://en.wikipedia.org/wiki/TCP_global_synchronization)
For non-TCP traffic, there may be insufficient bandwidth to support it, or the queue depth may be too small for bursty traffic. (Much non-TCP traffic does not dynamically adjust its flow rate when congestion causes discards.)
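
To make the BDP sizing mentioned for TCP concrete, here is a minimal Python sketch. The function names, the 1500-byte packet size, and the example link (10 Mbit/s, 50 ms round-trip time) are assumptions for illustration only, not figures from any particular device or from the question.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes.

    bandwidth_bps: link rate in bits per second
    rtt_ms: round-trip time in milliseconds
    """
    # bits/s * ms = millibits; divide by 1000 for bits, by 8 for bytes
    return bandwidth_bps * rtt_ms // 8000


def bdp_packets(bandwidth_bps: int, rtt_ms: int, packet_bytes: int = 1500) -> int:
    """BDP expressed as a packet count, rounded up.

    packet_bytes is an assumed full-size Ethernet payload; adjust for your MTU.
    """
    total = bdp_bytes(bandwidth_bps, rtt_ms)
    return -(-total // packet_bytes)  # ceiling division


if __name__ == "__main__":
    # Hypothetical link: 10 Mbit/s with a 50 ms RTT
    print(bdp_bytes(10_000_000, 50))    # 62500 bytes
    print(bdp_packets(10_000_000, 50))  # 42 packets
```

Many platforms configure queue depth in packets rather than bytes, which is why the second helper converts. Treat the result as a starting point, since the RTT varies per flow and the right depth also depends on how many flows share the queue.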