In clouds and data centers, GPU servers consisting of multiple GPUs are widely deployed. Current state-of-the-art GPU scheduling algorithms are "static" in assigning applications to different GPUs. These algorithms usually ignore the dynamics of …