Concurrent execution of tasks in GPUs can reduce the computation time of a workload by
overlapping data transfer and execution commands.
However, it is difficult to implement an efficient run-time
scheduler that minimizes the workload makespan,
as many execution orderings must be evaluated. In
this paper, we employ scheduling theory to build a
model that takes into account the device capabilities,
workload characteristics, constraints, and objective
functions. In our model, GPU task scheduling
is reformulated as a flow shop scheduling problem,
which allows us to apply and compare well-known
methods already developed in the operations research
field. In addition, we develop a new heuristic,
specifically focused on executing GPU commands, that
achieves better scheduling results than previous
techniques. Finally, we conduct a comprehensive evaluation
on three different NVIDIA architectures
(Kepler, Maxwell, and Pascal), showing
the suitability and robustness of this new approach.
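
To make the flow shop view concrete, the following minimal sketch computes the makespan of a task ordering and finds the best ordering by exhaustive search. It is an illustration only, not the heuristic proposed in this paper. It assumes, hypothetically, that every task passes through three stages in a fixed order (host-to-device transfer, kernel execution, device-to-host transfer), that each stage processes one task at a time, and that all tasks keep the same order on every stage. The names `makespan`, `best_order`, and `TIMES`, as well as the durations, are invented for the example.

```python
from itertools import permutations

# Hypothetical per-task durations (arbitrary time units), one tuple per task:
# (host-to-device transfer, kernel execution, device-to-host transfer).
TIMES = [
    (3, 2, 1),  # task 0
    (1, 4, 2),  # task 1
    (2, 1, 3),  # task 2
]


def makespan(order, times):
    """Return the completion time of the last task for a given ordering.

    Assumes a permutation flow shop: each stage handles one task at a
    time, and tasks visit the stages in the same sequence.
    """
    n_stages = len(times[0])
    finish = [0.0] * n_stages  # finish[s]: time at which stage s becomes free
    for task in order:
        for s in range(n_stages):
            # A task starts on stage s once the stage is free and the
            # task itself has left the previous stage.
            ready = finish[s - 1] if s > 0 else 0.0
            finish[s] = max(finish[s], ready) + times[task][s]
    return finish[-1]


def best_order(times):
    """Exhaustively search all orderings (only feasible for few tasks)."""
    return min(permutations(range(len(times))), key=lambda o: makespan(o, times))


if __name__ == "__main__":
    order = best_order(TIMES)
    print(order, makespan(order, TIMES))
```

Because the number of orderings grows factorially with the number of tasks, exhaustive search is impractical at run time, which is why heuristics are of interest.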