Improving Energy Efficiency of Basic Linear Algebra Routines on Heterogeneous Systems with Multiple GPUs