Reducing the Cost of Test Data Labelling for Deep-Learning Systems: An Empirical Study