Stochastic Optimal Control Through Gradient Projection Method And Backward Action Learning