Toward Joint Scene Understanding Using Deep Convolutional Neural Network: Object State, Depth, and Segmentation