UAV videos are attracting widespread interest because of their cost-effectiveness and their wide range of applications, such as monitoring environmental change and disaster management. Recently, computer vision algorithms have been used to analyse UAV videos and to act as decision support systems. The context of the scene plays a prominent role in improving the performance of such systems, and it is generally obtained with a video semantic segmentation algorithm. The success of video semantic segmentation algorithms relies on temporal consistency, which in turn requires estimating temporal correspondence between frames. Optical flow based methods are popular in the literature for establishing temporal correspondence, but they are computationally expensive for video semantic segmentation. This paper therefore presents a new Conditional Random Field (CRF) framework for UAV video semantic segmentation. A new pairwise potential energy term is proposed that uses the long-range temporal information required for temporally consistent labels. Further, the proposed method applies CRF inference selectively, which reduces the CRF computation, and it is independent of optical flow estimation. The proposed algorithm achieves an mIoU of 0.89 on the ManipalUAVid dataset.