This paper proposes a simple and discriminative framework, using graphical model and 3D geometry to understand the diversity of urban scenes with varying viewpoints. Our algorithm constructs a conditional random field (CRF) network using over-segmented superpixels and learns the appearance model from different set of features for specific classes of our interest. Also, we introduce a training algorithm to learn a model for edge potential among these superpixel areas based on their feature difference. The proposed algorithm gives competitive and visually pleasing results for urban scene segmentation. We show the inference from our trained network improves the class labeling performance compared to the result when using the appearance model solely.