The dialogue policy plays a crucial role in a dialogue system, as it determines the system response to a given user input. In a pipeline system, the dialogue policy is susceptible to performance degradation when preceding components fail to produce correct output. To address this issue, this paper proposes a new method for training a robust dialogue policy that can handle noisy representations caused by user dialogue acts mispredicted by the natural language understanding component. The method combines two strategies: student-teacher learning and offline reinforcement learning. Student-teacher learning forces the student model to map the features extracted from the noisy input close to the clean features extracted by the teacher model. Offline reinforcement learning with a multi-label classification objective then trains the dialogue policy to produce an appropriate response to the user input, using only the trajectories stored in the dataset. Experimental results show that the proposed hybrid learning substantially improves the multi-turn end-to-end performance of a pipeline dialogue system on the MultiWOZ 2.1 dataset under the ConvLab-2 evaluation framework. Furthermore, it achieves results competitive with the end-to-end performance of a pre-trained GPT-2 model at lower computational cost.
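
To make the two training signals concrete, the following is a minimal PyTorch sketch of how a student-teacher feature-matching term could be combined with a return-weighted multi-label dialogue-act objective computed from offline trajectories. The network names, feature dimensions, and the simple per-sample return weighting are illustrative assumptions, not the paper's exact architecture or objective.

```python
# Minimal sketch: student-teacher feature matching + offline multi-label objective.
# Names, sizes, and the return-weighting scheme are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Encodes a dialogue-state vector and predicts system dialogue acts
    as a multi-label classification over the system-act vocabulary."""
    def __init__(self, state_dim: int, hidden_dim: int, num_sys_acts: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.act_head = nn.Linear(hidden_dim, num_sys_acts)

    def forward(self, state):
        feats = self.encoder(state)      # extracted features
        logits = self.act_head(feats)    # multi-label system-act logits
        return feats, logits

def hybrid_loss(student, teacher, noisy_state, clean_state,
                target_acts, returns, alpha=1.0):
    """Combine (1) feature matching between the student (noisy NLU input) and
    the teacher (clean oracle input) and (2) a return-weighted multi-label
    classification loss over system acts from the offline dataset."""
    with torch.no_grad():                          # teacher sees clean (oracle) acts
        teacher_feats, _ = teacher(clean_state)
    student_feats, logits = student(noisy_state)   # student sees NLU-predicted acts

    # (1) pull features of the noisy input toward the teacher's clean features
    feature_loss = F.mse_loss(student_feats, teacher_feats)

    # (2) multi-label dialogue-act objective, weighted by the stored return
    bce = F.binary_cross_entropy_with_logits(
        logits, target_acts, reduction="none").mean(dim=-1)
    policy_loss = (returns * bce).mean()

    return policy_loss + alpha * feature_loss
```

In this sketch, the teacher would be trained on (or frozen with) oracle dialogue acts while the student receives the NLU-predicted acts for the same turns, and the per-sample returns come from the trajectories stored in the offline dataset; how the paper actually weights or constrains the policy update is not specified here.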