Target speaker extraction (TSE) serves as a front-end processing technology for various speech applications, such as automatic speech recognition. However, TSE has long been challenged by underdetermined environments, particularly in the presence of noise. In this paper, we propose a dual-channel system for directional TSE under noisy underdetermined conditions. Our approach employs two source models that integrate conditional variational autoencoders (CVAEs) with global style tokens (GSTs) to learn representations of noisy single-speaker speech and noisy mixed speech within a geometric source separation framework, where the GSTs generate the conditional variables for the CVAEs. To suppress residual noise in the extracted target signal under diverse noise conditions, we introduce a conditional neural postfilter with a GST that estimates a complex time-frequency (T-F) mask for denoising. We further propose a joint network in which the conditional neural postfilter is trained jointly with a CVAE and a shared GST module. Experimental results demonstrate that the proposed dual-channel TSE method achieves superior performance under noisy underdetermined conditions.