Can conditional variational autoencoders (C-VAEs) be used for regression predictions on an image?

I have a question about the use of C-VAEs.

Say you have an image you want to do pose estimation on, and you know 1 or 2 landmarks but not the rest. How can you leverage those known landmarks as a prior while also implicitly encoding some kind of shape constraint? Could you use a C-VAE to represent plausible human poses in the latent space, and then condition it on the ground-truth landmarks along with the predicted heatmap? If not, how else could you model this?
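To make the idea concrete, here is a rough sketch of one way the conditioning could be wired up. It is plain NumPy with random, untrained weights, so it only illustrates the data flow and the loss, not a working pose estimator. The landmark count, latent size, layer sizes and every function name are assumptions made up for illustration, and the predicted heatmap is left out (features from it could presumably be concatenated onto the condition vector).

```python
import numpy as np

# All sizes below are hypothetical choices for illustration only.
NUM_LANDMARKS = 17              # assumed number of body keypoints
LATENT_DIM = 8                  # assumed latent size
HIDDEN = 64                     # assumed hidden layer width
POSE_DIM = 2 * NUM_LANDMARKS    # flattened (x, y) per landmark
COND_DIM = 3 * NUM_LANDMARKS    # masked (x, y) plus a 0/1 "known" flag

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    """Random (untrained) weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

W_enc, b_enc = linear(POSE_DIM + COND_DIM, HIDDEN)
W_mu, b_mu = linear(HIDDEN, LATENT_DIM)
W_lv, b_lv = linear(HIDDEN, LATENT_DIM)
W_dec1, b_dec1 = linear(LATENT_DIM + COND_DIM, HIDDEN)
W_dec2, b_dec2 = linear(HIDDEN, POSE_DIM)

def build_condition(landmarks, known_mask):
    """Condition vector: known coordinates (zeros elsewhere) plus the mask."""
    masked = landmarks * known_mask[:, None]
    return np.concatenate([masked.ravel(), known_mask])

def encode(pose, cond):
    """q(z | pose, cond): returns mean and log-variance of the latent."""
    h = np.tanh(np.concatenate([pose.ravel(), cond]) @ W_enc + b_enc)
    return h @ W_mu + b_mu, h @ W_lv + b_lv

def decode(z, cond):
    """p(pose | z, cond): maps a latent sample plus condition to a full pose."""
    h = np.tanh(np.concatenate([z, cond]) @ W_dec1 + b_dec1)
    return (h @ W_dec2 + b_dec2).reshape(NUM_LANDMARKS, 2)

def cvae_loss(pose, cond):
    """Negative ELBO for one example: squared error plus KL to N(0, I)."""
    mu, logvar = encode(pose, cond)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(LATENT_DIM)
    recon_err = np.sum((decode(z, cond) - pose) ** 2)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon_err + kl

def sample_poses(cond, n):
    """Draw n pose hypotheses from the prior, given the known landmarks."""
    return np.stack([decode(rng.standard_normal(LATENT_DIM), cond)
                     for _ in range(n)])

if __name__ == "__main__":
    pose = rng.uniform(0.0, 1.0, size=(NUM_LANDMARKS, 2))  # fake ground truth
    mask = np.zeros(NUM_LANDMARKS)
    mask[[0, 5]] = 1.0                                      # two known landmarks
    cond = build_condition(pose, mask)
    print(cvae_loss(pose, cond), sample_poses(cond, 4).shape)
```

During training the encoder would see the full pose together with the condition. At test time only `sample_poses` would be needed, and the known landmarks could simply be written back over the corresponding decoder outputs.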

Any advice, reading material or pointers would be hugely appreciated. Thanks!

I am not sure about this, but this PDF may contain what you are looking for: