Learning Robust Visual-Semantic Retrieval Models With Limited Supervision