CLIP (Contrastive Language-Image Pre-training): predicts the most relevant text snippet given an image
Updated 2024-06-04 19:47:22 +00:00
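CLIP's prediction step can be sketched in a few lines: each candidate text snippet and the image are encoded into a shared embedding space, and the snippet whose embedding has the highest cosine similarity with the image embedding wins. A minimal sketch, using random vectors as stand-ins for CLIP's image/text encoder outputs (the `most_relevant_text` helper and the 512-d embeddings are illustrative, not CLIP's API):

```python
import numpy as np

def most_relevant_text(image_emb, text_embs):
    """Return the index of the text embedding most similar to the image.

    CLIP scores image-text pairs by cosine similarity of their
    embeddings; the highest-scoring snippet is the prediction.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity of each snippet with the image
    return int(np.argmax(sims))

# Toy example: random 512-d vectors stand in for encoder outputs
# (in practice these come from CLIP's image and text towers).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(3, 512))
text_embs[1] = image_emb + 0.1 * rng.normal(size=512)  # snippet 1 matches
print(most_relevant_text(image_emb, text_embs))  # → 1
```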
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
image-captioning
image-text-retrieval
vision-and-language-pre-training
vision-language
vision-language-transformer
visual-question-answering
visual-reasoning
Updated 2023-06-12 21:34:17 +00:00
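The image-text retrieval and pre-training tags above rest on a contrastive image-text objective: matched image-text pairs in a batch are pulled together and mismatched pairs pushed apart via a symmetric InfoNCE loss. A minimal NumPy sketch of that loss, under the assumption that row i of each matrix is a matched pair (the function name and `temperature` default are illustrative, not BLIP's actual code):

```python
import numpy as np

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs,
    in the style of the image-text contrastive objective used in
    vision-language pre-training. Row i of each input is a matched pair."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # pairwise cosine similarities
    n = len(logits)

    def xent(l):
        # cross-entropy with the matched pair (diagonal) as the target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

The loss shrinks toward zero as each image embedding aligns with its own caption and decorrelates from the rest of the batch.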