# BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

This is the PyTorch implementation of the BLIP paper.

## Catalog

- Inference demo
- Pre-trained and finetuned checkpoints
- Pre-training code
- Finetuning code for Image-Text Retrieval, Image Captioning, VQA, and NLVR2
- Download of bootstrapped image-text dataset

## Inference demo (Image Captioning and VQA)

Run our interactive demo using the Colab notebook (no GPU needed):
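
For reference, here is a minimal local captioning sketch in the style of the demo notebook. It assumes the repo's `models.blip.blip_decoder` entry point and a downloaded captioning checkpoint; the checkpoint path and image filename below are placeholders, not files shipped with this README:

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

# Assumed entry point from this repo's demo notebook (models/blip.py).
from models.blip import blip_decoder

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 384

# Resize to the model's input resolution and normalize with the
# CLIP mean/std used during BLIP pre-training.
preprocess = transforms.Compose([
    transforms.Resize((image_size, image_size),
                      interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

# Placeholder image path; replace with your own file.
image = preprocess(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)

# 'pretrained' accepts a local path or URL to a captioning checkpoint
# (placeholder below); vit='base' selects the ViT-B backbone.
model = blip_decoder(pretrained='<path/to/caption_checkpoint.pth>',
                     image_size=image_size, vit='base')
model.eval()
model = model.to(device)

with torch.no_grad():
    # Beam search decoding; pass sample=True for nucleus sampling instead.
    caption = model.generate(image, sample=False, num_beams=3,
                             max_length=20, min_length=5)
print('caption:', caption[0])
```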