BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

This is the PyTorch implementation of the BLIP paper.

Catalog:

  • Inference demo
  • Pre-trained and finetuned checkpoints
  • Pre-training code
  • Finetuning code for Image-Text Retrieval, Image Captioning, VQA, and NLVR2
  • Download of bootstrapped image-text dataset

Inference demo (Image Captioning and VQA):

Run our interactive demo in the provided Colab notebook (no GPU is needed).
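
If you prefer to run inference locally, the sketch below shows roughly what an image-captioning call looks like. It is a minimal sketch based on the demo notebook: the `blip_decoder` entry point, the `generate()` arguments, and the checkpoint path are assumptions and may differ from the released code; substitute the actual checkpoint URL from the checkpoints section.

```python
# Minimal captioning sketch (assumption: the repo exposes `blip_decoder`
# in models/blip.py with a generate() method, as in the demo notebook).
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

from models.blip import blip_decoder  # assumed entry point from this repo

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 384

# Preprocess a local image to the resolution the pre-trained ViT expects.
transform = transforms.Compose([
    transforms.Resize((image_size, image_size),
                      interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)

# Load a captioning checkpoint (the path/URL here is a placeholder).
model = blip_decoder(pretrained='<path-or-url-to-checkpoint>',
                     image_size=image_size, vit='base')
model.eval().to(device)

with torch.no_grad():
    # Beam search decoding; nucleus sampling can be enabled with sample=True.
    caption = model.generate(image, sample=False, num_beams=3,
                             max_length=20, min_length=5)
print('caption:', caption[0])
```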
