## BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
This is the PyTorch implementation of the <a href="https://arxiv.org/abs/2201.12086">BLIP paper</a>.
### Catalog:
- [x] Inference demo
- [x] Pre-trained and finetuned checkpoints
- [x] Pre-training code
- [x] Finetuning code for Image-Text Retrieval, Image Captioning, VQA, and NLVR2
- [x] Download of bootstrapped image-text dataset
### Inference demo (Image Captioning and VQA):
Run our interactive demo in a Colab notebook (no GPU needed).
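
To run captioning locally instead, a minimal sketch is shown below. It assumes a `blip_decoder` loader in the repo's `models/blip.py` and a hosted captioning checkpoint; the placeholder checkpoint URL, preprocessing values, and generation arguments are assumptions and may differ from the released code.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

from models.blip import blip_decoder  # assumed loader from this repo

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 384

# CLIP-style normalization; values assumed to match the repo's preprocessing.
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)

# Hypothetical checkpoint URL; substitute the one listed in the repo.
model = blip_decoder(pretrained='<captioning_checkpoint_url>',
                     image_size=image_size, vit='base')
model.eval().to(device)

with torch.no_grad():
    # Beam-search decoding; sampling can be enabled with sample=True.
    caption = model.generate(image, sample=False, num_beams=3,
                             max_length=20, min_length=5)
print('caption:', caption[0])
```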