From 473f323924b28bd03eb808a1c3cfe87ea997b9c1 Mon Sep 17 00:00:00 2001
From: Junnan Li
Date: Thu, 27 Jan 2022 21:19:58 +0800
Subject: [PATCH] Update README.md

---
 README.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index d2b3817..4bdbe16 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,11 @@
 This is the PyTorch implementation of the BLIP paper. The code has been tested on PyTorch 1.9 and 1.10.
 
 Catalog:
-- [x] Inference demo
+- [ ] Inference demo
 - [x] Pre-trained and finetuned checkpoints
 - [x] Finetuning code for Image-Text Retrieval, Image Captioning, VQA, and NLVR2
 - [x] Pre-training code
-- [x] Download of bootstrapped image-text dataset
+- [x] Download of bootstrapped image-text datasets
 
 
 ### Inference demo (Image Captioning and VQA):
@@ -15,10 +15,20 @@ Run our interactive demo using Colab notebook (no GPU needed):
 
 ### Pre-trained checkpoints:
 Num. pre-train images | BLIP w/ ViT-B | BLIP w/ ViT-B and CapFilt-L | BLIP w/ ViT-L
---- | --- | --- | ---
+--- | :---: | :---: | :---:
 14M | Download| - | -
 129M | Download| Download | Download
 
+### Finetuned checkpoints:
+Task | BLIP w/ ViT-B | BLIP w/ ViT-B and CapFilt-L | BLIP w/ ViT-L
+--- | :---: | :---: | :---:
+Image-Text Retrieval (COCO) | Download| - | Download
+Image-Text Retrieval (Flickr30k) | Download| - | Download
+Image Captioning (COCO) | - | Download| Download |
+VQA | Download| - | -
+NLVR2 | Download| - | -
+
+
 ### Image-Text Retrieval:
 1. Download COCO or Flickr30k datasets from the original websites, and set 'image_root' in configs/retrieval_{dataset}.yaml accordingly.
 2. To evaluate the finetuned BLIP model on COCO, run:
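
A minimal local captioning sketch to accompany the inference-demo and checkpoint entries above, assuming the repo's `models.blip.blip_decoder` factory and its `generate` method; the checkpoint filename and input image below are hypothetical placeholders, not actual links from the checkpoint tables.

```python
# Minimal local captioning sketch. Assumptions: the BLIP repo is on
# PYTHONPATH and a captioning checkpoint from the tables above has been
# saved locally; 'blip_caption_base.pth' and 'demo.jpg' are placeholders.
import torch
from PIL import Image
from torchvision import transforms
from models.blip import blip_decoder  # factory defined in models/blip.py

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 384  # captioning models are finetuned at 384x384 resolution

# Resize and normalize the input the same way the repo's transforms do
# (CLIP-style channel statistics).
transform = transforms.Compose([
    transforms.Resize((image_size, image_size),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)

# blip_decoder accepts either a URL or a local file for `pretrained`.
model = blip_decoder(pretrained='blip_caption_base.pth',
                     image_size=image_size, vit='base')
model.eval()
model.to(device)

with torch.no_grad():
    # Beam search decoding; sample=True switches to nucleus sampling.
    captions = model.generate(image, sample=False, num_beams=3,
                              max_length=20, min_length=5)
print(captions[0])
```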