diff --git a/README.md b/README.md
index d2b3817..4bdbe16 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,11 @@
This is the PyTorch implementation of the BLIP paper. The code has been tested on PyTorch 1.9 and 1.10.
Catalog:
- [x] Inference demo
- [x] Pre-trained and finetuned checkpoints
- [x] Finetuning code for Image-Text Retrieval, Image Captioning, VQA, and NLVR2
- [x] Pre-training code
-- [x] Download of bootstrapped image-text dataset
+- [x] Download of bootstrapped image-text datasets
### Inference demo (Image Captioning and VQA):
@@ -15,10 +15,20 @@ Run our interactive demo using Colab notebook (no GPU needed):
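+For a quick sanity check outside Colab, the captioning path can also be run locally. The sketch below is illustrative only: it assumes the `blip_decoder` factory in `models/blip.py`, and the checkpoint path and image file are placeholders to be replaced with one of the downloads below and your own image.
+
+```python
+# Minimal local captioning sketch; checkpoint path and image file are placeholders.
+import torch
+from PIL import Image
+from torchvision import transforms
+from models.blip import blip_decoder
+
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+image_size = 384
+
+# CLIP-style normalization used by the BLIP image transforms.
+preprocess = transforms.Compose([
+    transforms.Resize((image_size, image_size), interpolation=transforms.InterpolationMode.BICUBIC),
+    transforms.ToTensor(),
+    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
+                         (0.26862954, 0.26130258, 0.27577711)),
+])
+image = preprocess(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)
+
+# 'pretrained' takes a local path or URL to a BLIP checkpoint (placeholder here).
+model = blip_decoder(pretrained='path/to/model_base_capfilt_large.pth',
+                     image_size=image_size, vit='base').to(device)
+model.eval()
+
+with torch.no_grad():
+    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)
+print(caption[0])
+```
+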
### Pre-trained checkpoints:
Num. pre-train images | BLIP w/ ViT-B | BLIP w/ ViT-B and CapFilt-L | BLIP w/ ViT-L
---- | --- | --- | ---
+--- | :---: | :---: | :---:
14M | Download| - | -
129M | Download| Download | Download
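+
+The pre-trained weights are ordinary PyTorch checkpoints, so a downloaded file can either be passed to the model factories via their `pretrained` argument (as in the captioning sketch above) or inspected directly with `torch.load`. A small sketch, where the path is a placeholder and the `'model'` key is an assumption about how the training scripts package the weights:
+
+```python
+# Inspect a downloaded BLIP checkpoint (path is a placeholder).
+import torch
+
+ckpt = torch.load('path/to/model_base.pth', map_location='cpu')
+# Assumption: the weights sit under a 'model' key alongside training metadata;
+# otherwise treat the file as a bare state_dict.
+state_dict = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt
+print(f'{len(state_dict)} tensors, e.g. {next(iter(state_dict))}')
+```
+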
+### Finetuned checkpoints:
+Task | BLIP w/ ViT-B | BLIP w/ ViT-B and CapFilt-L | BLIP w/ ViT-L
+--- | :---: | :---: | :---:
+Image-Text Retrieval (COCO) | Download | - | Download
+Image-Text Retrieval (Flickr30k) | Download | - | Download
+Image Captioning (COCO) | - | Download | Download
+VQA | Download | - | -
+NLVR2 | Download | - | -
+
+
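+As with the pre-trained weights, a finetuned checkpoint is used by pointing the corresponding model factory at the downloaded file. The sketch below scores an image-text pair with a retrieval checkpoint; it assumes the `blip_itm` factory in `models/blip_itm.py` and its `match_head` argument (an assumption to verify against the repo), the path is a placeholder, and the random tensor stands in for a properly preprocessed 384x384 image.
+
+```python
+# Score an image-text pair with a retrieval-finetuned checkpoint (path is a placeholder).
+import torch
+from models.blip_itm import blip_itm
+
+model = blip_itm(pretrained='path/to/model_base_retrieval_coco.pth', image_size=384, vit='base')
+model.eval()
+
+# Random stand-in for a preprocessed 384x384 image; use the captioning transform above for real inputs.
+image = torch.randn(1, 3, 384, 384)
+caption = 'a woman sitting on the beach with her dog'
+
+with torch.no_grad():
+    itm_logits = model(image, caption, match_head='itm')  # image-text matching head (2-way logits)
+    itm_score = torch.softmax(itm_logits, dim=1)[:, 1]    # probability that the pair matches
+    itc_score = model(image, caption, match_head='itc')   # contrastive (cosine) similarity
+print(float(itm_score), float(itc_score))
+```
+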
### Image-Text Retrieval:
1. Download COCO or Flickr30k datasets from the original websites, and set 'image_root' in configs/retrieval_{dataset}.yaml accordingly.
2. To evaluate the finetuned BLIP model on COCO, run: