diff --git a/README.md b/README.md
index ffc5bc0..771ac00 100644
--- a/README.md
+++ b/README.md
@@ -60,14 +60,14 @@ NLVR2 |
python -m torch.distributed.run --nproc_per_node=8 train_caption.py
### VQA:
1. Download VQA v2 dataset and Visual Genome dataset from the original websites, and set 'vqa_root' and 'vg_root' in configs/vqa.yaml.
2. To evaluate the finetuned BLIP model, generate results with: (evaluation needs to be performed on official server)
python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --evaluate
-3. To finetune the pre-trained checkpoint using 16 A100 GPUs, first set 'pretrained' in configs/vqa.yaml as "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model*_base.pth". Then run:
+3. To finetune the pre-trained checkpoint using 16 A100 GPUs, first set 'pretrained' in configs/vqa.yaml as "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth". Then run:
python -m torch.distributed.run --nproc_per_node=16 train_vqa.py 
### NLVR2:
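
For context on step 3, here is a minimal sketch of the configs/vqa.yaml entries these instructions touch. Only 'vqa_root', 'vg_root', and 'pretrained' are named in the README; the example paths are placeholders, not values taken from the repository.

```yaml
# configs/vqa.yaml (sketch) -- only the keys named in the steps above;
# the example paths are placeholders, not values from the repository
vqa_root: '/path/to/vqa'            # VQA v2 dataset root (step 1)
vg_root: '/path/to/visual_genome'   # Visual Genome dataset root (step 1)
# pre-trained checkpoint to finetune from (step 3, the line changed in this diff)
pretrained: 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth'
```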