add RN50x64 and ViT-L/14 models

Jong Wook Kim
2022-01-25 17:04:00 -08:00
parent 573315e83f
commit 67fc250eb6
2 changed files with 3 additions and 1 deletions

@@ -18,7 +18,7 @@ The base model uses a ResNet50 with several modifications as an image encoder an
Initially, we've released one CLIP model based on the Vision Transformer architecture equivalent to ViT-B/32, along with the RN50 model, using the architecture equivalent to ResNet-50.
As part of the staged release process, we have also released the RN101 model, as well as RN50x4, a RN50 scaled up 4x according to the [EfficientNet](https://arxiv.org/abs/1905.11946) scaling rule. In July 2021, we additionally released the RN50x16 and ViT-B/16 models.
As part of the staged release process, we have also released the RN101 model, as well as RN50x4, a RN50 scaled up 4x according to the [EfficientNet](https://arxiv.org/abs/1905.11946) scaling rule. In July 2021, we additionally released the RN50x16 and ViT-B/16 models, and in January 2022, the RN50x64 and ViT-L/14 models were released.
Please see the paper linked below for further details about their specification.