
Depth Anything for Semantic Segmentation

We use our Depth Anything pre-trained ViT-L encoder to fine-tune downstream semantic segmentation models.

Performance

Cityscapes

Note that our results are obtained without Mapillary pre-training.

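| Method | Encoder | mIoU (s.s.) | mIoU (m.s.) |
|---|---|---|---|
| SegFormer | MiT-B5 | 82.4 | 84.0 |
| Mask2Former | Swin-L | 83.3 | 84.3 |
| OneFormer | Swin-L | 83.0 | 84.4 |
| OneFormer | ConvNeXt-XL | 83.6 | 84.6 |
| DDP | ConvNeXt-L | 83.2 | 83.9 |
| Ours | ViT-L | 84.8 | 86.2 |

Here s.s. denotes single-scale evaluation and m.s. denotes multi-scale evaluation.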

ADE20K

| Method | Encoder | mIoU |
|---|---|---|
| SegFormer | MiT-B5 | 51.0 |
| Mask2Former | Swin-L | 56.4 |
| UperNet | BEiT-L | 56.3 |
| ViT-Adapter | BEiT-L | 58.3 |
| OneFormer | Swin-L | 57.4 |
| OneFormer | ConvNeXt-XL | 57.4 |
| Ours | ViT-L | 59.4 |
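
Both tables report mIoU, the per-class intersection-over-union averaged over classes and given as a percentage. The snippet below is a minimal pure-Python sketch of that metric, not the evaluation code used for the numbers above. The function name `mean_iou` and the ignore label `255` are assumptions made for illustration, and MMSegmentation's own evaluation may differ in details.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (in percent) from flat sequences of predicted and ground-truth labels.

    `ignore_index` marks ground-truth pixels excluded from evaluation
    (255 is an assumed convention here, not taken from this README).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # ignored pixels count for neither prediction nor ground truth
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else float("nan")
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` gives per-class IoUs of 1/2 and 2/3, so the result is about 58.3.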

Pre-trained models

Note: To reproduce the training process, 1) download the Depth Anything pre-trained model, which is used to initialize the encoder, and 2) place it in the checkpoints folder.
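
The two steps in the note can be sketched as a small helper that copies an already-downloaded checkpoint into the `checkpoints` folder. This is only an illustration: the function name `place_checkpoint` is hypothetical, and the checkpoint file name is whatever you downloaded, since this README does not specify it.

```python
from pathlib import Path
import shutil


def place_checkpoint(downloaded: Path, checkpoints_dir: Path = Path("checkpoints")) -> Path:
    """Copy a downloaded encoder checkpoint into the checkpoints folder.

    `downloaded` is the path of the file you fetched yourself; its name is
    not specified in this README, so nothing about it is assumed here.
    """
    if not downloaded.is_file():
        raise FileNotFoundError(f"checkpoint not found: {downloaded}")

    # Create the folder on first use; do nothing if it already exists.
    checkpoints_dir.mkdir(parents=True, exist_ok=True)

    target = checkpoints_dir / downloaded.name
    if not target.exists():
        shutil.copy2(downloaded, target)  # keep the original download intact
    return target
```

Calling `place_checkpoint(Path("path/to/your_download.pth"))` from the directory that should contain `checkpoints` returns the path of the copied file. The path shown is a placeholder.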

Installation

Please refer to the MMSegmentation documentation for installation instructions.

After installation:

For training or inference with our pre-trained models, please refer to the MMSegmentation instructions.