We currently release the code and models for:

- ImageNet-1K pretraining
- ImageNet-1K pretraining + Token Labeling
- Large resolution fine-tuning
- Lightweight models
**05/21/2022** Lightweight models are released, which surpass MobileViT, PVTv2 and EfficientNet.

**03/06/2022** Some models with `head_dim=64` are released, which reduce the memory cost for downstream tasks.

**01/19/2022**

- Pretrained models on ImageNet-1K with Token Labeling.
- Large resolution fine-tuning.

**01/13/2022** Pretrained models on ImageNet-1K are released.
The following models and logs can be downloaded from Google Drive: total_models, total_logs.
We also release the models on Baidu Cloud: total_models (bdkq), total_logs (ttub).
| Model | Top-1 | Resolution | #Param. | FLOPs | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-XXS | 76.8 | 128x128 | 10.2M | 0.43G | | | run.sh |
| UniFormer-XXS | 79.1 | 160x160 | 10.2M | 0.67G | | | run.sh |
| UniFormer-XXS | 79.9 | 192x192 | 10.2M | 0.96G | | | run.sh |
| UniFormer-XXS | 80.6 | 224x224 | 10.2M | 1.3G | | | run.sh |
| UniFormer-XS | 81.5 | 192x192 | 16.5M | 1.4G | | | run.sh |
| UniFormer-XS | 82.0 | 224x224 | 16.5M | 2.0G | | | run.sh |
For these lightweight models, we train with a longer schedule (600 epochs) and weaker data augmentation. In addition, to avoid NaN loss, we do not use mixed-precision training.
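The NaN issue is the usual FP16 overflow failure mode, and the fix is simply to keep the forward/backward pass in full FP32. A minimal sketch of the two modes in PyTorch (function and argument names are illustrative, not the repo's actual training code):

```python
import torch


def train_step(model, images, labels, optimizer, criterion, use_amp=False):
    """One optimization step, with mixed precision optionally enabled."""
    optimizer.zero_grad()
    if use_amp:
        # FP16 autocast + loss scaling; this is the path that can overflow
        # to NaN for the lightweight models, so the repo disables it.
        scaler = torch.cuda.amp.GradScaler()
        with torch.cuda.amp.autocast():
            loss = criterion(model(images), labels)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        # Plain FP32 forward/backward: slower, but numerically safer.
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    return loss.item()
```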
The following models and logs can be downloaded from Google Drive: total_models, total_logs.
We also release the models on Baidu Cloud: total_models (bdkq), total_logs (ttub).
| Model | Top-1 | #Param. | FLOPs | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 82.9 | 22M | 3.6G | | | run.sh |
| UniFormer-S† | 83.4 | 24M | 4.2G | | | run.sh |
| UniFormer-B | 83.8 | 50M | 8.3G | | - | run.sh |
| UniFormer-B+Layer Scale | 83.9 | 50M | 8.3G | | | run.sh |
Although Layer Scale is helpful for training deep models, we encountered problems when fine-tuning on video datasets. Hence, we only use the models trained without it for video tasks.
Since UniFormer-S† uses `head_dim=32`, which incurs a high memory cost for downstream tasks, we re-train it with `head_dim=64`. All models are trained at 224x224 resolution.
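The memory saving follows from how multi-head attention scales: each head materializes an N×N attention map, and the number of heads is `embed_dim / head_dim`, so doubling `head_dim` halves the attention-map memory per layer. A back-of-envelope sketch (the 196-token grid and 320-dim stage width are illustrative values, not taken from any specific config):

```python
def attention_map_floats(seq_len: int, embed_dim: int, head_dim: int) -> int:
    """Floats stored in the attention matrices of one multi-head
    self-attention layer: one (seq_len x seq_len) map per head."""
    num_heads = embed_dim // head_dim
    return num_heads * seq_len * seq_len


# For a 14x14 token grid (196 tokens) and a 320-dim stage:
# head_dim=32 gives 10 heads, head_dim=64 gives 5 heads,
# i.e. half the attention-map memory. The gap grows quadratically
# with resolution, which is why it matters for downstream tasks.
print(attention_map_floats(196, 320, 32))  # 10 * 196 * 196
print(attention_map_floats(196, 320, 64))  # 5 * 196 * 196
```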
| Model | Top-1 | #Param. | FLOPs | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S† | 83.4 | 24M | 4.2G | | | run.sh |
The following models and logs can be downloaded from Google Drive: total_models, total_logs.
We also release the models on Baidu Cloud: total_models (p05h), total_logs (wsvi).
We follow LV-ViT to train our models with Token Labeling. Please see token_labeling for more details.
| Model | Top-1 | #Param. | FLOPs | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 83.4 (+0.5) | 22M | 3.6G | | | run.sh |
| UniFormer-S† | 83.9 (+0.5) | 24M | 4.2G | | | run.sh |
| UniFormer-B | 85.1 (+1.3) | 50M | 8.3G | | | run.sh |
| UniFormer-L+Layer Scale | 85.6 | 100M | 12.6G | | | run.sh |
Since UniFormer-S/S†/B use `head_dim=32`, which incurs a high memory cost for downstream tasks, we re-train these models with `head_dim=64`. All models are trained at 224x224 resolution.
| Model | Top-1 | #Param. | FLOPs | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 83.4 (+0.5) | 22M | 3.6G | | | run.sh |
| UniFormer-S† | 83.6 (+0.2) | 24M | 4.2G | | | run.sh |
| UniFormer-B | 84.8 (+1.0) | 50M | 8.3G | | | run.sh |
The following models and logs can be downloaded from Google Drive: total_models, total_logs.
We also release the models on Baidu Cloud: total_models (p05h), total_logs (wsvi).
We fine-tune the above models with Token Labeling at a resolution of 384x384. Please see token_labeling for more details.
| Model | Top-1 | #Param. | FLOPs | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 84.6 | 22M | 11.9G | | | run.sh |
| UniFormer-S† | 84.9 | 24M | 13.7G | | | run.sh |
| UniFormer-B | 86.0 | 50M | 27.2G | | | run.sh |
| UniFormer-L+Layer Scale | 86.3 | 100M | 39.2G | | | run.sh |
Our repository is built based on the DeiT repository, and we add some useful features:
- Calculating accurate FLOPs and parameters with fvcore (see check_model.py).
- Auto-resuming.
- Saving best models and backup models.
- Generating training curve (see generate_tensorboard.py).
- Clone this repo:

  ```bash
  git clone https://github.com/Sense-X/UniFormer.git
  cd UniFormer
  ```

- Install PyTorch 1.7.0+ and torchvision 0.8.1+:

  ```bash
  conda install -c pytorch pytorch torchvision
  ```

- Install other packages:

  ```bash
  pip install timm
  pip install fvcore
  ```
Simply run the training scripts in `exp` as follows:

```bash
bash ./exp/uniformer_small/run.sh
```
If the training is interrupted abnormally, you can simply rerun the script for auto-resuming. If the checkpoint was not saved properly, set the resumed model via `--resume ${work_path}/ckpt/backup.pth`.
Simply run the evaluation scripts in `exp` as follows:

```bash
bash ./exp/uniformer_small/test.sh
```

It evaluates the last model by default. You can specify other models via `--resume`.
You can generate the training curves as follows:

```bash
python3 generate_tensorboard.py
```

Note that you should install `tensorboardX` first.
You can calculate the FLOPs and parameters via:

```bash
python3 check_model.py
```
This repository is built using the timm library and the DeiT repository.