PyTorch Torchvision Models

Model Acc@1 Acc@5
AlexNet 56.522 79.066
VGG-11 69.020 88.628
VGG-13 69.928 89.246
VGG-16 71.592 90.382
VGG-19 72.376 90.876
VGG-11 with batch normalization 70.370 89.810
VGG-13 with batch normalization 71.586 90.374
VGG-16 with batch normalization 73.360 91.516
VGG-19 with batch normalization 74.218 91.842
ResNet-18 69.758 89.078
ResNet-34 73.314 91.420
ResNet-50 76.130 92.862
ResNet-101 77.374 93.546
ResNet-152 78.312 94.046
SqueezeNet 1.0 58.092 80.420
SqueezeNet 1.1 58.178 80.624
DenseNet-121 74.434 91.972
DenseNet-169 75.600 92.806
DenseNet-201 76.896 93.370
DenseNet-161 77.138 93.560
Inception v3 77.294 93.450
GoogLeNet 69.778 89.530
ShuffleNet V2 x1.0 69.362 88.316
ShuffleNet V2 x0.5 60.552 81.746
MobileNet V2 71.878 90.286
MobileNet V3 Large 74.042 91.340
MobileNet V3 Small 67.668 87.402
ResNeXt-50-32x4d 77.618 93.698
ResNeXt-101-32x8d 79.312 94.526
Wide ResNet-50-2 78.468 94.086
Wide ResNet-101-2 78.848 94.284
MNASNet 1.0 73.456 91.510
MNASNet 0.5 67.734 87.490
EfficientNet-B0 77.692 93.532
EfficientNet-B1 78.642 94.186
EfficientNet-B2 80.608 95.310
EfficientNet-B3 82.008 96.054
EfficientNet-B4 83.384 96.594
EfficientNet-B5 83.444 96.628
EfficientNet-B6 84.008 96.916
EfficientNet-B7 84.122 96.908
regnet_x_400mf 72.834 90.950
regnet_x_800mf 75.212 92.348
regnet_x_1_6gf 77.040 93.440
regnet_x_3_2gf 78.364 93.992
regnet_x_8gf 79.344 94.686
regnet_x_16gf 80.058 94.944
regnet_x_32gf 80.622 95.248
regnet_y_400mf 74.046 91.716
regnet_y_800mf 76.420 93.136
regnet_y_1_6gf 77.950 93.966
regnet_y_3_2gf 78.948 94.576
regnet_y_8gf 80.032 95.048
regnet_y_16gf 80.424 95.240
regnet_y_32gf 80.878 95.340
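
The Acc@1 and Acc@5 columns are presumably top-1 and top-5 accuracy in percent: the share of test images whose true label is among the 1 or 5 highest-scoring classes. The following is a minimal pure-Python sketch of that metric. The function name `topk_accuracy` and the toy scores are illustrative assumptions, not part of torchvision.

```python
def topk_accuracy(scores, labels, k):
    """Return the percentage of samples whose true label is among
    the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy data: 4 samples, 3 classes (made-up scores for illustration only)
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

acc1 = topk_accuracy(scores, labels, 1)
acc2 = topk_accuracy(scores, labels, 2)
print(acc1, acc2)  # prints: 50.0 75.0
```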

AlexNet

https://arxiv.org/abs/1404.5997

  • torchvision.models.alexnet

VGG

https://arxiv.org/abs/1409.1556

  • torchvision.models.vgg11
  • torchvision.models.vgg11_bn
  • torchvision.models.vgg13
  • torchvision.models.vgg13_bn
  • torchvision.models.vgg16
  • torchvision.models.vgg16_bn
  • torchvision.models.vgg19
  • torchvision.models.vgg19_bn


ResNet

https://arxiv.org/abs/1512.03385

  • torchvision.models.resnet18
  • torchvision.models.resnet34
  • torchvision.models.resnet50
  • torchvision.models.resnet101
  • torchvision.models.resnet152

SqueezeNet

https://arxiv.org/abs/1602.07360

  • torchvision.models.squeezenet1_0
  • torchvision.models.squeezenet1_1

DenseNet

https://arxiv.org/abs/1608.06993

  • torchvision.models.densenet121
  • torchvision.models.densenet169
  • torchvision.models.densenet161
  • torchvision.models.densenet201

Inception v3

https://arxiv.org/abs/1512.00567

  • torchvision.models.inception_v3
    requires scipy to be installed

GoogLeNet

https://arxiv.org/abs/1409.4842

  • torchvision.models.googlenet
    requires scipy to be installed

ShuffleNet v2

https://arxiv.org/abs/1807.11164

  • torchvision.models.shufflenet_v2_x0_5
  • torchvision.models.shufflenet_v2_x1_0
  • torchvision.models.shufflenet_v2_x1_5
  • torchvision.models.shufflenet_v2_x2_0

MobileNet v2

https://arxiv.org/abs/1801.04381

  • torchvision.models.mobilenet_v2

MobileNet v3

https://arxiv.org/abs/1905.02244

  • torchvision.models.mobilenet_v3_large
  • torchvision.models.mobilenet_v3_small

ResNeXt

https://arxiv.org/abs/1611.05431

  • torchvision.models.resnext50_32x4d
  • torchvision.models.resnext101_32x8d

Wide ResNet

https://arxiv.org/abs/1605.07146

  • torchvision.models.wide_resnet50_2
  • torchvision.models.wide_resnet101_2

MNASNet

https://arxiv.org/abs/1807.11626

  • torchvision.models.mnasnet0_5
  • torchvision.models.mnasnet0_75
  • torchvision.models.mnasnet1_0
  • torchvision.models.mnasnet1_3

EfficientNet

https://arxiv.org/abs/1905.11946

  • torchvision.models.efficientnet_b0
  • torchvision.models.efficientnet_b1
  • torchvision.models.efficientnet_b2
  • torchvision.models.efficientnet_b3
  • torchvision.models.efficientnet_b4
  • torchvision.models.efficientnet_b5
  • torchvision.models.efficientnet_b6
  • torchvision.models.efficientnet_b7

RegNet

https://arxiv.org/abs/2003.13678

  • torchvision.models.regnet_y_400mf
  • torchvision.models.regnet_y_800mf
  • torchvision.models.regnet_y_1_6gf
  • torchvision.models.regnet_y_3_2gf
  • torchvision.models.regnet_y_8gf
  • torchvision.models.regnet_y_16gf
  • torchvision.models.regnet_y_32gf
  • torchvision.models.regnet_x_400mf
  • torchvision.models.regnet_x_800mf
  • torchvision.models.regnet_x_1_6gf
  • torchvision.models.regnet_x_3_2gf
  • torchvision.models.regnet_x_8gf
  • torchvision.models.regnet_x_16gf
  • torchvision.models.regnet_x_32gf
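
Each entry in the lists above is a constructor function in `torchvision.models`, so a model can be built by name. The helper below is a small sketch of that idea, not an official API: `build_model` is a hypothetical name, and it imports torchvision lazily, so torchvision must be installed before the helper is actually called.

```python
import importlib


def build_model(name, **kwargs):
    """Instantiate torchvision.models.<name>, e.g. build_model("resnet18").

    `build_model` is a hypothetical convenience wrapper; extra keyword
    arguments are passed straight through to the constructor.
    """
    if not name.isidentifier() or name.startswith("_"):
        raise ValueError(f"not a valid constructor name: {name!r}")
    # Imported lazily so this helper can be defined without torchvision
    models = importlib.import_module("torchvision.models")
    constructor = getattr(models, name, None)
    if not callable(constructor):
        raise ValueError(f"torchvision.models has no constructor {name!r}")
    return constructor(**kwargs)


# Usage (requires torchvision):
#   model = build_model("resnet18")
#   model.eval()
```

How pretrained weights are requested depends on the torchvision version: older releases take `pretrained=True`, while newer ones use a `weights=` argument, so check the documentation for the installed version.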

Installing Caffe on WSL Ubuntu 20.04

http://caffe.berkeleyvision.org/

Update system packages

sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get autoremove --purge -y
reboot

Install required packages

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev
sudo apt-get install libopencv-dev libboost-all-dev libhdf5-serial-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install protobuf-compiler
sudo apt-get install libatlas-base-dev libopenblas-dev

Install OpenCV 3

https://docs.opencv.org/3.4.16/d7/d9f/tutorial_linux_install.html

# Required packages
sudo apt-get install cmake libgtk2.0-dev pkg-config
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev libdc1394-22-dev

# Download
cd /tmp
wget https://github.com/opencv/opencv/archive/3.4.16.zip

# Extract
unzip 3.4.16.zip
cd opencv-3.4.16

# Build and install
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
make -j8 && sudo make install

Install Caffe

cd ~

# Download
wget https://github.com/BVLC/caffe/archive/refs/tags/1.0.zip

# Extract
unzip 1.0.zip
cd caffe-1.0

# Create Makefile.config from the example and edit it
mv Makefile.config.example Makefile.config
nano Makefile.config

Change the OpenCV version

# Ver 2 -> 3
OPENCV_VERSION := 3

For CUDA 11.4 or later, remove the entries below compute capability 3.5

CUDA_ARCH := -gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61
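
The CUDA_ARCH value above can also be generated rather than typed by hand. The sketch below filters a list of compute capabilities (written as integers, so 35 means 3.5) and emits one `-gencode` entry per remaining architecture, plus a PTX entry for the newest one, mirroring the layout shown above. The function name `cuda_arch_flags` and the starting list are assumptions made for illustration.

```python
def cuda_arch_flags(capabilities, minimum=35):
    """Build a CUDA_ARCH assignment for Makefile.config.

    capabilities: compute capabilities as integers (35 means 3.5).
    minimum: drop everything below this capability.
    """
    kept = sorted(c for c in capabilities if c >= minimum)
    if not kept:
        return "CUDA_ARCH :="
    # One binary (sm_XX) target per supported architecture
    flags = [f"-gencode arch=compute_{c},code=sm_{c}" for c in kept]
    # PTX for the newest architecture, so newer GPUs can JIT-compile it
    flags.append(f"-gencode arch=compute_{kept[-1]},code=compute_{kept[-1]}")
    # Join with Makefile line continuations
    return "CUDA_ARCH := " + " \\\n".join(flags)


# Assumed starting list; entries below 3.5 are filtered out
cuda_arch = cuda_arch_flags([20, 30, 35, 50, 52, 60, 61])
print(cuda_arch)
```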

ATLAS does not make full use of multithreaded CPUs, so switch the matrix computation library to OpenBLAS

BLAS := open

Add the HDF5 header and library paths

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib \
/usr/lib/x86_64-linux-gnu/hdf5/serial
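
As an alternative to editing in nano, the Makefile.config changes described so far (OpenCV version, BLAS, HDF5 paths) could be applied with a short script. This is only a sketch: `patch_makefile_config` is a hypothetical helper, and it assumes the file contains single-line `OPENCV_VERSION`, `BLAS`, `INCLUDE_DIRS`, and `LIBRARY_DIRS` assignments, which may not hold for every Caffe version. The CUDA_ARCH edit is not covered.

```python
import re

HDF5_INCLUDE = "/usr/include/hdf5/serial"
HDF5_LIB = "/usr/lib/x86_64-linux-gnu/hdf5/serial"


def _append_path(path):
    """Return a re.sub callback that appends `path` to a line once."""
    def repl(match):
        line = match.group(0)
        return line if path in line else f"{line} {path}"
    return repl


def patch_makefile_config(text):
    """Apply the OpenCV, BLAS, and HDF5 edits to Makefile.config text."""
    # Uncomment (or rewrite) the OpenCV version line
    text = re.sub(r"(?m)^#?[ \t]*OPENCV_VERSION[ \t]*:=.*$",
                  "OPENCV_VERSION := 3", text)
    # Use OpenBLAS instead of ATLAS
    text = re.sub(r"(?m)^BLAS[ \t]*:=.*$", "BLAS := open", text)
    # Append the HDF5 header and library directories
    text = re.sub(r"(?m)^INCLUDE_DIRS[ \t]*:=.*$",
                  _append_path(HDF5_INCLUDE), text)
    text = re.sub(r"(?m)^LIBRARY_DIRS[ \t]*:=.*$",
                  _append_path(HDF5_LIB), text)
    return text


# Usage, run from the caffe-1.0 directory:
#   from pathlib import Path
#   cfg = Path("Makefile.config")
#   cfg.write_text(patch_makefile_config(cfg.read_text()))
```

The function works on a string and is idempotent, so running it twice leaves the file unchanged.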

Build, test, and install

make all -j8 && make test -j8 && make runtest -j8

AMD Radeon RX 6XXX (RDNA 2)

Graphics Card | AMD Radeon RX 6600 | AMD Radeon RX 6600 XT | AMD Radeon RX 6700 | AMD Radeon RX 6700 XT | AMD Radeon RX 6800 | AMD Radeon RX 6800 XT | AMD Radeon RX 6900 XT | AMD Radeon RX 6900 XT Liquid Cooled
GPU | Navi 23 (XL?) | Navi 23 (XT?) | Navi 22 (XL?) | Navi 22 (XT?) | Navi 21 XL | Navi 21 XT | Navi 21 XTX | Navi 21 XTXH
Process Node | 7 nm | 7 nm | 7 nm | 7 nm | 7 nm | 7 nm | 7 nm | 7 nm
Die Size | 237 mm² | 237 mm² | 336 mm² | 336 mm² | 520 mm² | 520 mm² | 520 mm² | 520 mm²
Transistors | 11.06 Billion | 11.06 Billion | 17.2 Billion | 17.2 Billion | 26.8 Billion | 26.8 Billion | 26.8 Billion | 26.8 Billion
Compute Units | 28 | 32 | 36 | 40 | 60 | 72 | 80 | 80
Stream Processors | 1792 | 2048 | 2304 | 2560 | 3840 | 4608 | 5120 | 5120
TMUs / ROPs | TBA | TBA | TBA | 160 / 64 | 240 / 96 | 288 / 128 | 320 / 128 | 320 / 128
Game Clock | TBA | TBA | TBA | 2424 MHz | 1815 MHz | 2015 MHz | 2015 MHz | 2250 MHz
Boost Clock | TBA | TBA | TBA | 2581 MHz | 2105 MHz | 2250 MHz | 2250 MHz | 2345 MHz
FP32 TFLOPs | TBA | TBA | TBA | 13.21 TFLOPs | 16.17 TFLOPs | 20.74 TFLOPs | 23.04 TFLOPs | 24.01 TFLOPs
Memory Size | 8 GB GDDR6 + 32 MB Infinity Cache? | 8 GB GDDR6 + 32 MB Infinity Cache? | 12 GB GDDR6 + 96 MB Infinity Cache? | 12 GB GDDR6 + 96 MB Infinity Cache | 16 GB GDDR6 + 128 MB Infinity Cache | 16 GB GDDR6 + 128 MB Infinity Cache | 16 GB GDDR6 + 128 MB Infinity Cache | 16 GB GDDR6 + 128 MB Infinity Cache
Memory Bus | 128-bit? | 192-bit | 192-bit | 192-bit | 256-bit | 256-bit | 256-bit | 256-bit
Memory Clock | 16 Gbps? | 16 Gbps? | 16 Gbps? | 16 Gbps | 16 Gbps | 16 Gbps | 16 Gbps | 18 Gbps
Bandwidth | 256 GB/s? | 256 GB/s? | 384 GB/s | 384 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 576 GB/s
TDP | TBA | TBA | TBA | 230 W | 250 W | 300 W | 300 W | 330 W
Price | TBA | TBA | TBA | $479 US | $579 US | $649 US | $999 US | ~$1199 US
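
The FP32 TFLOPs and Bandwidth rows appear to be derived from other rows of the table. The sketch below reproduces them under two assumptions that the table itself does not state: each stream processor performs one fused multiply-add (2 FLOPs) per clock at the boost clock, and bandwidth is the bus width in bytes multiplied by the per-pin data rate. The function names are illustrative.

```python
def fp32_tflops(stream_processors, boost_clock_mhz):
    """Peak FP32 throughput, assuming 2 FLOPs (one FMA) per SP per clock."""
    return 2 * stream_processors * boost_clock_mhz / 1e6


def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


# (name, stream processors, boost clock in MHz) taken from the table
cards = [
    ("RX 6700 XT", 2560, 2581),
    ("RX 6800", 3840, 2105),
    ("RX 6800 XT", 4608, 2250),
    ("RX 6900 XT", 5120, 2250),
    ("RX 6900 XT Liquid Cooled", 5120, 2345),
]
for name, sps, clock in cards:
    print(f"{name}: {fp32_tflops(sps, clock):.2f} TFLOPs")

print(bandwidth_gb_s(256, 16))  # 256-bit bus at 16 Gbps
print(bandwidth_gb_s(256, 18))  # 256-bit bus at 18 Gbps
```

With these assumptions the computed values agree with the table entries for the five cards that list a boost clock.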

NVIDIA Tesla

Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | Tesla V100S (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4)
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GA100 (Ampere) | GA100 (Ampere)
Process Node | 28 nm | 28 nm | 16 nm | 16 nm | 12 nm | 12 nm | 7 nm | 7 nm
Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 54.2 Billion | 54.2 Billion
GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² | 826 mm² | 826 mm²
SMs | 15 | 24 | 56 | 56 | 80 | 80 | 108 | 108
TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 54 | 54
FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 | 64
FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 | 32
FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 6912 | 6912
FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 3456 | 3456
Tensor Cores | N/A | N/A | N/A | N/A | 640 | 640 | 432 | 432
Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 432 | 432
Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1601 MHz | 1410 MHz | 1410 MHz
TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 130 TOPs | 1248 TOPs (2496 TOPs with Sparsity) | 1248 TOPs (2496 TOPs with Sparsity)
FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs | 312 TFLOPs (624 TFLOPs with Sparsity) | 312 TFLOPs (624 TFLOPs with Sparsity)
FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard)
FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard)
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e
Memory Size and Bandwidth | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2 @ 2.0 TB/s
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB
TDP | 235 W | 250 W | 250 W | 300 W | 300 W | 250 W | 400 W | 250 W