```python
# Python native modules
import os
from copy import deepcopy
# Third party libs
from fastcore.all import *
from torch import Tensor
import numpy as np
# Local modules
```

# Speed

Some obvious / not so obvious notes on speed.

## Numpy to Tensor Performance
```python
img = np.random.randint(0, 255, size=(240, 320, 3))
```

```python
%%timeit
img = np.random.randint(0, 255, size=(240, 320, 3))
```

    1.61 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```python
%%timeit
deepcopy(img)
```

    240 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```python
%%timeit
Tensor(img)
```

    79.2 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```python
%%timeit
Tensor([img])
```

    /opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
    135 ms ± 3.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
You will notice that wrapping a numpy array in a list completely kills the performance. The solution is to add a batch dim to the existing array and pass it directly:
```python
%%timeit
Tensor(np.expand_dims(img, 0))
```

    85.6 µs ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
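As an aside (not part of the original benchmarks above), `img[None]` is an equivalent way to add the batch dim, and `torch.from_numpy` avoids the copy that `Tensor(...)` makes by sharing memory with the array; a minimal sketch:

```python
import numpy as np
import torch

img = np.random.randint(0, 255, size=(240, 320, 3))

# img[None] and np.expand_dims(img, 0) both add a leading batch dim
assert np.expand_dims(img, 0).shape == img[None].shape == (1, 240, 320, 3)

# torch.from_numpy shares memory with the ndarray instead of copying,
# and keeps its dtype (int64 here), whereas Tensor(...) copies and
# casts to float32
t = torch.from_numpy(img[None])
assert t.shape == (1, 240, 320, 3)
```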
In fact, we can test the list penalty directly with plain Python lists…
```python
%%timeit
Tensor([[1]])
```

    6.75 µs ± 95.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```python
test_arr = [[1] * 270000]
```

```python
%%timeit
Tensor(test_arr)
```

    9.55 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```python
test_arr = np.array([[1] * 270000])
```

```python
%%timeit
Tensor(test_arr)
```

    88 µs ± 5.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
It is horrifying just how big a performance hit this causes… so we will be avoiding Python list inputs to Tensors from now on.
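One way to enforce that going forward is a small converter that funnels list inputs through a single ndarray before building the Tensor. A minimal sketch, where `as_fast_tensor` is a hypothetical helper (not part of fastcore or torch):

```python
import numpy as np
from torch import Tensor

def as_fast_tensor(x):
    "Convert `x` to a `Tensor`, funneling lists through one ndarray first."
    if isinstance(x, (list, tuple)):
        # np.asarray collapses nested lists (or lists of arrays) into a
        # single contiguous ndarray, which Tensor() ingests quickly
        x = np.asarray(x)
    return Tensor(x)

# Both inputs take the fast path now
t1 = as_fast_tensor([[1] * 270000])
t2 = as_fast_tensor(np.random.randint(0, 255, size=(240, 320, 3)))
```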