Speed

Some obvious / not so obvious notes on speed
# Python native modules
import os
from copy import deepcopy
# Third party libs
from fastcore.all import *
from torch import Tensor
import numpy as np
# Local modules

Numpy to Tensor Performance

img=np.random.randint(0,255,size=(240, 320, 3))
1.61 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
deepcopy(img)
240 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Tensor(img)
79.2 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Tensor([img])
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
  
135 ms ± 3.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

You will notice that wrapping a numpy array in a list completely kills performance. The solution is simply to add a batch dim to the existing array and pass it directly.

Tensor(np.expand_dims(img,0))
85.6 µs ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
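As a quick sanity check, here is a minimal sketch (using `torch.tensor` directly in place of the notebook's `Tensor` re-export) confirming that the batch-dim approach produces the same values as wrapping in a list, just without the slowdown:

```python
import numpy as np
import torch

img = np.random.randint(0, 255, size=(240, 320, 3))

# Slow path: a list of ndarrays forces an element-by-element copy
slow = torch.tensor([img])
# Fast path: add the batch dim in numpy first, then convert in one shot
fast = torch.tensor(np.expand_dims(img, 0))

assert slow.shape == fast.shape == (1, 240, 320, 3)
assert torch.equal(slow, fast)  # identical values, very different cost
```

Note that `img[None]` is an equivalent shorthand for `np.expand_dims(img, 0)`.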

In fact, we can demonstrate the same effect with plain Python lists…

Tensor([[1]])
6.75 µs ± 95.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
test_arr=[[1]*270000]
Tensor(test_arr)
9.55 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
test_arr=np.array([[1]*270000])
Tensor(test_arr)
88 µs ± 5.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
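Outside a notebook, the same comparison can be scripted with the stdlib `timeit` module. The absolute numbers are machine-dependent, but the list path should remain orders of magnitude slower (a sketch, using `torch.Tensor` in place of the fastai re-export):

```python
import timeit
import numpy as np
import torch

test_list = [[1] * 270000]
test_arr = np.array(test_list)

# Time each conversion path; iteration count kept small because the list path is slow
t_list = timeit.timeit(lambda: torch.Tensor(test_list), number=50)
t_arr = timeit.timeit(lambda: torch.Tensor(test_arr), number=50)

print(f"from list:  {t_list:.4f}s   from array: {t_arr:.4f}s")
assert t_arr < t_list  # the pre-converted array wins by a wide margin
```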

It is horrifying just how much of a performance hit this causes… so we will be avoiding Python list inputs to Tensors from now on…
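One further note along the same lines: when the array is already the shape you need, `torch.from_numpy` skips even the copy that `Tensor(...)` performs, by sharing the array's memory buffer (a sketch; the flip side is that in-place changes to either object are visible through both):

```python
import numpy as np
import torch

arr = np.zeros((2, 3), dtype=np.float32)
t = torch.from_numpy(arr)  # zero-copy: the tensor wraps the numpy buffer

arr[0, 0] = 5.0               # mutate the numpy side...
assert t[0, 0].item() == 5.0  # ...and the tensor sees it immediately
```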