```python
# Python native modules
import os
from copy import deepcopy
# Third party libs
from fastcore.all import *
from torch import Tensor
import numpy as np
# Local modules
```

# Speed

Some obvious / not so obvious notes on speed.

## Numpy to Tensor Performance
```python
img = np.random.randint(0, 255, size=(240, 320, 3))
```

```python
%%timeit
img = np.random.randint(0, 255, size=(240, 320, 3))
```

    1.61 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```python
%%timeit
deepcopy(img)
```

    240 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```python
%%timeit
Tensor(img)
```

    79.2 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```python
%%timeit
Tensor([img])
```

    /opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
    135 ms ± 3.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
You will notice that wrapping a numpy array in a list completely kills the performance. The solution is to add a batch dim to the existing array and pass it directly:
```python
%%timeit
Tensor(np.expand_dims(img, 0))
```

    85.6 µs ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
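As an aside (not part of the original benchmarks above), `img[None]` is an equivalent way to add the batch dim, and `torch.from_numpy` avoids the copy that `Tensor(...)` makes by sharing memory with the array; a minimal sketch:

```python
import numpy as np
import torch

img = np.random.randint(0, 255, size=(240, 320, 3))

# img[None] and np.expand_dims(img, 0) both add a leading batch dim
assert np.expand_dims(img, 0).shape == img[None].shape == (1, 240, 320, 3)

# torch.from_numpy shares memory with the ndarray instead of copying,
# and keeps its dtype (int64 here), whereas Tensor(...) copies and
# casts to float32
t = torch.from_numpy(img[None])
assert t.shape == (1, 240, 320, 3)
```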
In fact, we can test the list penalty directly with plain Python lists…
```python
%%timeit
Tensor([[1]])
```

    6.75 µs ± 95.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```python
test_arr = [[1] * 270000]
```

```python
%%timeit
Tensor(test_arr)
```

    9.55 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```python
test_arr = np.array([[1] * 270000])
```

```python
%%timeit
Tensor(test_arr)
```

    88 µs ± 5.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
It is horrifying just how big a performance hit this causes… so we will be avoiding Python list inputs to Tensors from now on.
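One way to enforce that going forward is a small converter that funnels list inputs through a single ndarray before building the Tensor. A minimal sketch, where `as_fast_tensor` is a hypothetical helper (not part of fastcore or torch):

```python
import numpy as np
from torch import Tensor

def as_fast_tensor(x):
    "Convert `x` to a `Tensor`, funneling lists through one ndarray first."
    if isinstance(x, (list, tuple)):
        # np.asarray collapses nested lists (or lists of arrays) into a
        # single contiguous ndarray, which Tensor() ingests quickly
        x = np.asarray(x)
    return Tensor(x)

# Both inputs take the fast path now
t1 = as_fast_tensor([[1] * 270000])
t2 = as_fast_tensor(np.random.randint(0, 255, size=(240, 320, 3)))
```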