Dan's Brain

From Python to Numpy

Nicholas P. Rougier, From Python To Numpy [1], 2017

Introduction

import random

def random_walk_faster(n=1000):
    from itertools import accumulate
    # Draw all n steps at once, then take their running sum; the walk starts at 0
    steps = random.choices([-1,+1], k=n)
    return [0]+list(accumulate(steps))

walk = random_walk_faster(1000)

accumulate([1,2,3,4,5]) --> 1 3 6 10 15 [2]
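To make the role of accumulate in the walk concrete, here is a minimal sketch with a fixed list of steps instead of random ones. The step values are made up for illustration; only accumulate itself comes from the notes above.

```python
from itertools import accumulate

# Five hand-picked steps (illustrative values, not random)
steps = [+1, -1, -1, +1, +1]

# accumulate yields the running total after each step
positions = list(accumulate(steps))

# Prepending 0 records the starting position, as random_walk_faster does
walk = [0] + positions

print(walk)
```

The walk is one element longer than the list of steps because it includes the starting position.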

By vectorizing the problem instead of using loops, we get an 85% increase in performance.

>>> from tools import timeit
>>> timeit("random_walk_faster(n=10000)", globals())
10 loops, best of 3: 2.21 msec per loop
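The timeit used above is imported from a tools module that is not part of the standard library. If that module is not at hand, a rough comparison can be sketched with the standard library's timeit instead. The loop-based baseline random_walk_loop and the helper best_time are hypothetical names written for this sketch and are not taken from the notes, so the numbers it prints will not necessarily match the figures quoted here.

```python
import random
import timeit
from itertools import accumulate

def random_walk_loop(n=1000):
    # Hypothetical loop-based baseline (an assumption, not from the notes)
    position = 0
    walk = [position]
    for _ in range(n):
        position += random.choice([-1, +1])
        walk.append(position)
    return walk

def random_walk_faster(n=1000):
    # Same approach as in the notes: draw all steps, then take the running sum
    steps = random.choices([-1, +1], k=n)
    return [0] + list(accumulate(steps))

def best_time(func, n=10000, number=10, repeat=3):
    # Best average time per call, in seconds, over `repeat` rounds of `number` calls
    timer = timeit.Timer(lambda: func(n))
    return min(timer.repeat(repeat=repeat, number=number)) / number

if __name__ == "__main__":
    for func in (random_walk_loop, random_walk_faster):
        print(f"{func.__name__}: {best_time(func) * 1e3:.2f} msec per call")
```

Taking the minimum over several rounds mirrors the "best of 3" reporting in the session above and reduces noise from other processes.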

Translating this to numpy, we get:

import numpy as np

def random_walk_fastest(n=1000):
    # Draw all n steps in one call, then take their cumulative sum
    steps = np.random.choice([-1,+1], n)
    return np.cumsum(steps)

walk = random_walk_fastest(1000)

>>> from tools import timeit
>>> timeit("random_walk_fastest(n=10000)", globals())
1000 loops, best of 3: 14 usec per loop

Readability vs Speed

The tradeoff for the massive speedups from numpy is often the readability of the code, so comment your code!

  • Your future self will thank you.
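As one illustration of what such comments might look like, here is a sketch of the numpy walk with each vectorized step explained. The function name random_walk_commented and the seed parameter are additions made for this example and are not part of the book's code. It also uses numpy's Generator API (np.random.default_rng) rather than np.random.choice.

```python
import numpy as np

def random_walk_commented(n=1000, seed=None):
    """Return the positions of a 1D random walk after each of n steps."""
    # Hypothetical variant for illustration; `seed` makes runs reproducible
    rng = np.random.default_rng(seed)

    # Draw all n steps in a single call instead of looping n times
    steps = rng.choice([-1, +1], size=n)

    # The position after step i is the sum of steps 0..i, i.e. a cumulative sum.
    # Note: no starting 0 is prepended, so the result has n entries.
    return np.cumsum(steps)

walk = random_walk_commented(1000, seed=0)
```

The comments state why each array operation replaces a loop, which is the part a reader is least likely to reconstruct from the code alone.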

Anatomy of an array

Code vectorization

Problem vectorization

Custom vectorization

Beyond Numpy


  1. https://www.labri.fr/perso/nrougier/from-python-to-numpy/

  2. https://docs.python.org/3.6/library/itertools.html?highlight=accumulate#itertools.accumulate