Numba: A High Performance Python Compiler
@njit( parallel=True) def simulator(out): # iterate loop in parallel for i in prange(out.shape[0]): out[i] = run_sim() Numba can automatically execute NumPy array expressions on multiple CPU cores and makes it easy to write parallel loops. LBB0_8: vmovups (%rax,%rdx,4), %ymm0 vmovups (%rcx,%rdx,4), %ymm1 vsubps %ymm1, %ymm0, %ymm2 vaddps %ymm2, %ymm2, %ymm2 Numba can automatically translate some loops into vector instructions for 2-4x speed improvements.
https://numba.pydata.org/