The Sheer Joy of Accelerating Your Existing Python Code with Numba! - Part I

What is Numba?

Numba is a JIT (Just-in-Time) compiler for Python that:

  • generates optimized machine code using LLVM (Low Level Virtual Machine) compiler infrastructure
  • provides a toolbox for different targets and execution models:
    • single-threaded CPU, multi-threaded CPU, GPU
    • regular functions, "universal functions (ufuncs)" (array functions), etc.
  • integrates well with the Scientific Python stack
  • with a few annotations, gives array-oriented and math-heavy Python code:
    • a speedup of 2x (compared to basic NumPy code) to 200x (compared to pure Python)
    • performance similar to C, C++, or Fortran, without having to switch languages or Python interpreters
  • is totally awesome!
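
To make the "few annotations" point concrete, here is a minimal sketch of what decorating a loop-heavy function can look like. The function name `sum_of_squares` and the sample data are made up for illustration, and the fallback decorator is an assumption added only so the snippet still runs (without any speedup) when Numba is not installed. No timings are claimed here; the actual speedup depends on the code and the machine.

```python
import numpy as np

try:
    # njit compiles the decorated function to machine code on its first call.
    from numba import njit
except ImportError:
    # Assumed fallback for this sketch: a no-op decorator, so the example
    # still runs as plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop like this is slow in pure Python but is the kind of
    # array-oriented, math-heavy code that Numba is designed to compile.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1_000, dtype=np.float64)
result = sum_of_squares(data)  # first call triggers compilation under Numba
```

The only change to the original function is the decorator line; the body stays ordinary Python.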

TVM: End-to-End Optimization Stack for Deep Learning

(Image Source: http://tvmlang.org/)

Abstract

  • Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch, are optimized for a narrow range of server-class GPUs.
  • Deploying workloads to other platforms such as mobile phones, IoT devices, and specialized accelerators (FPGAs, ASICs) requires laborious manual effort.
  • TVM is an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends.

Kube In Action - 02: Containers

A process running in a container runs inside the host's operating system, like all other processes. But the process in the container is still isolated from other processes. To the process itself, it looks like it is the only one running on the machine and in its operating system.

Kube In Action - 01: Introduction to Kubernetes

I've recently picked up an interest in containerization. I started reading up on and playing with Kubernetes.

I have been taking notes as I go through Kubernetes in Action by Marko Lukša, and I wanted to share these with those who might have similar interests in containerization and distributed systems in general. This is the 1st installment of a series called Kube in Action. Every week or so, I’ll be summarizing and exploring Kubernetes fundamentals and concepts with hands-on examples as I learn more about Kubernetes.

Play interactively with C++ - Getting Started with Xeus-Cling

xeus-cling

This is the 1st installment of a new series called Play interactively with C++. Every week or so, I’ll be summarizing and exploring Standard C++ programming in Jupyter notebooks using xeus-cling.

The source code (in notebook format) for this series can be found here.

xeus-cling is a Jupyter kernel for C++ based on the C++ interpreter cling and the native implementation of the Jupyter protocol xeus.

Why Parallel Computing?

For many years we’ve enjoyed the fruits of ever faster processors. However, because of physical limitations, the rate of performance improvement in conventional processors is decreasing. To increase the power of processors, chipmakers have turned to multicore integrated circuits, that is, integrated circuits with multiple conventional processors on a single chip.

(Source: Exponential growth of supercomputing power as recorded by the TOP500 list)