Ben Chuanlong Du's Blog

It is never too late to learn.

High Performance Computing in Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Computing Frames

Apache Ray

A fast and simple framework for building and running distributed applications.

Ray does not handle large data well (as of 2018/05/28). Please refer to the …
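As a quick orientation to Ray's task API, here is a minimal sketch; the function name square and the inputs are made up for illustration.

:::python
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x):
    """A task that Ray can schedule on any worker in the cluster."""
    return x * x

# Launch tasks asynchronously, then block until all results are ready.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]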

Free High Performance Computing Resources

Cloud Platforms

Some cloud platforms offer free trials or credits.

Using GPUs

A Beginners Guide to Basic GPU Application for Integer Calculations https://saturncloud.io/blog/a-beginners-guide-to-basic-gpu-application-for-integer-calculations/#:~:text …

Broadcast Join in Spark

Tips and Traps

  1. BroadcastHashJoin, i.e., map-side join, is fast. Use BroadcastHashJoin if possible. Notice that Spark automatically uses BroadcastHashJoin in an inner join if one of the tables is smaller than the configured broadcast threshold (spark.sql.autoBroadcastJoinThreshold).

  2. Notice that BroadcastJoin only works for inner joins. If you have an outer join, BroadcastJoin won't happen even if you explicitly broadcast a DataFrame. A minimal sketch of an explicit broadcast join is given after this list.
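The sketch below shows how to request a broadcast join explicitly in PySpark; the table contents and column names are made up for illustration.

:::python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast_join_demo").getOrCreate()

# A larger fact table and a small dimension table (toy data).
orders = spark.createDataFrame(
    [(1, "apple", 3), (2, "banana", 5), (3, "apple", 1)],
    ["order_id", "item", "qty"],
)
prices = spark.createDataFrame(
    [("apple", 0.5), ("banana", 0.25)],
    ["item", "price"],
)

# Hint Spark to broadcast the small table so the inner join
# is executed as a map-side BroadcastHashJoin.
joined = orders.join(broadcast(prices), on="item", how="inner")
joined.explain()  # the physical plan should contain BroadcastHashJoin
joined.show()

If you want full manual control, setting spark.sql.autoBroadcastJoinThreshold to -1 disables automatic broadcasting.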

Conversion Between PySpark DataFrames and pandas DataFrames

Comments

  1. A PySpark DataFrame can be converted to a pandas DataFrame by calling the method DataFrame.toPandas, and a pandas DataFrame can be converted to a PySpark DataFrame by calling SparkSession.createDataFrame. Notice that when you call DataFrame.toPandas, the whole Spark DataFrame is collected to the driver machine! This means that you should only call DataFrame.toPandas when the Spark DataFrame is small enough to fit into the memory of the driver.
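A minimal sketch of both conversion directions is given below; the DataFrame contents are made up for illustration.

:::python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pandas_conversion_demo").getOrCreate()

# pandas -> PySpark
pdf = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
sdf = spark.createDataFrame(pdf)

# PySpark -> pandas: collects the whole DataFrame to the driver,
# so only do this when the data comfortably fits in driver memory.
pdf2 = sdf.toPandas()
print(pdf2)

In recent versions of Spark, enabling Apache Arrow (spark.sql.execution.arrow.pyspark.enabled) usually speeds up both conversions.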

Hands on the Python module numba

Installation

numba can be installed using the following command.

:::bash
pip3 install numba

If you need CUDA support, you have to install CUDA drivers.

:::bash
sudo apt-get install cuda-10-1

Instead of going through the hassle of configuring numba for GPU, a better way is to run numba in an NVIDIA Docker container. The Docker image nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 has the CUDA runtime installed, so it is as easy as installing numba on top of it and you are ready to go. For more detailed instructions, please refer to …
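As a quick sanity check that numba works, here is a minimal CPU example; the function name and the input array are arbitrary.

:::python
import numpy as np
from numba import njit

@njit  # compile to native machine code on the first call
def sum_of_squares(arr):
    total = 0.0
    for x in arr:
        total += x * x
    return total

arr = np.random.rand(1_000_000)
print(sum_of_squares(arr))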

Parallel Computing in Java

The following are a few tips for multithreaded parallel computing in Java.

  1. Instance fields, static fields and elements of arrays are stored in heap memory and thus can be shared between …