Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: Pure Python Code Errors

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

This post collects some typical pure Python errors in PySpark applications.

Symptom 1

object has no attribute

Solution 1

Fix the attribute name.
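
As a minimal sketch of how one might track down the right name, the snippet below reproduces the error on a stand-in class and uses `difflib.get_close_matches` against `dir(obj)` to suggest likely spellings. The `Frame` class and the `suggest_attribute` helper are hypothetical names used only for illustration; they are not part of PySpark.

```python
import difflib


class Frame:
    """Stand-in for any object whose attribute name was mistyped."""

    def with_column(self, name):
        return f"added {name}"


def suggest_attribute(obj, wrong_name, limit=3):
    """Return public attribute names of obj that look similar to wrong_name."""
    candidates = [name for name in dir(obj) if not name.startswith("_")]
    return difflib.get_close_matches(wrong_name, candidates, n=limit)


frame = Frame()
try:
    frame.withColumn("x")  # wrong: the stand-in class spells it with_column
except AttributeError as err:
    print(err)
    print("Did you mean:", suggest_attribute(frame, "withColumn"))
```

The same helper can be pointed at a real object (for example a DataFrame) when the traceback names an attribute that does not exist.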

Symptom 2

No such file or directory

Solution …

Spark SQL

Spark SQL Guide

  1. Since a Spark DataFrame is immutable, you cannot update or delete records in a physical table (e.g., a Hive table) directly using the Spark DataFrame/SQL API. However …

Spark Issue: TypeError WithReplacement

Symptoms

TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got [].

Causes

An integer number (e.g., 1) is passed to the fraction parameter …
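
Given that cause, one possible workaround is to coerce the value to a float before calling `DataFrame.sample`. The sketch below assumes `df` is a `pyspark.sql.DataFrame` (or any object with a compatible `sample` method); `sample_fraction` is a hypothetical helper name, not a PySpark function.

```python
def sample_fraction(df, fraction, seed=None):
    """Call df.sample with the fraction coerced to a float.

    Assumes df exposes sample(withReplacement=None, fraction=None, seed=None),
    as pyspark.sql.DataFrame does.
    """
    # An int such as 1 becomes 1.0, so the argument is a float as required.
    return df.sample(fraction=float(fraction), seed=seed)
```

With a real DataFrame, `sample_fraction(df, 1)` would then behave like `df.sample(fraction=1.0)`.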

Computer Vision Libraries in Rust

resize

resize: Image resampling library in pure Rust.

  - Fast, with support for many pixel formats
  - No encoders/decoders, meant to be used with some external library
  - Tuned for resizing to the …

Configure Log4J for Spark

Show Error Messages Only

When you run Spark or PySpark in a Jupyter/Lab notebook, it is recommended that you show ERROR messages only. Otherwise, there might be too much logging information polluting your notebook. You can set the log level of Spark to ERROR using the following line of code.
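
The line itself is not shown in this excerpt. A minimal sketch, assuming `spark` is an existing `SparkSession`, is to call `SparkContext.setLogLevel`; the wrapper name `show_errors_only` is hypothetical and only there to make the example self-contained.

```python
def show_errors_only(spark):
    """Set the log level of the given SparkSession to ERROR and return it."""
    # setLogLevel is a method of SparkContext, reached via spark.sparkContext.
    spark.sparkContext.setLogLevel("ERROR")
    return spark


# Typical notebook usage (requires pyspark to be installed):
# from pyspark.sql import SparkSession
# spark = show_errors_only(SparkSession.builder.getOrCreate())
```

Note that messages logged before this call runs (for example during session start-up) may still appear in the notebook.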