Ben Chuanlong Du's Blog

It is never too late to learn.

UDF in Spark


Use the higher-level standard Column-based functions with Dataset operators whenever possible before resorting to your own custom UDFs. UDFs are a black box to Spark, so it does not even try to optimize them.
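As an illustration, the sketch below (assuming PySpark and a SparkSession named spark) expresses the same transformation once as a Python UDF and once with the built-in upper column function; only the latter stays visible to Catalyst and can be optimized.

    :::python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf, upper
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf_demo").getOrCreate()
    df = spark.createDataFrame([("ben",), ("du",)], ["name"])

    # A Python UDF is a black box that Spark cannot inspect or optimize.
    to_upper = udf(lambda s: s.upper() if s is not None else None, StringType())
    df.withColumn("name_upper", to_upper(col("name"))).show()

    # The equivalent built-in column function is preferred whenever one exists.
    df.withColumn("name_upper", upper(col("name"))).show()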

Read Text File into a pandas DataFrame

Advanced Options

  1. The argument sep supports regular expressions! For example,

     :::python
     df = pd.read_csv(file, sep=" +", engine="python")

  2. The argument nrows controls the number of rows to read.

  3. The argument skiprows skips the specified rows; blank lines are skipped by default (skip_blank_lines=True).

  4. The argument names (array-like, optional) is a list of column names to use. If the file contains a header row, you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed. See the example after this list.
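A minimal sketch of these options, using a hypothetical file data.csv and illustrative column names:

    :::python
    import pandas as pd

    # Read at most 100 data rows, skipping the first 2 lines of the file.
    df = pd.read_csv("data.csv", nrows=100, skiprows=2)

    # The file has a header row, but we want our own column names:
    # header=0 discards the file's header and names= supplies the replacements.
    df = pd.read_csv("data.csv", names=["id", "name", "score"], header=0)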

Send Emails in Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Use Standard Libraries smtplib and email

Below is a function for sending emails that leverages the standard libraries smtplib and email.

    :::python
    import smtplib
    from email.mime.text import MIMEText
    from typing import Union


    def send_email(recipient: Union …
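The definition above is truncated in this copy. As a rough sketch only (the parameters beyond recipient, the defaults, and the SMTP host/port are assumptions, not the author's original code), such a function might look like this:

    :::python
    import smtplib
    from email.mime.text import MIMEText
    from typing import Sequence, Union


    def send_email(
        recipient: Union[str, Sequence[str]],
        sender: str,
        subject: str,
        body: str,
        host: str = "localhost",  # assumed SMTP host, not from the original post
        port: int = 25,
    ) -> None:
        """Send a plain-text email through an SMTP server (illustrative sketch)."""
        # Normalize a single recipient address into a list.
        recipients = [recipient] if isinstance(recipient, str) else list(recipient)
        # Build the message using the standard email library.
        msg = MIMEText(body)
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = ", ".join(recipients)
        # Deliver the message using smtplib.
        with smtplib.SMTP(host, port) as server:
            server.sendmail(sender, recipients, msg.as_string())

For example, send_email("alice@example.com", "bob@example.com", "Hi", "Hello!") would deliver a plain-text message through the local SMTP server.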