import numpy as np
import torch
import tensorflow as tf
x = torch.tensor(
[
[1.0, 2, 3, 4, 5],
[6.0, 7, 8, 9, 10],
]
)
x
Tips
- numpy.pad and torch.nn.utils.rnn.pad_sequence can only increase the length of a sequence (numpy array, list, or tensor), while tf.keras.preprocessing.sequence.pad_sequences can both increase and decrease the length of a sequence.
- numpy.pad implements many different padding modes (constant, edge, linear_ramp, maximum, mean, median, minimum, reflect, symmetric, wrap, empty, and an arbitrary padding function), while torch.nn.utils.rnn.pad_sequence and tf.keras.preprocessing.sequence.pad_sequences only support padding with a constant value (as this is the only use case in NLP).
- You can easily control the final length (after padding) with numpy.pad and tf.keras.preprocessing.sequence.pad_sequences. torch.nn.utils.rnn.pad_sequence pads each tensor to the max length of all tensors in the batch; you cannot easily use torch.nn.utils.rnn.pad_sequence to pad a sequence to an arbitrary length.
- numpy.pad pads a single iterable object (numpy array, list, or tensor), torch.nn.utils.rnn.pad_sequence pads a sequence of Tensors, and tf.keras.preprocessing.sequence.pad_sequences pads a sequence of iterable objects (numpy arrays, lists, or tensors).
Overall, tf.keras.preprocessing.sequence.pad_sequences is the most useful for NLP. torch.nn.utils.rnn.pad_sequence seems to be quite limited. numpy.pad can be used to easily implement a customized padding strategy.
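As an illustration of that last point, here is a minimal sketch of an arbitrary padding function passed to numpy.pad. The helper name pad_with and the padder keyword are just illustrative choices; numpy invokes the function once per axis with the signature (vector, iaxis_pad_width, iaxis, kwargs).
def pad_with(vector, pad_width, iaxis, kwargs):
    # fill both padded ends with a caller-supplied value (default 0)
    pad_value = kwargs.get("padder", 0)
    vector[: pad_width[0]] = pad_value
    if pad_width[1] > 0:
        vector[-pad_width[1]:] = pad_value

np.pad(np.array([1, 2, 3, 4, 5]), (2, 3), pad_with, padder=-1)
# expected: array([-1, -1,  1,  2,  3,  4,  5, -1, -1, -1])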
a = [1, 2, 3, 4, 5]
# pad 2 values on the left and 3 on the right; left pads use 4, right pads use 6
np.pad(a, (2, 3), "constant", constant_values=(4, 6))
# pad_sequence pads each tensor to the length of the longest one;
# with the default batch_first=False, the result has shape (max_len, batch)
t = torch.nn.utils.rnn.pad_sequence(
    [
        torch.tensor([1, 2, 3]),
        torch.tensor([1, 2, 3, 4]),
    ]
)
t
t[0]
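Note that t[0] above is the first time step (one element from each sequence), not the first sequence. A quick sketch with batch_first=True, a documented keyword of pad_sequence, puts each sequence in its own row instead:
torch.nn.utils.rnn.pad_sequence(
    [
        torch.tensor([1, 2, 3]),
        torch.tensor([1, 2, 3, 4]),
    ],
    batch_first=True,
)
# expected: tensor([[1, 2, 3, 0],
#                   [1, 2, 3, 4]])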
# this call raises a TypeError: pad_sequence only accepts a sequence of Tensors,
# not plain Python lists
torch.nn.utils.rnn.pad_sequence(
    [
        [1, 2, 3],
        [1, 2, 3, 4],
    ]
)
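A minimal fix, assuming you start from plain lists, is to convert each list to a tensor first:
torch.nn.utils.rnn.pad_sequence(
    [torch.tensor(seq) for seq in [[1, 2, 3], [1, 2, 3, 4]]]
)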
# maxlen=3 is shorter than the input, so the sequence is truncated
# from the end (truncating="post")
tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=3,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",
)
# maxlen=9 is longer than the input, so the sequence is padded with 0
# at the end (padding="post")
tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=9,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",
)
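Since tf.keras.preprocessing.sequence.pad_sequences accepts a sequence of iterable objects, the typical NLP pattern is to pad a whole batch of variable-length sequences in one call; when maxlen is omitted, it defaults to the length of the longest sequence. A small sketch:
tf.keras.preprocessing.sequence.pad_sequences(
    [
        [1, 2, 3],
        [1, 2, 3, 4, 5],
    ],
    padding="post",
)
# expected: array([[1, 2, 3, 0, 0],
#                  [1, 2, 3, 4, 5]], dtype=int32)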