import numpy as np
import torch
import tensorflow as tf
x = torch.tensor(
[
[1.0, 2, 3, 4, 5],
[6.0, 7, 8, 9, 10],
]
)
x
Tips
- numpy.pad and torch.nn.utils.rnn.pad_sequence can only increase the length of a sequence (numpy array, list, or tensor), while tf.keras.preprocessing.sequence.pad_sequences can both increase and decrease the length of a sequence.
- numpy.pad implements many different padding modes (constant, edge, linear_ramp, maximum, mean, median, minimum, reflect, symmetric, wrap, empty, and an arbitrary padding function), while torch.nn.utils.rnn.pad_sequence and tf.keras.preprocessing.sequence.pad_sequences only support padding with a constant value (as this is the only use case in NLP).
- You can easily control the final length (after padding) with numpy.pad and tf.keras.preprocessing.sequence.pad_sequences. torch.nn.utils.rnn.pad_sequence pads each tensor to the max length of all tensors in the batch; you cannot easily use torch.nn.utils.rnn.pad_sequence to pad a sequence to an arbitrary length.
- numpy.pad pads a single iterable object (numpy array, list, or tensor), torch.nn.utils.rnn.pad_sequence pads a sequence of Tensors, and tf.keras.preprocessing.sequence.pad_sequences pads a sequence of iterable objects (numpy arrays, lists, or tensors).
Overall, tf.keras.preprocessing.sequence.pad_sequences is the most useful for NLP. torch.nn.utils.rnn.pad_sequence seems to be quite limited. numpy.pad can be used to easily implement a customized padding strategy.
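As an illustration of that last point, here is a minimal sketch of an arbitrary padding function passed to numpy.pad. The helper name pad_with and the padder keyword are just illustrative choices; numpy invokes the function once per axis with the signature (vector, iaxis_pad_width, iaxis, kwargs).
def pad_with(vector, pad_width, iaxis, kwargs):
    # fill both padded ends with a caller-supplied value (default 0)
    pad_value = kwargs.get("padder", 0)
    vector[: pad_width[0]] = pad_value
    if pad_width[1] > 0:
        vector[-pad_width[1]:] = pad_value

np.pad(np.array([1, 2, 3, 4, 5]), (2, 3), pad_with, padder=-1)
# expected: array([-1, -1,  1,  2,  3,  4,  5, -1, -1, -1])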
a = [1, 2, 3, 4, 5]
# pad 2 values on the left and 3 on the right; left pads use 4, right pads use 6
np.pad(a, (2, 3), "constant", constant_values=(4, 6))
# pad_sequence pads each tensor to the length of the longest one;
# with the default batch_first=False, the result has shape (max_len, batch)
t = torch.nn.utils.rnn.pad_sequence(
    [
        torch.tensor([1, 2, 3]),
        torch.tensor([1, 2, 3, 4]),
    ]
)
t
t[0]
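Note that t[0] above is the first time step (one element from each sequence), not the first sequence. A quick sketch with batch_first=True, a documented keyword of pad_sequence, puts each sequence in its own row instead:
torch.nn.utils.rnn.pad_sequence(
    [
        torch.tensor([1, 2, 3]),
        torch.tensor([1, 2, 3, 4]),
    ],
    batch_first=True,
)
# expected: tensor([[1, 2, 3, 0],
#                   [1, 2, 3, 4]])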
# this call raises a TypeError: pad_sequence only accepts a sequence of Tensors,
# not plain Python lists
torch.nn.utils.rnn.pad_sequence(
    [
        [1, 2, 3],
        [1, 2, 3, 4],
    ]
)
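A minimal fix, assuming you start from plain lists, is to convert each list to a tensor first:
torch.nn.utils.rnn.pad_sequence(
    [torch.tensor(seq) for seq in [[1, 2, 3], [1, 2, 3, 4]]]
)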
# maxlen=3 is shorter than the input, so the sequence is truncated
# from the end (truncating="post")
tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=3,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",
)
# maxlen=9 is longer than the input, so the sequence is padded with 0
# at the end (padding="post")
tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=9,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",
)
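Since tf.keras.preprocessing.sequence.pad_sequences accepts a sequence of iterable objects, the typical NLP pattern is to pad a whole batch of variable-length sequences in one call; when maxlen is omitted, it defaults to the length of the longest sequence. A small sketch:
tf.keras.preprocessing.sequence.pad_sequences(
    [
        [1, 2, 3],
        [1, 2, 3, 4, 5],
    ],
    padding="post",
)
# expected: array([[1, 2, 3, 0, 0],
#                  [1, 2, 3, 4, 5]], dtype=int32)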