import numpy as np
import torch
import tensorflow as tf
# a small example tensor; the float literals make torch infer dtype float32
x = torch.tensor(
[
[1.0, 2, 3, 4, 5],
[6.0, 7, 8, 9, 10],
]
)
x
Tips
- numpy.pad pads a single iterable object (numpy array, list or Tensor), torch.nn.utils.rnn.pad_sequence pads a sequence of Tensors, and tf.keras.preprocessing.sequence.pad_sequences pads a sequence of iterable objects (numpy arrays, lists or Tensors).
- numpy.pad and torch.nn.utils.rnn.pad_sequence can only increase the length of a sequence, while tf.keras.preprocessing.sequence.pad_sequences can both increase and decrease the length of a sequence.
- numpy.pad implements many different padding modes (constant, edge, linear_ramp, maximum, mean, median, minimum, reflect, symmetric, wrap, empty, and an arbitrary padding function), while torch.nn.utils.rnn.pad_sequence and tf.keras.preprocessing.sequence.pad_sequences only support padding with a constant value (as this is the only use case in NLP); a few of the non-constant modes are sketched below.
- You can easily control the final length (after padding) with numpy.pad and tf.keras.preprocessing.sequence.pad_sequences. torch.nn.utils.rnn.pad_sequence pads each Tensor to the max length of all Tensors, so you cannot easily use it to pad a sequence to an arbitrary length.
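For illustration, here is a minimal sketch of a few of the non-constant numpy.pad modes (the array and pad widths are made up for this example):
import numpy as np

a = np.array([1, 2, 3, 4, 5])
# "edge" repeats the border values
np.pad(a, (2, 2), "edge")  # array([1, 1, 1, 2, 3, 4, 5, 5, 5])
# "reflect" mirrors the sequence around its endpoints
np.pad(a, (2, 2), "reflect")  # array([3, 2, 1, 2, 3, 4, 5, 4, 3])
# "wrap" cycles through the sequence
np.pad(a, (2, 2), "wrap")  # array([4, 5, 1, 2, 3, 4, 5, 1, 2])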
Overall,
- tf.keras.preprocessing.sequence.pad_sequences is the most useful for NLP.
- torch.nn.utils.rnn.pad_sequence seems to be quite limited.
- numpy.pad can be used to easily implement a customized padding strategy (see the sketch after the example below).
a = [1, 2, 3, 4, 5]
# pad 2 values of 4 on the left and 3 values of 6 on the right
np.pad(a, (2, 3), "constant", constant_values=(4, 6))
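This gives array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6]): two 4s on the left, three 6s on the right. And since numpy.pad also accepts an arbitrary padding function, a customized strategy is straightforward; a minimal sketch, where pad_mean is a made-up helper that fills the padding with the mean of the original values:
def pad_mean(vector, pad_width, iaxis, kwargs):
    # vector arrives zero-padded; the original values sit between the pads
    left, right = pad_width
    mean = vector[left:len(vector) - right].mean()
    vector[:left] = mean
    if right > 0:
        vector[-right:] = mean

np.pad(np.array([1.0, 2, 3, 4, 5]), (2, 3), pad_mean)  # array([3., 3., 1., 2., 3., 4., 5., 3., 3., 3.])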
# pad_sequence pads every Tensor to the max length (here 4) with the default
# padding_value=0; batch_first defaults to False, so t has shape (4, 2)
t = torch.nn.utils.rnn.pad_sequence(
    [
        torch.tensor([1, 2, 3]),
        torch.tensor([1, 2, 3, 4]),
    ]
)
t
# the first time step of both sequences (not the first sequence)
t[0]
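Because pad_sequence always pads to the max length of the batch, padding to an arbitrary final length takes a workaround such as torch.nn.functional.pad on each Tensor; a minimal sketch (the target length of 6 is made up for this example):
import torch.nn.functional as F

seq = torch.tensor([1, 2, 3])
# right-pad a single 1-D tensor with zeros up to length 6
F.pad(seq, (0, 6 - seq.size(0)), value=0)  # tensor([1, 2, 3, 0, 0, 0])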
# pad_sequence only accepts a sequence of Tensors; passing plain Python
# lists like this raises an error
torch.nn.utils.rnn.pad_sequence(
    [
        [1, 2, 3],
        [1, 2, 3, 4],
    ]
)
# maxlen=3 truncates the length-5 sequence; truncating="post" drops values from the end
tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=3,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",
)
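This should return array([[1, 2, 3]]).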
# maxlen=9 pads the length-5 sequence; padding="post" appends the value 0 at the end
tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=9,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",
)
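This should return array([[1, 2, 3, 4, 5, 0, 0, 0, 0]]).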