Hardware-Agnostic Training
Hardware agnostic training (preparation)
More details are available in the PyTorch Lightning documentation: Hardware agnostic training (preparation)
Delete .cuda() or .to() calls.
# before lightning
def forward(self, x):
    x = x.cuda(0)
    layer_1.cuda(0)
    x_hat = layer_1(x)

# after lightning
def forward(self, x):
    x_hat = layer_1(x)
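Lightning can drop these calls because the Trainer owns device placement: it moves the model and every batch to the selected accelerator, so the same module runs on CPU or GPU without code changes. A minimal, self-contained sketch assuming the pytorch_lightning package (the model, layer sizes, and random data are made up for illustration):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer

class TinyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(32, 8)

    def forward(self, x):
        # No .cuda()/.to() here: the Trainer has already placed the module and the batch
        return self.layer_1(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# "auto" lets Lightning pick the hardware; the module code stays identical either way
train_loader = DataLoader(TensorDataset(torch.randn(64, 32), torch.randn(64, 8)), batch_size=16)
trainer = Trainer(accelerator="auto", devices="auto", max_epochs=1)
trainer.fit(TinyModel(), train_dataloaders=train_loader)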
Example from the code:
def forward(
    self,
    x: torch.Tensor,
    pitches_range: Tuple[float, float],
    speakers: torch.Tensor,
    langs: torch.Tensor,
    p_control: float = 1.0,
    d_control: float = 1.0,
) -> torch.Tensor:
    # Generate masks for padding positions in the source sequences
    src_mask = tools.get_mask_from_lengths(
        torch.tensor([x.shape[1]], dtype=torch.int64),
    ).to(x.device)  # Read the device from the input tensor via `x.device`
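The same idea applies to tensors the module creates or owns itself. A common pattern (a sketch, not code from this project) is to derive the device from an existing input tensor with Tensor.type_as, or to register constant tensors with register_buffer so they move together with the module:

import torch
from torch import nn
from pytorch_lightning import LightningModule

class DeviceAgnosticModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(32, 8)
        # Buffers are moved with the module, so this constant follows it to any device
        self.register_buffer("scale", torch.tensor(0.5))

    def forward(self, x):
        # New tensors inherit the device (and dtype) of an existing input tensor
        noise = torch.randn(x.shape).type_as(x)
        return self.layer_1(x + noise) * self.scale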
Synchronize validation and test logging
When running in distributed mode, we have to ensure that the validation and test step logging calls are synchronized across processes. This is done by adding sync_dist=True to all self.log calls in the validation and test steps. This ensures that each GPU worker has the same behaviour when tracking model checkpoints, which is important for later downstream tasks such as testing the best checkpoint across all workers. The sync_dist option can also be used in logging calls during the training step, but be aware that this can lead to significant communication overhead and slow down your training.
def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    loss = self.loss(logits, y)
    # Add sync_dist=True to sync logging across all GPU workers (may have performance impact)
    self.log("validation_loss", loss, on_step=True, on_epoch=True, sync_dist=True)

def test_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    loss = self.loss(logits, y)
    # Add sync_dist=True to sync logging across all GPU workers (may have performance impact)
    self.log("test_loss", loss, on_step=True, on_epoch=True, sync_dist=True)