Variance Predictor

VariancePredictor

Bases: Module

Duration and Pitch predictor neural network module in PyTorch.

It stacks two blocks of ConvTransposed (a custom transposed-convolution layer from the model.conv_blocks module), LeakyReLU activation, Layer Normalization, and Dropout, followed by a final linear projection.

Constructor for VariancePredictor class.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `channels_in` | `int` | Number of input channels. | *required* |
| `channels` | `int` | Number of output channels of the `ConvTransposed` layers and of input features to the linear layer. | *required* |
| `channels_out` | `int` | Number of output features of the linear layer. | *required* |
| `kernel_size` | `int` | Kernel size of the `ConvTransposed` layers. | *required* |
| `p_dropout` | `float` | Dropout probability. | *required* |
| `leaky_relu_slope` | `float` | Negative slope of the LeakyReLU activations. | `LEAKY_RELU_SLOPE` |

Returns:

| Type | Description |
| ---- | ----------- |
| `Tensor` | Output tensor produced by the forward pass. |
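
A minimal usage sketch follows. The import path matches the source location quoted below; the channel sizes, batch size, and sequence length are illustrative assumptions, and the channel-last input layout is inferred from the `nn.LayerNorm(channels)` calls between the convolution blocks.

```python
import torch

# Import path taken from the "Source code in" reference below; adjust if your
# package layout differs.
from models.tts.delightful_tts.acoustic_model.variance_predictor import (
    VariancePredictor,
)

# Illustrative sizes (assumptions, not values prescribed by the module).
predictor = VariancePredictor(
    channels_in=384,   # encoder hidden size
    channels=256,      # hidden size of the ConvTransposed blocks
    channels_out=1,    # one predicted value (duration or pitch) per frame
    kernel_size=3,
    p_dropout=0.5,
)

x = torch.randn(2, 120, 384)                   # (batch, seq_len, channels_in)
mask = torch.zeros(2, 120, dtype=torch.bool)   # True marks frames to zero out
out = predictor(x, mask)                       # (batch, seq_len) when channels_out == 1
```

With `channels_out == 1` the trailing dimension is removed by `squeeze(-1)`, so the output has shape `(batch, seq_len)` and masked frames are set to `0.0`.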

Source code in models/tts/delightful_tts/acoustic_model/variance_predictor.py
class VariancePredictor(Module):
    r"""Duration and Pitch predictor neural network module in PyTorch.

    It consists of multiple layers, including `ConvTransposed` layers (custom convolution transpose layers from
    the `model.conv_blocks` module), LeakyReLU activation functions, Layer Normalization and Dropout layers.

    Constructor for `VariancePredictor` class.

    Args:
        channels_in (int): Number of input channels.
        channels (int): Number of output channels for ConvTransposed layers and input channels for linear layer.
        channels_out (int): Number of output channels for linear layer.
        kernel_size (int): Size of the kernel for ConvTransposed layers.
        p_dropout (float): Probability of dropout.
        leaky_relu_slope (float, optional): Negative slope of the LeakyReLU activations. Defaults to `LEAKY_RELU_SLOPE`.

    Returns:
        torch.Tensor: Output tensor.
    """

    def __init__(
        self,
        channels_in: int,
        channels: int,
        channels_out: int,
        kernel_size: int,
        p_dropout: float,
        leaky_relu_slope: float = LEAKY_RELU_SLOPE,
    ):
        super().__init__()

        self.layers = nn.ModuleList(
            [
                # Convolution transpose layer followed by LeakyReLU, LayerNorm and Dropout
                ConvTransposed(
                    channels_in,
                    channels,
                    kernel_size=kernel_size,
                    padding=(kernel_size - 1) // 2,
                ),
                nn.LeakyReLU(leaky_relu_slope),
                nn.LayerNorm(
                    channels,
                ),
                nn.Dropout(p_dropout),
                # Another "block" of ConvTransposed, LeakyReLU, LayerNorm, and Dropout
                ConvTransposed(
                    channels,
                    channels,
                    kernel_size=kernel_size,
                    padding=(kernel_size - 1) // 2,
                ),
                nn.LeakyReLU(leaky_relu_slope),
                nn.LayerNorm(
                    channels,
                ),
                nn.Dropout(p_dropout),
            ],
        )

        # Output linear layer
        self.linear_layer = nn.Linear(
            channels,
            channels_out,
        )

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        r"""Forward pass for `VariancePredictor`.

        Args:
            x (torch.Tensor): Input tensor.
            mask (torch.Tensor): Mask tensor, has the same size as x.

        Returns:
            torch.Tensor: Output tensor.
        """
        # Sequentially pass the input through all defined layers
        # (ConvTransposed -> LeakyReLU -> LayerNorm -> Dropout -> ConvTransposed -> LeakyReLU -> LayerNorm -> Dropout)
        for layer in self.layers:
            x = layer(x)
        x = self.linear_layer(x)
        x = x.squeeze(-1)
        return x.masked_fill(mask, 0.0)

forward(x, mask)

Forward pass for VariancePredictor.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `x` | `Tensor` | Input tensor. | *required* |
| `mask` | `Tensor` | Boolean mask; positions where it is `True` are zeroed in the returned tensor. | *required* |

Returns:

| Type | Description |
| ---- | ----------- |
| `Tensor` | Output tensor with masked positions set to `0.0`. |
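
The final `masked_fill(mask, 0.0)` zeroes every position where the mask is `True`, so a typical padding mask marks padded frames as `True`. A self-contained sketch of that convention follows; `lengths_to_padding_mask` is a hypothetical helper, not part of this module.

```python
import torch

def lengths_to_padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Hypothetical helper: True at padded frames, False at valid frames."""
    idx = torch.arange(max_len, device=lengths.device)
    return idx.unsqueeze(0) >= lengths.unsqueeze(1)   # (batch, max_len)

lengths = torch.tensor([5, 3])
mask = lengths_to_padding_mask(lengths, max_len=5)
# tensor([[False, False, False, False, False],
#         [False, False, False,  True,  True]])

# The last step of forward() zeroes exactly those positions:
pred = torch.randn(2, 5)                  # stands in for the squeezed predictions
print(pred.masked_fill(mask, 0.0))        # frames 3-4 of the second row become 0.0
```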

Source code in models/tts/delightful_tts/acoustic_model/variance_predictor.py
def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    r"""Forward pass for `VariancePredictor`.

    Args:
        x (torch.Tensor): Input tensor.
        mask (torch.Tensor): Mask tensor, has the same size as x.

    Returns:
        torch.Tensor: Output tensor.
    """
    # Sequentially pass the input through all defined layers
    # (ConvTransposed -> LeakyReLU -> LayerNorm -> Dropout -> ConvTransposed -> LeakyReLU -> LayerNorm -> Dropout)
    for layer in self.layers:
        x = layer(x)
    x = self.linear_layer(x)
    x = x.squeeze(-1)
    return x.masked_fill(mask, 0.0)