Phoneme Level Prosody Encoder
PhonemeLevelProsodyEncoder
Bases: Module
Phoneme Level Prosody Encoder Module
This class encodes phoneme-level prosody in the speech synthesis pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preprocess_config | PreprocessingConfig | Configuration for preprocessing. | required |
model_config | AcousticModelConfigType | Acoustic model configuration. | required |
Returns:
Type | Description |
---|---|
torch.Tensor | The encoded tensor after applying masked fill. |
Source code in models/tts/delightful_tts/reference_encoder/phoneme_level_prosody_encoder.py
forward(x, src_mask, mels, mel_lens, encoding)
The forward pass of the PhonemeLevelProsodyEncoder. Input tensors are passed through the reference encoder, attention mechanism, and a bottleneck.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor of shape [N, seq_len, encoder_embedding_dim]. | required |
src_mask | Tensor | The mask tensor which contains | required |
mels | Tensor | The mel-spectrogram with shape [N, Ty/r, n_mels*r], where r=1. | required |
mel_lens | Tensor | The lengths of each sequence in mels. | required |
encoding | Tensor | The relative positional encoding tensor. | required |
Returns:
Type | Description |
---|---|
Tensor | Output tensor of shape [N, seq_len, bottleneck_size]. |
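The data flow described for forward (reference encoder, attention, bottleneck, then masked fill) can be illustrated with a minimal sketch. This is not the DelightfulTTS implementation: `ProsodySketch`, its layer choices, and the dimensions are hypothetical stand-ins, the `encoding` argument is omitted because standard `nn.MultiheadAttention` takes no relative positional encoding, and `src_mask` is assumed to be a boolean tensor of shape [N, seq_len] with True marking padded positions.

```python
import torch
from torch import nn


class ProsodySketch(nn.Module):
    """Hypothetical stand-in, not the DelightfulTTS implementation."""

    def __init__(self, embed_dim=8, n_mels=4, bottleneck_size=2):
        super().__init__()
        # Stand-in for the reference encoder: projects mel frames to embed_dim.
        self.ref_proj = nn.Linear(n_mels, embed_dim)
        # Stand-in for the attention mechanism (no relative positional encoding).
        self.attention = nn.MultiheadAttention(embed_dim, num_heads=2, batch_first=True)
        # Bottleneck: reduces each phoneme vector to bottleneck_size.
        self.bottleneck = nn.Linear(embed_dim, bottleneck_size)

    def forward(self, x, src_mask, mels, mel_lens):
        # Boolean mask over mel frames, True = padded frame (assumed convention).
        frame_ids = torch.arange(mels.shape[1], device=mels.device)
        mel_mask = frame_ids[None, :] >= mel_lens[:, None]
        ref = self.ref_proj(mels)
        # Each phoneme position (query) attends over the mel frames (key/value).
        out, _ = self.attention(query=x, key=ref, value=ref, key_padding_mask=mel_mask)
        out = self.bottleneck(out)
        # Zero out the padded phoneme positions.
        return out.masked_fill(src_mask.unsqueeze(-1), 0.0)


torch.manual_seed(0)
N, seq_len, T = 2, 5, 7
x = torch.randn(N, seq_len, 8)    # [N, seq_len, encoder_embedding_dim]
mels = torch.randn(N, T, 4)       # [N, Ty/r, n_mels*r] with r=1
mel_lens = torch.tensor([7, 4])   # valid frames per sequence
src_mask = torch.tensor([[False] * 5, [False, False, False, True, True]])

out = ProsodySketch()(x, src_mask, mels, mel_lens)
print(out.shape)
```

The output has one `bottleneck_size` vector per phoneme position, matching the documented [N, seq_len, bottleneck_size] shape, and the rows at padded positions of the second sequence are all zeros.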
Source code in models/tts/delightful_tts/reference_encoder/phoneme_level_prosody_encoder.py