Monotonic Alignments Shrink
b_mas(b_attn_map, in_lens, out_lens, width=1)
Applies the Monotonic Alignments Shrink (MAS) operation in parallel across the items of a batched attention map.
It calls the `mas_width1` function internally to perform the MAS operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`b_attn_map` | `ndarray` | The batched attention map: a 3D array whose first dimension is the batch size, whose second dimension corresponds to source length, and whose third dimension corresponds to target length. | required |
`in_lens` | `ndarray` | Lengths of the sequences in the input batch. | required |
`out_lens` | `ndarray` | Lengths of the sequences in the output batch. | required |
`width` | `int` | The width for the MAS operation. Only a width of 1 is currently supported. | `1` |
Raises:
Type | Description |
---|---|
`AssertionError` | If `width` is not equal to 1. This function currently supports only a width of 1. |
Returns:
Type | Description |
---|---|
`ndarray` | The batched attention map after applying the MAS operation. It has the same dimensions as |
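
As an illustration of the length-aware batching described above, the following is a minimal sketch, not the code in mas.py. `batched_apply` and `toy_align` are hypothetical names, and `toy_align` is a simple row-wise argmax stand-in rather than MAS. The sketch assumes the layout given in the parameter table, (batch, source, target), cropped by `in_lens` and `out_lens` respectively, and that padded positions stay zero in the output. Both points are assumptions.

```python
import numpy as np


def toy_align(attn_map: np.ndarray) -> np.ndarray:
    """Hypothetical per-item aligner: one-hot of each row's argmax.

    This is NOT MAS (it does not enforce monotonicity); it only keeps
    the batching sketch short.
    """
    out = np.zeros_like(attn_map)
    out[np.arange(attn_map.shape[0]), attn_map.argmax(axis=1)] = 1
    return out


def batched_apply(b_attn_map, in_lens, out_lens, align_fn=toy_align):
    """Apply align_fn to the valid region of each batch item."""
    result = np.zeros_like(b_attn_map)  # padded positions remain zero
    for b in range(b_attn_map.shape[0]):
        # Assumed layout: (batch, source, target), cropped by in_lens/out_lens.
        valid = b_attn_map[b, : in_lens[b], : out_lens[b]]
        result[b, : in_lens[b], : out_lens[b]] = align_fn(valid)
    return result


rng = np.random.default_rng(0)
batch = rng.random((2, 4, 5))   # two items, padded to 4 x 5
in_lens = np.array([4, 2])      # valid source lengths per item
out_lens = np.array([5, 3])     # valid target lengths per item
aligned = batched_apply(batch, in_lens, out_lens)
print(aligned.shape)
```

If the per-item function expects a different axis order, the cropped slice would need to be transposed before the call and transposed back afterwards.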
Source code in models/tts/delightful_tts/acoustic_model/mas.py
mas_width1(attn_map)
Applies a Monotonic Alignments Shrink (MAS) operation with a hard-coded width of 1 to an attention map. Essentially, it produces the optimal alignment based on the previous attention distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`attn_map` | `ndarray` | The original attention map: a 2D NumPy array whose rows correspond to mel bins and whose columns correspond to text bins. | required |
Returns:
Name | Type | Description |
---|---|---|
`opt` | `ndarray` | The optimal attention map after applying the MAS operation. |
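
The width-1 search can be sketched as a small dynamic program. This is an illustrative NumPy version, not the code in mas.py: `monotonic_path_width1` is a hypothetical name, and the boundary conditions (the path starts at the first text bin, ends at the last one, and advances by at most one text bin per mel bin) are assumptions. It also assumes there are at least as many mel bins as text bins.

```python
import numpy as np


def monotonic_path_width1(attn_map: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of a width-1 monotonic search on a (mel, text) map."""
    n_mel, n_text = attn_map.shape
    with np.errstate(divide="ignore"):
        log_attn = np.log(attn_map)  # zeros become -inf, which max handles

    # log_p[i, j]: best cumulative log-score of a path ending at (i, j).
    log_p = np.full((n_mel, n_text), -np.inf)
    prev = np.zeros((n_mel, n_text), dtype=np.int64)
    log_p[0, 0] = log_attn[0, 0]  # assumed: first mel bin maps to first text bin

    for i in range(1, n_mel):
        for j in range(n_text):
            best_j = j  # stay on the same text bin
            if j > 0 and log_p[i - 1, j - 1] >= log_p[i - 1, j]:
                best_j = j - 1  # or arrive from the previous text bin
            log_p[i, j] = log_attn[i, j] + log_p[i - 1, best_j]
            prev[i, j] = best_j

    # Backtrack from the last text bin to recover a hard 0/1 alignment.
    opt = np.zeros_like(attn_map)
    j = n_text - 1
    for i in range(n_mel - 1, -1, -1):
        opt[i, j] = 1
        j = prev[i, j]
    return opt


attn = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
])
opt = monotonic_path_width1(attn)
print(opt.argmax(axis=1))  # [0 0 1 2]
```

Each row of the result contains exactly one 1, and the selected text bin never decreases from one mel bin to the next.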
Source code in models/tts/delightful_tts/acoustic_model/mas.py