Normalize Text
NormalizeText
NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP). The primary objective of NeMo is to help researchers from industry and academia to reuse prior work (code and pretrained models) and make it easier to create new conversational AI models.
This class normalizes the characters in the input text and normalizes the text with nemo_text_processing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lang | str | The language code to use for normalization. Defaults to "en". | 'en' |
Attributes:
Name | Type | Description |
---|---|---|
lang | str | The language code to use for normalization. Defaults to "en". |
model | Normalizer | The nemo_text_processing Normalizer used for normalization. |
Methods:
Name | Description |
---|---|
byte_encode(word: str) -> list | Encode a word as a list of bytes. |
normalize_chars(text: str) -> str | Normalize the characters in the input text. |
__call__(text: str) -> str | Normalize the input text with nemo_text_processing. |
Examples:
>>> from training.preprocess.normalize_text import NormalizeText
>>> normalize_text = NormalizeText()
>>> normalize_text("It’s a beautiful day…")
"It's a beautiful day."
Source code in training/preprocess/normalize_text.py
__call__(text)
Normalize the input text with nemo_text_processing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The input text to normalize. | required |
Returns:
Type | Description |
---|---|
str | The normalized text. |
Source code in training/preprocess/normalize_text.py
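A minimal sketch of the two-stage flow described above, assuming the character pass runs first and its result is passed to the model's normalize method. To stay runnable without NeMo installed, the model is any object with a NeMo-style `normalize(text, verbose=...)` method; `call_pipeline`, `SupportsNormalize`, and the `FakeNormalizer` stub are hypothetical names for illustration, not part of this repository.

```python
from typing import Callable, Protocol


class SupportsNormalize(Protocol):
    """Anything with a NeMo-style ``normalize`` method (assumed interface)."""

    def normalize(self, text: str, verbose: bool = False) -> str: ...


def call_pipeline(
    text: str,
    normalize_chars: Callable[[str], str],
    model: SupportsNormalize,
) -> str:
    """Hypothetical stand-in for NormalizeText.__call__."""
    # Stage 1: clean up typographic characters (curly quotes, ellipses, ...).
    cleaned = normalize_chars(text)
    # Stage 2: let the text-normalization model rewrite the cleaned text.
    return model.normalize(cleaned, verbose=False)


class FakeNormalizer:
    """Hypothetical stub standing in for a NeMo Normalizer in this sketch."""

    def normalize(self, text: str, verbose: bool = False) -> str:
        # Toy rule: spell out one digit so the model's effect is visible.
        return text.replace("2", "two")


result = call_pipeline("I have 2 cats\u2026", lambda s: s.replace("\u2026", "."), FakeNormalizer())
print(result)  # I have two cats.
```

Injecting the model keeps the ordering logic testable in isolation; a real NeMo Normalizer would expand far more than single digits.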
__init__(lang='en')
Initialize a new instance of the NormalizeText class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lang | str | The language code to use for normalization. Defaults to "en". | 'en' |
Source code in training/preprocess/normalize_text.py
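One way the constructor could turn `lang` into a model, sketched under assumptions: the import path `nemo_text_processing.text_normalization.normalize.Normalizer` and its `input_case`/`lang` keyword arguments come from NeMo's text-processing package, but whether this class passes `input_case="cased"` is a guess. `build_model` and `fake_factory` are hypothetical; the factory parameter only exists so the sketch runs without NeMo installed.

```python
from typing import Any, Callable, Optional


def build_model(lang: str = "en", factory: Optional[Callable[..., Any]] = None) -> Any:
    """Hypothetical helper showing how __init__ might create the model for ``lang``."""
    if factory is None:
        # Lazy import so the sketch can run (with a fake factory) without NeMo installed.
        from nemo_text_processing.text_normalization.normalize import Normalizer

        factory = Normalizer
    # input_case="cased" tells NeMo the input keeps its original capitalization.
    return factory(input_case="cased", lang=lang)


# Exercise the helper with a stand-in factory that records its arguments.
recorded = []


def fake_factory(**kwargs: Any) -> str:
    recorded.append(kwargs)
    return "fake-model"


model = build_model("de", factory=fake_factory)
print(model, recorded)  # fake-model [{'input_case': 'cased', 'lang': 'de'}]
```

Building the NeMo grammars can be slow, which is one reason to create the model once in the constructor and reuse it across calls.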
byte_encode(word)
Encode a word as a list of bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
word | str | The word to encode. | required |
Returns:
Type | Description |
---|---|
list | A list of bytes representing the encoded word. |
Source code in training/preprocess/normalize_text.py
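A possible implementation of byte encoding, assuming UTF-8 and integer byte values; the repository's version may use another encoding or return a different element type (for example, one-byte `bytes` objects). Note that a non-ASCII character expands to several bytes.

```python
def byte_encode(word: str) -> list:
    """Hypothetical sketch: UTF-8 encode ``word`` and return its bytes as ints."""
    return list(word.encode("utf-8"))


print(byte_encode("day"))  # [100, 97, 121]
print(byte_encode("é"))    # [195, 169]
```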
normalize_chars(text)
Normalize the characters in the input text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The input text to normalize. | required |
Returns:
Type | Description |
---|---|
str | The normalized text. |
Examples:
>>> normalize_text = NormalizeText()
>>> normalize_text.normalize_chars("It’s a beautiful day…")
"It's a beautiful day."
Source code in training/preprocess/normalize_text.py
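A sketch of character normalization with a translation table. The mapping below is hypothetical and chosen only to reproduce the documented example (curly apostrophe to `'`, ellipsis collapsed to a single `.`); the repository's actual character table may cover different characters.

```python
# Hypothetical mapping; the repository's actual character table may differ.
_CHAR_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2026": ".",  # horizontal ellipsis, collapsed to a single period
})


def normalize_chars(text: str) -> str:
    """Replace typographic characters with plain ASCII equivalents."""
    return text.translate(_CHAR_MAP)


print(normalize_chars("It\u2019s a beautiful day\u2026"))  # It's a beautiful day.
```

`str.translate` does the whole replacement in a single pass, which is simpler than chaining `str.replace` calls for each character.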