caoscrawler.transformers.string_transformers module
Transformer functions for string manipulation.
See https://docs.linkahead.org for more information.
- caoscrawler.transformers.string_transformers.lower(in_value: str, params: dict)
Returns the given text in lowercase.
- Parameters:
in_value (str) – Text to be modified.
params (dict) – No parameters are expected.
- Returns:
result – The input value in lowercase.
- Return type:
str
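A minimal sketch of the documented behavior, assuming the transformer simply delegates to Python's built-in `str.lower`; it is not taken from the caoscrawler source. Like every function in this module it follows the `(in_value, params)` calling convention, and here `params` is accepted but ignored.

```python
def lower(in_value: str, params: dict) -> str:
    """Sketch: return in_value in lowercase (assumes str.lower semantics)."""
    # No parameters are expected, so params is accepted but never read.
    return in_value.lower()


print(lower("Temperature SENSOR", {}))  # temperature sensor
```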
- caoscrawler.transformers.string_transformers.regex_replace(in_value: str, params: dict)
Replace the leftmost non-overlapping matches of params.pattern in the given string; params.count, if given, limits how many matches are replaced. Returns re.sub(params.pattern, params.replace_with, in_value, params.count).
- Parameters:
in_value (str) – Text to be modified.
params (dict) –
- “pattern”:
The regex pattern to replace. Alternative keys: “match”, “old”, “replace”, “remove”
- “repl”: optional, default is “”.
The new string to insert. Alternative keys: “new”, “insert”, “replace_with”
- “count”: optional
Limit the number of occurrences to replace. Alternative keys: “max_replace”, “maxreplace”, “max”
- Returns:
result – The input value with the leftmost pattern matches replaced by params.replace_with.
- Return type:
str
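A minimal sketch of the documented behavior, built on Python's `re.sub`; it is not the caoscrawler implementation. The helper `_first_param` is hypothetical: it resolves the alternative keys by taking the first one present in `params`, which is an assumption about precedence when several aliases are supplied. Omitting `count` is assumed to replace every match, which is what `re.sub` does with `count=0`.

```python
import re


def _first_param(params: dict, keys: list, default=None):
    """Hypothetical helper: return the value of the first key in `keys` found in params."""
    for key in keys:
        if key in params:
            return params[key]
    return default


def regex_replace(in_value: str, params: dict) -> str:
    """Sketch of regex_replace: re.sub with aliased parameter names."""
    pattern = _first_param(params, ["pattern", "match", "old", "replace", "remove"])
    if pattern is None:
        raise ValueError("regex_replace needs a 'pattern' parameter")
    # Replacement string; defaults to "" so that matches are simply deleted.
    repl = _first_param(params, ["repl", "new", "insert", "replace_with"], "")
    # Assumption: a missing count means "replace all", i.e. count=0 for re.sub.
    count = _first_param(params, ["count", "max_replace", "maxreplace", "max"], 0)
    return re.sub(pattern, repl, in_value, count=count)


print(regex_replace("2024-01-15", {"pattern": "-", "repl": "/"}))  # 2024/01/15
print(regex_replace("a1b22c333", {"remove": r"\d+", "max": 2}))    # abc333
```

If the replacement string is forwarded to `re.sub` unchanged, as the description above states, it can also use group references such as `\2, \1` to reorder captured text.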
- caoscrawler.transformers.string_transformers.removeprefix(in_value: str, params: dict)
Remove a prefix from the given text. If the text does not start with the specified prefix, it is returned unchanged.
- Parameters:
in_value (str) – Text to be modified.
params (dict) –
- “prefix”:
The prefix to remove. Alternative keys: “remove”
- Returns:
result – The input value with the prefix removed.
- Return type:
str
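A minimal sketch, assuming the transformer maps onto Python's `str.removeprefix` (Python 3.9+); it is not the caoscrawler source. How the `remove` alias is resolved and the error raised when no prefix is given are assumptions of this sketch.

```python
def removeprefix(in_value: str, params: dict) -> str:
    """Sketch: remove params["prefix"] (alias "remove") from the start of in_value."""
    prefix = params["prefix"] if "prefix" in params else params.get("remove")
    if prefix is None:
        raise ValueError("removeprefix needs a 'prefix' parameter")
    # str.removeprefix returns the text unchanged if the prefix is absent.
    return in_value.removeprefix(prefix)


print(removeprefix("run_042", {"prefix": "run_"}))   # 042
print(removeprefix("test_042", {"remove": "run_"}))  # test_042
```

Unlike `str.lstrip`, which treats its argument as a set of characters, `str.removeprefix` removes the exact string at most once, so `"run_run_1"` becomes `"run_1"`.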
- caoscrawler.transformers.string_transformers.removesuffix(in_value: str, params: dict)
Remove a suffix from the given text. If the text does not end with the specified suffix, it is returned unchanged.
- Parameters:
in_value (str) – Text to be modified.
params (dict) –
- “suffix”:
The suffix to remove. Alternative keys: “remove”
- Returns:
result – The input value with the suffix removed.
- Return type:
str
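A minimal sketch, assuming the transformer maps onto Python's `str.removesuffix` (Python 3.9+); it is not the caoscrawler source. As with the prefix sketch, alias handling and the error on a missing suffix are assumptions.

```python
def removesuffix(in_value: str, params: dict) -> str:
    """Sketch: remove params["suffix"] (alias "remove") from the end of in_value."""
    suffix = params["suffix"] if "suffix" in params else params.get("remove")
    if suffix is None:
        raise ValueError("removesuffix needs a 'suffix' parameter")
    # str.removesuffix returns the text unchanged if the suffix is absent.
    return in_value.removesuffix(suffix)


print(removesuffix("measurement.csv", {"suffix": ".csv"}))   # measurement
print(removesuffix("measurement.json", {"remove": ".csv"}))  # measurement.json
```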
- caoscrawler.transformers.string_transformers.replace(in_value: Any, params: dict)
Replace all occurrences of a substring in the given text, or at most “count” of them if that parameter is given.
- Parameters:
in_value (str) – Text to be modified.
params (dict) –
- “old”:
The substring to replace. Alternative keys: “replace”, “remove”
- “new”: optional, default is “”.
The new string to insert. Alternative keys: “insert”, “replace_with”
- “count”: optional
Limit the number of occurrences to replace. Alternative keys: “max_replace”, “maxreplace”, “max”
- Returns:
result – The input value with occurrences of old replaced by new (all of them unless count is given).
- Return type:
str
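A minimal sketch of the documented behavior using Python's `str.replace`; it is not the caoscrawler implementation. The nested helper `first` is hypothetical and resolves aliases by taking the first key present in `params`. A missing `count` is assumed to mean "replace all", which `str.replace` expresses as `-1`.

```python
def replace(in_value: str, params: dict) -> str:
    """Sketch of replace: str.replace with aliased parameter names."""

    def first(keys, default=None):
        # Hypothetical alias resolution: the first key present in params wins.
        return next((params[k] for k in keys if k in params), default)

    old = first(["old", "replace", "remove"])
    if old is None:
        raise ValueError("replace needs an 'old' parameter")
    new = first(["new", "insert", "replace_with"], "")
    # Assumption: a missing count means "replace all" (-1 for str.replace).
    count = first(["count", "max_replace", "maxreplace", "max"], -1)
    return in_value.replace(old, new, count)


print(replace("a_b_c", {"old": "_", "new": "-"}))              # a-b-c
print(replace("a_b_c", {"old": "_", "new": "-", "count": 1}))  # a-b_c
print(replace("a_b_c", {"remove": "_"}))                       # abc
```

In contrast to regex_replace, `old` is matched literally here, so characters such as `.` or `*` need no escaping.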
- caoscrawler.transformers.string_transformers.split(in_value: Any, params: dict)
Split the given text at each occurrence of the separator, making at most “maxsplit” splits if that parameter is given.
- Parameters:
in_value (str) – Text to be split.
params (dict) –
- “sep”: optional, default is any run of whitespace.
The separator to split on. Alternative keys: “separator”, “marker”, “split_on”
- “maxsplit”: optional
Limit the number of splits to make. Alternative keys: “count”, “max_split”, “max”
- Returns:
result – The list of substrings obtained by splitting.
- Return type:
list[str]
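A minimal sketch of the documented behavior using Python's `str.split`; it is not the caoscrawler implementation. The nested helper `first` is hypothetical and resolves aliases by taking the first key present in `params`. Passing `sep=None` gives the "any run of whitespace" default, and a missing `maxsplit` is assumed to mean "no limit" (`-1`).

```python
def split(in_value: str, params: dict) -> list:
    """Sketch of split: str.split with aliased parameter names."""

    def first(keys, default=None):
        # Hypothetical alias resolution: the first key present in params wins.
        return next((params[k] for k in keys if k in params), default)

    # sep=None makes str.split split on runs of whitespace.
    sep = first(["sep", "separator", "marker", "split_on"])
    # Assumption: a missing maxsplit means "no limit" (-1 for str.split).
    maxsplit = first(["maxsplit", "count", "max_split", "max"], -1)
    return in_value.split(sep, maxsplit)


print(split("a, b, c", {"sep": ", "}))                  # ['a', 'b', 'c']
print(split("  x  y z ", {}))                           # ['x', 'y', 'z']
print(split("key=value=more", {"sep": "=", "max": 1}))  # ['key', 'value=more']
```

Under `str.split` semantics, an explicit separator keeps empty fields (`"a,,b"` with `sep=","` yields `['a', '', 'b']`), whereas the whitespace default drops them.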
- caoscrawler.transformers.string_transformers.strip(in_value: str, params: dict)
Remove leading and trailing whitespace from the given text.
- Parameters:
in_value (str) – Text to be modified.
params (dict) – No parameters are expected.
- Returns:
result – The input text without leading and trailing whitespace.
- Return type:
str
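A minimal sketch, assuming the transformer delegates to Python's `str.strip` with no arguments; it is not the caoscrawler source. Only the ends of the text are affected, so whitespace between words is preserved.

```python
def strip(in_value: str, params: dict) -> str:
    """Sketch: remove leading and trailing whitespace (assumes str.strip semantics)."""
    # No parameters are expected; params is accepted but never read.
    return in_value.strip()


print(repr(strip("\t 21.5 °C \n", {})))  # '21.5 °C'
```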
- caoscrawler.transformers.string_transformers.upper(in_value: str, params: dict)
Returns the given text in uppercase.
- Parameters:
in_value (str) – Text to be modified.
params (dict) – No parameters are expected.
- Returns:
result – The input value in uppercase.
- Return type:
str
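A minimal sketch, assuming the transformer delegates to Python's `str.upper`; it is not the caoscrawler source. As with lower and strip, `params` is accepted only for the common calling convention.

```python
def upper(in_value: str, params: dict) -> str:
    """Sketch: return in_value in uppercase (assumes str.upper semantics)."""
    # No parameters are expected; params is accepted but never read.
    return in_value.upper()


print(upper("sample_id", {}))  # SAMPLE_ID
print(upper("straße", {}))     # STRASSE
```

Under `str.upper` semantics the result can be longer than the input, as the `ß` to `SS` case shows.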