caoscrawler.transformers.string_transformers module

Transformer functions for string manipulation.

See https://docs.linkahead.org for more information.

caoscrawler.transformers.string_transformers.lower(in_value: str, params: dict)

Returns the given text in lowercase.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) – No parameters are expected.

Returns:

result – The input value in lowercase.

Return type:

str

caoscrawler.transformers.string_transformers.regex_replace(in_value: str, params: dict)

Replace the leftmost non-overlapping matches of the regex pattern in the given string. Returns re.sub(pattern, repl, in_value, count), where pattern, repl and count are taken from params.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) –

    “pattern”:

    The regex pattern to replace. Alternative keys: “match”, “old”, “replace”, “remove”

    “repl”: optional, default is “”.

    The new string to insert. Alternative keys: “new”, “insert”, “replace_with”

    “count”: optional

    Limit the number of occurrences to replace. Alternative keys: “max_replace”, “maxreplace”, “max”

Returns:

result – The input value with the leftmost pattern matches replaced by the “repl” value.

Return type:

str
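The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the caoscrawler implementation: the function name regex_replace_sketch is hypothetical, only the canonical keys “pattern”, “repl” and “count” are handled, and a missing “count” is assumed to mean "replace every match".

```python
import re


def regex_replace_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in that mirrors the documented re.sub call."""
    pattern = params["pattern"]
    repl = params.get("repl", "")    # documented default: empty string
    count = params.get("count", 0)   # assumption: 0 (re.sub's default) replaces all matches
    return re.sub(pattern, repl, in_value, count=count)


# Only the first (leftmost) run of digits is replaced because count is 1.
print(regex_replace_sketch("run_001_sample_002",
                           {"pattern": r"\d+", "repl": "N", "count": 1}))
# → run_N_sample_002

# Without "repl", every match is removed.
print(regex_replace_sketch("run_001_sample_002", {"pattern": r"_\d+"}))
# → run_sample
```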

caoscrawler.transformers.string_transformers.removeprefix(in_value: str, params: dict)

Remove a prefix from the given text. If the text does not start with the specified prefix, it is returned unchanged.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) –

    “prefix”:

    The prefix to remove. Alternative keys: “remove”

Returns:

result – The input value with the prefix removed.

Return type:

str
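To illustrate the "remove only if present" behaviour, here is a minimal sketch written with plain string operations. The name removeprefix_sketch is hypothetical and only the canonical “prefix” key is read; the actual transformer may be implemented differently.

```python
def removeprefix_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in: drop the prefix if, and only if, the text starts with it."""
    prefix = params["prefix"]
    if prefix and in_value.startswith(prefix):
        return in_value[len(prefix):]
    return in_value


print(removeprefix_sketch("raw_measurement.csv", {"prefix": "raw_"}))
# → measurement.csv

# The text does not start with the prefix, so it comes back unchanged.
print(removeprefix_sketch("measurement.csv", {"prefix": "raw_"}))
# → measurement.csv
```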

caoscrawler.transformers.string_transformers.removesuffix(in_value: str, params: dict)

Remove a suffix from the given text. If the text does not end with the specified suffix, it is returned unchanged.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) –

    “suffix”:

    The suffix to remove. Alternative keys: “remove”

Returns:

result – The input value with the suffix removed.

Return type:

str
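The suffix case can be sketched the same way. The name removesuffix_sketch is hypothetical and only the canonical “suffix” key is read. The explicit check for an empty suffix matters here, because slicing with -0 would otherwise return an empty string.

```python
def removesuffix_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in: drop the suffix if, and only if, the text ends with it."""
    suffix = params["suffix"]
    if suffix and in_value.endswith(suffix):
        return in_value[:-len(suffix)]
    return in_value


print(removesuffix_sketch("raw_measurement.csv", {"suffix": ".csv"}))
# → raw_measurement

# The text does not end with the suffix, so it comes back unchanged.
print(removesuffix_sketch("raw_measurement.txt", {"suffix": ".csv"}))
# → raw_measurement.txt
```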

caoscrawler.transformers.string_transformers.replace(in_value: Any, params: dict)

Replace all occurrences of a substring in the given text.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) –

    “old”:

    The substring to replace. Alternative keys: “replace”, “remove”

    “new”: optional, default is “”.

    The new string to insert. Alternative keys: “insert”, “replace_with”

    “count”: optional

    Limit the number of occurrences to replace. Alternative keys: “max_replace”, “maxreplace”, “max”

Returns:

result – The input value with occurrences of old replaced by new (all of them, unless limited by count).

Return type:

str
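The alternative keys listed above suggest a lookup that accepts any one of several names for the same parameter. The sketch below shows one way this could work. Both first_present and replace_sketch are hypothetical names, and the precedence rule (first listed key wins) is an assumption; the library may resolve conflicting keys differently or reject them.

```python
from typing import Any

_MISSING = object()  # sentinel so that None or "" can still be valid defaults


def first_present(params: dict, keys: tuple, default: Any = _MISSING) -> Any:
    """Hypothetical helper: return the value of the first key found in params."""
    for key in keys:
        if key in params:
            return params[key]
    if default is _MISSING:
        raise KeyError(f"none of {keys} found in params")
    return default


def replace_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in built on str.replace."""
    old = first_present(params, ("old", "replace", "remove"))
    new = first_present(params, ("new", "insert", "replace_with"), default="")
    # Assumption: a missing count means "replace all"; str.replace uses -1 for that.
    count = first_present(params, ("count", "max_replace", "maxreplace", "max"), default=-1)
    return in_value.replace(old, new, count)


print(replace_sketch("a-b-c-d", {"old": "-", "new": "_"}))
# → a_b_c_d
print(replace_sketch("a-b-c-d", {"remove": "-"}))
# → abcd
print(replace_sketch("a-b-c-d", {"old": "-", "new": "_", "max": 2}))
# → a_b_c-d
```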

caoscrawler.transformers.string_transformers.split(in_value: Any, params: dict)

Split the given text on all occurrences of the separator.

Parameters:
  • in_value (str) – Text to be split.

  • params (dict) –

    “sep”: optional, default any whitespace

    The substring to split on. Alternative keys: “separator”, “marker”, “split_on”

    “maxsplit”: optional

    Limit the number of splits to make. Alternative keys: “count”, “max_split”, “max”

Returns:

result – The list of substrings obtained by splitting.

Return type:

list[str]
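The parameters above map naturally onto Python's built-in str.split. The following is a minimal sketch under that assumption. The name split_sketch is hypothetical, only the canonical keys “sep” and “maxsplit” are read, and a missing “maxsplit” is assumed to mean "no limit".

```python
def split_sketch(in_value: str, params: dict) -> list:
    """Hypothetical stand-in built on str.split."""
    sep = params.get("sep")                # None makes str.split use runs of whitespace
    maxsplit = params.get("maxsplit", -1)  # assumption: -1 (no limit) when not given
    return in_value.split(sep, maxsplit)


print(split_sketch("2024-01-15", {"sep": "-"}))
# → ['2024', '01', '15']

# No separator given: split on any whitespace, collapsing repeated blanks.
print(split_sketch("a  b\tc", {}))
# → ['a', 'b', 'c']

# At most one split, so the remainder stays intact.
print(split_sketch("key=value=more", {"sep": "=", "maxsplit": 1}))
# → ['key', 'value=more']
```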

caoscrawler.transformers.string_transformers.strip(in_value: str, params: dict)

Remove leading and trailing whitespace from the given text.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) – No parameters are expected.

Returns:

result – The input text without leading and trailing whitespace.

Return type:

str

caoscrawler.transformers.string_transformers.upper(in_value: str, params: dict)

Returns the given text in uppercase.

Parameters:
  • in_value (str) – Text to be modified.

  • params (dict) – No parameters are expected.

Returns:

result – The input value in uppercase.

Return type:

str
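The three parameterless transformers (lower, strip, upper) share the same (in_value, params) signature and ignore params. The sketch below assumes they behave like the built-in str methods of the same name. The *_sketch functions are hypothetical stand-ins, not the library's code.

```python
def strip_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in: params is accepted but not used."""
    return in_value.strip()


def lower_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in: params is accepted but not used."""
    return in_value.lower()


def upper_sketch(in_value: str, params: dict) -> str:
    """Hypothetical stand-in: params is accepted but not used."""
    return in_value.upper()


raw = "  Sample_A \n"
cleaned = strip_sketch(raw, {})
print(repr(cleaned))
# → 'Sample_A'
print(lower_sketch(cleaned, {}))
# → sample_a
print(upper_sketch(cleaned, {}))
# → SAMPLE_A
```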