caoscrawler.converters.transformer_converters module

Converters for transforming text elements. They provide functionality similar to the text transformers.

class caoscrawler.converters.transformer_converters.SplitTextConverter(definition, *args, **kwargs)

Bases: _BaseTransformTextConverter

Splits the given TextElement into a list of TextElements, using the separator given in the definition. Valid keys for the separator are “sep”, “separator”, “marker”, and “split_on”. Example usage:

    text_to_split:
      type: SplitTextConverter
      sep: ";"
      match_name: "ALIASES"
      match_value: (?P<text_to_split_value>.*)
      subtree:
        list_entry:
          type: TextElement
          …
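The separator lookup and split described above can be sketched in plain Python. This is only an illustration under stated assumptions: `SEPARATOR_KEYS` and `split_text` are hypothetical names, not part of caoscrawler, and how the real converter handles several separator keys at once, or whitespace around the parts, is not stated here.

```python
# Hypothetical helper, NOT the caoscrawler implementation: it only mirrors
# the documented idea of "find the separator key, then split the text".
SEPARATOR_KEYS = ("sep", "separator", "marker", "split_on")


def split_text(definition: dict, text: str) -> list[str]:
    """Split `text` on the separator stored under one of the accepted keys."""
    for key in SEPARATOR_KEYS:
        if key in definition:
            # Assumption: the first matching key wins.
            return text.split(definition[key])
    raise KeyError(f"definition needs one of {SEPARATOR_KEYS}")


definition = {"type": "SplitTextConverter", "sep": ";"}
parts = split_text(definition, "alpha;beta;gamma")
print(parts)  # ['alpha', 'beta', 'gamma']
```

In the converter itself, each resulting part would become a child TextElement that the `subtree` entries (here `list_entry`) can match.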

create_children(generalStore: GeneralStore, element: StructureElement)

class caoscrawler.converters.transformer_converters.TransformTextConverter(definition, *args, **kwargs)

Bases: _BaseTransformTextConverter

Applies the specified text transformer to the given TextElement. The transformer name should be given in options as “transformer”. If the transformer needs parameters, these may be supplied under one of “params”, “parameters”, or “arguments”. Example usage:

    text_to_transform:
      type: TransformTextConverter
      transformer: "replace"
      parameters:
        old: ";"
        new: ","
      match_name: ".*"
      match_value: (?P<original_text>.*)
      subtree:
        transformed_text:
          type: TextElement
          …
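As a rough illustration of the lookup described above, the following sketch resolves a transformer by name, reads its parameters from whichever of the accepted keys is present, and applies it. Everything here is a stand-in: `TRANSFORMERS`, `PARAM_KEYS`, `replace`, and `transform_text` are hypothetical, and the `old`/`new` parameter names are taken from the example rather than from the library's own transformer functions.

```python
# Hypothetical stand-ins, NOT the caoscrawler implementation.
PARAM_KEYS = ("params", "parameters", "arguments")


def replace(text: str, params: dict) -> str:
    """Toy transformer: replace params['old'] with params['new'] in text."""
    return text.replace(params["old"], params["new"])


# Assumed name-to-function registry; the real crawler resolves
# transformers through its own mechanism.
TRANSFORMERS = {"replace": replace}


def transform_text(definition: dict, text: str) -> str:
    """Apply the transformer named in the definition to `text`."""
    transformer = TRANSFORMERS[definition["transformer"]]
    # Use the first parameter key that is present; default to no parameters.
    params = next((definition[k] for k in PARAM_KEYS if k in definition), {})
    return transformer(text, params)


definition = {
    "type": "TransformTextConverter",
    "transformer": "replace",
    "parameters": {"old": ";", "new": ","},
}
print(transform_text(definition, "a;b;c"))  # a,b,c
```

The transformed text would then be exposed as a child TextElement, which the `subtree` entry (here `transformed_text`) can match.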

create_children(generalStore: GeneralStore, element: StructureElement)