ws.parser_helpers.encodings module

ws.parser_helpers.encodings.encode(str_, escape_char='%', encode_chars='', skip_chars='', special_map=None, charset='utf-8', errors='strict')

Generalized implementation of a percent encoding algorithm.

Parameters
  • str_ – the string to be encoded

  • escape_char – character to be used as escape (by default ‘%’)

  • encode_chars – the characters to be encoded; empty string means that all characters will be encoded unless explicitly skipped

  • skip_chars – characters to be skipped (applied after encode_chars)

  • special_map – a mapping overriding default encoding (applied after both encode_chars and skip_chars)

  • charset – character set used to encode non-ASCII characters to a byte sequence with str.encode()

  • errors – defines behaviour when encoding non-ASCII characters to bytes fails (passed to str.encode())
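The way these parameters interact can be illustrated with a minimal, self-contained sketch. It is not the module's actual implementation: the name sketch_encode is hypothetical, and both the upper-case hex output and the choice to give special_map the highest priority are assumptions based on the descriptions above.

```python
def sketch_encode(text, escape_char="%", encode_chars="", skip_chars="",
                  special_map=None, charset="utf-8", errors="strict"):
    """Hypothetical re-creation of a generalized percent encoding."""
    special_map = special_map or {}
    out = []
    for ch in text:
        if ch in special_map:
            # Assumption: the special map overrides every other rule.
            out.append(special_map[ch])
        elif ch in skip_chars or (encode_chars and ch not in encode_chars):
            # Explicitly skipped, or not selected by a non-empty encode_chars.
            out.append(ch)
        else:
            # Encode each byte of the character as <escape_char> + two hex digits.
            out.extend(f"{escape_char}{byte:02X}"
                       for byte in ch.encode(charset, errors))
    return "".join(out)


print(sketch_encode("a b"))                    # every character is encoded
print(sketch_encode("a b", encode_chars=" "))  # only the space is encoded
print(sketch_encode("a b", encode_chars=" ", special_map={" ": "_"}))
```

With an empty encode_chars every character is escaped, so skip_chars is what keeps ordinary letters readable in that mode.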

ws.parser_helpers.encodings.decode(str_, escape_char='%', special_map=None, charset='utf-8', errors='strict')

An inverse function to encode().

Note

The reversibility of the encoding depends on the parameters passed to encode(). Specifically, if the escape_char itself is not encoded, the operation is irreversible, because a literal escape character in the input cannot be distinguished from one introduced by the encoding. MediaWiki's dot-encoding has exactly this problem, so dot-encoded strings cannot be decoded reliably.

Parameters
  • str_ – the string to be decoded

  • escape_char – character to be used as escape (by default ‘%’)

  • special_map – an analogue to the same parameter in encode() (the caller is responsible for inverting the mapping they passed to encode())

  • charset – character set used to decode the byte sequence with bytes.decode()

  • errors – defines behaviour when byte-decoding with bytes.decode() fails
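The reversibility caveat can be demonstrated with a small sketch. The function sketch_decode below is hypothetical and omits special_map; it only assumes that an escape sequence is the escape character followed by two hex digits.

```python
import re


def sketch_decode(text, escape_char="%", charset="utf-8", errors="strict"):
    """Hypothetical decoder: turns runs of <escape_char>XX back into text."""
    pattern = re.compile("(?:" + re.escape(escape_char) + "[0-9A-Fa-f]{2})+")

    def replace_run(match):
        # Drop the escape characters, then decode the remaining hex digits.
        hex_digits = match.group(0).replace(escape_char, "")
        return bytes.fromhex(hex_digits).decode(charset, errors)

    return pattern.sub(replace_run, text)


print(sketch_decode("a%20b"))                   # -> a b
# A string containing a literal "." followed by two hex digits is damaged
# when "." is the escape character, because nothing marks it as literal.
print(sketch_decode("v1.2E", escape_char="."))  # -> v1.
```

The second call shows why an unencoded escape character makes decoding ambiguous: "v1.2E" may be an untouched original or the encoding of "v1.".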

ws.parser_helpers.encodings.dotencode(str_)

Return an anchor-encoded string as shown in this encoding table. It uses the legacy format for $wgFragmentMode.

Note

The rules for handling special characters in section anchors are not well defined, even upstream (see T20431). This function produces the actual anchor for the section, i.e. the ID of the heading’s span element (e.g. <span id="anchor" ...>).
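As a rough illustration of the legacy format, the following sketch approximates it with the standard library. It assumes that legacy anchors are built by replacing spaces with underscores, percent-encoding the result while leaving ":" intact, and then replacing "%" with "."; the name sketch_dotencode is hypothetical and edge cases (for example "~" or whitespace normalization) may be handled differently by dotencode().

```python
from urllib.parse import quote


def sketch_dotencode(text):
    """Hypothetical approximation of MediaWiki's legacy anchor encoding."""
    # Spaces become underscores; ":" is assumed to stay unencoded.
    encoded = quote(text.replace(" ", "_"), safe=":")
    # The percent sign is swapped for a dot, which is itself never encoded.
    return encoded.replace("%", ".")


print(sketch_dotencode("Foo bar"))  # -> Foo_bar
print(sketch_dotencode("a/b"))      # -> a.2Fb
print(sketch_dotencode("v1.2E"))    # -> v1.2E (unchanged)
```

Because the dot is never escaped, different inputs can share an encoding, which is what makes the format irreversible.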

ws.parser_helpers.encodings.anchorencode(str_, format='html5')

Function corresponding to the {{anchorencode:}} magic word.

Parameters
  • str_ – the string to be encoded

  • format – either "html5" or "legacy" (see $wgFragmentMode)
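The difference between the two formats can be sketched as follows. This is an assumption-laden illustration, not the function's actual code: it supposes that the "html5" mode only replaces spaces with underscores, that the "legacy" mode additionally dot-encodes the result, and it ignores any markup stripping or whitespace squashing the real magic word may perform. The name sketch_anchorencode is hypothetical.

```python
from urllib.parse import quote


def sketch_anchorencode(text, format="html5"):
    """Hypothetical dispatch on the two $wgFragmentMode formats."""
    text = text.replace(" ", "_")
    if format == "html5":
        # Assumed: non-ASCII and punctuation are kept verbatim.
        return text
    if format == "legacy":
        # Assumed: percent-encode, keep ":", then use "." as the escape.
        return quote(text, safe=":").replace("%", ".")
    raise ValueError(f"unknown format: {format!r}")


print(sketch_anchorencode("Größe & Gewicht"))
print(sketch_anchorencode("Größe & Gewicht", format="legacy"))
```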

ws.parser_helpers.encodings.urlencode(str_)

Standard URL encoding as described on Wikipedia, which should correspond to the PATH style in MediaWiki’s comparison table.

ws.parser_helpers.encodings.urldecode(str_)

An inverse function to urlencode().
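For comparison, the standard library's urllib.parse.quote() and unquote() implement a similar path-style encoding and can serve as a stand-in when experimenting. This is only an analogy: the exact set of characters that urlencode() leaves unescaped is not stated here and may differ from the standard library's defaults.

```python
from urllib.parse import quote, unquote

title = "Main page/Español?"

# Path style: spaces become %20 and "/" is left intact by default.
encoded = quote(title)
print(encoded)  # -> Main%20page/Espa%C3%B1ol%3F

# Since "%" itself is always escaped, the round trip is lossless.
print(unquote(encoded) == title)  # -> True
```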

ws.parser_helpers.encodings.queryencode(str_)

The QUERY style encoding as described on MediaWiki. This is the default URL encoding in MediaWiki since 1.17.

ws.parser_helpers.encodings.querydecode(str_)

An inverse function to queryencode().
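The query style differs from the path style mainly in how spaces are written. The standard library's urllib.parse.quote_plus() and unquote_plus() follow the same convention and are used below as a stand-in; whether queryencode() escapes exactly the same character set is an assumption.

```python
from urllib.parse import quote_plus, unquote_plus

value = "Main page&action=edit"

# Query style: spaces become "+", while "&" and "=" are escaped.
encoded = quote_plus(value)
print(encoded)  # -> Main+page%26action%3Dedit

# A literal "+" is escaped as %2B, so decoding stays unambiguous.
print(quote_plus("a+b"))                # -> a%2Bb
print(unquote_plus(encoded) == value)   # -> True
```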