ws.parser_helpers.wikicode module

ws.parser_helpers.wikicode.strip_markup(text, normalize=True, collapse=True)

Parses the given text and returns the text after stripping all MediaWiki markup, leaving only the plain text.

Parameters
  • normalize – passed to mwparserfromhell.wikicode.Wikicode.strip_code()

  • collapse – passed to mwparserfromhell.wikicode.Wikicode.strip_code()

Returns

str

ws.parser_helpers.wikicode.get_adjacent_node(wikicode, node, ignore_whitespace=False)

Get the node immediately following node in wikicode.

Parameters
  • wikicode – a mwparserfromhell.wikicode.Wikicode object

  • node – a mwparserfromhell.nodes.Node object

  • ignore_whitespace – When True, whitespace between node and the returned node is skipped, i.e. the returned object is guaranteed not to be an all-whitespace text node, but it can still be a text node with leading whitespace.

Returns

a mwparserfromhell.nodes.Node object or None if node is the last object in wikicode
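The effect of ignore_whitespace can be illustrated with a minimal sketch. This is not the module's implementation: plain strings in a list stand in for mwparserfromhell nodes, and sketch_adjacent is a hypothetical name used only here.

```python
def sketch_adjacent(nodes, node, ignore_whitespace=False):
    """Return the item following `node` in `nodes`, or None if it is last.

    Assumption: `nodes` is a plain list and `node` is located by identity,
    which only mimics how a node would be looked up in a Wikicode object.
    """
    index = next((i for i, n in enumerate(nodes) if n is node), None)
    if index is None:
        raise ValueError("node is not in nodes")
    for candidate in nodes[index + 1:]:
        # Skip items that consist purely of whitespace when requested.
        if ignore_whitespace and isinstance(candidate, str) and not candidate.strip():
            continue
        return candidate
    return None


nodes = ["{{A}}", "  \n", " text", "{{B}}"]
plain = sketch_adjacent(nodes, nodes[0])
skipped = sketch_adjacent(nodes, nodes[0], ignore_whitespace=True)
last = sketch_adjacent(nodes, nodes[3])
```

Here skipped is " text": the all-whitespace item is passed over, but the returned text keeps its leading space, matching the caveat in the parameter description.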

ws.parser_helpers.wikicode.get_parent_wikicode(wikicode, node)

Returns the parent of node as a wikicode object. Raises ValueError if node is not a descendant of wikicode.

ws.parser_helpers.wikicode.remove_and_squash(wikicode, obj)

Remove obj from wikicode and fix whitespace in the place it was removed from.

ws.parser_helpers.wikicode.get_section_headings(text)

Extracts section headings from the given text. A custom regular expression is used instead of mwparserfromhell for performance reasons.

Note

Known issues:

Parameters

text (str) – content of the wiki page

Returns

list of section headings (without the = marks)
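As a rough illustration of the regex-based approach, the following sketch matches lines wrapped in an equal number of = marks. The pattern and the name sketch_section_headings are assumptions made for this example; the expression actually used by the module may differ.

```python
import re

# Hypothetical pattern: 1 to 6 '=' marks, the title, then the same number of
# closing '=' marks, with optional trailing spaces or tabs on the line.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def sketch_section_headings(text):
    """Return heading titles found in `text`, without the '=' marks."""
    return [match.group(2) for match in HEADING_RE.finditer(text)]


text = "Intro\n== First ==\nbody\n=== Sub-section ===\nmore\n==Second==\n"
headings = sketch_section_headings(text)
```

A single compiled pattern scans the text once, which is the kind of shortcut that avoids building a full parse tree.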

ws.parser_helpers.wikicode.get_anchors(headings, pretty=False, suffix_sep='_')

Converts section headings to anchors.

Note

Known issues:

  • templates are not handled (call ws.parser_helpers.template_expansion.expand_templates() on the wikitext before extracting section headings)

  • all tags are always stripped, even invalid tags (mwparserfromhell is not that configurable)

  • if pretty is True, tags escaped with <nowiki> in the input are not encoded in the output

Parameters
  • headings (list) – section headings (obtained e.g. with get_section_headings())

  • pretty (bool) – if True, the anchors will be as pretty as possible (e.g. for use in wikilinks), otherwise they are fully dot-encoded

  • suffix_sep (str) – the separator between the base anchor and numeric suffix for duplicate section names

Returns

list of section anchors
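The roles of pretty and suffix_sep can be sketched as follows. This is a simplified stand-in, not the module's code: it assumes that dot-encoding means percent-encoding with % replaced by a dot, and that duplicates are numbered from 2 as MediaWiki does. It ignores markup stripping and the known issues listed above; sketch_anchors is a hypothetical name.

```python
from urllib.parse import quote


def sketch_anchors(headings, pretty=False, suffix_sep="_"):
    """Convert headings to anchors, numbering repeated anchors from 2."""
    seen = {}
    anchors = []
    for heading in headings:
        base = heading.strip()
        if not pretty:
            # Assumed dot-encoding: spaces become underscores, other unsafe
            # characters are percent-encoded, then '%' is replaced by '.'.
            base = quote(base.replace(" ", "_"), safe="_-.:").replace("%", ".")
        count = seen.get(base, 0) + 1
        seen[base] = count
        anchors.append(base if count == 1 else f"{base}{suffix_sep}{count}")
    return anchors


encoded = sketch_anchors(["Setup", "C++ tips", "Setup"])
pretty = sketch_anchors(["Setup", "C++ tips", "Setup"], pretty=True)
```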

ws.parser_helpers.wikicode.ensure_flagged_by_template(wikicode, node, template_name, *template_parameters, overwrite_parameters=True)

Makes sure that node in wikicode is immediately (except for whitespace) followed by a template with template_name and optional template_parameters.

Parameters
Returns

the template flag, as a mwparserfromhell.nodes.template.Template object

ws.parser_helpers.wikicode.ensure_unflagged_by_template(wikicode, node, template_name, *, match_only_prefix=False)

Makes sure that node in wikicode is not immediately (except for whitespace) followed by a template with template_name.

Parameters

ws.parser_helpers.wikicode.is_redirect(text, *, full_match=False)

Checks if the text represents a MediaWiki redirect page.

Parameters

full_match (bool) – If True, the function returns True only for pages that contain nothing but the redirect line.
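The difference made by full_match can be sketched with a simple pattern. This is an assumption-laden illustration: it recognizes only the English #REDIRECT keyword, whereas a real wiki may define localized aliases, and sketch_is_redirect is a hypothetical name.

```python
import re

# Hypothetical pattern: optional leading whitespace, the keyword, an optional
# colon, and a single wikilink target on the same line.
REDIRECT_RE = re.compile(r"\s*#redirect\s*:?\s*\[\[[^\[\]\n]+\]\]", re.IGNORECASE)


def sketch_is_redirect(text, *, full_match=False):
    """Check whether `text` starts with (or consists only of) a redirect."""
    if full_match:
        # The whole page, ignoring surrounding whitespace, must be the redirect.
        return REDIRECT_RE.fullmatch(text.strip()) is not None
    return REDIRECT_RE.match(text) is not None


page = "#redirect [[Main page]]\n[[Category:Foo]]"
loose = sketch_is_redirect(page)
strict = sketch_is_redirect(page, full_match=True)
```

With the sample page, loose is True because the text begins with a redirect line, while strict is False because a category link follows it.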

ws.parser_helpers.wikicode.parented_ifilter(wikicode, recursive=True, matches=None, flags=re.IGNORECASE | re.UNICODE | re.DOTALL, forcetype=None)

Iterate over nodes and their corresponding parents.

The arguments are interpreted as for ifilter(). For each tuple (parent, node) yielded by this function, parent is the direct parent wikicode of node.

The function is intended as a performance optimization: it avoids expensive searches for the parent, e.g. in the replace method. See the mwparserfromhell issue for details: https://github.com/earwig/mwparserfromhell/issues/195