ws.parser_helpers.wikicode module

ws.parser_helpers.wikicode.strip_markup(text, normalize=True, collapse=True)

Parses the given text and returns it with all MediaWiki markup stripped, leaving only the plain text.

Parameters:
  • normalize – passed to mwparserfromhell.wikicode.Wikicode.strip_code()
  • collapse – passed to mwparserfromhell.wikicode.Wikicode.strip_code()


ws.parser_helpers.wikicode.get_adjacent_node(wikicode, node, ignore_whitespace=False)

Get the node immediately following node in wikicode.

Parameters:
  • wikicode – a mwparserfromhell.wikicode.Wikicode object
  • node – a mwparserfromhell.nodes.Node object
  • ignore_whitespace – when True, whitespace between node and the returned node is skipped, i.e. the returned object is guaranteed not to be a whitespace-only text node, although it may still be a text node with leading whitespace

Returns: a mwparserfromhell.nodes.Node object, or None if node is the last object in wikicode
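
The lookup described above can be illustrated with a minimal sketch. It is not the module's implementation: plain strings stand in for mwparserfromhell nodes, the name adjacent_item is hypothetical, and the element is located by equality, which the real function may not do.

```python
from typing import Optional, Sequence


def adjacent_item(items: Sequence[str], item: str,
                  ignore_whitespace: bool = False) -> Optional[str]:
    """Return the element following ``item`` in ``items``, or None.

    Hypothetical stand-in: plain strings play the role of nodes.
    """
    i = items.index(item) + 1  # raises ValueError if item is absent
    if ignore_whitespace:
        # Skip elements that consist only of whitespace.
        while i < len(items) and items[i].isspace():
            i += 1
    return items[i] if i < len(items) else None


nodes = ["{{Foo}}", "  \n", " bar", "[[Baz]]"]
plain_next = adjacent_item(nodes, "{{Foo}}")
skipped_next = adjacent_item(nodes, "{{Foo}}", ignore_whitespace=True)
after_last = adjacent_item(nodes, "[[Baz]]")
```

Here plain_next is the whitespace-only element, skipped_next is " bar" (leading space kept, as the parameter description allows), and after_last is None.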

ws.parser_helpers.wikicode.get_parent_wikicode(wikicode, node)

Returns the parent of node as a wikicode object. Raises ValueError if node is not a descendant of wikicode.

ws.parser_helpers.wikicode.remove_and_squash(wikicode, obj)

Remove obj from wikicode and fix whitespace in the place it was removed from.


ws.parser_helpers.wikicode.get_section_headings(text)

Extracts section headings from the given text. A custom regular expression is used instead of mwparserfromhell for performance reasons.

Parameters: text (str) – content of the wiki page
Returns: list of section headings (without the = marks)

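A regular-expression approach to heading extraction might look like the sketch below. The pattern and the name sketch_section_headings are assumptions made for illustration; the expression actually used by the module is not shown in this document and may treat edge cases (such as unbalanced = marks) differently.

```python
import re
from typing import List

# Assumed pattern: a line made of 1-6 "=" marks, the heading text, and the
# same run of "=" marks again, with optional surrounding spaces or tabs.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def sketch_section_headings(text: str) -> List[str]:
    """Return the heading texts found in ``text``, without the = marks."""
    return [match.group(2) for match in HEADING_RE.finditer(text)]


page = "Intro\n== First ==\ntext\n=== Sub ===\n==Second==\n"
headings = sketch_section_headings(page)
```

For the sample page, headings holds the three heading texts in document order.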
ws.parser_helpers.wikicode.get_anchors(headings, pretty=False, suffix_sep='_')

Converts section headings to anchors.


Known issues:

  • templates are always fully stripped (doing this right requires template expansion)
  • all tags are always stripped, even invalid tags (mwparserfromhell is not that configurable)
  • if pretty is True, tags escaped with <nowiki> in the input are not encoded in the output

Parameters:
  • headings (list) – section headings (obtained e.g. with get_section_headings())
  • pretty (bool) – if True, the anchors will be as pretty as possible (e.g. for use in wikilinks), otherwise they are fully dot-encoded
  • suffix_sep (str) – the separator between the base anchor and numeric suffix for duplicate section names

Returns: list of section anchors
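
The roles of pretty and suffix_sep can be illustrated with a simplified sketch. It is not the module's implementation: the name sketch_anchors is hypothetical, markup stripping is omitted entirely, dot-encoding is approximated as percent-encoding with "%" replaced by ".", and duplicates are assumed to be numbered from 2.

```python
from typing import Dict, List
from urllib.parse import quote


def sketch_anchors(headings: List[str], pretty: bool = False,
                   suffix_sep: str = "_") -> List[str]:
    """Turn plain-text headings into anchors (simplified illustration)."""
    anchors: List[str] = []
    seen: Dict[str, int] = {}
    for heading in headings:
        base = heading.strip().replace(" ", "_")
        if not pretty:
            # Assumed dot-encoding: percent-encode, then swap "%" for ".".
            base = quote(base, safe="_:-.").replace("%", ".")
        count = seen.get(base, 0) + 1
        seen[base] = count
        # Assumption: the first occurrence is left bare, later ones get a number.
        anchors.append(base if count == 1 else f"{base}{suffix_sep}{count}")
    return anchors


encoded = sketch_anchors(["Setup", "Setup", "Tips & tricks"])
readable = sketch_anchors(["Tips & tricks"], pretty=True)
```

With these inputs, the second "Setup" receives the numeric suffix, and "&" is dot-encoded only when pretty is False.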

ws.parser_helpers.wikicode.ensure_flagged_by_template(wikicode, node, template_name, *template_parameters, overwrite_parameters=True)

Makes sure that node in wikicode is immediately (except for whitespace) followed by a template with template_name and optional template_parameters.


Returns: the template flag, as a mwparserfromhell.nodes.template.Template object

ws.parser_helpers.wikicode.ensure_unflagged_by_template(wikicode, node, template_name)

Makes sure that node in wikicode is not immediately (except for whitespace) followed by a template with template_name.