ws.parser_helpers.title module

ws.parser_helpers.title.canonicalize(title)

Return a canonical form of the title, that is:

  • underscores are replaced with spaces,
  • leading and trailing whitespace is stripped,
  • consecutive spaces are squashed,
  • first letter is capitalized.

Note

The interwiki and namespace prefixes are not split; canonicalization is applied to the passed title as a whole.

Parameters: title – a str or mwparserfromhell.wikicode.Wikicode object
Returns: a str object
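
The four rules listed above can be sketched in plain Python. This is only an illustration, not the module's actual implementation: the name canonicalize_sketch is hypothetical, and the sketch assumes a plain str argument (no Wikicode objects).

```python
import re


def canonicalize_sketch(title):
    """Hypothetical re-implementation of the four documented rules."""
    # 1. Underscores are replaced with spaces.
    title = title.replace("_", " ")
    # 2. Leading and trailing whitespace is stripped.
    title = title.strip()
    # 3. Consecutive spaces are squashed into one.
    title = re.sub(r" +", " ", title)
    # 4. The first letter is capitalized; the rest is left untouched.
    return title[:1].upper() + title[1:]


print(canonicalize_sketch("  foo__bar  baz "))
```

Because prefixes are not split, a title such as "help:foo" would only have its very first letter capitalized by this sketch.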
class ws.parser_helpers.title.Context(interwikimap, namespacenames, namespaces, legaltitlechars)

Bases: object

classmethod from_api(api)
__eq__(other)
__hash__ = None
class ws.parser_helpers.title.Title(context, title)

Bases: object

A helper class intended for easy manipulation of wiki titles. Title parsing complies with the rules used in MediaWiki code, and the interface is inspired by the magic words. Besides namespace detection, interwiki prefixes are parsed as well, which is useful for parsing wiki links at a lower level than mwparserfromhell provides (it does not take the wiki configuration into account). The functionality depends on the Context class for the validation of interwiki and namespace prefixes.

Parameters:
set_iwprefix(iw)
set_namespace(ns)
set_pagename(pagename)
set_sectionname(sectionname)
parse(full_title)

Splits the title into (iwprefix, namespace, pagename, sectionname) parts and canonicalizes them. Can be used to set these attributes from a full title string instead of creating a new instance.

Parameters: full_title – The full title to be parsed, either a str or mwparserfromhell.wikicode.Wikicode object.
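
The splitting into four parts can be illustrated with a self-contained sketch. It is not the method's real code: split_title_sketch, iw_prefixes and namespace_names are hypothetical names standing in for the data that a Context would supply, and the sketch skips the character validation and most of the canonicalization that parse() performs.

```python
def split_title_sketch(full_title, iw_prefixes, namespace_names):
    """Split a title into (iwprefix, namespace, pagename, sectionname).

    iw_prefixes: set of lowercase interwiki prefixes (assumed input).
    namespace_names: dict mapping lowercase namespace names to their
    canonical spelling (assumed input).
    """
    # Everything after the first "#" is treated as the section anchor.
    rest, _, sectionname = full_title.partition("#")

    # An interwiki prefix is accepted only if it is a known prefix.
    iwprefix = ""
    head, sep, tail = rest.partition(":")
    if sep and head.strip().lower() in iw_prefixes:
        iwprefix = head.strip().lower()
        rest = tail

    # The same check is repeated for the namespace prefix.
    namespace = ""
    head, sep, tail = rest.partition(":")
    if sep and head.strip().lower() in namespace_names:
        namespace = namespace_names[head.strip().lower()]
        rest = tail

    return iwprefix, namespace, rest.strip(), sectionname.strip()


print(split_title_sketch("en:help:Foo bar#Baz", {"en"}, {"help": "Help"}))
```

Checking prefixes against known values is the point of the sketch: a colon in a title does not by itself indicate a prefix, so "Foo: bar" stays a plain page name when "foo" is neither an interwiki prefix nor a namespace.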
_format(pre, mid, title)
iwprefix

The interwiki prefix of the title.

namespace

Same as {{NAMESPACE}}.

namespacenumber

Same as {{NAMESPACENUMBER}}.

articlespace

Same as {{ARTICLESPACE}}.

talkspace

Same as {{TALKSPACE}}.

pagename

Same as {{PAGENAME}}, drops the interwiki and namespace prefixes. The section anchor is not included. Other *pagename attributes are based on this attribute.

fullpagename

Same as {{FULLPAGENAME}}, but also includes interwiki prefix (if any).

basepagename

Same as {{BASEPAGENAME}}, drops the interwiki and namespace prefixes and the rightmost subpage level.

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.

subpagename

Same as {{SUBPAGENAME}}, returns the rightmost subpage level.

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.

rootpagename

Same as {{ROOTPAGENAME}}, drops the interwiki and namespace prefixes and all subpages.

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.
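
The relation between basepagename, subpagename and rootpagename can be shown with plain string operations on a page name. The helper names below are hypothetical, and the sketch assumes that subpage levels are separated by "/" in every namespace, as the notes describe; edge cases such as leading or trailing slashes are not considered.

```python
def basepagename_sketch(pagename):
    # Drop the rightmost subpage level, if there is one.
    base, sep, _ = pagename.rpartition("/")
    return base if sep else pagename


def subpagename_sketch(pagename):
    # Keep only the rightmost subpage level.
    return pagename.rpartition("/")[2]


def rootpagename_sketch(pagename):
    # Drop all subpage levels.
    return pagename.partition("/")[0]


for name in ("A/B/C", "A"):
    print(name, basepagename_sketch(name), subpagename_sketch(name), rootpagename_sketch(name))
```

For a page name without any "/", all three helpers return the page name unchanged.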

articlepagename

Same as {{ARTICLEPAGENAME}}.

talkpagename

Same as {{TALKPAGENAME}}.

sectionname

The section anchor, usable in wiki links. It is passed through the ws.parser_helpers.encodings._anchor_preprocess() function, but it is neither anchor-encoded nor decoded.

Note

Section anchors on MediaWiki are usually encoded (see ws.parser_helpers.encodings.dotencode()), but decoding is ambiguous due to whitespace squashing and the fact that the escape character itself (i.e. the dot) is not encoded even when followed by two hex characters. As a result, the canonical form of the anchor cannot be determined without comparing to the existing sections of the target page.
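
The ambiguity can be demonstrated with a simplified dot-encoding. The function below is an assumption made for illustration and is not ws.parser_helpers.encodings.dotencode(): it replaces spaces with underscores, percent-encodes the result with the standard library and swaps "%" for ".".

```python
from urllib.parse import quote


def dotencode_sketch(anchor):
    # Simplified stand-in for MediaWiki's legacy anchor encoding.
    return quote(anchor.replace(" ", "_"), safe=":").replace("%", ".")


# Two different anchors produce the same encoded form, because the dot
# itself is never escaped.
print(dotencode_sketch("é"))
print(dotencode_sketch(".C3.A9"))
```

Since both inputs map to the same string, a decoder cannot tell which one was meant without looking at the sections that actually exist on the target page.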

dbtitle(expected_ns=None)

Returns the title formatted for use in the database.

In practice it is something between pagename and fullpagename to cover all the corner cases:

  • If there is an interwiki prefix, it is included. This is necessary for old log entries from times when the current interwiki prefixes were not in place.
  • The namespace prefix is stripped if there is no interwiki prefix and the parsed namespace number agrees with expected_ns. This covers the creation of new namespaces, e.g. pages Foo:Bar existing first in the main namespace and then moved into a separate namespace, Foo:.
  • The section anchor is included. This is again necessary for old log entries; apparently MediaWiki allowed # in user names at some point.
Parameters: expected_ns (int) – expected namespace number
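
The three rules above can be read as the following sketch. It is written against already parsed parts rather than a Title instance; the function name, its parameters and the "prefix:name#anchor" output layout are assumptions made for illustration, not the method's actual code.

```python
def dbtitle_sketch(iwprefix, namespace, namespacenumber, pagename,
                   sectionname, expected_ns=None):
    """Hypothetical illustration of the documented dbtitle() rules."""
    # The namespace prefix is stripped only when there is no interwiki
    # prefix and the parsed namespace number agrees with expected_ns.
    if not iwprefix and namespacenumber == expected_ns:
        title = pagename
    elif namespace:
        title = namespace + ":" + pagename
    else:
        title = pagename
    # The interwiki prefix is included whenever it is present.
    if iwprefix:
        title = iwprefix + ":" + title
    # The section anchor is included whenever it is present.
    if sectionname:
        title += "#" + sectionname
    return title


# A page logged in the main namespace (0) before the "Foo" namespace existed:
print(dbtitle_sketch("", "Foo", 3000, "Bar", "", expected_ns=0))
```

In the usage line, the namespace number 3000 is an arbitrary example value; because it disagrees with expected_ns, the prefix is kept and the result matches the title as it was originally stored.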
__eq__(other)
__hash__ = None
__repr__()
__str__()

Returns the full representation of the title in the canonical form.

exception ws.parser_helpers.title.InvalidTitleCharError

Bases: Exception

Raised when an invalid title is passed to Title.