ws.parser_helpers.title module

ws.parser_helpers.title.canonicalize(title)

Return a canonical form of the title, that is:

  • underscores are replaced with spaces,
  • leading and trailing whitespace is stripped,
  • consecutive spaces are squashed,
  • first letter is capitalized.

Note

The interwiki and namespace prefixes are not split; canonicalization is applied to the passed title as a whole.

Parameters: title – a str or mwparserfromhell.wikicode.Wikicode object
Returns: a str object
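The four rules above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the name canonicalize_sketch is hypothetical, and "consecutive spaces" is assumed to mean runs of the plain space character.

```python
import re


def canonicalize_sketch(title):
    """Illustrative re-implementation of the four documented rules."""
    title = str(title).replace("_", " ")   # underscores become spaces
    title = title.strip()                  # strip leading/trailing whitespace
    title = re.sub(r" {2,}", " ", title)   # squash runs of spaces
    return title[:1].upper() + title[1:]   # capitalize only the first letter


print(canonicalize_sketch("  foo__bar baz_"))          # Foo bar baz
print(canonicalize_sketch("help:installation_guide"))  # Help:installation guide
```

The second call shows the behaviour described in the note: the title is treated as a whole, so the part after the colon is not capitalized.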
class ws.parser_helpers.title.Context(interwikimap, namespacenames, namespaces, legaltitlechars)

Bases: object

A context class for the Title parser.

The parameters can be fetched either from the API or Database class:

Parameters:
  • interwikimap (dict) – a mapping representing the data from MediaWiki’s interwiki table
  • namespacenames (dict) – a dictionary mapping namespace names to numbers
  • namespaces (dict) – a dictionary mapping namespace numbers to dictionaries providing details about the namespace, such as names or case sensitivity
  • legaltitlechars (str) – string of characters which are allowed to occur in page titles

Normally, the user does not interact with the Context class. Both the API and Database classes provide shortcut functions (API.Title and Database.Title, respectively) which construct the necessary context and pass it to the Title class.

classmethod from_api(api)

Creates a Context instance using the API object.

Used by API.Title.

__eq__(other)

Standard equality comparison operator. Comparing API-based and Database-based contexts is possible.

__hash__ = None
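In Python, a class that defines __eq__ without defining __hash__ gets __hash__ set to None, so its instances cannot be used as dictionary keys or set members. The sketch below shows that behaviour with a hypothetical stand-in class, ContextLike. The field-by-field comparison and the sample data are assumptions made for illustration, not the actual Context implementation.

```python
class ContextLike:
    """Hypothetical stand-in for Context; only comparison is modelled."""

    def __init__(self, interwikimap, namespacenames, namespaces, legaltitlechars):
        self.interwikimap = interwikimap
        self.namespacenames = namespacenames
        self.namespaces = namespaces
        self.legaltitlechars = legaltitlechars

    def __eq__(self, other):
        # Assumption: two contexts are equal when all four fields are equal.
        if not isinstance(other, ContextLike):
            return NotImplemented
        return (
            self.interwikimap == other.interwikimap
            and self.namespacenames == other.namespacenames
            and self.namespaces == other.namespaces
            and self.legaltitlechars == other.legaltitlechars
        )


# Placeholder data; the real structures come from the wiki configuration.
a = ContextLike({}, {"Talk": 1}, {1: {"case": "first-letter"}}, "abc")
b = ContextLike({}, {"Talk": 1}, {1: {"case": "first-letter"}}, "abc")

print(a == b)                        # True
print(ContextLike.__hash__ is None)  # True

try:
    {a}  # building a set requires hashing the instance
    hashable = True
except TypeError:
    hashable = False
print(hashable)                      # False
```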
class ws.parser_helpers.title.Title(context, title)

Bases: object

A helper class intended for easy manipulation of wiki titles. Title parsing complies with the rules used in MediaWiki code and the interface is inspired by the magic words. Besides namespace detection, we also parse interwiki prefixes, which is useful for parsing wiki links at a lower level than what mwparserfromhell provides (it does not take the wiki configuration into account). The functionality depends on the Context class for the validation of interwiki and namespace prefixes.

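Parameters:
  • context (Context) – the context used to validate interwiki and namespace prefixes
  • title – the title to parse, either a str or mwparserfromhell.wikicode.Wikicode object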

The title is parsed by the parse() method.

_set_iwprefix(iw)

Auxiliary setter for iwprefix.

_set_namespace(ns)

Auxiliary setter for namespace.

_set_pagename(pagename)

Auxiliary setter for pagename.

_set_sectionname(sectionname)

Auxiliary setter for sectionname.

_format(pre, mid, title)

Auxiliary method for full title formatting.

parse(full_title)

Splits the title into (iwprefix, namespace, pagename, sectionname) parts and canonicalizes them. Can be used to set these attributes from a full title string instead of creating a new instance.

Parameters: full_title – The full title to be parsed, either a str or mwparserfromhell.wikicode.Wikicode object.
Raises: InvalidTitleCharError when the page title is not valid
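To make the four parts concrete, here is a heavily simplified sketch of such a split. It is not the algorithm used by parse(): the function name split_title, the order of the checks and the case-insensitive prefix matching are assumptions, and both canonicalization and character validation are left out. The prefix sets stand in for data that the real class takes from its Context.

```python
def split_title(full_title, interwiki_prefixes, namespace_names):
    """Split a title into (iwprefix, namespace, pagename, sectionname).

    Both prefix collections are expected to hold lower-case strings.
    """
    # Everything after the first "#" is treated as the section anchor.
    rest, _, sectionname = full_title.partition("#")

    iwprefix = ""
    namespace = ""

    # Assumption: an interwiki prefix, if present, comes first.
    head, sep, tail = rest.partition(":")
    if sep and head.strip().lower() in interwiki_prefixes:
        iwprefix = head.strip().lower()
        rest = tail

    # A namespace prefix may follow.
    head, sep, tail = rest.partition(":")
    if sep and head.strip().lower() in namespace_names:
        namespace = head.strip()
        rest = tail

    return iwprefix, namespace, rest.strip(), sectionname.strip()


iw = {"wikipedia"}
ns = {"help", "talk"}
print(split_title("wikipedia:Help:Editing#Links", iw, ns))
# ('wikipedia', 'Help', 'Editing', 'Links')
print(split_title("Foo:Bar", iw, ns))
# ('', '', 'Foo:Bar', '')
```

In the second call "Foo" is neither a known interwiki prefix nor a known namespace, so the colon stays part of the page name.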
iwprefix

The interwiki prefix of the title.

This attribute has a setter which raises ValueError when the supplied interwiki prefix is not valid.

namespace

Same as {{NAMESPACE}} in MediaWiki.

This attribute has a setter which raises ValueError when the supplied namespace is not valid.

namespacenumber

Same as {{NAMESPACENUMBER}} in MediaWiki.

articlespace

Same as {{ARTICLESPACE}} in MediaWiki.

talkspace

Same as {{TALKSPACE}} in MediaWiki.
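MediaWiki numbers namespaces so that subject namespaces are even and the associated talk namespace is the next odd number. The sketch below illustrates that relationship on namespace numbers only. The function names are hypothetical, and raising ValueError for the negative (virtual) namespaces is an assumption, not a statement about how the articlespace and talkspace attributes behave.

```python
def article_ns_number(ns):
    """Number of the subject namespace associated with ns."""
    if ns < 0:
        # Virtual namespaces such as Special (-1) have no talk counterpart.
        raise ValueError("namespace %d has no subject/talk pair" % ns)
    return ns - 1 if ns % 2 == 1 else ns


def talk_ns_number(ns):
    """Number of the talk namespace associated with ns."""
    return article_ns_number(ns) + 1


print(article_ns_number(0), talk_ns_number(0))  # 0 1
print(article_ns_number(5), talk_ns_number(5))  # 4 5
```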

pagename

Same as {{PAGENAME}} in MediaWiki: the interwiki and namespace prefixes are dropped and the section anchor is not included. Other *pagename attributes are based on this attribute.

This attribute has a setter which calls parse() to split the supplied page name.

fullpagename

Same as {{FULLPAGENAME}} in MediaWiki, but also includes interwiki prefix (if any).

basepagename

Same as {{BASEPAGENAME}} in MediaWiki: the interwiki and namespace prefixes and the rightmost subpage level are dropped.

Note

The $wgNamespacesWithSubpages option is ignored (not available via API anyway), the property behaves as if subpages were enabled for all namespaces.

subpagename

Same as {{SUBPAGENAME}} in MediaWiki: only the rightmost subpage level is returned.

Note

The $wgNamespacesWithSubpages option is ignored (not available via API anyway), the property behaves as if subpages were enabled for all namespaces.

rootpagename

Same as {{ROOTPAGENAME}} in MediaWiki: the interwiki and namespace prefixes and all subpages are dropped.

Note

The $wgNamespacesWithSubpages option is ignored (not available via API anyway), the property behaves as if subpages were enabled for all namespaces.
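The relationship between the three subpage-related attributes can be illustrated on a plain page name. This is a sketch assuming that subpage levels are separated by "/" and that subpages are enabled everywhere, as the notes state. The helper name subpage_parts is hypothetical and is not part of the Title class.

```python
def subpage_parts(pagename):
    """Return (basepagename, subpagename, rootpagename) for a page name."""
    base, sep, sub = pagename.rpartition("/")
    if not sep:
        # No subpage level: all three values equal the page name itself.
        base = sub = pagename
    root = pagename.split("/", 1)[0]
    return base, sub, root


print(subpage_parts("Guide/Install/Linux"))  # ('Guide/Install', 'Linux', 'Guide')
print(subpage_parts("Guide"))                # ('Guide', 'Guide', 'Guide')
```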

articlepagename

Same as {{ARTICLEPAGENAME}} in MediaWiki.

talkpagename

Same as {{TALKPAGENAME}} in MediaWiki.

sectionname

The section anchor, usable in wiki links. It is passed through the ws.parser_helpers.encodings._anchor_preprocess() function, but it is neither anchor-encoded nor decoded.

Note

Section anchors on MediaWiki are usually encoded (see ws.parser_helpers.encodings.dotencode()), but decoding is ambiguous due to whitespace squashing and the fact that the escape character itself (i.e. the dot) is not encoded even when followed by two hex characters. As a result, the canonical form of the anchor cannot be determined without comparing to the existing sections of the target page.

This attribute has a setter.
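The ambiguity mentioned in the note can be demonstrated with a simplified dot-encoding. The function below is a hypothetical sketch, not ws.parser_helpers.encodings.dotencode(): it percent-encodes the anchor with the standard library and then replaces "%" with ".", which is assumed to approximate the legacy MediaWiki scheme closely enough for this illustration.

```python
from urllib.parse import quote


def dotencode_sketch(anchor):
    """Simplified legacy-style anchor encoding ("." instead of "%")."""
    encoded = quote(anchor.replace(" ", "_"), safe=":")
    return encoded.replace("%", ".")


# Two different section names end up with the same encoded anchor, because
# a literal dot followed by two hex characters is left untouched.
print(dotencode_sketch("A, B"))    # A.2C_B
print(dotencode_sketch("A.2C B"))  # A.2C_B
```

Since both inputs map to the same output, the encoded form alone does not reveal which section name was meant.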

dbtitle(expected_ns=None)

Returns the title formatted for use in the database.

In practice it is something between pagename and fullpagename to cover all the corner cases:

  • If there is an interwiki prefix, it is included. This is necessary for old log entries from times when the current interwiki prefixes were not in place.
  • The namespace prefix is stripped if there is no interwiki prefix and the parsed namespace number agrees with expected_ns. This covers the creation of new namespaces, e.g. pages Foo:Bar existing first in the main namespace and then moved into a separate namespace, Foo:.
  • The section anchor is included. This is again necessary for old log entries; apparently MediaWiki allowed # in user names at some point.
Parameters: expected_ns (int) – expected namespace number
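The three rules above can be sketched as a standalone function operating on already-parsed parts. This is an illustration under assumptions, not the method's actual code: the name dbtitle_sketch and its argument list are hypothetical, the parts are joined with ":" and "#", and with expected_ns=None the namespace prefix is assumed to be kept.

```python
def dbtitle_sketch(iwprefix, namespace, namespacenumber, pagename,
                   sectionname, expected_ns=None):
    """Combine parsed title parts following the three documented rules."""
    # Rule 2: strip the namespace prefix only when there is no interwiki
    # prefix and the namespace number matches the expected one.
    strip_ns = not iwprefix and namespacenumber == expected_ns
    title = pagename
    if namespace and not strip_ns:
        title = namespace + ":" + title
    # Rule 1: an interwiki prefix is always included.
    if iwprefix:
        title = iwprefix + ":" + title
    # Rule 3: the section anchor is included.
    if sectionname:
        title += "#" + sectionname
    return title


print(dbtitle_sketch("", "Help", 12, "Editing", "", expected_ns=12))
# Editing
print(dbtitle_sketch("", "Help", 12, "Editing", "", expected_ns=0))
# Help:Editing
print(dbtitle_sketch("wikipedia", "Help", 12, "Editing", "", expected_ns=12))
# wikipedia:Help:Editing
```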
__eq__(other)

Return self==value.

__hash__ = None
__repr__()

Return repr(self).

__str__()

Returns the full representation of the title in the canonical form.

exception ws.parser_helpers.title.InvalidTitleCharError

Bases: Exception

Raised when an invalid title is passed to Title.