ws.parser_helpers.title module

ws.parser_helpers.title.canonicalize(title)

Return a canonical form of the title, that is:

  • underscores are replaced with spaces,

  • leading and trailing whitespace is stripped,

  • consecutive spaces are squashed,

  • first letter is capitalized.

Note

The interwiki and namespace prefixes are not split off; canonicalization is applied to the passed title as a whole.

Parameters

title – a str or mwparserfromhell.wikicode.Wikicode object

Returns

a str object
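The rules above can be illustrated with a minimal sketch that handles plain str input only. The function name canonicalize_sketch is hypothetical and is not the module's implementation. Note that only the first character is uppercased: str.capitalize() would also lowercase the rest of the title, which is not what the rule describes. As the Note says, a prefix such as "talk:" is not split off, so the capitalization applies to the prefix.

```python
import re


def canonicalize_sketch(title: str) -> str:
    """Hypothetical re-implementation of the documented canonicalization rules."""
    # 1. underscores are replaced with spaces
    title = title.replace("_", " ")
    # 2. leading and trailing whitespace is stripped
    title = title.strip()
    # 3. runs of consecutive spaces are squashed into a single space
    title = re.sub(" +", " ", title)
    # 4. only the first letter is uppercased; the rest is left untouched
    return title[:1].upper() + title[1:]


print(canonicalize_sketch("  foo__bar_Baz "))  # Foo bar Baz
print(canonicalize_sketch("talk:foo"))         # Talk:foo (prefix not split)
```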

class ws.parser_helpers.title.Context(interwikimap, namespacenames, namespaces, legaltitlechars)

Bases: object

A context class for the Title parser.

The parameters can be fetched from either the API or the Database class:

Parameters
  • interwikimap (dict) – a mapping representing the data from MediaWiki’s interwiki table

  • namespacenames (dict) – a dictionary mapping namespace names to numbers

  • namespaces (dict) – a dictionary mapping namespace numbers to dictionaries providing details about the namespace, such as names or case sensitivity

  • legaltitlechars (str) – string of characters which are allowed to occur in page titles

Normally, the user does not interact with the Context class. Both the API and Database classes provide shortcut functions (API.Title and Database.Title, respectively) which construct the necessary context and pass it to the Title class.

classmethod from_api(api)

Creates a Context instance using the API object.

Used by API.Title.

class ws.parser_helpers.title.Title(context, title)

Bases: object

A helper class intended for easy manipulation of wiki titles. Title parsing complies with the rules used in the MediaWiki code, and the interface is inspired by the magic words. Besides namespace detection, we also parse interwiki prefixes, which is useful for parsing wiki links on a lower level than what mwparserfromhell provides (mwparserfromhell does not take the wiki configuration into account). The functionality depends on the Context class for the validation of interwiki and namespace prefixes.

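Parameters
  • context (Context) – the context used to validate interwiki and namespace prefixes

  • title – the title to be parsed, either a str or mwparserfromhell.wikicode.Wikicode object

The title is parsed by the parse() method.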

parse(full_title)

Splits the title into (iwprefix, namespace, pagename, sectionname) parts and canonicalizes them. It can be used to set these attributes from a full title string instead of creating a new instance.

Parameters

full_title – The full title to be parsed, either a str or mwparserfromhell.wikicode.Wikicode object.

Raises

InvalidTitleCharError when the page title is not valid
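The kind of splitting parse() performs can be sketched as follows. This is a simplified illustration, not the module's code. The names INTERWIKI_PREFIXES, NAMESPACE_NUMBERS, NAMESPACE_NAMES and split_title are hypothetical, and the toy tables stand in for the data a Context provides. The sketch deliberately ignores leading colons, does not validate legal title characters, does not look for a namespace after an interwiki prefix (the foreign wiki's namespaces are not known locally), and canonicalizes every page name the same way regardless of per-wiki case settings.

```python
import re

# Hypothetical toy configuration; the real data comes from a Context object.
INTERWIKI_PREFIXES = {"wikipedia", "de"}
NAMESPACE_NUMBERS = {"talk": 1, "user": 2, "user talk": 3, "category": 14}
NAMESPACE_NAMES = {1: "Talk", 2: "User", 3: "User talk", 14: "Category"}


def _canon(text: str) -> str:
    # underscores -> spaces, strip, squash spaces, uppercase the first letter
    text = re.sub(" +", " ", text.replace("_", " ").strip())
    return text[:1].upper() + text[1:]


def split_title(full_title: str):
    """Return (iwprefix, namespace, pagename, sectionname) for a full title."""
    title, _, sectionname = full_title.partition("#")
    iwprefix = namespace = ""
    prefix, sep, rest = title.partition(":")
    key = prefix.replace("_", " ").strip().lower()
    if sep and key in INTERWIKI_PREFIXES:
        iwprefix = key  # interwiki prefixes are stored lowercase
        title = rest
    elif sep and key in NAMESPACE_NUMBERS:
        namespace = NAMESPACE_NAMES[NAMESPACE_NUMBERS[key]]
        title = rest
    # an unknown prefix such as "Foo:" stays part of the page name
    return iwprefix, namespace, _canon(title), sectionname.strip()


print(split_title("talk:foo_bar#Intro"))  # ('', 'Talk', 'Foo bar', 'Intro')
```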

format(*, iwprefix=False, namespace=False, sectionname=False, colon=False)

General formatting method.

Parameters
  • colon (bool) – include the leading colon

  • iwprefix (bool) – include the interwiki prefix

  • namespace (bool) – include the namespace prefix (it is always included if there is an interwiki prefix)

  • sectionname (bool) – include the section name
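A minimal sketch of how the format() flags might combine the parsed parts. ParsedTitle and format_title are hypothetical stand-ins for a parsed Title object. Two points are our interpretation of the description above rather than confirmed behavior: the namespace is forced whenever the interwiki prefix is actually emitted, and colon=True prepends the colon only if the parsed title had one (cf. the leading_colon property).

```python
from dataclasses import dataclass


@dataclass
class ParsedTitle:
    """Hypothetical container for the parts produced by parsing a title."""
    iwprefix: str = ""
    namespace: str = ""
    pagename: str = ""
    sectionname: str = ""
    leading_colon: str = ""


def format_title(t, *, iwprefix=False, namespace=False, sectionname=False, colon=False):
    result = t.pagename
    # the namespace is always emitted together with an interwiki prefix
    if t.namespace and (namespace or (iwprefix and t.iwprefix)):
        result = f"{t.namespace}:{result}"
    if iwprefix and t.iwprefix:
        result = f"{t.iwprefix}:{result}"
    if sectionname and t.sectionname:
        result += f"#{t.sectionname}"
    if colon:
        result = t.leading_colon + result
    return result


cat = ParsedTitle(namespace="Category", pagename="Foo", leading_colon=":")
print(format_title(cat, namespace=True, colon=True))  # :Category:Foo
```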

property iwprefix

The interwiki prefix of the title.

This attribute has a setter which raises ValueError when the supplied interwiki prefix is not valid.

Note that interwiki prefixes are case-insensitive and the canonical representation stored in the object is lowercase.
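The validating, lowercasing setter described above follows the usual Python property pattern. In this sketch, TitleSketch and its INTERWIKIMAP are hypothetical. We assume the map is keyed by lowercase prefix, and we also assume an empty string clears the prefix. Neither assumption is confirmed by this document.

```python
class TitleSketch:
    # Hypothetical interwiki map keyed by lowercase prefix (assumed shape).
    INTERWIKIMAP = {"wikipedia": {"url": "https://en.wikipedia.org/wiki/$1"}}

    def __init__(self):
        self._iwprefix = ""

    @property
    def iwprefix(self):
        return self._iwprefix

    @iwprefix.setter
    def iwprefix(self, value):
        # prefixes are case-insensitive, so store the lowercase form
        key = value.strip().lower()
        if key and key not in self.INTERWIKIMAP:
            raise ValueError(f"invalid interwiki prefix: {value!r}")
        self._iwprefix = key


t = TitleSketch()
t.iwprefix = "Wikipedia"
print(t.iwprefix)  # wikipedia
```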

property namespace

Same as {{NAMESPACE}} in MediaWiki.

This attribute has a setter which raises ValueError when the supplied namespace is not valid.

property namespacenumber

Same as {{NAMESPACENUMBER}} in MediaWiki.

property articlespace

Same as {{ARTICLESPACE}} in MediaWiki.

property talkspace

Same as {{TALKSPACE}} in MediaWiki.
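ARTICLESPACE and TALKSPACE rely on the MediaWiki convention that subject namespaces have even numbers and each talk namespace is the following odd number. The sketch below maps a namespace number to its pair by clearing or setting the lowest bit. NAMESPACE_NAMES, articlespace and talkspace are hypothetical illustrations, and negative (special) namespaces are rejected because they have no talk pair.

```python
# Hypothetical namespace table (number -> canonical name) for illustration.
NAMESPACE_NAMES = {0: "", 1: "Talk", 2: "User", 3: "User talk",
                   14: "Category", 15: "Category talk"}


def _check(number: int) -> None:
    if number < 0:
        raise ValueError("special namespaces have no article/talk pair")


def articlespace(number: int) -> str:
    _check(number)
    # clearing the lowest bit maps a talk namespace to its subject namespace
    return NAMESPACE_NAMES[number & ~1]


def talkspace(number: int) -> str:
    _check(number)
    # setting the lowest bit maps a subject namespace to its talk namespace
    return NAMESPACE_NAMES[number | 1]


print(articlespace(3), "|", talkspace(14))  # User | Category talk
```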

property pagename

Same as {{PAGENAME}} in MediaWiki: the interwiki and namespace prefixes are dropped and the section anchor is not included. The other *pagename attributes are based on this attribute.

This attribute has a setter which calls parse() to split the supplied page name.

property fullpagename

Same as {{FULLPAGENAME}} in MediaWiki, but also includes the interwiki prefix (if any).

property basepagename

Same as {{BASEPAGENAME}} in MediaWiki: drops the interwiki and namespace prefixes and the rightmost subpage level.

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.

property subpagename

Same as {{SUBPAGENAME}} in MediaWiki: returns the rightmost subpage level.

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.

property rootpagename

Same as {{ROOTPAGENAME}} in MediaWiki: drops the interwiki and namespace prefixes and all subpage levels.

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.
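With subpages treated as enabled everywhere, the three subpage-related names reduce to splitting the page name on "/". This is a hypothetical sketch operating on a bare page name (no prefixes). It ignores edge cases such as titles that begin with a slash, which MediaWiki may handle differently.

```python
def basepagename(pagename: str) -> str:
    # drop only the rightmost subpage level
    return pagename.rsplit("/", 1)[0]


def subpagename(pagename: str) -> str:
    # keep only the rightmost subpage level
    return pagename.rsplit("/", 1)[-1]


def rootpagename(pagename: str) -> str:
    # drop all subpage levels
    return pagename.split("/", 1)[0]


print(basepagename("Foo/Bar/Baz"), subpagename("Foo/Bar/Baz"), rootpagename("Foo/Bar/Baz"))
# Foo/Bar Baz Foo
```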

property articlepagename

Same as {{ARTICLEPAGENAME}} in MediaWiki.

property talkpagename

Same as {{TALKPAGENAME}} in MediaWiki.
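ARTICLEPAGENAME and TALKPAGENAME combine the paired namespace with the page name. As in MediaWiki, the main namespace contributes no prefix. The table and function names below are hypothetical and rely on the even/odd subject/talk numbering convention.

```python
# Hypothetical namespace table (number -> canonical name) for illustration.
NAMESPACE_NAMES = {0: "", 1: "Talk", 2: "User", 3: "User talk"}


def _join(number: int, pagename: str) -> str:
    if number < 0:
        raise ValueError("special namespaces have no article/talk pair")
    name = NAMESPACE_NAMES[number]
    # the main namespace has an empty name and therefore no "Name:" prefix
    return f"{name}:{pagename}" if name else pagename


def articlepagename(number: int, pagename: str) -> str:
    return _join(number & ~1 if number >= 0 else number, pagename)


def talkpagename(number: int, pagename: str) -> str:
    return _join(number | 1 if number >= 0 else number, pagename)


print(talkpagename(0, "Foo"), "|", articlepagename(3, "Alice"))  # Talk:Foo | User:Alice
```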

property sectionname

The section anchor, usable in wiki links. It is passed through the ws.parser_helpers.encodings._anchor_preprocess() function, but it is neither anchor-encoded nor decoded.

Note

Section anchors on MediaWiki are usually encoded (see ws.parser_helpers.encodings.dotencode()), but decoding is ambiguous due to whitespace squashing and the fact that the escape character itself (i.e. the dot) is not encoded even when followed by two hex characters. As a result, the canonical form of the anchor cannot be determined without comparing to the existing sections of the target page.

This attribute has a setter.
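The decoding ambiguity from the Note above can be demonstrated with a rough approximation of MediaWiki's legacy dot-encoding of anchors: spaces become underscores, the anchor is percent-encoded with the colon kept, and "%" is replaced by ".". dotencode_sketch is hypothetical and is not ws.parser_helpers.encodings.dotencode(). Edge cases (e.g. "~", which Python's quote() leaves unescaped) may differ from MediaWiki. Two different headings produce the same anchor because the dot itself is never escaped.

```python
from urllib.parse import quote


def dotencode_sketch(anchor: str) -> str:
    # approximation of legacy MediaWiki anchor encoding (not the module's code)
    encoded = quote(anchor.replace(" ", "_"), safe="_.-:")
    return encoded.replace("%", ".")


a = dotencode_sketch("Café au lait")
b = dotencode_sketch("Caf.C3.A9 au lait")
print(a, b, a == b)  # Caf.C3.A9_au_lait Caf.C3.A9_au_lait True
```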

property leading_colon

Returns ":" if the parsed title had a leading colon (e.g. for links like [[:Category:Foo]]), otherwise it returns an empty string.

dbtitle(expected_ns=None)

Returns the title formatted for use in the database.

In practice it is something between pagename and fullpagename:

  • The namespace prefix is stripped if there is no interwiki prefix and the parsed namespace number agrees with expected_ns. This covers the creation of new namespaces, e.g. a page Foo:Bar that first exists in the main namespace and is later moved into a separate namespace, Foo:.

  • If there is an interwiki prefix or a section name, DatabaseTitleError is raised to prevent unintended data loss.

Parameters

expected_ns (int) – expected namespace number
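The two dbtitle() rules can be sketched as a standalone function. dbtitle_sketch and DatabaseTitleErrorSketch are hypothetical. The sketch requires expected_ns because this document does not describe the behavior for expected_ns=None, and it does not attempt any other database-specific formatting. The example mirrors the Foo:Bar case: while the page still lives in the main namespace (expected_ns=0), the prefix is kept.

```python
class DatabaseTitleErrorSketch(Exception):
    """Hypothetical stand-in for DatabaseTitleError."""


def dbtitle_sketch(*, iwprefix, nsnumber, namespace, pagename, sectionname, expected_ns):
    # refuse titles whose extra parts would be silently lost
    if iwprefix or sectionname:
        raise DatabaseTitleErrorSketch("interwiki prefix or section name present")
    # strip the namespace prefix only when the namespace is the expected one
    if nsnumber == expected_ns:
        return pagename
    return f"{namespace}:{pagename}" if namespace else pagename


parts = dict(iwprefix="", nsnumber=3000, namespace="Foo", pagename="Bar", sectionname="")
print(dbtitle_sketch(**parts, expected_ns=0))     # Foo:Bar
print(dbtitle_sketch(**parts, expected_ns=3000))  # Bar
```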

make_absolute(basetitle)

Changes a relative link to an absolute link. Has no effect if called on an absolute link.

Types of relative links:

  • same-page section links (e.g. [[#Section name]] on page Base is changed to [[Base#Section name]])

  • subpages (e.g. [[/Subpage]] on page Base is changed to [[Base/Subpage]])

Note

The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the method behaves as if subpages were enabled for all namespaces.

Known MediaWiki incompatibilities:

  • We allow links starting with ../ even on top-level pages.

  • Because we use the os.path module to join the titles, even links starting with / or ../ and containing /./ in the middle are allowed. In MediaWiki such links would be invalid.

Parameters

basetitle – the base title, either str or Title

Returns

a copy of self, modified as appropriate
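A string-only sketch of the relative-link resolution described above, using posixpath (the POSIX flavor of os.path) to join and normalize. make_absolute_sketch is hypothetical and works on plain strings rather than Title objects. It reproduces the listed incompatibilities: "../" is accepted on a top-level page, and "/./" segments are silently normalized away.

```python
import posixpath


def make_absolute_sketch(link: str, base: str) -> str:
    # same-page section link: "#Section" on Base -> "Base#Section"
    if link.startswith("#"):
        return base + link
    target, hash_, section = link.partition("#")
    if target.startswith("/"):
        # "/Subpage" is relative to the base page itself
        target = posixpath.normpath(base + target)
    elif target.startswith("../"):
        target = posixpath.normpath(base + "/" + target)
    # absolute links pass through unchanged
    return target + hash_ + section


print(make_absolute_sketch("/Subpage", "Base"))               # Base/Subpage
print(make_absolute_sketch("../Sibling#Sec", "Base/Sub"))     # Base/Sibling#Sec
```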

exception ws.parser_helpers.title.TitleError

Bases: Exception

Base class for all title errors.

exception ws.parser_helpers.title.InvalidTitleCharError

Bases: ws.parser_helpers.title.TitleError

Raised when the requested title contains an invalid character.

exception ws.parser_helpers.title.InvalidColonError

Bases: ws.parser_helpers.title.TitleError

Raised when the requested title contains an invalid colon at the beginning.

exception ws.parser_helpers.title.DatabaseTitleError

Bases: ws.parser_helpers.title.TitleError

Raised when calling Title.dbtitle() would cause data loss.
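Because all title errors derive from TitleError, callers can catch the base class to handle every failure mode at once. The classes below are local stand-ins that mirror the documented hierarchy; real code would import them from ws.parser_helpers.title. The validator check_title is hypothetical and only tests a few characters that MediaWiki forbids in titles.

```python
# Local stand-ins mirroring the documented hierarchy (import the real ones
# from ws.parser_helpers.title in actual code).
class TitleError(Exception):
    """Base class for all title errors."""


class InvalidTitleCharError(TitleError):
    pass


class InvalidColonError(TitleError):
    pass


class DatabaseTitleError(TitleError):
    pass


def check_title(title: str) -> str:
    # hypothetical validator: a few characters MediaWiki forbids in titles
    if any(c in title for c in "<>[]{}|"):
        raise InvalidTitleCharError(title)
    return title


def is_valid(title: str) -> bool:
    try:
        check_title(title)
    except TitleError:
        # catching the base class also covers every subclass
        return False
    return True


print(is_valid("Foo bar"), is_valid("Foo[bar]"))  # True False
```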