ws.parser_helpers.title module
- ws.parser_helpers.title.canonicalize(title)
Return a canonical form of the title, that is:
underscores are replaced with spaces,
leading and trailing whitespace is stripped,
consecutive spaces are squashed,
first letter is capitalized.
Note
The interwiki and namespace prefixes are not split; canonicalization is applied to the passed title as a whole.
- Parameters
title – a str or mwparserfromhell.wikicode.Wikicode object
- Returns
a str object
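The four rules listed above can be sketched in plain Python. This is only an illustration under the assumption that the input is already a str; canonicalize_sketch is a hypothetical name, not the library function, and the real implementation may differ in details such as the handling of Wikicode input.

```python
import re


def canonicalize_sketch(title: str) -> str:
    """Hypothetical stand-in that follows the documented rules."""
    # 1. Underscores are replaced with spaces.
    title = title.replace("_", " ")
    # 2. Leading and trailing whitespace is stripped.
    title = title.strip()
    # 3. Consecutive spaces are squashed into a single space.
    title = re.sub(r" {2,}", " ", title)
    # 4. The first letter is capitalized; the rest is left untouched.
    return title[:1].upper() + title[1:]


print(canonicalize_sketch("  help:foo__bar  "))  # Help:foo bar
```

Because the prefix is not split off, only the very first letter of the whole string is capitalized; the part after the colon stays as typed.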
- class ws.parser_helpers.title.Context(interwikimap, namespacenames, namespaces, legaltitlechars)
Bases: object
A context class for the Title parser.
The parameters can be fetched either from the API or Database class:
- Parameters
interwikimap (dict) – a mapping representing the data from MediaWiki’s interwiki table
namespacenames (dict) – a dictionary mapping namespace names to numbers
namespaces (dict) – a dictionary mapping namespace numbers to dictionaries providing details about the namespace, such as names or case-sensitiveness
legaltitlechars (str) – string of characters which are allowed to occur in page titles
Normally, the user does not interact with the Context class. Both the API and Database classes provide shortcut functions (API.Title and Database.Title, respectively) which construct the necessary context and pass it to the Title class.
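To make the shape of the four parameters more concrete, here is a small sketch with made-up sample data. Every value is hypothetical, and the inner key names ("prefix", "url", "*", "case") are an assumption modelled on MediaWiki's siteinfo output; the structures actually returned by the API or Database classes may differ.

```python
# Hypothetical sample data, not taken from any real wiki.
interwikimap = {
    "en": {"prefix": "en", "url": "https://example.org/wiki/$1"},
}
namespacenames = {"": 0, "Talk": 1, "Help": 12, "Help talk": 13}
namespaces = {
    0: {"id": 0, "*": "", "case": "first-letter"},
    1: {"id": 1, "*": "Talk", "case": "first-letter"},
    12: {"id": 12, "*": "Help", "case": "first-letter"},
    13: {"id": 13, "*": "Help talk", "case": "first-letter"},
}
legaltitlechars = " A-Za-z0-9_:/.()-"  # placeholder value


def namespace_details(name: str) -> dict:
    """Look up the details of a namespace by name (name -> number -> details)."""
    return namespaces[namespacenames[name]]


print(namespace_details("Help talk")["id"])  # 13
```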
- class ws.parser_helpers.title.Title(context, title)
Bases: object
A helper class intended for easy manipulation of wiki titles. Title parsing complies with the rules used in MediaWiki code and the interface is inspired by the magic words. Besides namespace detection, we also parse interwiki prefixes, which is useful for parsing the wiki links at a lower level than what mwparserfromhell provides (it does not take the wiki configuration into account). The functionality depends on the Context class for the validation of interwiki and namespace prefixes.
- Parameters
context (Context) – a context object for the parser
title – a str or mwparserfromhell.wikicode.Wikicode object
The title is parsed by the parse() method.
- parse(full_title)
Splits the title into (iwprefix, namespace, pagename, sectionname) parts and canonicalizes them. Can be used to set these attributes from a full title string instead of creating a new instance.
- Parameters
full_title – The full title to be parsed, either a str or mwparserfromhell.wikicode.Wikicode object.
- Raises
InvalidTitleCharError when the page title is not valid
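The four-way split described for parse() can be approximated with plain string operations. The sketch below is not the library's parser: split_title_sketch and its arguments are hypothetical, and it deliberately ignores leading colons, character validation and per-namespace case rules.

```python
def _squash(text: str) -> str:
    # Underscores count as spaces; runs of whitespace collapse to one space.
    return " ".join(text.replace("_", " ").split())


def split_title_sketch(full_title, iw_prefixes, ns_names):
    """Return (iwprefix, namespace, pagename, sectionname) for a title string."""
    # Everything after the first "#" is the section anchor.
    rest, _, sectionname = full_title.partition("#")
    iwprefix = namespace = ""

    # Interwiki prefixes are matched case-insensitively, stored lowercase.
    head, sep, tail = rest.partition(":")
    if sep and _squash(head).lower() in iw_prefixes:
        iwprefix, rest = _squash(head).lower(), tail
        head, sep, tail = rest.partition(":")

    # Namespace names are matched case-insensitively, stored canonically.
    if sep:
        canonical = {name.lower(): name for name in ns_names}
        key = _squash(head).lower()
        if key in canonical:
            namespace, rest = canonical[key], tail

    pagename = _squash(rest)
    pagename = pagename[:1].upper() + pagename[1:]
    return iwprefix, namespace, pagename, _squash(sectionname)


parts = split_title_sketch(
    "en:help_talk:some  page#Sec", {"en"}, {"Talk", "Help talk"}
)
print(parts)  # ('en', 'Help talk', 'Some page', 'Sec')
```

A prefix that matches neither set stays part of the page name, which is why the context data is needed to parse wiki links correctly.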
- format(*, iwprefix=False, namespace=False, sectionname=False, colon=False)
General formatting method.
- property iwprefix
The interwiki prefix of the title.
This attribute has a setter which raises ValueError when the supplied interwiki prefix is not valid.
Note that interwiki prefixes are case-insensitive and the canonical representation stored in the object is lowercase.
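The setter behaviour described above (validate, then store in lowercase) follows a common Python property pattern. The class below is a hypothetical illustration of that pattern only; it is not the Title class, and treating an empty string as "no prefix" is an assumption.

```python
class IwPrefixHolder:
    """Hypothetical holder that mimics the documented iwprefix setter."""

    def __init__(self, known_prefixes):
        # Known prefixes are compared in lowercase.
        self._known = {prefix.lower() for prefix in known_prefixes}
        self._iwprefix = ""

    @property
    def iwprefix(self) -> str:
        return self._iwprefix

    @iwprefix.setter
    def iwprefix(self, value: str) -> None:
        lowered = value.lower()
        # Assumption: an empty string clears the prefix and is always valid.
        if lowered and lowered not in self._known:
            raise ValueError(f"invalid interwiki prefix: {value!r}")
        self._iwprefix = lowered


holder = IwPrefixHolder(["en", "de"])
holder.iwprefix = "EN"
print(holder.iwprefix)  # en
```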
- property namespace
Same as {{NAMESPACE}} in MediaWiki.
This attribute has a setter which raises ValueError when the supplied namespace is not valid.
- property namespacenumber
Same as {{NAMESPACENUMBER}} in MediaWiki.
- property articlespace
Same as {{ARTICLESPACE}} in MediaWiki.
- property talkspace
Same as {{TALKSPACE}} in MediaWiki.
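MediaWiki pairs every subject namespace (an even number) with its talk namespace (the following odd number), which is what makes articlespace and talkspace derivable from the namespace number. The sketch below shows that arithmetic with a hypothetical name table; it only covers non-negative namespace numbers and is not taken from the library.

```python
# Hypothetical sample of a namespace-number-to-name table.
NAMES = {0: "", 1: "Talk", 12: "Help", 13: "Help talk"}


def articlespace_sketch(number: int) -> str:
    # Subject namespaces have even numbers: round down to the even number.
    return NAMES[number - number % 2]


def talkspace_sketch(number: int) -> str:
    # The talk namespace is the odd number right after the subject one.
    return NAMES[number - number % 2 + 1]


print(articlespace_sketch(13))  # Help
print(talkspace_sketch(12))     # Help talk
```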
- property pagename
Same as {{PAGENAME}} in MediaWiki, drops the interwiki and namespace prefixes. The section anchor is not included. Other *pagename attributes are based on this attribute.
This attribute has a setter which calls parse() to split the supplied page name.
- property fullpagename
Same as {{FULLPAGENAME}} in MediaWiki, but also includes interwiki prefix (if any).
- property basepagename
Same as {{BASEPAGENAME}} in MediaWiki, drops the interwiki and namespace prefixes and the rightmost subpage level.
Note
The $wgNamespacesWithSubpages option is ignored (not available via API anyway); the property behaves as if subpages were enabled for all namespaces.
- property subpagename
Same as {{SUBPAGENAME}} in MediaWiki, returns the rightmost subpage level.
Note
The $wgNamespacesWithSubpages option is ignored (not available via API anyway); the property behaves as if subpages were enabled for all namespaces.
- property rootpagename
Same as {{ROOTPAGENAME}} in MediaWiki, drops the interwiki and namespace prefixes and all subpages.
Note
The $wgNamespacesWithSubpages option is ignored (not available via API anyway); the property behaves as if subpages were enabled for all namespaces.
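The three subpage-related properties differ only in where the page name is cut at the "/" separators. The helpers below are hypothetical and operate on a bare page name string (prefixes already removed), assuming subpages are enabled everywhere as the notes above state.

```python
def basepagename_sketch(pagename: str) -> str:
    # Drop the rightmost subpage level; without "/" the name is unchanged.
    return pagename.rsplit("/", 1)[0]


def subpagename_sketch(pagename: str) -> str:
    # Keep only the rightmost subpage level.
    return pagename.rsplit("/", 1)[-1]


def rootpagename_sketch(pagename: str) -> str:
    # Drop all subpage levels.
    return pagename.split("/", 1)[0]


name = "Guide/Install/Linux"
print(basepagename_sketch(name))  # Guide/Install
print(subpagename_sketch(name))   # Linux
print(rootpagename_sketch(name))  # Guide
```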
- property articlepagename
Same as {{ARTICLEPAGENAME}} in MediaWiki.
- property talkpagename
Same as {{TALKPAGENAME}} in MediaWiki.
- property sectionname
The section anchor, usable in wiki links. It is passed through the ws.parser_helpers.encodings._anchor_preprocess() function, but it is neither anchor-encoded nor decoded.
Note
Section anchors on MediaWiki are usually encoded (see ws.parser_helpers.encodings.dotencode()), but decoding is ambiguous due to whitespace squashing and the fact that the escape character itself (i.e. the dot) is not encoded even when followed by two hex characters. As a result, the canonical form of the anchor cannot be determined without comparing to the existing sections of the target page.
This attribute has a setter.
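The decoding ambiguity mentioned in the note can be demonstrated with a simplified dot-encoding built on the standard library. dotencode_sketch is a hypothetical stand-in, not ws.parser_helpers.encodings.dotencode(), and the exact set of characters left unencoded is an assumption.

```python
from urllib.parse import quote


def dotencode_sketch(anchor: str) -> str:
    """Percent-encode the anchor, then use "." as the escape character."""
    # Spaces become underscores; the dot itself is never encoded by quote().
    encoded = quote(anchor.replace(" ", "_"), safe=":")
    return encoded.replace("%", ".")


# Two different section names end up with the same encoded anchor,
# so the encoded form alone cannot be decoded unambiguously.
print(dotencode_sketch("A/B"))     # A.2FB
print(dotencode_sketch("A.2FB"))   # A.2FB
```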
- property leading_colon
Returns ":" if the parsed title had a leading colon (e.g. for links like [[:Category:Foo]]), otherwise it returns an empty string.
- dbtitle(expected_ns=None)
Returns the title formatted for use in the database.
In practice it is something between pagename and fullpagename:
Namespace prefix is stripped if there is no interwiki prefix and the parsed namespace number agrees with expected_ns. This is to cover the creation of new namespaces, e.g. pages Foo:Bar existing first in the main namespace and then moved into a separate namespace, Foo:.
If there is an interwiki prefix or a section name, DatabaseTitleError is raised to prevent unintended data loss.
- Parameters
expected_ns (int) – expected namespace number
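The decision logic described for dbtitle() can be sketched as a standalone function over already-parsed parts. Everything here is hypothetical (the function name, its arguments and the exception class), and the sketch does not cover what the method does when expected_ns is left at its default of None.

```python
class DatabaseTitleErrorSketch(Exception):
    """Hypothetical stand-in for DatabaseTitleError."""


def dbtitle_sketch(iwprefix, namespace, namespacenumber, pagename,
                   sectionname, expected_ns):
    # Interwiki prefixes and section names cannot be stored in the title,
    # so refuse instead of silently dropping them.
    if iwprefix or sectionname:
        raise DatabaseTitleErrorSketch("title has an interwiki prefix or section")
    # The prefix is stripped only when the parsed namespace is the expected one.
    if namespacenumber == expected_ns:
        return pagename
    # Otherwise the prefix stays part of the stored title.
    return f"{namespace}:{pagename}" if namespace else pagename


# Foo:Bar still lives in the main namespace (0), although Foo is now
# recognized as namespace 3000 (a made-up number): the prefix is kept.
print(dbtitle_sketch("", "Foo", 3000, "Bar", "", expected_ns=0))     # Foo:Bar
# After the page was moved into the new namespace, the prefix is stripped.
print(dbtitle_sketch("", "Foo", 3000, "Bar", "", expected_ns=3000))  # Bar
```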
- make_absolute(basetitle)
Changes a relative link to an absolute link. Has no effect if called on an absolute link.
Types of relative links:
same-page section links (e.g. [[#Section name]] on page Base is changed to [[Base#Section name]])
subpages (e.g. [[/Subpage]] on page Base is changed to [[Base/Subpage]])
Note
The $wgNamespacesWithSubpages option is ignored (not available via API anyway); the method behaves as if subpages were enabled for all namespaces.
Known MediaWiki incompatibilities:
We allow links starting with ../ even on top-level pages.
Because we use the os.path module to join the titles, even links starting with / or ../ and containing /./ in the middle are allowed. In MediaWiki such links would be invalid.
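The path-style joining mentioned above can be illustrated on plain strings. The sketch uses posixpath (the POSIX flavour of os.path) so that it behaves the same on every platform; make_absolute_sketch is a hypothetical helper, not the Title method, and it may not match the library in edge cases.

```python
import posixpath


def make_absolute_sketch(link: str, basetitle: str) -> str:
    """Resolve a relative wiki link against the title of the current page."""
    if link.startswith("#"):
        # Same-page section link: prepend the base title.
        return basetitle + link
    if link.startswith("/") or link.startswith("../"):
        # Subpage or parent-relative link: join and normalize like a path.
        joined = posixpath.join(basetitle, link.lstrip("/"))
        return posixpath.normpath(joined)
    # Anything else is treated as an absolute link and left alone.
    return link


print(make_absolute_sketch("#Section name", "Base"))     # Base#Section name
print(make_absolute_sketch("/Subpage", "Base"))          # Base/Subpage
print(make_absolute_sketch("../Sibling", "Base/Child"))  # Base/Sibling
print(make_absolute_sketch("/a/./b", "Base"))            # Base/a/b
```

The last call shows why links containing /./ are accepted: path normalization simply removes the "." component instead of rejecting the link.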
- exception ws.parser_helpers.title.InvalidTitleCharError
Bases: ws.parser_helpers.title.TitleError
Raised when the requested title contains an invalid character.
- exception ws.parser_helpers.title.InvalidColonError
Bases: ws.parser_helpers.title.TitleError
Raised when the requested title contains an invalid colon at the beginning.
- exception ws.parser_helpers.title.DatabaseTitleError
Bases: ws.parser_helpers.title.TitleError
Raised when calling Title.dbtitle() would cause data loss.