ws.parser_helpers.title module
- ws.parser_helpers.title.canonicalize(title)
Return a canonical form of the title, that is:
underscores are replaced with spaces,
leading and trailing whitespace is stripped,
consecutive spaces are squashed,
first letter is capitalized.
Note
The interwiki and namespace prefixes are not split off; canonicalization is applied to the passed title as a whole.
- Parameters
title – a str or mwparserfromhell.wikicode.Wikicode object
- Returns
a str object
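The canonicalization rules listed above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the module's code: the name canonicalize_sketch is hypothetical, it accepts only a plain str (the documented function also accepts Wikicode), and it squashes only ASCII spaces, which may differ from how the real function treats other whitespace.

```python
import re


def canonicalize_sketch(title: str) -> str:
    """Hypothetical re-implementation of the documented canonicalization rules."""
    # Underscores and spaces are interchangeable in MediaWiki titles.
    title = title.replace("_", " ")
    # Strip leading and trailing whitespace.
    title = title.strip()
    # Squash runs of consecutive spaces into a single space.
    title = re.sub(" +", " ", title)
    # Capitalize only the first character; the rest is left untouched.
    return title[:1].upper() + title[1:]


print(canonicalize_sketch("  foo__bar_baz "))  # Foo bar baz
print(canonicalize_sketch("help:foo_bar"))     # Help:foo bar
```

The second call illustrates the note above: the namespace prefix is not split off, so only the first letter of the whole string is capitalized.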
- class ws.parser_helpers.title.Context(interwikimap, namespacenames, namespaces, legaltitlechars)
Bases: object
A context class for the Title parser. The parameters can be fetched either from the API or Database class:
- Parameters
interwikimap (dict) – a mapping representing the data from MediaWiki’s interwiki table
namespacenames (dict) – a dictionary mapping namespace names to numbers
namespaces (dict) – a dictionary mapping namespace numbers to dictionaries providing details about the namespace, such as names or case-sensitivity
legaltitlechars (str) – a string of characters which are allowed to occur in page titles
Normally, the user does not interact with the Context class. Both the API and Database classes provide shortcut functions (API.Title and Database.Title, respectively) which construct the necessary context and pass it to the Title class.
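The namespacenames and namespaces parameters describe the same data from two directions. As an illustration, the following sketch derives a name-to-number mapping from a number-to-details mapping. The helper build_namespacenames is hypothetical, and the input layout (a localized "name", an optional English "canonical" name, and alias records with "id" and "alias" keys, loosely modelled on the MediaWiki API siteinfo output) is an assumption; the exact structures expected by Context may differ. The localized names in the example data are made up.

```python
def build_namespacenames(namespaces, aliases=()):
    """Invert a number -> details mapping into a name -> number mapping.

    Assumed input layout: each details dict has a localized "name" and,
    optionally, an English "canonical" name; each alias is a dict with
    "id" and "alias" keys.
    """
    names = {}
    for number, details in namespaces.items():
        names[details["name"]] = number
        if "canonical" in details:
            names[details["canonical"]] = number
    for alias in aliases:
        names[alias["alias"]] = alias["id"]
    return names


namespaces = {
    0: {"name": "", "case": "first-letter"},
    1: {"name": "Discussion", "canonical": "Talk", "case": "first-letter"},
    4: {"name": "MyWiki", "canonical": "Project", "case": "first-letter"},
}
aliases = [{"id": 4, "alias": "MW"}]
namespacenames = build_namespacenames(namespaces, aliases)
print(namespacenames)
# {'': 0, 'Discussion': 1, 'Talk': 1, 'MyWiki': 4, 'Project': 4, 'MW': 4}
```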
- class ws.parser_helpers.title.Title(context, title)
Bases: object
A helper class intended for easy manipulation of wiki titles. Title parsing complies with the rules used in the MediaWiki code, and the interface is inspired by the magic words. Besides namespace detection, we also parse interwiki prefixes, which is useful for parsing wiki links at a lower level than what mwparserfromhell provides (it does not take the wiki configuration into account). The functionality depends on the Context class for the validation of interwiki and namespace prefixes.
- Parameters
context (Context) – a context object for the parser
title – a str or mwparserfromhell.wikicode.Wikicode object
The title is parsed by the parse() method.
- parse(full_title)
Splits the title into (iwprefix, namespace, pagename, sectionname) parts and canonicalizes them. It can be used to set these attributes from a full title string instead of creating a new instance.
- Parameters
full_title – the full title to be parsed, either a str or mwparserfromhell.wikicode.Wikicode object
- Raises
InvalidTitleCharError – when the page title is not valid
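The splitting step of parse() can be illustrated with a simplified sketch. It is not the module's algorithm: split_title_sketch is a hypothetical name, the interwiki prefixes are assumed to be given as a set of lowercase strings, and canonicalization, character validation and leading-colon tracking are omitted. The sketch checks for a namespace before an interwiki prefix, as MediaWiki does, and leaves anything after an interwiki prefix in the page name because the namespaces of a foreign wiki are unknown.

```python
def split_title_sketch(full_title, interwiki_prefixes, namespacenames):
    """Split a title into (iwprefix, namespace, pagename, sectionname).

    interwiki_prefixes: set of lowercase prefixes (assumed input format).
    namespacenames: dict mapping namespace names to numbers.
    """
    # The section anchor is everything after the first "#".
    title, _, sectionname = full_title.partition("#")
    iwprefix = ""
    namespace = ""
    # Namespace prefixes match case-insensitively.
    ns_lookup = {name.lower(): name for name in namespacenames}
    head, sep, rest = title.partition(":")
    key = head.strip().lower()
    if sep and key in ns_lookup:
        # Local namespace prefix (checked before interwiki prefixes).
        namespace = ns_lookup[key]
        title = rest
    elif sep and key in interwiki_prefixes:
        # Interwiki prefixes are case-insensitive and stored lowercase; the
        # rest stays in the page name because foreign namespaces are unknown.
        iwprefix = key
        title = rest
    return iwprefix, namespace, title.strip(), sectionname.strip()


interwiki = {"wikipedia", "de"}
names = {"": 0, "Talk": 1, "Help": 12}
print(split_title_sketch("help:Foo bar#Intro", interwiki, names))
# ('', 'Help', 'Foo bar', 'Intro')
print(split_title_sketch("Wikipedia:Help:Foo", interwiki, names))
# ('wikipedia', '', 'Help:Foo', '')
```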
- format(*, iwprefix=False, namespace=False, sectionname=False, colon=False)
General formatting method.
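Since the flags of format() are keyword-only booleans, one plausible reading is that each flag switches on one part of the output. The sketch below encodes that reading; format_sketch, its parts argument and the exact joining rules (including that colon=True always prepends a colon) are assumptions for illustration, not the method's documented behavior.

```python
def format_sketch(parts, *, iwprefix=False, namespace=False,
                  sectionname=False, colon=False):
    """Join (iwprefix, namespace, pagename, sectionname) parts.

    Each flag switches on the corresponding part, mirroring the
    keyword-only flags of Title.format(); the joining rules are assumed.
    """
    iw, ns, pagename, section = parts
    title = pagename
    if namespace and ns:
        title = f"{ns}:{title}"
    if iwprefix and iw:
        title = f"{iw}:{title}"
    if sectionname and section:
        title = f"{title}#{section}"
    if colon:
        title = ":" + title
    return title


parts = ("wikipedia", "Help", "Foo", "Intro")
print(format_sketch(parts))                  # Foo
print(format_sketch(parts, namespace=True))  # Help:Foo
print(format_sketch(parts, iwprefix=True, namespace=True, sectionname=True))
# wikipedia:Help:Foo#Intro
```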
- property iwprefix
The interwiki prefix of the title.
This attribute has a setter which raises ValueError when the supplied interwiki prefix is not valid. Note that interwiki prefixes are case-insensitive and the canonical representation stored in the object is lowercase.
- property namespace
Same as {{NAMESPACE}} in MediaWiki. This attribute has a setter which raises ValueError when the supplied namespace is not valid.
- property namespacenumber
Same as {{NAMESPACENUMBER}} in MediaWiki.
- property articlespace
Same as {{ARTICLESPACE}} in MediaWiki.
- property talkspace
Same as {{TALKSPACE}} in MediaWiki.
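ARTICLESPACE and TALKSPACE rest on MediaWiki's namespace numbering convention: every subject namespace has an even number and its talk namespace is the next odd number, while the negative virtual namespaces (Special, Media) have no talk namespace. The hypothetical helpers below show only this arithmetic; a full implementation would then look up the names in the Context's namespaces mapping, and the module may compute them differently.

```python
def talk_namespace(number: int) -> int:
    """Return the talk namespace number paired with namespace `number`."""
    if number < 0:
        # Special (-1) and Media (-2) have no talk namespace.
        raise ValueError(f"namespace {number} has no talk namespace")
    # Setting the lowest bit maps 0 -> 1, 12 -> 13 and keeps odd numbers.
    return number | 1


def subject_namespace(number: int) -> int:
    """Return the subject (article) namespace number paired with `number`."""
    if number < 0:
        # Virtual namespaces are their own subject namespace.
        return number
    # Clearing the lowest bit maps 1 -> 0, 13 -> 12 and keeps even numbers.
    return number & ~1


print(talk_namespace(0), talk_namespace(12), subject_namespace(13))  # 1 13 12
```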
- property pagename
Same as {{PAGENAME}} in MediaWiki: drops the interwiki and namespace prefixes. The section anchor is not included. Other *pagename attributes are based on this attribute. This attribute has a setter which calls parse() to split the supplied page name.
- property fullpagename
Same as {{FULLPAGENAME}} in MediaWiki, but also includes the interwiki prefix (if any).
- property basepagename
Same as {{BASEPAGENAME}} in MediaWiki: drops the interwiki and namespace prefixes and the rightmost subpage level.
Note
The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.
- property subpagename
Same as {{SUBPAGENAME}} in MediaWiki: returns the rightmost subpage level.
Note
The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.
- property rootpagename
Same as {{ROOTPAGENAME}} in MediaWiki: drops the interwiki and namespace prefixes and all subpage levels.
Note
The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the property behaves as if subpages were enabled for all namespaces.
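The three subpage-related properties can be illustrated with plain string splitting on "/", again behaving as if subpages were enabled everywhere. subpage_parts is a hypothetical helper operating on an already-parsed page name (without prefixes), not the module's code.

```python
def subpage_parts(pagename: str):
    """Return (basepagename, subpagename, rootpagename) for a page name."""
    base, sep, sub = pagename.rpartition("/")
    if not sep:
        # A page without subpage levels is its own base and subpage.
        base = sub = pagename
    # The root is everything before the first "/".
    root = pagename.split("/", 1)[0]
    return base, sub, root


print(subpage_parts("Foo/Bar/Baz"))  # ('Foo/Bar', 'Baz', 'Foo')
print(subpage_parts("Foo"))          # ('Foo', 'Foo', 'Foo')
```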
- property articlepagename
Same as {{ARTICLEPAGENAME}} in MediaWiki.
- property talkpagename
Same as {{TALKPAGENAME}} in MediaWiki.
- property sectionname
The section anchor, usable in wiki links. It is passed through the ws.parser_helpers.encodings._anchor_preprocess() function, but it is neither anchor-encoded nor decoded.
Note
Section anchors on MediaWiki are usually encoded (see ws.parser_helpers.encodings.dotencode()), but decoding is ambiguous due to whitespace squashing and the fact that the escape character itself (i.e. the dot) is not encoded even when followed by two hex characters. As a result, the canonical form of the anchor cannot be determined without comparing it to the existing sections of the target page.
This attribute has a setter.
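To see why decoding is ambiguous, consider a sketch of MediaWiki's legacy dot-encoding of anchors: spaces become underscores, the UTF-8 bytes are percent-encoded, "%3A" is turned back into ":" and every remaining "%" becomes a dot. dotencode_sketch is a hypothetical approximation written for this illustration and may differ in corner cases; it is not the implementation of ws.parser_helpers.encodings.dotencode().

```python
from urllib.parse import quote


def dotencode_sketch(anchor: str) -> str:
    """Approximate MediaWiki's legacy ("dot-encoded") anchor encoding."""
    # Spaces become underscores before encoding.
    encoded = quote(anchor.replace(" ", "_"), safe="")
    # Colons are kept readable; all other escapes use "." instead of "%".
    return encoded.replace("%3A", ":").replace("%", ".")


print(dotencode_sketch("Q: é"))   # Q:_.C3.A9
print(dotencode_sketch("a/b"))    # a.2Fb
print(dotencode_sketch("a.2Fb"))  # a.2Fb
```

The last two anchors encode to the same string because the dot itself is never escaped, so a decoder cannot tell whether ".2F" stood for "/" or was literal text.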
- property leading_colon
Returns ":" if the parsed title had a leading colon (e.g. for links like [[:Category:Foo]]), otherwise it returns an empty string.
- dbtitle(expected_ns=None)
Returns the title formatted for use in the database.
In practice it is something between pagename and fullpagename:
The namespace prefix is stripped if there is no interwiki prefix and the parsed namespace number agrees with expected_ns. This covers the creation of new namespaces, e.g. pages Foo:Bar existing first in the main namespace and then moved into a separate namespace, Foo:.
If there is an interwiki prefix or a section name, DatabaseTitleError is raised to prevent unintended data loss.
- Parameters
expected_ns (int) – expected namespace number
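The rules of dbtitle() can be restated as a small decision function. This sketch works on already-parsed fields instead of a Title object; dbtitle_sketch and its parameters are hypothetical, the local DatabaseTitleError class merely stands in for the module's exception, and any further formatting the real method may apply is not reproduced.

```python
class DatabaseTitleError(Exception):
    """Local stand-in for ws.parser_helpers.title.DatabaseTitleError."""


def dbtitle_sketch(iwprefix, namespacenumber, namespace, pagename,
                   sectionname, expected_ns=None):
    """Return the database form of a parsed title (illustrative only)."""
    if iwprefix or sectionname:
        # Dropping either part silently would lose information.
        raise DatabaseTitleError("title has an interwiki prefix or a section")
    if namespacenumber == expected_ns:
        # The namespace is implied by the caller, so strip the prefix.
        return pagename
    # Otherwise keep the prefix; the main namespace has none.
    return f"{namespace}:{pagename}" if namespace else pagename


# "Foo:Bar" parsed into a new namespace "Foo" (number 100 is made up):
print(dbtitle_sketch("", 100, "Foo", "Bar", "", expected_ns=100))  # Bar
print(dbtitle_sketch("", 100, "Foo", "Bar", "", expected_ns=0))    # Foo:Bar
```

The second call corresponds to a page that still lives in the main namespace under its old full name.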
- make_absolute(basetitle)
Changes a relative link to an absolute link. Has no effect if called on an absolute link.
Types of relative links:
same-page section links (e.g. [[#Section name]] on page Base is changed to [[Base#Section name]])
subpage links (e.g. [[/Subpage]] on page Base is changed to [[Base/Subpage]])
Note
The $wgNamespacesWithSubpages option is ignored (it is not available via the API anyway); the method behaves as if subpages were enabled for all namespaces.
Known MediaWiki incompatibilities:
We allow links starting with ../ even on top-level pages.
Because we use the os.path module to join the titles, even links starting with / or ../ and containing /./ in the middle are allowed. In MediaWiki such links would be invalid.
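The resolution of relative links can be sketched with path arithmetic. make_absolute_sketch is a hypothetical helper that takes the link target as a string; it uses posixpath instead of os.path so that "/" stays the separator on every platform (os.path on Windows would use backslashes). Like the documented method, it accepts "../" on top-level pages and silently collapses "/./".

```python
import posixpath


def make_absolute_sketch(link: str, basetitle: str) -> str:
    """Resolve a relative link target against the page title `basetitle`."""
    if link.startswith("#"):
        # Same-page section link: prepend the current page title.
        return basetitle + link
    if link.startswith("/"):
        # Subpage link: append it to the current page title.
        return posixpath.normpath(basetitle + link)
    if link.startswith("../"):
        # Parent-relative link: let normpath resolve the ".." segments.
        return posixpath.normpath(basetitle + "/" + link)
    # Anything else is already absolute and is returned unchanged.
    return link


print(make_absolute_sketch("#Section name", "Base"))   # Base#Section name
print(make_absolute_sketch("/Subpage", "Base"))        # Base/Subpage
print(make_absolute_sketch("../Sibling", "Base/Sub"))  # Base/Sibling
```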
- exception ws.parser_helpers.title.InvalidTitleCharError
Bases: ws.parser_helpers.title.TitleError
Raised when the requested title contains an invalid character.
- exception ws.parser_helpers.title.InvalidColonError
Bases: ws.parser_helpers.title.TitleError
Raised when the requested title contains an invalid colon at the beginning.
- exception ws.parser_helpers.title.DatabaseTitleError
Bases: ws.parser_helpers.title.TitleError
Raised when calling Title.dbtitle() would cause data loss.
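Because all three exceptions derive from TitleError, callers can handle any title problem with a single except clause. The sketch below mirrors the documented hierarchy with local stand-in classes (TitleError is assumed to derive from Exception) and hypothetical parse_or_none and toy_parse helpers; real code would import the classes from ws.parser_helpers.title instead.

```python
class TitleError(Exception):
    """Assumed common base class of the title exceptions."""


class InvalidTitleCharError(TitleError):
    """The requested title contains an invalid character."""


class InvalidColonError(TitleError):
    """The requested title contains an invalid colon at the beginning."""


class DatabaseTitleError(TitleError):
    """Calling Title.dbtitle() would cause data loss."""


def parse_or_none(parse, text):
    """Call a title parser and map any TitleError to None."""
    try:
        return parse(text)
    except TitleError:
        return None


def toy_parse(text):
    # Hypothetical parser rejecting a few characters MediaWiki forbids in titles.
    if any(char in text for char in "<>[]{}|"):
        raise InvalidTitleCharError(text)
    return text


print(parse_or_none(toy_parse, "Foo"))   # Foo
print(parse_or_none(toy_parse, "Foo<"))  # None
```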