ws.interlanguage.InterlanguageLinks module
- class ws.interlanguage.InterlanguageLinks.InterlanguageLinks(api)
Bases: object
Update interlanguage links on ArchWiki based on the following algorithm:
1. Fetch the list of all pages with prop=langlinks in order to build a langlink graph (kept separate from the content dict for quick searching).
2. Group pages into families based on their title, which is the primary key denoting a family. The grouping is case-insensitive and includes even pages without any interlanguage links. The family name corresponds to the only English page in the family (or, when there is none, to the English base of the localized title).
3. For every page on the wiki:
   1. Determine the family of the page.
   2. Assemble the set of pages in the family. This is done by first including the pages in the group from step 2, then pulling any internal langlinks from the pages in the set (in unspecified order), and finally, depending on the presence of an English page in the family:
      - If there is an English page directly in the group from step 2, or if other pages link to an English page whose group can be merged with the current group without causing a conflict, its external langlinks are pulled in. As a result, external langlinks removed from the English page are assumed to be invalid and are also removed from the other pages in the family. For consistency, internal langlinks are also pulled from the English page.
      - If no pulling from an English page was done, external langlinks are pulled from the other pages (in unspecified order), which completes the previous inclusion of internal langlinks.
   3. Check whether the page needs to be updated by comparing the new set of langlinks for the page (i.e. family.titles - {title}) with the old set obtained from the wiki's API. If an update is needed:
      1. Fetch the content of the page.
      2. Update the langlinks of the page.
      3. If there is a difference, save the page.
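The grouping in step 2 and the update check in step 3 can be sketched in a few lines. This is only an illustration, not the module's implementation: it assumes localized titles follow a "Title (Language)" pattern, the LANGUAGES table is a tiny illustrative sample, and split_title, group_families and needs_update are hypothetical helper names. Langlinks are also simplified to plain titles here.

```python
import re
from collections import defaultdict

# Assumption: a small illustrative sample of language names and tags.
LANGUAGES = {"Česky": "cs", "Deutsch": "de", "Español": "es"}
# Assumption: localized titles end with " (Language)".
_SUFFIX = re.compile(r"^(?P<base>.+?) \((?P<lang>[^()]+)\)$")


def split_title(title):
    """Hypothetical helper: return (english_base, language_tag)."""
    match = _SUFFIX.match(title)
    if match and match.group("lang") in LANGUAGES:
        return match.group("base"), LANGUAGES[match.group("lang")]
    # No recognized language suffix: treat the page as English.
    return title, "en"


def group_families(titles):
    """Step 2: group titles case-insensitively by their English base."""
    families = defaultdict(set)
    for title in titles:
        base, _tag = split_title(title)
        families[base.lower()].add(title)
    return dict(families)


def needs_update(family_titles, title, old_langlinks):
    """Step 3.3: compare family.titles - {title} with the old set."""
    return family_titles - {title} != set(old_langlinks)


pages = ["Beginners' guide", "Beginners' Guide (Česky)", "Pacman (Deutsch)"]
families = group_families(pages)
```

Note that "Pacman (Deutsch)" still forms a family even though no English page exists, which mirrors the fallback to the English base of the localized title.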
- content_namespaces = [0, 4, 10, 12, 14]
- edit_summary = 'update interlanguage links'
- property allpages
- property wrapped_titles
- titles_in_family(master_title)
Get the titles in the family corresponding to master_title.
- Parameters
master_title (str) – a page title (does not have to be an English page)
- Returns
a (titles, tags) tuple, where titles is the set of titles in the family (including title) and tags is the set of corresponding language tags
- get_langlinks(full_title)
Uses self.titles_in_family() to get the titles of all pages in the family, removes the link to the passed title and sorts the list by the language subtag.
- Returns
a list of (tag, title) tuples
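The filtering and sorting described above might look roughly like the following sketch. It does not call the real class: langlinks_for is a hypothetical stand-in, and representing a family as a dict from language tag to title is an assumption made for illustration.

```python
def langlinks_for(full_title, family):
    """Hypothetical stand-in for the described behaviour.

    family is assumed to map language tag -> page title.
    """
    # Drop the link that would point back to the page itself.
    links = [(tag, title) for tag, title in family.items() if title != full_title]
    # Sort by the language subtag, the first element of each tuple.
    return sorted(links, key=lambda link: link[0])


family = {"en": "Pacman", "de": "Pacman (Deutsch)", "cs": "Pacman (Česky)"}
links = langlinks_for("Pacman (Deutsch)", family)
```

With this input, links contains the Czech and English entries in that order, and the German page does not link to itself.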
- static update_page(title, text, langlinks, weak_update=True)
- Parameters
title (str) – title of the page
text (str) – wikitext of the page
langlinks – a sorted list of (tag, title) tuples as obtained from self.get_langlinks()
weak_update – When True, the langlinks present on the page are mixed with those suggested by family_titles. This is necessary only when there are multiple “intersecting” families, in which case the intersection should be preserved and solved manually. This is reported in _merge_families.
- Returns
updated wikicode
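The effect of weak_update can be illustrated on plain (tag, title) tuples. This is a minimal sketch under the assumption that "mixed" means a set union. The function merge_langlinks is hypothetical and does not parse or rewrite wikitext the way update_page does.

```python
def merge_langlinks(page_langlinks, suggested, weak_update=True):
    """Hypothetical sketch of how the final langlink list could be chosen."""
    if weak_update:
        # Keep links already on the page and add the suggested ones.
        merged = set(page_langlinks) | set(suggested)
    else:
        # Trust only the suggested links; others on the page are dropped.
        merged = set(suggested)
    # Tuples sort by tag first, then by title.
    return sorted(merged)


present = [("de", "Foo (Deutsch)"), ("fr", "Foo (Français)")]
suggested = [("cs", "Foo (Česky)"), ("de", "Foo (Deutsch)")]
weak = merge_langlinks(present, suggested, weak_update=True)
strong = merge_langlinks(present, suggested, weak_update=False)
```

In the weak case the French link survives although the family does not suggest it, which is the kind of leftover that would then have to be resolved manually.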
- update_allpages()
- find_orphans()
Returns a list of pages that are alone in their families.
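As a rough illustration of what "alone in their families" means, the sketch below scans a mapping of family name to set of titles. The mapping shape and the function name orphans_in are assumptions for this example, not the module's internal data structures.

```python
def orphans_in(families):
    """Hypothetical sketch: titles whose family has exactly one member."""
    return sorted(
        title
        for titles in families.values()
        if len(titles) == 1
        for title in titles
    )


families = {
    "pacman": {"Pacman", "Pacman (Deutsch)"},
    "xorg": {"Xorg (Česky)"},
    "sudo": {"Sudo"},
}
orphans = orphans_in(families)
```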
- rename_non_english()