ws.db.schema module

Known incompatibilities with the MediaWiki schema:

  • The schema is not binary compatible with MediaWiki, but it stores the same data. Compatibility can therefore be achieved via the wiki-scripts <-> MWAPI interface, but wiki-scripts can’t read a MediaWiki database directly. That would not be possible even in theory, since the database can contain serialized PHP objects etc.
  • Added some custom tables.
  • Enforced foreign key constraints, including namespaces stored in custom tables, and some check constraints.
  • Columns not available via the API (e.g. user passwords) are nullable, since they are not part of the mirroring process. Likewise revision.rev_text_id is nullable so that we can sync metadata and text separately.
  • The user_groups table has a primary key to avoid duplicate entries.
  • Removed columns that were deprecated even in MediaWiki:
    • page.page_restrictions
    • archive.ar_text
    • archive.ar_flags
  • Reordered columns in archive table to match the revision table.
  • Revamped the protected_titles table: removed the unnecessary columns pt_user, pt_reason and pt_timestamp, since the information can be found in the logging table. See https://phabricator.wikimedia.org/T65318#2654217 for reference.
  • Boolean columns use the Boolean type instead of SmallInteger, which MediaWiki uses.
  • Unknown/invalid IDs are represented by NULL instead of 0, except for user_id, where we add a dummy user with id = 0 to represent anonymous users.
  • Removed default values from all timestamp columns.
  • Removed silly default values: if we don’t know, let’s make it NULL.
  • Revamped the tags tables:
    • Besides the tag name, we need to store everything that MediaWiki generates or stores elsewhere.
    • The change_tag table was split into tagged_recentchange, tagged_logevent, tagged_revision and tagged_archived_revision. Foreign keys on the other tables are enforced.
    • There is no equivalent of the tag_summary table; we can live with GROUP BY queries instead.
  • Various notes on tables used by MediaWiki, but not wiki-scripts:
    • site_stats: we don’t sync the site stats because the values are inconsistent even in MediaWiki
    • sites, site_identifiers: as of MW 1.28, they are not visible via the API
    • job, objectcache, querycache*, transcache, updatelog: not needed for wiki-scripts operation
    • user_former_groups: used only to prevent user auto-promotion into groups from which they were already removed; not visible through the API
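
Several of the points above (enforced foreign keys, the primary key on user_groups, NULL for unknown IDs with a dummy user 0, and GROUP BY in place of tag_summary) can be illustrated with a small, self-contained sketch. It uses Python's standard sqlite3 module rather than the project's actual database layer. The tables are heavily simplified, and the tgrev_* column names and the helper functions are hypothetical, chosen only for this example.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real schema has many more columns.
DDL = """
CREATE TABLE user (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL UNIQUE
);
CREATE TABLE user_groups (
    ug_user  INTEGER NOT NULL REFERENCES user (user_id),
    ug_group TEXT NOT NULL,
    PRIMARY KEY (ug_user, ug_group)        -- no duplicate memberships
);
CREATE TABLE revision (
    rev_id      INTEGER PRIMARY KEY,
    rev_user    INTEGER REFERENCES user (user_id),  -- NULL means unknown
    rev_text_id INTEGER                             -- NULL until text is synced
);
CREATE TABLE tag (
    tag_id   INTEGER PRIMARY KEY,
    tag_name TEXT NOT NULL UNIQUE
);
CREATE TABLE tagged_revision (
    tgrev_rev_id INTEGER NOT NULL REFERENCES revision (rev_id),
    tgrev_tag_id INTEGER NOT NULL REFERENCES tag (tag_id),
    PRIMARY KEY (tgrev_rev_id, tgrev_tag_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(DDL)
    # Dummy user with id 0 stands for anonymous users.
    conn.executemany("INSERT INTO user VALUES (?, ?)", [(0, "Anonymous"), (1, "Alice")])
    conn.execute("INSERT INTO user_groups VALUES (1, 'sysop')")
    conn.executemany("INSERT INTO revision VALUES (?, ?, NULL)", [(10, 0), (11, 1)])
    conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "mobile edit"), (2, "visualeditor")])
    conn.executemany("INSERT INTO tagged_revision VALUES (?, ?)", [(10, 1), (10, 2), (11, 1)])
    conn.commit()
    return conn


def is_rejected(conn, sql):
    """Return True if the statement violates a constraint, False if it succeeds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


def tags_per_revision(conn):
    """Aggregate tag names per revision with GROUP BY (no tag_summary table)."""
    rows = conn.execute(
        """
        SELECT tgrev_rev_id, group_concat(tag_name, '|')
        FROM tagged_revision JOIN tag ON tag_id = tgrev_tag_id
        GROUP BY tgrev_rev_id
        """
    )
    # Sort in Python, since group_concat does not guarantee an order.
    return {rev_id: sorted(names.split("|")) for rev_id, names in rows}


if __name__ == "__main__":
    conn = build_demo_db()
    print(is_rejected(conn, "INSERT INTO user_groups VALUES (1, 'sysop')"))   # duplicate
    print(is_rejected(conn, "INSERT INTO revision VALUES (12, 99, NULL)"))    # no user 99
    print(tags_per_revision(conn))
```

In this sketch a duplicate group membership and a revision pointing at a nonexistent user are both rejected by the database itself, while a revision with rev_user set to NULL is accepted. This is the reason for using NULL rather than 0 for unknown IDs once foreign keys are enforced.
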
ws.db.schema.create_custom_tables(metadata)
ws.db.schema.create_site_tables(metadata)
ws.db.schema.create_recentchanges_tables(metadata)
ws.db.schema.create_users_tables(metadata)
ws.db.schema.create_revisions_tables(metadata)
ws.db.schema.create_pages_tables(metadata)
ws.db.schema.create_recomputable_tables(metadata)
ws.db.schema.create_multimedia_tables(metadata)
ws.db.schema.create_tables(metadata)