You shall not publish: Edit filters on EN Wikipedia

Master Thesis Defence

Lyudmila Vaseva

6 January 2020

Overview

  • Motivation and research questions
  • Analysis sources
  • Findings
  • Directions for future studies

What is an edit filter

Motivation

Rise and decline in numbers of editors on EN Wikipedia

Source: Halfaker et al. "The Rise and Decline of an Open Collaboration System: How Wikipedia’s reaction to popularity is causing its decline"

Research questions

  • Q1: What is the role of edit filters among existing algorithmic quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES)?
  • Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?
  • Q3: Which type of tasks do filters take over?
  • Q4: How have these tasks evolved over time (are there changes in the type, number, etc.)?

Analysis Sources

  • Literature
  • Documentation
  • Data

Q1: What is the role of edit filters among existing algorithmic quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES)?

  • edit filters are triggered before an edit is published
  • they directly disallow certain types of obvious, pervasive (perhaps automated), and difficult-to-remove vandalism
  • can target malicious users directly without restricting everyone (<-> page protection)
  • historically faster and more reliable, because they are a direct part of the core software
  • people fed up with bot governance
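The pre-publication check described above can be sketched in a few lines. This is only an illustrative toy model, not the actual filter implementation on Wikipedia: the names `Edit`, `Filter`, `check_edit`, `is_published`, the two example rules, and the thresholds in them are all hypothetical and chosen for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    # Hypothetical, simplified view of an edit that is about to be saved.
    user: str
    user_editcount: int
    added_text: str


@dataclass
class Filter:
    # A rule: a human-readable name, a condition on the edit, and an action.
    name: str
    condition: Callable[[Edit], bool]
    action: str  # illustrative actions: "tag" or "disallow"


# Two made-up rules; the thresholds are assumptions for illustration only.
FILTERS: List[Filter] = [
    Filter(
        "repeated characters by new user",
        lambda e: e.user_editcount < 10
        and re.search(r"(.)\1{9,}", e.added_text) is not None,
        "disallow",
    ),
    Filter(
        "very short addition",
        lambda e: 0 < len(e.added_text) < 5,
        "tag",
    ),
]


def check_edit(edit: Edit, filters: List[Filter]) -> List[Tuple[str, str]]:
    """Run every filter against the edit BEFORE it is published."""
    return [(f.name, f.action) for f in filters if f.condition(edit)]


def is_published(hits: List[Tuple[str, str]]) -> bool:
    """The edit goes through unless at least one matching filter disallows it."""
    return all(action != "disallow" for _, action in hits)


if __name__ == "__main__":
    vandal = Edit("NewUser", 0, "aaaaaaaaaaaaaaaa")
    hits = check_edit(vandal, FILTERS)
    print(hits, is_published(hits))
```

Because each rule is an explicit, readable condition tied to an explicit action, it is possible to see exactly why a given edit was stopped, which is the property the slides contrast with ML-based approaches.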

Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?

  • introduced before most vandalism fighting ML systems came along
  • rule-based systems are more transparent and accountable
  • easier to work with
  • allow for finer levels of control than ML, e.g. disallowing specific users
  • allow for collaboration more easily

Q3: Which type of tasks do filters take over?

Filter actions for enabled public filters

Filter actions for enabled hidden filters

Distribution of manually assigned labels for enabled filters

Q4: How have these tasks evolved over time (are there changes in the type, number, etc.)?

Number of filter hits per month, Mar 2009-Jan 2019

Number of edits over the years

Number of reverts per month, Jul 2001-Jul 2017

Data source: R.S. Geiger and A. Halfaker. 2017. Code and Datasets for: Operationalizing Conflict and Cooperation Between Automated Software Agents in Wikipedia. Figshare (2017). https://doi.org/10.6084/m9.figshare.5362216

Number of filter hits per month, according to filter action

Number of filter hits per month, according to manually assigned labels

Number of filter hits per month, according to causing editor's action

Directions for future studies

  • Verify results
  • What proportion of quality control work do filters take over?
  • To implement a bot or to implement a filter?
  • What are the repercussions on affected editors?
  • What are the differences between how filters are governed on EN Wikipedia compared to other language versions?

Thank you!

These slides are licensed under the CC BY-SA 4.0 License.

The project is available under: https://git.imp.fu-berlin.de/luvaseva/wikifilters

Questions? Comments? Thoughts?

There are 954 edit filters on EN Wikipedia: roughly 21% of them are active, 16% are disabled, and 63% are deleted

Distribution of detailed manual tags

Funnel diagram of all vandal fighting mechanisms, according to the author