% You shall not publish: Edit filters on EN Wikipedia
% Master Thesis Defence
% Lyudmila Vaseva (Lusy)
Gandalf
- on title slide?
- CI? look up whether there are templates
Overview
- Motivation
- Research Questions
- State of the Literature: what does the scientific community know?
- Documentation: what is an edit filter and why was it introduced, according to the MediaWiki pages and Wikipedia's community?
- Data Analysis: Edit filters on English Wikipedia
- Evaluation/Limitations
- Open questions/Directions for future studies
Motivation
why is it relevant? description of the state of the art; what is the question? Q1-Q4
- Wikipedia is a complex socio-technical system
- we are lucky that it is "open", so we can study it, learn how things work, and apply the insights to less open systems
- "anyone can edit": increasing popularity in 2006 -> increasing need for quality control
- edit filters are one particular quality control mechanism among several, and one previously unstudied
- they seem relevant to understand, since they make it possible to disallow edits (and other actions, but above all edits) from the very beginning
Research questions
Q1: What is the role of edit filters among existing algorithmic quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?
Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?
Q3: Which type of tasks do filters take over?
Q4: How have these tasks evolved over time (are there changes in type, number, etc.)?
Approach / Analysis Sources
- Literature
- Documentation
- Data (Edit filter patterns, DB log table)
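For the DB log table source, a typical first processing step is aggregating filter hits per month. A minimal sketch, assuming the log has been exported to a CSV with a MediaWiki-style timestamp column (the column name `afl_timestamp` follows the AbuseFilter schema; the file layout itself is an assumption):

```python
import csv
from collections import Counter

def hits_per_month(csv_path, ts_column="afl_timestamp"):
    """Count filter hits per month from an exported abuse_filter_log CSV.
    MediaWiki timestamps look like '20160401123456', so the first six
    characters give the year and month."""
    months = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            months[row[ts_column][:6]] += 1
    return months
```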
State of the Scientific Literature

- One thing is conspicuously missing: edit filters
- TODO discuss mechanisms with the help of the summary table from Chapter 2
Q1: What is the role of edit filters among existing algorithmic quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?
- 1st mechanism activated to control quality (at the beginning of the funnel)
- historically: faster, by being a direct part of the core software: disallow even before publishing
- can target malicious users directly without restricting everyone (in contrast to page protection)
- introduced to take care of obvious but cumbersome to remove vandalism
- people were fed up with the bot introduction and development processes (poor quality, no tests, no code available for revision in case of problems), so they came up with a new approach
- filters allow more easily for collaboration
Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?
- introduced before vandalism-fighting ML systems came along (verify!); so historically they were there first; they still work well; don't touch a running system^^
- a gap was perceived in the existing system which was filled with filters
- in functionality: disallow cumbersome vandalism from the start
- in governance: bots are poorly tested, communication and updates are difficult
- volunteer system: people do what they like and can (someone had experience with this type of tech and implemented it that way)
- rule-based systems are more transparent and accountable
- and easier to work with (it is easier to add yet another rule than to tweak parameters in an obscure ML-based approach)
- allow for finer levels of control than ML: e.g. disallowing specific users
Q3: Which type of tasks do filters take over?
- in total, most filters are hidden: i.e. implemented with the purpose of taking care of cumbersome vandalism by specific malicious users
- vandalism/good faith/maintenance
- when a new problem emerges, when is a bot chosen and when a filter? probably depends at least partially on who is handling it; TODO: ask people! (there are bot operators who are also filter managers)
Q4: How have these tasks evolved over time (are there changes in type, number, etc.)?
- filter hit numbers are of the same magnitude as reverts (way higher than initially expected)
- beginning: more good faith, later more vandalism hits (somewhat unexpected)
- surge in 2016 and a subsequently higher baseline in hit numbers (explanation?)
- overall number of active filters stays the same (condition limit)
- most active filters of all times are quite stable through the years
Open Questions / Directions for future studies
- MR is merged: abuse_filter_history table is now available
- How have edit filters' tasks evolved over time?: should be easier to look into with the abuse_filter_history table. When a dump becomes available, an extensive investigation of filters' actions, creation and activation patterns, as well as the patterns they have targeted over time, will be possible.
- What proportion of quality control work do filters take over?: Filter hits can be systematically compared with the number of all edits and reverts via other quality control mechanisms.
- Is it possible to study the filter patterns in a more systematic fashion?
- What can be learnt from this?: For example, it has become apparent that 1/5 of all active filters discriminate against new users via the \verb|!("confirmed" in user_groups)| pattern. Are there other tendencies of interest?
- Is there a qualitative difference between the tasks/patterns of public and hidden filters?: According to the guidelines for filter creation, general filters should be public while filters targeting particular users should be hidden. Is there something more to be learnt from an examination of hidden filters' patterns? Do they actually conform to the guidelines? %One will have to request access to them for research purposes, sign an NDA, etc.
- How are false positives handled?: Have filters been shut down regularly because they matched more false positives than they provided real value? Are there large numbers of false positives that corrupt the filters' hit data and thus the interpretations offered by the current work?
- To implement a bot or to implement a filter?: An ethnographic inquiry: if an editor is simultaneously an edit filter manager and a bot operator, how do they decide, when faced with a new problem, which mechanism to employ for the solution?
- What are the repercussions on affected editors?: An ethnographic study of the consequences of edit filters for editors whose edits are filtered. Do they experience frustration or alienation? Do they understand what is going on? Or do they, for example, experience edit filters' warnings as helpful, appreciate the hints they have been given, and use them to improve their collaboration?
- What are the differences between how filters are governed on EN Wikipedia compared to other language versions?: Different Wikipedia language versions each have a local community behind them. These communities vary, sometimes significantly, in their modes of organisation and values. It would be very insightful to explore disparities between filter governance and the types of filters implemented between different language versions.
- Are edit filters a suitable mechanism for fighting harassment?: A disturbing rise in online personal attacks and harassment is observed in a variety of online spaces, including Wikipedia~\cite{Duggan2014}. The Wikimedia Foundation sought to better understand harassment in their projects via a Harassment Survey conducted in 2015~\cite{Wikimedia:HarassmentSurvey}. According to the edit filter noticeboard archives~\cite{Wikipedia:EditFilterNoticeboardHarassment}, there have been some attempts to combat harassment by means of filters. The tool is also mentioned repeatedly in the timeline of Wikipedia's Community Health Initiative~\cite{Wikipedia:CommunityHealthInitiative} which seeks to reduce harassment and disruptive behaviour on Wikipedia. An evaluation of its usefulness and success at this task would be really interesting.
- (How) has the notion of ``vandalism'' on Wikipedia evolved over time?: By comparing older and newer filters, or respectively updates in filter patterns, it could be investigated whether there has been a qualitative change in the interpretation of the ``vandalism'' notion on Wikipedia.
- What are the urgent situations in which edit filter managers are given the freedom to act as they see fit and ignore best practices of filter adoption (i.e. switch on a filter in log-only mode first and announce it on the noticeboard so others can have a look)? Who determines they are urgent? These cases should be scrutinised extra carefully, since ``urgent situations'' have historically always been an excuse for cuts in civil liberties.
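The pattern question from above can be approached programmatically. A hedged sketch that estimates the share of filters matching the `!("confirmed" in user_groups)` condition, assuming the filter patterns have been exported to one text file per filter (the file layout is hypothetical):

```python
import re
from pathlib import Path

# Condition that excludes confirmed users, i.e. targets new/anonymous editors.
NEW_USER_CONDITION = '!("confirmed" in user_groups)'

def share_targeting_new_users(pattern_dir):
    """Return the fraction of exported filter patterns containing the
    new-user condition. Whitespace inside the condition is normalised,
    since filter managers format patterns freely."""
    normalised_target = re.sub(r"\s+", "", NEW_USER_CONDITION)
    patterns = list(Path(pattern_dir).glob("*.txt"))
    if not patterns:
        return 0.0
    hits = 0
    for p in patterns:
        text = re.sub(r"\s+", "", p.read_text(encoding="utf-8"))
        if normalised_target in text:
            hits += 1
    return hits / len(patterns)
```

The same scan generalises to other conditions of interest (other user groups, namespaces, edit deltas).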
Current Limitations
Data
- Only EN Wikipedia
- abuse_filter_history missing
- no access to hidden filters
Process
- manual filter classification only conducted by me
- no ethnographic analysis; it could answer valuable questions (e.g. bot vs filter?)
- Evaluation: what would I do differently? / what did not go so well
- start writing only after getting hold of all the data
Bigger picture: Upload filters

Thank you!
These slides are licensed under the CC BY-SA 4.0 License.
Questions? Comments? Thoughts?
OLD

Motivation
- What is the role of filters among existing (algorithmic) quality-control mechanisms (bots, semi-automated tools, ORES, humans)? Which type of tasks do filters take over?
- How have these tasks evolved over time (are there changes in type, number, etc.)?
- What are suitable areas of application for rule-based systems such as filters in contrast to the other ML-based approaches?
What is an edit filter
- MediaWiki extension
- regex based filtering of edits and other actions (e.g. account creation, page deletion or move, upload)
- triggers before an edit is published
- different actions can be defined
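The mechanism above can be sketched as a toy pipeline (all names are illustrative; the real AbuseFilter extension uses its own rule language, this only models the flow): each filter pairs a pattern with an action, and every incoming edit is checked before it is saved.

```python
import re
from dataclasses import dataclass

@dataclass
class Filter:
    pattern: str  # regex matched against the edit's added text
    action: str   # e.g. "tag", "warn" or "disallow"

def check_edit(added_text, filters):
    """Return the actions triggered by an edit, evaluated BEFORE the
    edit is published -- the key difference to bots, which revert
    after the fact."""
    triggered = []
    for f in filters:
        if re.search(f.pattern, added_text):
            triggered.append(f.action)
    return triggered

filters = [
    Filter(r"(.)\1{9,}", "warn"),                 # e.g. repeating characters
    Filter(r"(?i)buy cheap viagra", "disallow"),  # obvious spam
]
```

For example, `check_edit("aaaaaaaaaaaa", filters)` returns `["warn"]`, while an unremarkable edit triggers nothing.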
Motivations for its introduction
- disallow directly certain types of obvious, pervasive (perhaps automated) vandalism
- such vandalism takes more than a single click to revert
- human editors can use their time more productively elsewhere
Edit filters in the quality control mechanisms frame
- the question of infrastructure
- guidelines say: for in-depth checks and problems with a particular article, bots are better (they don't use up resources)
- they were introduced before the ML tools came around
- they probably work, so no one sees a reason to shut them down
- hypothesis: Wikipedia is a DIY project driven by volunteers; they work on whatever they like to work on
- hypothesis: it is easier to understand what's going on than it is with an ML tool; people like to use them for simplicity and transparency reasons
- hypothesis: it is easier to set up a filter than to program a bot. Setting up a filter requires "only" an understanding of regular expressions; programming a bot requires knowledge of a programming language and an understanding of the API.
Data Analysis: Edit Filters on EN Wikipedia
What do most active filters do?
- 135 "repeating characters": tag, warn
- 30 "large deletion from article by new editors": tag, warn
- 61 "new user removing references": tag
- 18 "test type edits from clicking on edit bar": (deleted in Feb 2012)
- 3 "new user blanking articles": tag, warn
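Two of these checks are easy to illustrate; the patterns and thresholds below are my guesses, not the actual filter source:

```python
import re

def repeats_characters(added_text, min_run=10):
    """True if any single character is repeated min_run times in a row
    (cf. filter 135 'repeating characters'; run length is a guess)."""
    return re.search(r"(.)\1{%d,}" % (min_run - 1), added_text) is not None

def is_large_deletion(old_size, new_size, threshold=5000):
    """True if the edit removes more than `threshold` bytes
    (cf. filter 30 'large deletion from article by new editors';
    the real filter additionally checks the user's groups)."""
    return old_size - new_size > threshold
```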
Descriptive statistics

all filters: 954
public filters: 361
  active public filters: 110
  disabled (but not deleted) public filters: 35
  deleted public filters: 216
hidden filters: 593
  active hidden filters: 91
  disabled (but not deleted) hidden filters: 118
  deleted hidden filters: 384
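These counts can be sanity-checked: the per-status numbers should sum to the class totals, and public plus hidden should equal all filters (numbers copied from above):

```python
public = {"active": 110, "disabled": 35, "deleted": 216}
hidden = {"active": 91, "disabled": 118, "deleted": 384}
public_total, hidden_total, all_total = 361, 593, 954

assert sum(public.values()) == public_total
assert sum(hidden.values()) == hidden_total
assert public_total + hidden_total == all_total

# Share of hidden filters, rounded to one decimal place
hidden_share = round(100 * hidden_total / all_total, 1)
```

Notably, roughly 62% of all filters are hidden.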
Number of filter hits per month March 2009-March 2019

Filters Actions

Active Public Filters Actions

Active Hidden Filters Actions

Manual classification
vandalism, good faith and maintenance
- difficult to distinguish
- a lot of subcategories
Vandalism

| id  | hits   | public comment |
|----:|-------:|----------------|
| 46  | 356945 | "Poop" vandalism |
| 365 | 85470  | Unusual changes to featured or good content |
| 16  | 2005   | Prolific socker I |

Good Faith

| id  | hits   | public comment |
|----:|-------:|----------------|
| 180 | 175939 | Large unwikified new article |
| 98  | 39401  | Creating very short new article |

Maintenance

| id  | hits   | public comment |
|----:|-------:|----------------|
| 577 | 1566   | VisualEditor bugs: Strange icons |
| 345 | 13832  | Extraneous formatting from browser extension |
| 942 | 1573   | Log edits to protected pages |