``Abuse Filter is enabled'' reads the title of one of the eight stories of the March 23rd 2009 issue of English Wikipedia's community newspaper, The Signpost~\cite{Signpost2009}.
``Abuse Filter is enabled'' reads the title of one of the eight stories of the 23 March 2009 issue of English Wikipedia's community newspaper, The Signpost~\cite{Signpost2009}.
``The extension allows all edits to be checked against automatic filters and heuristics, which can be set up to look for patterns of vandalism including page move vandalism and juvenile-type vandalism, as well as common newbie mistakes,'' the article proclaims.
The extension, or at least its end-user-facing parts, was later renamed to ``edit filter'' so as not to characterise false positives as ``abuse'' and thus alienate good-faith editors striving to improve the encyclopedia~\cite{Wikipedia:EditFilter,Wikipedia:EditFilterTalkArchiveNameChange}.
...
...
@@ -48,7 +48,7 @@ We discuss (who is in) the edit filter manager group in section~\ref{section:who
For illustration purposes, let us have a closer look at what a single edit filter looks like.
Edit filter with ID 365 is public
\footnote{There are also private (hidden) filters. The distinction is discussed in more detail in sections~\ref{section:4-history} and \ref{sec:public-hidden}.}
and currently enabled (as of June 30th 2019).
and currently enabled (as of 30 June 2019).
This means the filter is working and everyone interested can view the filter's details.
Its description reads ``Unusual changes to featured or good content''.
The filter pattern is:
...
...
@@ -72,7 +72,7 @@ For that they need to have been registered for at least four days and have made
yet tries to edit a page in the article namespace which contains ``Featured'' or ``Good article'' and they either insert a redirect, delete 3/4 of the content or add 3/4 on top, the edit is automatically disallowed.
Note that an edit filter manager can easily change the action of the filter (or its pattern, for that matter).
The filter was last modified on October 23rd 2018.
The filter was last modified on 23 October 2018.
All these details can be viewed on the filter's detail page~\cite{Wikipedia:EditFilter365}
or in the screenshot thereof (figure~\ref{fig:filter-details}) that I created for convenience.
...
...
@@ -107,7 +107,7 @@ Every update of a filter action, pattern, comments or other flags (whether the f
And every time a filter matches, the editor's action that triggered it as well as further data such as the user who triggered the filter, their IP address, a diff of the edit (if it was an edit), a timestamp, the title of the page the user was looking at, etc. are logged in \emph{abuse\_filter\_log}.
Most frequently, edit filters are triggered by new edits; there are, however, further editor actions that can trip an edit filter.
As of June 30th 2019, these include: \emph{edit}, \emph{move}, \emph{delete}, \emph{createaccount}, \emph{autocreateaccount}, \emph{upload}, \emph{stashupload}\footnote{See line 181 in \url{https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/special/SpecialAbuseLog.php}}.
As of 30 June 2019, these include: \emph{edit}, \emph{move}, \emph{delete}, \emph{createaccount}, \emph{autocreateaccount}, \emph{upload}, \emph{stashupload}\footnote{See line 181 in \url{https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/special/SpecialAbuseLog.php}}.
%TODO explain what the actions are, especially the less obvious ones such as `autocreateaccount'
Historically, further editor actions such as \emph{feedback}, \emph{gatheredit} and \emph{moodbar} could trigger an edit filter.
These have since been deprecated. %TODO explain why? I have the guess that these are not available in the software anymore (generally, not only for the edit filters)
...
...
@@ -277,8 +277,8 @@ A discussion is held there, usually for 7 days, before a decision is reached~\ci
Since 2017, when the ``edit filter helper'' group was introduced (editors in this group have the \emph{abusefilter-view-private} permission)~\cite{Wikipedia:EditFilterHelper},
the usual process seems to be that editors request this right first and only later the full \emph{abusefilter-modify} permissions\footnote{That is the tendency we observe at the Edit filter noticeboard~\cite{Wikipedia:EditFilterNoticeboard}.}.
According to the edit filter managers list for the EN Wikipedia~\cite{Wikipedia:EditFilterManagersList}, as of May 10, 2019 there are 154 users in this group
\footnote{For comparison, as of March 9, 2019 there are 1181 admins~\cite{Wikipedia:Admins}. The role does not exist at all on the German, Spanish and Russian Wikipedias where all administrators have the \emph{abusefilter-modify} permission~\cite{Wikipedia:EditFilterDE}, \cite{Wikipedia:EditFilterES}, \cite{Wikipedia:EditFilterRU}.}.
According to the edit filter managers list for the EN Wikipedia~\cite{Wikipedia:EditFilterManagersList}, as of 10 May 2019 there are 154 users in this group
\footnote{For comparison, as of 9 March 2019 there are 1181 admins~\cite{Wikipedia:Admins}. The role does not exist at all on the German, Spanish and Russian Wikipedias where all administrators have the \emph{abusefilter-modify} permission~\cite{Wikipedia:EditFilterDE}, \cite{Wikipedia:EditFilterES}, \cite{Wikipedia:EditFilterRU}.}.
Out of the 154 edit filter managers, only 11 are not administrators (though most of these belong to other privileged groups such as ``rollbacker'', ``pending changes reviewer'', ``extended confirmed user'' and similar).
The edit filter managers group is quite stable, with only 4 users having become edit filter managers since November 2016 (according to the archives of the edit filter noticeboard where the permission is requested)~\cite{Wikipedia:EditFilterNoticeboard}.
@@ -14,7 +14,7 @@ Section~\ref{sec:patterns} studies characteristics of the edit filters in genera
\section{Data}
\label{sec:overview-data}
A big part of the present analysis is based upon the \emph{abuse\_filter} table from \emph{enwiki\_p} (the database which stores data for the EN Wikipedia), or more specifically a snapshot thereof which was downloaded on January 6th, 2019 via quarry, a web-based service offered by Wikimedia for running SQL queries against their public databases~\footnote{\url{https://quarry.wmflabs.org/}}.
A big part of the present analysis is based upon the \emph{abuse\_filter} table from \emph{enwiki\_p} (the database which stores data for the EN Wikipedia), or more specifically a snapshot thereof which was downloaded on 6 January 2019 via quarry, a web-based service offered by Wikimedia for running SQL queries against their public databases~\footnote{\url{https://quarry.wmflabs.org/}}.
The complete dataset can be found in the repository for the present paper~\cite{gitlab}.
This table, along with \emph{abuse\_filter\_actions}, \emph{abuse\_filter\_log}, and \emph{abuse\_filter\_history}, is created and used by the AbuseFilter MediaWiki extension~\cite{gerrit-abusefilter-tables}, as discussed in section~\ref{sec:mediawiki-ext}.
...
...
@@ -150,7 +150,7 @@ The scripts that generate the statistics discussed here, can be found in the jup
\subsection{General traits}
As of January 6th, 2019 there are $954$ filters in the \emph{abuse\_filter} table.
As of 6 January 2019 there are $954$ filters in the \emph{abuse\_filter} table.
Note that when a filter is deleted, a flag is merely set to indicate this; no entries are removed from the database.
Thus, the above-mentioned $954$ filters are all the filters ever created up to that date.
This does not mean that what the individual filters do has never changed: edit filter managers can freely modify filter patterns, so a filter may target one phenomenon at one point and a completely different one shortly afterwards.
...
...
@@ -255,7 +255,7 @@ The detailed distribution of manually assigned codes and their parent categories
\subsection{Who trips filters}
As of March 15, 2019, $16{,}489{,}266$ of the filter hits were caused by IP users, whereas logged-in users matched an edit filter's pattern $6{,}984{,}897$ times.
As of 15 March 2019, $16{,}489{,}266$ of the filter hits were caused by IP users, whereas logged-in users matched an edit filter's pattern $6{,}984{,}897$ times.
Many of the logged-in users have newly created accounts (many filters check for newly created or unconfirmed accounts in their patterns).
%TODO look how many filters are checking for ``!(""confirmed"" in user_groups)''
...
...
@@ -419,7 +419,7 @@ On the other hand, when we look at the ten most active filters of all times (see
Another area in which filters are active is blanking of various kinds (mostly by new users), where the filters issue warnings pointing to alternatives the editor may have intended or, for instance, to the proper procedure for deleting articles. % this wasn't foreseen either, right?
The table also shows that the mechanism ended up being quite active in preventing silly (e.g. inserting series of repeating characters) or profanity vandalism.
Interestingly, that is not what the developers of the extension believed it was going to be good for:
``It is not, as some seem to believe, intended to block profanity in articles (that would be extraordinarily dim), nor even to revert page-blankings, '' claimed its core developer on July 9th 2008~\cite{Wikipedia:EditFilterTalkArchive1Clarification}.
``It is not, as some seem to believe, intended to block profanity in articles (that would be extraordinarily dim), nor even to revert page-blankings, '' claimed its core developer on 9 July 2008~\cite{Wikipedia:EditFilterTalkArchive1Clarification}.
A further assumption that did not hold was that ``filters in this extension would be triggered fewer times than once every few hours''~\cite{Wikipedia:EditFilterTalkArchive1}\footnote{Here, ``trigger'' means that an editor's action matches a filter's pattern and sets off the filter's configured action(s).}.