\chapter{Descriptive Overview of Edit Filters on the English Wikipedia}
\label{chap:overview-en-wiki}
After tracing the debate surrounding the introduction of edit filters and piecing together how they work and what their supposed purpose was in chapter~\ref{chap:filters}, here I explore the edit filters currently in place on the English Wikipedia. I want to gain an understanding of what types of tasks these filters take over, in order to compare them to the declared aim of the filters, and, as far as feasible, trace how these tasks have evolved over time. The data upon which the analysis is based is described in section~\ref{sec:overview-data}, and the methods used in chapter~\ref{chap:methods}. The manual classification of EN Wikipedia's edit filters I have undertaken in an attempt to understand what it is that they actually filter is presented in section~\ref{sec:manual-classification}. Section~\ref{sec:patterns} studies characteristics of the edit filters in general, whereas their activity is analysed in section~\ref{sec:filter-activity}.

\section{Data}
\label{sec:overview-data}

A big part of the present analysis is based upon the \emph{abuse\_filter} table from \emph{enwiki\_p} (the database which stores data for the EN Wikipedia), or more specifically a snapshot thereof which was downloaded on 6 January 2019 via Quarry, a web-based service offered by Wikimedia for running SQL queries against their public databases~\footnote{\url{https://quarry.wmflabs.org/}}. The complete dataset can be found in the repository for the present paper~\cite{gitlab}. This table, along with \emph{abuse\_filter\_actions}, \emph{abuse\_filter\_log}, and \emph{abuse\_filter\_history}, is created and used by the AbuseFilter MediaWiki extension~(\cite{gerrit-abusefilter-tables}), as discussed in section~\ref{sec:mediawiki-ext}. Selected queries have also been run via Quarry against the \emph{abuse\_filter\_log} table. These queries form the foundation for the filter activity analysis undertaken in section~\ref{sec:filter-activity}. Unfortunately, the \emph{abuse\_filter\_history} table, which would be necessary for a complete historical analysis of the edit filters, is currently not exposed to the public due to security/privacy concerns~\cite{phabricator} \footnote{A patch was submitted to Wikimedia's operations repository where the replication scripts for all publicly exposed databases are hosted~\cite{gerrit-tables-replication}. It is under review, so hopefully historical filter research will be possible in the future.}. A comprehensive historical analysis is therefore one of the directions for future research discussed in section~\ref{sec:further-studies}. A concise description of the tables is offered in section~\ref{sec:mediawiki-ext}, which discusses the AbuseFilter MediaWiki extension in more detail. For further reference, the schemas of all four tables can be viewed in figures~\ref{fig:app-db-schemas-af},~\ref{fig:app-db-schemas-afl},~\ref{fig:app-db-schemas-afh} and~\ref{fig:app-db-schemas-afa} in the appendix.
%TODO incorporate this here
\begin{comment}
\cite{Geiger2014}
"the idea that Wikipedia only takes place on wiki- pedia.org – or even entirely on the Internet – is a huge misunderstanding (Konieczny, 2009; Reagle, 2010). Wikipedia is not a virtual world, especially one located entirely on the wiki."
e.g.
in order to get hold of \emph{abuse\_filter\_history} I had to engage with
- wikipedia.org
- mediawiki.org
- irc channels
- phabricator
- gerrit
- toolserver/cloudservices
----
other spaces Wikipedia takes place
- mailinglists
- WomenEdit/offenes Editieren @Wikimedia
- Wikimania
- Wikimedia's office and daily work
\end{comment}

\section{Types of Edit Filters: Manual Classification}
\label{sec:manual-classification}

In order to get a better understanding of what exactly it is that edit filters are filtering, I applied emergent coding (see section~\ref{sec:gt}) to all filters, scrutinising their names, patterns, comments, and actions. Three big clusters of codes were identified, namely ``vandalism'', ``good faith'', and ``maintenance'', as well as the auxiliary cluster ``unknown''. These are discussed in more detail later in this section, but first the coding itself is presented.

\subsection{Coding Process and Challenges}

As already mentioned, I applied emergent coding to the dataset from the \emph{abuse\_filter} table and let the labels originate directly from the data. I looked through the data, paying special attention to the name of the filters (``af\_public\_comments'' field of the \emph{abuse\_filter} table), the comments (``af\_comments''), the pattern constituting the filter (``af\_pattern''), and the designated filter actions (``af\_actions''). The assigned codes emerged from the data: some of them are literal quotes of terms used in the description or comments of a filter, while others summarise the perceived filter functionality. In addition, for vandalism-related labels, I used some of the vandalism types elaborated by the community in~\cite{Wikipedia:VandalismTypes}. However, this typology was regarded more as an inspiration than adopted 1:1, since some of the types were quite general, whereas more specific categories seemed to yield more insight. For instance, I did not apply the ``addition of text'' category, since it seemed more useful to have more specific labels such as ``hoaxing'' or ``silly\_vandalism'' (see the code book in appendix~\ref{app:code_book} for definitions). Moreover, I found some of the proposed types redundant. For example, ``sneaky vandalism'' seems to overlap partially with ``hoaxing'' and partially with ``sockpuppetry'', and for some reason, ``personal attacks'' are listed twice.

Based on the emergent coding method described in section~\ref{sec:gt}, I have labeled the dataset twice. I let potential labels emerge during the first round of coding. Then I scrutinised them, merging labels that seemed redundant and keeping the most descriptive code. At the same time, the codes were sorted and unified into broader categories which related the individual labels to each other. Thereby, a code book with the final codes was compiled (see appendix~\ref{app:code_book}). Subsequently, I labeled the whole dataset again using the code book. Unfortunately, the validation steps proposed by the method could not be realised, since no second researcher was available for the labeling. This is one of the limitations discussed in section~\ref{sec:limitations}, and something that can and should be remedied in future research.

%1st labeling
The following challenges were encountered during the first round of labeling: There were some ambiguous cases which I either tagged with the code I deemed most appropriate and a question mark, or assigned all possible labels (or both).
There were also cases for which I could not gather any insight relying on the name, comments, and pattern, since the filters were hidden from public view and the name was not descriptive enough. However, upon some further reflection, I think it is safe to assume that all hidden filters target a form of (more or less grave) vandalism, since the guidelines suggest that filters should not be hidden in the first place unless they deal with cases of persistent and specific vandalism, where it can be expected that the vandalising editors will actively look for the filter pattern in their attempts to circumvent the filter~\cite{Wikipedia:EditFilter}. Therefore, during the second round of labeling I tagged all hidden filters for which there weren't any more specific clues (for example in the name of the filter) as ``hidden\_vandalism''. Then again, there were also cases, not necessarily hidden, where I could not determine any suitable label, since I did not understand the pattern, and/or none of the existing categories seemed to fit, and/or I could not think of an insightful new category to assign. During the first labeling, these were labeled ``unknown'', ``unclear'' or ``not sure''. For the second round, I have unified all of them under ``unclear''.

For a number of filters, it was particularly difficult to determine whether they were targeting vandalism or good faith edits. The only thing that would have distinguished between the two would have been the contributing editor's motivation, which no one but the editor in question could have known. During the first labeling session, I tended to label such filters with ``vandalism?, good\_faith?''. For the second labeling, I stuck to the ``assume good faith'' guideline~\cite{Wikipedia:GoodFaith} myself and only labeled as vandalism cases where good faith was definitely out of the question. One feature which guided me here was the filter action, which represents the judgement of the edit filter manager(s). Since communication is crucial when assuming good faith, all ambiguous cases which have a less ``grave'' filter action such as ``tag'' or ``warn'' (which seek to give feedback and thereby encourage a constructive contribution) received a ``good\_faith'' label. On the other hand, filters set to ``disallow'' were tagged as ``vandalism'' or a particular type thereof, since this filter action is a clear sign that at least the edit filter managers have decided that seeking a dialog with the offending editor is no longer an option.

For the second round of labeling, I tagged the whole dataset again using the compiled code book (see~\ref{app:code_book}) and assigned to every filter exactly one label, namely the one deemed most appropriate (although oftentimes alternative possibilities were listed as notes), without looking at the labels I assigned the first time around. I intended to compare the labels from both coding sessions and focus on more ambiguous cases, re-evaluating them using all available information (patterns, public comments, labels from both sessions, as well as any notes I made along the way). Unfortunately, time was scarce, so the analysis of the present section is based upon the second round of labeling. Comparing the codes from both labeling sessions and refining the coding, or having another person label the data, should be done in the future. The datasets developed during both labeling sessions are available in the project's repository~\cite{gitlab}.
As signaled at the beginning of the section, the following four parent categories of codes were identified: ``vandalism'', ``good faith'', ``maintenance'', and ``unknown''. The subsections that follow discuss the salient properties of each of them.

\begin{comment}
% Kept as a possible alternative wording for private vs public and labeling decisions in ambiguous cases
It was not always a straightforward decision to determine what type of edits a certain filter is targeting. This was of course particularly challenging for private filters where only the public comment (name) of the filter was there to guide the coding. On the other hand, guidelines state up-front that filters should be hidden only in cases of particularly persistent vandalism, in so far it is probably safe to establish that all hidden filters target some type of vandalism. However, the classification was difficult for public filters as well, since oftentimes what makes the difference between a good-faith and a vandalism edit is not the content of the edit but the intention of the editor. While there are cases of juvenile vandalism (putting random swear words in articles) or characters repetiton vandalism which are pretty obvious, that is not the case for sections or articles blanking for example. For these, from the edit alone there is no way of knowing whether the deletion was malicious or the editor conducting it just wasn't familiar with say the correct procedure for moving an article.
\end{comment}

\subsection{Vandalism}

The vast majority of edit filters on EN Wikipedia could be said to target (different forms of) vandalism, i.e. maliciously intended disruptive editing (or other activity)~\cite{Wikipedia:Vandalism}. Some examples thereof are filters for juvenile types of vandalism (inserting swear or obscene words or nonsense sequences of characters into articles), for hoaxing (inserting obvious or less obvious false information into articles), for template vandalism (modifying a template in a disruptive way, which is quite severe since templates are displayed on various pages), or for spam (inserting links to promotional content, often not related to the content being edited).
%TODO stick to one terminology; no "juvenile" vandalism otherwise
All codes belonging to the vandalism category, together with a definition and examples, can be consulted in the code book attached in appendix~\ref{app:code_book}. Some vandalism types seem to be more severe than others (e.g. sock puppetry\footnote{Sock puppetry denotes the creation and employment of several accounts for various purposes such as pushing a point of view, or circumventing bans. For more information, see the code book in appendix~\ref{app:code_book}.} or persistent long-term vandals). It is mostly in these cases that the implemented filters are hidden. Labels referring to such types of vandalism form their own subcategory: ``hardcore vandalism''.
%TODO think about naming
It should be mentioned at this point that I also classified ``harassment'' and ``personal attacks'' as ``hardcore vandalism'', since these types of edits are highly harmful and often dealt with by hidden filters, although according to~\cite{Wikipedia:Vandalism} both behaviours are disruptive editing rather than vandalism and should generally be handled differently.

\subsection{Good Faith}

The second biggest category identified comprises filters targeting edits which are (mostly) disruptive but not necessarily made with bad intentions.
The adopted name ``good faith'' is a term utilised by the Wikipedia community itself, most prominently in the guideline ``assume good faith''~\cite{Wikipedia:GoodFaith}. Filters from this category are frequently aimed at unconstructive edits made by new editors who are not familiar with syntax, norms, or guidelines, which results in broken syntax, disregard of established processes (e.g. deleting something without running it through an Articles for Deletion process) or norms (e.g. copyright violations), or unencyclopedic edits (e.g. without sources or with improper sources, badly styled, or with a skewed point of view). The focus of these filters lies on communication with the disrupting editors: a lot of the filters issue warnings intended to guide the editors towards ways of modifying their contribution so it becomes a constructive one (compare with section~\ref{sec:filters-external}). Codes from this category often take into consideration the area the editor was intending to contribute to, or respectively the area they (presumably) unintentionally disrupted.

\subsection{Maintenance}

Some of the edit filters encountered on the EN Wikipedia target neither vandalism nor good faith edits. Rather, their focus is on (semi-)automated routine (clean-up) tasks. These filters form the ``maintenance'' category. Some of them target, for instance, bugs such as broken syntax caused by a faulty browser extension. Others simply track particular behaviours (such as mobile edits or edits made by unflagged bots) for various purposes. The ``maintenance'' category differs conceptually from the ``vandalism'' and ``good faith'' ones insofar as the logic behind it is not the editors' intention, but rather ``side'' occurrences that mostly went wrong. I have also grouped various test filters here (used by individual editors or jointly used by all editors).

\subsection{Unknown}

This is an auxiliary category comprising the ``unknown'' and ``misc'' codes
%TODO align with code book, right now there are 3 tags in the unknown cluster
used to code all filters where the functionality stayed completely opaque for the observer, or where, although it was comprehensible what the filter was doing, still no better fitting label emerged.

%************************************************************
%\section{Descriptive statistics/Patterns/General traits of the filters}
\section{Filter Characteristics}
\label{sec:patterns}

This section explores some general features of the edit filters on English Wikipedia based on the data from the \emph{abuse\_filter} table. The scripts that generate the statistics discussed here can be found in the Jupyter notebook in the project's repository~\cite{gitlab}.

\subsection{General Traits}

As of 6 January 2019 there are $954$ filters in the \emph{abuse\_filter} table. It should be noted that if a filter gets deleted, merely a flag is set to indicate this; no entries are removed from the database. So the above-mentioned $954$ filters are all filters ever created up to this date. This does not mean that what the individual filters do has never changed: edit filter managers can freely modify filter patterns, so a filter could be doing one thing at some point and filtering a completely different phenomenon the next. There are cases of filters being ``repurposed'' or modified to filter, for example, a more general occurrence. This does not happen very often though.
Mostly, if a filter is not useful anymore, it is simply disabled and eventually deleted, and new filters are implemented for current problems. $361$ of all filters are public, while the remaining $593$ are hidden. $110$ of the public ones are active, $35$ are disabled but not marked as deleted, and $216$ are flagged as deleted. Out of the $593$ hidden filters, $91$ are active, $118$ are disabled (not deleted), and $384$ are deleted. The relative proportion of these groups to each other can be viewed in figure~\ref{fig:general-stats}.

\begin{figure}
\centering
  \includegraphics[width=0.9\columnwidth]{pics/general-stats-donut.png}
  \caption[Overview of active, disabled and deleted filters on EN Wikipedia]{There are 954 edit filters on EN Wikipedia: roughly 21\% of them are active, 16\% are disabled, and 63\% are deleted}~\label{fig:general-stats}
\end{figure}

\subsection{Public and Hidden Filters}
\label{sec:public-hidden}

As signaled in section~\ref{section:4-history}, historically it was planned to make all edit filters hidden from the general public. The community discussions rebutted that, so a guideline was drafted calling for hiding filters ``only where necessary, such as in long-term abuse cases where the targeted user(s) could review a public filter and use that knowledge to circumvent it.''~\cite{Wikipedia:EditFilter}. This is, however, not always complied with, and edit filter managers do end up hiding filters that target general vandalism despite consensus that these should be public~\cite{Wikipedia:PrivacyGeneralVandalism}. Such cases are usually made public eventually (examples hereof are filters 225 ``Vandalism in all caps'', 260 ``Common vandal phrases'', or 12 ``Replacing a page with obscenities''). Also, oftentimes when a hidden filter is marked as ``deleted'', it is made public. Further, caution in filter naming is suggested for hidden filters, and editors are encouraged to give such filters just a simple description of the overall disruptive behaviour rather than naming a specific user who is causing the disruptions. (This is not always complied with either; there are indeed filters named after the accounts causing a disruption.)

Still, it draws attention that currently nearly $2/3$ of all edit filters are not viewable by the general public (compare figure~\ref{fig:general-stats}). Unfortunately, without the full \emph{abuse\_filter\_history} table there is no way to know how this ratio has developed historically. However, the numbers fit the assertion of the extension's core developer, according to whom edit filters target particularly determined vandals (filters aimed at these are, as a general rule, hidden in order to make circumvention more difficult). On the other hand, if we look at the enabled filters only, there are actually more or less the same number of enabled public and enabled hidden filters ($110$ vs. $91$).
%TODO this is a kind of an interpretation. Take it out here and put it in the conclusion of the chapter?
This leads to the hypothesis that hidden filters have higher fluctuation rates, i.e. that they target specific phenomena which are over after a particular period of time, after which the filters get disabled and eventually deleted. This again makes sense when compared to the hidden vs. public filter policy: hidden filters for particular cases and very determined vandals, public filters for general behaviours which reflect more timeless patterns.
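These counts can be reproduced directly from the \emph{abuse\_filter} snapshot. The following is a minimal sketch in Python/pandas, assuming the Quarry result was exported as a CSV file with the standard AbuseFilter column names (the file name is illustrative):
\begin{verbatim}
import pandas as pd

# Snapshot of the abuse_filter table, downloaded from Quarry on 6 January 2019
# (the file name is illustrative).
filters = pd.read_csv("abuse_filter_20190106.csv")

# af_hidden, af_enabled and af_deleted are 0/1 flags in the AbuseFilter schema.
filters["visibility"] = filters["af_hidden"].map({0: "public", 1: "hidden"})
filters["status"] = "disabled"
filters.loc[filters["af_enabled"] == 1, "status"] = "enabled"
filters.loc[filters["af_deleted"] == 1, "status"] = "deleted"

# Cross-tabulate visibility against status; the margins give the totals per group.
print(pd.crosstab(filters["visibility"], filters["status"], margins=True))
\end{verbatim}
The same flags are used throughout the rest of the chapter whenever ``enabled'' filters are discussed, i.e. filters with \texttt{af\_enabled} set and \texttt{af\_deleted} not set.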
\subsection{Filter Actions}

Another interesting characteristic studied here is the set of currently configured filter actions for each filter. Figure~\ref{fig:all-active-filters-actions} depicts the actions set up for all enabled filters, and figures~\ref{fig:active-public-actions} and~\ref{fig:active-hidden-actions} show the actions of all enabled public and hidden filters respectively. It is noticeable that the most common action for the enabled hidden filters is ``disallow'', whereas most enabled public filters are set to ``tag'' or ``tag,warn''. This is congruent with the community's claim that hidden filters target particularly persistent vandalism, which is best outright disallowed. A lot of public filters, on the other hand, still assume good faith on the part of the editors and try to dissuade them from engaging in disruptive behaviour by issuing warnings, or they just tag conspicuous behaviour for further investigation.

\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/all-active-filters-actions.png}
  \caption{EN Wikipedia edit filters: Filter actions for all enabled filters}~\label{fig:all-active-filters-actions}
\end{figure}

\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/active-public-actions-big.png}
  \caption{EN Wikipedia edit filters: Filter actions for enabled public filters}~\label{fig:active-public-actions}
\end{figure}

\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/active-hidden-actions-big.png}
  \caption{EN Wikipedia edit filters: Filter actions for enabled hidden filters}~\label{fig:active-hidden-actions}
\end{figure}

\subsection{What Do Filters Target}
\label{sec:what-do-filters-target}

This section examines in detail the results of the manual tagging of the filters according to their perceived functionality described in section~\ref{sec:manual-classification}. As figures~\ref{fig:manual-tags-all} and \ref{fig:manual-tags-active} demonstrate, the majority of filters seem to target vandalism (little surprise here). The second biggest category comprises the ``good faith'' filters, while ``maintenance'' and ``unknown'' filters make up a relatively small part of the total number of filters. The proportion of vandalism-related filters is higher when all filters are considered and not just the enabled ones. Again, this is probably due to the presumed higher fluctuation rates of hidden filters, which (according to my labeling, see section~\ref{sec:manual-classification} for the rationale) are always vandalism related. It is also noticeable that the relative share of maintenance-related filters is higher when all filters are regarded. The detailed distribution of manually assigned codes and their parent categories can be viewed in figure~\ref{fig:manual-tags}.
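These distributions can be recomputed by joining the manually labeled dataset from the project's repository~\cite{gitlab} with the \emph{abuse\_filter} snapshot. The following sketch assumes illustrative file and column names (``af\_id'', ``parent\_category''), which do not necessarily match the ones used in the repository:
\begin{verbatim}
import pandas as pd

# Manually assigned labels (second labeling round); file and column names
# are illustrative, not necessarily those used in the repository.
labels = pd.read_csv("filter_labels_round2.csv")    # columns: af_id, parent_category
filters = pd.read_csv("abuse_filter_20190106.csv")  # Quarry snapshot of abuse_filter

merged = filters.merge(labels, on="af_id")
enabled = merged[(merged["af_enabled"] == 1) & (merged["af_deleted"] == 0)]

# Share of each parent category: over all filters, and over enabled filters only.
print(merged["parent_category"].value_counts(normalize=True).round(2))
print(enabled["parent_category"].value_counts(normalize=True).round(2))
\end{verbatim}
The two resulting distributions correspond to figures~\ref{fig:manual-tags-all} and~\ref{fig:manual-tags-active}; the per-code breakdown in figure~\ref{fig:manual-tags} is obtained analogously by counting the individual codes instead of the parent categories.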
%TODO make these two subfigures of the same figure
\begin{figure}
\centering
  \includegraphics[width=0.9\columnwidth]{pics/donut-manual-tags-all-t20b.png}
  \caption{Manual tags parent categories distribution: all filters}~\label{fig:manual-tags-all}
\end{figure}

\begin{figure}
\centering
  \includegraphics[width=0.9\columnwidth]{pics/donut-manual-tags-active-t20b.png}
  \caption{Manual tags parent categories distribution: enabled filters (January 2019)}~\label{fig:manual-tags-active}
\end{figure}

\begin{landscape}
\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/manual-tags-distribution-t20b.png}
  \caption{Edit filters manual tags distribution}~\label{fig:manual-tags}
\end{figure}
\end{landscape}

Another feature explored was the explicit targeting of unconfirmed users (see table~\ref{tab:newbie-filters}). It is striking that various filters have what the edit filter managers have dubbed ``the newbie check'': \verb|!("confirmed" in user_groups)| as one of their first conditions. There are in total $43$ such filters; $26$ of them are enabled as of January 2019 (so they make up approximately 20\% of all enabled filters at the time), and $9$ of the enabled ones disallow the edit directly when matched.

\begin{table*}[h]
 \centering
  \begin{tabular}{p{1cm} p{9cm} r p{2cm} }
%   \toprule
   Filter ID & Publicly available description & Hitcount & Actions \\
   \hline
   61 & New user removing references & 1611956 & tag \\
   384 & Addition of bad words or other vandalism & 1159239 & disallow\\
   30 & Large deletion from article by new editors & 840871 & warn,tag\\
   636 & Unexplained removal of sourced content & 726764 & warn\\
   3 & New user blanking articles & 700522 & warn,tag\\
   432 & Starting new line with lowercase letters & 558578 & warn,tag\\
   225 & Vandalism in all caps & 482872 & disallow\\
   50 & Shouting & 480960 & warn,tag\\
   231 & Long string of characters containing no spaces & 380302 & warn,tag\\
   46 & "Poop" vandalism & 356945 & disallow\\
   39 & School libel and vandalism & 150568 & warn,tag\\
   11 & You/He/She/It sucks & 109657 & warn,tag\\
   680 & Adding emoji unicode characters & 95242 & disallow\\
   365 & Unusual changes to featured or good content & 85470 & disallow\\
   126 & Youtube links & 65137 & log only\\
   803 & Prevent new users from editing other's user pages & 46756 & disallow\\
   117 & removal of Category:Living people & 43822 & tag\\
   113 & Misplaced \#redirect in articles & 20885 & warn,tag\\
   59 & New user removing templates on image description & 19938 & tag\\
   655 & Large plot section addition & 16051 & tag\\
   784 & Harambe vandalism & 9265 & disallow\\
   912 & Possible "fortnite" vandalism & 7505 & warn,tag\\
   860 & Ryan Ross vandalism & 3451 & disallow\\
   766 & Alt-right labeling & 1866 & warn,tag\\
   921 & Suspicious claims of nazism & 1422 & tag\\
   843 & Prevent new users from creating redirects to [[Donald Trump]] & 98 & disallow\\
  \end{tabular}
  \caption{Filters aimed at unconfirmed users}~\label{tab:newbie-filters}
\end{table*}

\subsection{Who Trips Filters}

As of 15 March 2019, $16,489,266$ of the filter hits were caused by IP users, whereas logged-in users had matched an edit filter's pattern $6,984,897$ times. A lot of the logged-in users have newly created accounts (many filters look for newly created, or respectively not confirmed, accounts in their pattern).
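Numbers like these can be obtained with a single aggregation over the \emph{abuse\_filter\_log} table. As a rough sketch, an equivalent query could be run with the \emph{pymysql} library against the public \emph{enwiki\_p} replica (the connection parameters below are placeholders; the actual figures in this chapter were obtained via Quarry):
\begin{verbatim}
import pymysql

# Placeholder connection parameters for the public enwiki replica.
conn = pymysql.connect(host="enwiki.analytics.db.svc.eqiad.wmflabs",
                       read_default_file="~/replica.my.cnf",
                       database="enwiki_p")

# afl_user is 0 for hits caused by anonymous (IP) users.
QUERY = """
SELECT (afl_user = 0) AS is_anonymous, COUNT(*) AS hits
FROM abuse_filter_log
GROUP BY is_anonymous;
"""

with conn.cursor() as cursor:
    cursor.execute(QUERY)
    for is_anonymous, hits in cursor.fetchall():
        print("IP users:" if is_anonymous else "Logged-in users:", hits)
\end{verbatim}
Per-month hit counts, as used in section~\ref{sec:filter-activity}, can be obtained analogously by grouping on the first six characters of \texttt{afl\_timestamp} (a \texttt{YYYYMMDDHHMMSS} string).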
%TODO look how many filters are checking for ``!(""confirmed"" in user_groups)''
%TODO this here is an interpretation; decide what to do with it
A user who has just registered an account (or who does not even bother to) can rather be expected to be inexperienced with Wikipedia, not familiar with all policies and guidelines, and perhaps not with the MediaWiki syntax either. It also sounds plausible that the majority of vandalism edits come from the same type of newly/recently registered accounts. In general, it is rather unlikely that an established Wikipedia editor should all at once jeopardise the encyclopedia's purpose and start vandalising, although apparently there are determined trolls who ``work accounts up'' to admin status and then run rampant.
%TODO mention and discuss that filters discriminate towards new users: ``!(""confirmed"" in user_groups)'' is the first condition for a lot of them

\section{Filter Activity}
\label{sec:filter-activity}

This section explores filter activity from two perspectives: it looks into the numbers of filter hits per month in~\ref{sec:filter-hits-month} and discusses the most active filters over the years in~\ref{sec:most-active-filters}.

\subsection{Filter Hits per Month}
\label{sec:filter-hits-month}

The number of filter hits per month over the years can be traced in figure~\ref{fig:filter-hits}. There is a dip in the number of hits in late 2014 and quite a surge at the beginning of 2016, after which the overall number of filter hits stayed higher. There is also a certain periodicity to the graph, with smaller dips in the northern hemisphere's summer months (June, July, August) and smaller peaks in autumn/winter (mostly October/November). This tendency is not observed for the overall number of edits (see figure~\ref{fig:edits-development}). Apparently, above all, editors who trip filters are on vacation in June, July, and August.

Further, it is interesting to break down filter activity according to the types determined via the manual tagging (see section~\ref{sec:manual-classification}): the corresponding distribution is shown in figure~\ref{fig:filter-hits-manual-tags}. On the one hand, it demonstrates above all a surge in the hits of filters targeting vandalism in 2016. On the other hand, another, somewhat subtler trend emerges: in the first years following the introduction of the mechanism, good faith filters were matched most frequently. This changed around the end of 2012, and since then most hits have been caused by vandalism filters.

\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/filter-hits-zoomed.png}
  \caption{EN Wikipedia edit filters: Hits per month}~\label{fig:filter-hits}
\end{figure}

\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/filter-hits-manual-tags.png}
  \caption{EN Wikipedia edit filters: Hits per month according to manual tags}~\label{fig:filter-hits-manual-tags}
\end{figure}

\begin{figure}
\centering
  \includegraphics[width=1\columnwidth]{pics/reverts.png}
  \caption{EN Wikipedia: Reverts for July 2001–April 2017}~\label{fig:reverts}
\end{figure}

Regarding the hits surge and subsequent higher hit numbers, three possible explanations come to mind:
\begin{enumerate}
    \item the filter hits mirror the overall edits pattern from this time;
    \item there was a general rise in vandalism in this period;
    \item or there was a change in the edit filter software that allowed more filters to be activated, or a bug that caused the peak (in the form of a lot of false positives).
\end{enumerate}

I have undertaken the following steps in an attempt to verify or refute each of these speculations:\\
\\
\textbf{The filter hits mirror the overall edits pattern from this time} \\
I have compared the filter hits pattern with the overall number of edits in the corresponding time span (May 2015–May 2016). No correspondence could be determined (see figure~\ref{fig:edits-development}).
\\
\\
\textbf{There was a general rise in vandalism in this period}\\
This assumption is supported by the peak in the hits of vandalism-related filters at the end of 2015 and the beginning of 2016 observed in figure~\ref{fig:filter-hits-manual-tags}. In order to verify it, a comparison of the filters' hit patterns with the revert patterns of other quality control mechanisms seems logical. Unfortunately, computing these numbers is time-consuming and not completely trivial. One needs a dump of the English Wikipedia's edit history data for the period in question; then one has to determine the reverts in this data set (e.g. by using the \emph{mwreverts} Python library); and then, more specifically, one needs to extract the reverts done by quality control actors. The last step is crucial, since not every revert signifies that a malicious edit is being reverted. This point is aptly illustrated by~\cite{GeiHal2017}, who have demonstrated that reverts can also constitute productive collaborative work between different agents. The dumps are large, and it takes time and computing power to obtain them and extract reverts. According to Geiger and Halfaker, who have done this for their replication study~\cite{GeiHal2017}, the April 2017 database dump offered by the Wikimedia Foundation was 93GB compressed and it took a week to extract reverts out of it on a 16-core Xeon workstation. They also list the challenges they faced in determining bot accounts and their reverts.

Since time was scarce, I have run a first check of this assumption using the 2017 reverts dataset compiled by Geiger and Halfaker for their study \footnote{Both researchers have placed great value on reproducibility and have published their complete datasets, as well as the scripts they used for their analyses, for others to use and verify: \url{https://github.com/halfak/are-the-bots-really-fighting}.}. The dataset is old, but still sufficient for scrutinising the events at the beginning of 2016. Figure~\ref{fig:reverts} shows the total number of reverts, as well as the reverts done by bots over time, as computed by Geiger and Halfaker. The filter hits pattern of 2015–2016, with the peak in filter hits and subsequent higher number of overall hits, is not mirrored by the revert numbers \footnote{Just for completeness, the spike in March 2013 is the batch action by AddBot removing interwiki links (since these were henceforth handled by Wikidata), discussed in the introduction of Geiger and Halfaker's paper. It did not have anything to do with vandalism.} (note that the y-axes of the revert and the filter hit plots are of the same magnitude). As cautioned earlier, not every revert can be equated with cleaning up a disruptive edit; however, figure~\ref{fig:reverts} demonstrates that either quality control reverts constitute a relatively small portion of all reverts being done, or there was no general surge in vandalism around this time. (Or that only vandalism caught by filters peaked, which sounds somewhat improbable.)
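To illustrate the revert detection step described above, a minimal sketch with the \emph{mwxml} and \emph{mwreverts} Python libraries could look as follows (the dump file name is illustrative, and a real analysis would have to iterate over the many files of a full history dump and subsequently single out reverts by quality control actors):
\begin{verbatim}
from collections import Counter

import mwxml
import mwreverts

# A (single) pages-meta-history XML dump file; the name is illustrative.
dump = mwxml.Dump.from_file(open("enwiki-pages-meta-history.xml"))

reverts_per_month = Counter()

for page in dump:
    # Identity reverts are detected per page, within a sliding window of revisions.
    detector = mwreverts.Detector(radius=15)
    for revision in page:
        revert = detector.process(revision.sha1, revision)
        if revert is not None:
            # The revision currently being processed is the reverting one.
            reverts_per_month[revision.timestamp.strftime("%Y-%m")] += 1

print(reverts_per_month.most_common(12))
\end{verbatim}
Note that sha1 checksums may be missing for suppressed revisions, and that, as discussed above, an additional step is needed to separate reverts performed by bots, semi-automated tools, or other quality control agents from the rest.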
\\
\\
\textbf{There was a change in the edit filter software that allowed more filters to be activated, or a bug that caused false positives}\\
Since so far neither of the other hypotheses could be verified, this explanation sounds likely. Another piece of data that seems to support it is the breakdown of the filter hits according to the triggered filter action. As demonstrated in figure~\ref{fig:filter-hits-actions}, there was above all a significant hits peak caused by ``log only'' filters. As discussed in section~\ref{sec:introduce-a-filter}, it is an established practice to introduce new filters in ``log only'' mode and only switch on additional filter actions after a monitoring period has shown that the filter functions as intended. Hence, it is plausible that new filters in logging mode were introduced which were then switched off after a significant number of false positives occurred. However, upon closer scrutiny, this could not be confirmed. The filters with the greatest number of hits in the period January–March 2016 are mainly the most triggered filters of all time, and nearly all of them had already been around for a while in 2016. Also, no bug or comparable incident with the software was found upon an inspection of the extension's issue tracker~\cite{phab-abusefilter-2015} or the messages of the commits made to the software during May 2015–May 2016~\cite{gerrit-abusefilter-source}. Moreover, no mention of the hits surge was found in the noticeboard~\cite{Wikipedia:EditFilterNoticeboard} or the edit filter talk page archives~\cite{Wikipedia:EditFilterTalkArchive2016}. The condition limit mentioned in section~\ref{sec:filter-activity} has not changed either, as far as I can tell from the issue tracker, the commits, and the discussion archives, so the possible explanation that simply more filters have been at work since 2016 seems to be refuted as well.

The only somewhat interesting pattern that seems to shed some light on the matter is the breakdown of hits according to the editor's action which triggered them: there is an obvious surge in attempted account creations in the period November 2015–May 2016 (see figure~\ref{fig:filter-hits-editors-actions}). As a matter of fact, this could also be the explanation for the peak of ``log only'' hits: the most frequently tripped filter for the period January–March 2016 is filter 527 ``T34234: log/throttle possible sleeper account creations''. It is a throttle filter with no further actions enabled, so every time an edit matches its pattern, a ``log only'' entry is created in the abuse log.
%it disallows every X attempt, only logging the rest of the account creations.
%I think in its current form, it does not actually disallow anything, a ``disallow'' action should be enabled for this and the filter action is only 'throttle'; so in this form, it seems to simply log account creations
The third most active filter is a ``log only'' filter as well: 650 ``Creation of a new article without any categories''. (It was neither introduced at the time, nor was there any major change to its filter pattern.) Together, filters 527 and 650 are responsible for over 60\% of the ``log only'' hits in each of the months January, February, and March 2016.

Another idea that seemed worth pursuing was to look into the editors who tripped filters and their corresponding edits. For the period January–March 2016 there are some very active IP editors, the top of whom (with over $1,000$ hits) seemed to be engaging exclusively in the (probably automated) posting of spam links.
Their edits, however, constitute some 1–3\% of all hits from the period, which is insufficient to explain the peak \footnote{Upon closer examination, these edits all seemed to contain spam links about erectile dysfunction medication, and their IP records pertained to a Russian registry. It is however possible that the offending editors were using a VPN or another proxy technology. The speculations about the intent of the edits remain outside the scope of the present work.}.
\begin{comment}
so the explanation ``it was viagra spam coming from Russian IPs'' is somewhat unsatisfactory. (Yes, it was viagra spam, and yes, a ``whois'' lookup proved them to really be Russian IPs. And, yes, whoever was editing could've also used a VPN, so I'm not opening a Russian bot fake news conspiracy theory just yet.)
\end{comment}
A more systematic scrutiny of the editors causing the hits was not possible due to time constraints, but may contribute more insights. Right now, all the data analysed on the matter stems from the \emph{abuse\_filter\_log} table, and the checks of the content of the edits were done manually, on a sample basis, via the web frontend of the AbuseLog~\cite{Wikipedia:AbuseLog}, where one can view the diff of edits that matched public filters. No simple automated check of what the offending editors were trying to publish was possible, since the \emph{abuse\_filter\_log} table does not store the text of the edit which matches a filter's pattern directly, but rather contains a reference to the \emph{text} table, where the wikitext of all individual page revisions is stored~\cite{Wikipedia:TextTable}. One needs to join the hit data from \emph{abuse\_filter\_log} with the \emph{text} table to obtain the content of the edits.
\begin{comment}
Last but not least, I took a step back and contemplated the significant geo/socio-political events from the time, which triggered a lot of media (and Internet) attention and desinformation campaigns. Following things came to mind: 2016 US elections, the Brexit referendum and the so-called ``refugee crisis'' in Europe. There was also a severe organisational crisis in Wikimedia at the time during which a lot of staff left and eventually the executive director stepped down. However, I couldn't draw a direct relationship between any of these political events and the edits caught by edit filters.
\end{comment}

Last but not least, an investigation into the pages on which the filters were triggered proved them (the pages) to be quite innocuous: the page where most filter hits were logged in January 2016 (besides the login page, on which all account creations are logged) was ``Skateboard'', and the $660$ filter hits there are rather insignificant compared to the $372,907$ hits for the whole month. The page in March (apart from the user login page) on which most filter hits took place was the user page of user 209.236.119.231, who was also the editor with the second most hits and who was apparently trying to post spam links on their own user page (after posting twice to ``Skateboard''). In general, the pages on which filters match seem more like a randomly chosen platform on which the disrupting editors unload their spam.
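A rough sketch of the join described above is given below. It assumes the pre-2019 MediaWiki schema, in which \emph{revision.rev\_text\_id} points into the \emph{text} table, and it only covers hits for which the edit was actually saved (i.e. where \emph{afl\_rev\_id} is set); hits on edits that were disallowed, and hence never saved, are not covered by it. The connection placeholder is the same as in the earlier sketch.
\begin{verbatim}
import pymysql

# Placeholder connection, as in the earlier sketch; note that the text table
# is not part of the public replicas, so this join cannot be run via Quarry.
conn = pymysql.connect(host="enwiki.analytics.db.svc.eqiad.wmflabs",
                       read_default_file="~/replica.my.cnf",
                       database="enwiki_p")

# Wikitext of saved edits that matched a filter in January 2016.
# (On Wikimedia's production databases old_text may itself be an
# external storage pointer rather than the wikitext.)
QUERY = """
SELECT afl.afl_id, afl.afl_filter, t.old_text
FROM abuse_filter_log afl
JOIN revision r ON r.rev_id = afl.afl_rev_id
JOIN text t ON t.old_id = r.rev_text_id
WHERE afl.afl_timestamp LIKE '201601%'
LIMIT 100;
"""

with conn.cursor() as cursor:
    cursor.execute(QUERY)
    for afl_id, afl_filter, old_text in cursor.fetchall():
        print(afl_id, afl_filter, old_text[:80])
\end{verbatim}
Within the scope of the present work, such an automated check was not performed; the sample-based manual inspection via the AbuseLog frontend described above was used instead.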
\begin{figure}
\centering
  \includegraphics[width=0.9\columnwidth]{pics/filter-hits-actions.png}
  \caption{EN Wikipedia edit filters: Hits per month according to filter action}~\label{fig:filter-hits-actions}
\end{figure}

\begin{figure}
\centering
  \includegraphics[width=0.9\columnwidth]{pics/filter-hits-editor-actions.png}
  \caption{EN Wikipedia edit filters: Hits per month according to triggering editor's action}~\label{fig:filter-hits-editors-actions}
\end{figure}

\subsection{Most Active Filters Over the Years}
\label{sec:most-active-filters}

Table~\ref{tab:most-active-actions} displays the ten most active filters of all time together with their corresponding number of hits, actions, and manually assigned label. Only one among them fits the description of targeting malicious determined vandals: filter 527 ``T34234: log/throttle possible sleeper account creations''. The second area in which these filters are active is various types of blanking (mostly by new users), where the filters issue warnings pointing towards possible alternatives the editor may want to achieve or, for instance, the proper procedure for deleting articles. The table also shows that the mechanism ended up being quite active in preventing silly vandalism (e.g. inserting series of repeating characters) or profanity vandalism.

\begin{table*}[t]
 \centering
  \begin{tabular}{p{1cm} r p{5cm} p{2cm} p{3cm}}
%   \toprule
   Filter ID & Hitcount & \raggedright Publicly available description & Actions & Manual tag (parent category) \\
   \hline
   61 & 1,611,956 & \raggedright new user removing references & tag & good\_faith\_refs (good\_faith) \\
   135 & 1,371,361 &\raggedright repeating characters & tag, warn & silly\_vandalism (vandalism)\\
   527 & 1,241,576 &\raggedright T34234: log/throttle possible sleeper account creations (hidden filter) & throttle & sockpuppetry (vandalism) \\
   384 & 1,159,239 &\raggedright addition of bad words or other vandalism & disallow & profanity\_vandalism (vandalism) \\
   172 & 935,925 & \raggedright section blanking & tag & good\_faith\_deletion (good\_faith) \\
   30 & 840,871 & \raggedright large deletion from article by new editors & tag, warn & good\_faith\_deletion (good\_faith) \\
   633 & 808,716 &\raggedright possible canned edit summary & tag & general\_vandalism (vandalism) \\
   636 & 726,764 &\raggedright unexplained removal of sourced content & warn & good\_faith\_deletion (good\_faith) \\
   3 & 700,522 & \raggedright new user blanking articles & tag, warn & good\_faith\_deletion (good\_faith) \\
   650 & 695,601 & \raggedright creation of a new article without any categories & (log only) & general\_tracking (maintenance) \\
  \end{tabular}
  \caption{What do most active filters do?}~\label{tab:most-active-actions}
\end{table*}

It is also interesting to trace the trends for the ten most active filters for each year since the introduction of the AbuseFilter extension. According to tables~\ref{tab:app-most-active-2009} through~\ref{tab:app-most-active-2018}, this list has remained remarkably stable over time: from year to year, there is a difference of two to three filters. Also, at least half of the most active filters for each year overlap with the most active filters of all time.
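The yearly rankings shown in the tables below can be derived from a single aggregation over \emph{abuse\_filter\_log}. A minimal sketch, assuming the aggregated Quarry result (hits per filter per year) was downloaded as a CSV file with illustrative column names:
\begin{verbatim}
import pandas as pd

# Result of a Quarry aggregation along the lines of:
#   SELECT LEFT(afl_timestamp, 4) AS year, afl_filter, COUNT(*) AS hits
#   FROM abuse_filter_log GROUP BY year, afl_filter;
# downloaded as CSV (file and column names are illustrative).
hits = pd.read_csv("afl_hits_per_filter_per_year.csv")

# Ten most active filters per year.
top10 = (hits.sort_values("hits", ascending=False)
             .groupby("year")
             .head(10)
             .sort_values(["year", "hits"], ascending=[True, False]))
print(top10.to_string(index=False))
\end{verbatim}
The filter descriptions shown in the tables correspond to the \texttt{af\_public\_comments} field of the \emph{abuse\_filter} table and can be attached by joining on the filter ID.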
\begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ % is the hitcount for the year or altogether till now?-- for the year, of course \hline 135 & repeating characters & 175455 \\ 30 & "large deletion from article by new editors" & 160302 \\ 61 & "new user removing references" & 147377 \\ 18 & Test type edits from clicking on edit bar & 133640 \\ 3 & "new user blanking articles" & 95916 \\ 172 & "section blanking" & 89710 \\ 50 & "shouting" (contribution consists of all caps, numbers and punctuation) & 88827 \\ 98 & "creating very short new article" & 80434 \\ 65 & "excessive whitespace" & 74098 \\ 132 & "removal of all categories" & 68607 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2009}~\label{tab:app-most-active-2009} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 61 & "new user removing references" & 245179 \\ 135 & repeating characters & 242018 \\ 172 & "section blanking" & 148053 \\ 30 & "large deletion from article by new editors" & 119226 \\ 225 & Vandalism in all caps & 109912 \\ 3 & "new user blanking articles" & 105376 \\ 50 & "shouting" & 101542 \\ 132 & "removal of all categories" & 78633 \\ 189 & BLP vandalism or libel & 74528 \\ 98 & "creating very short new article" & 54805 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2010}~\label{tab:app-most-active-2010} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 61 & "new user removing references"& 218493 \\ 135 & repeating characters & 185304 \\ 172 & "section blanking" & 119532 \\ 402 & New article without references & 109347 \\ 30 & Large deletion from article by new editors & 89151 \\ 3 & "new user blanking articles" & 75761 \\ 384 & Addition of bad words or other vandalism & 71911 \\ 225 & Vandalism in all caps & 68318 \\ 50 & "shouting" & 67425 \\ 432 & Starting new line with lowercase letters & 66480 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2011}~\label{tab:app-most-active-2011} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 135 & repeating characters & 173830 \\ 384 & Addition of bad words or other vandalism & 144202 \\ 432 & Starting new line with lowercase letters & 126156 \\ 172 & "section blanking" & 105082 \\ 30 & Large deletion from article by new editors & 93718 \\ 3 & "new user blanking articles" & 90724 \\ 380 & Multiple obscenities & 67814 \\ 351 & Text added after categories and interwiki & 59226 \\ 279 & Repeated attempts to vandalize & 58853 \\ 225 & Vandalism in all caps & 58352 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2012}~\label{tab:app-most-active-2012} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 135 & repeating characters & 133309 \\ 384 & Addition of bad words or other vandalism & 129807 \\ 432 & Starting new line with lowercase letters & 94017 \\ 172 & "section blanking" & 92871 \\ 30 & Large deletion from article by new editors & 85722 \\ 279 & Repeated attempts to vandalize & 76738 \\ 3 & "new user blanking articles" & 70067 \\ 380 & Multiple obscenities & 58668 \\ 491 & Edits ending with emoticons or ! 
& 55454 \\ 225 & Vandalism in all caps & 48390 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2013}~\label{tab:app-most-active-2013} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 384 & Addition of bad words or other vandalism & 111570 \\ 135 & repeating characters & 111173 \\ 279 & Repeated attempts to vandalize & 97204 \\ 172 & "section blanking" & 82042 \\ 432 & Starting new line with lowercase letters & 75839 \\ 30 & Large deletion from article by new editors & 62495 \\ 3 & "new user blanking articles" & 60656 \\ 636 & Unexplained removal of sourced content & 52639 \\ 231 & Long string of characters containing no spaces & 39693 \\ 380 & Multiple obscenities & 39624 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2014}~\label{tab:app-most-active-2014} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 650 & Creation of a new article without any categories & 226460 \\ 61 & New user removing references & 196986 \\ 636 & Unexplained removal of sourced content & 191320 \\ 527 & T34234: log/throttle possible sleeper account creations & 189911 \\ 633 & Possible canned edit summary & 162319 \\ 384 & Addition of bad words or other vandalism & 141534 \\ 279 & Repeated attempts to vandalize & 110137 \\ 135 & repeating characters & 99057 \\ 686 & IP adding possibly unreferenced material to BLP & 95356 \\ 172 & "section blanking" & 82874 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2015}~\label{tab:app-most-active-2015} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 527 & T34234: log/throttle possible sleeper account creations & 437099 \\ 61 & New user removing references & 274945 \\ 650 & Creation of a new article without any categories & 229083 \\ 633 & Possible canned edit summary & 218696 \\ 636 & Unexplained removal of sourced content & 179948 \\ 384 & Addition of bad words or other vandalism & 179871 \\ 279 & Repeated attempts to vandalize & 106699 \\ 135 & repeating characters & 95131 \\ 172 & "section blanking" & 79843 \\ 30 & Large deletion from article by new editors & 68968 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2016}~\label{tab:app-most-active-2016} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 61 & New user removing references & 250394 \\ 633 & Possible canned edit summary & 218146 \\ 384 & Addition of bad words or other vandalism & 200748 \\ 527 & T34234: log/throttle possible sleeper account creations & 192441 \\ 636 & Unexplained removal of sourced content & 156409 \\ 650 & Creation of a new article without any categories & 151604 \\ 135 & repeating characters & 80056 \\ 172 & "section blanking" & 70837 \\ 712 & Possibly changing date of birth in infobox & 59537 \\ 833 & Newer user possibly adding unreferenced or improperly referenced material & 58133 \\ % \bottomrule \end{tabular} \caption{10 most active filters in 2017}~\label{tab:app-most-active-2017} \end{table} \begin{table} \centering \begin{tabular}{r p{9cm} r } % \toprule Filter ID & Publicly available description & Hitcount \\ \hline 527 & T34234: log/throttle possible sleeper account creations & 358210 \\ 61 & New user removing references & 234867 \\ 633 & 
Possible canned edit summary & 201400 \\
   384 & Addition of bad words or other vandalism & 177543 \\
   833 & Newer user possibly adding unreferenced or improperly referenced material & 161030 \\
   636 & Unexplained removal of sourced content & 144674 \\
   650 & Creation of a new article without any categories & 79381 \\
   135 & repeating characters & 75348 \\
   686 & IP adding possibly unreferenced material to BLP & 70550 \\
   172 & "section blanking" & 64266 \\
%  \bottomrule
  \end{tabular}
  \caption{10 most active filters in 2018}~\label{tab:app-most-active-2018}
\end{table}

\section{Conclusions}
\label{sec:5-conclusions}

This chapter explored the edit filters on EN Wikipedia in order to determine what types of tasks these filters take over and how these tasks have evolved over time. Different characteristics of the edit filters, as well as their activity through the years, were scrutinised. Three main types of filter tasks were identified: preventing/tracking vandalism, guiding good faith but nonetheless disruptive edits towards a more constructive contribution, and various maintenance jobs such as tracking bugs or other conspicuous behaviour. It was further observed that filters aimed at particularly malicious users or behaviours are usually hidden, whereas filters targeting general patterns are viewable by anyone interested. It was determined that hidden filters seem to fluctuate more, which makes sense given their main area of application. Public filters often target silly vandalism or test type edits, as well as spam.

Disallowing edits by very determined vandals via hidden filters is in accord with the initial aim with which the filters were introduced (compare section~\ref{section:4-history}). The high number of such filters (compare section~\ref{sec:what-do-filters-target}) seems to confirm that edit filters are fulfilling their purpose. On the other hand, when the ten most active filters of all time (see table~\ref{tab:most-active-actions}) are considered, only one of them appears to take care of the malicious determined vandals who motivated the creation of the AbuseFilter extension. The rest of the most frequently matching filters target a combination of good faith edits (above all such concerning deletions) and silly/profanity vandalism. Interestingly, that is not what the developers of the extension believed it was going to be good for: ``It is not, as some seem to believe, intended to block profanity in articles (that would be extraordinarily dim), nor even to revert page-blankings,'' claimed its core developer on 9 July 2008~\cite{Wikipedia:EditFilterTalkArchive1Clarification}. A further assumption that did not come true was that ``filters in this extension would be triggered \footnote{Here, ``trigger'' means that an editor's action matches a filter's pattern and sets off the configured filter action(s).} fewer times than once every few hours''~\cite{Wikipedia:EditFilterTalkArchive1}. As a matter of fact, a quick glance at the AbuseLog~\cite{Wikipedia:AbuseLog} confirms that there are often multiple filter hits per minute, so the mechanism is used fairly actively, despite the fact that its areas of application partially diverge from the ones initially conceived. In fact, the numbers of filter hits on EN Wikipedia are of the same order of magnitude as the revert numbers (compare figures~\ref{fig:filter-hits} and \ref{fig:reverts}).
Regarding the temporal filter activity trends, it was ascertained that a sudden peak took place at the end of 2015 and the beginning of 2016, after which the overall filter hit numbers stayed higher than they used to be before this occurrence. Although there were some pointers towards what happened there, namely a surge in account creation attempts and possibly a big spam wave (the latter would have to be verified in a systematic fashion), no really satisfying explanation of the phenomenon could be established. This remains one of the possible directions for future studies.

In their 2012 paper, Halfaker and Riedl propose a bot taxonomy according to which Wikipedia bots can be classified into one of the following task areas: content injection, monitoring, or curating; augmenting MediaWiki functionality; or protection from malicious activity~\cite{HalRied2012}. Although there are no filters that inject or curate content, there are definitely filters whose aim is to protect the encyclopedia from malicious activity, as well as filters that augment MediaWiki's functionality, e.g. by providing warning messages (with hopefully helpful feedback) or by tagging certain behaviours to be aggregated on dashboards for later examination. In this sense, edit filters and bots appear to be rather similar.

\begin{comment}
Bot taxonomy
Task area | Example
-----------------------------------------------------
Content injection | RamBot
monitoring | SpellCheckerBot
curating | Helpful Pixie Bot ("corrects ISBNs and other structural features of articles such as section capitalization")
          | interlanguage bots (deprecated bc of Wikidata?)
------------------------------------------------------
Augment MediaWiki functionality | AIV Helperbot "turns a simple page into a dynamic priority-based discussion queue to support administrators in their work of identifying and blocking vandal"
                                | SineBot - signs and dates comments
------------------------------------------------------
Protection from malicious activity | ClueBot_NG
                                   | XLinkBot
\end{comment}