diff --git a/thesis/5-Overview-EN-Wiki.tex b/thesis/5-Overview-EN-Wiki.tex index 8a2049910ede0a890043aea7049a5873689c05cd..31067fae581a1755316d7cede371e495c2750ed2 100644 --- a/thesis/5-Overview-EN-Wiki.tex +++ b/thesis/5-Overview-EN-Wiki.tex @@ -11,6 +11,16 @@ and the methods we use–in chapter 3. Section~\ref{sec:patterns} explores (syn) some patterns in the edit filters' usage and.. And we look into the manual classification of EN Wikipedia's edit filters I've undertaken in an attempt to understand what is it that they actually filter in section~\ref{sec:manual-classification}. +%TODO check whether to discuss this bunch of random questions and where +\begin{comment} + \item how often were filters with different actions triggered? (afl\_actions) (over time) --> abuse\_filter\_log + \item what types of users trigger the filters (IPs? registered?) : IPs: 16,489,266, logged in users: 6,984,897 (Stand 15.03.2019); + \item on what articles filters get triggered most frequently (afl\_title) + \item what types of user actions trigger filters most frequently? (afl\_action) (edit, delete, createaccount, move, upload, autocreateaccount, stashupload) + \item in which namespaces get filters triggered most frequently? + %TODO categorise filters according to which name spaces they apply to; pay special attention to edits in user/talks name spaces (may be indication of filtering harassment) -- check notebook +\end{comment} + \section{Data} \label{sec:overview-data} @@ -23,13 +33,16 @@ Selected queries have been run via quarry against the \emph{abuse\_filter\_log} Unfortunately, the \emph{abuse\_filter\_history} table which will be necessary for a complete historical analysis of the edit filters is currently not exposed to the public due to security/privacy concerns~\cite{phabricator}. Therefore, the present work only touches upon historical trends in a qualitative fashion. 
%TODO how are these determined: API to abuse_filter_history; general stats from abuse_filter or qualitatively shows patterns. -A comprehensive historical analysis is therefore (syn!) one of the possibilities/directions for future studies (syn). +A comprehensive historical analysis is therefore (syn!) one of the possibilities/directions for future studies (syn) discussed in section~\ref{sec:further-studies}. %TODO maybe move to appendix; mention tables have been discussed in~\ref{sec:mediawiki-ext} and only quote here the one for abuse\_filter since we are using the data A concise description of the tables has been offered in section~\ref{sec:mediawiki-ext} which discusses the AbuseFilter MediaWiki extension in more detail. -Here, only the schema of the \emph{abuse\_filter} table has been included (figure~\ref{fig:db-schemas-af}), since that is the data the present analysis is based upon. +Here, only the schema of the \emph{abuse\_filter} table has been included (figure~\ref{fig:db-schemas-af}), since that is the data the majority of the present analysis is based upon. For further reference, the schemas of all four tables can be viewed in figures~\ref{fig:app-db-schemas-af},~\ref{fig:app-db-schemas-afl},~\ref{fig:app-db-schemas-afh} and~\ref{fig:app-db-schemas-afa} in the appendix. +%TODO incorporate this +and, of course, the whole \emph{abuse\_filter} table snapshot can be consulted in the repository~\cite{github}. + \begin{figure*} \begin{verbatim} abuse_filter @@ -56,17 +69,19 @@ abuse_filter \caption{abuse\_filter schema}~\label{fig:db-schemas-af} \end{figure*} -\section{Descriptive statistics/Patterns} +\section{Descriptive statistics/Patterns/General traits of the filters} \label{sec:patterns} -In this section, we explore some general patterns of the edit filters on Engish Wikipedia, or respectively the data from the \emph{abuse\_filter} table. 
+In this section, we explore some general traits/patterns of/trends in the edit filters on English Wikipedia, or respectively the data from the \emph{abuse\_filter} table. The scripts that generate the statistics discussed here, can be found in the jupyter notebook in the project's repository. %TODO add link after repository has been cleaned up \subsection{Filter characteristics} -As of January 6th, 2019 there are $954$ filters in this table. +% General stats +As of January 6th, 2019 there are $954$ filters in the \emph{abuse\_filter} table. It should be noted, that if a filter gets deleted, merely a flag is set to indicate so, but no entries are removed from the database. So, the above mentioned $954$ filters are all filters ever made up to this date. -This doesn't mean that it never changed what the filters are doing, since, as pointed out in chapter~\ref{}, edit filter managers can freely modify filter patterns, so at some point the filter could be doing one thing and in the next moment, it is filtering a completely different phenomenon. +This doesn't mean that it never changed what the single filters are doing, since, as pointed out in chapter~\ref{}, edit filter managers can freely modify filter patterns, so at some point the filter could be doing one thing and in the next moment it can be filtering a completely different phenomenon. +There are cases of filters being ``repurposed'' or modified to filter for example a more general occurrence/phenomenon. This doesn't happen very often though. $361$ of all filters are public, the remaining $593$–hidden. $110$ of the public ones are active, $35$ are disabled, but not marked as deleted, and $216$ are flagged as deleted. 
@@ -79,23 +94,10 @@ The relative proportion of these groups to each other can be viewed on figure~\r \caption{EN Wikipedia edit filters: hidden, disabled and deleted filters}~\label{fig:general-stats} \end{figure} -%TODO decide whether to keep data of the sort; it's not very accurate (since hidden filters are missing), there is no interesting tendency and I don't have any particular commentary on it. -Table ... show how many new filters have been introduced over the years. -2009: ~280 ; 1~27x (hidden, so we don't know) -2010: ~100 ; 280 (6.1.2010)- smth between 379 (28.12.2010) and 384 (10.2.2011) -2011: ~70 ; ~380 - 44x (440 is Nov 2011, 458 is April 2012) -2012: ~70 ; ~450 - 51x (514 is Dec 2012, 520 is Jan 2013) -2013: ~80 ; ~520 - 59x (593 is 3.10.2013, 602 is 15.1.2014) -2014: ~55 ; ~600 - 65x (650 is 16.12.2014, 655 is 19.1.2015) -2015: ~90 ; ~655 - ~745 (744 is 28.12.2015, 747 is 7.1.2016) -2016: ~75 ; ~745 - 81x (812 is 3.12.2016, 828 is 19.1.2017) -2017: ~75 ; ~820 - 89x (894 is 23.12.2018, 897 is 8.1.2018) -2018: ~55 ; ~895 - (949 is 16.12.2018) - - +\subsection{Filter actions} Another parameter we could observe are the currently configured filter actions for each filter. Figure~\ref{fig:all-filters-actions} depicts action per filter (note this includes all filters, also deleted ones and that some filters have multiple actions enabled). -And figures~\ref{fig:active-public-actions} and~\ref{fig:active-hidden-actions} the actions of all enabled public and hidden filters respectively. +And figures~\ref{fig:active-public-actions} and~\ref{fig:active-hidden-actions} show the actions of all enabled public and hidden filters respectively. It is noticeable that the most common action for the enabled hidden filters is ``disallow'' whereas most enabled public filters are set to ``tag'' or ``tag,warn''. This coincides/is congruent with the community claim that hidden filters target particularly perstistent vandalism, which is best outright disallowed. 
Most public filters on the other hand still assume good faith from the editors and try to dissuade them from engaging in disruptive behaviour by using warnings or just tag conspicious behaviour for further investigation. @@ -118,39 +120,65 @@ Most public filters on the other hand still assume good faith from the editors a \caption{EN Wikipedia edit filters: Filters actions for enabled hidden filters}~\label{fig:active-hidden-actions} \end{figure} +\subsection{What do filters target}%: general behaviour vs edits by single users -\subsection{Filter makers} +Most of the public filters target general disruptive behaviour (e.g.?). +There are however some which target particular users or particular pages. +Arguably, (see guidelines) an edit filter may not be the ideal mechanism for this latter purpose, since every incoming edit is checked against all active filters. +Historically, filters have been introduced to track some specific sort of behaviour which was however neither malicious nor disruptive. +This contradicts/defies/fails the purpose of the mechanism and thus such filters have been (quite swiftly) disabled. +Some filters target (syn!) insults in general, and there are such which target (syn!) specifically insults aimed at particular persons (often edit filter managers). -Here, a few characteristics of the edit filter managers group are discussed. -As mentioned in section~\ref{}, EN Wikipedia has 154 edit filter managers as of (date). -The group is as discussed (syn!) quite small. -(However, for comparison there are only 4 users in the edit filter managers group on the Catalan Wikipedia and the role does not exist at all on the German, Spanish and Russian ones which leads to the assumption that for these languages all administrators have the \emph{abusefilter\_modify} permission.) +A lot of hidden filters target specific users/problems. 
+\begin{comment}
+ ** there are quite some filters targeting particular users: 290 (targets an IP range), 177 ('User:Television Radio'), 663 ('Techno genre warrior
+', targets specific IP ranges)
+ ** there are also some targeting particular pages (verify!), although this clashed with the guidelines: 264 "Specific-page vandalism" (it's hidden though, so we don't know what exactly it's doing); 401 ("Red hair" vandalism); there's smth with the main page; 715 "IP notification on RFP/C"
+ ** there are also filters such as 199 (Unflagged bots) which were implemented in order to track something which was not quite malicious or abusive and were thus deemed inappropriate use of filters by the community and consequently (quite swiftly) deleted
+ ** some target insults in general and some contain regexes containing very specifically insults directed towards edit filter managers (see filter 12)
+\end{comment}
-The edit filter managers group is quite stable, with only 4 users who have become an edit filter manager since November 2016 (according to the discusssion archives of the edit filter noticeboard where the permission is requested).
-Since the edit filter helper group has been created in September 2017, only 11 users have been granted the corresponding permissions and only one of them has been subsequently ``promoted'' to become an edit filter manager.
-(Interestingly, currently (July 2019) there are 19 people in the edit filter helpers group, so apparently some of them have received the right although no records are there on the noticeboard??)
+\subsection{Public and Hidden Filters}
-Moreover, quite some of the 154 edit filter managers on English Wikipedia have a kind of ``not active at the moment'' banner on their user page, which leads to the conclusion that the edit filter managers group is aging.
+The first noticeable typology is along the line public/private filters.
-% Has it been the same people from the very beginning?
+It draws attention that currently nearly $2/3$ of all edit filters are not viewable by the general public (compare figure~\ref{fig:general-stats}).
+Unfortunately, without the full \emph{abuse\_filter\_history} table we cannot know how this ratio has developed historically.
+However, the numbers fit the assertion of the extension's core developer according to whom edit filters target particularly determined vandals.
+% TODO this whole paragraph seems redundant with chapter 4
+Only edit filter managers (who have the \emph{abusefilter-modify} permission) and editors with the \emph{abusefilter-view-private} permission can view hidden filters.
+The latter is given to edit filter helpers - editors interested in helping with edit filters who still do not meet certain criteria in order to be granted the full \emph{abusefilter-modify} permission, editors working with edit filters on other wikis interested in learning from the filter system on English Wikipedia, and Sockpuppet investigation clerks~\cite{Wikipedia:EditFilterHelper}.
+As of March 17, 2019, there are 16 edit filter helpers on EN Wikipedia~\footnote{\url{https://en.wikipedia.org/wiki/Special:ListUsers/abusefilter-helper}}.
+Also, all administrators are able to view hidden filters.
+There is also a designated mailing list for discussing these: wikipedia-en-editfilters@lists.wikimedia.org.
+It is specifically indicated that this is the communication channel to be used when dealing with harassment (by means of edit filters)~\cite{Wikipedia:EditFilter}.
+It is signaled that the mailing list is meant for sensitive cases only and all general discussions should be held on-wiki~\cite{Wikipedia:EditFilter}.
+
+%TODO decide whether to include this here or move back to actions
+ ** there's a tendency of editors to hide filters just for the heck of it (at least there are never clear reasons given), which is then reverted by other editors with the comment that it is not needed: 148, 225 (consensus that general vandalism filters should be public \url{[Special:Permalink/784131724#Privacy of general vandalism filters]}), 260 (similar to 225), 285 (same), 12 (same), 39 (unhidden with the comment "made filter public again - these edits are generally made by really unsophisticated editors who barely know how to edit a page. --zzuuzz")
+
+Oftentimes, when a hidden filter is marked as ``deleted'', it is made public. (examples!)
-\subsection{Filter activity} -Thanks to quarry, we have all the filters that were triggered from the filter log per year, from 2009 (when filters were first introduced/the MediaWiki extension was enabled) till end of 2018, with their corresponding number of times being triggered: +\section{Filter activity} + +\subsection{Distinct filters over the years + condition limit} +Thanks to quarry, we have all the filters that were triggered from the filter log per year, % I do have the whole table actually, don't I? +from 2009 (when filters were first introduced/the MediaWiki extension was enabled) till end of 2018, with their corresponding number of times being triggered: Table~\ref{tab:active-filters-count} summarises the numbers of distinct filters that got triggered over the years. -So, the number of distinct filters that have been triggered over the years varies between $154$ in year 2014 and $254$ in 2018. +So, the number of distinct filters that have been triggered over the years (syn!) varies between $154$ in year 2014 and $254$ in 2018. The explanation for this not particularly wide range of active filters lies probably in the so-called condition limit. According to the edit filters' documentation~\cite{Wikipedia:EditFilterDocumentation}, the condition limit is a hard-coded treshold of total available conditions that can be evaluated by all active filters. Currently, it is set to $1,000$. -The motivation for the condition limit is to avoid performance issues since every incoming edit is checked against all currently active filters which means that the more filters are active the longer the checks take. +The motivation for the condition limit is to avoid performance issues since every incoming edit is checked against all currently enabled filters which means that the more filters are active the longer the checks take. 
However, the page also warns that counting conditions is not the ideal metric of filter performance, since there are simple comparisons that take significantly less time than a check against the \emph{all\_links} variable for example (which needs to query the database)~\cite{Wikipedia:EditFilterDocumentation}. \begin{table} @@ -174,15 +202,15 @@ However, the page also warns that counting conditions is not the ideal metric of \caption{Count of distinct filters that got triggered each year}~\label{tab:active-filters-count} \end{table} +\subsection{Most active filters of all times} The ten most active filters of all times (with number of hits and public description) are displayed in table~\ref{tab:most-active-actions}. For a more detailed reference, the ten most active filters of each year are listed in the appendix. %TODO are there some historical trends we can read out of it? -and, of course, the whole \emph{abuse\_filter} table snapshot can be consulted in the repository~\cite{github}. Already, a couple of patterns draw attention when we look at the most active (syn!) filters: They seem to catch a combination of possibly good faith edits which were none the less unconstructive (such as removing references, section blanking or large deletions) and what the community has come to call ``silly vandalism''~\cite{Wikipedia:VandalismTypes}: repeating characters and inserting profanities. Interestingly, that's not what the developers of the extension believed it was going to be good for: -``It is not, as some seem to believe, intended to block profanity in articles (that would be extraordinarily dim), nor even to revert page-blankings, '' claimed its core developer on July 9th 2008~\cite{Wikipedia:EditFilterTalkArchive1}. +``It is not, as some seem to believe, intended to block profanity in articles (that would be extraordinarily dim), nor even to revert page-blankings, '' claimed its core developer on July 9th 2008~\cite{Wikipedia:EditFilterTalkArchive1Clarification}. 
Rather, among the 10 most active filters, it is filter 527 ``T34234: log/throttle possible sleeper account creations'' which seems to target what most closely resembles the intended aim of the edit filter extension. %TODO explain again what the intended aim was Another assumption that proved to be wrong/didn't quite carry into effect was that ``filters in this extension would be triggered fewer times than once every few hours''. @@ -208,14 +236,25 @@ As a matter of fact, a quick glance at the AbuseLog~\footnote{\url{https://en.wi \caption{What do most active filters do?}~\label{tab:most-active-actions} \end{table*} +\begin{comment} + \item is it new filters that get triggered most frequently? or are there also very active old ones? -- we have the most active filters per year, where we can observe this. It's a mixture of older and newer filter IDs (they get an incremental ID, so it is somewhat obvious what's older and what's newer); is there a tendency to split and refine older filters? +\end{comment} + +% Most active filters per year %TODO compare with table and with most active filters per year: is it old or new filters that get triggered most often? (I'd say it's a mixture of both and we can now actually answer this question with the history API, it shows us when a filter was first created) +\subsection{Filter hits per month (+peak)} We can follow/track/backtrack the number of filter hits over the years (syn) on figure~\ref{fig:filter-hits}. There is a dip in the number of hits in late 2014 and quite a surge in the beginnings of 2016. Here is the explanation to that: %TODO discuss peak! 
(and overall pattern) \begin{comment} -Looking at january 2016: +Looking at january, feb, march 2016 vs sept 2016 +- high number of account creation attempts +- a lot of (viagra) spam +- a bunch of very active russian IPs publishing the spam +- the exact moment seems arbitrary + till now it comes to attention that a lot of accounts named something resembling <FirstnameLastname4RandomLetters> were trying to create an account (while logged in?) (or maybe it was just that the creation of these particular accounts itself was denied); this triggers filter 527 ("T34234: log/throttle possible sleeper account creations ") @@ -233,20 +272,7 @@ Note: do hidden filters appear in this numbers and in the table? (They are defin \end{figure} -\begin{comment} - \item how often were filters with different actions triggered? (afl\_actions) (over time) --> abuse\_filter\_log - \item what types of users trigger the filters (IPs? registered?) : IPs: 16,489,266, logged in users: 6,984,897 (Stand 15.03.2019); - \item on what articles filters get triggered most frequently (afl\_title) - \item what types of user actions trigger filters most frequently? (afl\_action) (edit, delete, createaccount, move, upload, autocreateaccount, stashupload) - \item in which namespaces get filters triggered most frequently? - %TODO categorise filters according to which name spaces they apply to; pay special attention to edits in user/talks name spaces (may be indication of filtering harassment) -- check notebook -\end{comment} - -\begin{comment} - \item is it new filters that get triggered most frequently? or are there also very active old ones? -- we have the most active filters per year, where we can observe this. It's a mixture of older and newer filter IDs (they get an incremental ID, so it is somewhat obvious what's older and what's newer); is there a tendency to split and refine older filters? 
-\end{comment} - -\section{Patterns in filters creation and usage} +\section{History} The present section explores qualitatively/highlights patterns in the creation and usage of edit filters. Unfortunately, no extensive quantitative analysis of these patterns was possible, since for it, an access to the \emph{abuse\_filter\_history} table is needed. @@ -254,7 +280,7 @@ The table is currently not replicated via.. and no public dump is accessible via This seems to have been the case in the past, however, due to security concerns the dumps were discontinued. %TODO cite phabricator A short term solution to renew the public replicas was not possible, so the present chapter only shows some patterns (syn!) observed via manual browsing of different filters' history via the exposed API endpoint which allows querying the \emph{abuse\_filter\_history} table for public filters. -\subsection{Usage} +\subsection{Filter Usage/Activity} Following general patterns (syn!) of filter usage were observed: There are filters that have been switched on for a while, then deactivated and never activated again. @@ -288,31 +314,13 @@ Another group constitute enabled filters that have never been switched off since There are also some filters that have always been enabled with the exception of brief periods of time when the filter was deactivated (and the activated again), probably in order to update the conditions: 79, 135 (there were couple of others in Shirik's list, go back and look); There seems to be a tendency that all actions but logging (which cannot be switched off) are took out, when edit filter managers are updating the regex of the filter. -Oftentimes, when a hidden filter is marked as ``deleted'', it is made public. (examples!) - +\subsection{Triggered actions change over time} %TODO leave this here or move to filter characteristics? It is not uncommon, that the action(s) a particular filter triggers change over time. 
As of the guidelines for implementing new filters, every filter should be enabled in ``log only'' mode at its introduction. After it has been deemed that the filter actually acts as desired, usually additional actions are switched on~\cite{Wikipedia:EditFilterInstructions}. Sometimes, when a wave of particularly persistent vandalism arises, a filter is temporarily set to ``warn'' or ``disallow'' and the actions are removed again as soon as the filter is not tripped very frequently anymore. %TODO examples? -\subsection{What do filters target}%: general behaviour vs edits by single users - -Most of the public filters target general disruptive behavious (e.g.?). -There are however some which target particular users or particular pages. -Arguably, (see guidelines) an edit filter may not be the ideal mechanism for this latter purpose, since every incoming edit is checked against all active filters. -Historically, filters have been introduced to track some specific sort of behaviour which was however neither malicious nor disruptive. -This contradicts/defies/fails the purpose of the mechanism and thus such filters have been (quite swiftly) disabled. -Some filters target (syn!) insults in general, and there are such which target (syn!) specifically insults aimed at particular persons (often edit filter managers). 
- -\begin{comment} - ** there are quite some filters targeting particular users: 290 (targets an IP range), 177 ('User:Television Radio'), 663 ('Techno genre warrior -', targets specific IP ranges) - ** there are also some targetting particular pages (verify!), although this clashed with the guidelines: 264 "Specific-page vandalism" (it's hidden though, so we don't know what exactly it's doing); 401 ("Red hair" vandalism); there's smth with the main page; 715 "IP notification on RFP/C" - ** there are also filters such as 199 (Unflagged bots) which were implemented in order to track something which was not quite malicious or abusive and were thus deemed inappropriate use of filters by the community and consequently (quite swiftly) deleted - ** some target insults in general and some contain regexes containing very specifically insults directed towards edit filter managers (see filter 12) -\end{comment} - \subsection{How do filters emerge?} ** an older filter is split? 79 was split out of 61, apparently; 285 is split between "380, 384, 614 and others"; 174 is split from 29 ** several older filters are merged? @@ -323,31 +331,43 @@ Some filters target (syn!) insults in general, and there are such which target ( ** "in addition to filter 148, let's see what we get - Cen" (https://en.wikipedia.org/wiki/Special:AbuseFilter/188) // this illustrates the point that edit filter managers do introduce stuff they feel like introducing just to see if it catches something -\section{Public and Hidden Filters} +\section{People} +\subsection{Filter makers} -The first noticeable typology is along the line public/private filters. +Here, a few characteristics of the edit filter managers group are discussed. +As mentioned in section~\ref{}, EN Wikipedia has 154 edit filter managers as of (date). +The group is as discussed (syn!) quite small. 
+(However, for comparison there are only 4 users in the edit filter managers group on the Catalan Wikipedia and the role does not exist at all on the German, Spanish and Russian ones which leads to the assumption that for these languages all administrators have the \emph{abusefilter\_modify} permission.) %TODO check!
-It draws attention that currently nearly $2/3$ of all edit filters are not viewable by the general public (compare figure~\ref{fig:general-stats}).
-Unfortunately, without the full \emph{abuse\_filter\_history} table we cannot know how this ration has developed historically.
-However, the numbers fit the assertion of the extension's core developer according to whom edit filters target particularly determined vandals.
+The edit filter managers group is quite stable, with only 4 users who have become an edit filter manager since November 2016 (according to the discussion archives of the edit filter noticeboard where the permission is requested)~\cite{}.
+Since the edit filter helper group was created in September 2017, only 11 users have been granted the corresponding permissions and only one of them has been subsequently ``promoted'' to become an edit filter manager.
+(Interestingly, currently (July 2019) there are 19 people in the edit filter helpers group, so apparently some of them have received the right although no records are there on the noticeboard??)
-Although the initial plan was to make all filters hidden, the community discussions rebutted that so a guideline was drafted calling for
-hiding filters ``only where necessary, such as in long-term abuse cases where the targeted user(s) could review a public filter and use that knowledge to circumvent it.''~\cite{Wikipedia:EditFilter}.
-Further, caution in filter naming is suggested for hidden filters and editors are encouraged to give such filters just simple description of the overall disruptive behaviour rather than naming a specific user that is causing the disruptions.
-(The later is not always complied with, there are indeed filters named after the accounts causing a disruption.) +Moreover, quite some of the 154 edit filter managers on English Wikipedia have a kind of ``not active at the moment'' banner on their user page, which leads to the conclusion that the edit filter managers group is aging. -Only edit filter editors (who have the \emph{abusefilter-modify} permission) and editors with the \emph{abusefilter-view-private} permission can view hidden filters. -The later is given to edit filter helpers - editors interested in helping with edit filters who still do not meet certain criteria in order to be granted the full \emph{abusefilter-modify} permission, editors working with edit filters on other wikis interested in learning from the filter system on English Wikipedia, and Sockpuppet investigation clerks~\cite{Wikipedia:EditFilterHelper}. -As of March 17, 2019, there are 16 edit filter helpers on EN Wikipedia~\footnote{\url{https://en.wikipedia.org/wiki/Special:ListUsers/abusefilter-helper}}. -Also, all administrators are able to view hidden filters. +% Has it been the same people from the very beginning? -There is also a designated mailing list for discussing these: wikipedia-en-editfilters@lists.wikimedia.org. -It is specifically indicated that this is the communication channel to be used when dealing with harassment (by means of edit filters)~\cite{Wikipedia:EditFilter}. -It is signaled, that the mailing list is meant for sensitive cases only and all general discussions should be held on-wiki~\cite{Wikipedia:EditFilter}. +% What type of work do the different managers take over? +There are a couple of very active managers who seem to keep an overview over all filters and do maintenance work on them e.g. updating conditions to optimise evaluation or updating (syn!) deprecated variable names upon updates (syn!) of the extension's code. 
-%TODO decide whether to include this here or move back to actions - ** there's a tendency of editors to hide filters just for the heck of it (at least there are never clear reasons given), which is then reverted by other editors with the comment that it is not needed: 148, 225 (consesus that general vandalism filters should be public \url{[Special:Permalink/784131724#Privacy of general vandalism filters]}), 260 (similar to 225), 285 (same), 12 (same), 39 (unhidden with the comment "made filter public again - these edits are generally made by really unsophisticated editors who barely know how to edit a page. --zzuuzz") +Further interesting questions come to mind such as whether there are edit filter managers who specialise in creating different types of edit filters (compare manual classification). +However, in order to be able to answer this, an access to the whole \emph{abuse\_filter\_history} table is needed, so this remains a question (syn!) for future inquiry. +\subsection{Who trips filters} + +- IPs and (newly) registered users + +\begin{comment} +# Memo new users + +When comparing the *vandalism* and *good faith* memos, it comes to attention that both type of edits are usually performed by new(ly/recently registered) users (or IP addresses). + +A user who just registered an account (or who doesn't even bother to) is most probably inexperienced with Wikipedia, not familiar with all policies and guidelines and perhaps nor with MediaWiki syntax. + +It is also quite likely (to be verified against literature!) that majority of vandalism edits come from the same type of newly/recently registered accounts. +In general, it is rather unlikely that an established Wikipedia editor should at once jeopardise the encyclopedia's purpose and start vandalising. +Although apparently there are determined trolls who ``work accounts up'' to admin and then run rampant. 
+\end{comment} \section{Types of edit filters: Manual Classification} \label{sec:manual-classification} @@ -356,21 +376,17 @@ The aim of this section is to get a better understanding of what exactly it is t Based on the grounded theory methodology presented in chapter~\ref{chap:methods}, I applied emergent coding to all filters, scrutinising their patterns, comments and actions. %TODO Comment on exact process of coding (check with coding book, I think a lot is explained there already) -Three big clusters of filters were identified, namely ``vandalism'', ``good faith'' and ``maintenance''. %TODO define what each of them are +Three big clusters of filters were identified, namely ``vandalism'', ``good faith'' and ``maintenance''. %TODO define what each of them are; I actually work with 8 main clusters in the end; Unify this +\subsection{Challenges with labeling} It was not always a straightforward decision to determine what type of edits a certain filter is targeting. -This was of course, particularly challenging for private filters where only the public comment (name) of the filter was there to guide the coding. +This was of course particularly challenging for private filters where only the public comment (name) of the filter was there to guide the coding. On the other hand, guidelines state up-front that filters should be hidden only in cases of particularly persistent vandalism, so it is probably safe to assume that all hidden filters target some type of vandalism. However, the classification was difficult for public filters as well, since oftentimes what makes the difference between a good-faith and a vandalism edit is not the content of the edit but the intention of the editor. While there are cases of juvenile vandalism (putting random swear words in articles) or character repetition vandalism which are pretty obvious, that is not the case for section or article blanking, for example.
For these, from the edit alone there is no way of knowing whether the deletion was malicious or the editor conducting it just wasn't familiar with say the correct procedure for moving an article. -\begin{figure} -\centering - \includegraphics[width=0.9\columnwidth]{pics/manual-tags-distribution.png} - \caption{Edit filters manual tag distribution}~\label{fig:manual-tags} -\end{figure} - +\subsection{Editors' motivation} \begin{comment} # Filter according to editor motivation @@ -387,19 +403,7 @@ one of them refers to the edits made out of good and the other to the ones made ("The road to hell is paved with good intentions.") -## Open questions - -If discerning motivation is difficult, and, we want to achieve different results, depending on the motivation, that lead us to the question whether filtering is the proper mechanism to deal with disruptive edits. - -# Memo new users - -When comparing the *vandalism* and *good faith* memos, it comes to attention that both type of edits are usually performed by new(ly/recently registered) users (or IP addresses). - -A user who just registered an account is most probably inexperienced with Wikipedia, not familiar with all policies and guidelines and perhaps nor with MediaWiki syntax. - -It is also quite likely (to be verified against literature!) that majority of vandalism edits come from the same type of newly/recently registered accounts. -In general, it is highly unlikely that an established Wikipedia editor should at once jeopardise the encyclopedia's purpose and start vandalising. - +%TODO decide whether following two paragraphs are redundant with a lot of stuff already and get rid of them Users are urged to use the term "vandalism" carefully, since it tends to offend and drive people away. 
("When editors are editing in good faith, mislabeling their edits as vandalism makes them less likely to respond to corrective advice or to engage collaboratively during a disagreement,"~\cite{Wikipedia:Vandalism}) @@ -411,12 +415,6 @@ Only if the disrupting editor proves to be uncooperating, ignores warnings and c \end{comment} -%TODO compare with code book and kick the paragraph out -In such ambiguous cases, we can be guided by the action the filter triggers (if it is ``disallow'' the filter is most probably targeting vandalism). -At the end, we labeled most ambiguous cases with both ``vandalism'' and ``good faith''. - - -%TODO include here a diagram with overview of the categories distribution In the subsections that follow, the salient properties of each manually labeled category are discussed. @@ -441,6 +439,8 @@ In the initial version of the EditFilters Page (https://en.wikipedia.org/w/index There are also private filters targeting personal attack or abuse cases. Here, filters are private in order to protect the affected person(s)~\cite{Wikipedia:EditFilter}. + +\subsection{Hardcore vandalism} A dedicated subcluster of ``hardcore vandalism'' was defined (syn!) for these cases. %TODO what to make out of this? It's kind of interesting but doesn't really serve any purpose.. @@ -551,7 +551,17 @@ The maintenance cluster differs conceptually from the ``vandalism'' and ``good f \subsection{Unknown} -\subsection{Manual classification: outlook/concluding remarks} +\section{Manual tags discussion/manual tags + activity} + +\subsection{Manual tags distribution} +%TODO discuss figure +\begin{figure} +\centering + \includegraphics[width=0.9\columnwidth]{pics/manual-tags-distribution.png} + \caption{Edit filters manual tag distribution}~\label{fig:manual-tags} +\end{figure} + +\subsection{What filters were implemented immediately after the launch + manual tags} %TODO What were the first filters to be implemented immediately after the launch of the extension?
The extension was launched on March 17th, 2009. Filter 1 is implemented in the late hours of that day. @@ -562,9 +572,16 @@ blanking articles (filter 3) personal attacks (filter 9,11) and obscenities (12) some concrete users/cases (hidden filters, e.g. 4,21) and sockpuppetry (16,17) +\subsection{Combine most active filters with manual tags} + \section{Conclusion} \begin{comment} +## Open questions + +If discerning motivation is difficult and we want to achieve different results depending on the motivation, that leads us to the question of whether filtering is the proper mechanism to deal with disruptive edits. + +%TODO doesn't really seem related, maybe get rid of it altogether Cf. \cite{HalRied2012} Bot taxonomy diff --git a/thesis/references.bib b/thesis/references.bib index 6df6affa7a642927a15f55a32341679fd04743eb..08ecc14b63b255fa143aebb8cdbeb1d4f2c79997 100644 --- a/thesis/references.bib +++ b/thesis/references.bib @@ -479,6 +479,15 @@ \url{https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Edit_filter/Archive_1&oldid=884572675}} } +@misc{Wikipedia:EditFilterTalkArchive1Clarification, + key = "Wikipedia Edit Filter Talk Archive 1 Clarification", + author = {}, + title = {Wikipedia: Edit Filter Talk Archive 1 Clarification}, + year = 2019, + note = {Retrieved May 22, 2019 from + \url{https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Edit_filter/Archive_1&oldid=884572675#Clarification}} +} + @misc{Wikipedia:EditFilterTalkArchiveNameChange, key = "Wikipedia Edit Filter Talk Archive Name Change", author = {},