From d62fa09e65d0081f33522c20bc95c5fca63b4b7b Mon Sep 17 00:00:00 2001 From: Lyudmila Vaseva <vaseva@mi.fu-berlin.de> Date: Mon, 1 Jul 2019 10:20:29 +0200 Subject: [PATCH] Continue chapter 4 clean up --- thesis/4-Edit-Filters.tex | 60 +++++++++++++++-------------------- thesis/5-Overview-EN-Wiki.tex | 13 ++++++++ thesis/conclusion.tex | 1 + 3 files changed, 39 insertions(+), 35 deletions(-) diff --git a/thesis/4-Edit-Filters.tex b/thesis/4-Edit-Filters.tex index 9f9156d..b31e964 100644 --- a/thesis/4-Edit-Filters.tex +++ b/thesis/4-Edit-Filters.tex @@ -85,6 +85,7 @@ and actions to take when the filter is triggered. %************************************************************************ \section{The AbuseFilter\footnote{Note that the user-facing elements of this extension were renamed to ``edit filter'', however the extension itself, as well as its corresponding permissions, database tables etc. still reflect the original name.} MediaWiki extension} +\label{sec:mediawiki-ext} Ultimately, from a technical perspective, Wikipedia's edit filters are a MediaWiki plugin that allows every edit (and some other editor actions) to be checked against a specified regular expression pattern before it is published. @@ -93,7 +94,7 @@ Most frequently, edit filters are triggered upon new edits, there are however fu As of June 30th 2019, these include: `edit', `move', `delete', `createaccount', `autocreateaccount', `upload', `stashupload'\footnote{See l. 181 in \url{https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/special/SpecialAbuseLog.php}}. %TODO explain what the actions are, especially the less obvious ones such as `autocreateaccount' Historically, further editor actions such as `feedback', `gatheredit' and `moodbar' could trigger an edit filter. -However, this is no longer the case. %TODO explain why? 
I have the guess that these are not available in the software anymore (generally, not only for the edit filters) +These have since been deprecated. %TODO explain why? My guess is that these are no longer available in the software (generally, not only for the edit filters) When a filter is triggered, besides logging this, a further filter action may be invoked as well. The plugin defines the following possible filter actions: @@ -192,22 +193,19 @@ This is how the right ended up to be governed. The best practice for introducing a new filter is described under \url{https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/Instructions}. According to the page, these steps should be followed: \begin{itemize} - \item read the docs: \url{https://www.mediawiki.org/wiki/Extension:AbuseFilter/Rules_format} + \item read the documentation: \url{https://www.mediawiki.org/wiki/Extension:AbuseFilter/Rules_format} \item test with debugging tools: \url{https://en.wikipedia.org/wiki/Special:AbuseFilter/tools} (visible only for users who are already in the edit filter managers user group) - \item test with batch testing interface (dito) - \item create logging only filter: \url{https://en.wikipedia.org/wiki/Special:AbuseFilter/new} (needs permissions) + \item test with the batch testing interface (also available to edit filter managers only) + \item create a logging-only filter: \url{https://en.wikipedia.org/wiki/Special:AbuseFilter/new} (edit filter manager permissions needed) \item announce the filter at the edit filter notice board~\cite{Wikipedia:EditFilterNoticeboard}, so other edit filter managers can comment on it - \item finally, fully enable the filter by adding an appropriate edit filter action. + \item finally, fully enable the filter by adding an appropriate additional edit filter action. 
\end{itemize} -Performance/efficiency seem to be fairly important for the edit filter system; +Performance/efficiency seems to be fairly important for the edit filter system; on multiple occasions, there are notes on the recommended order of operations, so that the filter evaluates as resource-sparingly as possible~\cite{Wikipedia:EditFilterInstructions} or invitations to consider whether an edit filter is the most suitable mechanism for solving a particular issue at all~\cite{Wikipedia:EditFilter},~\cite{Wikipedia:EditFilterRequested}. - -% Can filter editors introduce each filter they feel like introducing? Or is a community consensus due when a new filter is introduced? - Anyone can propose a new edit filter. -An editor who notices problematic/weird/.. behaviour they deem needs a filter can raise the issue at \url{https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/Requested}. +Every editor who notices some problematic behaviour that they deem needs a filter can raise the issue at \url{https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/Requested}. The request can then be approved and implemented by an edit filter manager (mostly after a discussion/clarification of the details). The Edit Filters Requests page also asks users to go through the following checklist before requesting a filter: \begin{itemize} @@ -218,57 +216,50 @@ The Edit Filters Requests page also asks users to go through the following checklist \item there are a Titles Blacklist and a Link/Spam Blacklist, which should be used if the issue at hand has to do with a problematic title or link. \end{itemize} +Edit filter managers often introduce filters based on phenomena they have observed being caught by other filters, other algorithmic quality control mechanisms or general experience. +Like all newly implemented filters, these are initially enabled in logging-only mode until enough log entries are generated to evaluate whether the incident is severe and frequent enough to need a filter or not. 
+%TODO this actually also fits in the patterns of new filters in chap.5; these are the filters introduced for a couple of days/hours, then switched off to never be enabled again + \subsection{Who can edit filters?} \label{section:who-can-edit} In order to be able to set up an edit filter on their own, an editor needs to have the \emph{abusefilter-modify} permission. -According to ~\cite{Wikipedia:EditFilter} this right is given only to editors who ``have the required good judgment and technical proficiency''. -Further down on the page it is clarified that it's administrators who can assign the permission to users (also to themselves) and they should only assign it to non-admins in exceptional cases, ``to highly trusted users, when there is a clear and demonstrated need for it''. +According to~\cite{Wikipedia:EditFilter} this right is given only to editors who ``have the required good judgment and technical proficiency''. +Further down on the page it is clarified that it is administrators who can assign the permission to users (also to themselves) and they should only assign it to non-admins in exceptional cases, ``to highly trusted users, when there is a clear and demonstrated need for it''. If editors wish to be given this permission, they can hone and prove their skills by helping with requested edit filters and false positives~\cite{Wikipedia:EditFilter}. The formal process for requesting the \emph{abusefilter-modify} permission is to raise the request at the edit filter noticeboard\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Edit_filter_noticeboard}}. -%TODO who can raise the issue to the noticeboard? A discussion is held there, usually for 7 days, before a decision is reached~\cite{Wikipedia:EditFilter}. +As of 2017%TODO Check!! +, when the ``edit filter helpers'' group was introduced (editors in this group have .... 
rights), +the usual process is that editors first request an ``edit filter helper'' permission and are later converted to full edit filter managers. + A list of the current edit filter managers for the EN Wikipedia can be found here: \url{https://en.wikipedia.org/wiki/Special:ListUsers/abusefilter}. As of May 10, 2019, there are 154 users in the ``edit filter managers'' group\footnote{\url{https://en.wikipedia.org/w/index.php?title=Special:ListUsers&offset=&limit=250&username=&group=abusefilter&wpsubmit=&wpFormIdentifier=mw-listusers-form}}. (For comparison, as of March 9, 2019 there are 1181 admins, see \url{https://en.wikipedia.org/w/index.php?title=Special:ListUsers/sysop}.) Out of the 154 edit filter managers only 11 are not administrators. -\begin{comment} -Quite some of the 154 edit filter managers have a kind of "not active at the moment" banner on their user page. -How many new editors have gotten the permission in recent time? -Otherwise the group is apparently aging.. - - -CAT: https://ca.wikipedia.org/wiki/Especial:Usuaris/abusefilter (currently: 4 users) - --- in Spanish/German/Russian the role does not exist; it would be interesting to know whether it was subsumed somewhere --- nor in Bulgarian, by the way, but there the entire EditFilter page does not exist either -Probably it's simply admins who can modify the filters there. -\end{comment} - \subsection{Modifying a filter} -% I may have found smth for theis subsection +% TODO Should we keep this here? it's more a comment on the filters' lifecycle that I also discuss in chap.5 It is not uncommon that the action(s) a particular filter triggers change over time. According to the guidelines for introducing new filters, every filter should be enabled in ``log only'' mode at the beginning. After it has been deemed that the filter actually acts as desired, usually additional actions are switched on~\cite{Wikipedia:EditFilterInstructions}. 
Sometimes, when a wave of particularly persistent vandalism arises, a filter is temporarily set to ``warn'' or ``disallow'' and the actions are removed again as soon as the filter is no longer tripped very frequently. %TODO src? other than data? -\begin{comment} -this subsection used to describe a filter's detailed page; I moved all of it to "example of an edit filter". Not sure whether there is some significant information that by all means has to be included in this section in particular. -\end{comment} \subsection{Urgent situations} -There are several provisions for urgent situations (which I think should be scrutinised extra carefully since ``urgent situations'' have historically always been an excuse for cuts in civil liberties). +Throughout the documentation, there are several provisions for urgent situations. For instance, generally, every new filter should be tested extensively in logging mode only (without any further actions) until a sufficient number of edits has demonstrated that it does indeed filter what it was intended to and that there are not too many false positives. As a matter of fact, caution is solicited both on the edit filter description page~\cite{Wikipedia:EditFilter} and on the edit filter management page~\cite{Wikipedia:EditFilterManagement}. Only then should the filter have ``warn'' or ``disallow'' actions enabled~\cite{Wikipedia:EditFilter}. %TODO move this to the introducing a filter part, where it's mentioned for the first time that filters should be "log only" in the beginning; move everything else to further studies/long list of interesting questions In ``urgent situations'' however (how are these defined? who determines they are urgent?), discussions about a filter may happen after it was already implemented and set to warn/disallow edits without thorough testing. Here, the filter editor responsible should monitor the filter and the logs in order to make sure the filter does what it was supposed to~\cite{Wikipedia:EditFilter}. 
+I think these cases should be scrutinised extra carefully since ``urgent situations'' have historically always been an excuse for cuts in civil liberties. +This is, however, beyond the scope of the present work and is one of the directions for further studies suggested in section~\ref{sec:further-studies}. %************************************************************************ @@ -276,15 +267,13 @@ Here, the filter editor responsible should monitor the filter and the logs in or \subsection{What happens when a filter gets triggered?} There are several actions by editors that may trigger an edit filter. -Editing is the most common of them, but there are also filters targetting account creation, deletions, moving pages or uploading content\footnote{\url{https://www.mediawiki.org/wiki/Extension:AbuseFilter/Rules_format\#Variables_from_AbuseFilter}}. -%TODO src as a footnote or a proper ref? -% or rather think about how much of this is a unnecessary repetition and get rid of it +Editing is the most common of them, but as elaborated in section~\ref{sec:mediawiki-ext}, there are also filters targeting account creation, deletions, moving pages or uploading content. When an edit filter's regex pattern matches an editor's action, an entry is created in the \emph{abuse\_filter\_log} table and an additional action (or actions) may be invoked. The documentation of the AbuseFilter extension provides us with a complete list of the possible edit filter actions~\cite{Mediawiki:AbuseFilterActions}: \begin{itemize} \item Logging: ``All filter matches are logged in the abuse log. This cannot be turned off.'' - \item Warning: ``The user is warned that their edit may not be appreciated, and is given the opportunity to submit it again. You may specify a specific system message containing the warning to display.'' A link to the false positives page~\cite{Wikipedia:EditFilterFalsePositives} is also provided. 
(the editor who tripped the filter is provided with the opportunity to revise their edit and re-submit it) + \item Warning: ``The user is warned that their edit may not be appreciated, and is given the opportunity to submit it again. You may specify a specific system message containing the warning to display.'' A link to the false positives page~\cite{Wikipedia:EditFilterFalsePositives} is also provided. The editor who tripped the filter is provided with the opportunity to revise their edit and re-submit it. \item Throttling: ``The filter will only match if a rate limit is tripped. You can specify the number of actions to allow, the period of time in which these actions must occur, and how those actions are grouped. The groupings are which sets of people should have aggregate (shared) throttles. That is, if you type "user", then the same user must match the filter a certain number of times in a certain period of time. You may also combine groups with commas to specify that throttle matches sharing all criteria will be aggregated. For example, using "ip,page", X filter matches in Y seconds from the same IP address to the same page will be required to trip the remainder of the actions.'' (So this is something like, do this and that if a user edits a particular page X times for a Y period of time. In this sense: throttling always has to be paired with another action?) @@ -295,6 +284,7 @@ The documentation of the AbuseFilter extension provides us a complete list of th \item Range-blocking: ``Somewhat of a "nuclear option", the entire /16 range from which the rule was triggered will be blocked for 1 week.'' \item Tagging: ``The edit or change can be 'tagged' with a particular tag, which will be shown on Recent Changes, contributions, logs, new pages, history, and everywhere else. These tags are styleable, so you can have items with a certain tag appear in a different colour or similar.'' \end{itemize} +%TODO shouldn't this be part of the MediaWiki Extension section? 
Range-blocking, blocking, removing from privileged groups and revoking autopromoted groups have not been used on the EN Wikipedia in recent years. %TODO: why? look for talk page archives around the last time they were used. Maybe there was a particular incident; nothing concerning autopromote in the EditFilter talk page around 2012 To be more precise, the last time a filter action other than ``log only'', ``tag'', ``warn'' or ``disallow'' was triggered on the EN Wikipedia was in 2012. %TODO Refer to data analysis diff --git a/thesis/5-Overview-EN-Wiki.tex b/thesis/5-Overview-EN-Wiki.tex index 5ab0404..552d06d 100644 --- a/thesis/5-Overview-EN-Wiki.tex +++ b/thesis/5-Overview-EN-Wiki.tex @@ -505,6 +505,19 @@ Multiple filters have the comment "let's see whether this hits something", which ** are there a couple of very active edit filter managers who are also (informal) leaders? ** Do edit filter managers specialize in particular types of filters (e.g. vandalism vs good faith)? +\begin{comment} +Quite a few of the 154 edit filter managers have a kind of "not active at the moment" banner on their user page. +How many new editors have received the permission recently? +Otherwise the group is apparently aging.. + + +CAT: https://ca.wikipedia.org/wiki/Especial:Usuaris/abusefilter (currently: 4 users) + +-- in Spanish/German/Russian the role does not exist; it would be interesting to know whether it was subsumed somewhere +-- nor in Bulgarian, by the way, but there the entire EditFilter page does not exist either +Probably it's simply admins who can modify the filters there. 
+\end{comment} + * How are filter actions set ** there's this pattern that all actions but logging (which cannot be switched off) are taken out when edit filter managers are updating the regex of the filter ** there's a tendency of editors to hide filters just for the heck of it (at least there are never clear reasons given), which is then reverted by other editors with the comment that it is not needed: 148, 225 (consensus that general vandalism filters should be public \url{[Special:Permalink/784131724#Privacy of general vandalism filters]}), 260 (similar to 225), 285 (same), 12 (same), 39 (unhidden with the comment "made filter public again - these edits are generally made by really unsophisticated editors who barely know how to edit a page. --zzuuzz") diff --git a/thesis/conclusion.tex b/thesis/conclusion.tex index a170b09..09166ec 100644 --- a/thesis/conclusion.tex +++ b/thesis/conclusion.tex @@ -81,6 +81,7 @@ TheNameWithNoMan (talk) 17:39, 9 July 2008 (UTC)" %************************************************************************ \section{Directions for further studies} +\label{sec:further-studies} <insert long list of interesting questions here> \begin{itemize} -- GitLab