Apply T's feedback chap 4

Commit 61a8c562 authored 5 years ago by Lyudmila Vaseva
--- a/thesis/4-Edit-Filters.tex
+++ b/thesis/4-Edit-Filters.tex
@@ -46,7 +46,9 @@ We discuss (who is in) the edit filter manager group in section~\ref{section:who
 \subsection{Example of a filter}

 For illustration purposes, let us have a closer look at what a single edit filter looks like.
-Edit filter with ID 365 is public and currently enabled (as of June 30th 2019).
+Edit filter with ID 365 is public
+\footnote{There are also private (hidden) filters. The distinction is discussed in more detail in sections~\ref{section:4-history} and \ref{sec:public-hidden}.}
+and currently enabled (as of June 30th 2019).
 This means the filter is working and everyone interested can view the filter's details.
 Its description reads ``Unusual changes to featured or good content''.
 The filter pattern is:
@@ -78,7 +80,7 @@ some statistics (the average time the filter takes to check an edit, percentage
 comments (left by edit filter managers, generally to log and explain changes);
 flags (``Hide details of this filter from public view'', ``Enable this filter'', ``Mark as deleted'');
 links to last modified (with diff and user who modified it), the edit filter's history and a tool for exporting the filter to another wiki;
-and actions to take when the filter is triggered.
+and actions to take when the filter's pattern matches.

 \begin{figure}
 \centering
@@ -106,7 +108,7 @@ As of June 30th 2019, these include: \emph{edit}, \emph{move}, \emph{delete}, \e
 Historically, further editor's actions such as \emph{feedback}, \emph{gatheredit} and \emph{moodbar} could trigger an edit filter.
 These are in the meantime deprecated. %TODO explain why? I have the guess that these are not available in the software anymore (generally, not only for the edit filters)

-When a filter is triggered, beside logging the event in the \emph{abuse\_filter\_log} table (the only filter action which cannot be switched off), a further filter action may be invoked as well.
+When a filter's pattern is matched, besides logging the event in the \emph{abuse\_filter\_log} table (the only filter action which cannot be switched off), a further filter action may be invoked as well.
 The plugin defines following possible filter actions:
 \emph{tag}, \emph{throttle}, \emph{warn}, \emph{blockautopromote}, \emph{block}, \emph{degroup}, \emph{rangeblock}, \emph{disallow}\footnote{See line 2808 in \url{https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/AbuseFilter.php}}.
 %TODO verify that none of the actions are deprecated; I have my doubts that for instance `revoking auto-promoted groups' may not be available anymore -- as far as I can see they are available in the software. However, there never was a community consensus to use them
@@ -115,9 +117,9 @@ The documentation of the AbuseFilter extension provides us comprehensive definit
    \item \emph{tag}: The contribution is tagged with a specific tag (which can be defined and styled by the edit filter manager) which then appears on Recent Changes, contributions, logs, history pages, etc. and allows aggregations of lists for dashboards and similar.
    \item \emph{throttle}: The filter is activated upon the tripping of a rate limit. Configurable parameters are the allowed number of actions, the period of time in which these actions must occur, and how those actions are grouped. Actions can be grouped by user, IP address, /16 IP range, creation date of the user account, page, site, the edit count of the user or a combination thereof. (A simple example of throttling is something like ``do this if page X is edited more than Y times in Z seconds''.)
     \item \emph{warn}: A warning is displayed that the edit may not be appreciated. (The warning message is configurable by the edit filter manager.) The editor who tripped the filter is provided with the opportunity to revise their edit and re-submit it. A link to the false positives page~\cite{Wikipedia:EditFilterFalsePositives} is also provided.
-     \item \emph{blockautopromote}: The user who triggered the filter is banned from receiving extra groups from \emph{\$wgAutopromote} for a random period of 3 to 7 days. %TODO what is wgAutopromote?
+     \item \emph{blockautopromote}: The user whose action matched the filter's pattern is banned from receiving extra groups from \emph{\$wgAutopromote} for a random period of 3 to 7 days. %TODO what is wgAutopromote?
    \item \emph{block}: The user who triggered the filter is blocked indefinitely. An error message is displayed to inform the user of this action.
-    \item \emph{degroup}: The user who triggered the filter is removed from all privileged groups (sysop, bureaucrat, etc). An error message is displayed to inform them of this action.
+    \item \emph{degroup}: The user whose action matched the filter's pattern is removed from all privileged groups (sysop, bureaucrat, etc). An error message is displayed to inform them of this action.
    \item \emph{rangeblock}: The entire /16 IP range from which the filter was triggered is blocked for a week.
    \item \emph{disallow}: An error message is shown to the editor informing them their edit was considered unconstructive and will not be saved. They are provided the opportunity to report a false positive.
 \end{itemize}
@@ -167,9 +169,9 @@ for the period between the announcement that the extension is planned up until t
 For a while at the beginnings of the discussion, there was some confusion among editors regarding the intended functionality of the edit filters.
 Participants invoked various motivations for the introduction of the extension (which sometimes contradicted each other) and argued for or against the filters depending on these.
 The discussion reflects a mix of ideological and practical concerns.
-The biggest controversies lay along the lines of filters being public-vs-private and the actions the filters were to invoke upon a match.
+The biggest controversies lay along the lines of filters being public-vs-private (hidden from public view) and the actions the filters were to invoke upon a match.
 An automated rights revocation or a block of the offending editor with no manual confirmation by a real person was of particular concern to a lot of editors (they were worried that the filters would not be able to understand context, thus resulting in too many false positives and blocking many legitimate edits and editors).
-As far as I understood, these features were technically implemented but never really used on English Wikipiedia (although there are \emph{blockautopromote} actions triggered in the \emph{abuse\_filter\_log}). %TODO investigate what exactly this means and why it hasn't happened since 2012
+As far as I understood, these features were technically implemented but never really used on English Wikipedia.

 As to the public-vs-private debate, the initial plan was that all filters are hidden from public view and only editors with special permissions (the edit filter managers) were supposed to be able to view and modify the patterns and consult the logs.
 The core developer of the extension was reasoning that its primary purpose was to fend off really persistent vandals with reasonable technical understanding who were ready to invest time and effort to circumvent anti-vandal measures
@@ -317,13 +319,13 @@ and all edits that trigger an edit filter are listed in the Abuse Log~\cite{Wiki
 \begin{figure}
 \centering
  \includegraphics[width=0.9\columnwidth]{pics/screenshots-filter-trigger/Screenshot-abuse-log.png}
-  \caption{Abuse Log showing all filter triggers by User:Schnuppi4223}~\label{fig:screenshot-abuse-log}
+  \caption{Abuse Log showing all filter matches by User:Schnuppi4223}~\label{fig:screenshot-abuse-log}
 \end{figure}

 \begin{figure}[t]
 \centering
  \includegraphics[width=0.9\columnwidth]{pics/screenshots-filter-trigger/Screenshot-trigger-warning-filter.png}
-  \caption{Editor gets notified their edit triggered multiple edit filters}~\label{fig:screenshot-warn-disallow}
+  \caption{Editor gets notified their edit matched multiple edit filters}~\label{fig:screenshot-warn-disallow}
 \end{figure}



--- a/thesis/5-Overview-EN-Wiki.tex
+++ b/thesis/5-Overview-EN-Wiki.tex
@@ -402,7 +402,7 @@ As demonstrated on figure~\ref{fig:filter-hits-actions}, there was above all a s
 As discussed in section~\ref{sec:introduce-a-filter}, it is an established praxis to introduce new filters in ``log only'' mode and only switch on additional filter actions after a monitoring period showed that the filters function as intended.
 Hence, it is plausible that new filters in logging mode were introduced, which were then switched off after a significant number of false positives occurred.
 However, upon closer scrutiny, this could not be confirmed.
-The most frequently triggered filters in the period January–March 2016 are mainly the most triggered filters of all times and nearly all of them have been around for a while in 2016.
+The filters with the greatest number of hits in the period January–March 2016 are mainly the most triggered filters of all time, and nearly all of them had already been around for a while by 2016.
 Also, no bug or a comparable incident with the software was found upon an inspection of the extension's issue tracker~\cite{phab-abusefilter-2015}, or commit messages of the commits to the software done during May 2015–May 2016~\cite{gerrit-abusefilter-source}.
 Moreover, no mention of the hits surge was found in the noticeboard~\cite{Wikipedia:EditFilterNoticeboard} and edit filter talk page archives~\cite{Wikipedia:EditFilterTalkArchive2016}.
 The in section~\ref{sec:filter-activity} mentioned condition limit has not changed either, as far as I can tell from the issue tracker, the commits and discussion archives, so the possible explanation that simply more filters have been at work since 2016 seems to be refuted as well.
@@ -421,19 +421,19 @@ Their edits however constitute some 1-3\% of all hits from the period, so the ex
 (Yes, it was viagra spam, and yes, a ``whois'' lookup proved them to really be Russian IPs.
 And, yes, whoever was editing could've also used a VPN, so I'm not opening a Russian bot fake news conspiracy theory just yet.)
 A closer/more systematic scrutiny (syn!) of the editors causing the hits may be insightful though.
-Right now, all the data analysed on the matter stems from the \emph{abuse\_filter\_log} table and the checks of the content of the edits were done manually on a sample basis via the web frontend of the AbuseLog~\cite{Wikipedia:AbuseLog} where one can click on the diff of the edit for edits that triggered public filters.
-No simple automated check of what the offending editors were contributing was possible since the \emph{abuse\_filter\_log} table does not store the text of the edit which triggered a filter directly, but rather contains a reference to the \emph{text} table where the wikitext of all individual page revisions is stored~\cite{Wikipedia:TextTable}.
+Right now, all the data analysed on the matter stems from the \emph{abuse\_filter\_log} table and the checks of the content of the edits were done manually on a sample basis via the web frontend of the AbuseLog~\cite{Wikipedia:AbuseLog} where one can click on the diff of the edit for edits that matched public filters.
+No simple automated check of what the offending editors were contributing was possible since the \emph{abuse\_filter\_log} table does not store the text of the edit which matches a filter's pattern directly, but rather contains a reference to the \emph{text} table where the wikitext of all individual page revisions is stored~\cite{Wikipedia:TextTable}.
 One needs to join the hit data from \emph{abuse\_filter\_log} with the \emph{text} table to obtain the content of the edits.

 Last but not least, I took a step back and contemplated the significant geo/socio-political events from the time, which triggered a lot of media (and Internet) attention and disinformation campaigns.
 Following things came to mind: 2016 US elections, the Brexit referendum and the so-called ``refugee crisis'' in Europe.
 There was also a severe organisational crisis in Wikimedia at the time during which a lot of staff left and eventually the executive director stepped down.

-However, I couldn't draw a direct relationship between any of these political events and the edits which triggered edit filters.
+However, I couldn't draw a direct relationship between any of these political events and the edits caught by edit filters.
 An investigation into the pages on which the filters were triggered proved them (the pages) to be quite innocuous:
 the page where most filter hits were logged in January 2016 (beside the login page, on which all account creations are logged) was ``Skateboard'' and the $660$ filter hits here seem like a drop in the ocean compared to the $372.907$ hits for the whole month.
-And the most triggered page in March (apart from the user login page) was the user page for user 209.236.119.231 who was also the editor with second most hits and who was apparently trying to post spam links on his own user page (after posting twice to ``Skateboard'').
-In general, the pages on which filters are triggered seem more like a randomly selected platform (syn) on which the disrupting editors unload their spam.
+And the page in March (apart from the user login page) on which most filter hits took place was the user page for user 209.236.119.231 who was also the editor with second most hits and who was apparently trying to post spam links on his own user page (after posting twice to ``Skateboard'').
+In general, the pages on which filters match seem more like a randomly selected platform (syn) on which the disrupting editors unload their spam.
 %Should I even mention this at all?

 \begin{figure}
@@ -475,7 +475,8 @@ Rather, among the 10 most active filters, it is filter 527 ``T34234: log/throttl
    specific: and then have a look at the most active filters (of all times and if applicable per year) (also directly with what is their assigned manual tag)
 \end{comment}

-Another assumption that proved to be wrong/didn't quite carry into effect was that ``filters in this extension would be triggered fewer times than once every few hours''.
+Another assumption that proved to be wrong/didn't quite carry into effect was that ``filters in this extension would be triggered fewer times than once every few hours''
+\footnote{Here, ``trigger'' means that an editor's action matches a filter's pattern and sets off the filter's configured action(s).}.
 As a matter of fact, a quick glance at the AbuseLog~\cite{Wikipedia:AbuseLog} confirms that there are often multiple filter hits per minute.
 %TODO compute means --> we can conclude from these numbers that the mechanism is quite actively used


--- a/thesis/6-Discussion.tex
+++ b/thesis/6-Discussion.tex
@@ -120,7 +120,7 @@ and that I'm not willing to offer potential trolls ready-made lists.
 Finally, it stands to reason that if we are interested in the question when do people (who have access to both) implement a bot and when a filter, all we have to do is ask (see directions for future research in section~\ref{}).

 At the end, we should also ask ourselves why get certain filters and not others? What kinds of biases/problems are there?
-Is it fair and justified that a great number of filters are triggered only by new (not confirmed) editors?
+Is it fair and justified that a great number of filters target only new (not confirmed) editors?
 Why is it all right for an established editor to use swear words whereas it is not for newbies (see filter 384)?

 %Fazit
@@ -213,5 +213,5 @@ There are also various complaints/comments by users bewildered that their edits
    \item \textbf{Is there a qualitative difference between the tasks/patterns of public and hidden filters?}: We know of one general guideline/rule of thumb (cite!) according to which general filters are to be public while filters targeting particular users are hidden. Is there something more to be learnt from an actual examination of hidden filters? One will have to request access to them for research purposes, sign an NDA, etc.
    \item \textbf{Do edit filter managers specialize on particular types of filters (e.g. vandalism vs good faith?)} \emph{abuse\_filter\_history } table is needed for this
    \item \textbf{What proportion of quality control work do filters take over?}: compare filter hits with number of all edits and reverts via other quality control mechanisms
-    \item \textbf{Do edit filter managers stick to the edit filter guidelines?}: e.g. no trivial problems (such as spelling mistakes) should trigger filters; problems with specific pages are generally better taken care of by protecting the page and problematic title by the title blacklist; general filters shouldn't be hidden
+    \item \textbf{Do edit filter managers stick to the edit filter guidelines?}: e.g. filters shouldn't be implemented for trivial problems (such as spelling mistakes); problems with specific pages are generally better taken care of by protecting the page and problematic title by the title blacklist; general filters shouldn't be hidden
 \end{enumerate}