@@ -93,12 +93,14 @@ The researchers also warn that wording is tremendously important for the percept
%TODO Concerns?
\subsection{Page protection, etc.}
\label{sec:page-protection}
%TODO Incorporate this, moved from chap. 4
\begin{comment}
\subsection{Alternatives to Edit Filters}
%TODO explain each of these mechanisms in some detail
Since edit filters run against every edit saved on Wikipedia, rarely tripped filters are generally discouraged, and a number of alternatives are offered to edit filter managers and editors proposing new filters.
For example, the page protection mechanism is suitable for handling a higher number of incidents concerning a single page.
Furthermore, the title and spam blacklists may be the way to handle disruptive page titles or link spam~\cite{Wikipedia:EditFilter}.
(It is worth noting at this point that both blacklists are also rule-based.)
Moreover, it is recommended to run in-depth checks (e.g. for single articles) separately, for example by using bots~\cite{Wikipedia:EditFilterRequested}.
...
...
@@ -134,6 +136,10 @@ This is for one a harmful way to view the project, neglecting the ``assume good
and also leads to such users seeking out easy-to-judge instances from the queues in order to move on to the next entry more swiftly and gather more points,
leaving more subtle cases which really require human judgement to others.
Transparency-wise, one can criticise that the heuristics these tools use to compile the queues of potentially malicious edits in need of attention are oftentimes obfuscated by the user interface, so the editors using them are not aware why exactly these and not other edits are displayed to them.
The heuristics to use are configurable to an extent; however, one needs to be aware of this option. %TODO maybe move to pitfalls/concerns discussion
@@ -331,11 +331,6 @@ and all edits that trigger an edit filter are listed in the Abuse Log~\cite{Wiki
\section{Edit filters' role in the quality control ecosystem}
\begin{comment}
%TODO revise question with updated research questions from meeting notes 04.07.2019
From l.12
In the present chapter, we aim to understand how edit filters work, who implements and runs them and above all, how and why they were introduced in the first place and what the qualitative difference is between them and other algorithmic quality control mechanisms.
\end{comment}
The purpose of the present section is to review what we have learnt so far about edit filters and summarise how they fit into Wikipedia's quality control ecosystem.
As timeline~\ref{fig:timeline} shows, the time span in which algorithmic quality control mechanisms (first vandal-fighting bots and semi-automated tools, later filters) were introduced coincides with the period after Wikipedia's exponential growth took off in 2006 (compare figures~\ref{fig:editors-development} and~\ref{fig:edits-development}).
...
...
@@ -378,48 +373,52 @@ As shown elsewhere~\cite{HalGeiMorRied2013}, this shift had a lot of repercussio
\caption{EN Wikipedia: Number of edits over the years (source: \url{https://stats.wikimedia.org/v2/})}~\label{fig:edits-development}
\end{figure}
% Comparison of the mechanisms: each of them has following salient characteristics
\subsection{Wikipedia's algorithmic quality control mechanisms in comparison}
As we can read from timeline~\ref{fig:timeline}, edit filters were introduced at a moment when bots and semi-automated tools were already in place.
Thus, the question arises: why were they implemented when these other mechanisms already existed?
Here, we review the salient features of the different quality control mechanisms and the motivation for the filters' introduction.
A concise summary of this discussion is offered in table~\ref{table:mechanisms-comparison}.
Since edit filters are a fully automated mechanism, the most obvious comparison is to bots.
The main argument for introducing the extension was the use cases it was supposed to take care of: obvious, persistent vandalism (often automated itself) which was easy to recognise but more difficult to clean up.
Filters were going to do the job more neatly than bots: they react faster, since the extension is part of the core software, and they are triggered \emph{before} an edit is published, so abusive content never becomes public at all.
By disallowing such malicious edits from the start, proponents reasoned, the extension would reduce the workload of the other mechanisms and free up resources for vandal fighters using semi-automated tools or monitoring pages manually, who could then concentrate on less obvious cases requiring human judgement.
%Structural/soft factors
The remaining arguments for edit filters over bots touched upon in the discussion prior to the filters' introduction~\cite{Wikipedia:EditFilterTalkArchive1} were more of an infrastructural or social nature.
The extension's developers optimistically announced that it was going to be open source, its code well tested, with a framework for testing single filters before enabling them, and with edit filter managers being able to collaboratively develop and improve filters.
They viewed this as an improvement over (admin) bots, which would be able to cover similar cases but whose code was mostly private, not tested at all, and maintained by a single developer/operator who was often not particularly responsive in emergency cases%
\footnote{For the sake of completeness, it should be mentioned here that the most popular semi-automated anti-vandalism tools are also open source.
Their focus, however, lies elsewhere, since a final human decision is always required, which is probably why they are not mentioned at all in this discussion.
ORES is open source as well; it is, however, rather a meta tool that can be employed by the other mechanisms, which is why a direct comparison is not completely feasible either.
Besides, it was introduced some six to seven years after the edit filters, so it was not part of the discussion at the time.}.
% Comparison filters vs page protection
Another apparent comparison is the one between edit filters and MediaWiki's page protection mechanism~\cite{Mediawiki:PageProtection}.
As pointed out in section~\ref{sec:page-protection}, page protection is reasonable when a rise in disruptive activity on a particular page occurs.
Similarly to an edit filter aimed at the specific page, page protection simply disallows edits to it from the start.
The difference, however, is that an edit filter can target a specific malicious user (or users) directly, without imposing restrictions on the vast majority of editors.
%Who does all of this, how difficult is it to become involved
%TODO this seems a bit out of place, but no better placement found so far
Of all the mechanisms, edit filters are probably the hardest to become engaged with.
As signalled in section~\ref{section:who-can-edit}, the permissions are only granted to very carefully selected editors who have a long history of participation on Wikipedia and mostly hold various other special permissions as well.
The numbers also demonstrate that this is the most exclusive group:
as mentioned in section~\ref{section:who-can-edit}, there are currently 154 edit filter managers on EN Wikipedia,
compared to at least 232 bot operators~\cite{Wikipedia:BotOperators} (most likely not all bot operators are listed in the category~\cite{Wikipedia:FAQCategory})
and 6130 users who have the \emph{rollback} permission~\cite{Wikipedia:Rollback}.
As to the competences needed, semi-automated tools are probably the easiest to learn, since one ``only'' has to master the user interface of the software.
Bots presumably require the most background knowledge, since one has to not only be familiar with a programming language but also learn to interact with Wikipedia's API.
Filters, on the other hand, are arguably easier: here, ``only'' an understanding of regular expressions is required.
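To give a more concrete impression of this skill set, the following minimal Python sketch mimics the kind of regular-expression condition a filter encodes.
It is an illustration only: real filters are formulated in the AbuseFilter rule syntax rather than Python, and the pattern below is invented for demonstration purposes.
\begin{verbatim}
import re

# Invented pattern in the style of a simple anti-vandalism heuristic:
# trip on any character repeated ten or more times in a row.
SUSPICIOUS = re.compile(r"(.)\1{9,}")

def would_trip(added_lines: str) -> bool:
    """Toy stand-in for a filter condition over an edit's added text."""
    return bool(SUSPICIOUS.search(added_lines))

print(would_trip("aaaaaaaaaaaa!!!"))             # True: character flood
print(would_trip("Added a sourced paragraph."))  # False
\end{verbatim}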
As already summarised in chapter~\ref{chap:background}, critical voices worry about various aspects of the individual quality control mechanisms (see also table~\ref{table:mechanisms-comparison}).
Concerns about filters somewhat resemble those expressed about bots: namely, the apprehension of a fully automated mechanism taking (potentially erroneous) decisions about excluding editors from participation.
As a consequence, community consensus on using filter actions such as \emph{rangeblock}, \emph{block}, and \emph{degroup} was never reached.
According to the discussion archives~\cite{Wikipedia:EditFilterTalkArchive1}, others feared that edit filters were placing a lot of power in the hands of very few people.
@@ -441,7 +440,8 @@ Critical voices express different concerns about the individual mechanisms:
& understand REGEXes & programming knowledge, understand APIs, ... & get familiar with the tool & understand ML \\
\hline
\multirow{2}{*}{Concerns}& automated agents blocking/desysoping human users & ``botophobia'' & gamification & general ML concerns: hard to understand \\
& hidden filters lack transparency and accountability && interface makes some paths of action easier than others &\\
& censorship infrastructure &&&\\
\hline
Areas of application & persistent vandals with a known modus operandi and a history of circumventing prevention methods (obvious vandalism which takes time to clean up) & mostly obvious vandalism & less obvious cases that require human judgement &\\
\hline
...
...
@@ -449,6 +449,7 @@ Critical voices express different concerns about the individual mechanisms:
\end{longtable}
\end{landscape}
\begin{comment}
\begin{verbatim}
| Filters | Bots | Semi-Automated tools | ORES
...
...
@@ -502,18 +503,8 @@ Application areas |
\end{verbatim}
\end{comment}
% When is which mechanism used
%\subsection{Application areas of the individual mechanisms}
\subsection{Alternatives to Edit Filters}
Before closing the comparison, it is worth recalling that an edit filter is not always the most suitable mechanism in the first place.
Since edit filters run against every edit saved on Wikipedia, rarely tripped filters are generally discouraged, and a number of alternatives are offered to edit filter managers and editors proposing new filters.
For example, the page protection mechanism is suitable for handling a higher number of incidents concerning a single page.
Furthermore, the title and spam blacklists may be the way to handle disruptive page titles or link spam~\cite{Wikipedia:EditFilter}.
(It is worth noting at this point that both blacklists are also rule-based.)
Moreover, it is recommended to run in-depth checks (e.g. for single articles) separately, for example by using bots~\cite{Wikipedia:EditFilterRequested}.
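To illustrate the rule-based nature of these blacklists, the following minimal Python sketch approximates a spam-blacklist check; the entries are invented, but the real spam blacklist similarly consists of one regular expression per line, matched against added external links.
\begin{verbatim}
import re

# Invented blacklist entries; the real spam blacklist holds one
# regular expression per line, matched against added external links.
BLACKLIST = [r"example-pills\.com", r"buy-cheap-essays\.net"]

def is_blacklisted(url: str) -> bool:
    """Return True if the URL matches any blacklist entry."""
    return any(re.search(entry, url) for entry in BLACKLIST)

print(is_blacklisted("http://example-pills.com/offer"))  # True
print(is_blacklisted("https://en.wikipedia.org/"))       # False
\end{verbatim}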
%\subsection{Collaboration with bots (and semi-automated tools)}
\label{subsection:collaboration-bots-filters}
...
...
@@ -527,16 +518,15 @@ The researchers demonstrate how a bot (ClueBot), and several editors using the s
During the present study, I have also observed various cases of edit filters and bots mutually facilitating each other's work.
%TODO check whether there are other types of cooperations at all: what's the deal with Twinkle? and update here!
% are there further examples of such collaborations: consider scripting smth that parses the bots descriptions from https://en.wikipedia.org/wiki/Category:All_Wikipedia_bots and looks for "abuse" and "filter" -- nice idea, but no time
DatBot, Mr.Z-bot and MusikBot are all examples of bots conducting support tasks for edit filters.
DatBot~\cite{Wikipedia:DatBot} monitors the Abuse Log~\cite{Wikipedia:AbuseLog}
and reports users tripping certain filters to WP:AIV (Administrator Intervention against Vandalism)~\cite{Wikipedia:AIV} and WP:UAA (Usernames for Administrator Attention)~\cite{Wikipedia:UAA}.
It is the successor of Mr.Z-bot~\cite{Wikipedia:MrZBot},
which used to report users from the Abuse Log to WP:AIV but had been inactive since 2016 and was therefore recently deactivated.
MusikBot also has several tasks dedicated to monitoring different aspects of edit filter behaviour and compiling reports for anyone interested:
The FilterMonitor task ``[r]eports functional changes of edit filters to the watchable page User:MusikBot/FilterMonitor/Recent changes. The template\\%End line since otherwise template name protrudes in margin
\verb|{{recent filter changes}}| formats this information and can be transcluded where desired''~\cite{Wikipedia:MusikBotFilterMonitor}.
The StaleFilter task ``[r]eports enabled filters that have not had any hits in over 30 days, as specified by \verb|/Offset|''~\cite{Wikipedia:MusikBotStaleFilters}.
The AbuseFilterIRC task ``[r]elays all edit filter hits to IRC channels and allows you to subscribe to notifications when specific filters are tripped. See \verb|#wikipedia-en-abuse-log-all| for the English Wikipedia feed''~\cite{Wikipedia:MusikBotAbuseFilterIRC}.
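By way of illustration, the core of such a monitoring task can be approximated with a few lines of Python querying the public part of the Abuse Log via the MediaWiki API (\verb|list=abuselog|).
This is a minimal sketch under the assumption that only publicly visible log entries are needed; it is not the actual code of any of the bots mentioned above.
\begin{verbatim}
import requests

API = "https://en.wikipedia.org/w/api.php"

def recent_filter_hits(limit=10):
    """Fetch the most recent public Abuse Log entries."""
    params = {
        "action": "query",
        "list": "abuselog",   # public part of Special:AbuseLog
        "afllimit": limit,
        "aflprop": "ids|user|title|action|result|timestamp",
        "format": "json",
    }
    response = requests.get(API, params=params, timeout=10)
    response.raise_for_status()
    return response.json()["query"]["abuselog"]

for hit in recent_filter_hits():
    print(hit["timestamp"], hit.get("filter_id"),
          hit["title"], hit["result"])
\end{verbatim}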
...
...
@@ -546,27 +536,27 @@ Although it is hidden, so we cannot view any details, filter 603 is named ``Spec
And there are several filters (historically) configured to ignore particular bots: filter 76 (``Adding email address'') exempting XLinkBot, filter 28 (``New user redirecting an existing substantial page or changing a redirect'') exempting Anybot, filter 532 (``Interwiki Addition'') exempting Cydebot are some examples thereof.
There are also filters configured to ignore all bots: filter 368 (``Making large changes when marking the edit as minor''), filter 702 (``Warning against clipboard hijacking''), and filter 122 (``Changing Username malformed requests'').
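Conceptually, such an exemption is just an additional clause in the filter's condition.
The following Python sketch illustrates the idea; real filters express it in the AbuseFilter rule syntax via a condition on the \verb|user_groups| variable, and the size threshold below is invented.
\begin{verbatim}
def minor_mass_change_filter(user_groups, is_minor, bytes_changed):
    """Toy version of a filter ignoring all bots, in the spirit of
    filter 368 ('Making large changes when marking the edit as minor')."""
    if "bot" in user_groups:   # exemption clause: skip bot accounts
        return False
    return is_minor and abs(bytes_changed) > 5000  # invented threshold

print(minor_mass_change_filter({"user"}, True, 12000))         # True
print(minor_mass_change_filter({"bot", "user"}, True, 12000))  # False
\end{verbatim}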
\begin{comment}
Apparently, Twinkle at least has the possibility of using heuristics from the abuse filter log for its queues.
%TODO is that so? the only place I can find abuse filters mentioned with Twinkle is in the source code: https://github.com/azatoth/twinkle/blob/master/morebits.js#L2636; and I'm not quite sure what this part of the code does.
\end{comment}
Moreover, on occasion, data from the Abuse Log is used for (semi-)protecting frequently disrupted pages.
%Conclusion, resume, bottom line, lesson learnt, wrap up
In short, in this chapter we studied edit filters' documentation and community discussions and worked out the salient characteristics of this mechanism.
We also compared the filters to other quality control technologies on Wikipedia such as bots, semi-automated anti-vandalism tools and the machine learning framework ORES.
We considered edit filters in the context and time of their introduction and concluded that the community implemented them as a means to fight obvious, particularly persistent, and cumbersome-to-remove vandalism by disallowing it on the spot.
Other ``softer'' arguments such as dissatisfaction with bot development processes (poorly tested, non-responsive operators) seemed to encourage the introduction as well.
The individual filters are implemented and maintained by edit filter managers, a special highly-restricted user group.
Revising the quality control ecosystem diagram~\ref{fig:funnel-no-filters} introduced in chapter~\ref{chap:background}, we can now properly place the filters on it (see figure~\ref{fig:funnel-with-filters}),
and conclude that claims in the literature (see section~\ref{section:bots}) should be revised: in terms of temporality, it is not bots but edit filters that are the first mechanism to actively fend off a disruptive edit.