diff --git a/thesis/2-Background.tex b/thesis/2-Background.tex
index 6a571ba2afec5c612632233fe4b4678c0a097b27..3dac1de0494fd4a3997ee9d6639b5270a06b5826 100644
--- a/thesis/2-Background.tex
+++ b/thesis/2-Background.tex
@@ -4,7 +4,7 @@
 The present chapter studies the scientific literature on Wikipedia's quality control mechanisms in order to better understand the role of edit filters in this ecosystem.
 Before 2009, academic studies on Wikipedia tended to ignore algorithmic agents altogether.
-The number of their contributions to the encyclopedia was found to be low and therefore their impact was considered insignificant~\cite{KitChiBrySuhMyt2007}.
+The number of their contributions to the encyclopedia was found to be low and therefore their impact was considered insignificant \cite{KitChiBrySuhMyt2007}.
 This has gradually changed since around 2009 when the first papers specifically dedicated to bots (and later semi-automated tools such as Huggle and Twinkle) were published.
 In 2010, Geiger and Ribes insistently highlighted that the scientific community could no longer neglect these mechanisms as unimportant or noise in the data~\cite{GeiRib2010}.
@@ -60,8 +60,8 @@ and also comment on the (un)realiability of external infrastructure bots rely up
 Further bots involved in vandal fighting (besides ClueBot~\cite{GeiRib2010} and ClueBot NG~\cite{GeiHal2013}, \cite{HalRied2012}) discussed by the literature include:
 XLinkBot (which reverts edits containing links to domains blacklisted as spam)~\cite{HalRied2012},
 HBC AIV Helperbots (responsible for various maintenance tasks which help to keep entries on the Administrator intervention against vandalism (AIV) dashboard up-to-date)~\cite{HalRied2012}, \cite{GeiRib2010},
-MartinBot~\cite{Wikipedia:MartinBot} and AntiVandalBot~\cite{Wikipedia:AntiVandalBot} (one of the first rule-based bots which detected obvious cases of vandalism)~\cite{HalRied2012},
-DumbBOT~\cite{Wikipedia:DumbBOT} and EmausBot~\cite{Wikipedia:EmausBot} (which do batch cleanup tasks)~\cite{GeiHal2013}.
+MartinBot \cite{Wikipedia:MartinBot} and AntiVandalBot \cite{Wikipedia:AntiVandalBot} (one of the first rule-based bots which detected obvious cases of vandalism) \cite{HalRied2012},
+DumbBOT \cite{Wikipedia:DumbBOT} and EmausBot \cite{Wikipedia:EmausBot} (which do batch cleanup tasks) \cite{GeiHal2013}.
 Very crucial for the current analysis will also be Livingstone's observation in the preamble to his interview with the first large scale bot operator Ram-man that ``[i]n the Wikimedia software, there are tasks that do all sorts of things [...].
diff --git a/thesis/4-Edit-Filters.tex b/thesis/4-Edit-Filters.tex
index 97c9ca6e7b8525ad7a79cab2cb59bf84cbe41363..39212cd7ddf71691a843c37a65892eacf9161893 100644
--- a/thesis/4-Edit-Filters.tex
+++ b/thesis/4-Edit-Filters.tex
@@ -108,7 +108,8 @@ and actions to take when the filter's pattern matches.
 At the end, from a technical perspective, Wikipedia's edit filters are a MediaWiki plugin that allows every edit (and some other editor's actions) to be checked against a specified pattern before it is published.
-The extension introduces following database tables where all data generated by it is stored: \emph{abuse\_filter}, \emph{abuse\_filter\_log}, \emph{abuse\_filter\_action} and \emph{abuse\_filter\_history} \cite{gerrit-abusefilter-tables}.
+The extension introduces following database tables where all data generated by it is stored: \emph{abuse\_filter}, \emph{abuse\_filter\_log}, \emph{abuse\_filter\_action},\\
+and \emph{abuse\_filter\_history} \cite{gerrit-abusefilter-tables}.
 \emph{abuse\_filter} contains detailed information about every filter.
 \emph{abuse\_filter\_action} stores the currently configured actions for each filter and their corresponding parameters.
 Every update of a filter action, pattern, comments or other flags (whether the filter is enabled, hidden, deleted), etc. is recorded in \emph{abuse\_filter\_history}.
@@ -120,7 +121,8 @@ As of 30 June 2019, these include: \emph{edit}, \emph{move}, \emph{delete}, \emp
 Historically, further editor's actions such as \emph{feedback}, \emph{gatheredit} and \emph{moodbar} could trigger an edit filter.
 These are in the meantime deprecated.
 %TODO explain why? I have the guess that these are not available in the software anymore (generally, not only for the edit filters)
-When a filter's pattern is matched, beside logging the event in the \emph{abuse\_filter\_log} table (the only filter action which cannot be switched off), a further filter action may be invoked as well.
+When a filter's pattern is matched, beside logging the event in the\\
+\emph{abuse\_filter\_log} table (the only filter action which cannot be switched off), a further filter action may be invoked as well.
 The plugin defines following possible filter actions: \emph{tag}, \emph{throttle}, \emph{warn}, \emph{blockautopromote}, \emph{block}, \emph{degroup}, \emph{rangeblock}, \emph{disallow}\footnote{See line 2808 in \url{https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/includes/AbuseFilter.php}}.
 %TODO verify that none of the actions are deprecated; I have my doubts that for instance `revoking auto-promoted groups' may not be available anymore -- as far as I can see they are available in the software. However, there never was a community consensus to use them
@@ -235,7 +237,7 @@ The Edit Filter Requested page asks users to go through the following checklist
  \item filters, after adding up, make editing slower, so the usefulness of every single filter and condition has to be carefully considered;
  \item in depth checks should be done by a separate software that users run on their own machines;
  \item no trivial errors should be caught by filters (e.g. concerning style guidelines);
- \item there are the Titles Blacklist~\cite{Mediawiki:TitleBlacklist} and the Link/Spam Blacklist~\cite{Mediawiki:SpamBlacklist} which should be used if the issue at hand has to do with a problematic title or link.
+ \item there are the Titles Blacklist~\cite{Mediawiki:TitleBlacklist} and the Link/Spam Blacklist \cite{Mediawiki:SpamBlacklist} which should be used if the issue at hand has to do with a problematic title or link.
 \end{itemize}
 For edit filter managers, the best practice way for introducing a new filter is described on the Edit Filter Instructions page~\cite{Wikipedia:EditFilterInstructions}.
diff --git a/thesis/5-Overview-EN-Wiki.tex b/thesis/5-Overview-EN-Wiki.tex
index d8b52807c78610800ee5d6a671058a6a29df1164..96e09ed072f5c05587533859ee121516298fa6a5 100644
--- a/thesis/5-Overview-EN-Wiki.tex
+++ b/thesis/5-Overview-EN-Wiki.tex
@@ -59,7 +59,9 @@ These are discussed in more detail later in this section, but first the coding i
 \subsection{Coding Process and Challenges}
 As already mentioned, I applied emergent coding on the dataset from the \emph{abuse\_filter} table and let the labels originate directly from the data.
-I looked through the data paying special attention to the name of the filters (``af\_public\_comments'' field of the \emph{abuse\_filter} table), the comments (``af\_comments''), the pattern constituting the filter (``af\_pattern''), and the designated filter actions (``af\_actions'').
+I looked through the data paying special attention to the name of the filters\\
+(``af\_public\_comments'' field of the \emph{abuse\_filter} table), the comments\\
+(``af\_comments''), the pattern constituting the filter (``af\_pattern''), and the designated filter actions (``af\_actions'').
 The assigned codes emerged from the data: some of them being literal quotes of terms used in the description or comments of a filter, while others summarised the perceived filter functionality.
 In addition to that, for vandalism related labels, I used some of the vandalism types elaborated by the community in~\cite{Wikipedia:VandalismTypes}.