diff --git a/thesis/2-Background.tex b/thesis/2-Background.tex index 21cb9b66156c70e785604acf8455823243fb3413..fdc049e7125498cfdbeb6622cef7cd6afb0825f9 100644 --- a/thesis/2-Background.tex +++ b/thesis/2-Background.tex @@ -80,7 +80,11 @@ This led to the social understanding that ``bots ought to be better behaved than ORES~\cite{ORES} is an API based free libre and open source (FLOSS) machine learning service ``designed to improve the way editors maintain the quality of Wikipedia'' \cite{HalTar2015} and increase the transparency of the quality control process. It uses learning models to predict a quality score for each article and edit based on edit/article quality assessments manually assigned by Wikipedians. Potentially damaging edits are highlighted, which allows editors who engage in vandal fighting to examine them in greater detail. -The service was officially introduced in November 2015 by Aaron Halfaker\footnote{\url{https://wikimediafoundation.org/role/staff-contractors/}} (principal research scientist at the Wikimedia Foundation) and Dario Taraborelli\footnote{\url{http://nitens.org/taraborelli/cv}} (Head of Research at Wikimedia Foundation at the time)~\cite{HalTar2015}. +The service was officially introduced in November 2015 by Aaron Halfaker\footnote{\url{https://wikimediafoundation.org/role/staff-contractors/}} (principal research scientist at the Wikimedia Foundation +\footnote{The Wikimedia Foundation is a non-profit organisation dedicated to collecting and disseminating free knowledge. %TODO cleanup footnote +Beside Wikipedia, it provides and maintains the infrastructure for a family of projects such as .... +(See ... for more information.)} +) and Dario Taraborelli\footnote{\url{http://nitens.org/taraborelli/cv}} (Head of Research at Wikimedia Foundation at the time)~\cite{HalTar2015}. Its development is ongoing, coordinated and advanced by Wikimedia's Scoring Platform team. 
Since ORES is API based, in theory a myriad of services can be developed that use the predicted scores, or new models can be trained and made available for everyone to use. The Scoring Platform team reports that popular vandal fighting tools such as Huggle (see next section) have already adopted ORES scores for the compilation of their queues~\cite{HalTar2015}. diff --git a/thesis/3-Methods.tex b/thesis/3-Methods.tex index 1f75dcf92d0c815e1295a9c657fedbb23bffa6ac..06e8800f1fb4c2108e438eb9b346bf50efb872fd 100644 --- a/thesis/3-Methods.tex +++ b/thesis/3-Methods.tex @@ -54,7 +54,7 @@ In order to gain a detailed understanding of what edit filters are used for on E Coding is the process of labeling data in a systematic fashion in an attempt to comprehend it. It is about seeking patterns in data and later—trying to understand these patterns and the relationships between them. Emergent coding is one possible approach for making sense of data in content analysis~\cite{Stemler2001}. -Its key characteristic is letting the codes emerge during the process contrasted to starting with a set of preconcieved codes (also known as ``a priori codes''). +Its key characteristic is letting the codes emerge during the process contrasted to starting with a set of preconceived codes (also known as ``a priori codes''). Scholars regard this as useful because that way the danger of trying to press data into predefined categories while potentially overlooking other, better fitting codes is reduced~\cite[p.17]{Charmaz2006}. Instead, the codes stem directly from observations of the data. diff --git a/thesis/5-Overview-EN-Wiki.tex b/thesis/5-Overview-EN-Wiki.tex index d07946808be809e9649cbed00431eeeebca11afa..eebd744472decdff49c38d767caa036746554898 100644 --- a/thesis/5-Overview-EN-Wiki.tex +++ b/thesis/5-Overview-EN-Wiki.tex @@ -1,33 +1,30 @@ \chapter{Descriptive overview of Edit Filters on the English Wikipedia} \label{chap:overview-en-wiki} -The purpose of this chapter (syn?) 
is to explore the edit filters on the Englisch Wikipedia. +The purpose of this chapter is to explore the edit filters on the English Wikipedia. We want to gather a understanding of what types of tasks these filters take over, and, as far as feasible, trace how these tasks have evolved over time. -%TODO describe what each section is about The data upon which the analysis is based is described in section~\ref{sec:overview-data} and the methods used—in chapter~\ref{chap:methods}. We look into the manual classification of EN Wikipedia's edit filters I've undertaken in an attempt to understand what is it that they actually filter in section~\ref{sec:manual-classification}. Section~\ref{sec:patterns} studies some general characteristics of the edit filters, whereas their activity is analysed in section~\ref{sec:filter-activity}. -And finally, some historical patterns are observed in section~\ref{sec:5-history}. - -%TODO tell a story with the chapter: what do filters do? How have their tasks evolved over time (if feasible) --> maybe tell it along the peak: it is an extraordinary situation, where we can see it is exactly X and Y and Z what filters do +And finally, some historical patterns are observed in section~\ref{sec:5-history}. %TODO maybe kick out historical patterns \section{Data} \label{sec:overview-data} -A big part of the present analysis is based upon the \emph{abuse\_filter} table from \emph{enwiki\_p}(the database which stores data for the EN Wikipedia), or more specifically a snapshot thereof which was downloaded on January 6th, 2019 via quarry, a web-based service offered by Wikimedia for running SQL queries against their public databases~\footnote{\url{https://quarry.wmflabs.org/}}. 
+A big part of the present analysis is based upon the \emph{abuse\_filter} table from \emph{enwiki\_p} (the database which stores data for the EN Wikipedia), or more specifically a snapshot thereof which was downloaded on January 6th, 2019 via quarry, a web-based service offered by Wikimedia for running SQL queries against their public databases~\footnote{\url{https://quarry.wmflabs.org/}}. The complete dataset can be found in the repository for the present paper~\cite{gitlab}. This table, along with \emph{abuse\_filter\_actions}, \emph{abuse\_filter\_log}, and \emph{abuse\_filter\_history}, are created and used by the AbuseFilter MediaWiki extension~(\cite{gerrit-abusefilter-tables}), as discussed in section~\ref{sec:mediawiki-ext}. Selected queries have been run via quarry against the \emph{abuse\_filter\_log} table as well. These are the foundation for the filters activity analysis undertaken in section~\ref{sec:filter-activity}. -Unfortunately, the \emph{abuse\_filter\_history} table which will be necessary for a complete historical analysis of the edit filters is currently not exposed to the public due to security/privacy concerns~\cite{phabricator}. -%TODO footnote about the submitted patch -Hence, in section~\ref{sec:5-history} the present work only touches upon historical trends in a qualitative fashion. %TODO how are these determined: API to abuse_filter_history; general stats from abuse_filter -or qualitatively shows patterns. +Unfortunately, the \emph{abuse\_filter\_history} table which would be necessary for a complete historical analysis of the edit filters is currently not exposed to the public due to security/privacy concerns~\cite{phabricator} +\footnote{A patch was submitted to Wikimedia's operations repository where the replication scripts for all publicly exposed databases are hosted~\cite{gerrit-tables-replication}. +It is currently under review, so historical filter research will hopefully become possible in the future.}. 
+Hence, in section~\ref{sec:5-history} the present work only touches upon historical trends in a qualitative fashion. %TODO how are these determined: API to abuse_filter_history; general stats from abuse_filter; OR maybe just get rid of the section altogether A comprehensive historical analysis is therefore one of the directions for future research discussed in section~\ref{sec:further-studies}. A concise description of the tables has been offered in section~\ref{sec:mediawiki-ext} which discusses the AbuseFilter MediaWiki extension in more detail. @@ -43,19 +40,24 @@ These are discussed in more detail later in this section, but first the coding i \subsection{Coding process and challenges} -As already mentioned, I applied emergent coding and let the labels originate directly from the data. +As already mentioned, I applied emergent coding to the dataset from the \emph{abuse\_filter} table and let the labels originate directly from the data. I looked through the data paying special attention to the name of the filters (``af\_public\_comments'' field of the \emph{abuse\_filter} table), the comments (``af\_comments''), the pattern constituting the filter (``af\_pattern''), and the designated filter actions (``af\_actions''). -The assigned codes emerged from the data: some of them being literal quotes of terms used in the decription or comments of a filter, while others summarised the perceived filter functionality. +The assigned codes emerged from the data: some of them being literal quotes of terms used in the description or comments of a filter, while others summarised the perceived filter functionality. In addition to that, for vandalism related labels, I used some of the vandalism types identified by the community in~\cite{Wikipedia:VandalismTypes}. However, this typology was regarded more as an inspiration instead of being adopted 1:1 since some of the types were quite general whereas more specific categories seemed to render more insights. 
For instance, I haven't applied the ``addition of text'' category since it seemed more useful to have more specific labels such as ``hoaxing'' or ``silly\_vandalism'' (check the code book in the appendix~\ref{app:code_book} for definitions). Moreover, I found some of the proposed types redundant. -For example, ``sneaky vandalism'' seems to overlap partially with ``hoaxing'' and partially with ``sockpuppetry'', ``link vandalism'' mostly overlaps with ``spam'' or ``self promotion'' (although not always), and for some reason, ``personal attacks'' are listed twice. +For example, ``sneaky vandalism'' seems to overlap partially with ``hoaxing'' and partially with ``sockpuppetry'', ``link vandalism'' mostly overlaps with ``spam'' or ``self promotion'' (although not always), and for some reason, ``personal attacks'' are listed twice. %TODO check I havent actually used "link vandalism" -I have labeled the dataset twice. -The motivation therefor was to return to it once I've gained better insight into the data and use this newly gained knowledge to re-evaluate ambiguous cases, i.e. re-label some data with codes that emerged later in the process. -This mode of labeling is congruous with the simultaneous coding and data analysis suggested by grounded theorists (compare section~\ref{sec:gt}). +Based on the emergent coding method described in section~\ref{sec:gt}, I have labeled the dataset twice. +I let potential labels emerge during the first round of coding. +Then, I scrutinised them, merging labels that seemed redundant and letting the most descriptive code stay. +At the same time, the codes were also sorted and unified into broader categories which seemed to relate the single labels to each other. +Thereby, a code book with the conclusive codes was defined (see appendix~\ref{app:code_book}). +Subsequently, I labeled the whole dataset again using the code book. 
+Unfortunately, the validation steps proposed by the method could not be realised, since no second researcher was available for the labeling. +This is one of the limitations discussed in section~\ref{sec:limitations}, and something that can and should be remedied in future research. %1st labeling Following challenges were encountered during the first round of labeling: @@ -78,11 +80,13 @@ On the other hand, filters set to ``disallow'' were tagged as ``vandalism'' or a For the second round of labeling, I tagged the whole dataset again using the compiled code book (see \ref{app:code_book}) and assigned to every filter exactly one label—the one deemed most appropriate (although oftentimes alternative possibilites were listed as notes), without looking at the labels I assigned the first time around. I intended to compare the labels from both coding sessions and focus on more ambiguous cases, re-evaluting them using all available information (patterns, public comments, labels from both sessions, as well as any notes I made along the line). -Unfortunately, there was no time, so the analysis of the present section is based upon the second round of labeling. -Comparing codes from both labeling sessions and refining the coding is one of the possibilities for future research. %TODO (re-formulate!) +Unfortunately, time was scarce, so the analysis of the present section is based upon the second round of labeling. +Comparing the codes from both labeling sessions and refining the coding, or having another person label the data, should be done in the future. + +The datasets developed during both labeling sessions are available in the project's repository~\cite{gitlab}. -%TODO disclose links to 1st and 2nd labelling -First round of labeling is available under +As signaled at the beginning of the section, the following four parent categories of codes were identified: ``vandalism'', ``good faith'', ``maintenance'', and ``unknown''. 
+The subsections that follow discuss the salient properties of each of them. \begin{comment} % Kept as a possible alternative wording for private vs public and labeling decisions in ambiguous cases @@ -94,11 +98,6 @@ While there are cases of juvenile vandalism (putting random swear words in artic For these, from the edit alone there is no way of knowing whether the deletion was malicious or the editor conducting it just wasn't familiar with say the correct procedure for moving an article. \end{comment} -%TODO axial coding: sounds a bit lame still? -At the end, an axial coding phase took place in which the identified codes were sorted and unified into broader categories which seemed to relate the single labels to each other. -As signaled at the beginning of the section, following four categories were identified (syn): ``vandalism'', ``good faith'', ``maintenance'', and ``unknown''. -The subsections that follow discuss the salient properties of each of them. - \subsection{Vandalism} The vast majority of edit filters on EN Wikipedia could be said to target (different forms of) vandalism, i.e. maliciously intended disruptive editing. diff --git a/thesis/6-Discussion.tex b/thesis/6-Discussion.tex index a70aeca2769456ab42399f881492a604372c8d33..67964e85c3d1461cdb5e911203f6ef3c29d5e114 100644 --- a/thesis/6-Discussion.tex +++ b/thesis/6-Discussion.tex @@ -155,6 +155,7 @@ Claudia: * A focus on the Good faith policies/guidelines is a historical develop %*************************************** \section{Limitations} +\label{sec:limitations} This work presents a first attempt at analysing Wikipedia's edit filter system. Several limitations of this study come to mind.