@@ -301,64 +301,68 @@ TODO: Discuss not complete application of method with Claudia (no direct partici
\subsection{General info on filters}
nums, etc.
\begin{itemize}
\item 954 filters on EN Wikipedia (as of Jan 2019)
\item of them: X active; ... -> donut diagram
\end{itemize}
\subsection{Q1: What is the role of edit filters among existing algorithmic quality-control mechanisms on Wikipedia (bots, semi-automated tools, ORES, humans)?}
<img src="images/funnel-with-filters-new.png" class="stretch" height="500" alt="Funnel diagram of all vandal-fighting mechanisms according to me">
\begin{itemize}
\item 1st mechanism activated to control quality (at the beginning of the funnel)
\item edit filters are triggered \emph{before} an edit is published
\item disallow certain types of obvious, pervasive (perhaps automated), and difficult-to-remove vandalism directly
\item can target malicious users directly without restricting everyone ($\leftrightarrow$ page protection)
\item historically faster and more reliable, by being a direct part of the core software
\item introduced partially because people were fed up with bot governance
\end{itemize}
Notes:
\begin{itemize}
    \item historically faster, by being a direct part of the core software: can disallow edits even before they are published
    \item can target malicious users directly without restricting everyone ($\leftrightarrow$ page protection)
    \item introduced to take care of obvious vandalism that is cumbersome to remove
    \item people were fed up with bot introduction and development processes (poor quality, no tests, no code available for revision in case of problems), so they came up with a new approach
    \item disallow certain types of obvious, pervasive (perhaps automated) vandalism directly
    \begin{itemize}
        \item takes more than a single click to revert
        \item human editors can use their time more productively elsewhere
    \end{itemize}
\end{itemize}
%TODO Consider pasting the comparison table from chapter 4 here
\subsection{Q2: Edit filters are a classical rule-based system. Why are they still active today when more sophisticated ML approaches exist?}
<img src="images/timeline-quality-control-mechanisms.png" class="stretch" height="500" alt="Timeline with the introductions of different quality control mechanisms">
From the archives of the edit filter talk page, discussion snippets prior to the introduction of the extension\cite{Wikipedia:EditFilterTalkArchive1}:
\begin{quotation}
The idea is to automatically deal with the very blatant, serial pagemovers for instance, freeing up human resources to deal with the less blatant stuff. SQLQuery me! 20:13, 9 July 2008 (UTC)
This extension is not designed to catch the subtle vandalism, because it's too hard to identify directly. It's designed to catch the obvious vandalism to leave the humans more time to look for the subtle stuff. Happy‑melon 16:35, 9 July 2008 (UTC)
This is quite different from, say, an anti-vandalism adminbot. The code is private, and, in any case, too ugly for anybody to know how to use it properly [...] and the bot is controlled by a private individual, with no testing.
\end{quotation}
Why are filters still in use today?
\begin{itemize}
\item introduced before most vandalism-fighting ML systems came along (they were there first and still work well; never touch a running system) (see the timeline in Table~\ref{fig:timeline})
\item rule-based systems are more transparent and accountable
\item easier to work with (it is easier to add yet another rule than to tweak parameters in an obscure ML-based approach)
\item allow for finer levels of control than ML, e.g. disallowing specific users
\item make collaboration easier
\item Wikipedia is a volunteer-driven system: people do what they like and can (someone had experience with this type of technology and implemented it that way)
\end{itemize}
Notes:
\begin{itemize}
    \item introduced before most vandalism-fighting ML systems came along, so they were there first historically; they still work well (never touch a running system)
    \item a gap was perceived in the existing system, which was filled with filters
    \begin{itemize}
        \item in functionality: disallow cumbersome vandalism from the start
        \item in governance: bots are poorly tested; communication and updates are difficult
    \end{itemize}
    \item volunteer system: people do what they like and can (someone had experience with this type of technology and implemented it that way)
    \item rule-based systems are more transparent and accountable
    \item and easier to work with (it is easier to add yet another rule than to tweak parameters in an obscure ML-based approach)
    \item allow for finer levels of control than ML, e.g. disallowing specific users
    \item filters make collaboration easier
\end{itemize}
\begin{table}
\begin{tabular}{ r | p{.7\linewidth}}
Oct 2001 & Entries from Easton’s Bible Dictionary are automatically imported by a script \\
29 Mar 2002 & First version of \url{https://en.wikipedia.org/wiki/Wikipedia:Vandalism} (WP Vandalism is published) \\
Oct 2002 & RamBot \\
2006 & The Bot Approvals Group (BAG) was first formed \\
13 Mar 2006 & 1st version of Bots/Requests for approval is published: some basic requirements (also valid today) are recorded \\
28 Jul 2006 & VoABot II (``In the case were banned users continue to use sockpuppet accounts/IPs to add edits clearly rejected by consensus to the point were long term protection is required, VoABot may be programmed to watch those pages and revert those edits instead. Such edits are considered blacklisted. IP ranges can also be blacklisted. This is reserved only for special cases.'') \\
21 Jan 2007 & Twinkle page is first published (empty), filled with a basic description by the beginning of Feb 2007 \\
24 Jul 2007 & Request for Approval of original ClueBot \\
16 Jan 2008 & Huggle Page is first published (empty) \\
18 Jan 2008 & Huggle Page is first filled with content \\
23 Jun 2008 & 1st version of Edit Filter page is published: User:Werdna announces they're currently developing the extension \\
2 Oct 2008 & \url{https://en.wikipedia.org/wiki/Wikipedia_talk:Edit_filter} was first archived; its last topic was the vote for or against the extension, which seems to have ended at the end of Sep 2008 \\
Mar 2009 & The AbuseFilter extension is enabled on English Wikipedia \\
Jun 2010 & STiki initial release \\
20 Oct 2010 & ClueBot NG page is created \\
11 Jan 2015 & 1st commit to the ORES repository on GitHub \\
30 Nov 2015 & ORES paper is published
\end{tabular}
\caption{Timeline: Introduction of algorithmic quality control mechanisms}~\label{fig:timeline}
\end{table}
%TODO check and, if necessary, update timeline
\subsection{Q3: Which type of tasks do filters take over?}
...
...
@@ -466,6 +470,8 @@ To Robert, for the bagels and explaining CMYK and color spaces.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}
%TODO enter publisher and address for publications
%%
%% If your work has an appendix, this is the place to put it.