@@ -92,30 +92,28 @@ Semi-automated quality control tools are similar to bots in the sense that they
The difference, however, is that with semi-automated tools humans make the final assessment and decide what happens to the edits in question.
Several of these tools have been discussed in the scientific literature:
Huggle\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Huggle}}, which is probably the most popular and widely used one, is studied in~\cite{GeiHal2013},~\cite{HalRied2012}, and~\cite{GeiRib2010}.
Another very popular tool, Twinkle\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:Twinkle}}, is commented on by~\cite{GeiHal2013},~\cite{GeiRib2010}, and~\cite{HalGeiMorRied2013}.
STiki\footnote{\url{https://en.wikipedia.org/wiki/Wikipedia:STiki}} is presented by its authors in~\cite{WestKanLee2010} and also examined by~\cite{GeiHal2013}.
Various older (and partially inactive) applications are also mentioned in the literature:
Geiger and Ribes touch on Lupin's Anti-vandal tool\footnote{\url{https://en.wikipedia.org/wiki/User:Lupin/Anti-vandal_tool}}~\cite{GeiRib2010},
and Halfaker and Riedl discuss VandalProof~\cite{HalRied2012}.
Some of these tools are more automated than others: Huggle and STiki, for instance, are able to revert an edit, issue a warning to the offending editor, and post a report on the AIV dashboard (if the user has already exhausted the warning limit) with a single click.
The JavaScript-based browser extension Twinkle, on the other hand, adds contextual links to other parts of Wikipedia which facilitate particular tasks such as rolling back multiple edits, reporting problematic users to AIV, or nominating an article for deletion~\cite{GeiRib2010}.
The main feature of Huggle and STiki is that they both compile a central queue of potentially harmful edits for all their users to check.
The difference between the two programs lies in the heuristics they use to compile their queues:
By default, Huggle sends edits by users with warnings on their user talk page to the top of the queue, ranks edits by IP editors higher, and ignores edits made by bots and other Huggle users altogether~\cite{GeiRib2010}.
In contrast, STiki relies on the ``spatio-temporal properties of revision metadata''~\cite{WestKanLee2010} to decide how likely an edit is to be vandalism.
Huggle's queue can be reconfigured; however, doing so requires some technical savvy and motivation, and thus, as~\cite{GeiRib2010} warn, the tool makes certain paths of action easier to take than others.
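Purely as an illustration, Huggle's default prioritisation as described by~\cite{GeiRib2010} can be sketched as a sort key over incoming edits; the field names and priority values below are hypothetical and do not reflect Huggle's actual implementation:

```python
# Illustrative sketch of Huggle's default queue heuristics as described by
# Geiger and Ribes (2010); field names and priority values are hypothetical,
# not Huggle's actual code.

def huggle_priority(edit):
    """Return a sort key: lower means closer to the top of the queue.

    None means the edit is ignored altogether."""
    if edit["by_bot"] or edit["by_huggle_user"]:
        return None    # edits by bots and other Huggle users: ignored
    if edit["author_warned"]:
        return 0       # users with warnings on their talk page: top of queue
    if edit["by_ip"]:
        return 1       # IP editors: placed higher
    return 2           # everyone else

edits = [
    {"id": "a", "by_bot": True,  "by_huggle_user": False, "author_warned": False, "by_ip": False},
    {"id": "b", "by_bot": False, "by_huggle_user": False, "author_warned": False, "by_ip": True},
    {"id": "c", "by_bot": False, "by_huggle_user": False, "author_warned": True,  "by_ip": False},
    {"id": "d", "by_bot": False, "by_huggle_user": False, "author_warned": False, "by_ip": False},
]

# Ignored edits never enter the queue; the rest are ordered by priority.
queue = sorted(
    (e for e in edits if huggle_priority(e) is not None),
    key=huggle_priority,
)
print([e["id"] for e in queue])  # ['c', 'b', 'd'] — the bot edit "a" is dropped
```

The point of the sketch is merely that such a heuristic bakes editorial policy into defaults: whoever controls the sort key controls which edits get human attention first.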
Nonetheless, a trait common to all of them is that, as a standard, editors need the ``rollback'' permission in order to use the software~\cite{HalRied2012}. %TODO is that so? I can't find with certainty any info about Twinkle
Some critique that has been voiced regarding semi-automated anti-vandalism tools compares them to massively multiplayer online role-playing games (MMORPGs)~\cite{HalRied2012}.
The concern is that some users of said tools see themselves as vandal fighters on a mission to slay the greatest number of monsters (vandals) possible and thereby excel in the ranks\footnote{STiki actually has a leader board: \url{https://en.wikipedia.org/wiki/Wikipedia:STiki/leaderboard}}.
For one thing, this is a harmful way to view the project, neglecting the ``assume good faith'' guideline~\cite{Wikipedia:GoodFaith};
for another, it leads such users to seek out easy-to-judge cases from the queues in order to move on to the next entry more swiftly and gather more points,
leaving the subtler cases, which really require human judgement, to others.
\begin{comment}
\begin{comment}
%Huggle
...
@@ -157,13 +155,13 @@ and VandalProof which
\section{ORES}
ORES is an API-based free/libre and open source (FLOSS) machine learning service ``designed to improve the way editors maintain the quality of Wikipedia''~\cite{HalTar2015} and to increase the transparency of the quality control process.
It uses machine learning models to predict a quality score for each article and edit, based on edit and article quality assessments manually assigned by Wikipedians.
Potentially damaging edits are highlighted, which allows editors who engage in vandal fighting to examine them in greater detail.
The service was officially introduced in November 2015 by Aaron Halfaker\footnote{\url{https://wikimediafoundation.org/role/staff-contractors/}} (principal research scientist at the Wikimedia Foundation) and Dario Taraborelli\footnote{\url{http://nitens.org/taraborelli/cv}} (Head of Research at Wikimedia Foundation at the time)~\cite{HalTar2015}.
Its development is ongoing, coordinated and advanced by Wikimedia's Scoring Platform team.
Since ORES is API-based, in theory a myriad of services can be developed that use the predicted scores, or new models can be trained and made available for everyone to use.
The Scoring Platform team reports that popular vandal fighting tools such as Huggle have already adopted ORES scores for the compilation of their queues~\cite{HalTar2015}.
What is unique about ORES is that all the algorithms, models, training data, and code are public, so everyone (with sufficient knowledge of the matter) can scrutinise them and reconstruct what is going on.
This is certainly not true for machine learning services deployed by commercial companies, which have an interest in keeping their models secret.
Halfaker and Taraborelli express the hope that ORES will help hone quality control mechanisms on Wikipedia and, by decoupling damage prediction from the actual decision of how to deal with an edit, make the encyclopedia more welcoming towards newcomers.
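To give an impression of what this API-based design looks like from a consumer's perspective, the following sketch builds a request URL for ORES's v3 scoring endpoint and parses a response of the shape the ``damaging'' model returns; the revision ID and all probability values in the canned response are invented for illustration, and real client code would of course fetch the URL over HTTP:

```python
import json

# Sketch of a minimal client for ORES's v3 scoring endpoint. The URL shape
# follows ORES's public documentation; the revision ID and the values in the
# canned response below are made up for illustration.

ORES_BASE = "https://ores.wikimedia.org/v3/scores"

def score_url(wiki, rev_id, model="damaging"):
    """Build the URL that requests a model score for one revision."""
    return f"{ORES_BASE}/{wiki}/?models={model}&revids={rev_id}"

def parse_score(payload, wiki, rev_id, model="damaging"):
    """Extract (prediction, P(damaging)) from an ORES JSON response."""
    score = json.loads(payload)[wiki]["scores"][str(rev_id)][model]["score"]
    return score["prediction"], score["probability"]["true"]

# A canned response in the documented shape (values are made up):
canned = json.dumps({
    "enwiki": {"scores": {"123456": {"damaging": {"score": {
        "prediction": False,
        "probability": {"false": 0.93, "true": 0.07},
    }}}}}
})

prediction, p_damaging = parse_score(canned, "enwiki", 123456)
print(score_url("enwiki", 123456))
print(prediction, p_damaging)  # False 0.07
```

A tool like Huggle can use such a probability directly as a queue-ordering key, which is precisely the decoupling Halfaker and Taraborelli describe: ORES only predicts, and the consuming tool decides what to do with the prediction.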
...
@@ -177,38 +175,35 @@ The researchers also warn that wording is tremendously important for the percept
For completeness, it should be noted at this point that despite the steady increase in the proportion of fully and semi-automated tool usage for fighting vandalism~\cite{Geiger2009}, some of the quality control work is still done ``manually'' by human editors.
These are, on one hand, editors who use the ``undo'' functionality from within the page's revision history.
On the other hand, there are also editors who use the classic encyclopedia editing mechanism (click the ``edit'' button on an article, enter changes in the editor which opens, write an edit summary, click ``save'') rather than relying on further automated tools to aid them.
When Wikipedians use these mechanisms for vandalism fighting, oftentimes they have not noticed the vandalising edits by chance but rather have been actively watching the pages in question via so-called watchlists~\cite{AstHal2018}.
This also gives us a hint as to what type of quality control work humans take over: the less obvious and less rapid kind, since editors who patrol pages via watchlists tend to have some relationship to, or deeper expertise in, the topic. %TODO quote needed. according to~\cite{AstHal2018} along the funnel, increasingly complex judgement is required
%TODO vgl also funnel diagram incoming edits quality assurance by Halfaker
\section{Conclusion}
For clarity, I have summarised the various aspects of the algorithmic quality control mechanisms discussed in the present chapter in table~\ref{table:mechanisms-comparison-literature}.
Their interplay can be fittingly illustrated by figure~\ref{fig:funnel-no-filters}, proposed in a similar fashion also by~\cite{AstHal2018}, whose diagram of the new edit review pipeline notably does not feature edit filters.
%TODO move funnel diagram here (descending degree of automation)
%TODO find where in text to reference the graphic directly
%TODO what I haven't discussed so far is the temporal/pipeline dimension
One thing is certain: on the grounds of the literature study alone, it remains unclear what the role and purpose of edit filters are.
%TODO is it better to introduce the graphic earlier?