\chapter{Background}
\label{chap:background}
In the present chapter we review the scientific literature on vandalism in Wikipedia and on the quality control mechanisms applied to counteract this vandalism, in order to better understand the role of edit filters in this ecosystem.
There are works on vandalism in general and on vandalism detection, as well as several articles dedicated to the role bots play in maintaining quality on Wikipedia (cite... ), and a couple which discuss combating vandalism by means of semi-automated tools such as Huggle, Twinkle and STiki (cite).
Time and again, the literature also refers to more ``manual'' forms of quality control: editors using watchlists to keep an eye on articles they care about, or even accidentally discovering edits made in bad faith.
One mechanism, however, is conspicuously missing from all these reports: none of them ever mentions (is this really true?) the edit filter mechanism.
\cite{AstHal2018} have a diagram describing the new edit review pipeline. Filters are absent.
\section{Vandalism on Wikipedia}
%TODO put here papers on vandalism
Papers discussing vandalism detection from IR/ML perspective:
\section{Quality-control mechanisms on Wikipedia}
Why is it important that we study these mechanisms?
- their relative usage has increased since they were first introduced
\subsection{ORES}
\cite{HalTar2015}
"Today, we’re announcing the release of a new artificial intelligence service designed **to improve the way editors maintain the quality** of Wikipedia" (emphsis mine)
" This service empowers Wikipedia editors by helping them discover damaging edits and can be used to immediately “score” the quality of any Wikipedia article."
ORES is an API-based, free/libre and open source (FLOSS) machine learning service ``designed to improve the way editors maintain the quality of Wikipedia''~\cite{HalTar2015}, intended also to increase the transparency of the quality control process.
It uses machine learning models to predict a quality score for each article and edit, based on edit/article quality assessments manually assigned by Wikipedians.
Potentially damaging edits are highlighted, which allows editors who engage in vandal fighting to examine them in greater detail.
The service was officially introduced in November 2015 by Aaron Halfaker (principal research scientist at the Wikimedia Foundation) and Dario Taraborelli (Head of Research at the Wikimedia Foundation at the time)~\cite{HalTar2015}. %TODO footnote https://wikimediafoundation.org/role/staff-contractors/
% http://nitens.org/taraborelli/cv
Since ORES is API-based, in theory a myriad of services can be developed that use the predicted scores, or new models can be trained and made available for everyone to use.
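To illustrate, here is a minimal sketch of a client querying ORES for edit quality scores. It assumes the public v3 scores endpoint and the ``damaging'' model name as documented at the time of writing; the revision ID is hypothetical.
\begin{verbatim}
# Minimal sketch of an ORES client (assumed public v3 endpoint).
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/{wiki}/"

def score_revisions(wiki, model, rev_ids):
    """Fetch ORES predictions for the given revision IDs."""
    params = {
        "models": model,
        "revids": "|".join(str(r) for r in rev_ids),
    }
    response = requests.get(ORES_URL.format(wiki=wiki), params=params)
    response.raise_for_status()
    scores = response.json()[wiki]["scores"]
    # Each score holds a boolean prediction and class probabilities.
    return {rev_id: data[model]["score"] for rev_id, data in scores.items()}

# Hypothetical revision ID; prints something like:
# {'123456': {'prediction': False,
#             'probability': {'false': 0.97, 'true': 0.03}}}
print(score_revisions("enwiki", "damaging", [123456]))
\end{verbatim}
Any third-party tool can build on calls like this one, which is precisely what makes the API-based design attractive.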
The Scoring Platform team reports that popular vandal fighting tools such as Huggle have already adopted ORES for the compilation of their review queues~\cite{HalTar2015}.
ORES development is ongoing, carried out by the Wikimedia Scoring Platform team.
What is unique about ORES is that all the algorithms, models, training data, and code are public, so everyone (with sufficient knowledge of the matter) can scrutinise them and reconstruct what is going on.
This is certainly not true for machine learning services run by commercial companies, which have an interest in keeping their models secret.
"these specs actually work to highlight potentially damaging edits for editors. This allows editors to triage them from the torrent of new edits and review them with increased scrutiny. " (probably triage the edits, not the specs)
"By combining open data and open source machine learning algorithms, our goal is to make quality control in Wikipedia more transparent, auditable, and easy to experiment with."
//so, purpose of ORES is quality control
\cite{HalTar2015}
"Our hope is that ORES will enable critical advancements in how we do quality control—changes that will both make quality control work more efficient and make Wikipedia a more welcoming place for new editors."
caution: biases in AI
"Examples of ORES usage. WikiProject X’s uses the article quality model (wp10) to help WikiProject maintainers prioritize work (left). Ra·un uses an edit quality model (damaging) to call attention to edits that might be vandalism (right)." //interesting for the memo
"Popular vandal fighting tools, like the aforementioned Huggle, have already adopted our revision scoring service."
further ORES applications:
" But revision quality scores can be used to do more than just fight vandalism. For example, Snuggle uses edit quality scores to direct good-faith newcomers to appropriate mentoring spaces,[4] and dashboards designed by the Wiki Education Foundation use automatic scoring of edits to surface the most valuable contributions made by students enrolled in the education program"
\cite{AstHal2018}
"the Scoring Platform team at the Wikimedia Foundation
released "draftquality" 8 model that is designed to detect "vandalism", "spam", and "personal attack"
articles for quick deletion. Predictions of all of these models can be accessed by the ORES[8]
webservice that the Wikimedia Foundation provides via public apis."
\section{Algorithmic Governance}
%TODO maybe move this to the edit filters chapter