Commit 69d3a046 authored by Lyudmila Vaseva

Explain hits peak 2016

parent ad20a6e8
......@@ -426,7 +426,7 @@ Messy edits were done and others took them and re-modelled them.
\begin{figure}
\centering
\includegraphics[width=0.9\columnwidth]{pics/edits-development.png}
\includegraphics[width=0.9\columnwidth]{pics/edits-development-total.png}
\caption{EN Wikipedia: Number of edits over the years (source: \url{https://stats.wikimedia.org/v2/})}~\label{fig:edits-development}
\end{figure}
......
......@@ -221,6 +221,7 @@ A lot of public filters on the other hand still assume good faith from the edito
\subsection{What do filters target}
%: general behaviour vs edits by single users + manual tags
%TODO maybe get rid of this paragraph all together (it's partially handled by public vs private) --> merge both
As indicated in section~\ref{}, most of the public filters target disruptive behaviours in general (e.g. filter 384 disallows ``Addition of bad words or other vandalism'' by any non-confirmed user), while hidden filters are usually aimed at specific users.
There are, however, some public filters which target particular users or particular pages.
Arguably (see guidelines), an edit filter may not be the ideal mechanism for this latter purpose, since every incoming edit is checked against all active filters.
......@@ -247,7 +248,6 @@ A lot of hidden filters target specific users/problems.
%TODO discuss figure
\begin{comment}
* maybe just plot the parent categories and have a closer look at one of them exemplarily
* maybe merge parent categories and only work with ``vandalism'', ``good faith'' and ``maintenance'' (and ``unknown'')
\end{comment}
......@@ -263,6 +263,7 @@ personal attacks (filter 9,11) and obscenities (12)
some concrete users/cases (hidden filters, e.g. 4,21) and sockpuppetry (16,17)
\subsection{Combine most active filters with manual tags}
\subsection{Who trips filters}
- IPs and (newly) registered users
......@@ -375,21 +376,48 @@ We can backtrack the number of filter hits over the years on figure~\ref{fig:fil
There is a dip in the number of hits in late 2014 and a marked surge at the beginning of 2016.
%TODO There is also a certain periodicity to the graph, with smaller dips in the summer months (June, July, August) and smaller peaks in autumn/winter (mostly Oct/Nov); either point this out as an interesting direction for further studies or find an explanation approach as well
% It would be interesting to compare this with overall number of edits; maybe there're just fewer edits in the northern hemisphere summer, since people are on vacation; hence there are also fewer edits that trip filters
Here is the explanation to that:
\begin{comment}
Looking at January, February, March 2016 vs September 2016
- high number of throttled account creation attempts (around 70,000 in January; this number, however, accounts for only half the difference between January 2016 and September 2016)
- a bunch of users (mostly IP editors) with markedly high hit numbers:
a random check of some of them revealed quite a few IPs from a Russian registry, all of which were trying to publish the same (viagra) spam links;
however, these most active editors account for only about 1/100 of all filter hits
%TODO look at the peak from various perspectives: public vs hidden; filter actions; editor's actions (already mentioned in 1st point); manual tags
- for a spam wave the exact moment seems arbitrary
- %TODO compute aggregated numbers for all months of the peak and months outside and see whether something comes to attention
- there is no obvious pattern or other abnormality in the pages on which the filters are triggered
%TODO sift through talk archives from this period: is the peak mentioned?
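A minimal sketch of the aggregation the TODO above asks for, assuming the filter hits are available as a list of dicts with an ISO-style `timestamp` and one field per perspective (e.g. `action`). These field names and the helper functions are hypothetical and not taken from the actual log schema. It reports the mean number of hits per month, split by the chosen field, for the peak months and for a set of baseline months.

```python
from collections import Counter, defaultdict


def monthly_counts_by(hits, key):
    """Count hits per month and per value of `key`.

    Each hit is a dict; hit["timestamp"] is assumed to start with "YYYY-MM".
    """
    counts = defaultdict(Counter)
    for hit in hits:
        month = hit["timestamp"][:7]
        counts[month][hit[key]] += 1
    return counts


def peak_vs_baseline(hits, key, peak_months, baseline_months):
    """Return {value: (mean hits per peak month, mean hits per baseline month)}."""
    counts = monthly_counts_by(hits, key)

    def mean_per_month(months):
        total = Counter()
        for month in months:
            total.update(counts.get(month, Counter()))
        return {value: n / len(months) for value, n in total.items()}

    peak = mean_per_month(peak_months)
    base = mean_per_month(baseline_months)
    values = sorted(set(peak) | set(base))
    return {v: (peak.get(v, 0.0), base.get(v, 0.0)) for v in values}


# Toy data only; the real numbers would come from the filter log.
sample_hits = [
    {"timestamp": "2016-01-03T10:00:00", "action": "createaccount"},
    {"timestamp": "2016-01-04T11:00:00", "action": "createaccount"},
    {"timestamp": "2016-02-10T09:30:00", "action": "edit"},
    {"timestamp": "2016-09-12T08:15:00", "action": "edit"},
]
comparison = peak_vs_baseline(
    sample_hits, "action", ["2016-01", "2016-02"], ["2016-09"]
)
```

The same call with a different `key` (filter visibility, filter action, manual tag) would cover the other perspectives listed in the TODO.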
Three possible explanations for this phenomenon come to mind:
1. the filter hits mirror the overall edits pattern from this time.
2. there was a general rise in vandalism in this period.
3. there was a change in the edit filter software or a bug that caused the peak (a lot of false positives) and/or allowed a greater number of filters to be activated.
I've undertaken the following steps along each of these explanation paths.
1. I've compared the filter hits pattern with the overall number of edits of the time. No correspondence could be determined: the edit counts show no noticeable pattern in this period (see figure~\ref{fig:edits-development}).
2. In order to verify this assumption, it would be great to compare the filter hits pattern with the revert patterns of anti-vandal bots and semi-automated tools.
Unfortunately, no numbers are readily available, and assembling a dataset to answer this question is a non-trivial task:
A dump is needed, and so is a list of bot accounts (non-trivial either, since there is no consistent policy regarding bot accounts: only *some* of them have the bot flag or ``bot'' in their account name, and the flag is removed when a bot is no longer active). %TODO compare with Geiger Halfaker and their bot-bot revert study
So this is still something that can be explored further.
3. This explanation sounded very plausible.
Another piece of data that seemed to support it was the breakdown of the filter hits according to the triggered filter action.
As demonstrated on figure~\ref{}, there was above all a significant peak in the hits logged by ``log only'' filters.
As discussed in section~\ref{}, it is established practice to introduce new filters in ``log only'' mode and only switch on additional filter actions after a monitoring period has shown that the filters function as intended.
Hence, it sounds plausible that new filters were introduced in logging mode and then switched off after a significant number of false positives occurred.
However, upon closer inspection, this hypothesis could not be confirmed.
The most often triggered filters in the period Jan-March 2016 are mostly the most triggered filters of all time, and nearly all of them had already been around for a while by 2016.
Also, no bug or comparable incident with the software was found upon an inspection of the extension's issue tracker~\cite{} or of the commit messages of the commits to the software made during this period~\cite{gerrit}.
Moreover, no mention of the hits surge was found in the noticeboard~\cite{} and edit filter talk page archives~\cite{}.
The condition limit mentioned in section~\ref{} has not changed either, as far as I can tell from the issue tracker, the commits and the discussion archives, so the possible explanation that simply more filters have been at work since 2016 seems to be refuted as well.
The only somewhat telling pattern that seems to shed some light on the matter is the breakdown of hits according to the editor's action which triggered them: there is an obvious surge in attempted account creations in this period.
As a matter of fact, they could also explain the peak of log only hits: filter 527 (check!) ``Log/throttle accounts..'' is a throttle filter (?), so it disallows every Xth attempt and only logs the rest of the account creations.
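The comparison in step 1 above was made by looking at the two plots. One way to make it less dependent on eyeballing would be a correlation coefficient over the monthly series. The following is only a sketch under assumptions: both series are given as dicts mapping a "YYYY-MM" string to a count, and the helper names are hypothetical. It is not the method used for the figures.

```python
from math import sqrt


def align(hits_per_month, edits_per_month):
    """Keep only months present in both series, in chronological order."""
    months = sorted(set(hits_per_month) & set(edits_per_month))
    hits = [hits_per_month[m] for m in months]
    edits = [edits_per_month[m] for m in months]
    return months, hits, edits


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A coefficient close to zero over the months around the peak would be in line with the observation that the edit counts do not mirror the filter hits. Since both series may share a long-term trend, the value should be read with care.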
Another explanation that seemed worth pursuing was to look into the editors who tripped filters and their corresponding edits.
For the period Jan-March 2016 there are some very active IP editors, the most active of whom (how many hits?) seemed to be engaged exclusively in the (probably automated) posting of spam links.
Their edits, however, constitute only some 1-3\% of all hits from the period, so the explanation ``it was viagra spam coming from Russian IPs'' is somewhat insufficient.
(Yes, it was viagra spam, and yes, a ``whois'' lookup showed them to really be Russian IPs.
And, yes, whoever was editing could've also used a VPN, so I'm not opening a Russian bot fake news conspiracy theory just yet.)
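The share of hits caused by the most active editors can be computed along the following lines. This is a sketch only: it assumes a flat list with one editor name or IP per filter hit, and the function name is hypothetical.

```python
from collections import Counter


def top_editor_share(hit_editors, n):
    """Fraction of all hits caused by the n editors with the most hits."""
    counts = Counter(hit_editors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_hits = sum(count for _, count in counts.most_common(n))
    return top_hits / total


# Toy data only: one entry per filter hit.
editors = ["a"] * 3 + ["b"] * 2 + ["c"] * 5
share_top1 = top_editor_share(editors, 1)
share_top2 = top_editor_share(editors, 2)
```

Running this for several values of `n` shows how quickly the share saturates, i.e. whether a handful of editors or a long tail is responsible for the hits.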
Significant geopolitical/socio-political events from the time which drew a lot of media (and Internet) attention and disinformation campaigns:
- 2016 US elections
- Brexit referendum
- the so-called ``refugee crisis'' in Europe
There was also a severe organisational crisis in Wikimedia at the time, during which a lot of staff left and eventually the executive director stepped down.
However, I couldn't establish a direct relationship between any of these events and the edits which triggered edit filters.
An investigation into the pages on which the filters were triggered showed them (the pages) to be quite innocuous:
one of the pages where most filter hits were logged in January 2016 was skateboard, and the roughly 660 filter hits there seem like a drop in the ocean compared to the 37X.000 hits for the whole month.
\end{comment}
%TODO stretch plot so months are readable; darn, now it's too small on the pdf. Fix it! Maybe rotate to landscape?
\begin{figure}
......
thesis/pics/edits-development-total.png

22.6 KiB
