Commit b0720238 authored by Lyudmila Vaseva

Remove qualitative exploration of historical trends in chap5

parent 9bc886a3
......@@ -484,74 +484,6 @@ As a matter of fact, a quick glance at the AbuseLog~\cite{Wikipedia:AbuseLog} co
%TODO compare with table and with most active filters per year: is it old or new filters that get triggered most often? (I'd say it's a mixture of both and we can now actually answer this question with the history API, it shows us when a filter was first created)
\section{Historical development}
\label{sec:5-history}
The present section qualitatively explores patterns in the creation and usage of edit filters.
Unfortunately, no extensive quantitative analysis of these patterns was possible, since this would require access to the \emph{abuse\_filter\_history} table of the AbuseFilter plugin (compare section~\ref{sec:mediawiki-ext}).
Unlike the other tables of the extension, the \emph{abuse\_filter\_history} table is currently not replicated and no public dump is accessible via Wikimedia's cloud service Toolforge~\cite{Wikimedia:ToolforgeDatabases}.
Such dumps seem to have been available in the past; however, they were discontinued due to security concerns.
A short-term solution to restore the public replicas was attempted but has unfortunately not been successful yet.
That is why the present chapter only shows some tendencies observed by manually browsing the history of different filters through the exposed API endpoint, which allows querying the \emph{abuse\_filter\_history} table for public filters~\cite{Wikipedia:AbuseFilterHistory}.
The discussions surrounding this issue and its progress can be viewed in the following ticket on Wikimedia's issue tracker:~\cite{phabricator}.
Hence, exploring historical patterns in detail remains one of the directions for future studies.
\subsection{What filters were implemented immediately after the launch + manual tags}
%TODO What were the first filters to be implemented immediately after the launch of the extension?
The extension was launched on March 17th, 2009.
Filter 1 was implemented in the late hours of that day.
Filters with IDs 1-80 (IDs are auto-incremented) were implemented within the first 5 days after the extension was turned on (17-22.03.2009).
So, apparently the most urgent problems the initial edit filter managers perceived were:
page move vandalism (what filter 1 initially targeted; it was later converted to a general test filter);
blanking articles (filter 3);
personal attacks (filters 9, 11) and obscenities (filter 12);
some concrete users/cases (hidden filters, e.g. 4, 21) and sockpuppetry (filters 16, 17).
\subsection{Filter Usage/Activity}
%TODO decide how this fits into the overall narrative; write some kind of a conclusion from these observations; also, decide whether this is the best representation or whether they should rather form a list
The following general filter operation practices were observed:
There are filters that have been switched on for a while, then deactivated and never activated again.
Some of them had only been active very briefly before they were switched off and deleted.
There are a couple of different reasons for that:
The edit filter managers decided not to implement the filter, because edit filters were deemed an inappropriate tool to deal with the issue at hand (e.g. filter 308 ``Malformed Mediation Cabal Requests'', 199 ``Unflagged Bots'', or 484 ``Shutdown of ClueBot by non-admin user'');
or they decided not to address the issue in this way: filter 290 ``172 Filter'' (catching edits about a Canadian politician coming from a certain IP range) was disabled, since the relevant pages were protected;
or, because there were hardly any hits, so there wasn't really a problem at all (e.g. filter 304 ``Rayman vandalism'', 122 ``Changing Username malformed requests'', or 401 ``"Red hair" vandalism'').
This last group is possibly a result of edit filter managers implementing a filter ``just to see if it catches anything''.
It also happens that several edit filter managers implement filters targeting the same phenomenon in parallel, without knowing of each other.
These duplicate cases are merged eventually, or alternatively all but one of them are switched off: filter 893 was switched off in favour of 891.
Sometimes, vandalism trends are only temporary and after a period of activity, the filters become stale.
This is also a reason for filters to be switched off eventually, in order to save conditions towards the condition limit.
Examples thereof are: 81 ``Badcharts'', 20 ``Saying "The abuse filter will block this"'', 663 ``Techno genre warrior''.
There are also filters that were switched off because they weren't doing what they were supposed to and only generated a large number of false positives: filter 14 ``Test to detect new pages by new users''.
And there are filters testing a pattern which was eventually merged into another filter (e.g. filter 440 ``intextual.com markup'' was merged into filter 345 ``Extraneous formatting from browser extension'').
\begin{comment}
%TODO This is a duplicate of a paragraph in 4.5.1. Does it fit better here?
% this actually fits also in the patterns of new filters in chap.5; these are the filters introduced for couple of days/hours, then switched off to never be enabled again
Edit filter managers often introduce filters based on some phenomena they have observed caught by other filters, other algorithmic quality control mechanisms or general experience.
As all newly implemented filters, these are initially enabled in logging only mode until enough log entries are generated to evaluate whether the incident is severe and frequent enough to need a filter.
\end{comment}
Then, there are filters switched on for a while, deactivated for a while and activated again.
Sometimes this happens because a pattern of vandalism is recurring, and sometimes in order to fix technical issues with the filters: 61, 98 (was deactivated briefly since an editor found the ``warn'' action unfounded; re-enabled to tag), 148 (``20160213 - disabled - possible technical issue - see edit filter noticeboard - xaosflux'').
Another group consists of filters that have never been switched off since their introduction:
164, 642 (if we ignore the 2min period it was disabled on 13.4.2018), 733 (2.11.2015-present), 29 (18.3.2009-present), 30 (18.3.2009-present), 33 (18.3.2009-present), 39 (18.3.2009-present), 50 (18.3.2009-present), 59 (19.3.2009-present), 80 (22.3.2009-present)
There are also some filters that have always been enabled with the exception of brief periods of time when the filter was deactivated (and then activated again), probably in order to update the conditions: 79, 135 (there were a couple of others in Shirik's list, go back and look).
There seems to be a tendency for all actions but logging (which cannot be switched off) to be removed while edit filter managers are updating the pattern of the filter.
\subsection{How do filters emerge?}
** an older filter is split? 79 was split out of 61, apparently; 285 is split between "380, 384, 614 and others"; 174 is split from 29
** several older filters are merged?
** or functionality of an older filter is taken and extended in a newer one (479->631); (82->278); (358->633);
** new condition(s) are tested and then merged into an existing filter: conditions from 292 were merged into 135 (https://en.wikipedia.org/wiki/Special:AbuseFilter/history/135/diff/prev/4408 , also from 366; according to the comments at https://en.wikipedia.org/wiki/Special:AbuseFilter/292 it was not conceived as a test filter, but was rather merged into 135 after the fact to save conditions); 440 was merged into 345; apparently 912 was merged into 11 (but 11 still appears to check for "they suck" only); in 460: "Merging from 461, 472, 473, 474, and 475. --Reaper 2012-08-17"
** an incident caught repeatedly by a filter motivates the creation of a dedicated filter (994)
** a filter is shut down because editors notice there are 2 (or more) filters that do nearly identical checks: 344 was shut down because of 3
** "in addition to filter 148, let's see what we get - Cen" (https://en.wikipedia.org/wiki/Special:AbuseFilter/188) // this illustrates the point that edit filter managers do introduce stuff they feel like introducing just to see if it catches something
\section{Conclusions}
This chapter explored the edit filters on the English Wikipedia in an attempt to determine what types of tasks these filters take over,
......@@ -577,6 +509,17 @@ a surge in account creation attempts and possibly a big spam wave (the latter ha
no really satisfying explanation of the phenomenon could be established.
This remains one of the possible directions for future studies.
%Historical trends
%TODO moved from section, revise so that it points to future work
The present section qualitatively explores patterns in the creation and usage of edit filters.
Unfortunately, no extensive quantitative analysis of these patterns was possible, since this would require access to the \emph{abuse\_filter\_history} table of the AbuseFilter plugin (compare section~\ref{sec:mediawiki-ext}).
Unlike the other tables of the extension, the \emph{abuse\_filter\_history} table is currently not replicated and no public dump is accessible via Wikimedia's cloud service Toolforge~\cite{Wikimedia:ToolforgeDatabases}.
Such dumps seem to have been available in the past; however, they were discontinued due to security concerns.
A short-term solution to restore the public replicas was attempted but has unfortunately not been successful yet.
That is why the present chapter only shows some tendencies observed by manually browsing the history of different filters through the exposed API endpoint, which allows querying the \emph{abuse\_filter\_history} table for public filters~\cite{Wikipedia:AbuseFilterHistory}.
The discussions surrounding this issue and its progress can be viewed in the following ticket on Wikimedia's issue tracker:~\cite{phabricator}.
Hence, exploring historical patterns in detail remains one of the directions for future studies.
%TODO VERY IMPORTANT: come back to the verification whether the filters have achieved their proclaimed end
......
......@@ -216,3 +216,16 @@ There are also various complaints/comments by users bewildered that their edits
\item \textbf{What proportion of quality control work do filters take over?}: compare filter hits with number of all edits and reverts via other quality control mechanisms
\item \textbf{Do edit filter managers stick to the edit filter guidelines?}: e.g. filters shouldn't be implemented for trivial problems (such as spelling mistakes); problems with specific pages are generally better taken care of by protecting the page, and problematic titles by the title blacklist; general filters shouldn't be hidden
\end{enumerate}
%TODO further points for future study
\begin{comment}
\subsection{What filters were implemented immediately after the launch + manual tags}
\subsection{Filter Usage/Activity}
There are filters that have been switched on for a while, then deactivated and never activated again. (phenomenon was over; they never caught anything in the first place; ..)
Switched on and stayed on;
switched off very fast;...
\subsection{How do filters emerge?}
** an older filter is split? 79 was split out of 61, apparently; 285 is split between "380, 384, 614 and others"; 174 is split from 29
** several older filters are merged?
** or functionality of an older filter is taken and extended in a newer one (479->631); (82->278); (358->633);
\end{comment}
......@@ -75,3 +75,62 @@ And there are also users who specifically dedicate substantial amount of their W
These dedicated vandal fighters mostly do so with the aid of some (semi or fully) automated tools which not only significantly speed up the process (see below),
but, according to research, fundamentally change the nature of the encyclopedia and its collaboration ecosystem~\cite{GeiRib2010}.
%***************************************************
\section{Historical development}
\subsection{What filters were implemented immediately after the launch + manual tags}
%TODO What were the first filters to be implemented immediately after the launch of the extension?
The extension was launched on March 17th, 2009.
Filter 1 was implemented in the late hours of that day.
Filters with IDs 1-80 (IDs are auto-incremented) were implemented within the first 5 days after the extension was turned on (17-22.03.2009).
So, apparently the most urgent problems the initial edit filter managers perceived were:
page move vandalism (what filter 1 initially targeted; it was later converted to a general test filter);
blanking articles (filter 3);
personal attacks (filters 9, 11) and obscenities (filter 12);
some concrete users/cases (hidden filters, e.g. 4, 21) and sockpuppetry (filters 16, 17).
\subsection{Filter Usage/Activity}
%TODO decide how this fits into the overall narrative; write some kind of a conclusion from these observations; also, decide whether this is the best representation or whether they should rather form a list
The following general filter operation practices were observed:
There are filters that have been switched on for a while, then deactivated and never activated again.
Some of them had only been active very briefly before they were switched off and deleted.
There are a couple of different reasons for that:
The edit filter managers decided not to implement the filter, because edit filters were deemed an inappropriate tool to deal with the issue at hand (e.g. filter 308 ``Malformed Mediation Cabal Requests'', 199 ``Unflagged Bots'', or 484 ``Shutdown of ClueBot by non-admin user'');
or they decided not to address the issue in this way: filter 290 ``172 Filter'' (catching edits about a Canadian politician coming from a certain IP range) was disabled, since the relevant pages were protected;
or, because there were hardly any hits, so there wasn't really a problem at all (e.g. filter 304 ``Rayman vandalism'', 122 ``Changing Username malformed requests'', or 401 ``"Red hair" vandalism'').
This last group is possibly a result of edit filter managers implementing a filter ``just to see if it catches anything''.
It also happens that several edit filter managers implement filters targeting the same phenomenon in parallel, without knowing of each other.
These duplicate cases are merged eventually, or alternatively all but one of them are switched off: filter 893 was switched off in favour of 891.
Sometimes, vandalism trends are only temporary and after a period of activity, the filters become stale.
This is also a reason for filters to be switched off eventually, in order to save conditions towards the condition limit.
Examples thereof are: 81 ``Badcharts'', 20 ``Saying "The abuse filter will block this"'', 663 ``Techno genre warrior''.
There are also filters that were switched off because they weren't doing what they were supposed to and only generated a large number of false positives: filter 14 ``Test to detect new pages by new users''.
And there are filters testing a pattern which was eventually merged into another filter (e.g. filter 440 ``intextual.com markup'' was merged into filter 345 ``Extraneous formatting from browser extension'').
\begin{comment}
%TODO This is a duplicate of a paragraph in 4.5.1. Does it fit better here?
% this actually fits also in the patterns of new filters in chap.5; these are the filters introduced for couple of days/hours, then switched off to never be enabled again
Edit filter managers often introduce filters based on some phenomena they have observed caught by other filters, other algorithmic quality control mechanisms or general experience.
As all newly implemented filters, these are initially enabled in logging only mode until enough log entries are generated to evaluate whether the incident is severe and frequent enough to need a filter.
\end{comment}
Then, there are filters switched on for a while, deactivated for a while and activated again.
Sometimes this happens because a pattern of vandalism is recurring, and sometimes in order to fix technical issues with the filters: 61, 98 (was deactivated briefly since an editor found the ``warn'' action unfounded; re-enabled to tag), 148 (``20160213 - disabled - possible technical issue - see edit filter noticeboard - xaosflux'').
Another group consists of filters that have never been switched off since their introduction:
164, 642 (if we ignore the 2min period it was disabled on 13.4.2018), 733 (2.11.2015-present), 29 (18.3.2009-present), 30 (18.3.2009-present), 33 (18.3.2009-present), 39 (18.3.2009-present), 50 (18.3.2009-present), 59 (19.3.2009-present), 80 (22.3.2009-present)
There are also some filters that have always been enabled with the exception of brief periods of time when the filter was deactivated (and then activated again), probably in order to update the conditions: 79, 135 (there were a couple of others in Shirik's list, go back and look).
There seems to be a tendency for all actions but logging (which cannot be switched off) to be removed while edit filter managers are updating the pattern of the filter.
\subsection{How do filters emerge?}
** an older filter is split? 79 was split out of 61, apparently; 285 is split between "380, 384, 614 and others"; 174 is split from 29
** several older filters are merged?
** or functionality of an older filter is taken and extended in a newer one (479->631); (82->278); (358->633);
** new condition(s) are tested and then merged into an existing filter: conditions from 292 were merged into 135 (https://en.wikipedia.org/wiki/Special:AbuseFilter/history/135/diff/prev/4408 , also from 366; according to the comments at https://en.wikipedia.org/wiki/Special:AbuseFilter/292 it was not conceived as a test filter, but was rather merged into 135 after the fact to save conditions); 440 was merged into 345; apparently 912 was merged into 11 (but 11 still appears to check for "they suck" only); in 460: "Merging from 461, 472, 473, 474, and 475. --Reaper 2012-08-17"
** an incident caught repeatedly by a filter motivates the creation of a dedicated filter (994)
** a filter is shut down because editors notice there are 2 (or more) filters that do nearly identical checks: 344 was shut down because of 3
** "in addition to filter 148, let's see what we get - Cen" (https://en.wikipedia.org/wiki/Special:AbuseFilter/188) // this illustrates the point that edit filter managers do introduce stuff they feel like introducing just to see if it catches something