@@ -657,9 +657,9 @@ So far, I haven't managed to trigger a filter with a different action.
\begin{itemize}
\item how many filters are there (were there over the years): 954 filters (stand: 06.01.2019); TODO: historically?
\item what do the most active filters do?: see~\ref{tab:most-active-actions}
\item get a sense of what gets filtered (more qualitative): TODO: refine after sorting through manual categories; preliminary: vandalism; unintentional suboptimal behavior from new users who don't know better ("good faith edits") such as blanking an article/section; creating an article without categories; adding larger texts without references; large unwikified new article (180); or from users who are too lazy (to write proper edit summaries; editing behaviours and styles not suitable for an encyclopedia (poor grammar/not commiting to orthography norms; use of emoticons and !; ascii art?); "unexplained removal of sourced content" (636) may be an attempt to silence a view point the editor doesn't like; self-promotion(adding unreferenced material to BLP; "users creating autobiographies" 148;); harassment; sockpuppetry; potential copyright violations
\item get a sense of what gets filtered (more qualitative): TODO: refine after sorting through manual categories; preliminary: vandalism; unintentional suboptimal behavior from new users who don't know better ("good faith edits") such as blanking an article/section; creating an article without categories; adding larger texts without references; large unwikified new article (180); or from users who are too lazy (to write proper edit summaries; editing behaviours and styles not suitable for an encyclopedia (poor grammar/not commiting to orthography norms; use of emoticons and !; ascii art?); "unexplained removal of sourced content" (636) may be an attempt to silence a view point the editor doesn't like; self-promotion(adding unreferenced material to BLP; "users creating autobiographies" 148;); harassment; sockpuppetry; potential copyright violations; that's more or less it actually. There's a third bigger cluster of maintenance stuff, such as tracking bugs or other problems, trying to sort through bot edits and such. For further details see the jupyter notebook.
\item has the willingness of the community to use filters increased over time?: looking at aggregated values of number of triggered filters per year, the answer is rather it's quite constant; TODO: plot it at a finer granularity
\item how often were (which) filters triggered: see \url{filter-lists/20190106115600_filters-sorted-by-hits.csv} and~\ref{tab:most-active-actions}; TODO aggregate hitcounts over tagged categories after finished tagging
\item how often were (which) filters triggered: see \url{filter-lists/20190106115600_filters-sorted-by-hits.csv} and~\ref{tab:most-active-actions}; see also jupyter notebook for aggregated hitcounts over tagged categories
\item percentage of triggered filters/all edits; break down triggered filters according to typology: TODO still need the complete abuse\_filter\_log table!; and probably further dumps in order to know total number of edits
\item percentage filters of different types over the years: TODO according to actions (I need a complete abuse\_filter\_log table for this!); according to self-assigned tags (finish tagging!)
\item what gets classified as vandalism? has this changed over time? TODO: (look at words and patterns triggered by the vandalism filters; read vandalism policy page); pay special attention to filters labeled as vandalism by the edit filter editors (i.e. in the public description) vs these I labeled as vandalism
...
...
@@ -678,7 +678,7 @@ So far, I haven't managed to trigger a filter with a different action.
\item what are the values in the "group" column? what do they mean?
\item which are the most frequently triggered filters of all time?
\item is it new filters that get triggered most frequently? or are there also very active old ones?
\item how many different edit filter editros are there (af\_user)?
\item how many different edit filter editors are there (af\_user)?
\item categorise filters according to which name spaces they apply to; pay special attention to edits in user/talks name spaces (may be indication of filtering harassment)
"We may be interested in how the notion of vandalism changed over the years. For this an inquiry into which filters have \"vandalism\" in their public description (and were tagged as \"vandalism\") and what they do may be interesting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Potential harassment\n",
"\n",
"Another idea would be to classify filters according to the namespaces they cover. A filter targeting the talk/user name spaces may be indicative of dealing with personal attacks or harassment."
]
},
{
"cell_type": "markdown",
"metadata": {},
...
...
%% Cell type:markdown id: tags:
# An explorative inquiry into EN Wikipedia's edit filter system
This notebook serves to explore EN Wikipedia's edit filters
%% Cell type:code id: tags:
``` python
importpandasaspd
importitertools
importcollections
```
%% Cell type:markdown id: tags:
We import a cleaned version of manually annotated edit filters:
We may be interested in how the notion of vandalism changed over the years. For this an inquiry into which filters have "vandalism" in their public description (and were tagged as "vandalism") and what they do may be interesting.