diff --git a/article/proceedings.tex b/article/proceedings.tex
index 5b5f059d1597bdb02ecb74fbc92e12ec86dfe7ad..366c75782ff3e914becee49fb0112eb634dd1fbf 100644
--- a/article/proceedings.tex
+++ b/article/proceedings.tex
@@ -657,9 +657,9 @@ So far, I haven't managed to trigger a filter with a different action.
 \begin{itemize}
     \item how many filters are there (were there over the years): 954 filters (stand: 06.01.2019); TODO: historically?
     \item what do the most active filters do?: see~\ref{tab:most-active-actions}
-    \item get a sense of what gets filtered (more qualitative): TODO: refine after sorting through manual categories; preliminary: vandalism; unintentional suboptimal behavior from new users who don't know better ("good faith edits") such as blanking an article/section; creating an article without categories; adding larger texts without references; large unwikified new article (180); or from users who are too lazy (to write proper edit summaries; editing behaviours and styles not suitable for an encyclopedia (poor grammar/not commiting to orthography norms; use of emoticons and !; ascii art?); "unexplained removal of sourced content" (636) may be an attempt to silence a view point the editor doesn't like; self-promotion(adding unreferenced material to BLP; "users creating autobiographies" 148;); harassment; sockpuppetry; potential copyright violations
+    \item get a sense of what gets filtered (more qualitative): TODO: refine after sorting through manual categories; preliminary: vandalism; unintentional suboptimal behavior from new users who don't know better ("good faith edits") such as blanking an article/section; creating an article without categories; adding larger texts without references; large unwikified new article (180); or from users who are too lazy (to write proper edit summaries; editing behaviours and styles not suitable for an encyclopedia (poor grammar/not commiting to orthography norms; use of emoticons and !; ascii art?); "unexplained removal of sourced content" (636) may be an attempt to silence a view point the editor doesn't like; self-promotion(adding unreferenced material to BLP; "users creating autobiographies" 148;); harassment; sockpuppetry; potential copyright violations; that's more or less it actually. There's a third bigger cluster of maintenance stuff, such as tracking bugs or other problems, trying to sort through bot edits and such. For further details see the jupyter notebook.
     \item has the willingness of the community to use filters increased over time?: looking at aggregated values of number of triggered filters per year, the answer is rather it's quite constant; TODO: plot it at a finer granularity
-    \item how often were (which) filters triggered: see \url{filter-lists/20190106115600_filters-sorted-by-hits.csv} and~\ref{tab:most-active-actions}; TODO aggregate hitcounts over tagged categories after finished tagging
+    \item how often were (which) filters triggered: see \url{filter-lists/20190106115600_filters-sorted-by-hits.csv} and~\ref{tab:most-active-actions}; see also jupyter notebook for aggregated hitcounts over tagged categories
     \item percentage of triggered filters/all edits; break down triggered filters according to typology: TODO still need the complete abuse\_filter\_log table!; and probably further dumps in order to know total number of edits
     \item percentage filters of different types over the years: TODO according to actions (I need a complete abuse\_filter\_log table for this!); according to self-assigned tags (finish tagging!)
     \item what gets classified as vandalism? has this changed over time? TODO: (look at words and patterns triggered by the vandalism filters; read vandalism policy page); pay special attention to filters labeled as vandalism by the edit filter editors (i.e. in the public description) vs these I labeled as vandalism
@@ -678,7 +678,7 @@ So far, I haven't managed to trigger a filter with a different action.
     \item what are the values in the "group" column? what do they mean?
     \item which are the most frequently triggered filters of all time?
     \item is it new filters that get triggered most frequently? or are there also very active old ones?
-    \item how many different edit filter editros are there (af\_user)?
+    \item how many different edit filter editors are there (af\_user)?
     \item categorise filters according to which name spaces they apply to; pay special attention to edits in user/talks name spaces (may be indication of filtering harassment)
 \end{itemize}

diff --git a/src/explore.ipynb b/src/explore.ipynb
index c449caf302b82158267272e9cfa7f72411c1f67f..fbf09d1d6a923ab0293d8bfdefb93cec2da84bec 100644
--- a/src/explore.ipynb
+++ b/src/explore.ipynb
@@ -558,6 +558,31 @@
     "print (raw_df['af_user_text'].value_counts())"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Vandalism\n",
+    "\n",
+    "We may be interested in how the notion of vandalism changed over the years. For this an inquiry into which filters have \"vandalism\" in their public description (and were tagged as \"vandalism\") and what they do may be interesting."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Potential harassment\n",
+    "\n",
+    "Another idea would be to classify filters according to the namespaces they cover. A filter targeting the talk/user name spaces may be indicative of dealing with personal attacks or harassment."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},