From 0bf386c8169280da67e05830fd7e25fbf1eb63ce Mon Sep 17 00:00:00 2001
From: Lyudmila Vaseva <vaseva@mi.fu-berlin.de>
Date: Fri, 21 Jun 2019 13:36:49 +0200
Subject: [PATCH] Add a guideline for distinguishing 'good_faith'/'vandalism'

---
 memos/code-book | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/memos/code-book b/memos/code-book
index 4280392..bfb6561 100644
--- a/memos/code-book
+++ b/memos/code-book
@@ -20,10 +20,21 @@ Such non-descriptive hidden filters I labeled with 'hidden_vandalism'. (The seco
 And then again, there were also cases where I could not determine any suitable label, since I didn't understand the regex pattern or could not press it into a category.
 These were labeled 'unknown'.
 
+For a number of filters, it was particularly difficult to determine whether they were targeting vandalism or good faith edits.
+The only thing that could distinguish between the two is the editor's motivation, which we have no way of knowing.
+During the first labeling session, I tended to label such filters with 'vandalism?' or 'good_faith?'.
+
+For the cross-validation labeling (2nd time), I intend to stick to the "assume good faith" guideline %TODO: quote
+and only label as vandalism those cases where we can no longer assume good faith.
+One feature that guided me here is the filter action, which represents the judgement of the edit filter manager(s).
+Since communication is crucial when assuming good faith, all ambiguous cases which have "warn" as a filter action will receive a 'good_faith' label.
+On the other hand, I will label all filters set to "disallow" as 'vandalism' (or a particular type thereof), since this filter action is a clear sign that the edit filter managers, at least, have decided that seeking a dialogue with the offending editor is no longer an option.
+
 The second time, I labeled the whole data set again, this time using the here quoted compiled code book and assigned to every filter every label I deemed appropriate, without looking at the labels I assigned the first time around.
 I then compared the labels from both coding sessions.
 %TODO And did what?; how big was the divergence between both coding sessions?; should I select one, most specific label possible? or allow for multiple labels? what should I do with the cases where it's not clear whether it's vandalism or good faith?
 %TODO quote M's methodology book
+During the cross-validation labelling (2nd labeling), I 
 
 Def
 Example
-- 
GitLab
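
A minimal sketch of the action-based labeling heuristic this patch introduces, assuming each filter is represented by the set of its configured AbuseFilter actions; the function name preliminary_label and the precedence of "disallow" over "warn" when both are set are my assumptions, not part of the patch:

    def preliminary_label(actions):
        """Suggest a preliminary label for an ambiguous filter from its actions."""
        if "disallow" in actions:
            # Dialogue with the editor has been abandoned: assume vandalism.
            return "vandalism"
        if "warn" in actions:
            # A warning seeks communication first: assume good faith.
            return "good_faith"
        # No clear signal from the actions alone; defer to manual coding.
        return "unknown"

    # Example: a filter that both warns and tags is treated as good faith,
    # while any filter that disallows is labeled vandalism outright.
    assert preliminary_label({"warn", "tag"}) == "good_faith"
    assert preliminary_label({"disallow", "warn"}) == "vandalism"

Giving "disallow" precedence mirrors the reasoning in the patch: once the edit filter managers stop seeking dialogue, an accompanying warn action no longer signals good faith.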