From 23487c92c87fa689415c11f0fcb7fccc87f4b898 Mon Sep 17 00:00:00 2001
From: Lyudmila Vaseva <vaseva@mi.fu-berlin.de>
Date: Sat, 13 Jul 2019 20:31:57 +0200
Subject: [PATCH] Do some refactoring of the code book

---
 thesis/appendix.tex | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/thesis/appendix.tex b/thesis/appendix.tex
index d33f67c..8750178 100644
--- a/thesis/appendix.tex
+++ b/thesis/appendix.tex
@@ -12,26 +12,27 @@
 \section{Code book}
 \label{app:code_book}
 
-The purpose of this document/section is to provide an overview of the labels\footnote{Here, I use the words "codes"/"tags"/"labels" interchangeably.} used for the manual tagging of edit filters.
+The purpose of this section is to provide a detailed overview of the labels\footnote{Here, I use the words "codes"/"tags"/"labels" interchangeably.} used for the manual tagging of edit filters.
 
 \subsection{A few notes on the labels/labeling process}
 
 I started coding strongly influenced by the coding methodologies applied by Grounded Theory scholars~\cite[42-71]{Charmaz2006} and mostly let the labels emerge during the process. %TODO describe in greater detail? should appear in methodology anyway?
-In addition to that, for vandalism related labels, I used some of the vandalism types identified by the community in \url{https://en.wikipedia.org/wiki/Wikipedia:Vandalism_types}\cite{Wikipedia:VandalismTypes}.
-However, I regarded the types more as an inspiration and haven't adopted the proposed typology 1:1 since I found some of the identified types quite general and more specific categories seemed to render more insights
-(for example, I haven't adopted the 'Addition of text' category since it seemed more insightful(syn!) to have more specific labels such as 'hoaxing' or 'silly\_vandalism', see below for definition),
-Moreover, I found some of the proposed types redundant
-(For example, 'Sneaky vandalism' seems to overlap partially with 'hoaxing' and partially with 'sockpuppetry', 'Link vandalism' mostly overlaps with 'spam' or 'self\_promotion', although not always and for some reason, 'Personal attacks' are listed twice.)
+In addition to that, for vandalism related labels, I used some of the vandalism types identified by the community in~\cite{Wikipedia:VandalismTypes}.
+However, I regarded the types more as an inspiration and haven't adopted the proposed typology 1:1 since I found some of the identified types quite general and more specific categories seemed to render more insights.
+For instance, I haven't adopted the 'addition of text' category since it seemed more insightful(syn!) to have more specific labels such as 'hoaxing' or 'silly\_vandalism', see below for definition.
+Moreover, I found some of the proposed types redundant.
+For example, 'sneaky vandalism' seems to overlap partially with 'hoaxing' and partially with 'sockpuppetry', 'link vandalism' mostly overlaps with 'spam' or 'self\_promotion', although not always and for some reason, 'personal attacks' are listed twice.
 
-I labeled the dataset twice.
+I have labeled the dataset twice.
 One motivation therefor was to return to it once I've gained better insight into the data and more detailed understanding of it and use this newly gained knowledge to re-evaluate ambiguous cases, i.e. re-label some data with codes that emerged later in the process.
-Another motivation for this second round of labeling was to ensure at least some intra-coder integrity, since, unfortunately, multiple coders were not available~\cite{LazFenHo2017}. %TODO add page num; I also need to elaborata on methodoly here
+This process (syn) of labeling is congrous with the simultaneous coding and data collection suggested by grounded theory scholars~\cite{}.
+Another motivation for this second round of labeling was to ensure at least some intra-coder integrity, since, unfortunately, multiple coders were not available~\cite{LazFenHo2017}. %TODO add page num; I also need to elaborate on methodoly here
 
-The first labeling, I looked through the data paying special attention to the name of the filters ('af\_public\_comments'), the comments ("af\_comments"), as well as the regular expression pattern constituting the filter and identified one or several possible labels. %TODO reword? I also looked at the comments, name and regex the second time..
+During the first labeling, I looked through the data paying special attention to the name of the filters ('af\_public\_comments'), the comments ("af\_comments"), as well as the regular expression pattern constituting the filter and identified one or several possible labels. %TODO reword? I also looked at the comments, name and regex the second time..
 In ambiguous cases, I either labeled the filter with the code which I deemed most appropriate and a question mark, or assigned all possible labels (or both).
 There were also cases for which I could not gather any insight relying on the name, comments and pattern, since the filters were hidden from public view and the name was not descriptive enough.
 However, upon some further reflection, I think it is safe to assume that all hidden filters target a form of (more or less grave) vandalism, since the guidelines suggest that filters should not be hidden unless dealing with cases of persistent and specific vandalism where it could be expected that the vandalising editors will actively look for the filter pattern in their attempts to circumvent the filter\cite{Wikipedia:EditFilter}.
-Therefore, during the second round of labeling I intend to label all such cases as 'hidden\_vandalism'.
+Therefore, during the second round of labeling I labeled all such cases 'hidden\_vandalism' (all of them where nothing more specific was found).
 And then again, there were also cases where I could not determine any suitable label, since I didn't understand the regex pattern, none of the existing categories seemed to fit and I couldn't think of an insightful new category to assign.
 During the 1st labeling, these were labeled 'unknown', 'unclear' or 'not sure'.
 For the second round, I intend to unify them under 'unclear'.
-- 
GitLab