The purpose of this section is to provide an overview of the labels.
\subsection{A few notes on the labels/labeling process}
I started coding strongly influenced by the coding methodologies applied by Grounded Theory scholars~\cite[42-71]{Charmaz2006} and mostly let the labels emerge during the process. %TODO describe in greater detail? should appear in methodology anyway?
In addition to that, for vandalism-related labels, I used some of the vandalism types identified by the community in \url{https://en.wikipedia.org/wiki/Wikipedia:Vandalism_types}.
However, I regarded these types more as an inspiration and did not adopt the proposed typology 1:1, since I found some of the identified types quite general, and more specific categories seemed to yield more insights.
(For example, I did not adopt the 'Addition of text' category, since more specific labels such as 'hoaxing' or 'silly\_vandalism' seemed more informative; see below for definitions.)
Moreover, I found some of the proposed types redundant.
(For example, 'Sneaky vandalism' seems to overlap partially with 'hoaxing' and partially with 'sockpuppetry'; 'Link vandalism' mostly overlaps with 'spam' or 'self\_promotion', although not always; and 'Personal attacks' are, for some reason, listed twice.)
I labeled the dataset twice.
One motivation for this was to return to the data once I had gained better insight into it and a more detailed understanding of it, and to use this newly gained knowledge to re-evaluate ambiguous cases, i.e. to re-label some data with codes that emerged later in the process.
Another motivation for the second round of labeling was to ensure at least some intra-coder integrity, since, unfortunately, multiple coders were not available~\cite{LazFenHo2017}. %TODO add page num; I also need to elaborate on methodology here
During the first round of labeling, I looked through the data, paying special attention to the names of the filters ('af\_public\_comments'), the comments ('af\_comments'), and the regular expression patterns constituting the filters, and identified one or several possible labels. %TODO reword? I also looked at the comments, name and regex the second time..
In ambiguous cases, I either labeled the filter with the code which I deemed most appropriate and a question mark, or assigned all possible labels (or both).
There were also cases for which I could not gather any insight relying on the name, comments and pattern, since the filters were hidden from public view and the name was not descriptive enough.
However, upon further reflection, I think it is safe to assume that all hidden filters target some form of (more or less grave) vandalism, since the guidelines suggest that filters should not be hidden unless they deal with cases of persistent and specific vandalism, where the vandalising editors could be expected to actively look for the filter pattern in their attempts to circumvent the filter~\cite{Wikipedia:EditFilter}.
Therefore, during the second round of labeling I intend to label all such cases as 'hidden\_vandalism'.
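The treatment of hidden filters described above can be sketched in a few lines of Python. This is only an illustration: the field names ('af\_id', 'af\_public\_comments', 'af\_hidden') are assumptions modelled on the fields of MediaWiki's abuse\_filter table and may not match the actual data set, and the sample rows are made up.

```python
# Sketch: pre-label hidden filters, leave everything else for manual coding.
# Field names are assumptions modelled on MediaWiki's abuse_filter table.
sample_filters = [
    {"af_id": "61", "af_public_comments": "new user removing references", "af_hidden": "0"},
    {"af_id": "139", "af_public_comments": "(hidden filter)", "af_hidden": "1"},
]

def prelabel(row):
    """Return a preliminary label, or None if manual inspection is needed."""
    if row["af_hidden"] == "1":
        # Hidden filters are assumed to target some form of vandalism.
        return "hidden_vandalism"
    return None

for row in sample_filters:
    print(row["af_id"], prelabel(row) or "inspect name, comments and pattern")
```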
Then again, there were also cases where I could not determine any suitable label, since I did not understand the regex pattern, none of the existing categories seemed to fit, and I could not think of an insightful new category to assign.
During the first round of labeling, these were labeled 'unknown', 'unclear' or 'not sure'.
For the second round, I intend to unify them under 'unclear'.
For a number of filters, it was particularly difficult to determine whether they were targeting vandalism or good faith edits.
The only thing that would have distinguished between the two would have been the contributing editor's motivation, which we had no way of knowing.
During the first labeling session, I tended to label such filters with 'vandalism?' or 'good\_faith?'.
For the cross-validation labeling (2nd round), I intend to stick to the "assume good faith" guideline\footnote{\url{https://en.wikipedia.org/w/index.php?title=Wikipedia:Assume_good_faith&oldid=889253693}} myself
and only label as vandalism those cases where good faith can definitely no longer be assumed.
One characteristic which guided me here is the filter action, which represents the judgement of the edit filter manager(s).
Since communication is crucial when assuming good faith, all ambiguous cases which have a less 'grave' filter action such as "warn" or "tag" will receive a 'good\_faith' label.
On the other hand, I will label all filters set to "disallow" as 'vandalism' or a particular type thereof, since the filter action is a clear sign that at least the edit filter managers have decided that seeking a dialog with the offending editor is no longer an option.
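This rule of thumb can be summarised as follows. The sketch below is a simplification of the rule described in the text: the helper name is hypothetical, and it assumes a filter's actions are given as a comma-separated string, as real filters may trigger several actions at once.

```python
def default_label(actions):
    """Map a filter's actions to a default label for ambiguous cases.

    'disallow' signals that dialogue with the editor is no longer
    sought, so it takes precedence and yields 'vandalism'; softer
    actions such as 'warn' or 'tag' keep the good-faith assumption.
    """
    acts = set(actions.split(","))
    if "disallow" in acts:
        return "vandalism"
    if acts & {"warn", "tag"}:
        return "good_faith"
    return None  # no default applies; label manually

print(default_label("disallow"))  # vandalism
print(default_label("warn,tag"))  # good_faith
```

Note that 'disallow' deliberately wins over 'warn'/'tag' when both are set, mirroring the reasoning above that disallowing is a clear sign the dialogue option has been abandoned.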
%TODO compare also with revising codes as the analysis goes along according to Grounded Theory
The second time, I labeled the whole data set again, this time using the code book compiled here, and assigned to every filter every label I deemed appropriate, without looking at the labels I had assigned the first time around.
I then compared the labels from both coding sessions. %TODO And did what?; how big was the divergence between both coding sessions?; should I select one, most specific label possible? or allow for multiple labels?
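One way to quantify the divergence between the two coding sessions would be a simple agreement measure over the assigned label sets. The sketch below uses made-up example labels and reports plain percent agreement plus a per-filter Jaccard overlap (to accommodate multiple labels per filter), rather than a chance-corrected statistic.

```python
def agreement(first, second):
    """Compare two multi-label codings of the same filters.

    Returns (share of filters with identical label sets,
             mean Jaccard overlap of the label sets).
    """
    exact = 0
    overlaps = []
    for fid in first:
        a, b = set(first[fid]), set(second[fid])
        exact += a == b
        overlaps.append(len(a & b) / len(a | b))
    return exact / len(first), sum(overlaps) / len(overlaps)

# Hypothetical labels for two filters from both coding sessions.
first = {"61": {"good_faith"}, "139": {"vandalism", "spam"}}
second = {"61": {"good_faith"}, "139": {"spam"}}
print(agreement(first, second))  # (0.5, 0.75)
```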
%TODO quote M's methodology book
%TODO maybe take this out here
Some of the hidden filters deal with cases of personal attacks.
These are hidden to protect the persons involved. %TODO quote needed
%TODO disclose links to 1st and 2nd labelling
The first round of labeling is available at \url{https://github.com/lusy/wikifilters/blob/master/filter-lists/20190106115600_filters-sorted-by-hits-manual-tags.csv}.