Commit 2a5cf161 authored by Lyudmila Vaseva

Refactor chapter 3

parent ebf36bab
\chapter{Methods}
%alt: Theoretical background?
\label{chap:methods}
This chapter provides the theoretical background for the study of edit filters.
I make use of trace ethnography, described in the following section, for the study of documentation and discussion archives conducted in chapter~\ref{chap:filters}, in order to understand the role of edit filters in the quality control ecosystem of English Wikipedia.
The emergent coding introduced in section~\ref{sec:gt}, combined with trace ethnography, is employed in chapter~\ref{chap:overview-en-wiki} for determining what tasks edit filters take care of.
%\section{Open Science}
The whole work tries to adhere to the principles of open science and reproducible research.
According to the definition Bartling and Friesike provide in their book \textit{Opening Science}~\cite{BarFri2014}, open science is primarily characterised, unsurprisingly, by its openness:
methods and results are communicated openly at every stage of the research project, which, importantly, also makes the disclosure of negative results easier.
The code for all data processing and computational analyses I have done, as well as other artefacts I have used or compiled, has been openly accessible in the project's repository since the beginning of the present research~\cite{gitlab}
and can be re-used under a free license. %TODO (which one?)
Anyone interested can follow the process and/or use the data or scripts to verify my computations, run their own, and thus continue this research along one of the directions suggested in section~\ref{sec:further-studies} or in a completely new one.
%******************************************************
\section{Trace Ethnography}
\label{sec:trace-ethnography}
Trace ethnography constitutes the main theoretical framework for the analysis presented in chapters~\ref{chap:filters} and~\ref{chap:overview-en-wiki}.
The concept was first utilised by Geiger and Ribes in their 2010 paper ``The work of sustaining order in Wikipedia: the banning of a vandal''~\cite{GeiRib2010} and introduced in detail in a 2011 article~\cite{GeiRib2011} by the same authors.
They define trace ethnography as a methodology which
``combines the richness of participant-observation
with the wealth of data in logs so as to reconstruct
patterns and practices of users in distributed
sociotechnical systems''.
It is supposedly especially useful for research in such distributed systems, since there direct participant observation is impractical and costly, and tends to miss phenomena which manifest themselves in the communication between spatially separated sites rather than in a single location.
In~\cite{GeiRib2011}, the scholars use documents and document traces: MediaWiki revision data, more specifically the edit summary fields of single revisions and the markers/codes left programmatically within the edit summaries; the documentation of semi-automated software tools; and even the tools themselves (Huggle and Twinkle), which they use in order to observe what traces these leave.
From these, they reconstruct quite exactly individual strands of action and comprehend how different agents on Wikipedia work together towards the blocking of a single malicious user.
They refer to ``turn[ing] thin documentary traces into `thick descriptions' of actors and events''.
What is more, these traces are used by Wikipedians themselves in order to do their work efficiently.
Geiger and Ribes underline the importance of insider knowledge when reconstructing actions and processes based on the traces,
the need for ``an ethnographic understanding of the activities, people, systems, and technologies which contribute to their production''.
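To give a concrete impression of such traces, the following minimal Python sketch fetches revision metadata for one article via the public MediaWiki API and flags edit summaries containing tool markers.
The marker strings ``(HG)'' (Huggle) and ``(TW)'' (Twinkle) are assumptions based on the conventions Geiger and Ribes describe, not a verified specification of the current tools.
\begin{verbatim}
# Minimal sketch: fetch revision metadata for one article via the
# public MediaWiki API and flag edit summaries that contain tool
# markers. The marker strings are assumptions, not verified.
import requests

API = "https://en.wikipedia.org/w/api.php"
TOOL_MARKERS = {"(HG)": "Huggle", "(TW)": "Twinkle"}

def fetch_revisions(title, limit=50):
    """Return (user, timestamp, comment) for the latest revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp|comment",
        "rvlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return [(r.get("user", "?"), r["timestamp"], r.get("comment", ""))
            for r in page.get("revisions", [])]

for user, ts, comment in fetch_revisions("Vandalism"):
    for marker, tool in TOOL_MARKERS.items():
        if marker in comment:
            print(f"{ts} {user} [{tool}] {comment}")
\end{verbatim}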
They caution that via trace ethnography only that which is recorded by the system can be observed, and that records are always incomplete.
This consideration is elaborated on in more detail in~\cite{GeiHal2017}, where Geiger and Halfaker make the point that ``found data'' generated by a system for a particular purpose (e.g. a revision history whose purpose is to keep track of who edited what and when, and possibly to revert (to) a particular revision) is rarely an ideal fit as a dataset for answering a scientist's particular research question.
The importance of interpreting data in their corresponding context and the pitfalls of tearing analysis out of context are also underlined by Charmaz in~\cite{Charmaz2006}.
She cites cross-checking data from multiple sources and of different types as a possible remedy for this problem.
Last but not least, Geiger and Ribes~\cite{GeiRib2011} also warn of possible privacy breaches through the thickening of traces:
although the records they use to reconstruct paths of action are all public, the compiled thick descriptions can suddenly expose a lot of information about individual users which never existed in this form before, and the individuals concerned never gave their informed consent for their data being used this way.
%******************************************************
\section{Emergent Coding}
\label{sec:gt}
In order to gain a detailed understanding of what edit filters are used for on English Wikipedia, in chapter~\ref{chap:overview-en-wiki} all filters are scrutinised and labeled via a technique called emergent coding.
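As a basis for this, the filters themselves first have to be retrieved.
A minimal Python sketch of how this could be done against the AbuseFilter API module (\texttt{list=abusefilters}) follows; it fetches each filter's ID and public description, the raw material for the coding described below.
\begin{verbatim}
# Minimal sketch: retrieve the public list of edit filters (id and
# description) from English Wikipedia as raw material for coding.
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "abusefilters",
    "abfprop": "id|description",
    "abflimit": 500,
    "format": "json",
}
data = requests.get(API, params=params).json()
for f in data["query"]["abusefilters"]:
    print(f["id"], f.get("description", ""))
\end{verbatim}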
Coding is the process of labeling data in a systematic fashion in an attempt to comprehend it.
It is about seeking patterns in data and, later, trying to understand these patterns and the relationships between them.
Emergent coding is one possible approach for making sense of data in content analysis~\cite{Stemler2001}.
Its key characteristic is letting the codes emerge during the process, in contrast to starting with a set of preconceived codes (also known as ``a priori codes'').
Scholars regard this as useful because it reduces the danger of pressing data into predefined categories while potentially overlooking other, better fitting codes~\cite[p.17]{Charmaz2006}.
Instead, the codes stem directly from observations of the data.
Traditionally in content analysis, there are at least two researchers involved in an emergent coding process.
During an initial examination of the data, they independently come up with preliminary codes\footnote{I use the words ``codes'', ``labels'', ``tags'', and ``categories'' interchangeably.} which are then compared and discussed until a consolidated code book is developed.
Then, all researchers involved use this code book to label the data, again independently.
At the end, their labelings are compared and the reliability of the coding is verified.
If the results do not reach a pre-defined agreement level, the differences are discussed and the previous steps are repeated.
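As an illustration of such an agreement check, the following toy Python sketch computes Cohen's kappa, one common measure of intercoder reliability; the filter IDs and labels are invented for the example, not actual coding data.
\begin{verbatim}
# Toy sketch: quantify agreement between two coders with Cohen's
# kappa. Filter IDs and labels are invented for illustration.
from collections import Counter

coder_a = {1: "vandalism", 2: "good_faith", 3: "vandalism", 4: "maintenance"}
coder_b = {1: "vandalism", 2: "vandalism",  3: "vandalism", 4: "maintenance"}

def cohens_kappa(a, b):
    items = sorted(a)
    n = len(items)
    observed = sum(a[i] == b[i] for i in items) / n
    freq_a = Counter(a[i] for i in items)
    freq_b = Counter(b[i] for i in items)
    # Chance agreement: probability that both coders independently
    # assign the same label if they labeled at random according to
    # their individual label frequencies.
    expected = sum(freq_a[lbl] * freq_b[lbl] for lbl in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # -> kappa = 0.56
\end{verbatim}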
% It's a GT thing
%Subsequently, a more abstract coding phase—the so called axial coding—can take place.
%It ``relates categories to subcategories, specifies the properties and dimensions of a category''~\cite[p.60]{Charmaz2006} and thus organises conceptually the codes established in previous steps. % my organisation of the codes in vandalism/good faith/maintenance/unknown
%Took out GT and refered to coding in content analysis instead, since no theory building actually takes place
\begin{comment}
Different variations of coding are widely used by grounded theory scholars for making sense of (mainly qualitative) data.
Grounded theory describes a myriad of methodological frameworks applied in the social sciences for constructing a scientific theory \emph{grounded} in the methodical gathering and analysis of data.
Here, no finished theory is developed; instead, I employed a grounded theory inspired coding process in order to understand what edit filters filter.
I followed the coding guidelines described by Charmaz in~\cite[p.42--71]{Charmaz2006}.
I chose Charmaz's interpretation of grounded theory (she speaks of ``grounded theor\emph{ies}'' and calls her own constructivist rendering of it ``\emph{a} way of doing grounded theory'') precisely because of her acknowledgement of the subjective nature of every piece of research, which is shaped by the beliefs, background, and theoretical understanding of the people who conduct it.
Researchers always \emph{interpret} the subject of their inquiry rather than give an exact portrayal of it:
``we are part of the world we study and the data we collect. We \textit{construct} our grounded theories through our past and present involvements and interactions with people, perspectives, and research practices''~\cite[p.10]{Charmaz2006}.
Charmaz advocates for ``gathering rich—detailed and full—data and placing them in their relevant situational and social contexts''~\cite[p.10--11]{Charmaz2006}, which is in line with the thick descriptions Geiger and Ribes generate via trace ethnography\footnote{As a matter of fact, both Charmaz, and Geiger and Ribes refer to ``thick descriptions'', a term coined by~\cite{Geertz1973}.}.
A coding process (emergent or not) comprises at least two steps: initial and focused coding~\cite[p.42--70]{Charmaz2006}.
During the initial phase, fragments of data are studied closely for ``their analytic import'' and potentially promising codes; these initial codes stay linguistically close to the data.
During focused coding, the most promising initial codes are extensively tested against the data.
Since coding, data gathering, and analysis take place simultaneously, it is common to go back and re-code parts of the data with more insightful labels that have emerged later in the process.
\end{comment}
@article{Stemler2001,
  title = {An overview of content analysis},
  author = {Stemler, Steve},
  journal = {Practical Assessment, Research \& Evaluation},
  volume = {7},
  number = {17},
  pages = {137--146},
  year = {2001}
}
@inproceedings{WestKanLee2010,
title = {Stiki: an anti-vandalism tool for Wikipedia using spatio-temporal analysis of revision metadata},
author = {West, Andrew G and Kannan, Sampath and Lee, Insup},
  booktitle = {Proceedings of the Third European Workshop on System Security (EUROSEC '10)},
  year = {2010}
}