From d7fb2e44253655433e94b8082d4133d9b515555e Mon Sep 17 00:00:00 2001 From: Lyudmila Vaseva <vaseva@mi.fu-berlin.de> Date: Mon, 25 Feb 2019 07:25:36 +0100 Subject: [PATCH] Transfer overleaf changes --- article/proceedings.tex | 237 +++++++--------------------------------- 1 file changed, 42 insertions(+), 195 deletions(-) diff --git a/article/proceedings.tex b/article/proceedings.tex index a1f0548..e973772 100644 --- a/article/proceedings.tex +++ b/article/proceedings.tex @@ -146,25 +146,49 @@ from \url{https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2009-03-23/ --> so, purpose (at least at the beginning): fight vandalism; find/.. common newbie mistakes -\section{Epistemological interest} +%************************************************************************ + +\section{Intended Contributions} +%Epistemological interest What do we want to know? +Context of work: algorithmic quality-control mechanisms (bots, ORES, humans) → filter? + +\begin{itemize} + \item Description of how filters integrate into the algorithmic quality control mechanism in Wikipedia + \item Do filters work the desired way/help for a smoother Wikipedia service or is it a lot of work to maintain them and the usefulness is questionable? + \item What can we filter with a REGEX? And what not? Are regexes the suitable technology for the means the community is trying to achieve? + \item Filters are classical rule-based systems. What are suitable areas of application for such rule-based systems in contrast to ML systems? +\end{itemize} + + +What can we study? + +\begin{itemize} + \item Discussions on filter patterns? On filter repercussions? + \item Whether filters work the desired way/help for a smoother Wikipedia service or is it a lot of work to maintain them and the usefulness is questionable? + \item What can we filter with a REGEX? And what not? Are regexes the suitable technology for the means the community is trying to achieve? 
+ \item add also "af\_enabled" column to filter list; could be that the high hit count was made by false positives, which will have led to disabling the filter (TODO: that's a very interesting question actually; how do we know the high number of hits were actually legit problems the filter wanted to catch and no false positives?) +\end{itemize} + + +\begin{comment} * Think about: what's the computer science take on the field? How can we design a "better"/more efficient/more user friendly system? A system that reflects particular values (vgl Code 2.0, Chapter 3, p.34)? * go over notes in the filter classification and think about interesting controversies, things that attract the attention * what are useful categories * GT is good for tackling controversial questions: e.g. are filters with disallow action a too severe interference with the editing process that has way too much negative consequences? (e.g. driving away new comers?) * What can we study? - * Discussions on filter patterns? On filter repercussions? - * Whether filters work the desired way/help for a smoother Wikipedia service or is it a lot of work to maintain them and the usefullness is questionable? * Question: Is it worth it to use a filter which has many side effects? - * What can we filter with a REGEX? And what not? Are regexes the suitable technology for the means the community is trying to achieve? +\end{comment} -* add also "af\_enabled" column to filter list; could be that the high hit count was made by false positives, which will have led to disabling the filter (TODO: that's a very interesting question actually; how do we know the high number of hits were actually leggit problems the filter wanted to catch and no false positives?) +%************************************************************************ +\section{Algorithmic quality-control mechanisms on Wikipedia} +%Context -\section{Putting things in perspective} +%Literature review! 
\subsection{Vandalism on Wikipedia} @@ -222,198 +246,13 @@ examples of disruptive editing: \subsection{Harassment and bullying} -\url{https://en.wikipedia.org/wiki/Wikipedia:WikiBullying} - -"This is an explanatory supplement to the Wikipedia:Civility and Wikipedia:Ownership of articles policies. -This page is intended to provide additional information about concepts in the page(s) it supplements. This page is not one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community." - -"WikiBullying is using Wikipedia to threaten and/or intimidate other people, whether they are Wikipedia editors or not." -"If you feel that you are being bullied or another user has threatened you with bodily harm, it is important that you report them immediately to the Incidents page on the Administrator's Noticeboard so the matter can be properly dealt with." -"All complaints about bullying, even those which turn out to be unjustified should be treated with seriousness and respect, and any WP:BOOMERANG on individuals who have complained they are being bullied is contrary to the principles of respect for thoughtful intellectual discourse that Wikipedia represents. No one should ever fear coming forward to make the community aware of a bullying concern." - -"There are essentially two forms of bullying on Wikipedia: attacks against the individual editor by targeting a single user, or giving the perception of power aimed at the entire Wikipedia community at large." - -"Forms of WikiBullying: - - 1.1 Asserting ownership: "No article on Wikipedia is owned by any editor. Any text that is added to Wikipedia is freely licensed under WP:CC-BY-SA and other users are free to add, remove or modify it at will, provided that such editing is done responsibly." - 1.2 POV Railroading: "Point of View (POV) railroading refers to the use of bullying tactics to discredit an editor with an opposing viewpoint or eliminate them from a discussion." 
- 1.3 False accusations: "False accusations are a common form of bullying on Wikipedia, although people do sometimes make honest mistakes. Accusations of misconduct made without evidence are considered a serious personal attack." - 1.4 Misrepresentation: "Quoting others out of context and other forms of straw man argument are against the civility policy. Again, try to find out if there has been a misunderstanding." - 1.5 Making "no-edit" orders contrary to policy: "Another form of wikibullying is to issue no-edit orders which are not backed by current policies (or guidelines). A "no-edit" order is a message sent to a single editor (who is not banned) or to the Wikipedia community not to edit at all or in a particular manner, or not to edit a particular page or part of a page at all or in a particular manner. These messages can be sent to a user's talk page, placed on an article's talk page, or in hidden text that would not be missed if an editor attempts to edit the article or section. No editor may unilaterally take charge over an article or part of an article by sending no-edit orders. - -There are some no-edit orders that are acceptable. For example, if a consensus has already been formed regarding a topic, and a single editor has constantly stubbornly defied the ruling, politely discussing this one-on-one on the user's talk page is acceptable." - 1.6 Wikihounding: "Wikihounding is the singling out of one or more editors, and joining discussions on multiple pages or topics they may edit or multiple debates where they contribute, to repeatedly confront or inhibit their work. This is with an apparent aim of creating irritation, annoyance or distress to the other editor. Wikihounding usually involves following the target from place to place on Wikipedia." 
- 1.7 Use of hidden text: "Some unacceptable uses are: - - Telling all other editors not to edit the page - Telling others not to remove a section of the article, as if the section were written in stone - Telling others that a page should not be proposed for deletion, when this may be doubted by others - Writing new guidelines that apply specifically to the page and branding them as "policy." In the past, policies that have been proposed for a single article have failed to attain a consensus." - 1.8 Real life threats: "The Wikimedia Foundation, if need be, will investigate or arrange for law enforcement to investigate threats of violence." -" - -\subsection{The bigger picture: Upload filters} +%************************************************************************ \section{Methodology} \subsection{Grounded Theory} -"This book provides \textit{a} way of doing grounded theory" (p.9)~\cite{Charmaz2006} - -Preface -"At each phase of the research journey, \textit{your} reasings of your work guide your next moves."(p.xi) -"In short, the finished work is a construction–yours." (p.xi) - -Chapter 1 -"we build levels of abstraction directly from the data" (p.3) - -"Glaser and Strauss aimed to move qualitative inquiry beyond descriptive studies into the reals of explanatory theoretical frameworks,"(p.6) - -Criteria: -"a completed grounded theory met the following criteria: a close fit with the data, usefulness, conceptual density, durability over time, modifiability, and explanatory power." (p.6) - -"assumed that process, not structure, was fundamental to human existence" (p.7) -"A process consists of unfolding temporal sequences that may have identifiable markers with clear beginnings and endings and benchmarks in between. [...] Thus, single events become linked as part of a larger whole." (p.10) - -"we are part of the world we study and the data we collect. 
We \textit{construct} our grounded theories through our past and present involvements and interactions with people, perspectives, and research practices."(p.10) -"My approach explicitely assumes that any theoretical rendering offers and \textit{interpretive} portrayal of the studied world, not an exact picture of it." (p.10) - -"I advocate gathering rich–detailed and full–data and placing them in their relevant situational and social contexts." (p.10-11) // cooking data with care - -Chapter 2: -"What do you want to study? Which research problem might you pursue? [...] How do you use methods to gather rich data?" (p.13) -"Obtaining rich data means seeking 'thick' description (Geertz, 1973)" (p.14) - -"we first aim to see this world as our research participants do–from the inside."(p.14) -"You might learn that what outsiders assume abouth the world you study may be limited, imprecise, mistaen, or egregiously wrong."(p.14) - -"\textit{How} you collect data affects \textit{which} phenomena yo will see, \textit{how}, \textit{where}, and \textit{when} you will view them, and \textit{what} sense you will make of them." (p.15) - -"We are not scientific obeservers who can dismiss scrutiny of our values by claiming scientific neutrality and authority." (p.15) - -"grounded theorists often begin their studies with certain research interests and a set of general concepts" (p.16) -"need to remain as open as possible to whatever we see" (p.17) -"We do not force preconceived ideas and theories directly upon our data." (p.17) -"The quality–and credibility–of your study starts with the data." (p.18) -"Skimpy data may give you a wonderful start but do not add up to a detailed study or a nuanced grounded theory." (p.18) -"What kind of data stands as rich and sufficient?: -* Have I collected enough background data about persons, processes, and settings to have ready recall and to understand and portray the full range of contexts of the study? 
// what actors are there: admins; editors (good faith/vandals); edit filter managers (how do I become a member of this group?); people requesting an edit filter -* Have I gained detailed descriptions of a range of participants' views and actions? // I have got traces.. ; TODO maybe look for which filters have a discussion/filter request/sock puppet investigation linked to them in the comments; maybe also conduct IVs? -* Do the data reveal what lies beneath the surface? -* Are the data sufficient to reveal changes over time? // TODO: get hold of the log table!!!! -* Have I gained multiple views of participants' range of actions? // + filter actions! (filters are also actors according to ANT) -* Have I gathered data that enable me to develop analytic categories? -* What kinds of comparisons can I make between data? How do these comparisons generate and inform my ideas? -" (p.18-19) - -"We demonstrate our respect by making concerted efforts to learn about their views and actions and to try to understand their lives from their perspectives." (p.19) -"we must test our assumptions about the worlds we study, not unwittingly reproduce these assumptions."(p.19) -"It means discovering what our research participants take for granted or do not state as well as what they say and do."(p.19) -"We try to understand but do not ncessarily adopt or reproduce their views as our own."(p.19) - -starting questions: -" -* What's happening here? -* What are the basic social processes? -* What are the basic social psychological processes" (p.20) - -"Everything may seem significant–or trivial."(p.20) -TODO: Look for this in potential IVs: -" -* From whose point of view is a given process fundamental? From whose is it marginal? -* How do the observed social processes emerge? How do participants' actions construct them? -* Who exerts control over these processes? Under what conditions? -* What meanings do different participants attribute to the process? How do they talk about it? 
What do they emphasize? What do they leave out? -* How and when do their meanings and actions concerning the process change? -"(p.20) -"Do they provide an idealized picture wrapped in a public relations rhetoric" (p.20) -"When does a basic social process become visible or change?"(p.20) - -"Actions may defy stated intentions. Different participants have different vantage points–and, sometimes, competing agendas. Do they realize they hold competing agendas? How do they act on them? When, if ever, does conflict emerge?" - -TODO: Look for people who have triggered smth repeatedly. Could it be good faith? Or were they testing? What happened afterwards? - -Field notes in GT: -" -* record individual and collective action -* contain full, detailed notes with anecdotes and observations -* emphasize significant processes occurring in the setting -* address what participants define as interesting and/or problematic -* attend to participants' language use -* place actors and actions in scenes and contexts -* become progressively focused on key analytic ideas -" (p.22) - -TODO: show the actions and process that construct the topic - -"show how people move through the organization–or are moved through it" (p.23) -"seeing data everywhere and nowhere" (p.23) - -GT: -1) compare data from the beinning of the research -2) compare data with emerging categories -3) demonstrate relations between concepts and categories (p.23) - -TODO: answer following questions (p.24) -" -* What is the setting of action? When and how does action take place? -* What is going on? What is the overall activity being studied, the relatively long-term behavior about which participants organize themselves? What specific acts comprise this activity? --> maintaining a community-sources encyclopedia? -* What is the distribution of participants over space and time in these locales? -* How are actors [research participants] organized? What organizations effect, oversee, regulate or promote this activity? 
-* How are members stratified? Who is ostensibly in charge? Does being in charge vary by activity? How is membership achieved and maintained? -* What do actory pay attention to? What is important, preoccupying, critical? -* What do they pointedly ignore that other persons might pay attention to? -* What symbols do actors invoke to understand their worlds, the participants and processes whithin them, and the objects and events they encounter? What names do they attach to objects, events, persons, roles, settings, equipment? -* What practices, skills, strategems, methods of operation do actors employ? -* Which theories, motives, excuses, justifications or other explanations do actors use in accounting for their participation? How do they explain to each other, not to outside investigators, what they do and why they do it? -* What goals do actors seek? When, form their perspective, is an act well or poorly done? How do they judge action–by what standards, developed and applied by whom? -* What rewards do various actors gain from their participation?" - -"intensive intervie fosters eliciting each participant's interpretation of his or her experience"(p.25) - -"Researchers treat extant texts \textit{as} data to address their research questions although these texts were produced for other–often very different–purposes." (p.35) -"As acounts, texts tell something of intent and have intended–and perhaps unintended–audiences."(p.35) - -"interview respondents may wish to appear affable, intelligent, or politically correct and thus shape their responses accordingly" (p.36) - -"search for reasons for disparities between observed realities and written responses"(p.36) - -additional types of data we can use: -public records, government reports, organizational documents, mass media, literature, autobiographies, personal correspondence, Internet discussions, and earlier qualitative materials from data banks. 
- -TODO: Answer for myself: -" -* What are the parameters of the information? -* On what and whose facts does this information rest? -* What does the information mean to various participants or actors in the scene? -* What does the information leave out? -* Who has access to facts, records, or sources of the information? -* Who is the inteded audience for the information? -* Who benefits from shaping and/or interpreting this information in a particular way? -* How, if at, all does the information affect actions? -"(p.37-38) - -"To the extent possible, we need to situate texts in their contexts." (p.39) -"Where do the data come from? Who participated in shaping them? What did the authors intend? Have participants provided sufficient information for us to make a plausible interpretation? And do we have sufficient knowledge of the relevant worlds to read their words with any understanding?"(p.39) -"Much textual analysis is without context, or worse, out of context. [...] Providing a description of the times, actors, and issues gives you a start. Multiple methods help, such as intervieweing key participants, and using several types of documents also helps." (p.39) - -TODO: Questions to ask of a text (p.39-40): -" -* How was the text produced? By whom? -* What is the ostensible purpose of the text? Might the text serve other unstated or assumed purposes? Which ones? -* How does the text represent what its author(s) assumed to exist? Which meanings are embedded within it? How do those meanings reflect a particular social, historica, and perhaps organizational context? -* What is the structure of the text? -* How does its structure shape what is said? Which categories can you discern in its structure? What can you glean from these categories? Do the categories change in sequential texts over time? How so? -* Which contextual meanings does the text imply? -* How does its content construct images of reality? -* Which realities does the text claim to represent? 
How does it represent them? -* What, if any, unintended information and meanings might you see in the text? -* How is language used? -* Which rules govern the constructuion of the text? How can you discern them in the narrative? How do these rules reflect both tacit assumptions and explicit meanings? How might they be related to other data on the same topic? -* When and how do telling points emerge in the text? -* What kinds of comparisons can you make between texts? Between different texts on the same topic? Similar texts at different times such as organizational annual reports? Between different authors who address the same questions? -* Who benefits from the text? Why? -" +%************************************************************************ \section{Data} @@ -424,6 +263,8 @@ What is the best place herefor? * What other data sources can I explore? * Interview with filter managers? with admins? with new editors? +%************************************************************************ + \section{What is an edit filter} \textbf{Definition} @@ -467,6 +308,8 @@ And the currently configured filter actions are: ``disallow''. \textbf{Difference bot/filter} +%************************************************************************ + \section{Edit filter governance} \textbf{Interesting questions:} @@ -534,6 +377,8 @@ CAT: https://ca.wikipedia.org/wiki/Especial:Usuaris/abusefilter (currently: 4 us apart from that: current ongoing discussions on single filters/problems that may require a filter +%************************************************************************ + \section{Technical layer} \subsection{The edit filter mediawiki extention} @@ -687,8 +532,9 @@ So far, I haven't managed to trigger a filter with a different action. (Interesting side note: editing via TOR is disallowed: "Your IP has been recognised as a TOR exit node. We disallow this to prevent abuse" or similar, check again for wording. 
Compare: "Users of the Tor anonymity network will show the IP address of a Tor "exit node". Lists of known Tor exit nodes are available from the Tor Project's Tor Bulk Exit List exporting tool." \url{https://en.wikipedia.org/wiki/Wikipedia:Vandalism}) +%************************************************************************ -\section{Edit filters on the English Wikipedia: State of the art} +\section{Descriptive overview of Edit Filters on the English Wikipedia} \textbf{Interesting questions} \begin{itemize} @@ -1036,12 +882,13 @@ harassment! mailinglist There's also separate documentation of long term abuse (see notes) -\section{Critical discussion} +\section{Discussion} * why get certain filters (and not others?) * do filter solve effectively the task they were conjured up to life to fulfil? * what kinds of biases/problems are there? * who is allowed to edit edit filters? +\subsection{The bigger picture: Upload filters} \section{Conclusion} -- GitLab