  Based on prior evidence, we can expect that if the datasets these algorithms are trained on do not include representative samples of populations, or include biased representations of certain populations, the result will be algorithms that exhibit systematic errors when identifying and classifying POC.128 It turns out that this is precisely the case with ImageNet. As I’ve demonstrated elsewhere, the synset on ImageNet that gathers images of Black people consists largely of low-resolution images that show few facial details and bodies positioned far from the camera, and it is heavily populated by memes and pictures of celebrities (around 1 percent of the entire dataset is pictures of Barack Obama). Most inexcusable, however, is that over 6 percent of the category’s images depict white people dressed in blackface, largely due to images of Dutch people dressed as Zwarte Piet (i.e., “Black Pete”) during their Christmas celebrations.129 While it is difficult to estimate the impact this might have on adult content filters without more systematic evidence, it is safe to assume that these filters will have higher error rates for images of POC, and of BIPOC women in particular. It is thus likely that POC are experiencing higher false-positive rates, in which their nonpornographic content is unjustly flagged by automated content moderation systems.
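
  To see what the false-positive concern means in practice, such a disparity can be audited directly when one has ground-truth labels and a filter’s decisions. Below is a minimal sketch in Python with made-up groups and numbers (only the auditing logic matters here): a filter can look accurate overall while flagging one group’s safe content three times as often.

```python
# Minimal audit of group-wise false-positive rates: how often truly
# safe content is wrongly flagged as adult, broken out by the group
# depicted. The groups, labels, and numbers below are hypothetical.
from collections import defaultdict

def false_positive_rates(records):
    """records: iterable of (group, true_label, predicted_label),
    where labels are 'adult' or 'safe'."""
    flagged = defaultdict(int)   # safe items wrongly flagged, per group
    safe = defaultdict(int)      # all truly safe items, per group
    for group, truth, prediction in records:
        if truth == "safe":
            safe[group] += 1
            if prediction == "adult":
                flagged[group] += 1
    return {g: flagged[g] / safe[g] for g in safe}

# Hypothetical sample in which the filter errs more often on group B.
sample = (
    [("A", "safe", "safe")] * 95 + [("A", "safe", "adult")] * 5 +
    [("B", "safe", "safe")] * 85 + [("B", "safe", "adult")] * 15
)
print(false_positive_rates(sample))  # {'A': 0.05, 'B': 0.15}
```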

  Another major bias that remains is the Western context of these visual datasets, which were largely compiled when images from the US dominated the internet. Google researchers have found that this has led image recognition systems to fail to accurately identify scenes from other cultural contexts and geographic locations.130 One of the most frequently cited examples is the failure to identify weddings, brides, and grooms because of cultural and geographic differences in wedding attire and settings (see figure 2.5). Facebook’s AI lab has similarly found that image recognition algorithms demonstrate embedded cultural biases when they label objects: they are 15 to 20 percent more likely to incorrectly identify objects from non-Western and low-income communities.131 It is certainly the case that US- and Western-centric biases about what constitutes obscenity and pornography are embedded in these algorithms as well, as most exposed female-presenting breasts or buttocks and any genitalia will trigger the algorithm globally, regardless of whether a particular community would consider that nudity to be pornographic. Such filters perform even worse at interpreting community standards regarding what constitutes artistic nudity.
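
  This one-size-fits-all judgment is visible in the public interface to these systems. A minimal sketch of querying Google’s Cloud Vision SafeSearch endpoint with the google-cloud-vision Python client (API credentials and the filename are assumed) illustrates the point: the API returns a single likelihood per image, with no parameter for cultural context or community standards.

```python
# Querying Cloud Vision's SafeSearch endpoint; 'photo.jpg' is a
# placeholder, and API credentials are assumed to be configured.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

annotation = client.safe_search_detection(image=image).safe_search_annotation
# Each field is a likelihood enum (VERY_UNLIKELY ... VERY_LIKELY),
# applied identically regardless of where or for whom the image was made.
print("adult:", annotation.adult, "racy:", annotation.racy)
```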

  Figure 2.5

  Wedding photographs with Google’s label predictions based on Open Images. Source: Tulsee Doshi, “Introducing the Inclusive Images Competition,” Google AI Blog, September 6, 2018. https://ai.googleblog.com/2018/09/introducing-inclusive-images-competition.html.

  Google has increasingly been trying to combat this US and Western bias in its algorithms through its Crowdsource app, which asks users to contribute free labor, akin to Amazon’s Mechanical Turk, on tasks like translation, translation validation, handwriting recognition, sentiment evaluation, and landmark recognition.132 The company plans to combat cultural bias in image recognition systems with its Inclusive Images Competition, which challenges participants to use its Open Images dataset to train an algorithm that can successfully be applied to two challenge datasets that Google collected from its global user community via Crowdsource.133 Of the nearly 35,000 images, fewer than fifty could be described as depicting scantily clad bodies. The most risqué image I found was an outline of Bart Simpson showing his butt; more often, the closest thing to a racy or risqué image was a picture of men in tank tops or sleeveless shirts playing basketball. While the Inclusive Images Competition is a worthwhile endeavor, it certainly does not contain the right images to properly train a machine learning algorithm to make higher-order distinctions about types of nudity based on cultural contexts—like what is artistic and what is culturally normalized versus what is censorable for its prurience.

  Another important concern when it comes to the datasets specifically designed to train adult content filters is the consent of the people depicted in the images used to train the algorithms. While ImageNet and Open Images are the only publicly accessible image datasets that the image recognition algorithms at Google are known to employ, it is likely that the company maintains proprietary in-house datasets for this purpose as well. It is industry practice to ignore concerns over consent when collecting image datasets at this scale, and the public adult image datasets used to train deepfake porn algorithms offer an example of the consent issues that arise with these sorts of datasets. After scouring subreddits like r/GeneratedPorn and r/AIGeneratedPorn and interviewing coders working on deepfake pornography, Motherboard found that many of these datasets included not only images used without people’s consent but also porn from producers who have been accused of lying to women and coercing them into having sex on camera. These include images from sites like Girls Do Porn, which stands accused of human trafficking and rape. Perhaps most notably, they include images from Czech Casting, because each Czech Casting video came with a photoset that was extremely appealing to machine learning programmers. As Samantha Cole explains,

  Each video of a woman also comes with a uniform set of photographs. Each set includes a photograph of the woman holding a yellow sign with a number indicating her episode number, like a mugshot board. Each set also includes photographs of the women posing in a series of dressed and undressed shots on a white background: right side, left side, front, back, as well as extreme close ups of the face, individual nipples, and genitalia.134

  The obsession with objectification in the mainstream heteroporn industry makes it a particularly appealing source for adult image datasets; coupled with its sheer abundance and availability online, this likely ensures that mainstream heteroporn is strongly over-represented in such datasets. Again, without stronger empirical evidence it is hard to be certain, but this is a likely explanation for the high incidence of LGBTQIA+ content being unduly filtered by automated content moderation algorithms online, which we’ll see in chapter 3. Having more mainstream heteroporn in the dataset means not only that an algorithm trained on it is better at identifying mainstream heteroporn but also that it is better at distinguishing heterosexual porn from everything else. It is likely less accurate at distinguishing pornography from nonpornography when it comes to LGBTQIA+ content.

  While it is easily imaginable that Google’s public relations department would try to externalize the causality of these biases by laying them at the feet of the social collective whose data they mine or of the digital laborers working through platforms like Amazon’s Mechanical Turk to label the data they train their algorithms on, such deflection does not hold. The meanings established for the dataset’s categories prefigure what data will eventually populate them. Take, for example, the term “closet queen,” one of three child synsets of the synset “homosexual,” “homophile,” “homo,” and “gay” in WordNet. A closet queen is defined as “a negative term for a homosexual man who chooses not to reveal his sexual orientation.”135 In its 2011 dataset—the most easily accessible online—ImageNet had thirty-two images representing the term “closet queen” (see figure 2.6). While in its current instantiation the “closet queen” category is not very threatening, and is perhaps even laughably bad, it is a very good indicator of the potential implications of such a dataset. Anonymous Mechanical Turk laborers are presented with images of human bodies and prompted to apply this derogatory label to those images based on the presumed sexual identity of the people depicted. The architecture of the dataset demands that stereotypes about what constitutes the successful performance of a particular sex, gender, and sexuality become hardwired into the visual dataset. Regardless of which images end up populating the category, the category’s very existence determines the way a computer will see—it will see stereotypically. In the dataset’s terms, for example, two men hugging, especially from behind, becomes a key indicator of closeted homosexuality.
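
  Because WordNet ships with the NLTK library, the structure described above, a labeled synset with its derogatory child categories, can be inspected directly. A minimal sketch follows (the synset identifier may vary slightly across WordNet versions, and the corpus must be downloaded first):

```python
# Inspecting the synset described above via NLTK's WordNet interface.
# Requires: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

syn = wn.synset("homosexual.n.01")   # identifier may vary by version
print(syn.lemma_names())             # lemmas grouped under the synset
for child in syn.hyponyms():         # child synsets, e.g. 'closet_queen'
    print(child.name(), "-", child.definition())
```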

  As Alexander Cho has shown, the “default publicness” of social media platforms can lead to LGBTQIA+ youth being outed by computers, which has tragic consequences and reinforces heteronormativity by encouraging youths with unsupportive families or communities to avoid producing or consuming any online content that might out them.136 This is exacerbated by systems that increasingly data mine not only their sexuality but also the sexual semantics of all the web content they interact with. Beyond this, it is easy to imagine a much more intentional and nefarious future application of such technology: the automation of outing, where people performing machine-readable acts of closeted queerness become automatically identifiable. While some might view this as the imaginary of dystopic science fiction, I would caution against such a quick dismissal. In 2017, Yilun Wang and Michal Kosinski engineered a deep neural network to analyze images of people’s faces and determine their sexual orientation. Their system used publicly available images from a dating site they have refused to name in hopes of slowing copycats.137 Wang and Kosinski’s system was able to accurately distinguish between “gay” and “heterosexual” men in 81 percent of cases, and in 74 percent of cases for women (compared to human success rates of 61 percent and 54 percent, respectively).138 While a number of scholars posted critical responses online to the preprint version of the article, demonstrating the limitations of the system,139 it is hard not to be frightened by the potential capacities of these systems, especially when their visual datasets include contextual data beyond faces (clothes, locations, comportments, other people, and so on), operate at web scale, and incorporate human semantic labeling through Amazon’s Mechanical Turk.

  Figure 2.6

  Images for “closet queen” synset on ImageNet.

  The United States and the United Kingdom, in particular, have a long history of selling technology with few to no strings attached to oppressive regimes around the world, ranging from IBM’s sale of tabulators to support the Third Reich’s “final solution,” as documented by Edwin Black, to more recent sales of metadata-based surveillance technologies by Britain’s Government Communications Headquarters, an intelligence and security organization, to Honduras, Bahrain, Saudi Arabia, China, and Qatar.140 And even if Western governments and companies were to exercise a previously unheard-of self-restraint by refusing to sell computer vision technologies with such capabilities to regimes interested in the automation of outing, ImageNet is publicly available, as are many of the computer science write-ups of computer vision implementations built atop it. Anyone, from domestic neo-Nazi and alt-right groups to oppressive governments abroad, could build such a system themselves were it not available for purchase ready-made, provided ImageNet continues to build out its visual catalogue for terms like “closet queen.” Even if this image data is not used to out people, the counting and classifying of LGBTQIA+ people has a long history of rendering them susceptible to dehumanization and violence.141 This historical LGBTQIA+ precarity is only exacerbated now that private corporations control web-scale data collections and data analytics tools.142

  In both WordNet and ImageNet, as well as in the image recognition algorithms built atop them, like Google’s SafeSearch and Cloud Vision API, we can see the hacker ethic at work. Programmers are focused exclusively on implementing their ideas through the most practical means, largely ignoring the potential social harms these new technologies might cause or assuming that any ill effects can be patched on an ad hoc basis. The datasets that serve as the foundation for the majority of computer vision applications in the world today are riddled with biases, most notably biases about sex, gender, and sexuality. These biases deeply shape how the machine learning algorithms trained on them operate and likely can never be adequately patched after the fact. Biased data will always produce biased results. Without interdisciplinary and diverse dialogue on what unbiased data might look like, and without large-scale investment in implementing less biased datasets, the infrastructure of the internet will continue to reinforce our preexisting prejudices and further marginalize LGBTQIA+ communities. Finally, the most common industry response is that human reviewers are the answer for correcting these biases after the fact. However, as we’ll see in the next section and in chapter 3, these human reviewers put into practice just as much heteronormative bias as the algorithmic systems they are meant to correct.

  The Heteronormativity of Content Review Labor

  Facebook’s “Human Algorithms”

  While few humanities and social sciences scholars have unpacked at length the operations of automated content filters like those discussed above, a number of them have investigated their human counterparts, frequently composed of an underpaid, overburdened, and globalized labor force responsible for censoring broad swaths of the internet.143 I would contend that this latter phenomenon can best be understood in relation to efforts to automate content moderation through machine learning algorithms like natural language processing systems and computer vision or image recognition systems. The way that major tech companies envision and situate this labor, structure and schematize it, and mask it behind confidentiality agreements and compartmentalization strongly reflects these companies’ ideas and practices from designing algorithms. In fact, as we’ll see, companies like Facebook even describe these laborers as “human algorithms.” While the public archive surrounding Google’s Cloud Vision API allowed for unique insight into that company’s automated content moderation practices, we will now turn to Facebook’s human content moderators, because Facebook’s response to criticism in the wake of the 2016 US election led it to open up its content moderation practices to the public in ways that offer the best insight into how these “human algorithms” are at work within the company.

  Facebook only began publishing data on the enforcement of its Community Standards in 2018. In its first report, the company found that between seven and nine of every ten thousand content views were of content that violated its adult nudity and pornography standards.144 In 2019, that number was up to eleven to fourteen views per ten thousand.145 In its latest report, the company notes that since October 2017, between 0.05 and 0.15 percent of all Facebook content contained flagged violations of the adult nudity and sexual activity clauses of the Community Standards. In each quarter since then, the company has censored between twenty and forty million pieces of content. Around 96 percent of all flagged content was caught by Facebook’s automated content moderation system, with the remaining 4 percent flagged by the user community.146 Many of these determinations are considered by the company to be obvious, but the ones that fall into gray areas are kicked up to human reviewers, whose labor has been formalized by the company to such an extent that they are sometimes referred to as “human algorithms.”147

  The labor force performing these reviews of flagged content is largely hired through a California-based outsourcing firm named oDesk, which farms out content moderation labor for both Google and Facebook, largely hiring from call centers. Around 2012, Facebook employed only fifty moderators for the entire platform, largely from Asia, Africa, and Central America. They were paid $1 per hour, plus incentives for reviewing certain amounts of content during their four-hour shifts that could bring their total pay up to $4 an hour—this in the same year Facebook had its initial public offering at a $100 billion valuation.148 In the wake of the 2016 election and Facebook’s numerous scandals, ranging from Russian trolls to Cambridge Analytica, the company was employing 4,500 content moderators.149 By 2018, it was employing 7,500, with plans to increase that number to 15,000.150 While these numbers have been released, the company maintains secrecy about the number and location of its moderating hubs. As the content moderation labor force has grown, training has been streamlined. New contract laborers receive two weeks of training and a set of prescriptive manuals for assessing content. They are also given access to Facebook’s Single Review Tool (SRT), which allows them to act like human algorithms, categorizing content and checking whether it meets the appropriate sections of Facebook’s Community Standards.151
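
  What such schematized checking might look like in code can only be sketched hypothetically, since the manuals and the SRT’s logic are confidential. In the sketch below, the questions and their order are invented for illustration; only the form, a chain of yes-no checks ending in keep, remove, or escalate, tracks the reporting on the tool.

```python
# Purely hypothetical sketch of a moderation judgment reduced to yes/no
# checks; these rules are invented for illustration and are NOT
# Facebook's actual policy or the real SRT logic.
from dataclasses import dataclass

@dataclass
class Post:
    contains_real_nudity: bool
    is_medical_or_breastfeeding: bool
    contains_sexual_solicitation: bool

def review(post: Post) -> str:
    if post.contains_real_nudity:              # yes/no
        if post.is_medical_or_breastfeeding:   # a listed exception?
            return "keep"
        return "remove"
    if post.contains_sexual_solicitation:      # yes/no
        return "remove"
    return "escalate"  # gray area: send up the review chain

print(review(Post(True, True, False)))  # keep
```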

  These manuals and the SRT are created by young engineers and lawyers at the company who work to distill all content moderation into a series of yes-no decisions, thus producing an algorithm that can be run on the outsourced laborers’ bodies and minds. While Facebook claims that there are no time constraints on these laborers, inside information indicates that moderators have eight to ten seconds to review each piece of content (longer for videos) and that they have targets of around a thousand pieces of reviewed content per workday. The materials that have been released have all been in English, requiring laborers not fluent in English to use Google Translate throughout their daily work, which increases the difficulty of accurately moderating content.152 It is worth noting as well that Facebook currently does not have enough training data prepared for its automated content-flagging systems to be very accurate in languages other than English and Portuguese. Despite these linguistic difficulties, moderators are collectively required to review over ten million pieces of content per week and are expected to review every piece of flagged content on the platform within twenty-four hours. The company aims for a benchmark error rate of less than 1 percent, which means that tens of thousands of moderation errors are still made each day by the platform’s human algorithms.153 As Max Fisher notes, “[M]oderators, at times relying on Google Translate, have mere seconds to recall countless rules and apply them to the hundreds of posts that dash across their screens each day.”154
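
  A quick back-of-envelope check, using only the figures reported above, shows where the estimate of tens of thousands of daily errors comes from:

```python
# Back-of-envelope check using the figures reported above.
reviews_per_week = 10_000_000   # "over ten million pieces ... per week"
error_rate = 0.01               # the ~1 percent benchmark error rate
errors_per_day = reviews_per_week / 7 * error_rate
# ~14,286 erroneous calls per day at minimum volume, and more as the
# weekly total exceeds ten million.
print(f"{errors_per_day:,.0f} erroneous calls per day")
```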

 
