Precision, Recall, and Inclusivity

[This is a guest post by Talia’s girlfriend Annie, who is maintaining this blog while Talia is away at Middlebury Language Schools]

In my last post on here I mentioned that my job involves writing programs that identify important information in text documents. When testing these programs, there are two main ways of evaluating the accuracy of their output: precision and recall. Put simply, precision is the fraction of the retrieved information that is relevant to what we’re looking for, while recall is the fraction of the relevant information in the original text that the program actually retrieves. Ideally, one would want a program to have both high precision and high recall – that is, for it to return most or all of the information the user is looking for and little or no irrelevant information – but this isn’t always possible.

More realistically, you’ll often face a tradeoff between precision and recall. You can optimize for precision, and make sure that everything in your output is the sort of thing the user is looking for, but then you run the risk of overlooking other information that might be slightly less obviously relevant, but still within the scope of the user’s query. You can optimize for recall, and make sure to return every piece of information that could possibly be relevant, but then there’s the chance you’ll also turn up a lot of junk data along with the useful stuff. Or you can try to strike a balance between the two, getting each one high enough to be useful without sacrificing the other.
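
If a concrete example helps, here’s a minimal sketch of how these two metrics are computed. The snippet IDs are made up for illustration – this isn’t any program I actually work on:

```python
# A minimal sketch of computing precision and recall over sets of item IDs.
# The IDs below are invented for illustration.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) given sets of retrieved and relevant items."""
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

# The program returned four snippets; three of them were relevant, and the
# document contained six relevant snippets in total.
retrieved = {"s1", "s2", "s3", "s9"}
relevant = {"s1", "s2", "s3", "s4", "s5", "s6"}

print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```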

Which of these evaluation metrics is most important depends on what task a program is intended to accomplish, and what the end user’s particular needs are. For instance, suppose you’re writing a program to automatically filter out offensive content on social media. (Some of my classmates and I wrote a program like this for a class project once, although it didn’t end up working very well.) If a social media company is going to be using your program as the first line of moderation for everything that gets posted on their platform, you likely don’t want it to become a heavy-handed censor. In this case you’ll probably want to err on the side of optimizing for precision, filtering out only the things that are well and truly beyond the pale and letting human moderators make the call on the edge cases.

On the other hand, if the program is intended as an optional add-on that users of a social media platform can choose to enable, you’ll probably want to err on the side of optimizing for recall. The people likeliest to use such an add-on tend to have a very strong need to filter out certain content – say, people with anxiety or trauma who are trying to avoid seeing content that is triggering for them – so you want to make sure all of that content gets filtered, even if that means blocking some harmless stuff as well. High precision isn’t inherently better than high recall, or vice versa; it all depends on the specific goal you’re trying to achieve.
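
To make that tradeoff concrete, here’s a toy sketch of how moving a single confidence threshold trades precision against recall. The classifier scores and labels are invented for illustration:

```python
# A toy illustration of the precision/recall tradeoff for a content filter.
# Each post gets a score from a hypothetical classifier; we flag everything
# at or above a cutoff. Scores and labels are invented for illustration.

posts = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.55, False), (0.45, True), (0.30, False), (0.10, False),
]  # (classifier score, actually offensive?)

def evaluate(threshold):
    flagged = [offensive for score, offensive in posts if score >= threshold]
    true_positives = sum(flagged)
    total_offensive = sum(offensive for _, offensive in posts)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_offensive
    return precision, recall

print(evaluate(0.85))  # strict cutoff: precision 1.0, recall 0.5
print(evaluate(0.40))  # loose cutoff: precision ~0.67, recall 1.0
```

In the scenarios above, the company-wide moderation filter would pick something like the strict cutoff, while the opt-in filter would pick the loose one.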

I’m now going to completely switch gears for a minute and move from this relatively dry, technical subject to one with much more emotional heft: inclusivity in the LGBTQ+ community. (I’m focusing on this community because I’m a part of it, but I doubt it’s the only community this discussion will be relevant for.) If you’ve spent any time discussing LGBTQ+ politics – especially on the internet, where political discourse is a full-contact sport – you’re no doubt aware of the frequent heated debates about where exactly to draw the boundaries of this community. For instance, I once saw a post on Facebook proposing the acronym SAGA (Sexuality And Gender Acceptance) as an alternative term to LGBTQ+ that could include everyone without making the acronym cumbersomely long, and one of the comments was arguing against this idea, pointing out that such imprecise wording could allow kinky straight people to elbow their way into the community. Or on a more serious note, back in June when Pride was going on, I saw rather a lot of posts on Tumblr arguing about whether asexuals should be included in Pride, with several people arguing that asexuals hadn’t experienced the same oppression that gay, lesbian, bi, and trans people had, and so Pride wasn’t for them.

When I come across arguments like this, all I can think is: this is clearly a situation in which recall matters a whole lot more than precision. Remember how I said that whether it’s better to optimize for recall or precision depends on your particular goal? Well, what is the goal of building an LGBTQ+ community? There are obviously many goals, but as far as I’m concerned, the primary goal is to provide a space for people who are marginalized by the heteronormativity and cissexism of mainstream culture, where they can be safe and free to live their authentic lives, and where they can find the support and solidarity they need in order to overcome the obstacles that the rest of society has placed in their way.

Personally, I care much more about making sure that everyone who needs such a space has access to it than I do about keeping out people who don’t meet some set of entry requirements. Sure, there might be a few straight kinksters who will see inclusive language around sexuality and assume it’s talking about them, but there are also countless queer kids who are just figuring out their sexuality or gender and aren’t sure what label – if any – to claim. Don’t we want them to know they have a place in our community, even if they don’t have the precise words for what they are yet?

For every asexual person who doesn’t face much discrimination for their orientation, there’s another one who’s facing ostracism from their family for rejecting their “sacred duty” of getting married and having children. Don’t they need our love and support too? By focusing on precision instead of recall – that is, focusing on who we need to exclude instead of who we need to include – we run the risk of pushing away some of the people who our community could do the most good for.

In other words, when the machine revolution comes and our new robot overlords implement Fully Automated Luxury Gay Space Communism, I know what sort of information retrieval algorithms I want them to be running.

To read more of Annie’s content, check out her blog at http://www.escape-velocities.com/ or the guest posts page for a list of posts on word-for-sense made by people who aren’t Talia.

The Ghost in the Machine in the Chinese Room

[This is a guest post by Talia’s girlfriend Annie, who is maintaining this blog while Talia is away at Middlebury Language Schools. Also, I’m sorry this post is a couple days late; I’ve been really busy this week.]

The “Chinese room” is a famous thought experiment in philosophy of mind that argues that, no matter how well the output of a computer program imitates the output of human thought processes, a computer can never attain true consciousness or understanding. The argument was first articulated by philosopher John Searle in his 1980 paper “Minds, Brains, and Programs,” and runs as follows:

Suppose that I'm locked in a room and given a large batch of Chinese writing. Suppose furthermore (as is indeed the case) that I know no Chinese, either written or spoken, and that I'm not even confident that I could recognize Chinese writing as Chinese writing distinct from, say, Japanese writing or meaningless squiggles. To me, Chinese writing is just so many meaningless squiggles. Now suppose further that after this first batch of Chinese writing I am given a second batch of Chinese script together with a set of rules for correlating the second batch with the first batch. The rules are in English, and I understand these rules as well as any other native speaker of English. They enable me to correlate one set of formal symbols with another set of formal symbols, and all that "formal" means here is that I can identify the symbols entirely by their shapes. Now suppose also that I am given a third batch of Chinese symbols together with some instructions, again in English, that enable me to correlate elements of this third batch with the first two batches, and these rules instruct me how to give back certain Chinese symbols with certain sorts of shapes in response to certain sorts of shapes given me in the third batch. Unknown to me, the people who are giving me all of these symbols call the first batch a "script," they call the second batch a "story," and they call the third batch "questions." Furthermore, they call the symbols I give them back in response to the third batch "answers to the questions," and the set of rules in English that they gave me, they call "the program." Now just to complicate the story a little, imagine that these people also give me stories in English, which I understand, and they then ask me questions in English about these stories, and I give them back answers in English. Suppose also that after a while, I get so good at following the instructions for manipulating the Chinese symbols and the programmers get so good at writing the programs that from the external point of view – that is, from the point of view of somebody outside the room in which I am locked – my answers to the questions are absolutely indistinguishable from those of native Chinese speakers. Nobody just looking at my answers can tell that I don't speak a word of Chinese. Let us also suppose that my answers to the English questions are, as they no doubt would be, indistinguishable from those of other native English speakers, for the simple reason that I am a native English speaker. From the external point of view – from the point of view of someone reading my "answers" – answers to the Chinese questions and English questions are equally good. But in the Chinese case, unlike the English case, I produce the answers by manipulating uninterpreted formal symbols. As far as the Chinese is concerned, I simply behave like a computer; I perform computational operations on formally specified elements. For the purposes of the Chinese, I am simply an instantiation of the computer program.

Searle argues that, just as his ability to manipulate symbols to produce what looks like fluent Chinese would not mean he understands Chinese, neither would a computer’s ability to do the same mean that it understands Chinese (or whichever language it is imitating).

Now, I’m not going to mince words here: I think this argument is completely wrong. But it’s wrong in a way that’s worth considering at length, because I think it illuminates a common error in how many people think about computers – and, for that matter, about minds.

In identifying himself with the computer in this thought experiment, Searle is implicitly treating the computer, and thus the “mind” of a hypothetical artificial intelligence, as isolable from the programs that are given to it. In Searle’s formulation, programming a computer is analogous to giving a set of instructions to a human who then blindly carries them out. The human’s mind processes the instructions and moves the body so as to carry them out, but doesn’t gain deeper understanding from them.

But I think Searle is drawing his conceptual boundaries in the wrong place. By placing a human in the Chinese room as the agent carrying out the instructions, Searle has biased himself and his audience in favor of his interpretation. We all know that humans have conscious, thinking minds, so we naturally assume that the human is the only thing in the Chinese room that could be doing any thinking. But is this necessarily the case? I would argue that no, it’s not.

Searle’s main error, in my view, is his identification of the human in the Chinese room with the computer. The human is actually playing a role more akin to that of a processor – the part of the computer’s physical hardware that translates the instructions in a program into actions. Treating the processor as if it were the entire computer completely overlooks the role played by the information in the programs themselves. Treating the processor as the location of an artificial intelligence’s “mind”, with no reference to the information being processed, is like looking for the human mind in the physical architecture of individual neurons without paying any attention to the electrochemical state of those neurons and the information encoded by that state. In the analogy between mechanical minds and human minds, programming a computer isn’t like handing a human a list of instructions – it’s more like giving them a psychiatric drug that directly modifies the functioning of their brain.

In the Chinese room, the human is embedded in a larger system that includes the rule books the human is using to process the Chinese writing. The human may not understand Chinese, but I would argue that this larger system does. If this system contains all the information necessary to recognize a semantically meaningful input and produce an equally meaningful output in response, and to do so with all the robustness and fluidity of a native human speaker of the language, I would be entirely comfortable saying that this system understands the language. If we are assuming that anything capable of understanding a language must qualify as a mind, then the Chinese room represents one mind embedded inside another one.

If this argument is hard for you to intuitively grasp, it might help to think about what the physical architecture of the Chinese room would have to look like. At my current job, I work on programs that process language, and the code for these programs is really long. I only work on a small portion of it, but I’d guess that if the whole thing were printed out, it could fill up a few books. And what these programs are capable of is not even close to what the hypothetical program in Searle’s thought experiment would have to be capable of. The programs I work on take a piece of text as input and identify key words and phrases that relate to a particular domain of interest. Most of them stop there, but the most loquacious among them will spit out one of a few pre-written phrases to prompt the user for more input. That’s a far cry from producing fluent speech that’s indistinguishable from that of a native speaker. A program that could do that would have to be vastly longer and more complicated than any of the programs I’m familiar with.* And in the Chinese room experiment, we’re not even talking about a digital representation of these programs, but an analog one, written out in English on physical sheets of paper. Theoretical computer scientist Scott Aaronson, in his book Quantum Computing Since Democritus, describes what this might look like:

The third thing that annoys me about the Chinese Room argument is the way it gets so much mileage from a possibly misleading choice of imagery, or, one might say, by trying to sidestep the entire issue of computational complexity purely through clever framing. We’re invited to imagine someone pushing around slips of paper with zero understanding or insight, much like the doofus freshmen who write (a + b)^2 = a^2 + b^2 on their math tests. But how many slips of paper are we talking about? How big would the rule book have to be, and how quickly would you have to consult it, to carry out an intelligent Chinese conversation in anything resembling real time? If each page of the rule book corresponded to one neuron of a native speaker’s brain, then probably we’d be talking about a “rule book” at least the size of the Earth, its pages searchable by a swarm of robots traveling at close to the speed of light. When you put it that way, maybe it’s not so hard to imagine this enormous Chinese-speaking entity that we’ve brought into being might have something we’d be prepared to call understanding or insight.

Much more complicated than a guy in a room shuffling papers around, no?
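
If you’re curious what the much simpler keyword-and-phrase programs I described above might look like, here’s a deliberately tiny sketch in Python. Everything in it – the domain terms, the canned prompt – is invented for illustration, and a real system would be enormously longer:

```python
# A deliberately tiny, entirely hypothetical sketch of the kind of
# keyword-spotting program described above. The domain terms and the
# canned follow-up prompt are invented for illustration.

DOMAIN_TERMS = {"ice", "snow", "blizzard", "frostbite"}  # pretend domain: winter hazards

CANNED_PROMPTS = {
    "frostbite": "Can you describe the symptoms in more detail?",
}

def extract(text):
    """Return the domain terms found in the text, plus an optional
    pre-written phrase prompting the user for more input."""
    words = {word.strip(".,!?;:").lower() for word in text.split()}
    hits = words & DOMAIN_TERMS
    prompt = next((CANNED_PROMPTS[w] for w in hits if w in CANNED_PROMPTS), None)
    return hits, prompt

print(extract("The blizzard left deep snow, and one hiker got frostbite."))
# ({'blizzard', 'snow', 'frostbite'}, 'Can you describe the symptoms in more detail?')
```

The gap between that and a rule book that can carry on a fluent conversation in real time is exactly the gap Aaronson is pointing at.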

To read more of Annie’s content, check out her blog at http://www.escape-velocities.com/ or the guest posts page for a list of posts on word-for-sense made by people who aren’t Talia.

Citations:

Searle, John. “Minds, Brains, and Programs.” Behavioral and Brain Sciences, vol. 3, no. 3, Cambridge University Press, 1980, pp. 417–424.

Aaronson, Scott. Quantum Computing Since Democritus. Cambridge University Press, 2013.

*Chatbots that can produce fairly naturalistic output do exist (e.g., Siri or Cortana), and they are indeed more complex than the programs I work on, but even they haven’t achieved the level of fluency described in Searle’s experiment. The most advanced one I’m aware of is Microsoft’s Tay, and at its most fluent it produced a fairly convincing impression of an internet troll. Whether this can be considered human-level linguistic proficiency is, shall we say, open to interpretation.

Why Names Have Power

[This is a guest post by Talia’s girlfriend Annie, who is managing this blog while Talia is away at Middlebury Language Schools.]

In his memoir, physicist Richard Feynman shares an anecdote about going birdwatching with his father as a child:

One kid says to me, “See that bird? What kind of bird is that?”

I said, “I haven’t the slightest idea what kind of a bird it is.”

He says, “It’s a brown-throated thrush. Your father doesn’t teach you anything!”

But it was the opposite. He had already taught me: “See that bird?” he says. “It’s a Spencer’s warbler.” (I knew he didn’t know the real name.)... “You can know the name of that bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing about the bird. You’ll only know about humans in different places, and what they call the bird. So let’s look at the bird and see what it’s doing – that’s what counts.” (I learned very early the difference between knowing the name of something and knowing something.)

I’ve always loved this passage. A name feels like knowledge, and it’s easy to fool ourselves into thinking we understand something just because we know what it’s called. But a name alone doesn’t give us any information that we can do anything with. For instance, when I was a little kid, I loved drawing robots, and I always made sure to show that my robots had an anode and a cathode. Six-year-old Annie fancied herself quite scientifically knowledgeable for knowing what an anode and a cathode were. But I don’t think she could have told you what anodes and cathodes actually do, or why a robot would need them.

There is, however, a flip side to this. As any philosopher worth their salt will tell us, the map is not the territory. But you can’t fold the territory up and put it in your pocket. The external world doesn’t come to us neatly packaged for human consumption; it comes in the form of a panoply of complicated information from our senses with no clear markers of what is significant. To give something a name is to draw a line around a little bit of this information soup and say “Look, this is important! Pay attention to this!” Once we’ve picked something out like this, we can distinguish its unique properties and communicate them to other people. It’s gone from being part of the background noise of reality to being something we can consciously interact with, and potentially make use of. Naming our world gives us power over our world.

This is a concept that linguists run into all the time when we study languages other than our own. Different people and cultures treat different facets of the world as significant, so different languages end up drawing these lines differently. In his book The Last Speakers, linguist K. David Harrison writes:

I began to think of language as existing not only in the head, or perhaps not entirely in the heads of speakers, but in local landscapes, objects, and lifeways. Languages animate objects by giving them names, making them noticeable when we might not otherwise be aware of them. Tuvan has a word iy (pronounced like the letter e), which indicates the short side of a hill. I had never noticed that hills had a short side. But once I learned the word, I began to study the contours of hills, trying to identify the iy. It turns out that hills are asymmetrical, never perfectly conical, and indeed one of their sides tends to be steeper and shorter than the others. If you are riding a horse, carrying firewood, or herding goats on foot, this is a highly salient concept. You never want to mount a hill from the iy side, as it takes more energy to ascend, and an iy descent is more treacherous, as well. Once you know about the iy, you see it in every hill and identify it automatically, directing your horse, sheep, or footsteps accordingly. This is a perfect example of how language adapts to local environment, by packaging knowledge into ecologically relevant bits. Once you know that there is an iy, you don’t really have to be told to notice it or to avoid it. You just do. The language has taught you useful information in a covert fashion, without explicit instruction.

The Tuvan language has taken this environmental knowledge – which wouldn’t necessarily be obvious on its own – and encapsulated it into an easily accessible form.

Or, take the debate over indigenous languages of the Arctic and their words for snow. (Please!) It’s a common misconception that these languages have an unusually large number of words for snow and ice – depending on how one counts it, they don’t necessarily have many more than English. But overzealous corrections of this misconception can overlook the way these languages’ words for snow are in fact noteworthy. Harrison gives the following examples of words for ice in the Yupik language, quoted from the book Watching Ice and Weather Our Way, a compendium of traditional Yupik environmental knowledge:

Qenu: Newly forming slush ice. It forms when it first gets cold.

Pequ: Ice that was bubbled up by pressure ridging. [The] bulb cracks and falls down, and when it breaks, the water shows up. It is then covered by new ice or snow and it is very dangerous to walk on...

Nutemaq: Old ice floes that are thick and appear to have had a snow bank on them for a long period of time. Good to work on.

Nuyileq: Crushed ice beginning to spread out; dangerous to walk on. The ice is dissolving, but still has not dispersed in water, although it is vulnerable for one to fall through and sink. Sometimes seals can even surface on this ice because the water is starting to appear.

To a linguist, what’s interesting about these words isn’t their number per se, but the fact that they encode knowledge in a way that’s particularly useful for surviving in the Arctic. Pequ, nutemaq, and nuyileq are all “ice,” but only nutemaq is safe to walk on. An English speaker would need to give a detailed description to convey what a Yupik speaker can express in a single word.

Finally, let’s return to Feynman and the birds:

[My father] said, “For example, look: the bird pecks at its feathers all the time. See it walking around, pecking at its feathers?”

“Yeah.”

He says, “Why do you think birds peck at their feathers?”

I said, “Well, maybe they mess up their feathers when they fly, so they’re pecking them in order to straighten them out.”

“All right,” he says. “If that were the case, then they would peck a lot just after they’ve been flying. Then, after they’ve been on the ground a while, they wouldn’t peck so much any more – you know what I mean?”

“Yeah.”

He says, “Let’s look and see if they peck more just after they land.”

It wasn’t hard to tell: there was not much difference between the birds that had been walking around a bit and those that had just landed. So I said, “I give up. Why does a bird peck at its feathers?”

“Because there are lice bothering it,” he says. “The lice eat flakes of protein that come off its feathers.”

He continued, “Each louse has some waxy stuff on its legs, and little mites eat that. The mites don’t digest it perfectly, so they emit from their rear ends a sugar-like material, in which bacteria grow.”

Finally he says, “So you see, everywhere there’s a source of food, there’s some form of life that finds it.”

So here’s an exercise for you to try: next time you go outside, look around and take in the sea of sensory data you’re swimming in. Find some little piece of it that interests you – a bird, maybe, or a plant, or an insect, or anything that catches your eye. Do what Feynman and his father did, and observe it in as much detail as you can. Notice what it does, and try to figure out why. But then go one step further and give it a name. You don’t have to know the “real” word for it; just call it something that will stick in your memory. The next time you see it, your new word will be waiting in your mind, and all your observations will be there with it. By naming that part of reality, you’ll have made it a part of you.

Citations:

Feynman, Richard P. Classic Feynman. W.W. Norton & Company, 2006.

Harrison, K. David. The Last Speakers. National Geographic Society, 2010.