Machine learning and AI algorithms are increasingly trusted to help with, or to make, extremely important decisions, and so we'd like to make sure that, as much as possible, they're free of undesirable forms of bias, such as gender bias, ethnicity bias, and so on. What I want to do in this video is show you some ideas for diminishing or eliminating these forms of bias in word embeddings. When I use the term bias in this video, I don't mean bias in the bias-variance sense. Instead, I mean gender, ethnicity, or sexual orientation bias. That's a different sense of bias than is typically used in technical discussions of machine learning.

To motivate the problem, recall that word embeddings can learn analogies, like man is to woman as king is to queen. But what if you ask it, man is to computer programmer as woman is to what? The authors of this paper, Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai, found a somewhat horrifying result: a learned word embedding might output man is to computer programmer as woman is to homemaker. That just seems wrong, and it reinforces a very unhealthy gender stereotype. It would be much more preferable to have the algorithm output man is to computer programmer as woman is to computer programmer. They also found that for father is to doctor as mother is to what, the really unfortunate result is that some learned word embeddings will output mother is to nurse.

So word embeddings can reflect the gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model. One form of bias I'm especially passionate about is bias relating to socioeconomic status. I think that every person, whether you come from a wealthy family or a low-income family or anywhere in between, should have great opportunities. And because machine learning algorithms are being used to make very important decisions, they're influencing everything from college admissions, to the way people find jobs, to loan applications and whether your application for a loan gets approved, to the criminal justice system, even sentencing guidelines. Learning algorithms are making very important decisions, and so I think it's important that we try to change learning algorithms to diminish as much as possible, or ideally eliminate, these types of undesirable biases.

Now, in the case of word embeddings, they can pick up the biases of the text used to train the model, and so the biases they pick up will tend to reflect the biases in text as it is written by people. Over many decades, over many centuries, I think humanity has made progress in reducing these types of bias. And maybe fortunately for AI, I think we actually have better ideas for quickly reducing the bias in AI than for quickly reducing the bias in the human race. Although we're by no means done for AI either, and there's still a lot of research and hard work to be done to reduce these types of biases in our learning algorithms. What I want to do in this video is share with you one example of a set of ideas, due to the paper referenced at the bottom by Bolukbasi et al., for reducing the bias in word embeddings.

So here's the idea. Let's say that we've already learned a word embedding: the word babysitter is here, the word doctor is here, we have grandmother here and grandfather here, maybe the word girl is embedded there, the word boy is embedded there, and maybe she is embedded here.
And he is embedded there. So the first thing we're going to do is identify the direction corresponding to the particular bias we want to reduce or eliminate. For illustration, I'm going to focus on gender bias, but these ideas are applicable to all of the other types of bias that I mentioned on the previous slide as well.

So how do you identify the direction corresponding to the bias? For the case of gender, what you can do is take the embedding vector for he and subtract the embedding vector for she, because they differ by gender; take e_male and subtract e_female; and take a few of these differences and basically average them. This will allow you to figure out that, in this case, what looks like this direction is the gender direction, or the bias direction, whereas this other direction is unrelated to the particular bias we're trying to address, so that is the non-bias direction. In this case, think of the bias direction as a 1D subspace, whereas the non-bias direction would be a 299-dimensional subspace (assuming 300-dimensional word embeddings). I've simplified the description a little bit: in the original paper, the bias direction can be more than one-dimensional, and rather than taking an average as I've described it here, it's actually found using a more complicated algorithm called SVD, singular value decomposition, which is closely related to principal components analysis, if you're familiar with that algorithm.

After that, the next step is a neutralization step: for every word that's not definitional, project it to get rid of bias. There are some words that intrinsically capture gender, words like grandmother, grandfather, girl, boy, she, he, where gender is intrinsic to the definition, whereas there are other words, like doctor and babysitter, that we want to be gender neutral. And in the more general case, you might want words like doctor and babysitter to be ethnicity neutral or sexual orientation neutral and so on, but we'll just use gender as the illustrating example here. So for every word that is not definitional, which basically means not words like grandmother and grandfather, which really have a legitimate gender component because, by definition, grandmothers are female and grandfathers are male, but for words like doctor and babysitter, let's just project them onto the non-bias axis to eliminate their component in the bias direction, that is, to zero out their component in this horizontal direction. So that's the second step, neutralization.

And then the final step is called equalization, in which you might have pairs of words, such as grandmother and grandfather or girl and boy, where you want the only difference in their embeddings to be the gender. So why do you want that? Well, in this example, the distance, or the similarity, between babysitter and grandmother is actually smaller than the distance between babysitter and grandfather, and this maybe reinforces an unhealthy, or undesirable, bias that grandmothers end up babysitting more than grandfathers do. So in the final equalization step, what we would like to do is make sure that words like grandmother and grandfather are both exactly the same similarity, or exactly the same distance, from words that should be gender neutral, such as babysitter or doctor. There are a few linear algebra steps for that.
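To make the first two steps concrete, here's a minimal sketch in Python with NumPy. It assumes the embeddings are stored in a dictionary E mapping each word to a 300-dimensional NumPy vector; the dictionary name, the word pairs, and the function names are just for illustration, and the averaging of differences is the simplified version I described, not the paper's SVD-based approach.

```python
import numpy as np

def bias_direction(E, pairs=(("he", "she"), ("male", "female"), ("man", "woman"))):
    """Approximate the gender (bias) direction by averaging differences of
    definitional pairs, e.g. e_he - e_she. (The paper uses SVD over these
    differences; the plain average is the simplification described above.)"""
    diffs = [E[a] - E[b] for a, b in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)            # unit vector spanning the 1D bias subspace

def neutralize(e_word, g_hat):
    """Project a non-definitional word (e.g. doctor, babysitter) onto the
    non-bias subspace by removing its component along the bias direction."""
    e_bias = np.dot(e_word, g_hat) * g_hat  # component along the bias direction
    return e_word - e_bias                  # remainder lies in the 299-D non-bias subspace
```

For example, with g_hat = bias_direction(E), the value np.dot(neutralize(E["doctor"], g_hat), g_hat) should come out essentially zero, meaning doctor no longer leans toward either end of the he–she axis.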
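And here is a similarly simplified sketch of the equalization step, continuing the code above. The paper and the programming exercise use somewhat different formulas that also keep the embeddings normalized, so treat this purely as an illustration of the idea: give both words of a definitional pair the same component in the non-bias subspace and equal-magnitude components along the bias direction.

```python
import numpy as np

def equalize(e_w1, e_w2, g_hat):
    """Make a definitional pair (e.g. grandmother / grandfather) differ only along
    the bias direction, so both end up equidistant from any neutralized word."""
    b1 = np.dot(e_w1, g_hat)   # signed component of each word along the bias axis
    b2 = np.dot(e_w2, g_hat)
    # Shared part orthogonal to the bias axis: average the two off-axis components.
    mu_orth = ((e_w1 - b1 * g_hat) + (e_w2 - b2 * g_hat)) / 2.0
    # Give both words bias components of equal magnitude, keeping their original signs.
    b = (abs(b1) + abs(b2)) / 2.0
    return mu_orth + np.sign(b1) * b * g_hat, mu_orth + np.sign(b2) * b * g_hat
```

Because the two outputs share exactly the same component in the non-bias subspace and have equal-magnitude components along the bias axis, a neutralized word such as babysitter, which has no component along that axis, ends up exactly the same distance from both of them.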
What these steps basically do is move grandmother and grandfather to a pair of points that are equally distant from this axis in the middle, and the effect is that the distance from babysitter to each of these two words is now exactly the same. In general, there are many pairs of words like this, grandmother/grandfather, boy/girl, sorority/fraternity, girlhood/boyhood, sister/brother, niece/nephew, daughter/son, that you might want to pass through this equalization step.

The final detail is how you decide which words to neutralize. For example, the word doctor seems like a word you should neutralize, to make it non-gender-specific or non-ethnicity-specific, whereas the words grandmother and grandfather should not be made non-gender-specific. And there are also words like beard, where it's just a statistical fact that men are much more likely to have beards than women, so maybe beard should be closer to male than to female. So what the authors did is train a classifier to try to figure out which words are definitional, that is, which words should be gender specific and which should not be. It turns out that most words in the English language are not definitional, meaning that gender is not part of their definition, and there's a relatively small subset of words, like grandmother, grandfather, girl, boy, sorority, fraternity, and so on, that should not be neutralized. So a linear classifier can tell you which words to pass through the neutralization step to project out the bias direction, that is, to project them onto this essentially 299-dimensional subspace. And then finally, the number of pairs you want to equalize is actually also relatively small, and at least for the gender example, it is quite feasible to hand-pick most of the pairs you want to equalize.

So the full algorithm is a bit more complicated than I presented here; you can take a look at the paper for the full details, and you also get to play with a few of these ideas in the programming exercises as well.

To summarize, I think that reducing or eliminating bias in our learning algorithms is a very important problem, because these algorithms are being asked to help with, or to make, more and more important decisions in society. In this video, I shared just one set of ideas for how to go about trying to address this problem, but this is still very much an ongoing area of active research by many researchers.

So that's it for this week's videos. Best of luck with this week's programming exercises, and I look forward to seeing you next week.