The GenAI Revolution, Ep 1: What is GenAI and Why is it Biased?

In This Episode

You've probably noticed how discussions about AI, particularly Generative AI, have been everywhere since the start of 2023. It's called a disruptive technology since it's rapidly and thoroughly changing how humans operate in every digital space, which of course has implications beyond just the digital world. 

This first episode of our three-part miniseries, "The GenAI Revolution: Navigating the Potential & Peril for Howard University & Our Communities," hosted by Dr. Kweli Zukeri, delves into the rapid rise and pervasive influence of Generative AI. Perspectives on AI's influence vary: some see it as a boon to creativity and productivity, while others fear job displacement and intellectual exploitation.

Join us to hear from Howard University, HBCU, and tech leaders in discussion about how AI works and why it’s biased, as well as the root causes of these biases, which stem from the data and algorithms that power it. We dig into how AI biases can shape our world and what we can do to challenge and change them.

From HU2U is a production of Howard University and is produced by University FM.


Guest Host: Dr. Kweli Zukeri, Alumnus; AVP of Web Innovation & Strategy, Howard University

Featured Interviews:

  • Sam Altman, Co-Founder & CEO, OpenAI
  • Anna Makanju, Vice President for Global Impact, OpenAI
  • Coretta Martin, Alumna; Founder of IEP&Me
  • Antonio McMichael, Alumnus; Principal Software Engineer, Microsoft
  • Dr. Safiya Umoja Noble, Author of Algorithms of Oppression; David O. Sears Presidential Endowed Chair of Social Sciences at UCLA; Professor of Gender Studies, Information Studies, African American Studies
  • Dr. Amy Quarkume, Associate Professor of Africana Studies; Director of Graduate Studies, Center for Applied Data Science and Analytics 
  • Simone Tyler, Alumna; SVP of Revenue, Afrotech at Blavity
  • Dr. Gloria Washington, Associate Professor of Computer Science; Principal Investigator of the Elevate Black Voices Project
  • Gabriella Waters, Director of Operations, Center for Equitable AI & Machine Learning, Morgan State University

Listen on all major podcast platforms

Episode Transcript

Publishing Date: September 10, 2024

[00:00:00] Intro: You've probably noticed how discussions about AI, and in particular generative AI, have been all over the place since the start of 2023. Gen AI is a disruptive technology since it's rapidly and thoroughly changing how humans operate in every digital space, which is deeply consequential for all of us. It's making an impact in every professional field, both organically and from top-down system integrations. For students and educators, it's just begun to play a major role in learning and teaching.

Depending on your perspective and understanding, you may see this as a positive phenomenon, the latest advancement in technological innovation that is exponentially boosting humanity's creative and productive potential, and therefore you are wholeheartedly embracing it. Or, you may see it as an impending doom that's poised to take your job, exploit your intellect, erode your critical thinking ability, and lock you into biased treatment, and therefore you may be fighting its adoption tooth and nail. Or, you may be somewhere in between. You may be following the latest developments every day, or maybe you're somewhat aloof to its increasing prevalence.

Whatever the case, you won't be able to ignore it for much longer, even if you try. Let's dig into it.

Welcome to HU2U, the podcast where we bring today's important topics and stories from Howard University to you. I'm Dr. Kweli Zukeri, a Howard alum, currently serving as the assistant vice president of web strategy and innovation at Howard, and I'll be your host for this three-part HU2U miniseries that will dive into how generative AI is affecting the globe, with a focus on our university community.

The purpose of the miniseries is to help our community understand this revolution and for our members to become more able to position themselves to benefit from it, as well as hopefully inspire us to further shape it to be less harmful to humanity. There is so much to explore within this topic, so we'll just be scratching the surface. We'll examine the benefits of utilizing Gen AI to support work and business efforts, as some Howard alum entrepreneurs are doing, as well as discuss some of the major AI issues you've heard about, like, why it contains bias, how that impacts us, and some potential solutions university researchers are working on right now.

To tell this story, I draw from my interviews that took place during the last several months with faculty, alums, students, and researchers from Howard and other HBCUs, as well as some of the tech industry pioneers and leaders that are directly shaping this revolution.

Since we'll be spending some time together, I'll tell you a little bit about me. I would describe myself professionally as an enterprise and UX strategist, technologist, and psychologist. I know, it's a lot of ists. More importantly, what I do professionally and otherwise is driven by deep-seated inclinations to uphold justice for people of African descent and all humans. My internal compass is the meaning of my name, truth, recognizing that it's multi-layered, multi-dimensional, often ephemeral, and it's always a necessary step on a liberatory path.

I'm an alum of UNC at Chapel Hill, as well as Howard's Department of Psychology. Now, why is this subject so important to me? We are in what has been labeled the fourth industrial revolution, which is marked by the onset of web 3.0, the metaverse, AI, supercomputing, and quantum computing. As with all seismic technological and societal shifts, there's great opportunity for some and increased exploitation of those already living in the margins. While digital technology has the ability to level the playing field in many ways, it's still largely dominated by behemoth corporations, and therefore it exacerbates existing social inequities.

I want Black and brown people, who are largely locked out of capitalizing on the potential of technological advancements, to both be able to take advantage of this moment and also help shape what happens to produce more equitable and sustainable results. To see to it that new technologies are used truly for the globe's benefit and not just to serve the interests of the wealthiest 1% and the people of wealthy nations exclusively. Also, I'm sort of a nerd, so this is really just me nerding out with y'all. So, thanks, and I hope you enjoy this miniseries.

In this first episode, we'll establish a basic understanding of artificial intelligence, or AI, as well as what generative AI, or Gen AI, as I'll refer to it, is. We'll explore some of the potential benefits and touch on a few of the larger issues that it has brought about, several of which we'll expand upon during the next two episodes.

Then, we'll spend some time exploring bias within Gen AI and Gen AI development, as well as the types of efforts underway to mitigate bias and make Gen AI more equitable.

So, what really is artificial intelligence? I asked Gabriella Waters, director of operations for Morgan State University's Center for Equitable AI & Machine Learning, to give a high-level overview.

[00:05:55] Gabriella: I would try to help people contextualize the fact that the AI emergence kind of happened maybe 30 years ago. In terms of how integrated it is into all of our lives and systems and all of that stuff, we've been using it. AI was created in the 1950s. So, we classify AI into three categories — ANI, AGI, and ASI.

ANI is artificial narrow intelligence. That's what we have right now. AGI is Artificial General Intelligence. The difference is Artificial Narrow Intelligence is where the system is really good at a particular task. So, it's really, really good at generating words. It's really, really good at playing chess. It's really, really good at some one thing.

Artificial General Intelligence is where the model is as good as a human at more than one task. We're not at artificial general intelligence. There's a lot of marketing right now that's saying we're, like, minutes away from AGI. And I keep telling people we're as close to AGI as we are to Andromeda.

The two galaxies are getting close to each other, but they're still millions of light years apart. And so, you can see Andromeda if you look through a telescope, but it's not close.

And then you have ASI, which is artificial super intelligence. And that's what the Terminator is, in the Terminator movies and things like that. So, when people joke with me and they're like, "Oh, you're building the Terminator," I'm like, "That's never going to happen."

[00:07:33] Kweli: As Gabriella noted, the type of AI that truly exists now is artificial narrow intelligence, or ANI, which can perform very specific functions. ANI is built using a method called machine learning, in which software engineers feed a massive amount of data into a training process so that a model, or AI system, appears to recognize real-life things, which allows the model to perform a narrow or focused task based on its understanding.

There is a subset of machine learning called deep learning, which has become very popular because it allows a model to learn on its own. We don't have time in this series to discuss this in depth, but it's a big part of what's made ANI more powerful in recent years.

So, getting back to machine learning, I asked Howard alum Antonio McMichael, who is a Principal Software Engineer at Microsoft, working on some state-of-the-art AI projects there, to explain what it is and how it relates to AI.

[00:08:27] Antonio: Machine learning uses something called a model, which has all those different examples of labeled data, where you're saying, these are the properties, and based on these properties, this is what it is. So, if you have a bunch of those types of examples, you generate a model so that you can use that model to mold parameters into something that fits into that model. And then you get an output based on where the mapping is or the relevance score.

I'm trying to compare what I have here to a result set and see what's most similar to what you're looking at. And so, if it makes sense based on your inputs and it's similar, then I'm going to say, okay, this is a dog. So, if you say that you're putting in the properties of a dog, but there's a cat there, I should be able to take those properties and say, okay, based on these parameters, this is a dog or it's a cat. There's a lot of derivatives of AI based on that.
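
To make the training step concrete, here's a minimal sketch of the labeled-data idea Antonio describes, using Python and scikit-learn. The features, numbers, and labels are invented purely for illustration; real systems learn from far richer data.

```python
# A minimal sketch of supervised machine learning: labeled examples in,
# a model out, predictions for new inputs. All values here are made up.
from sklearn.tree import DecisionTreeClassifier

# Each row is [weight_kg, ear_length_cm, barks (1=yes, 0=no)] -- the "properties."
X = [
    [30.0, 12.0, 1],  # labeled "dog"
    [4.0, 6.0, 0],    # labeled "cat"
    [22.0, 10.0, 1],  # labeled "dog"
    [5.5, 7.0, 0],    # labeled "cat"
]
y = ["dog", "cat", "dog", "cat"]

# "Generate a model" from the labeled examples.
model = DecisionTreeClassifier().fit(X, y)

# A new, unlabeled animal: the model maps its properties onto what it learned.
print(model.predict([[25.0, 11.0, 1]]))  # -> ['dog']
```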

[00:09:18] Kweli: One of the major derivatives is prediction models, whose main purpose is to predict or classify. Prediction-model-powered functionality is already deeply intertwined with our day-to-day lives. It powers Netflix and Hulu's personalized show and movie suggestions and recommended products in your Amazon shopping cart. It also underlies technologies whose use has further-reaching consequences, such as facial recognition and surveillance systems.
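
As a toy illustration of the prediction idea behind those "recommended for you" features, the sketch below suggests an unwatched title by finding the user with the most similar ratings. The titles and ratings are invented, and real recommender systems are vastly more sophisticated.

```python
# A toy sketch of similarity-based recommendation. All data is invented.
import numpy as np

titles = ["Drama A", "Comedy B", "Sci-Fi C", "Documentary D"]
# Rows = users, columns = how each user rated each title (0 = unwatched).
ratings = np.array([
    [5, 1, 4, 0],   # user 0: "me"
    [4, 2, 5, 4],   # user 1: similar taste to user 0
    [1, 5, 1, 4],   # user 2: very different taste
])

def cosine(a, b):
    # Similarity of two rating vectors, ignoring their overall magnitude.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

me = ratings[0]
others = ratings[1:]
sims = [cosine(me, other) for other in others]
neighbor = others[int(np.argmax(sims))]  # the most similar other user

# Recommend whatever the look-alike user rated highly that I haven't watched.
picks = [t for t, mine, theirs in zip(titles, me, neighbor) if mine == 0 and theirs >= 4]
print(picks)  # -> ['Documentary D']
```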

The other now very prevalent type of ANI is generative AI or Gen AI, which is of course the focus of this miniseries. Gabriella helped put this into context again.

[00:09:56] Gabriella: The embedding of generative AI is the new layer. We've already been taking advantage of AI systems all over the place: Alexa, GPS, predictive text. So, when you're typing your text message and it tries to complete the word for you or show you what words should come next, that's all AI. That's all powered by AI.

Lots of AI systems have been working in our lives in the background, but it's this generative AI that's new and has become quickly infused into so many systems.

[00:10:30] Kweli: Currently, an end user like you and me can access Gen AI-based tools most often by interacting with a chatbot interface, in which we plug in a prompt, which could be anything from a general question about a subject you want to learn more about to a request to analyze data that you provide. The tool provides its response in natural language, and you can continue using prompts to adjust this output or generate new output.

There are many tools like this available now, each of which can process one or multiple types of content, including text, audio, images, and video. Some tools are more general and can serve a broad range of functions, such as ChatGPT from OpenAI or Google's Gemini, while others are crafted to serve very specific needs, such as generating speech audio that sounds like you or someone else.
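
For those curious what that prompt-and-response loop looks like outside a web interface, here's a minimal sketch using the OpenAI Python SDK. It assumes you have the library installed and an API key configured, and the model name is only an example.

```python
# A minimal sketch of the chatbot loop, done programmatically.
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "Explain what a large language model is, in two sentences."}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

# Follow-up prompts adjust the output by extending the same conversation.
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "Now rephrase that for a ten-year-old."})
followup = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(followup.choices[0].message.content)
```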

The sophistication of these tools is rapidly increasing, as more and more resources are being invested into them. This is also why so-called deep fakes, or audio and video content that copies real people, can now be easily produced. All you have to do is provide some videos or images and audio of a real person, and a tool can learn from that and generate new fake video or audio in the person's likeness.

And that is what I just did, actually. That whole summary of what Gen AI tools can do was something I wrote and plugged into a podcasting program called Descript that we’re using to produce this show. Then Descript used my AI voice to create the audio. Could you tell anything was different?

If you're like me, you may listen to podcasts while doing other things, like driving or cooking dinner or working out, so maybe you weren't totally focused enough to be able to have caught it. If you could tell, then you must have been paying attention closely, and I appreciate that.

Okay, getting back to our discussion. Antonio elaborates on how Gen AI works.

[00:12:25] Antonio: With generative AI, it's like you're providing entirely new content now. So, you take in the same parameters, and you have such a massive data set that you can piece together from different answers and different documents within your data set to generate an entirely new category that may not even be in your model.

And that's where the concept of large language models comes from. With large language models, you have lots and lots and lots of data. You have ridiculous amounts of text and examples, like whole dictionaries. And when you get to that scale and you break things down into granular parts, so that they can be pieced together in different ways, then you get into what generative AI can really be: taking something you may have seen hundreds of thousands of samples of, fitting what you are inputting, and generating something novel or new that hasn't been generated before, based off of the input.

[00:13:24] Kweli: He mentioned large language models, or LLMs, which are basically what power the recognition and generative capabilities of Gen AI. They require a massive amount of computing power, energy, and data to build and maintain.

Now, as you may have noticed, a lot of discussion about Gen AI refers to its underlying “data.” So, we really have to have a basic understanding of what data is. It's essentially any information that a model can take in. It can include everything from a PowerPoint presentation, to videos and photos on your smartphone, to media platform metrics, such as how many times you check your Instagram account, which for some, let's be honest, is way too often.

There's a broader ongoing discussion across the globe about the massive, nonstop collection of data occurring all over the place, because our personal information and the tracking of our day-to-day behaviors, collected by all the apps and digital tools we've become accustomed to using, are part of that data. Data has become one of the most valuable commodities in the digital era. In fact, there's a whole big data industry and marketplace in which it's being bought and sold every day. This is a larger conversation for another time, so suffice it to say that being aware of data collection is very important so that you, as a consumer, can make informed decisions. Data is by no measure perfect, and in fact a lot of it reflects the worst of humankind. We'll get more into that later.

However, in theory, the more data, i.e. examples of human-generated communication, that an LLM is built upon, the better it can generate new content. But LLMs are not omniscient. Their responses are just inferences based on their understanding of the data. Gabriella cautions us not to misinterpret what Gen AI and LLMs are.

[00:15:22] Gabriella: Just being aware that that's what's kind of going on, being aware of these kinds of tools, generative AI, what they're really good for and what their limitations are. They're really, really good at the fun stuff, the imagine-a-world kind of stuff, and then it will write this stuff out for you. It's not going to be really, really good at giving you direct facts.

An LLM is not a retrieval system. An LLM is designed to just give you words. And sometimes they're going to be great in sequence, and sometimes they're going to look great in sequence, but they're going to be utter nonsense.

Generative AI doesn't speak a language. It doesn't know anything. It's completely oriented in the stochastic model that it is programmed using, so it's all probability. When you are training it, you are giving it the sandbox that it plays in, which is: you sit here as an information dispersal agent.

But if I were to put this in layman's terms, generative AIs are just imagination boxes. If you asked what language your imagination speaks, everyone's going to say a different language. It's literally looking at all the words there are and trying to figure out what the next thing could be. In the case of images, it's looking at all the things that would make up an image and making a judgment about what's the most likely color, shape, form that should be next to this one, in order to create the picture.
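
Gabriella's "it's all probability" point can be made concrete with a toy next-word sampler. Real LLMs learn these probabilities with enormous neural networks rather than simple word-pair counts, so treat this only as a conceptual stand-in.

```python
# A toy "imagination box": repeatedly sample the next word in proportion
# to how often it followed the current word in a tiny training corpus.
import random
from collections import Counter, defaultdict

corpus = "the model picks the next word the model picks a likely word".split()

# Count which words follow which (a crude stand-in for learned probabilities).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def generate(start, length=6):
    word, output = start, [start]
    for _ in range(length):
        options = following.get(word)
        if not options:
            break
        # Sample the next word according to the observed probabilities.
        words, counts = zip(*options.items())
        word = random.choices(words, weights=counts)[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the model picks a likely word"
```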

[00:17:04] Kweli: So, why is Gen AI such a big deal right now? In societies that highly value productivity, such as ours, anything that can support quality work in less time, meaning it improves efficiency, is going to be highly valued. One big question across the globe is, will Gen AI eventually replace so many functions that it leads to widespread replacement of jobs? Or will it instead be leveraged by the workforce so that workers can become more productive? In an ideal world, the power of these tools means we need to do less work, and therefore, we can work less.

However, in the United States, the default value system is based on the bottom line. Let us never forget history's darkest lessons related to technological advancement in this country. For example, consider the initial intent and subsequent result of the cotton gin being invented. I spoke with Dr. Amy Quarkume, who among other things is a Howard Associate Professor of Africana Studies, and the graduate director at the Center for Applied Data Science and Analytics. I asked her why Gen AI has made such a splash recently.

[00:18:14] Amy: It's a big deal, even for someone who doesn't know much about it, because of the sense of time. Time is a great gift, and it's a gift you cannot get back, right? So, the idea that I can do something and not spend so much time gives me more time to do something else. So, as we talk about our communities, many times: I would be great if I rested. I would be great if I had an assistant. I would be great if I just had time to myself. So, AI creates that. And if we don't equally have access to our time, then we see this gap where there are some people who are doing so much with less time and benefiting from it, and there are those of us who are doing the same thing and using more time, let's say physically. At this rate, if we don't catch up and close the gap in the access point to AI, we will see it in our physical bodies.

I really think, like, the life expectancy will shift a bit. Because some people are still working so much, while the people who are not working so much are getting greater value.

[00:19:19] Kweli: So, how can Gen AI tools save you time? I'll give you a personal example, 'cause I get excited every time I find a new way. I'm working on a literature review right now. And for anyone that's done research for publishing, you know that creating a reference list with all of your sources can take several hours or more. And it's also really boring.

So, I tried plugging all of my articles into ChatGPT and got it to create a reference list in APA style in about 15 minutes. Whew!

I asked some high-performing Howard alums who have a lot on their plates how they leverage AI in their day-to-day work and personal lives. Senior Vice President of Revenue at Afrotech at Blavity, Simone Tyler, expressed how useful it's become in her day-to-day life as an entrepreneur and C-suite executive.

[00:20:11] Simone: I use it every day, every single day. So, I think the easiest thing is to help formulate my ideas, right? So, you have an idea, you very quickly, you know, you write one or two sentences and it can spit out an entire creative brief for you, right? So, to me, that saves me two hours, if not three, just right there.

I'm getting married next week.

[00:20:33] Kweli: Congrats!

[00:20:34] Simone: Thank you. And I'm like, I've got to write all these emails and get comms out to people. And I'm like, or do I? I actually just need to put a few details in here and it writes my email for me. And then I also use, like, really fun tools for calendaring and scheduling, and things like that make my life a lot easier and also reduce my need to have a lot of assistance.

[00:20:55] Kweli: I also spoke with tech entrepreneur Coretta Martin, and she described how it helps automate mundane tasks, which frees her time and energy to devote to more important aspects of her work and get some time back for her personal life.

[00:21:08] Coretta: I love data. I love math, but it's not necessarily the best use of my time, particularly as a leader or decision-maker. And when you can use any of the tools that exist currently to just crunch large sets of data, to upload Excel spreadsheets and just get the information you need, I can now spend more time making the decision, pondering the options. Versus, you know, historically, data says most leaders make decisions in 10 minutes or less. We don't have to anymore. I have all the information I need now in 10 minutes or less. So, I can spend the next 50 minutes considering my options. And the quickness and efficiency I need doesn't have to be in making a fast decision. It can be in the back-end work.

So, I use it quite a bit. I love it. And if nothing else, the efficiency piece is what's pushing policies like the four-day work week. And as someone who loves work-life balance, I'm going to keep pushing the usage of these tools.

[00:22:03] Kweli: I like it, for sure.

[00:22:05] Coretta: And so that we can get to the point where an efficient life can also lead to self-care.

[00:22:09] Kweli: Now, I know some people have been reluctant to take advantage of Gen AI. While its potential benefits are very real, there is also a lot of manufactured hype around it right now, which has pressured everyone to get involved or appear to be behind the curve. Its development is moving at a dizzying pace, to say the least, despite still being so new. There are so many Gen AI tools being introduced and continuously updated. And just about every technology company is integrating it into their existing product lines and services. But maybe you're just not an early adopter and need some help learning how to use it. Or maybe you're suspicious of the technology due to some of its problematic aspects, which we'll get into in a bit.

Personally, I think a healthy level of suspicion is always warranted from historically conscious Black folk when it comes to new technological developments. However, I would caution us not to let that suspicion stop us from seeking a deeper understanding and taking advantage of the power of these tools, as not doing so as a collective will eventually equate to falling further behind economically.

It’s important to remember that so many of humankind’s oldest and most important technological innovations and scientific advancements were produced in Africa and by people of African descent. And so, really, we ain’t new to this.

Dr. Quarkume helps us put this into context.

[00:23:39] Amy: So, let us not be confused that the first people that started civilization are still walking, right? So, we created this. We were at the beginning of this conversation as far as technology, innovation. There's no reason why we should be scared. We shouldn't be scared at all. Dr. Clarke talked about how the last group that came to the conversation should not act like they created the conversation, making us feel like we're not a part of the conversation, which is a big problem.

[00:24:03] Kweli: And you're referring to John Henrik Clarke, the historian.

[00:24:05] Amy: Yes, yes, yes. So, there's this idea that we should be scared, right? The fear is, we don't know enough. But the problem is that we haven't had access. We've been kept out of certain conversations.

But if we were in the room, we could talk. If we were in the room, we could build. If we were in the room… because we've been building. We've been talking language, science, technology. Like, we were at the beginning of that conversation, right? But now, we've been left out of it so much that it hurts and it's scary, but it's not scary at all.

[00:24:34] Kweli: Coretta also addressed the fact that fear and hesitation have accompanied technological advancement throughout history.

[00:24:40] Coretta: Innovation is a part of life, and people need to really embrace artificial intelligence. I think that there's a lot of fear right now that it's a robot, that someone's going to lose their job. And mind you, if you work in customer service or in the office space, there are some studies that say there might be some impact on jobs, but there's really going to be more of an impact in the form of an increased amount of jobs and an increased amount of opportunities.

So, Socrates was worried about writing. Other people were worried about radio. Some people were worried about the printing press and no one being able to memorize things anymore. In 2005, when I was at Howard, they were worried about email. AI is just the next thing. It's here. It's great. There's so many uses. Get into it, like be into it, and really figure out how you can use it in your life or how you could be a part of the movement of building it.

[00:25:32] Sam: I think critical thinking, creativity, the ability to figure out what other people want, the ability to have new ideas, in some sense, that'll be the most valuable skill of the future.

[00:25:43] Kweli: That's Sam Altman, the co-founder and CEO of OpenAI, which is the company that created ChatGPT and really ignited the current generative AI revolution. He articulated some of his vision about the technology's potential.

[00:25:57] Sam: If you think of a world where every one of us has a whole company's worth of AI assistants that are doing tasks for us to help us express our vision and make things for other people and make these new things in the world, the most important thing then will be the quality of the ideas, the curation of the ideas, because AI can generate lots of great ideas but you still need a human there to say, this is the thing other people want.

And also, humans, I think, really care about the human behind something. So, when I read a book that I really love, the first thing I want to do is go read about that author. And if an AI wrote that book, I think I'll somehow connect to it much less. Same thing when I look at a great piece of art or if I am using some company's product, I want to know about the people that created that.

[00:26:41] Kweli: As I noted at the beginning of the episode, Gen AI is a disruptive technology, meaning it's drastically and rapidly changing how we operate, whether for better or for worse. As is the case with all major societal shifts, this carries both positive potential and peril that must be addressed. While it's out of our scope to discuss all the major issues at length within this miniseries, we'll address just a few and then discuss bias in Gen AI in depth to close out.

Keep in mind that the adverse effects associated with all major societal changes hit already marginalized communities, such as Black folks, hardest. And the benefits, on an aggregate level, are harder for such communities to obtain. Fortunately, there are some policy and legal efforts underway on all levels, from local to international laws, attempting to prevent and reverse harm, although they naturally come about more slowly than the technology itself.

Some of the general issues that I think I have to at least mention here are the following. Environmental sustainability. The globe is already experiencing an existential crisis with climate change, driven in large part by the energy consumption of human industry. One very significant part of that energy consumption is the massive cloud computing infrastructure that supports our digital world. Gen AI, which is only growing, requires a humongous amount of computing power. As such, it's accelerating the already alarming rate of this energy consumption.

Human exploitation. The so-called developed world's overconsumption of digital devices, like laptops, smartphones, and others, already underpins an insatiable devouring of the natural resources used to create them. This is driving extreme environmental and human exploitation and destruction. Likely more than anywhere else, this is the case in the Democratic Republic of the Congo, in Central Africa, which contains over 70% of the world's cobalt, a mineral necessary in manufacturing digital device batteries. The rise of Gen AI will accelerate demand for these devices, as their capacity needs will eventually increase. And unless international policies change, this will further drive human exploitation.

Deep fakes. There are a bunch of programs out there that allow a user to easily create very realistic but fake videos of real people. The implications here are endless, as this could be used to impact elections, geopolitics, and media, as well as ruin the reputation and likeness of individuals in the process. To me, one of the most concerning ways this is playing out is the rising prevalence of deepfake porn videos that high school boys create and distribute of their female peers, and the surprisingly soft response from many school administrations to this heinous act.

Lastly, copyrights and IP, intellectual property. As we discussed, LLM training depends on access to an enormous amount of data. That data includes copyrighted materials, such as news articles and visual art, which are being ingested into these models without the consent of, or compensation to, the creators. This is leading to a growing number of lawsuits, such as The New York Times' suit against OpenAI, to name just one prominent case. There's a deep tradition of intellectual theft from people of African descent in the United States. So, I think this concern hits particularly deep for our community's creatives.

So, we've established that Gen AI depends on swallowing as much data as possible to become more proficient at emulating human content generation. Data comes from humans. And as I'm sure you're aware, people are biased. People in America are trained to be racist, and this bias and racism is reflected in the content that people have created over the past hundreds of years up to the present moment.

Sometimes this bias and racism is explicit, but more often than not, nowadays it's implicit and nefariously embedded into narratives that are woven into that content. Also, because Gen AI is a technology built by humans, the structure of the product has racism and bias built into it, even if unintentionally, as a result.

I spoke with Dr. Gloria Washington, Associate Professor in Howard's Department of Computer Science, who's leading some really cool AI projects that we'll get more into in our next episode. I asked her to break down why and how this happens.

[00:31:04] Gloria: Machine learning is built from data. So, what will sometimes end up happening is, you have an overabundance of data that sometimes comes from one group. Sometimes this can be biased data, but sometimes it can include a mix of all kinds of diverse data sets. Now, that's good. But when you have homogeneous data, where you don't have different opinions, where it's only the same sort of thing, you're going to get into a situation where the AI actually is biased.

So, the decisions that come from the underlying machine learning are biased because they're trained and they're tested on the same data, which is homogeneous. So, of course, it's not going to be able to recognize Black people or recognize the individual ways that Black people talk.

So, an example is, like, if you are only learning about what a little Black child looks like, and let's say you only get short people who are light-skinned, and a machine learning algorithm only sees pictures of short little light-skinned Black people. What it would say is, "I'm going to produce a model for you that is an image of a short little light-skinned person." So, what would end up happening is that, if it goes into trying to recognize different kinds of Black children and it gets a tall 13-year-old who happens to be dark-skinned, it would not truly be able to recognize them. So, it sometimes can truly be flawed.
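
Here is Dr. Washington's example compressed into a purely illustrative sketch: a classifier trained only on a homogeneous slice of a group fails on a member outside that slice. The features and numbers are fabricated.

```python
# A toy demonstration of bias from homogeneous training data.
from sklearn.tree import DecisionTreeClassifier

# Features: [height_cm, skin_tone], where skin_tone runs 0.0 (light) to 1.0 (dark).
# Every "child" the model ever sees is short AND light-skinned.
X_train = [
    [120, 0.2], [115, 0.1], [125, 0.2],   # children (a homogeneous sample)
    [175, 0.2], [180, 0.8], [170, 0.5],   # adults (varied)
]
y_train = ["child", "child", "child", "adult", "adult", "adult"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# A tall, dark-skinned 13-year-old: a child, but unlike any child in the data.
print(model.predict([[165, 0.9]]))  # -> ['adult']  (a biased, wrong prediction)
```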

[00:32:36] Kweli: How do racism and bias in AI show up and affect the experience for a user? Well, again, it's often a reflection of larger, pervasive bias within societal belief systems. And any place those beliefs are regurgitated helps subsequently reinforce them. It becomes cyclical. Let's run a test. What is the first image that comes to your mind when you think about this word: "scientist"?

What'd you think of? For many of you, I bet it was an old white man with long white hair and a white lab coat, because it's such a repeated and pervasive depiction that we encounter across media, in cartoons and movies, as we grow up. Now, what do you think I'd get if I asked ChatGPT to create an image of a scientist? Well, it gave me a young white man in a white lab coat. While this fits the stereotype, it also reflects the ugly truth that there has historically been a disproportionate number of white men in that profession in the modern Western world, because of historic, as well as recent, systematic discrimination against women and people of African descent.

So, biased data leads to biased systems. But just as important to this equation are the algorithms that make the models run. I was thrilled to have the opportunity to speak with Dr. Safiya Noble before she participated in a fireside chat at Howard University. Dr. Noble is the David O. Sears Presidential Endowed Chair of Social Sciences and Professor of Gender Studies, African American Studies, and Information Studies at UCLA, as well as the director of the UCLA Center on Race and Digital Justice. She's the author of the pioneering book, Algorithms of Oppression, and is one of the foremost authorities on that subject. She's also been a strong voice, along with several other Black women scholars and researchers, cautioning the public for years about the dangers of systemic digital oppression. And more recently, this, of course, has included AI.

Dr. Noble gave me the origin story of her pivotal work.

[00:34:42] Safiya: Just want to say it's a huge honor to be here. It's my first time at Howard. I hope it won't be the last. I did this early study that has now come to be almost like common knowledge. I took Google and I took all of the census categories, the racial and ethnic and gender categories. And I just crossed them, and I did more than 80 searches. What I found was that the kinds of results you got when you searched on racial and ethnic identities matched the racial hierarchy of the United States, the white supremacist hierarchy, where Black girls and Black women were stratified at the bottom with the worst representation and the worst results. For example, when you searched on Black girls, you'd get 80% pornography on the first page of results in Google. Now, you didn't have to add the word "sex" and you didn't have to add the word "porn." Black girls and women were just synonymous with pornography.

And this very similarly maps onto this kind of racist history, sexist history, of the commercialization and commodification of Black women's identity, of our bodies. It also maps onto white supremacists' notions that Black women and girls are more sexual than other people. And of course, we know that this is a racist stereotype invented at the end of the transatlantic slave trade.

That study really weighed in on describing how these large-scale internet companies are incredibly biased, and "biased" is probably not even a strong enough word, but they certainly hold these kinds of derogatory, discriminatory implications for us, and not just for Black people but for many oppressed people.

And that really led to my writing the book Algorithms of Oppression, where I had many, many, many cases to demonstrate what happens when people in Silicon Valley and silicon corridors around the world make technologies that are harmful, just straight-up harmful, to people of color. And that's really what I've spent at least the last 15 years of my life working on and trying to make common knowledge. Algorithms and AI are a part of our lexicon. I can say, over a decade ago, they were not. The average person didn't know what these things were.

[00:37:06] Kweli: During the fireside chat, Howard's provost, Dr. Anthony Wutoh, asked her to break it down. And she discussed the relationship between data and algorithmic bias.

[00:37:17] Anthony: What is an algorithm? And what is the impact of algorithmic discrimination? 

[00:37:23] Safiya: So, an algorithm is really a set of instructions that you give to a computer for it to compute. And typically, in the most basic way of thinking about it, it would be instructions to program a computer to, let's say, work through a decision tree. If these conditions, then those outputs, right? It's often working across different kinds of data. An algorithm might be very, very sophisticated and so complex that even the makers of the algorithm don't quite know how it works. So, in the case of search, the search algorithms for big tech companies are so complex, hundreds, if not thousands, of people have worked on them over time. It's not like an old-school car where you can just, like, pull up the hood and, like, tinker with the carburetor. It's a system where you can't even figure out what piece of code is broken or why we're getting particular outputs.
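
In its most basic form, the "set of instructions" Dr. Noble describes is just nested conditions. The loan-style rules below are invented for illustration and aren't any real system's logic.

```python
# An algorithm at its simplest: a hand-written decision tree.
def decide(application):
    # "If these conditions, then those outputs."
    if application["years_of_credit_history"] < 2:
        return "refer to manual review"
    if application["debt_to_income"] > 0.45:
        return "deny"
    return "approve"

print(decide({"years_of_credit_history": 5, "debt_to_income": 0.30}))  # -> approve
```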

[00:38:23] Kweli: That complexity that is added to over time and by many people reminds me of the human experience and how we psychologically develop over a lifetime. It may be hard to pinpoint all of the experiences that make up one's belief system, including bias and racist beliefs.

[00:38:39] Safiya: Part of the reason why algorithms, I think, can be discriminatory is because they're often trained on data sets that are also full of discrimination. If you have healthcare data and you have 100 years of healthcare data and you train your algorithm to predict who's most likely to, let's say, develop cancer, you're going to have a very accurate, probably, algorithmic output. You're going to have the output that's going to probably identify white men with money who've had the best health care.

[00:39:19] Anthony: Based on the data.

[00:39:20] Safiya: Because those are the people who are going to be in the data set, right? And all of the Black people who didn't make it into the data set, therefore, are not going to get predicted into your output as people who should be pre-screened.

These are the kinds of just very simple ways that algorithmic discrimination happens, which is, you have an algorithm that's looking for a set of scenarios or outcomes, and it is looking at data that is going to be fraught with all kinds of historical patterns.

So, typically, most of the kinds of algorithms and AI that we experience today are looking for patterns. But the patterns, again, are going to not always tell the full story. And if you're not looking at the impact of what those results are, it will be very difficult for you to even know that there's discrimination happening.

This is a complex question. And the reason why is because I think there are a lot of different points of view right now in the field about how to mitigate or solve these problems.
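
Dr. Noble's healthcare scenario, sketched with fabricated records: because the under-served group barely appears in the historical data, the "accurate" model never learns to flag them for screening.

```python
# A toy demonstration of algorithmic discrimination from skewed historical data.
from sklearn.tree import DecisionTreeClassifier

# Features: [age, recorded_doctor_visits] -> was cancer eventually diagnosed?
# The historical records come almost entirely from well-insured patients,
# for whom many recorded visits preceded a diagnosis.
X_train = [
    [60, 25], [65, 30], [70, 28],   # diagnosed (lots of recorded care)
    [60, 20], [55, 22], [50, 18],   # not diagnosed
]
y_train = [1, 1, 1, 0, 0, 0]

model = DecisionTreeClassifier().fit(X_train, y_train)

# An equally at-risk patient who lacked access to care has a thin record,
# so the model, accurate on its own data, never suggests pre-screening them.
print(model.predict([[65, 4]]))  # -> [0]: no screening recommended
```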

[00:40:29] Kweli: Provost Wutoh went on to ask her about how society should approach fixing the problem of bias and racism in digital systems. Dr. Noble reflected on the long struggle to push the industry that controls these technologies to even recognize that technical and AI systems contain bias, and the pushback that scholars and researchers, such as herself, received when making this case prior to very recently. She emphasized the importance of the public keeping pressure to hold them accountable.

[00:40:58] Safiya: It's been scholars, activists, journalists who are watching, who have been, kind of, at the forefront of naming the problems, which is deeply unsatisfactory. It was heretical at the time, 15 years ago, to make that claim. I would go into computer science circles and men would shout at me and would say, "Absolutely not. This is impossible. Algorithms aren't political. They're not social. They're just math."

The claims that algorithms or AI could be harmful, you know, that claim has been like pushing a boulder up a mountain. I mean, it's really been difficult. I would say the majority of people who've led that conversation in the world, in the English-speaking world, let's say, have been Black people, Black women, LGBTQ people, who have been asking different questions of these systems than the makers of the systems.

And, of course, I think of Black women in particular in the field that I'm in as, like, canaries in the coal mine. I mean, we just see things differently. We're asking different questions. I think, after training computer scientists and people who work in tech and engineering for more than a decade, that there's a high degree of ignorance and lack of knowledge about society. No one's checking on the work until we experience the negative effects of these different kinds of systems.

[00:42:27] Kweli: So, how is the industry addressing these issues? In a number of ways. I'm sure you've at least heard of OpenAI, a company that I've mentioned several times already in this episode, whose mission is actually to develop responsible artificial general intelligence for the world. That goes way beyond Gen AI, which is a form of artificial narrow intelligence.

However, along the way, OpenAI advanced Gen AI in a way that no one had yet been able to accomplish, and it took the world by storm. Being the foremost pioneer of the technology has come with the responsibility of addressing the controversy and problems that come with it.

One of their responses in late 2023 was to initiate an AI Ethics Council in partnership with Operation HOPE. Part of that initiative, so far, has been to engage with HBCUs in discussion about the problems and potential solutions. So, OpenAI co-founder and CEO, Sam Altman, and his team came to Howard earlier this year to participate in a fireside chat on campus with Howard's president, Dr. Ben Vinson III, to do just that.

I got to sit down and speak with Sam before the event, and I asked him how OpenAI is addressing the bias that has surfaced over time in their ChatGPT model.

[00:43:46] Sam: The first thing I would point out is just look at the improvement trajectory. You know, if you look at the models that had significant problems with embedded bias and discrimination from several years ago, or even like more than that, and if you look at how a model like GPT-4 performs on any bias test that you'd like to give to it, things have gotten much, much better. I won't say we don't have any work left to do. Of course, we always need to improve. But some of the techniques that we use, like reinforcement learning from human feedback, to address these issues have worked far better than I expected.

Now, the technological solve is only part of it. There are then very complex questions about who gets to decide what the specification of behavior should be. But we have a technological approach that works really well. Of course, the societal component of the solution, in this case, is often harder than just the technological approach.

[00:44:38] Kweli: During the event itself, President Vinson went on to ask him more about industry's role in improving the technology.

[00:44:44] Dr. Vinson: We're an institution that believes we have a strong place in the future of AI. How can we ensure, with so much happening so quickly, that the ethical considerations are prioritized? And what role should industry leaders such as yourself and some of your colleagues, what role should… in making sure that we have responsible practices?

Sam: The way that I like to talk about this is to say that artificial intelligence can be the greatest technological revolution, the greatest new tool, the greatest engine for economic growth the world has had. But it's a can, not a will. The technology, clearly, is amazing. But to get deployed well, it will take partnership and integration with all of society, and making sure that this is done in an equitable way that lifts everybody up.

And the technology actually lends itself, I think, quite well to that, but we can't do it on our own. ChatGPT is now a very widely used product. It's got to serve a lot of people, and it's got to do that in an inclusive way, a fair way, in a way that, sort of, brings the best of this technology to everyone.

The even harder question, which is, who decides what the behavior of these systems should be? What does that process look like? How do you make sure that marginalized voices that don't always get heard are heard loudly? For that, we need a lot of collaboration. We're launching some new stuff about democratic inputs. But we're very excited that the technology allows this, and now we get to face the harder societal problem of making sure it happens.

[00:46:11] Kweli: Sam spoke about reinforcement learning from human feedback, a process in which human reviewers rate a model's outputs and those ratings are used to make algorithmic and other structural adjustments that steer the model away from specific kinds of responses. While this does help mitigate bias, it's, sort of, like placing band-aids over an underlying problem that's still present, when deeper fixes are really needed in the long run.
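
Here's a heavily simplified sketch of the RLHF idea: human raters compare outputs, a tiny "reward model" absorbs those preferences, and the system is steered toward higher-reward responses. Every feature, rating, and candidate answer is invented; real RLHF trains a neural reward model and fine-tunes the LLM with reinforcement learning.

```python
# Step 1: humans compare pairs of model outputs. Each output is reduced to
# crude, made-up features: [is_helpful, is_biased_or_toxic].
comparisons = [
    # (features of the preferred output, features of the rejected output)
    ([1, 0], [1, 1]),   # raters prefer helpful AND non-toxic
    ([1, 0], [0, 0]),   # ...and helpful over unhelpful
]

# Step 2: "train" a reward model -- here, weights nudged toward what the
# preferred outputs had and away from what the rejected ones had.
weights = [0.0, 0.0]
for preferred, rejected in comparisons:
    for i in range(len(weights)):
        weights[i] += 0.5 * (preferred[i] - rejected[i])

def reward(features):
    # Score a candidate output under the learned preferences.
    return sum(w * f for w, f in zip(weights, features))

# Step 3: at answer time, steer toward the higher-reward candidate.
candidates = {"biased answer": [1, 1], "respectful answer": [1, 0]}
print(max(candidates, key=lambda c: reward(candidates[c])))  # -> respectful answer
```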

One of the things Sam noted is the need for a diverse set of young people to join the industry, as all too often, still today in 2024, the technical workforce is largely run by white men. Sam encouraged Howard students to consider helping to change that.

[00:46:49] Sam: I always love speaking at universities, but one of the reasons is, like, "Please, apply to OpenAI. You know, we'd love to recruit you."

[00:46:54] Dr. Vinson: That's good. That's good.

[00:46:58] Kweli: During the Q&A session, audience members posed some thoughtful questions for Sam that reflected our community's concerns as they relate to larger societal issues around misinformation. Here's one of them, from Howard alum Reverend Vince Van.

[00:47:12] Vince: So, 15 years from now, we're going to see young people trust AI more than they trust any other thing, right? And the tough part is, in a place like Florida, we know that certain books and even history from a Black lens are being removed. I use Chat as well, but there are moments where you ask a question about, you know, slavery or texts in the scripture, and there's a reference to say, you know, make them think about the lens of the time period. And it kind of softens the reality of what actually happened. If we're removing actual history from other places, how do we trust that your company is going to do the work to engage with those Historically Black Institutions to make sure, 15, 20 years from now, that accurate information is on OpenAI?

[00:47:53] Sam: I'd love to talk to you more about that. We have got to figure out a way that what gets encoded into these models, not just the knowledge but the culture and the context and how things get explained, brings this diverse perspective of the whole world into it. So, if there's areas we're missing the mark there now, we'd love to, like, talk about those.

But more than that, we'd love the input on the principles going forward and how to set up a reliable process to make this work. One of my, like, sayings to people is better every rep. I believe that contact with reality and putting this out and getting hundreds of millions of people to use it and tell us what's good and what's bad, what works and what doesn't, is the only way to make it better.

[00:48:32] Kweli: While not uncommon in big tech, this technological approach of getting a product out to the masses and then relying on feedback to iteratively improve it has resulted in repeated news headlines since 2023, as the public has uncovered instances of bias and racism appearing in Gen AI. In the short term, I believe it can lead to psychological and potentially other harm, as Dr. Noble alluded to earlier. So, with this approach, it’s imperative that companies and product owners of Gen AI-based tools maintain accessible communication channels and are truly responsive to feedback from the public.

I later had the privilege of speaking with Anna Makanju, who is OpenAI's Head of Global Operations. She also happens to be a woman of African descent, with a uniquely global background, who is helping shape the globe's approach to engaging with Gen AI. In this role, and recognizing the vast implications of the emerging technology, she has led the company's proactive engagement with government and society, which has made OpenAI, sort of, an outlier when it comes to which concerns get prioritized during the early years of most significant technology companies.

[00:49:47] Anna: I just really wanted to be partnering from the outset on how this technology would be rolled out and incorporated. I wanted civil servants and people who work for public benefit and public service to really be at the forefront of this. It is easier said than done. I think, partly, because when you work in government, you have to deal with the crises that are incoming. I think it was very difficult for people to necessarily understand, because we didn't really know exactly how it would be incorporated or what impact it would have.

[00:50:18] Kweli: Anna went on to talk about one of OpenAI's responses to the lack of diversity within the technology workforce, which many cite as a major reason for the bias that's built into technology generally: its internal residency program, which hosts researchers and professionals from a plethora of disciplines.

I thought this was very intriguing, considering Dr. Noble's insightful critique that we heard earlier, which is that part of the underlying problem in the often siloed field of computer science is its lack of multidisciplinary considerations.

[00:50:50] Anna: What also will be really important is to encourage people to be in these companies. This is the absolute best way that we have to steer this technology. Like, we have to be there. We have to be at the table. We have to be in the room. And I can't imagine anything more impactful than that.

To be fair, like, every piece of it is important. Like, the advocacy, being in the regulatory spaces where you shape the guardrails, but also, like, being in these companies. So, we, obviously, like, we want to recruit. And we have a program, actually, that's helped bring a lot of people of color into OpenAI, because it's for people who are not necessarily ML engineers but have other expertise that's going to be relevant. It's called the Residency Program.

The way it works is you, kind of, get exposed to different teams. We teach you AI, and you, kind of, figure out how the things that you've already learned, or have the aptitude to work on, can contribute to what OpenAI is doing. So, it's a little bit of, like, a matching program, where you can get up to speed on the technology and we can get up to speed on your contributions and figure out where to slot them in.

[00:51:57] Kweli: Anna also outlined some of OpenAI's methods of examining its products before and after release, which is an area in which some of its residents may come in handy.

[00:52:06] Anna: Because it's not like the only thing you can do is build the algorithm that, sort of, like, powers the model. We have several other teams that look at every other thing. You know, we have a safety systems team that thinks about, like, what are the potential harms that the technology can cause? And what are the things that we should build to understand them?

An investigation team that says, like, “Well, what are patterns of behavior that we should be looking for that might be harmful?” We have a team that's like, “Okay, knowing what the next level of this technology is going to look like, how can we think creatively about what that means, both in the negative and positive terms?” Like, the human data team that really looks at, like, sophisticated outputs, where there might be, like, more subtle inaccuracies or bias and, you know, you really need an expertise that is specific to think about what the impact of that could be.

So, there are so many, sort of, parts of the life cycle of these tools that we look at. And so, the expertise is not just going to be, like, coding.

[00:53:08] Kweli: I've got to say, we've only scratched the surface in this episode in outlining what Gen AI is, examining how and why bias shows up within it, and touching on just a bit of how it's being addressed. I encourage you to keep exploring this very important subject. Maybe you will become, or continue to be, one of those vital activists and/or scholars who push to hold policymakers and big tech accountable.

In the next episode of this miniseries, we'll talk to some Howard thought leaders and change-makers whose work presents solutions to certain aspects of the racism and other bias within AI, as well as explore how this type of effort reflects the historical HBCU and Black intellectual tradition of improving American society for the greater good.

Thanks for tuning in to HU2U and the first installment of our three-part miniseries examining the impact of the emergent Gen AI revolution on people of African descent through a Howard lens. Make sure to subscribe to this podcast wherever you're listening, so you catch the rest of the series. Also, feel free to send me your feedback after listening to each episode. My contact info is in the show's notes, and I'd love to hear from you. I’m Dr. Kweli Zukeri, see you next time. Peace!
