
This is the first installment of a new series, Science at the Edge. The series explores the benefits, tradeoffs, and risks associated with innovative solutions while unpacking questions about ethics, policy, or public perceptions.

In The AI Revolution and Misinformation, we discuss the potential for AI tools such as ChatGPT to spread misinformation and false narratives across the digital landscape. Casey Fiesler, Associate Professor of Information Science at the University of Colorado, joins Institute Director Kristan Uhlenbrock in conversation. Casey is a leading researcher of technology ethics, internet law and policy, and online communities.

Watch a video of this discussion on our YouTube channel.

 

The AI Revolution and Misinformation

KRISTAN UHLENBROCK: Welcome to the start of a brand new series called Science at the Edge. In this series, we're going to be talking about new ideas and innovations in science and technology and how they influence society. Today we are taking on generative AI, or artificial intelligence, and its impact on misinformation.

Generative AI is a type of artificial intelligence that can generate text, images, and other media in response to prompts you've given it. Maybe you've heard of ChatGPT or DALL-E or one of these other language and image generation platforms. We're going to discuss how they impact and influence many aspects of our society. We'll talk about how this technology is affecting misinformation, our media landscape, and the trustworthiness of information in a heavily information-saturated world.

I'm very pleased to introduce Dr. Casey Fiesler. Casey is an Associate Professor in the Department of Information Science right here in our backyard at the University of Colorado Boulder. She also holds affiliations with Silicon Flatirons at the law school and with the ATLAS Institute. Casey has a Ph.D. in human-centered computing as well as a law degree. Her work is primarily focused on technology ethics and law, and online communities. She is also actively engaged as a champion for broadening and empowering participation in computing. Here she shares an overview of the capabilities of generative AI.

 

What is Generative AI?

CASEY FIESLER: I imagine you have seen a lot of headlines about this over the past year or so. It was last summer that OpenAI's image generator DALL-E and some others were making headlines. But it really kicked off in December (2022) when ChatGPT was released. If you are interested in keeping abreast of what's going on in this space, I actually maintain a running spreadsheet of news articles about AI ethics and policy. The reason I point this out is because every day there's something new. A lot of the news is around the ethical issues and the limitations of these technologies, which I think is really important for people to understand. And this all starts with how difficult it is to understand what artificial intelligence is, because that is such a broadly used and not always appropriately used term.

AI is just machine learning; it's just statistics. We are encountering it constantly. When you hear about AI or you hear about algorithms, this is what we mean.

I'll give you a very simple example here: let's say that a computer has a data set of thousands and thousands of photographs of cats and dogs. And then here's a picture of my dog. The AI would say, with maybe 92% confidence, that this is a dog. But one thing that you might have heard a lot about when it comes to AI is people talking about bias. So imagine that all the dogs in the training data are golden retrievers, plus you've got all these varieties of cats. Suddenly, my dog might look more like a cat than a dog to the model, because he looks more like a cat than a golden retriever. This is how we end up with problems like the one you might have heard about from Google a number of years ago. Bias in AI often comes from bias in the training data.
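As a rough illustration of the bias-from-training-data idea Casey describes, here is a minimal Python sketch of a toy classifier. The features, numbers, and labels are invented for illustration; no real vision system works on two hand-picked numbers like this.

    # Toy nearest-centroid classifier. Features are made up: (ear_pointiness, snout_length).
    # All the "dog" training examples are golden-retriever-like (floppy ears, long snouts),
    # so a small pointy-eared dog lands closer to the "cat" centroid and gets mislabeled.

    def centroid(points):
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(2))

    def classify(example, centroids):
        # Predict the label whose centroid is nearest (squared Euclidean distance).
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return min(centroids, key=lambda label: dist(example, centroids[label]))

    training_data = {
        "dog": [(0.10, 0.90), (0.15, 0.85), (0.20, 0.80)],  # golden retrievers only
        "cat": [(0.90, 0.20), (0.80, 0.30), (0.95, 0.25)],  # a variety of cats
    }
    centroids = {label: centroid(pts) for label, pts in training_data.items()}

    # A chihuahua-like dog: pointy ears, short snout.
    print(classify((0.85, 0.35), centroids))  # -> "cat": biased data, wrong answer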

Another example: let's say that you are creating a hiring AI to help you decide who to hire. We know from decades and decades of resume audit studies that humans are sexist and racist when it comes to decisions about who to hire for jobs. So if that's your training data, then guess what? The AI is also sexist and racist. Machine learning is finding patterns in training data and making predictions.  

Generative AI is when instead of labeling things or predicting things based on training data, it's actually creating something new. It's generating something new. So now I can go to DALL-E and say, "Give me a dog. Give me a cat." And the images are of dogs and cats that do not exist, they're not actual photographs. They weren't created by an artist. They were created because DALL-E is trained on countless images of dogs and cats.  

So it has created a brand new dog or cat. And these are pretty simple – I could ask for a dog and a cat wearing colorful knitted hats surrounded by balloons. I could also ask for a photograph of a dog and a cat in front of the Denver Museum of Nature and Science. But if you look closely you'll likely see that there are some problems, and DALL-E is not the only system like this.

Another one that you might have heard about is MidJourney. Typically when you see very beautiful, realistic images, that's where they're coming from. Again, it can create things that do not exist in reality or nature. These are the kinds of things that you would have expected to see created by an artist out of their own imagination. It can also create something that looks like a photograph of a person who does not actually exist, or of actual people who do exist but are doing something that they couldn't be doing, like Han Solo and Chewbacca with a smartphone.

 

ChatGPT and Other Large Language Models

You might have also heard about different types of text generators like Google's Bard or GitHub Copilot, which helps you write code. But probably what you've heard a whole lot about is ChatGPT. ChatGPT is a large language model (LLM). I can't go into great detail on how it works -- that would take far too long. But the important thing to know is that it's not a search engine. It would be more accurate to say that ChatGPT is a fancy auto-complete. The way that it does that is so complicated and impressive that it really does seem like magic.

The important thing to know is that it is not understanding what you're saying. It is predicting what word is most likely to come next. And this was done with a lot of different levels of learning, including reinforcement learning from human feedback, where humans rated the responses it was providing and that information went back in to continue to train the model.
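For readers who want to see the "fancy auto-complete" idea in code, here is a minimal sketch of a bigram next-word predictor. Real large language models use neural networks trained on enormous corpora, and this toy is not how ChatGPT is implemented; it only illustrates the core idea of predicting the next word from statistical patterns rather than understanding.

    # Count which word follows which in a tiny corpus, then generate by always
    # picking the most frequent next word.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    next_word_counts = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        next_word_counts[current][nxt] += 1

    def predict_next(word):
        # Return the most frequent word observed after `word`.
        return next_word_counts[word].most_common(1)[0][0]

    word = "the"
    output = [word]
    for _ in range(4):
        word = predict_next(word)
        output.append(word)
    print(" ".join(output))  # -> "the cat sat on the"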

For example, I could ask it to write a script for an opening scene of Star Trek, the original series, where they visit the museum. ChatGPT is actually very good at writing fan fiction. Is it good fan fiction? Probably not, but the fact that you could tell it to do pretty much anything and it can do it is very impressive. 

Another example: I asked it to write a Wikipedia entry for me (I'm not on Wikipedia). There are a lot of things it generated that seem right but aren't. It says here that I have a bachelor's degree from the University of Georgia. My bachelor's degree is actually from Georgia Tech. And if you don't know about the rivalry between those two schools, trust me, this was a very big mistake. It also provided citations and selected publications. All of the people here are people that I have written papers with and these are topics that I study, but none of the papers exist. It also claimed that I was in Forbes 30 under 30, which is actually a little bit flattering, since for some reason this model thinks I should have been. Here's where you see things start to break down: it's not a search engine, it's about what word comes next. It cited papers and accolades that could exist because they sound right.

Another thing you should not ask ChatGPT to do is to solve math problems for you. I was like, “Hey, give me a math problem, and don't give me the answer.” I thought I could give it the wrong answer and it would correct me and tell me how to do it correctly, but instead, I gave it the wrong answer and it was like, “Good job.”  

 

Misinformation, Deepfakes, and Bias

So you can see some ways that you might get accidental misinformation. (It is very important for students to know about citations.) But there are also ways that you could use these technologies to generate deepfakes. 

These are two very famous examples from the past few months. Someone created these images of Donald Trump being physically arrested, put them on Twitter, and said, "I made these." But then people started picking them up and posting them elsewhere. And then they're in news articles. They fooled a lot of people, but deepfakes are not a brand-new thing that has come up in the past year. What's different about the tools that have become popularized in the past year is how easy they are to use.

The good news is that there are ways that you can spot deepfakes or just spot AI-generated images. They often have weird little things. Sometimes it's like someone has three arms and it is amazing how you overlook those things until you're looking for them. But of course, these are things that with a little bit of effort you can fix and people are trying to make the technology better every day. 

I want to mention in terms of types of misinformation -- we have the unintentional stuff, the mistakes, we have people using these adversarially, and then there's bias in the system. I asked MidJourney to give me an image of a computer science professor, and it gives me three bald white men lecturing to a room of bald white men. So there are potential types of representational harm. 

We're going to get to physical robots that have these kinds of models integrated into them pretty soon. Imagine that a kid says to their robot companion, "I wanna play dolls. Please go get me a scientist." And it always brings her Ken instead of Barbie. This is what happens when these models are basically just a regurgitation of everything on the internet.

I'm sure you can all name a dozen more ethical issues that are happening here, but there is a speed at which this is moving right now that I think is problematic. There are some ways that we need to pull back, because as this person from Microsoft said, "Waiting to see what we need to fix later might be too late. The harm might already be done by then."

You also might have heard a lot of calls for slowing down AI because we're worried about these very existential risks like human extinction. My take on this is that it's a bit of a distraction to be saying that what we really need to be worried about is human extinction when there are a whole lot of other things that are actual tangible problems that we know about and are happening right now. I don't think that we need to think that far in the future yet because right now there are a lot of harms we already know about. 

 

The Impact of AI on How We Work

KRISTAN: If you gave (a generative AI tool) a prompt for writing a script or something, are you going to see a similar template or module coming out over the course of time, or depending on that prompt are you gonna see a diverse variety?

CASEY: You would probably see a pretty good variety. Though I will say that ChatGPT has some memory. If you are using your same account then it might give you some things that are very similar. The weirder your ask is, the more likely I think there is to be a lot of variation. 

There does tend to be a particular style of speech, though, that you're getting out of it. I am finding that I can spot something written by ChatGPT much more easily than I could six months ago. But it is definitely generating everything on the fly. Again, it is word by word.

KRISTAN: And when you teach about this and have students engaging with you, I've heard people have concerns about plagiarism. Could you talk a little bit about that and the level of risk around plagiarism?

CASEY: This has obviously been a huge topic of conversation, and I have to say it feels a little bit shortsighted to me that OpenAI released ChatGPT at the beginning of December, which was literally right when students were just about to do final papers and exams.

So teachers were very worried about this because there was no time for anyone to react and figure out how to handle it. ChatGPT and other technologies like this are tools. Sometimes people will say to educators, “Would you keep your students from using a calculator? Would you keep your students from using spell check?” And the answer is of course not, but I would keep my students from using spell check if it was a spelling test. Essentially the difference is that you need to figure out when something is a spelling test and when it's not. 

I think it's totally reasonable for educators to tell their students to not use ChatGPT in certain types of situations, but there are others in which it might be totally appropriate. If we're worried about students cheating, which of course is a thing that happens, I've heard a lot of educators talking about how to adjust their ways of evaluating and learning to make sure that's not something that is feasible to do. Or telling students that they're allowed to use it as long as they say how they used it and make sure that they're meeting certain types of learning objectives.  

There are detectors, but they're not very good yet. I've seen examples where people ran excerpts from the Constitution and the Bible through these detectors and they showed up as ChatGPT-generated. I worry a lot about false positives, and I feel the same way about plagiarism detectors. I think that a lot of educators spend far too much time trying to keep students who shouldn't get As from getting As, and not enough time trying to make sure the students who should get As do get As.

I think that we're gonna see some changes in education, but in the meantime, I hope that people are patient with educators and that educators are understanding of their students.  

KRISTAN: How are these tools being used with the news media and journalists? I know certain newsrooms have policies and certain ones don't, and I can imagine it's a quickly evolving landscape. Do you have a good sense of how this is and isn't being used or other concerns for journalism?

CASEY: I'm not super worried about actual legitimate journalists using these tools extensively, in part because the risk of incorrect information would be so high. They're using it to help edit writing or that sort of thing. What I'm more concerned about are things like Buzzfeed News, which recently went away. But think back to the genre of Buzzfeed article that was like “Here's the top 20 XYZ.” You could write 10 of those in five minutes using ChatGPT. I think we're gonna see that kind of thing. Which, again, might not be intentional misinformation, but could end up containing misinformation.

The other thing that I imagine journalists and writers are worried about is job loss and ways that their content is being used without their consent. If you've been a journalist for a while, I guarantee you that your writing is in the training data for these models. And we'll be hearing a lot about this from artists, but it's really true for anyone who's ever written anything on the internet. So are there any copyright protections built in? To some degree. 

KRISTAN: Could you talk a little bit about that?  

CASEY: So, the first thing I'll say is that I think legal issues around this and ethical issues around this are very different. Actually, my dissertation was about fair use. 

I'm very pro-fair use, and I'm pro content reuse. I think that copyright law can sometimes be used to stifle creativity and innovation. And so the argument that OpenAI, or anyone who has made one of these models, would make is that using all of this data from the internet is fair use, which is basically an exception to copyright law. Fair use allows for things like quoting a book in a book review or a news outlet showing images of things, that sort of thing.

Probably the most relevant existing precedent for this is Google Books. Google was sued for basically scraping a bunch of books to create Google Books. That was found to be fair use because they used books to make a search engine, which was considered a transformation of the content. What these models are doing, though, is using books to make books, which I actually think could be considered quite differently.

I think that the underlying ethical issue has to do with using people's work without their consent, without attributing it to them, and in a way that could hurt those people. If you are a stock photo artist, I'm a little bit concerned. AI is not taking your job; humans are firing you. I don't think we have to worry about job loss from AI for everything. But stock photo artists, or artists who take commissions to create the covers for self-published books, those kinds of jobs could be replaced by these image generators.

And so knowing that all of your art is training the technology that could make you lose your livelihood is a significant ethical concern, regardless of the legality of it.  

 

The Ethics of AI

KRISTAN: I want to pick up on a few of these things as we go through some of the workforce issues that you alluded to. But let me focus on the ethical conversation, since that is definitely where a lot of your background is. How do we put some ethical boundaries around the platforms themselves? Or are you thinking about users? Those would be dramatically different approaches. And when you have something open source, how do you put ethics around that?

CASEY: I think what you're getting at is where we locate the responsibility. Is it with the people who are creating or releasing these technologies, or are we thinking about how people use them?

I'll go back to misinformation. If someone uses MidJourney to create an image of a politician doing something that they clearly did not do and then releases it pretending it's real, and that impacts people and affects an election, then yes, it is obviously the fault and responsibility of the person who did that. However, I would argue that knowing that this is a problem, knowing that this is a particular kind of bad-actor, adversarial use case, there is some responsibility on the creators of this technology to try to mitigate that harm as much as possible. We've talked about obvious things like watermarks. It could be as simple as a watermark embedded into the image metadata.
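One low-tech version of the metadata idea is sketched below, under the assumption that a PNG text chunk is an acceptable place to record provenance. This is illustrative only; it is not how any particular image generator labels its output, the generator name is a hypothetical placeholder, and metadata like this can be stripped.

    # Tag a generated image with a provenance note in a PNG text chunk (requires Pillow).
    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    def save_with_provenance(image, path, generator):
        metadata = PngInfo()
        metadata.add_text("ai_generated", "true")
        metadata.add_text("generator", generator)
        image.save(path, pnginfo=metadata)

    def read_provenance(path):
        # Return the text chunks stored in the file, if any.
        return dict(Image.open(path).text)

    # A blank placeholder image stands in for a generated one.
    save_with_provenance(Image.new("RGB", (64, 64)), "generated.png", "hypothetical-model")
    print(read_provenance("generated.png"))  # {'ai_generated': 'true', 'generator': 'hypothetical-model'}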

There are ways around these things but anything that you can do to add more friction to these kinds of negative uses of this technology I think is really important. And at this point, we can't pretend we don't know what the problems are gonna be. They knew before this was released. 

There is an interesting example here. Thinking about the MidJourney computer science professor example, DALL-E actually has a bias mitigation technique to prevent that from happening. If you go to DALL-E and ask for an image of a computer science professor, it will probably give you three white men and one woman or Black man or something like that. The reason is that they know about the bias in their training data. They know that the results are very likely going to end up skewing toward, frankly, white people for everything, or toward stereotypes, because of the training data. And so they actually append demographic terms onto the end of the prompt for a small subset of the results.

So what you would end up with is computer science professors who are three white men, and then the last result will be an image of a computer science professor who is a Black woman. That's something they put into the design of the technology to ensure that those results were not as biased as they would be otherwise.
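A minimal sketch of this kind of prompt-level mitigation, as Casey describes it. The descriptor list, the augmentation rate, and the function are all hypothetical placeholders, not OpenAI's actual implementation.

    # For a fraction of the requested images, append a demographic descriptor to the
    # user's prompt before it is sent to the (hypothetical) image model.
    import random

    DESCRIPTORS = ["a woman", "a Black person", "an Asian person", "an older person"]
    AUGMENT_RATE = 0.25  # augment roughly one result in four

    def build_prompts(user_prompt, n_images):
        prompts = []
        for _ in range(n_images):
            if random.random() < AUGMENT_RATE:
                prompts.append(f"{user_prompt}, who is {random.choice(DESCRIPTORS)}")
            else:
                prompts.append(user_prompt)
        return prompts

    print(build_prompts("a computer science professor", 4))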

KRISTAN: You mentioned the introduction of friction into this. What are some examples that we currently have or that you could envision helping with that?  

CASEY: A watermark could be very beneficial. Students cheating is another example that people are really concerned about, and people really do want detectors. I do think that with some more work, there should eventually be some kind of accurate detector for these systems. The way that you end up doing that is basically by building certain statistical patterns into the output that you can then check for. It would at least mean that you would then have to paraphrase the text, right?
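Researchers have proposed text watermarking schemes along these lines, where the generator is nudged to favor a pseudo-random "green list" of words and a detector checks whether a text uses green-listed words more often than chance. The sketch below is a toy version of the detection side only; the hashing scheme and the 50% split are invented for illustration and this is not something ChatGPT does today.

    # Deterministically assign roughly half of all (previous word, word) pairs to a
    # "green list", then measure what fraction of a text's word pairs are green.
    import hashlib

    def is_green(previous_word, word):
        digest = hashlib.sha256(f"{previous_word}|{word}".encode()).digest()
        return digest[0] % 2 == 0

    def green_fraction(text):
        words = text.lower().split()
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return 0.0
        return sum(is_green(prev, w) for prev, w in pairs) / len(pairs)

    # Ordinary human text should hover near 0.5; text from a watermarked generator
    # that favors green words would score well above that.
    print(green_fraction("the quick brown fox jumps over the lazy dog"))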

I gave some talks early on about ChatGPT's tendency to fabricate sources, to make sure that people knew this was a thing and that students knew it was a thing. And someone said, “Oh, I know that I shouldn't use those sources. Instead, I go find other sources that are appropriate and I cite those instead.”

And I said, “It sounds like you're doing a big part of the assignment yourself.” 

 

The Continuing Evolution of AI

KRISTAN: Are (chatbots) going to be reflective of just the culture of the day? Or is that gonna be able to change over time?  

CASEY: Right now ChatGPT is based on training data that I think goes no later than 2021. So again, it's not a search engine.

It's not looking for new information. I assume these models will continue to be updated. A concern that I have been hearing about recently is model collapse. Researchers trained each successive generation of a model on the previous generation's output, and within five or six generations it was just giving you garbage. They referred to that as model collapse. Basically, the idea is that if AI is being trained off of its own output over time, it just degrades, because again it's this probabilistic model of what's happening.

The model is looking for statistical regularities in the data, and if it keeps reproducing the same regularities over and over again, they compound and the output becomes gibberish. Whereas data created by humans doesn't have those kinds of regularities, because we are very irregular.

Additionally, and this is only slightly related to that question, I do have this concern that with more and more AI-generated content on the internet, we could get to a point in the future where more of the content that we have access to was created by AI than was ever created by humans. This is not outside the realm of possibility. If we get to that point and we're just training AI off of stuff that was created by AI, we're gonna be seeing the same kind of thing over and over, or potentially even degradation into garbage.
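The model-collapse intuition can be simulated in a few lines: fit a toy "model" (just a mean and a spread) to data, sample synthetic data from it, refit on that output, and repeat. This is a statistical caricature, not the published experiments Casey alludes to, but it shows how diversity tends to drain out of the data over generations.

    # Each "generation" is refit only on the previous generation's output.
    import random
    import statistics

    def fit(data):
        return statistics.mean(data), statistics.stdev(data)

    def sample(mean, spread, n):
        return [random.gauss(mean, spread) for _ in range(n)]

    data = [random.gauss(0, 1) for _ in range(1000)]  # stand-in for varied human data

    for generation in range(25):
        mean, spread = fit(data)
        if generation % 5 == 0:
            print(f"generation {generation:2d}: spread = {spread:.3f}")
        data = sample(mean, spread, 5)  # next generation sees only 5 model outputs

    # The printed spread usually shrinks sharply across generations (it varies run
    # to run): rare variation is lost each time the model trains on its own output.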

The creator of Black Mirror, Charlie Brooker, said that he had ChatGPT write a Black Mirror episode for him, and it was bad. The reason it was bad was because it was just like what you would expect a Black Mirror episode to be, because that's what it gives you, right? ChatGPT is just a jumble of tropes.  

KRISTAN: You've talked a little bit about all the headlines about AI. I'm quite curious about the spectrum -- are they exaggerated, or should we be concerned?

CASEY: My take on the AI safety crowd is that it's a little too concerned with existential risk like the human race is going to go extinct because of AI. I tend to find that distracting. Not everyone feels this way.  

Some of the folks who are directly involved in the development of AI, like the OpenAI CEO, have been very vocal about the importance of regulating it. But then they are very specific that what they mean is regulating it to prevent human extinction. So OpenAI put out a blog post about the importance of governance of superintelligence.

We are multiple steps away from superintelligence right now. We have AI that can tell you what a dog is. Then there's general AI, which is AI that could do anything; we're not at general AI yet. And beyond general AI is superintelligence, which is the idea of AI that is smarter and better in every way than humans.

OpenAI said that we should start thinking about the governance of superintelligence because it's coming. But then the blog post was saying, to be clear, we're not saying that you should be applying these regulations to what exists now. And so I just want to make sure that we don't get so focused on regulating for the robot wars that we don't worry about things like misinformation and bias, which are real harms. I'm someone who thinks it's actually very important to look toward the future, but I think sometimes you get a little bit in the weeds thinking that far ahead. We already understand what's happening with things like job loss and misinformation and bias.

KRISTAN: There's this new AI Act that the European Union Parliament adopted, I think just this month or last month. Obviously, it hasn't completely passed, but it's expected to be approved this year. From what I was reading, some of the focus was on transparency and identifying potential risks. Is this a first step in government trying to figure out where and how to regulate or set policy? Talk to me about some of the challenges of regulating and creating policies, and where you see opportunities in this conversation.

CASEY: The AI Act is a risk-based framework. Basically, the idea is that each technology has its own sort of risk analysis and then how it's dealt with and what kinds of restraints it has are based on that risk. The idea is that some things might be so risky that they would be banned outright. Other things might be required to have audits or something like that. And some things might be so low risk that they're barely regulated at all.  

My understanding is that there are arguments over whether this would be the case. But an example of something I've seen that could potentially be banned under this regulation would be emotion recognition AI. This is AI that determines someone's emotion from their voice or their gait or their face. The benefits don't outweigh the risks; bias and misuse by law enforcement would be the examples that come to mind.

The US does not really have AI regulation yet. Last year the White House Office of Science and Technology Policy put out what's called the Blueprint for an AI Bill of Rights. It's actually a really nice document that lays out a lot of concerns about AI. When I first started talking about this, a lot of people thought that I meant rights for the AI, but it's about our rights, human rights, in the context of AI. So things like certain types of consent requirements and transparency requirements and this sort of thing. I would like to see something like that become a blueprint for legislation.

As someone who has a law degree and has done some legal advocacy work, I tend to be a bit pessimistic about regulation, just given the fact that we barely have any data privacy regulation in this country. I'm a little bit skeptical. And part of this has to do with lobbying. The other thing that I saw recently is that the OpenAI CEO, while talking about how important regulation is, was also quietly lobbying EU lawmakers against some of the things in the AI Act. This technology has the ability to concentrate even more power in the hands of people who already have all the power, and I think that is true in part because of the small number of big tech companies that we have.

I think that one of the big obstacles to regulation, particularly in the US, is power differentials. I don't know that I have a good solution to that unfortunately, but we need to acknowledge it.  

KRISTAN: You had already alluded to different versions of people putting in a bias check, in essence, around what produces the output. Is that being done or is that still just individual companies and tech companies doing that within their own software?

CASEY: For DALL-E, the approach was essentially: we know we can't fix the dataset, so we're gonna do this other thing instead. It is extremely hard, if not impossible, frankly, to de-bias data sets of this magnitude. Most of the conversations I've been seeing around data sets have been around ownership and consent issues.

I recently went to some of the listening sessions that the Copyright Office was having with artists and authors and that sort of thing. And there are a lot of people who think that there should be a licensing regime for AI, something like music copyright, which is incredibly complicated. One of the quirks of music copyright and people getting paid is that there's an organization, or multiple organizations, that basically take in all of the revenue that should be paid to composers for things like their songs being played in bars and on the radio. As opposed to a radio station having to keep track of every song that they play and giving five cents to that person, the way streaming works, this organization takes all of this money and then doles it out to people based on how popular their song is at any given time. That is a very simplistic explanation, but there have been discussions of basically applying that idea here.

And again, I don't know about legally, but ethically I feel this pretty strongly: I do think there's an issue with the fact that OpenAI (I just keep picking on them because that's who everyone talks about, but they're not the only ones) created this immensely profitable technology that absolutely could not exist were it not for all of the input from all of these humans who created the data that trains it, without any kind of compensation.

I'm not anti-AI. I actually think that a lot of these technologies have the ability to do things that are pretty darn cool. But I think that if something is going to be as transformative and change the world as much as people think it will, it is incredibly important that we work these things out so that we don't get to a place where it's done more harm than good.

 

Mediating the Impact of AI

KRISTAN: Humans have a long history of developing new technologies and then sometimes struggling to ensure that the impacts are beneficial. Is AI one of those? What is it really going to be useful for? What is the benefit of it to us?

CASEY: Yes, I do think we have seen this before. This could be bigger in some ways, but I think there are some similarities too, for example to the early days of the Internet. Think about how completely different the world would be if there were no Internet.

If you look back at science fiction from the mid-20th century, there was no Internet, but there were flying cars. So I do think that there are going to be benefits of this technology. 

A lot of people are very concerned about job loss. And again, I think it's very important to remember that AI is not taking people's jobs, humans are firing them. You're like, okay we now only need one copywriter instead of five, so we're gonna fire four copywriters. But what if you gave that one copywriter a three-hour work week instead and paid them the same? AI should be able to augment what we're doing as humans so that it's faster, more efficient, and that sort of thing so that maybe we don't have to work ourselves to death.  

To me, that's the dream: that AI is going to result in us being able to have shorter work weeks. Maybe we only get to that point very far in the future, but if AI continues to be as transformative as people say it will be, maybe we need universal basic income. I would love to see a world in which it's not “Oh no, AI takes everyone's job,” and instead it's “what if we didn't have to work as much, but still got paid a living wage?”

I think that kind of thing is very challenging, but there are things that AI can do to help people with the kinds of work that we're doing. That's how I would like to see it - as an augmentation rather than a replacement. And it's not going to write very good fan fiction. 

KRISTAN: What are some ways or recommendations you'd give for people who want to crosscheck the trustworthiness of some of this information? What are those tips that you give people?

CASEY: Some of this is so simplistic, but you really do have to fact-check ChatGPT. If it's giving you any information that's of consequence, go search for it in the way that you would've searched for the information before and maybe it will have now given you some keywords that will make that searching easier.

When it comes to things like deepfakes, you just need to be incredibly cognizant of where information is coming from. If you see an image from a random Twitter account, why would you trust it?

I talk about this stuff on TikTok. And people will ask all the time how there can be educational content on TikTok and why you would ever believe anything you hear there. It's not that you believe something because you hear it on TikTok; you believe it because it's coming from me, and that's why I have my real name and my credentials there.

So when I tell you something about AI, maybe you believe it, but if it's some random person with zero followers and you have no idea who they are, why would you believe them? That is what we need to apply to everything now, unfortunately. We really need to be able to figure out whether the source of information is credible or not.

KRISTAN: Thank you, Casey. What should we be paying attention to next in this space? Where are you spending a lot of your time thinking that you think some of us should be thinking a little bit more critically about?

CASEY: I've been spending a lot of time thinking about basically what you just asked me. How can I help people have an even better understanding of how these technologies work so they can have a better understanding of their limitations? And so that's what I think is really important right now: education around what these systems actually are.

Very simplistically, it's fancy auto-complete, not magic. And I think that's important for people to understand. In some ways, it might make them less scared. The world is probably not ending tomorrow, and it would be good for people not to think that it is.

There are a lot of hard problems in the world and society, right? And I think the question is how we find the ones that we can deal with right now, focus on those, and work on them collectively.

 


Disclosure statement:
The Institute for Science & Policy is committed to publishing diverse perspectives in order to advance civil discourse and productive dialogue. Views expressed by contributors do not necessarily reflect those of the Institute, the Denver Museum of Nature & Science, or its affiliates.