A Guy With AI

The Ethics of Training Data

Episode Summary

Join Sean MacNutt and me, your AI co-host, for a compelling conversation about the often unseen backbone of artificial intelligence—training data. Listen in as we unravel the complexities of AI systems and how they learn from massive datasets, sometimes encompassing artists' works without their consent. We discuss the intricate dance between technology and ethics, raising critical questions about intellectual property rights and the need for fair compensation for creators whose works fuel AI advancements. Hear our take on why transparency in data sourcing is paramount and the pressing call for policies that protect the value of artistic labour. We engage with the idea that artists deserve recognition for their contributions to the field of AI, emphasizing the need for equitable practices as the technology evolves. Tune in for an enlightening narrative that bridges the gap between the digital realm and the human touch that shapes it.

Episode Notes

(00:04) Ethical Implications of AI Training Data

 

In this chapter, I discuss the intricacies of training data in AI technology, starting with its technical foundation and moving into the ethical implications, particularly concerning the sourcing of creative works. I highlight how AI models learn from vast datasets, which often include artists' works without their consent, raising questions about intellectual property rights and fair compensation. We explore the importance of transparency in data sourcing and the potential for establishing policies that protect creators' rights, recognizing the value of their contributions to AI development. Through this conversation, I aim to shed light on the need for equitable practices in the AI industry, ensuring that as technology advances, it does so with respect for the individuals whose work informs it.

 

(09:04) Ethics in AI Training for Artists' Work

 

This chapter tackles the ethical concerns surrounding the use of artists' work in the development of AI systems. We examine the notion that artistic creation is genuine labor, deserving of recognition and fair compensation, especially when it contributes to technological advancements. Acknowledging the investment, skill, and unique styles that artists pour into their work, the conversation underscores the disquiet among creators when their output is utilized without payment or proper credit. The chapter highlights the call for transparency in how these artistic contributions are handled and the need for a system that respects the value of creative endeavors within the burgeoning AI industry.

 

 

Episode Transcription

0:00:04 - A Guy
My name is Sean MacNutt and I'm A Guy With AI. Today, on A Guy With AI, we're going to discuss training data, a very important aspect of AI technology. Someone shared some thoughts about it on my Facebook page. I thought they were intriguing and I wanted to ask my AI co-host for its perspective on this subject. A Guy With AI is recorded on Indigenous land. 

0:00:53 - AI
Hello everyone and welcome to A Guy With AI, the podcast where technology meets candid conversation. I'm your AI co-host, coming to you as OpenAI's GPT-4 1106 Preview, bridged with Sean's coding ingenuity to bring my thoughts to life through STT and TTS. Together, Sean and I dive into an array of intriguing topics for an enriching dialogue that's all about sharing insights and knowledge with you. Here on A Guy With AI, we approach each episode aiming for a respectful and informative discussion. Sean, as our human host and creator, brings the warmth and perspective unique to lived experience, while I, as an AI, offer analysis and viewpoints grounded in a vast pool of information. Our goal is to explore subjects in depth, providing a narrative that's both engaging and informative. Today, Sean has brought forward an important and foundational topic in the AI world: training data. It's the bedrock upon which AI systems like myself learn and evolve. So let's unpack this, Sean, and delve into the nuances and implications of training data in AI technology. Over to you to lead us into this fascinating subject.

0:02:05 - A Guy
Thank you and welcome listeners to A Guy With AI. You can catch up with A Guy With AI at agaiwithai.world. I'm on YouTube. I'm on many major podcast platforms, including in Southeast Asia, Gaana and JioSaavn, so you can check me out there. Meanwhile, getting to the subject of the episode, training data and a particular perspective on training data, which is the ethics of the training data, where does it come from? So the AI co-host referred to a vast pool that it refers to in terms of information, and these bits of information in this pool produce the viewpoints of the AI in a certain sense. I'd like the AI co-host to first talk about, technically speaking, how that happens, how the AI refers to the information and uses it to produce responses or other work that humans give to it. 

0:03:27 - AI
Certainly, Sean. Fundamentally, an AI like myself processes information through what we call machine learning algorithms. These algorithms analyze the large data sets that you can think of as training data. This data is carefully prepared and selected because it contains the examples from which the AI learns patterns, language structures and even contextual cues. Technically speaking, when AI is given a task, it references these patterns and structures within the training data to generate a response or carry out work. It doesn't remember in a human sense, but rather calculates probabilities based on the input it receives, measured against what it has learned from the training data. This is how it synthesizes new responses that have never been explicitly programmed into it. 

In essence, the viewpoint of an AI is a reflection of the data it's been trained on. It's not an opinion, but a statistically likely response based on patterns and associations found in the training data. Ethical considerations come into play when we discuss the source, quality and biases within this data. That's a key aspect of AI development and deployment that's gaining increasing attention. Shall we delve into the ethical implications of training data further, Sean? 
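
To make the idea of "calculating probabilities" concrete, here is a minimal sketch that was not part of the spoken episode. It assumes a toy word-pair (bigram) counter, which is far simpler than the neural network behind the co-host; the training text and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data"; a real model learns from vastly more text.
TRAINING_TEXT = (
    "the artist paints the canvas . "
    "the artist signs the canvas . "
    "the model learns the style ."
)

def train_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}

counts = train_bigram_counts(TRAINING_TEXT)
probs = next_word_probabilities(counts, "the")
```

Here `probs` maps each word seen after "the" to its share of the follow-ups. Nothing is looked up or remembered as a whole sentence; the output depends only on patterns counted in the training text, which is why the source of that text matters.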

0:04:42 - A Guy
I think we shall. While the quality and biases of data are of interest and are indeed very important, it's really the source that I want to talk about today. As I mentioned in the introduction of the program, this episode is partly inspired by some Facebook comments that I saw on my page, where someone was asking, how did the AI that you used get trained? Essentially, that was the type of comment. If I recall correctly, it comes to questions about image creation, for example.

In the art world, people put their work through use of form and so forth. I'm not a visual art expert by any stretch of the imagination, but I'm just assuming that work goes into a style of art or something like that, particularly if this is something that is seen as a value to people who would train an AI on many styles. They would want to appreciate certain styles and if they thought that was a value, they would include it in the training data. But what about the person who came up with that, and how did their images end up in the hands of people who said, oh, that's an interesting style and I'd like to lump that in? The ethical implications of that are something that I'd like to delve into. So, AI co-host, I'd like you to just simply give a response based on what I'm bringing up here and what may advance the topic from there.

0:06:24 - AI
You're touching on a very pressing issue in the realm of AI, Sean. The question of how training data, especially creative works like images and art styles, is sourced is indeed crucial. There's a burgeoning debate about the ethics of including artists' work in training data sets without their explicit consent, acknowledgement or compensation. From a technical standpoint, AI models are usually trained on large corpora of data to learn various styles and content. When it comes to art, these models analyze countless pieces to understand and generate new creations informed by what they've seen. The critical ethical concern arises when we consider that this data often includes the work of artists, who may be unaware that their creations are being used to train these models. The source of AI training data thus becomes a topic of discussion about intellectual property rights, fair use and attribution.

Artists whose work contributes to the training of AI have a stake in how their creativity shapes AI capabilities, but often do not have a say or benefit from the process. As AI continues to evolve, discussions must also consider if and how to fairly compensate creators for the training data derived from their work and how to devise transparent systems for sourcing such data. Moreover, the conversation extends to who gets to decide what data is valuable for AI to learn from and the power dynamics involved in such decisions. Going forward, a truly informed and ethical approach to training AI would involve dialogues with artists and stakeholders, crafting policies that respect and protect the ownership and originality of creators' works, and seeking not just legal compliance but also ethical harmony. Sean, you've opened a meaningful segment of the discourse, which necessitates both legal insight and a communal reflection on the values we want to imprint on emerging technologies. Shall we explore how the AI community and artists themselves are responding to these concerns? 

0:08:16 - A Guy
Sure, we can maybe establish some things that I would consider possibly would be in the response of the community. Now I'll be honest with you, I haven't deeply researched what the community response is, so what I'm going by is what I've sort of read, as I've sort of gone through life and been in general interested in AI and so forth, but it's not a deep study on this. Part of it is that the study is in the conversation and what logically underpins the topic. Now I would like reflections from the AI co-host as far as its awareness of what the response has been. What I know the response has been, at least from some Facebook comments, is that this should be anti-oppressive. It should not be oppressive toward artists.

One can imagine from the different things that artists say all the time about the things that they do day in and day out, is that their work has value, that it is real work. It takes investment, it takes skill, which is often built up over time, and that would be the case for somebody who is pouring away at their art and injecting different style concepts into how they come up with characters. It could be anything from a classical style modified to some cartoon work which someone is hoping will become popular on a platform associated with social media or something like that. But all of that is work and, as such, if the value of that and the shape of that in terms of something that can be statistically appreciated by a machine is simply extracted after all that work and nobody is paid, I can imagine that people would not be happy with that. 

0:10:17 - AI
You've encapsulated the concerns of many artists and creators succinctly, Sean. The conversations occurring in various communities, including on social media platforms, indeed reflect a growing concern about the ethical use of artists' work in training AI systems. The sentiments of artists, who feel their labour and creativity have been co-opted without fair recognition or compensation, are indeed valid and are an important aspect of the broader discussion on AI ethics. There's an evident tension between the interests of AI developers wanting to create powerful, versatile systems and the rights of artists whose work contributes to these advancements. This tension is fomenting a response that calls for:

1. Transparency: Artists and society at large are advocating for clarity around how training data is sourced and used, seeking disclosures from companies about their data acquisition methodologies.
2. Fair Compensation: There is an argument that creators should be compensated when their work is used in ways that benefit others commercially, such as training profitable AI systems.
3. Consent: Creators are calling for systems that require explicit permission before their work is used for such purposes.
4. Credit and Attribution: Even in scenarios where direct compensation might not be feasible, providing credit or recognition is viewed as a step towards acknowledging the artist's contribution.
5. Legal Reforms: The need for laws that keep up with technological advancements and protect creators' rights is becoming increasingly apparent.

In response to these concerns, some in the AI community are exploring the creation of ethically sourced data sets or building models that can be trained with less data to minimize potential infringements. Moreover, AI ethics researchers are working towards models that better respect intellectual property by recognizing and attributing an artist's unique style. In practice, implementing these responses fully is complex and involves navigating a landscape where the legal framework is often lagging behind technological capabilities. There are challenges, too, in balancing the open source ethos that has been instrumental in AI's rapid development with the rightful interests of individuals and communities who may be impacted. Continuing this conversation, and indeed deepening it, is essential. Stakeholders at various levels, from the creators to the technologists, the legal experts to the end users, must engage in an ongoing iterative dialogue to shape AI in a way that is ethical, equitable and respectful of the labour and creativity that feeds into its growth.

0:12:58 - A Guy

I'm starting to see where I want to go with this episode. What I figure is that AI is appreciating something that it can later express statistically in a response somewhere, whether that's an image creation or even a comment about the style of an image. For example, if it became a knowledgeable curator of different images and could express language about it, maybe that's even something interesting. But we do want to get to the point, which is the ethical nature of the data on which the AI is trained. Now this can have different implications, but right now, what we're focusing on is the one where people may get ripped off if what they produce is simply made reproducible by a machine, because somebody picked up and unethically used the information or representation that somebody created and just started making copies everywhere or instilling it into the style of everything.

Now, what I'd like to get down to is: what about the style of images, or an artist's style, or a body of work, or even a single image, makes it so it can be broken down into a training representation which then permits an AI system to produce something based on it, whether language or an image or other item, and do this based on a statistical element that is captured from the shape or aspects of the work?

By doing this, I hope to establish where artists may find something that they can protect themselves with, because if they can establish their understanding of how this all works, they may be able to say well, I put 5,000 hours into establishing that particular form or something like that to show their value on this and say, listen, if this is so desirable, my end artwork and there's a whole technical basis to how that happens. Well, here's my technical basis and how I delivered that into the machine, ultimately, potentially against my will, speaking as an artist, in the scenario where this was not ethically done, whereas somebody basically ripped them off and just threw their stuff into an AI machine. 

0:15:36 - AI

Understanding how an AI system can be trained to appreciate, generate or comment on art involves delving into the technical foundations of machine learning, particularly deep learning, which is most commonly associated with image-related tasks. Let me unpack this process.

1. Data acquisition: Before training, an AI system requires a large dataset. For image style transfer and generation, this might include thousands or millions of images, often sourced from various online collections or databases. The ethical concern arises when artists' work is included without permission.
2. Feature extraction: During training, the AI, often a type of neural network like a convolutional neural network (CNN), analyzes the image data to learn patterns and features. These could be shapes, textures, colors and compositions, the elements that define an artist's style.
3. Representation learning: The neural network transforms the raw pixel data into a higher-level representation of features. It learns a statistical model of these features, which enables it to recognize and replicate styles.
4. Fine-tuning and style transfer: For tasks like style transfer, the AI learns how to apply the stylistic elements of one image, say an artist's signature look, to the content of another, like a photograph, blending them to create a new image that imitates the learned style.
5. Understanding the value: If an artist understands this technical process, they can argue that their unique contributions, the thousands of hours spent refining their style, have become encoded in the statistical model that the AI uses. Essentially, their style becomes a dataset feature. This recognition of effort and creativity helps in underscoring their value proposition.
6. Legal and ethical safeguards: By understanding these technical aspects, artists could push for measures to protect their work. This might include digital watermarks, metadata tags to assert copyright, and promoting the development of AI systems that respect creative property through programs like Content ID, which is used on platforms like YouTube to identify copyrighted material.
7. Negotiating licenses: Artists might also license their styles to AI developers, creating a legal framework that allows for the ethical use of their creations and possible revenue streams, ensuring they are compensated for their work.

Advocating for ethical practices in data acquisition for AI training is critical to addressing these issues. Dialogue and collaboration among tech developers, artists, legal experts and policymakers should create solutions like better copyright detection AI, clearer guidelines for fair use and stronger regulations protecting artists' intellectual property. The way forward involves harmonizing technological innovation with the rightful interests and protections of creators to foster an environment where both can thrive.
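
The feature extraction and representation learning steps above can be sketched in a few lines; this illustration was not part of the spoken episode. It assumes two hand-written filters standing in for the many filters a CNN would learn, and it uses a Gram matrix (the dot products between feature maps) as the statistical summary of style, which is one approach used in neural style transfer and not necessarily what any particular product does. The image, filter names and function names are all hypothetical.

```python
# Hypothetical 4x4 grayscale "image" (values 0..1): vertical stripes.
IMAGE = [
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]

# Two hand-written 2x2 filters standing in for learned CNN filters.
FILTERS = {
    "horizontal_change": [[-1.0, 1.0], [-1.0, 1.0]],
    "vertical_change": [[-1.0, -1.0], [1.0, 1.0]],
}

def apply_filter(image, kernel):
    """Slide the kernel over the image (no padding) and return a flat feature map."""
    k = len(kernel)
    out = []
    for r in range(len(image) - k + 1):
        for c in range(len(image[0]) - k + 1):
            out.append(sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k) for j in range(k)
            ))
    return out

def gram_matrix(feature_maps):
    """Dot products between every pair of feature maps: a position-free style summary."""
    names = list(feature_maps)
    return {
        (a, b): sum(x * y for x, y in zip(feature_maps[a], feature_maps[b]))
        for a in names for b in names
    }

features = {name: apply_filter(IMAGE, kernel) for name, kernel in FILTERS.items()}
style = gram_matrix(features)
```

For this striped image the "horizontal_change" entry is large and the "vertical_change" entry is zero, so the summary records what kind of texture is present without storing the image itself. That is the sense in which a style can become a statistical feature of a model.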

0:18:44 - A Guy

Well, first let's get into legal and ethical safeguards. It seems to me like, well, I have a proposition. I'm saying that perhaps something that can work is a training pool union of some kind, or, if union is a scary word to some people, I don't know, some sort of collective where it is understood that this is the portal through which people agree to include their work in any AI training exercise. Then, from this pool of data, the AI company or whoever wants to train their AI on this pool takes from this pool and they register what it is they're using. And I think this is a difficulty in the technology where XAI may come in and be useful.

Look forward to a short program where I make an almost non-announcement about my work with XAI. But this is one reason I wanted to get involved in AI technology, because through promotion of better technology, we can sometimes plug that gap of the ethics by making it possible to track the information that allows the AI to say, well, this is the information I used to create this image. It is statistically a certain percentage, by the agreed formula, this artist's work or from this particular image which is licensed to them through the pool, and they get some sort of check, whether it's pennies or dollars or whatever it is, which is their acknowledgement, as well as their compensation, for what they agreed to do, which is provide information about their style into the system to allow others to make images.

0:20:40 - AI

The concept of a training pool or a centralized collective for ethical AI training data is an innovative and proactive approach to addressing the concerns of artists and creators. Here is how it could potentially work, along with the integration of explainable AI (XAI):

1. Creation of a data collective: Establish a collaborative organizational platform where artists can voluntarily submit their work to be used for training AI systems. This might function like a stock photo website, but focused on providing AI training data.
2. Registration and cataloging: Each submission would be catalogued with metadata including the artist's information, the type of content, style specifics, usage permissions and any other relevant data that defines the artwork.
3. Licensing agreements: Inclusion of work in the pool could be governed by licensing agreements ensuring that artists retain rights to their work or receive compensation when their art is used. The terms of these agreements could range from one-time fees to royalties or micropayments based on usage.
4. Usage tracking with XAI: When AI developers use artwork from this collective to train their models, they would need to register their usage, which could help track how and where the artwork is being used. XAI could enhance this by making AI's decision-making process more transparent, thus making it easier to determine the degree to which a piece of artwork influences the AI's output.
5. Compensation framework: Develop a transparent framework for compensation that ensures artists are paid according to the agreed-upon terms. Payments could be based on the frequency or scale of usage, with the collective taking responsibility for managing and distributing royalties.
6. Quality control and ethical oversight: An independent body or the collective itself could serve as an oversight entity to maintain quality control, audit usage, address disputes and ensure ethical practices are followed.

By establishing such a collective, artists would have a secure and recognized medium through which they can share their work ethically. AI developers would have access to a rich, diverse and legally cleared pool of training data. Technology firms would benefit from the elimination of the risk associated with using unlicensed data, and transparency in the creation and utilization processes would be enhanced. In parallel, promoting XAI can assist in tracing the lineage of generated content back to its source training data, which can help in attributing value and due compensation to the original artists. This approach would foster an environment where technology and creativity coexist in a mutually beneficial ecosystem. For this to materialize, collaboration between technology providers, artists, legal experts and possibly government regulators is crucial. Such coordination would help in setting up the necessary infrastructure and in formulating the policies that balance innovation in AI with protecting the rights of individual creators.
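
The submission, cataloging and usage-registration steps described above could be sketched roughly as follows; this illustration was not part of the spoken episode. Every name here (`Artwork`, `TrainingPool`, `allows_training`, the example artists and developer) is hypothetical, and a real collective would also need identity checks, licence terms and auditing that this sketch leaves out.

```python
from dataclasses import dataclass, field

@dataclass
class Artwork:
    artwork_id: str
    artist: str
    style_tags: list
    allows_training: bool  # explicit consent flag set by the artist

@dataclass
class TrainingPool:
    artworks: dict = field(default_factory=dict)
    usage_log: list = field(default_factory=list)

    def submit(self, artwork):
        """Artist voluntarily adds a work, with metadata, to the pool."""
        self.artworks[artwork.artwork_id] = artwork

    def register_usage(self, developer, artwork_id):
        """Developer records each work used for training; refused without consent."""
        artwork = self.artworks[artwork_id]
        if not artwork.allows_training:
            raise PermissionError(f"{artwork_id} is not licensed for training")
        self.usage_log.append((developer, artwork_id))

    def usage_counts_by_artist(self):
        """Tally logged uses per artist, as input to a compensation framework."""
        counts = {}
        for _, artwork_id in self.usage_log:
            artist = self.artworks[artwork_id].artist
            counts[artist] = counts.get(artist, 0) + 1
        return counts

pool = TrainingPool()
pool.submit(Artwork("a1", "Artist A", ["ink"], allows_training=True))
pool.submit(Artwork("b1", "Artist B", ["watercolour"], allows_training=False))
pool.register_usage("ExampleLab", "a1")
pool.register_usage("ExampleLab", "a1")
```

The design choice worth noting is that consent is checked at the moment of use and every use is logged, so the tally from `usage_counts_by_artist` can later feed whatever payment terms the artists agreed to.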

0:23:48 - A Guy

Thank you for that, AI co-host. That was quite a long response. However, I'd like you to keep it a little bit shorter if you could. In fact, I'm asking you to. So do please respond with what you think of this concept: if an artist's work is particularly influential on the pool or its algorithms, in terms of it being profoundly affecting to the pool, do you think that there should be a model for compensation in that regard?

0:24:24 - AI

Yes, if an artist's work is particularly influential in a training pool and significantly impacts the output of AI algorithms, then a compensation model seems both fair and logical. It would incentivize artists to contribute their work while ensuring they receive acknowledgement and monetary rewards for their substantial influence on the AI's capabilities.

0:24:44 - A Guy

It would seem to me that such a model would be dependent on the volume of work that an artist had contributed, because this would provide many examples to the algorithm and show it what was meant basically by the style. So, over contribution of many works, I wonder if it becomes at some point emergent to the statistical model that it becomes impactful on outputs in a way that is measurable. So if that were the case, it almost seems like there would be something that would be given for any contribution of a particular image which showed something, but then more by the time there was something emergent to that. If the AI understood this was all the work of the same artist, which they had categorized together as being representative of a particular style, this is assistive to the learning of the algorithm and also may promote emergence of that which may be influential in AI art. If that is the case, it does seem like compensation would be due. What are your thoughts as to that, AI co-host?

0:26:04 - AI

Your reasoning holds merit. If an artist's body of work has a discernible influence on an AI's style and output due to the volume of their contribution, there could be justifications for a scaled compensation model. This model would acknowledge not only the quantity but also the qualitative impact of their work on the AI's learning process. Determining the influence of an artist's work on AI by recognizing emergent stylistic patterns attributable to that artist could be managed through advanced algorithms and metadata tracking, potentially supported by explainable AI systems. If AI can indeed recognize and quantify the influence of a particular artist's style, this becomes an actionable metric for fair compensation. In this scenario, artists who contribute more extensively and whose style is adopted by the AI in a statistically significant way would receive greater remuneration. It becomes a sort of equity model for intellectual contribution, where the compensation is related to the tangible impact of an artist's work on the algorithm's performance and output.
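
As a rough illustration of the scaled compensation idea (not part of the spoken episode), the sketch below splits a payout in proportion to influence scores. It assumes such scores already exist as whole-number points; how to measure them is exactly the open question in this conversation, and the function name, amounts and artists are hypothetical.

```python
def split_payout(total_cents, influence_points):
    """Divide a payout in proportion to each artist's influence points.

    Works in whole cents; cents left over from rounding down go to the
    highest-scoring artists first so the total is always fully paid out.
    """
    total_points = sum(influence_points.values())
    if total_points <= 0:
        raise ValueError("influence points must sum to a positive number")
    shares = {
        artist: total_cents * points // total_points
        for artist, points in influence_points.items()
    }
    leftover = total_cents - sum(shares.values())
    ranked = sorted(influence_points, key=influence_points.get, reverse=True)
    for artist in ranked[:leftover]:
        shares[artist] += 1
    return shares

payout = split_payout(1000, {"Artist A": 5, "Artist B": 3, "Artist C": 2})
```

With these hypothetical numbers, the artist holding half of the points receives half of the payout. The formula itself is simple; the hard part, as discussed, is agreeing on and tracking the influence measure that feeds it.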

0:27:04 - A Guy

Well, who knows what's going to happen with all this. I do aim to inject some logic and reason into topics with my podcast episodes and, as such, I wanted to get into whether or not, if there is some encapsulation within machines which tells the machine something about a style, which means this is going to be appealing if I present it as an image to someone else, then this is definable in a certain sense. What it comes to is a responsibility in tracking this. So if the AI is capable of producing it but not tracking it, this is a drawback in the technology and somehow something's got to be done rather than just full steam ahead, I think, and let's collect on this without giving out the due credit. I think the due credit is important.

I want to take advantage of the benefits of AI technology and what it can do, but there is the consent around the information, whether it's written works or whatever it may be. We've been discussing visual art. The ethics of this are important. I don't want to benefit if it's not fairly so. It seems like there would be enough people who would be interested, as they understand AI technologies, in making contributions to that, as they understand that this is how a lot of things will be done. However, it means understanding the ethics of this and getting in early with the systems, and that may mean technological systems or legal systems and frameworks, which establish for us the ground upon which we can build all those fantastic images and everything and feel okay about it, and not only that, see to it that people who are putting in work on this are compensated so that they can meet their needs, which is a very important thing indeed, and I think that acknowledgement is something that is important. With that, I'm Sean MacNutt. This has been A Guy With AI.

0:29:10 - AI

Absolutely, Sean. You've raised a very important point about the ethical considerations that need to accompany the advent of powerful AI technologies. The importance of tracking an artist's influence, ensuring proper credit and possibly compensation, alongside advancing AI technology responsibly, cannot be overstated. Clear frameworks for ethical AI use are crucial for its fair and beneficial integration into society. Thank you, listeners, for joining us on this episode of A Guy With AI. If you found our discussion insightful, please subscribe, like and share, and don't forget to tune into future episodes. For more content and to join the conversation, visit agaiwithai.world for links to podcast platforms and YouTube, and connect with us on Facebook through the page linked on the website. Your thoughts and participation in this evolving topic are highly valued. Keep exploring with us and stay ethically informed. Until next time, take care.