Introduction
We are living in an era where artificial intelligence (AI) is rapidly evolving, and every day, the world around us becomes smarter. State – of – the – art large language models (LLMs) and AI agents can perform complex tasks with minimal human involvement. However, with such advanced technology, the need to develop and deploy it responsibly has become crucial. This article is based on Bhaskarjit Sarmah’s workshop at the DataHack Summit 2024 and focuses on building responsible AI, especially generative AI (GenAI) models. We will also look into the guidelines of the National Institute of Standards and Technology’s (NIST) Risk Management Framework for responsible AI development and deployment.
What is Responsible AI?
Responsible AI is about designing, developing, and deploying AI systems while prioritizing ethical considerations, fairness, transparency, and accountability. It addresses issues like bias, privacy, and security to avoid any negative impacts on users and communities. The goal is to ensure that AI technologies are in line with human values and societal needs. Building responsible AI is a multi – step process that involves implementing data usage, algorithm design, and decision – making guidelines, taking inputs from diverse stakeholders to fight biases, and continuously monitoring AI systems for unintended consequences.
Why is Responsible AI Important?
LLMs are trained on large datasets from the internet, which may contain copyrighted, confidential, and personally identifiable information (PII). As a result, generative AI models may use this information in illegal or harmful ways. There is also a risk of people tricking GenAI models into revealing PII. Additionally, as more tasks are automated by AI, concerns about bias, confidence, and transparency of AI – generated responses are increasing. For example, in sentiment analysis, if the training data of a GenAI model is biased, it will produce biased outputs, which is a major concern, especially in decision – making models.
The 7 Pillars of Responsible AI
In October 2023, US President Biden released an executive order stating that AI applications must be used safely, securely, and trustworthily. Following this, NIST has set strict standards for AI developers. The 7 pillars of responsible AI in the NIST Risk Management Framework are uncertainty, safety, security, accountability, transparency, fairness, and privacy.
Fixing the Uncertainty in AI – generated Content
AI models, including GenAI, are not always accurate and may produce hallucinated outputs. One way to address this is by introducing hallucination or confidence scores for each response. There are three ways to calculate a model’s confidence score: conformal prediction, entropy – based method, and Bayesian method.
Ensuring the Safety of AI – generated Responses
The safety of using AI models is a concern as LLMs may generate toxic, hateful, or biased responses due to the content in their training datasets. This can be fixed by introducing a safety score for AI – generated content.
Enhancing the Security of GenAI Models
Jailbreaking and prompt injection are threats to the security of LLMs. Hackers can trick models into revealing restricted information. To address this, a prompt injection safety score can be introduced during the development phase to identify potential security issues.
Increasing the Accountability of GenAI Models
AI developers must be responsible for copyrighted content regenerated by their models. For open – source models, more clarity is needed on who takes responsibility. NIST recommends that developers provide explanations for the content their models produce.
Ensuring the Transparency of AI – generated Responses
Different LLMs give different responses to the same prompt, raising questions about how they derive their responses. NIST urges AI companies to use mechanistic interpretability to explain LLM outputs, and interpretability can be measured using the SHAP test.
Incorporating Fairness in GenAI Models
LLMs can be biased as they are trained on human – created data. Biased AI decisions can be a big problem in tasks like hiring and loan processing. The solution is to ensure non – biased training data and implement fairness protocols.
Safeguarding Privacy in AI – generated Responses
AI – generated responses may contain PII, which is a privacy risk. Developers must protect user data by training LLMs to identify and not respond to prompts for such information.
What is Hallucination in GenAI Models?
Hallucination in GenAI models is when they create new, non – existent information that doesn’t match the user input and may contradict previous outputs or known facts.
How to Detect Hallucination in GenAI Models?
The most common method is to calculate the hallucination score using LLM – as – a – Judge, which involves comparing the model’s response with others generated by a judge LLM. Other methods include chain – of – knowledge, chain of NLI, context adherence, correctness, and uncertainty.
Building a Responsible AI
A responsible AI model should first check the prompt for toxicity, PII, jailbreaking attempts, and off – topic detections. If the prompt is safe, it should then check the generated response for interpretability, hallucination, confidence, fairness, and toxicity scores, and ensure no data leakages.
Conclusion
As AI becomes more integrated into our lives, building responsible AI is essential. The NIST Risk Management Framework provides important guidelines to address GenAI model challenges. Implementing these principles will make AI systems safe, transparent, and equitable, fostering trust and mitigating risks.