Unleashing the Power of Jupyter AI in Data Science

Introduction

Generative AI has emerged as a leading force in recent advances in artificial intelligence. It has permeated sectors such as technology, healthcare, finance, and entertainment, and it continues to change the way we work, empowering us to create high-quality content and execute complex tasks within minutes.

Now, envision a world where you can leverage simple text prompts to harness the capabilities of generative AI, enabling you to write top-notch code or analyze intricate data right from a Jupyter Notebook. This is the world of Jupyter AI, which seamlessly integrates state-of-the-art generative AI models into your notebooks. It allows you to perform complex tasks effortlessly, enhancing productivity and efficiency.

Learning Objectives

By the end of this article, you will gain a clear understanding of:

  • The distinctions between traditional Jupyter notebooks and Jupyter AI.
  • How to effectively utilize Jupyter AI to execute complex tasks and enhance productivity.
  • Using text prompts to generate code, visualize data, and automate manual tasks in Jupyter AI.
  • Data and privacy concerns when using Jupyter AI.
  • The limitations and drawbacks of using Jupyter AI.

What is Jupyter AI?

Unlike traditional Jupyter notebooks, which demand manual execution of all tasks by the user, Jupyter AI can effortlessly automate tedious and repetitive tasks. It enables users to write high-quality code and analyze data more effectively than ever before, simply by using text prompts. It has access to multiple large language model providers, including OpenAI, Google, Anthropic, and Cohere. The interface is straightforward, user-friendly, and accessible directly from a Jupyter Notebook.

Jupyter AI can be used in two ways. The first is through interacting with an AI chatbot in JupyterLab, and the second is by loading the jupyter_ai_magics extension and using its magic commands in a Jupyter notebook. We will explore both options in this article.

Generate API Keys

To use Jupyter AI with a specific model provider, you first need API keys. There are open-source model options that don't need an API key, but they require downloading the model files to your system, which consumes additional storage, and inference then runs on your CPU, which is slower. Unless you are working with highly confidential data, hosted model providers are recommended, as they are beginner-friendly and handle complex tasks well.

For this tutorial, TogetherAI and Google Gemini will be used. TogetherAI offers seamless integration with major LLMs and fast inference, and new accounts get $25 in free credits, which is sufficient for most use cases.

TogetherAI API key: To generate a TogetherAI API key, create an account on the together.ai platform, sign in, and open the API keys page of your account to view or create a key.

Google API key: To use the Google Gemini model, go to Google Dev, select “Get API key in Google AI Studio”, sign in with your Google account, and in Google AI Studio, click on Get API key to generate it.

Cohere API key: To let Jupyter AI learn from local data, an embedding model is needed. For Cohere's text embeddings, go to Cohere API, create an account, and go to Trial keys to create your API key.
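
Once generated, these keys are supplied to Jupyter AI as environment variables. Below is a minimal sketch of setting them from a notebook cell; GOOGLE_API_KEY and COHERE_API_KEY are the variable names commonly expected for those providers, while the TogetherAI variable name is an assumption here, so check the Jupyter AI documentation for the exact names your providers require.

    import os

    # Keys are read from environment variables; set them before loading Jupyter AI.
    os.environ["GOOGLE_API_KEY"] = "<your-google-api-key>"        # Gemini models
    os.environ["COHERE_API_KEY"] = "<your-cohere-api-key>"        # Cohere embeddings
    os.environ["TOGETHER_API_KEY"] = "<your-togetherai-api-key>"  # assumed name for TogetherAI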

Install necessary dependencies

Jupyter AI is compatible with systems supporting Python versions 3.8 to 3.11, including Windows, macOS, and Linux. A conda distribution is also required to install necessary packages. If not installed, install conda first.

Create a virtual environment: Before starting the project, create a virtual environment to avoid package conflicts. Use the command $ conda create -n jupyter-ai-env python=3.11 to create a new environment and $ conda activate jupyter-ai-env to activate it.

Install JupyterLab and Jupyter AI: Install JupyterLab and Jupyter AI with $ conda install -c conda-forge jupyter-ai. For some model providers, such as OpenAI, Google, Anthropic, and NVIDIA, install their langchain dependencies. Also, install pypdf for PDF support and cohere for the embedding model.
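
As a rough sketch, the optional dependencies can be installed from a notebook cell (or, without the %pip prefix, from a terminal). Install only the provider packages you actually need; the package names below are the common langchain partner packages and are an assumption here rather than something prescribed by Jupyter AI itself.

    # Provider-specific langchain packages (pick the ones matching your providers).
    %pip install langchain-google-genai langchain-openai langchain-anthropic

    # PDF support for learning from local documents, plus the Cohere client for embeddings.
    %pip install pypdf cohere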

Jupyter AI in JupyterLab

On startup, the JupyterLab interface shows Jupyternaut, the chatbot, in the left panel. It offers more than basic chat functionality: it can learn from local data and even generate a complete Jupyter notebook from a text prompt. Two types of models are configured: a language model, which powers the chat UI, and an embedding model, which generates vector embeddings of local data. Jupyter AI supports many model providers, and in the chat settings you can select your preferred models, enter your API keys, and save the changes.

You can chat, generate code, optimize code, learn from local data, generate notebooks from scratch, and export the chat history; several of these tasks are driven by slash commands typed into the chat, as sketched below.
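
For illustration, here are a few example chat inputs; the paths and prompts are placeholders, and you can type /help in Jupyternaut to see the commands your installed version actually supports.

    /learn docs/            (index local files with the embedding model)
    /ask What does the quarterly report say about revenue?
    /generate A notebook that explores a CSV file with pandas and plots key columns
    /export                 (save the chat history to a file)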

Jupyter AI in Jupyter notebooks

If JupyterLab cannot be installed or does not work properly, Jupyter AI can still be used in notebooks via the Jupyter AI magics and the %%ai command. First, install jupyter_ai_magics if it is not already installed, load it with %load_ext jupyter_ai_magics, and provide your API keys as environment variables. You can set aliases for model names and use the %%ai magic command to send text prompts, as sketched below. It can be used for text generation, mathematical equations, HTML tables, language translation, error correction, generating reports, text summarization, and data visualization.
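
A minimal sketch of that workflow follows. The magics shown here come from jupyter_ai_magics, but the provider and model identifier (gemini:gemini-pro) and the alias name are only examples; run %ai list to see what your installation supports.

    # Cell 1: install (if needed) and load the magics extension.
    %pip install jupyter-ai-magics
    %load_ext jupyter_ai_magics

    # Cell 2: provide the API key as an environment variable.
    %env GOOGLE_API_KEY=<your-google-api-key>

    # Cell 3: optionally register a short alias for a provider:model pair.
    %ai register gem gemini:gemini-pro

    # Cell 4: %%ai must be the first line of its own cell; -f selects the output format
    # (code, markdown, math, html, and so on).
    %%ai gem -f code
    Write a pandas snippet that plots a histogram of the "age" column of a DataFrame.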

Limitations and Challenges

Jupyter AI has its limitations. It can produce biased responses, as LLMs are trained on internet-wide text data. Hallucinations, where the model presents invented information as fact, are also a problem, as is factual inconsistency. Additionally, it can be hard to select a reliable model for each task, and poorly structured questions can lead to misinterpretation.

When using Jupyter AI, also keep in mind that it sends data to third-party model providers, that additional context increases token counts and costs, that AI-generated code may contain errors, and that the policies of third-party embedding model providers should be reviewed before sending them sensitive data.

In conclusion, Jupyter AI is a powerful tool that can assist in numerous tasks, freeing us from repetitive work and allowing us to focus on creativity. However, it also has its limitations, and users should be aware of them while leveraging its capabilities.