Building a Product Discovery API with Google Gemini Pro Vision and FastAPI

Introduction

The capabilities of Generative AI models are expanding rapidly, opening up numerous business opportunities around GenAI. Modern multimodal models such as GPT-4 and Gemini can not only generate text but also interpret image data. This ability has great potential in the business world: for example, you can hand the model an image and get structured information about it back with very little effort. In this article, we’ll use the Gemini Pro Vision multimodal model to extract product information from an image and then create a FastAPI-based REST API to access the extracted information. Let’s start building a product discovery API.

Learning Objectives

By the end of this article you should understand what REST architecture is, how to use REST APIs to access web data, how to develop REST APIs with FastAPI and Pydantic, the steps to build APIs using Google Gemini Pro Vision, and how to use the LlamaIndex library to access Google Gemini models.

What is a REST API?

A REST API, or RESTful API, is an application programming interface that follows the design principles of the Representational State Transfer architecture. It helps developers integrate application components in a microservices architecture. An API is a means for an application or service to access resources within another service or application. Consider a restaurant analogy. A restaurant owner has two services in operation: the kitchen, where food is prepared (like a server producing data), and the seating area, where customers eat. Customers (clients) check the menu (API) and place orders (requests) to the kitchen (server) using specific codes (HTTP methods) such as “GET”, “POST”, “PUT”, or “DELETE”. The “GET” method is like browsing the menu before ordering (reading data without changing it), “POST” is for placing an order (the server creates data), “PUT” is for updating an existing order (updating data), and “DELETE” is for canceling an order (deleting data).
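The analogy can be made concrete with a small sketch. It uses only the Python standard library; the in-memory `orders` store and the `handle` function are hypothetical names invented for illustration and do not come from any framework. The point is simply how each HTTP method maps to one operation on a resource.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory "kitchen": order id -> order details.
orders: Dict[int, dict] = {}
_next_id = 1


def handle(method: str, order_id: Optional[int] = None,
           body: Optional[dict] = None) -> Tuple[int, object]:
    """Map an HTTP-style method to a create/read/update/delete action.

    Returns a (status_code, payload) pair, mirroring an HTTP response.
    """
    global _next_id
    if method == "GET":
        if order_id is None:
            return 200, list(orders.values())   # read the whole collection
        if order_id in orders:
            return 200, orders[order_id]        # read one resource
        return 404, {"detail": "order not found"}
    if method == "POST":
        new_id = _next_id
        _next_id += 1
        orders[new_id] = {"id": new_id, **(body or {})}
        return 201, orders[new_id]              # resource created
    if method == "PUT":
        if order_id not in orders:
            return 404, {"detail": "order not found"}
        orders[order_id] = {"id": order_id, **(body or {})}
        return 200, orders[order_id]            # resource replaced
    if method == "DELETE":
        if orders.pop(order_id, None) is None:
            return 404, {"detail": "order not found"}
        return 204, None                        # resource deleted, no body
    return 405, {"detail": "method not allowed"}
```

A web framework such as FastAPI performs this kind of dispatch for you, based on the method and path declared on each endpoint.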

What is the FastAPI framework?

FastAPI is a modern, high-performance web framework for API development. It is built on Starlette for the web parts and Pydantic for data validation and serialization. Because it is ASGI-based and supports asynchronous programming, it can handle high-concurrency scenarios efficiently. It validates incoming data with Pydantic and generates API documentation automatically, serving an interactive Swagger UI backed by a standard OpenAPI JSON schema. It also integrates easily with other Python libraries and frameworks.

What is LlamaIndex?

LlamaIndex serves as a bridge between your data and LLMs. The LLMs can run locally (for example, through Ollama) or be accessed through API services such as OpenAI and Gemini. LlamaIndex can be used to build various LLM-based systems such as Q&A, chat applications, and intelligent agents. It enables Retrieval Augmented Generation in three steps: Knowledge Base (Input), Trigger/Query (Input), and Task/Action (Output). In this article, we’ll focus on the second and third steps, using an image as the input query and retrieving product information from it as the output.

Setup project environment

We’ll use conda to set up the project environment. First, create a conda environment named “api-dev” with Python 3.11, then activate it. Next, install the necessary libraries, including LlamaIndex, the Google generative AI libraries, and FastAPI. Finally, obtain a Gemini API key from Google AI Studio and keep it safe, as we’ll need it later.

Implementing REST API

Create a project folder named “gemini_productAPI”. To check that FastAPI is installed correctly, create a “main.py” file with a simple “Hello World” endpoint. FastAPI is an ASGI framework, so we’ll use Uvicorn, an ASGI web server implementation for Python, to run the application. After confirming the basic setup, import the required libraries, create a “.env” file to store the Google Gemini API key, instantiate the FastAPI class, and load the API key. Then create a simple landing page using a GET method and Jinja templates for rendering HTML, and link the template and static directories to the FastAPI application.

Implementing an Information Extraction Function

Create a function named “gemini_extractor” that uses the Google Gemini Pro Vision model to extract product information from an image. This function relies on LlamaIndex’s GeminiMultiModal integration. We also engineer a prompt that instructs the model to extract specific product-related information such as name, color, category, and description. Because a generative model can return malformed or unexpected output, we use Pydantic to define data models for products, extracted product responses, and image requests, so that every response is validated against a known schema.
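The sketch below shows one way these pieces could be laid out. The field names, the `ImageRequest` shape, and the prompt wording are assumptions made for illustration. The LlamaIndex import paths and the `MultiModalLLMCompletionProgram` usage reflect one packaging of the library and may differ in the version you have installed, so treat them as assumptions to check against its documentation. The imports sit inside the function so the models can be used on their own.

```python
import os
from typing import List

from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    # Assumed fields, mirroring what the prompt asks the model to extract.
    id: int
    product_name: str
    color: str
    category: str
    description: str


class ExtractedProductsResponse(BaseModel):
    products: List[Product]


class ImageRequest(BaseModel):
    url: str  # assumed: location of the image to analyse


# Assumed prompt wording; adjust it for your own catalogue.
PROMPT_TEMPLATE = (
    "You are a product information extractor. Look at the image and list "
    "every product you can see, giving its name, color, category, and "
    "a short description."
)


def gemini_extractor(model_name, output_class, image_documents,
                     prompt_template_str):
    """Ask Gemini for output that is parsed into `output_class`."""
    # Assumed import paths; they have moved between LlamaIndex releases.
    from llama_index.core.output_parsers import PydanticOutputParser
    from llama_index.core.program import MultiModalLLMCompletionProgram
    from llama_index.multi_modal_llms.gemini import GeminiMultiModal

    llm = GeminiMultiModal(
        model_name=model_name,
        api_key=os.getenv("GOOGLE_API_KEY"),  # assumed variable name
    )
    program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_class),
        image_documents=image_documents,
        prompt_template_str=prompt_template_str,
        multi_modal_llm=llm,
    )
    return program()  # parsed into output_class, or raises on bad output
```

Passing `ExtractedProductsResponse` as `output_class` means a response with a missing or mistyped field fails validation instead of silently reaching the API’s clients.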

Creating Extracted Product API endpoint with POST method

Create a POST endpoint “/extracted_products” in the FastAPI application. This endpoint takes an image request, calls the “gemini_extractor” function to extract product information, and stores the response in an in-memory list. Because the list is lost when the server restarts, you can add database logic to store the responses permanently; MongoDB is a good choice for storing JSON-formatted data.

Sending an image request from the OpenAPI docs

Open the OpenAPI docs at http://127.0.0.1:8000/docs in your browser. Expand the “/extracted_products” section, click “Try It Out”, and then “Execute” to extract product information from an image using the Gemini Pro Vision model.
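The same request can be sent from a script instead of the browser. The sketch below uses only the standard library; the JSON body with a single `url` field is an assumption about the request model, and `build_extract_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_extract_request(image_url: str) -> urllib.request.Request:
    # Assumed body shape: a JSON object with a single "url" field.
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extracted_products",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    # Only works while the FastAPI app is running locally.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_extract_request("https://example.com/mug.jpg"))` would return the endpoint’s JSON response as a dictionary; the image URL here is a placeholder.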

Creating a product endpoint with a GET method for fetching the data

Create a GET endpoint “/api/products/” to fetch the extracted data stored in the list (or database). Other applications can consume this JSON data for various purposes, such as building e-commerce sites.

Conclusion

This is a simple way to access and use the Gemini Multimodal Model to create a basic product discovery API. You can build on this to create a more robust product discovery system. Such applications have significant business potential, like an Android app that uses the camera API and Gemini API to extract product information for direct product purchases.

Frequently Asked Questions

Q1: How to use LlamaIndex for different models?
A: LlamaIndex has default OpenAI access, but for other models, install the model-specific libraries using pip.

Q2: How to use frontend frameworks such as NextJS, Vite, and React with FastAPI?
A: Create separate frontend and backend directories in the FastAPI root and link them.

Q3: Which database will be good for storing responses?
A: Document databases like MongoDB are suitable for storing JSON-formatted responses.