Amazon Bedrock Part 1 : RAG with Pinecone Vector Store, Anthropic Claude LLM and LangChain— Income Tax FAQ Q&A bot

Diptiman Raichaudhuri
8 min read · Oct 2, 2023


Disclosure: All opinions expressed in this article are my own, and represent no one but myself and not those of my current or any previous employers.

This is Part 1 of my Amazon Bedrock series.

Amazon Bedrock went GA a few days ago and it gave me the opportunity to quickly test out Bedrock APIs and Claude v2, a powerful LLM from Anthropic.

I could not wait to try out a RAG (Retrieval Augmented Generation) experiment.

My test case was simple:

  1. I took the IT Returns (ITR 1 and ITR 2) FAQ PDFs from the Income Tax India website.
  2. Loaded the PDFs, split the text into chunks and created embeddings for the chunks using Bedrock embeddings.
  3. Stored these embeddings in Pinecone.
  4. Then I used Claude with Amazon Bedrock and ran some LangChain QA chain based invocations.

I created a SageMaker Studio Jupyter notebook to run my code. If you are using SageMaker Studio for the first time, you need to create a SageMaker domain by clicking the Create Domain link under Domains in the left navigation menu. I chose Quick Setup, which is ideal for a quick experiment:

SageMaker Domain

It took about 2–3 minutes for my domain to be created. I then clicked on the domain and launched the SageMaker Studio console:

SageMaker Studio Launch

I got a fresh notebook in SageMaker Studio, launched with the Data Science 3.0 kernel on an ml.t3.medium SageMaker managed notebook instance:

Now, let’s set up Amazon Bedrock. We’ll come back to the notebook once all the Bedrock model access is in place.

Go to the Bedrock service home page from AWS console and the following landing page will show up :

Click on Model Access link on the left navigation menu.

It will show the list of models available with Bedrock, as of today.

Click on “Edit”, select all options (for Claude, you will have to fill in a form with the reason for usage etc.) and click the “Save Changes” button.

I got my access to these models almost immediately, as shown below :

Amazon Bedrock Model Access

Once access to these models is granted, you will also receive emails at the account email address with the pricing terms, product description etc. The image below shows the one for Claude v2:

Bedrock Model Marketplace Offer

Similarly, for the other models, like Cohere, AI21 Labs Jurassic, Stable Diffusion etc., you would receive emails from AWS Marketplace:

Bedrock Model Marketplace Offer

Another important step is to click the Base Models link on the left-nav menu:

Bedrock Base Models

And then click on Claude v2, to note the model API request pattern :

Bedrock Claude API Request

The modelId, contentType, accept and prompt patterns will be needed once we start writing the code with Bedrock.

With Bedrock setup completed, let’s complete the Pinecone registration.

For this example, I store the embeddings created from the PDFs in the Pinecone vector DB. For that, I need to register on the Pinecone website.

I logged on to pinecone.io and created my profile, and got the following authentication information :

PINECONE_API_KEY and PINECONE_API_ENV :

Pinecone API Key

I also granted Bedrock invocation privilege to my SageMaker domain IAM role :

SageMaker Domain
SageMaker Execution Role
SageMaker execution roles

For my quick experiment, I granted all privileges to my SageMaker role. Ideally, in a production setup, you should grant only the Bedrock access the notebook needs, for example an Allow on "Action": "bedrock:*" (or narrower actions such as bedrock:InvokeModel), and add a trust relationship on "bedrock.amazonaws.com". This is well documented here, in the section Add Amazon Bedrock permissions to the IAM role for this SageMaker notebook.
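
As a narrower alternative to a wildcard, here is a minimal sketch of an identity policy that only allows invoking the two models this article relies on. The statement ID, the region and the choice of model IDs are my assumptions for illustration; adjust them to your own account before using anything like this.

```python
import json

# Assumption: the notebook only needs to invoke Claude v2 and the Titan
# embeddings model in us-east-1; adjust region and model IDs as needed.
REGION = "us-east-1"
MODEL_IDS = ["anthropic.claude-v2", "amazon.titan-embed-text-v1"]


def build_invoke_only_policy(region, model_ids):
    """Return an IAM policy document that only allows model invocation."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeSelectedBedrockModels",  # hypothetical name
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }


policy = build_invoke_only_policy(REGION, MODEL_IDS)
print(json.dumps(policy, indent=2))
```

A policy this narrow would not cover control-plane calls such as listing foundation models, which need their own action (bedrock:ListFoundationModels).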

With Bedrock and Pinecone setup done, let us go back to our notebook in SageMaker Studio to start writing the RAG code.

With everything in place, I took the wonderful Bedrock workshop notebooks from GitHub (here) and modified the RAG with Pinecone one to use Claude v2 as the LLM, with different chunk sizes etc.

Let’s go back to the notebook that we launched and install dependencies.

I have attached the completed notebook at the end of this article.

%pip install boto3==1.28.57 botocore==1.31.57 langchain==0.0.305 

This installs/updates boto3, botocore and LangChain to versions that support Bedrock. Bedrock was launched very recently, so these dependencies have to be upgraded to recent versions.

You could also run the following :

%pip install --upgrade boto3 botocore langchain

Which will install the latest versions of these dependencies.
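
After either install command, it can help to confirm which versions actually ended up in the kernel. A small sketch using only the standard library; the helper name installed_version is mine, not part of any of these packages.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the packages this notebook depends on.
for name in ("boto3", "botocore", "langchain"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

In a notebook, remember to restart the kernel after upgrading so that the new versions are the ones imported.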

Now, the next set :

%pip install faiss-cpu pypdf pinecone-client apache-beam datasets tiktoken

The above installs the remaining dependencies, such as the Pinecone client and tiktoken.

Then we import dependencies :

import boto3
import json
import os

from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.vectorstores import Pinecone
import pinecone
from tqdm.autonotebook import tqdm
import numpy as np

Then we create the Bedrock runtime :

bedrock_runtime = boto3.client(
    service_name = "bedrock-runtime",
    region_name = "us-east-1"
)

Note that the above code creates a Bedrock runtime client, which is used to call invoke_model etc. For any control-plane operation, we need to instantiate a bedrock client like below:

bedrock = boto3.client(
    service_name = "bedrock",
    region_name = "us-east-1"
)

With a bedrock reference, we could invoke control-plane operations like bedrock.list_foundation_models(), which returns the list of foundation models available in the region.
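
To make the control-plane call concrete, here is a small sketch that filters the list_foundation_models() response by provider. It relies on the modelSummaries, modelId and providerName fields of the response; FakeBedrockClient is a hypothetical stand-in so the example runs without AWS credentials.

```python
def model_ids_for_provider(bedrock_client, provider_name):
    """Return model IDs from list_foundation_models() for one provider."""
    response = bedrock_client.list_foundation_models()
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if summary.get("providerName", "").lower() == provider_name.lower()
    ]


class FakeBedrockClient:
    """Offline stand-in for boto3.client("bedrock"), for illustration only."""

    def list_foundation_models(self):
        return {
            "modelSummaries": [
                {"modelId": "anthropic.claude-v2", "providerName": "Anthropic"},
                {"modelId": "amazon.titan-embed-text-v1", "providerName": "Amazon"},
            ]
        }


print(model_ids_for_provider(FakeBedrockClient(), "anthropic"))
```

In the notebook, passing the real bedrock client instead of the fake one should work the same way.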

For the rest of the notebook, I will use the bedrock_runtime reference.

Next, I create the modelId and other relevant pointers, which I copied from the Base Models left-nav menu of the Bedrock page:

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'
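
LangChain hides the raw request from us, but it is useful to see how these values fit together in a direct invoke_model call. A minimal sketch, assuming the Claude v2 text-completion request format (prompt with Human/Assistant markers, max_tokens_to_sample); the helper names and default parameter values are mine.

```python
import json

MODEL_ID = "anthropic.claude-v2"


def build_claude_request(question, max_tokens=300, temperature=0.0):
    """Build keyword arguments for bedrock_runtime.invoke_model()."""
    body = {
        # Claude v2 expects the Human/Assistant turn markers in the prompt.
        "prompt": f"\n\nHuman: {question}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
    }
    return {
        "modelId": MODEL_ID,
        "accept": "application/json",
        "contentType": "application/json",
        "body": json.dumps(body),
    }


def extract_completion(response_body_text):
    """Pull the generated text out of the JSON response body."""
    return json.loads(response_body_text)["completion"]


request = build_claude_request("What is ITR-1?")
print(request["body"])

# With a real client (not run here):
# response = bedrock_runtime.invoke_model(**request)
# print(extract_completion(response["body"].read()))
```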

Then I created a data directory to download the ITR-1 and ITR-2 FAQ PDFs.

from urllib.request import urlretrieve

os.makedirs("data", exist_ok=True)
files = [
"https://incometaxindia.gov.in/Supporting%20Files/ITR2021/Instructions_ITR1_AY2021_22.pdf",
"https://incometaxindia.gov.in/Supporting%20Files/ITR2021/Instructions_ITR2_AY2021_22.pdf"
]
for url in files:
    file_path = os.path.join("data", url.rpartition("/")[2])
    urlretrieve(url, file_path)
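
When re-running the notebook, it is handy to skip files that are already on disk. A small sketch of that idea; local_path_for and download_if_missing are hypothetical helper names, and only the path mapping is exercised here, so nothing is downloaded.

```python
import os
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def local_path_for(url, data_dir="data"):
    """Map a download URL to a file path inside data_dir."""
    file_name = os.path.basename(unquote(urlsplit(url).path))
    return os.path.join(data_dir, file_name)


def download_if_missing(url, data_dir="data"):
    """Download url into data_dir unless the file is already there."""
    os.makedirs(data_dir, exist_ok=True)
    file_path = local_path_for(url, data_dir)
    if not os.path.exists(file_path):
        urlretrieve(url, file_path)
    return file_path


url = "https://incometaxindia.gov.in/Supporting%20Files/ITR2021/Instructions_ITR1_AY2021_22.pdf"
print(local_path_for(url))
```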

And I confirm that the files were downloaded in the data directory :

KnowledgeBase File Download

These two files are my knowledge base source, which I am going to load, chunk and embed, and then store in Pinecone.

Next task is to load these 2 PDFs and use the RecursiveCharacterTextSplitter to create chunks from these files :

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFDirectoryLoader

# Load every PDF in the data directory; each page becomes one document.
loader = PyPDFDirectoryLoader("./data/")

documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    # Chunks of up to 2000 characters, with no overlap between neighbours.
    chunk_size=2000,
    chunk_overlap=0,
)
docs = text_splitter.split_documents(documents)
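
To build some intuition for chunk_size and chunk_overlap, here is a deliberately simplified sketch: a fixed-width sliding window over characters. This is not how RecursiveCharacterTextSplitter works internally (it prefers to break on separators such as paragraph and line breaks); it only illustrates what the two parameters control.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(split_fixed("abcdefghij", 4))
print(split_fixed("abcdefghij", 4, chunk_overlap=2))
```

With zero overlap, as in this article, a sentence that straddles a chunk boundary is cut in two; a small overlap is a common way to soften that, at the cost of storing more vectors.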

Now that my documents are chunked and ready, let’s set up Pinecone:

Log in to the pinecone.io website and, from the API Keys link in the left nav menu, copy the API Key and the Environment:

Pinecone API Key

Create two environment variables or, for production workloads, load the secrets from a .env file with python-dotenv:

os.environ["PINECONE_API_KEY"] = "<YOUR_PINECONE_API_KEY>"
os.environ["PINECONE_API_ENV"] = "<YOUR_PINECONE_ENV>"
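
Before calling Pinecone, a quick check that both settings are really present can save a confusing authentication error later. A minimal sketch with two hypothetical helpers, missing_env and require_env, that never print the secret values themselves.

```python
import os


def missing_env(names):
    """Return the names that are unset or empty in the environment."""
    return [name for name in names if not os.environ.get(name, "").strip()]


def require_env(name):
    """Return the value of an environment variable, or fail loudly."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


missing = missing_env(["PINECONE_API_KEY", "PINECONE_API_ENV"])
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("Pinecone settings found")
```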

Let’s initialize the Pinecone client and choose a name for the Pinecone index:

pinecone.init(
    api_key = os.environ.get('PINECONE_API_KEY'),
    environment = os.environ.get('PINECONE_API_ENV')
)

index_name = "itrsearchdx"

Next, we create a Bedrock LLM reference using the Claude v2 modelId created earlier. Also, I created a BedrockEmbeddings reference :

llm = Bedrock(
    model_id=modelId,
    client=bedrock_runtime
)
bedrock_embeddings = BedrockEmbeddings(client=bedrock_runtime)

The bedrock_embeddings will be used for Pinecone, while the llm will be used later for creating the LangChain Q&A chain.
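
A vector index stores vectors of one fixed length, so it is worth checking how long the vectors from the embeddings model are before creating an index for them. A small sketch, assuming the embeddings object exposes embed_query(text) as LangChain embeddings classes do; FakeEmbeddings is a hypothetical offline stand-in.

```python
def embedding_dimension(embeddings, probe_text="dimension probe"):
    """Return the vector length produced by an embeddings object."""
    return len(embeddings.embed_query(probe_text))


class FakeEmbeddings:
    """Offline stand-in that mimics the embed_query() method."""

    def __init__(self, size):
        self.size = size

    def embed_query(self, text):
        # A real model returns learned values; zeros are enough for the check.
        return [0.0] * self.size


print(embedding_dimension(FakeEmbeddings(1536)))
```

In the notebook, embedding_dimension(bedrock_embeddings) would make one real embedding call and should report the size to use for the index.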

Next I create the Pinecone index through the Pinecone client reference :

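import time

# Start from a clean slate: drop the index if it already exists.
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

# 1536 matches the vector size of the default Bedrock Titan embeddings model.
pinecone.create_index(name=index_name, dimension=1536, metric="dotproduct")
# wait for index to finish initialization
while not pinecone.describe_index(index_name).status["ready"]:
    time.sleep(1)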
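
The wait loop above runs until the index reports ready, however long that takes. A hedged variant with a timeout is sketched below; wait_until is a hypothetical helper, and the readiness check is faked so the example runs offline.

```python
import time


def wait_until(is_ready, timeout_seconds=120.0, interval_seconds=1.0):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)


# Offline demo: a fake readiness check that succeeds on the third call.
calls = {"count": 0}


def fake_ready():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(fake_ready, timeout_seconds=5.0, interval_seconds=0.01))

# In the notebook (not run here):
# wait_until(lambda: pinecone.describe_index(index_name).status["ready"])
```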

I confirmed that the Pinecone index was created:

Pinecone Index

All done! Let’s load the chunks into the Pinecone index:

docsearch = Pinecone.from_texts(
    [t.page_content for t in docs],
    bedrock_embeddings,
    index_name = index_name
)
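
Passing only page_content means the index no longer knows which PDF and page a chunk came from. A sketch of how the metadata could be carried along, assuming each document has page_content and metadata attributes as LangChain documents do; texts_and_metadatas is a hypothetical helper and the sample documents are made up.

```python
from types import SimpleNamespace


def texts_and_metadatas(documents):
    """Split documents into parallel lists of texts and metadata dicts."""
    texts = [doc.page_content for doc in documents]
    metadatas = [dict(doc.metadata) for doc in documents]
    return texts, metadatas


# Made-up stand-ins for the chunked documents, so the sketch runs offline.
sample_docs = [
    SimpleNamespace(page_content="ITR-1 text", metadata={"source": "data/a.pdf", "page": 0}),
    SimpleNamespace(page_content="ITR-2 text", metadata={"source": "data/b.pdf", "page": 3}),
]

texts, metadatas = texts_and_metadatas(sample_docs)
print(metadatas[0])

# In the notebook (not run here), from_texts also accepts a metadatas list:
# docsearch = Pinecone.from_texts(
#     texts, bedrock_embeddings, metadatas=metadatas, index_name=index_name
# )
```

With the source and page stored next to each vector, an answer can later be traced back to the page it was drawn from.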

Let’s check the Pinecone console :

Pinecone Index 1
Pinecone Index 2

Perfect! We have a fully loaded Pinecone index storing embeddings of the content extracted from the ITR-1 and ITR-2 FAQ PDFs!

Pinecone as a knowledge base for Bedrock is in preview, and we will be able to create a managed Pinecone knowledge base once this feature goes GA. Read here.

We also have the llm reference ready with a Claude v2 foundation model for the LangChain QA chain to start querying.

Let’s prepare the LangChain QA chain:

from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm, chain_type = "stuff")
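
The "stuff" chain type simply stuffs all the retrieved chunks into a single prompt together with the question. A rough sketch of that idea in plain Python; the prompt wording here is illustrative and is not claimed to be LangChain's exact template.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate all context chunks and the question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


prompt = build_stuff_prompt(
    ["ITR-1 is for resident individuals.", "ITR-2 covers capital gains."],
    "Who can use ITR-1?",
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit within the model's context window.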

And start querying :

query = "Who is eligible to use this return form?"
docs = docsearch.similarity_search(query)
chain.run(input_documents = docs, question = query)
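
Under the hood, similarity_search embeds the query and asks the index for the closest stored vectors (LangChain returns 4 by default). A toy sketch of that ranking using the dot product, the metric chosen for the index; the two-dimensional vectors and labels are made up for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vector, stored, k=4):
    """Return the texts of the k stored vectors with the highest dot product.

    stored is a list of (text, vector) pairs.
    """
    scored = [(dot(query_vector, vector), text) for text, vector in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


stored = [
    ("salary", [1.0, 0.0]),
    ("interest", [0.0, 1.0]),
    ("both", [0.7, 0.7]),
]
print(top_k([0.1, 0.9], stored, k=2))
```

A real vector database avoids scoring every stored vector like this, but the ranking idea is the same.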

And Claude v2 responds in style :

" Based on the context provided, ITR-1 (AY 2021-22) is only for resident 
individuals having income up to Rs 50 lakh and who do not have income
from business/profession.
ITR-2 (AY 2021-22) does not have any
eligibility criteria specified in the provided instructions,
so I don't have enough information to determine who is eligible to
use that form. The context does not specify who can use ITR-2."

The power of information retrieval using vector search and an LLM, at its best!

I run another query :

query = "What is 80TTA?"
docs = docsearch.similarity_search(query)
chain.run(input_documents = docs, question = query)

And a perfect response again :

' 80TTA is a deduction that can be claimed in respect of interest income 
from savings accounts. The key points about deduction under
section 80TTA are:\n\n- It is available only to individuals and HUFs.
Non-individuals like companies, firms, etc cannot claim this deduction.
\n\n- The maximum deduction available under 80TTA is Rs 10,000 in a
financial year.\n\n- It can be claimed only against interest earned
from savings accounts with banks, cooperative banks and post offices.
Interest income from fixed deposits, NSCs, etc is not eligible for
deduction under 80TTA.\n\n- Senior citizens cannot claim deduction
under 80TTA. \n\nSo in summary, 80TTA provides a deduction on interest
income earned from savings accounts up to Rs 10,000 per year.
It is available only for individuals/HUFs and not senior citizens.'

Here’s the notebook on GitHub.

Do check out the Amazon Bedrock user guide, here.

And the set of sample notebooks to experiment with, here.

And follow Mike Chambers on Youtube.

Next, I’ll build a Streamlit / Chainlit app to integrate with the RAG backend, followed by a quantized LLM deployment on SageMaker.

So long and Happy Coding!
