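This is Matayoshi from G-gen. In this article, I build an LLM application that generates answers from the official Google Cloud documentation using LangChain and Vertex AI Search.

Introduction

What is Vertex AI Search and Conversation?

Vertex AI Search and Conversation is a fully managed Google Cloud product that makes Google's generative AI technology easy to use. You do not need to build or operate any infrastructure; you call it through an API from your own website or application.

In this article, I use the "Vertex AI Search" part of Vertex AI Search and Conversation to implement search over the official Google Cloud documentation (a website).

For details on Vertex AI Search and Conversation, see the following article.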

blog.g-gen.co.jp

What is LangChain?

LangChain is a framework for efficiently developing applications that use LLMs.

In this article, I use LangChain to split text, build a vector database, and implement vector search.

For details on LangChain, see the following article.

blog.g-gen.co.jp

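Architecture

To summarize website search results, Vertex AI Search offers a search summarization feature for website search.

However, this feature requires an option called Advanced website indexing, and enabling Advanced website indexing requires verifying ownership of the website's domain.

In other words, this works for a site that runs on a domain you manage, but for an external website such as the official Google Cloud documentation (https://cloud.google.com/*) you normally cannot enable the option, so the search summarization feature is unavailable.

So in this article, I implement the following architecture to summarize the official Google Cloud documentation.

Extract the top 3 official documentation pages most similar to the user's question
Split them into chunks of 1,000 characters or fewer
Extract the top 3 chunks most similar to the user's question
Generate a summarized answer with the LLM

This architecture has two key points.

The LLM input is capped at 3,000 characters (the maximum combined length of 3 chunks) so that the summarization request does not hit the input token limit
Vector search is run again on the split chunks so that only the parts of the text similar to the user's question are extracted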
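
The four steps and the 3,000-character budget can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `overlap_score` is a hypothetical stand-in for both the Vertex AI Search ranking and the embedding similarity, `fixed_chunks` is a naive splitter, and every name here is invented for this example rather than taken from the implementation later in the article.

```python
# Toy sketch of the two-stage retrieval flow; all names here are hypothetical.
TOP_K = 3          # pages and chunks kept at each stage
CHUNK_SIZE = 1000  # maximum characters per chunk


def overlap_score(question: str, text: str) -> int:
    """Stand-in similarity: number of distinct question words found in the text."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def top_k(question: str, texts: list, k: int = TOP_K) -> list:
    """Return the k texts with the highest stand-in similarity."""
    return sorted(texts, key=lambda t: overlap_score(question, t), reverse=True)[:k]


def fixed_chunks(text: str, size: int = CHUNK_SIZE) -> list:
    """Naive splitter: consecutive slices of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_facts(question: str, pages: list) -> str:
    """Pick pages, split them, pick chunks, and join the chosen chunks."""
    best_pages = top_k(question, pages)
    chunks = [c for page in best_pages for c in fixed_chunks(page)]
    best_chunks = top_k(question, chunks)
    return "\n".join(best_chunks)


pages = [
    "bigquery row level security " * 200,
    "cloud run containers " * 200,
    "bigquery pricing " * 200,
    "unrelated page " * 200,
]
facts = build_facts("bigquery row level security", pages)
# At most 3 chunks of 1,000 characters plus 2 newline separators
print(len(facts) <= TOP_K * CHUNK_SIZE + (TOP_K - 1))
```

Because the second stage keeps at most three chunks, the text handed to the LLM stays bounded no matter how long the source pages are.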

Environment setup

Enabling APIs

Enable the following APIs in the target project.

discoveryengine.googleapis.com
aiplatform.googleapis.com
dataform.googleapis.com
compute.googleapis.com

Vertex AI Search

Data store

In the web console, go to [Vertex AI Search and Conversation] > [Data Stores] > [Create new data store] and create a data store with the following parameters.

Source
- Select [Website URLs]
Data
- Set [Advanced website indexing] to "Off"
- Enter cloud.google.com/* in [Specify URLs to index]
Configuration
- Enter any name in [Data store name] and click "Create"

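App

In the web console, go to [Vertex AI Search and Conversation] > [Apps] > [Create new app] and create an app with the following parameters.

Type
- Select [Search]
Configuration
- Set [Enterprise edition features] to "On"
- [App name]: any app name
- [Company name]: any company name
- [Location of your app]: "global"
Data
- Select the data store created earlier and click "Create"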

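Runtime environment

In this article, I run Python in a Colab Enterprise notebook. Because Colab Enterprise provides managed notebooks, you can focus on implementation without managing infrastructure.

To create a Colab Enterprise notebook, refer to the quickstart in the official documentation.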

cloud.google.com

Initial setup

Installing libraries

Once the notebook has started and is connected to a runtime, run the following code to install the libraries.

# input:[1]
!pip install google-cloud-discoveryengine langchain faiss-cpu

After the libraries are installed, restart the runtime and then run the following.

# input:[2]
import time
from typing import List
  
import vertexai
from google.cloud import aiplatform
from google.cloud.discoveryengine import SearchServiceClient, SearchRequest
from google.protobuf.json_format import MessageToDict
from pydantic import BaseModel
from langchain.llms import VertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import WebBaseLoader
from langchain.vectorstores import FAISS

Utility functions

Define the utility functions used with the Vertex AI Embedding API for Text.

# input: [3]
# Rate limiting
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    print("Waiting")
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")
            time.sleep(sleep_time)
  
  
class CustomVertexAIEmbeddings(VertexAIEmbeddings, BaseModel):
    requests_per_minute: int
    num_instances_per_batch: int
  
    # Overriding embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.requests_per_minute)
        results = []
        docs = list(texts)
  
        while docs:
            # Working in batches because the API accepts maximum 5
            # documents per request to get embeddings
            head, docs = (
                docs[: self.num_instances_per_batch],
                docs[self.num_instances_per_batch :],
            )
            chunk = self.client.get_embeddings(head)
            results.extend(chunk)
            next(limiter)
  
        return [r.values for r in results]
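
The batching inside embed_documents can be checked without calling the API. The sketch below uses a hypothetical helper, `batches`, which is not part of LangChain or the Vertex AI SDK; it mirrors the head/tail slicing above for a batch size of 5.

```python
from typing import Iterator, List


def batches(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"doc-{i}" for i in range(12)]
sizes = [len(batch) for batch in batches(texts, 5)]
print(sizes)  # 12 texts in batches of 5 -> [5, 5, 2]
```

Since rate_limit computes its period as 60 / max_per_minute, a setting of 100 requests per minute spaces consecutive requests at least 0.6 seconds apart.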

Initializing the models

Initialize the LLM and the embedding model.

# input: [4]
# LLM model
llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=1024,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)
  
# Embedding model
EMBEDDING_QPM = 100
EMBEDDING_NUM_BATCH = 5
embeddings = CustomVertexAIEmbeddings(
    requests_per_minute=EMBEDDING_QPM,
    num_instances_per_batch=EMBEDDING_NUM_BATCH,
)

Vertex AI Search function

Replace the project ID and the Vertex AI Search data store ID with your own values, then run the following.

# input: [5]
PROJECT_ID = "your-project-id"         # replace with your project ID
DATA_STORE_WEB = "your-data-store-id"  # replace with your data store ID

discov_client = SearchServiceClient()

# Use Enterprise Search (Website) to find web pages related to the question
def search_questions_web(question: str) -> dict:

    # Extract the important keywords.
    # The prompt is kept in Japanese; it means:
    # "Extract only the main keywords and phrases from the following sentence"
    template = f"次の文から主要なキーワードやフレーズのみを抜き出してください: {question}"
    keyword = llm(template)

    serving_config = discov_client.serving_config_path(
        project=PROJECT_ID,
        location='global',
        data_store=DATA_STORE_WEB,
        serving_config='default_config'
    )

    # Run the search
    results = discov_client.search(
        SearchRequest(
            serving_config=serving_config,
            query=keyword,
            page_size=3
        )
    )

    documents = []
    # When the web search returns 0 results, "answer" stays an empty list
    if not results.results:
        print("0 search results")
    else:
        for r in results.results:
            document_info = {}
            r_dct = MessageToDict(r._pb)
            document_info['title'] = r_dct['document']['derivedStructData']['title']
            document_info['link'] = r_dct['document']['derivedStructData']['link']
            documents.append(document_info)

    responses = {
        "question": question,
        "keyword" : keyword,
        "answer": documents
    }

    return responses

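Here, instead of passing the user's question (question) directly as the Vertex AI Search query, I search using only its important keywords.

To show what "searching with only the important keywords" means, here is an example with its results.

[Original sentence] Search results for "BigQuery で行レベルセキュリティを実装する方法を教えてください" ("Please tell me how to implement row-level security in BigQuery"):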

よくある質問 | Datastream | Google Cloud

料金 | BigQuery: クラウドデータウェアハウス | Google Cloud

リソースへのラベルの追加 | BigQuery | Google Cloud

[Important keywords only] Search results for "BigQuery, 行レベルセキュリティ, 実装" ("BigQuery, row-level security, implementation"):

BigQuery の行レベルのセキュリティの概要 | Google Cloud

BigQuery の行レベルのセキュリティにより、データへのアクセスの ...

BigQuery の行レベルのセキュリティに関するベストプラクティス ...

Searching with only the important keywords returns the results I actually want, so I use this approach.
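
In the function above, the LLM's keyword output is passed to the search query as is. If the raw output contains line breaks, list markers, or duplicates, a small normalization step might help. The sketch below is an assumption of mine: `normalize_keywords` is a hypothetical helper that is not part of the original code, and the sample string is made up.

```python
import re


def normalize_keywords(raw: str) -> str:
    """Collapse an LLM keyword list into one comma-separated query string.

    Hypothetical helper: splits on commas, Japanese commas, and line breaks,
    trims whitespace and leading list markers, and drops empty or duplicate items.
    """
    seen = []
    for part in re.split(r"[,、\n]+", raw):
        item = part.strip().lstrip("-*・ ").strip()
        if item and item not in seen:
            seen.append(item)
    return ", ".join(seen)


raw_output = " BigQuery,\n行レベルセキュリティ, 実装\n"
print(normalize_keywords(raw_output))
```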

Reference: Python Client for Discovery Engine API

Text splitting function

# input: [6]
def text_split_func(search_res: dict) -> list:
    text_li = []
    # Split into chunks of up to 1,000 characters with a 200-character overlap
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

    for answer in search_res["answer"]:
        url = answer["link"]
        # Fetch the text from the web page
        loader = WebBaseLoader(url)
        documents = loader.load()

        # Split the text and add the chunks to the list
        text_li.extend(text_splitter.split_documents(documents))

    return text_li

RecursiveCharacterTextSplitter splits the body text of the official Google Cloud documentation pages returned by Vertex AI Search into chunks (text segments) of up to 1,000 characters with a 200-character overlap.
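
To make the chunk size and overlap concrete, here is a simplified fixed-window splitter. It is not the algorithm RecursiveCharacterTextSplitter uses (that class prefers to break at separators such as paragraph and line breaks, so its chunks are often shorter); `split_with_overlap` is a hypothetical function written only to show how a 200-character overlap repeats the tail of one chunk at the head of the next.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Slide a window of chunk_size characters, advancing by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(text)
print([len(c) for c in chunks])           # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: the overlap is shared
```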

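Function that runs RAG

# input: [7]
def rag(question: str, text_li: list) -> None:
    # Nothing to embed if no text was retrieved
    if not text_li:
        print("No text to summarize")
        return

    # Embed the split texts and store them in a vector database,
    # using the rate-limited embedding model initialized in input [4]
    db = FAISS.from_documents(text_li, embeddings)

    # Vector-search for the texts most similar to the question and take the top 3
    docs = db.similarity_search(question, k=3)

    # Join the top 3 texts
    facts = "\n".join(doc.page_content for doc in docs)

    # The prompt is kept in Japanese; it means:
    # "For the following question, create an answer based on the facts."
    # 質問 = question, 事実 = facts, 回答 = answer
    final_prompt = f"""
    次の質問については、事実に基づいて回答を作成してください。

    質問: {question}

    事実: {facts}

    回答:
    """

    print("LLM Output:")
    print(llm(final_prompt))
    print("Reference URL:")
    # search_res is the notebook-level variable assigned in the execution cells
    print(search_res['answer'])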

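FAISS serves as the vector database and is operated through LangChain.

The from_documents method embeds the list of split texts with the Vertex AI Embedding API and stores the resulting vectors in the vector database.

The similarity_search method then runs a vector search and extracts the texts most similar to the user's question.

Finally, the LLM generates an answer from the extracted texts, which completes the RAG flow.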
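
Conceptually, the vector search is a nearest-neighbor lookup over embedding vectors. The sketch below is a brute-force illustration, not how FAISS is implemented: the two-dimensional vectors are made up (real embeddings have far more dimensions), `nearest_texts` is a hypothetical function, and ranking by Euclidean distance is an assumption about the index configuration.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_texts(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    """Return the k texts whose vectors are closest to the query vector."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Made-up (text, vector) pairs standing in for embedded chunks
indexed = [
    ("row-level security overview", [0.9, 0.1]),
    ("pricing", [0.1, 0.9]),
    ("row access policies", [0.8, 0.2]),
    ("quotas", [0.0, 1.0]),
]
print(nearest_texts([1.0, 0.0], indexed, k=2))
# ['row-level security overview', 'row access policies']
```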

Reference: langchain.vectorstores.faiss.FAISS

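Execution

Now let's run the following to check that it works.

# input: [8]
# The question means: "Please tell me how to implement row-level security in BigQuery"
question = "BigQuery で行レベルセキュリティを実装する方法を教えてください"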
search_res = search_questions_web(question)
text_li =  text_split_func(search_res)
rag(question, text_li)

# output: [8]
LLM Output:
BigQuery で行レベルセキュリティを実装するには、次の手順に従います。

1. ターゲット BigQuery テーブルに行レベルのアクセス ポリシーを作成します。
2. ポリシーで、ユーザーまたはグループを許可リストに追加します。
3. 許可リストに含まれていないユーザーまたはグループは、テーブルへのアクセスを拒否されます。
Reference URL:
[{'title': 'BigQuery の行レベルのセキュリティの概要 | Google Cloud', 'link': 'https://cloud.google.com/bigquery/docs/row-level-security-intro?hl=ja'}, {'title': 'BigQuery の行レベルのセキュリティにより、データへのアクセスの ...', 'link': 'https://cloud.google.com/blog/ja/products/data-analytics/bigquery-provides-tighter-controls-over-data-access'}, {'title': 'BigQuery の行レベルのセキュリティに関するベスト プラクティス ...', 'link': 'https://cloud.google.com/bigquery/docs/best-practices-row-level-security?hl=ja'}]

LLM Output contains the final summary generated by the LLM.

Reference URL contains the search results (top 3) retrieved by Vertex AI Search.
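
The reference list is printed as a raw Python list, which is hard to scan. As an optional refinement, a small formatter could print one numbered entry per result; `format_references` below is a hypothetical helper, not part of the original notebook, and it assumes each entry has the 'title' and 'link' keys shown in the output above.

```python
from typing import Dict, List


def format_references(answer: List[Dict[str, str]]) -> str:
    """Render search results as numbered title and link pairs (hypothetical helper)."""
    lines = [
        f"{i}. {doc['title']}\n   {doc['link']}"
        for i, doc in enumerate(answer, start=1)
    ]
    return "\n".join(lines)


# First entry taken from the output above
sample = [
    {
        "title": "BigQuery の行レベルのセキュリティの概要 | Google Cloud",
        "link": "https://cloud.google.com/bigquery/docs/row-level-security-intro?hl=ja",
    },
]
print(format_references(sample))
```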

Let's try a few other questions.

# input: [9]
# The question means: "Please tell me when to use Cloud Run versus Cloud Functions."
question = "Cloud Run と Cloud Functions の使い分けについて教えてください。"
search_res = search_questions_web(question)
text_li =  text_split_func(search_res)
rag(question, text_li)

# output: [9]
LLM Output:
Cloud Functions は、プラットフォームの接続に適しています。Cloud Run は、サービスの実行に適しています。
Reference URL:
[{'title': 'Cloud run vs cloud functions for serverless | Google Cloud Blog', 'link': 'https://cloud.google.com/blog/products/serverless/cloud-run-vs-cloud-functions-for-serverless'}, {'title': 'Cloud Functions と Cloud Run: それぞれの使いどころ | Google ...', 'link': 'https://cloud.google.com/blog/ja/products/serverless/cloud-run-vs-cloud-functions-for-serverless'}, {'title': 'Cloud Run: サーバーレス コンテナの話 | Google Cloud 公式ブログ', 'link': 'https://cloud.google.com/blog/ja/topics/developers-practitioners/cloud-run-story-serverless-containers'}]

# input: [10]
# The question means: "I want to limit BigQuery costs. Please tell me a good way."
question = """
BigQuery で料金を制限したいです。いい方法を教えてください。
"""
search_res = search_questions_web(question)
text_li =  text_split_func(search_res)
rag(question, text_li)

# output: [10]
LLM Output:
BigQuery の料金は、クエリの実行時間、使用したデータ量、保存したデータ量に基づいて計算されます。
  
料金を制限するには、次の方法があります。
  
* クエリをより効率的に実行する。
* 使用したデータ量を減らす。
* 保存したデータ量を減らす。
  
クエリをより効率的に実行するには、次の方法があります。
  
* クエリを最適化する。
* クエリを分割する。
* クエリを並列実行する。
  
使用したデータ量を減らすには、次の方法があります。
  
* 必要なデータのみをクエリする。
* クエリ結果をキャッシュする。
* クエリ結果を圧縮する。
  
保存したデータ量を減らすには、次の方法があります。
  
* 不要なデータを削除する。
* データを圧縮する。
* データをアーカイブする。
  
これらの方法を組み合わせることで、BigQuery の料金を大幅に削減することができます。
Reference URL:
[{'title': 'コストの見積りと管理 | BigQuery | Google Cloud', 'link': 'https://cloud.google.com/bigquery/docs/best-practices-costs?hl=ja'}, {'title': '料金 | BigQuery: クラウド データ ウェアハウス | Google Cloud', 'link': 'https://cloud.google.com/bigquery/pricing?hl=ja'}, {'title': '割り当てと上限 | BigQuery | Google Cloud', 'link': 'https://cloud.google.com/bigquery/quotas?hl=ja'}]