2024/11/24

Simple URL RAG with Ollama locally

This post is a translation of the link below, and the source code is also from that link. In the original article, the embeddings used OpenAI; to make everything fully local, I switched to OllamaEmbeddings.
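One note I'll add: everything below assumes the Ollama server is running locally and that the models have already been pulled (e.g. with ollama pull llama3 and ollama pull llama3.1).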

Roughly, what URL RAG does is:
list the URLs, load the content of every URL, and flatten it into a one-dimensional list.
import os
os.environ['USER_AGENT'] = 'myagent'

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# List of URLs to load documents from
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]
# Load documents from the URLs
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
Then split the page contents into small chunks:
# Initialize a text splitter with specified chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
# Split the documents into chunks
doc_splits = text_splitter.split_documents(docs_list)
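To get a feel for the result, a quick inspection (my addition, not in the original code):
# Quick check: how many chunks we got, and what one looks like
print(len(doc_splits))                    # total number of chunks
print(doc_splits[0].page_content[:200])   # first 200 characters of the first chunk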
Convert these small chunks of text into embeddings, i.e. N-dimensional tensors.
Since everything has to run locally, OllamaEmbeddings is used:
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model="llama3",
)
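As a sanity check (my addition), you can embed a single sentence and look at the vector; the dimension depends on the model, typically 4096 for llama3:
# Sanity check: embed one sentence and inspect the resulting vector
vec = embeddings.embed_query("What is prompt engineering?")
print(len(vec))   # embedding dimension (typically 4096 for llama3)
print(vec[:5])    # first few components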
Once all the text has been converted into embeddings/tensors, they go into a local database, so that later, when a user asks a question, the database can be searched for answers.
Here SKLearnVectorStore is used as that database:
from langchain_community.vectorstores import SKLearnVectorStore
# Create embeddings for documents and store them in a vector store
vectorstore = SKLearnVectorStore.from_documents(
    documents=doc_splits,
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # retrieve the top-4 closest chunks
Where a RAG vectorstore differs from SQL is that, at query time, the vectorstore returns the content closest to the query, instead of requiring data that matches exactly the way SQL does.
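For example (a made-up query, just to illustrate): asking the vectorstore directly returns the nearest chunks, even though none of them contain the query string verbatim:
# Nearest-neighbor lookup: returns the chunks most similar to the query,
# not rows that match it exactly
hits = vectorstore.similarity_search("how do agents use memory?", k=2)
for doc in hits:
    print(doc.page_content[:100])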

With the URL data ready, the next part is hooking Ollama up as the LLM:
the prompt template and the RAG processing chain.
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for the LLM
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks.
    Use the following documents to answer the question.
    If you don't know the answer, just say that you don't know.
    Use three sentences maximum and keep the answer concise:
    Question: {question}
    Documents: {documents}
    Answer:
    """,
    input_variables=["question", "documents"],
)

# Initialize the LLM with Llama 3.1 model
llm = ChatOllama(
    model="llama3.1",
    temperature=0,
)

rag_chain = prompt | llm | StrOutputParser()
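The chain can already be invoked on its own by passing documents in by hand (a made-up snippet, just to show the input format the prompt expects):
# Try the chain directly, without the retriever
print(rag_chain.invoke({
    "question": "What is an agent?",
    "documents": "An agent is an LLM combined with planning, memory, and tool use.",
}))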
Build the RAG class:
# Define the RAG application class
class RAGApplication:
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain
    def run(self, question):
        # Retrieve relevant documents
        documents = self.retriever.invoke(question)
        # Extract content from retrieved documents
        doc_texts = "\\n".join([doc.page_content for doc in documents])
        # Get the answer from the language model
        answer = self.rag_chain.invoke({"question": question, "documents": doc_texts})
        return answer
Test it with this RAG class:
# Initialize the RAG application
rag_application = RAGApplication(retriever, rag_chain)
# Example usage
question = "What is prompt engineering"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)
The output will be:
Question: What is prompt engineering
Answer: Prompt engineering is the process of designing and optimizing input prompts for language models, such as chatbots or virtual
assistants. According to Lilian Weng's 2023 article "Prompt Engineering", this involves techniques like word transformation, character 
transformation, and prompt-level obfuscations to improve model performance. The goal is to create effective and efficient prompts that 
elicit accurate responses from the model.


Other references: using a web UI.
