What a URL RAG does, roughly:
List the URLs, load every one of them, and flatten the contents into a one-dimensional list.
```python
import os
os.environ['USER_AGENT'] = 'myagent'

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# List of URLs to load documents from
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load documents from the URLs
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
```

Then split the page content into small chunks:
```python
# Initialize a text splitter with the specified chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)

# Split the documents into chunks
doc_splits = text_splitter.split_documents(docs_list)
```

Each of these small chunks is then converted into an embedding, i.e. an N-dimensional tensor.
Since everything should run locally, OllamaEmbeddings is used for this:
```python
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model="llama3",
)
```

Once all the text is converted into embeddings/tensors, they have to be stored in a local database, so that when the user asks a question later, the answer can be looked up in that database.
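Before storing anything, a quick sanity check that a chunk really becomes an N-dimensional vector (a minimal sketch; it assumes a local Ollama server is running and `ollama pull llama3` has been done):

```python
# Embed a single string and inspect the resulting vector
# (assumes a local Ollama server with the llama3 model pulled).
vec = embeddings.embed_query("What is prompt engineering?")
print(len(vec))   # N, the dimensionality of the embedding
print(vec[:5])    # the first few float components
```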
Here, SKLearnVectorStore is used as that database:
```python
from langchain_community.vectorstores import SKLearnVectorStore

# Create embeddings for the documents and store them in a vector store
vectorstore = SKLearnVectorStore.from_documents(
    documents=doc_splits,
    embedding=embeddings,
)
# Retrieve the 4 closest chunks per query
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```

Where a RAG vector store differs from SQL is that, at query time, the vector store returns the content closest to the query, rather than requiring an exact match on the data the way SQL does.
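To see this nearest-match behavior, the retriever can be queried directly (a sketch; which chunks come back depends on the indexed pages):

```python
# Ask the retriever for the k chunks whose embeddings are closest to the
# query embedding -- no exact keyword match is required.
results = retriever.invoke("What is prompt engineering?")
for doc in results:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```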
With the URL data all prepared, the next part is hooking the LLM up through Ollama.
First the prompt template, then the RAG processing chain:
```python
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define the prompt template for the LLM
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks.
Use the following documents to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise:
Question: {question}
Documents: {documents}
Answer:
""",
    input_variables=["question", "documents"],
)

# Initialize the LLM with the Llama 3.1 model
llm = ChatOllama(
    model="llama3.1",
    temperature=0,
)

# Chain the pieces together: prompt -> LLM -> plain-string output
rag_chain = prompt | llm | StrOutputParser()
```
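The chain can already be invoked on its own, which shows the shape of its input (a minimal sketch; it assumes `ollama pull llama3.1` has been run):

```python
# The chain takes a dict matching the prompt's input_variables
# and returns a plain string (assumes a local Ollama server with llama3.1).
print(rag_chain.invoke({
    "question": "What does RAG stand for?",
    "documents": "RAG stands for retrieval-augmented generation.",
}))
```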
Build the RAG class:

```python
# Define the RAG application class
class RAGApplication:
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain

    def run(self, question):
        # Retrieve relevant documents
        documents = self.retriever.invoke(question)
        # Extract content from the retrieved documents
        doc_texts = "\n".join([doc.page_content for doc in documents])
        # Get the answer from the language model
        answer = self.rag_chain.invoke({"question": question, "documents": doc_texts})
        return answer
```

Test it with this RAG class:
```python
# Initialize the RAG application
rag_application = RAGApplication(retriever, rag_chain)

# Example usage
question = "What is prompt engineering"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)
```

The output will be..
```
Question: What is prompt engineering
Answer: Prompt engineering is the process of designing and optimizing input prompts for language models, such as chatbots or virtual assistants. According to Lilian Weng's 2023 article "Prompt Engineering", this involves techniques like word transformation, character transformation, and prompt-level obfuscations to improve model performance. The goal is to create effective and efficient prompts that elicit accurate responses from the model.
```
For the other references, use the web UI.