2024/11/24

Simple URL RAG with ollama locally

This entry is a translation of the link below; the source code also comes from that link. In the original article the embeddings use OpenAI, but to keep everything fully local, OllamaEmbeddings is used instead.

Roughly, what URL RAG does is:
List the URLs, load the content of every URL, and flatten the results into a single one-dimensional list.
import os
os.environ['USER_AGENT'] = 'myagent'

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# List of URLs to load documents from
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]
# Load documents from the URLs
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
Then split the page contents into small chunks:
# Initialize a text splitter with specified chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
# Split the documents into chunks
doc_splits = text_splitter.split_documents(docs_list)
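A quick sanity check (my own addition, not in the original article) is to look at how many chunks were produced and what one of them looks like:
# Sketch: inspect the split result; each element is a Document with page_content and metadata
print(len(doc_splits))
print(doc_splits[0].page_content[:200])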
Each of these small chunks is converted into an embedding, i.e. an N-dimensional tensor/vector.
Since everything should run locally, OllamaEmbeddings is used for this:
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model="llama3",
)
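As a minimal sketch (my own addition), you can embed a single string to see what an embedding looks like; embed_query returns a plain Python list of floats:
# Sketch: embed one query string and check the vector length
# (assumes the llama3 model has already been pulled in ollama)
vec = embeddings.embed_query("What is an agent?")
print(len(vec))  # the embedding dimension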
Once all the chunks are converted into embeddings/tensors, they are stored in a local database so that when the user later asks a question, the answer can be looked up there.
Here SKLearnVectorStore is used as that database:
from langchain_community.vectorstores import SKLearnVectorStore
# Create embeddings for documents and store them in a vector store
vectorstore = SKLearnVectorStore.from_documents(
    documents=doc_splits,
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
One way a RAG vector store differs from SQL is that at query time the vector store returns the content closest to the query, rather than requiring data that matches exactly the way SQL does.
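As a minimal sketch (my own addition), you can query the retriever directly and see which chunks come back as nearest neighbours:
# Sketch: similarity search against the vector store; returns the chunks closest to the query
hits = retriever.invoke("What is prompt engineering?")
for doc in hits:
    print(doc.metadata.get("source"), doc.page_content[:80])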

The URL data is now ready; the next part is wiring Ollama up as the LLM: a prompt template and the RAG processing chain.
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for the LLM
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks.
    Use the following documents to answer the question.
    If you don't know the answer, just say that you don't know.
    Use three sentences maximum and keep the answer concise:
    Question: {question}
    Documents: {documents}
    Answer:
    """,
    input_variables=["question", "documents"],
)

# Initialize the LLM with Llama 3.1 model
llm = ChatOllama(
    model="llama3.1",
    temperature=0,
)

rag_chain = prompt | llm | StrOutputParser()
Build the RAG class:
# Define the RAG application class
class RAGApplication:
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain
    def run(self, question):
        # Retrieve relevant documents
        documents = self.retriever.invoke(question)
        # Extract content from retrieved documents
        doc_texts = "\n".join([doc.page_content for doc in documents])
        # Get the answer from the language model
        answer = self.rag_chain.invoke({"question": question, "documents": doc_texts})
        return answer
Test it with this RAG class:
# Initialize the RAG application
rag_application = RAGApplication(retriever, rag_chain)
# Example usage
question = "What is prompt engineering"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)
The output will be:
Question: What is prompt engineering
Answer: Prompt engineering is the process of designing and optimizing input prompts for language models, such as chatbots or virtual
assistants. According to Lilian Weng's 2023 article "Prompt Engineering", this involves techniques like word transformation, character 
transformation, and prompt-level obfuscations to improve model performance. The goal is to create effective and efficient prompts that 
elicit accurate responses from the model.


Other references: using a web UI

2024/11/20

AI toolkit for VSCode: config to use ollama

Although it doesn't seem to do much yet, it appears to be the only AI assistant that doesn't require payment.
The latest update supports local Ollama, so let's try it.

It turns out this extension's documentation lives on GitHub: doc: overview.
To use Ollama, you of course first need to set up your own Ollama service (ref: ollama run llama locally).
Remember to expose it on the local IP (Ollama listens only on 127.0.0.1 by default).

For the AI Toolkit setup: after installation, a new AI Toolkit item appears in the left-hand VS Code panel.
At the top, press the "+" next to My Models; "Add remote model (1/4)" appears, meaning there are four steps and this is the first one.
Step 1 is the Ollama URL; mine is
http://192.168.145.64:11434/v1/chat/completions
After Enter, step 2 is the name of the model to load.
For this, go to the Ollama server and list the models with ollama list, then pick the one to load, using its full name:
qwen2.5-coder:14b
After Enter, step 3 is a name for this model setup; anything will do.
The last step is the authentication key; Ollama doesn't need one, so just press Enter. (A quick way to check the endpoint and model name directly is sketched below.)
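To confirm the URL and model name before going through the extension, a minimal sketch (my own addition, reusing the server IP and model name above) is to POST to Ollama's OpenAI-compatible chat completions endpoint:
import requests

# Sketch: call Ollama's OpenAI-compatible endpoint directly
# (IP and model name are the ones used in the steps above)
resp = requests.post(
    "http://192.168.145.64:11434/v1/chat/completions",
    json={
        "model": "qwen2.5-coder:14b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])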

2024/11/19

Try Web front end for whisper

Let's try it first. It's a web front end for Whisper built with Gradio.
Use Python 3.11, then clone the source and install the requirements:
git clone https://github.com/jhj0517/Whisper-WebUI.git
cd Whisper-WebUI
pip install -r requirements.txt
Since I use conda, I commented out the venv/bin/activate line in start-webui.sh.
Then serve it publicly:
./start-webui.sh --server_name=0.0.0.0 --inbrowser=false
Drag the video onto the page, pick large-v2, set the output to srt, and start...
The console output shows the model being downloaded first, which is why the page's progress bar doesn't move.
Once the model download finishes, the transcription starts...
Then an error appears:
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
The referenced fix is to downgrade ctranslate2:
pip install ctranslate2==4.4.0
After that, the error is gone.
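As a quick sanity check after the downgrade (my own addition, not from the original write-up), ctranslate2 can report whether it still sees the GPU:
import ctranslate2

# Sketch: verify the downgraded ctranslate2 still finds the CUDA device
print(ctranslate2.__version__)              # should print 4.4.0
print(ctranslate2.get_cuda_device_count())  # > 0 means the GPU is visible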

Testing the various models, bigger is not necessarily better.
When transcribing a long video (2:30), the text after roughly the 1:00 mark goes wrong and keeps repeating.

2024/11/7

Stable Diffusion and ControlNet

For Stable Diffusion, use the webui; it downloads and installs everything by itself. Although the script uses venv, I still used conda to create a python==3.8.10 environment to run it in.
Run it as usual; on the first run it detects that it's the first time and automatically downloads and installs everything:
./webui.sh --listen
The model checkpoint has to be downloaded from Hugging Face yourself.
The version with the most documentation and no license issues is v1.5; download the whole project/files and put them under the models/Stable-diffusion directory (a download sketch follows below).
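A minimal sketch of that download step (my own addition; the repo id and filename are assumptions for the common v1.5 emaonly checkpoint and may need adjusting if the repo has moved):
from huggingface_hub import hf_hub_download

# Sketch: fetch one v1.5 checkpoint file straight into the webui's model folder
hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    filename="v1-5-pruned-emaonly.safetensors",
    local_dir="models/Stable-diffusion",
)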


Install ControlNet:

Install it via the webui:
Because this runs on a headless server, --listen is added to allow access over the LAN, but with that flag, installing an extension produces an error:
 AssertionError: extension access disabled because of command line flags
So an extra option is needed:
$ ./webui.sh --listen --enable-insecure-extension-access
After the download and install finish, the console shows:
/mnt/hdd8t/charles-chang/stablediffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:797:
FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. 
If you want to force a new download, use `force_download=True`.
  warnings.warn(
Applying attention optimization: Doggettx... done.
Model loaded in 3.9s (load weights from disk: 2.8s, create model: 0.2s, apply weights to model: 0.8s).
Installing sd-webui-controlnet requirement: fvcore
Installing sd-webui-controlnet requirement: mediapipe
Installing sd-webui-controlnet requirement: svglib
Installing sd-webui-controlnet requirement: addict
Installing sd-webui-controlnet requirement: yapf
Installing sd-webui-controlnet requirement: changing albumentations version from None to 1.4.3
Installing sd-webui-controlnet requirement: changing timm version from 1.0.11 to 0.9.5
Installing sd-webui-controlnet requirement: changing pydantic version from 1.10.19 to 1.10.17
Installing sd-webui-controlnet requirement: changing controlnet_aux version from None to 0.0.9
Installing sd-webui-controlnet requirement: onnxruntime-gpu
ControlNet init warning: Unable to install insightface automatically. Please try run `pip install insightface` manually.
Installing sd-webui-controlnet requirement: handrefinerportable
Installing sd-webui-controlnet requirement: depth_anything
Installing sd-webui-controlnet requirement: depth_anything_v2
Installing sd-webui-controlnet requirement: dsine
After a long wait it finishes, and the webui shows in small text:
Installed into /mnt/hdd8t/charles-chang/stablediffusion/stable-diffusion-webui/extensions/sd-webui-controlnet. Use Installed tab to restart.
Then at the end of the table under the Installed tab there is:
sd-webui-controlnet | https://github.com/Mikubill/sd-webui-controlnet | main | 56cec5b2 | 2024-07-26 04:52:52 | unknown
ControlNet is also a model, so its checkpoints have to be downloaded as well (a sketch follows below).
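A minimal sketch of that download (my own addition; the repo id, filename, and target directory are assumptions for one common ControlNet 1.1 model):
from huggingface_hub import hf_hub_download

# Sketch: fetch one ControlNet 1.1 checkpoint into the extension's models folder
# (pick other control_v11* files from the same repo as needed)
hf_hub_download(
    repo_id="lllyasviel/ControlNet-v1-1",
    filename="control_v11p_sd15_canny.pth",
    local_dir="extensions/sd-webui-controlnet/models",
)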