- HKUDS/VideoAgent: "VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"
- rupeshs/fastsdcpu: Fast stable diffusion on CPU and AI PC
- https://github.com/Linum-AI/linum-v2
- https://repoinside.com/FunnyWolf/Viper
- https://github.com/deepbeepmeep/Wan2GP
High Altitude Oolong
2026/1/27
bookmarked some pages to review later
2026/1/17
codex-cli with my own ollama server.
However (as of 0.87.0), the built-in ollama provider is hard-coded to use localhost, so even if you set a base_url for provider.ollama in ~/.codex/config.toml it has no effect.
So the provider can't be named ollama; use another name instead. For example:
[model_providers.ollama-remote]
name = "ollama"
base_url = "http://192.168.145.70:11434/v1"
wire_api = "responses"
This uses ollama-remote as the provider name.
Then, to make ollama-remote the default provider with gpt-oss:20b as the model:
model_provider = "ollama-remote"
model = "gpt-oss:20b"
model_reasoning_effort = "medium"
oss_provider = "ollama-remote"
With this, launching codex will use gpt-oss:20b on my own ollama server as the model.
The ollama server version must be above 0.13.4 to support the responses API.
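A quick way to check what the server is actually running is ollama's version endpoint (same server IP as in the config above; adjust the host as needed):
curl http://192.168.145.70:11434/api/version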
For claude-code to use a local ollama, environment variables are enough.
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://192.168.145.70:11434
Then specify the model when launching:
claude --model gpt-oss:120b
ollama needs a recent version to support Claude's API format.
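Before launching, it helps to confirm the model is actually installed on that ollama server; listing the installed models via the tags endpoint works (same host as above):
curl http://192.168.145.70:11434/api/tags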
2026/1/14
Problem and Solution: some sections are missing when a VLM OCRs a PDF file
Problem
When converting 5.1.3. Tutorial for Large Language Models.pdf to Markdown, shell script sections were missing from the output. These sections had light gray text on a light gray background (low contrast).
Root Cause
The VLM prompt didn't explicitly instruct the model to look for low-contrast code blocks. While the VLM could recognize the text, recognition was inconsistent.
Changes Made
1. Enhanced VLM Prompt (ollama_client.py)
Added explicit instructions to detect low-contrast code:
- "Pay special attention to CODE BLOCKS and SHELL COMMANDS that may appear in LIGHT GRAY BOXES"
- "These low-contrast code sections are VERY IMPORTANT and MUST NOT be skipped"
- Specific examples:
$ bash script.sh, $ ./compile.sh, $ adb push
2. Added VLM Output Cleanup (ollama_client.py)
New cleanup logic (see the sketch after this list) that removes:
- Patterns like "Wait, no...", "Let me think...", "So final Markdown:"
- Markdown code block wrappers
- Multiple consecutive blank lines
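The real change lives in ollama_client.py, but as a rough stand-alone sketch of the same filtering (the file names here are just placeholders), a shell pass over a raw VLM dump could look like this:
# drop thinking-out-loud lines and markdown fences, then squeeze repeated blank lines
sed -e '/^Wait, no/d' -e '/^Let me think/d' -e '/^So final Markdown:/d' -e '/^```/d' raw_vlm_output.md | cat -s > cleaned_output.md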
2026/1/10
Web UI for VLM: live-vlm-webui
Install it with pip:
pip install live-vlm-webui
Then start it:
$ live-vlm-webui
2026-01-10 22:02:23,010 - live_vlm_webui.server - INFO - No model/API specified, auto-detecting local services...
2026-01-10 22:02:23,016 - live_vlm_webui.server - INFO - ✅ Auto-detected Ollama at http://localhost:11434/v1
2026-01-10 22:02:23,016 - live_vlm_webui.server - INFO - Selected model: llama3.2-vision:latest
2026-01-10 22:02:23,047 - live_vlm_webui.server - INFO - Initialized VLM service:
2026-01-10 22:02:23,047 - live_vlm_webui.server - INFO - Model: llama3.2-vision:latest
2026-01-10 22:02:23,047 - live_vlm_webui.server - INFO - API: http://localhost:11434/v1 (Local)
2026-01-10 22:02:23,047 - live_vlm_webui.server - INFO - Prompt: Describe what you see in this image in one sentence.
2026-01-10 22:02:23,047 - live_vlm_webui.server - INFO - Serving static files from: /home/charles-chang/livevlmwebui/venv/lib/python3.12/site-packages/live_vlm_webui/static/images
2026-01-10 22:02:23,047 - live_vlm_webui.server - INFO - Serving favicon files from: /home/charles-chang/livevlmwebui/venv/lib/python3.12/site-packages/live_vlm_webui/static/favicon
2026-01-10 22:02:23,048 - live_vlm_webui.server - INFO - SSL enabled - using HTTPS
2026-01-10 22:02:23,048 - live_vlm_webui.server - INFO - Starting server on 0.0.0.0:8090
2026-01-10 22:02:23,048 - live_vlm_webui.server - INFO -
2026-01-10 22:02:23,048 - live_vlm_webui.server - INFO - ======================================================================
2026-01-10 22:02:23,048 - live_vlm_webui.server - INFO - Access the server at:
2026-01-10 22:02:23,048 - live_vlm_webui.server - INFO - Local: https://localhost:8090
2026-01-10 22:02:23,049 - live_vlm_webui.server - INFO - Network: https://192.168.145.77:8090
2026-01-10 22:02:23,049 - live_vlm_webui.server - INFO - Network: https://172.20.0.1:8090
...
2026-01-10 22:02:23,050 - live_vlm_webui.server - INFO - Network: https://192.168.94.37:8090
2026-01-10 22:02:23,050 - live_vlm_webui.server - INFO - ======================================================================
2026-01-10 22:02:23,050 - live_vlm_webui.server - INFO -
2026-01-10 22:02:23,050 - live_vlm_webui.server - INFO - Press Ctrl+C to stop
2026-01-10 22:02:23,069 - live_vlm_webui.gpu_monitor - INFO - Auto-detected NVIDIA GPU (NVML available)
2026-01-10 22:02:23,069 - live_vlm_webui.gpu_monitor - INFO - Detected system: ASUS EX-B760M-V5
2026-01-10 22:02:23,076 - live_vlm_webui.gpu_monitor - INFO - NVML initialized for GPU: NVIDIA TITAN RTX
2026-01-10 22:02:23,076 - live_vlm_webui.server - INFO - GPU monitor initialized
2026-01-10 22:02:23,076 - live_vlm_webui.server - INFO - GPU monitoring task started
2026-01-10 22:02:23,076 - live_vlm_webui.server - INFO - GPU monitoring loop started
======== Running on https://0.0.0.0:8090 ========
(Press CTRL+C to quit)
Then just open https://localhost:8090 in a browser.
For API BASE URL, either an ollama or an sglang endpoint works.
Then pick a VLM model, turn on the camera, and it starts describing the video according to the prompt...
2026/1/8
YOLO-World: VLMs that use YOLO as the image encoder
- ZSD-YOLO: 2021
- YOLO-World: 2024
YOLO-World also has a demo, so I set it up on the DGX Spark to try it out.
The GB10's CUDA has some limitations, so a few modifications were needed to get it running; they are recorded in GB10_SETUP.md.
The demo uses Gradio: upload a photo, write what you want detected in the prompt, and the result is the object bounding boxes.
2026/1/4
Raspberry Pi 4 NetworkManager Wi-Fi AC setup
How to configure a wireless hotspot in 5GHz (802.11ac) mode on a Raspberry Pi 4 via NetworkManager.
1. Check the environment
Run the following command to confirm the wireless card supports the 5GHz band:
iw list
If the output includes a Band 2 section, the hardware supports it.
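To jump straight to that part of the output (the -A line count is arbitrary, just enough context to show the 5GHz frequencies):
iw list | grep -A 15 "Band 2:"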
2. Modify the hotspot settings
Assume the existing hotspot connection is named wifi-ap. To switch to AC mode, both the band and the channel must be specified.
Set the band
Set the wifi.band parameter to a, which forces the 5GHz band.
sudo nmcli connection modify wifi-ap wifi.band a
Set the channel
It is best to manually pick a non-DFS channel (e.g. 36, 40, 44, 48) to shorten the startup time.
sudo nmcli connection modify wifi-ap wifi.channel 36
Apply the changes
Restart the connection for the settings to take effect:
sudo nmcli connection up wifi-ap
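For reference, nmcli accepts several property/value pairs in one call, so the two modify steps above can be collapsed into a single command (same wifi-ap connection name):
sudo nmcli connection modify wifi-ap wifi.band a wifi.channel 36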
3. Verify the status
Use the following to check the current operating frequency and channel width:
iw dev wlan0 info
Example of correct output:
channel 36 (5180 MHz), width: 20 MHz
If the frequency shown is above 5000 MHz (e.g. 5180 MHz) and client link rates exceed 72 Mbps, the hotspot is confirmed to be running in 802.11ac mode.
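To see the per-client link rates from the Pi side (assuming the AP interface is wlan0 as above), the station dump lists tx/rx bitrate for each connected client:
iw dev wlan0 station dump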
4. Technical limitations
When NetworkManager creates a hotspot, it locks the channel width to 20 MHz by default. Even though it runs on the 5GHz band, the theoretical maximum rate is therefore capped at about 86.6 Mbps. This default is meant to maximize device compatibility and connection stability.
Also, after configuring with nmcli, nmcli automatically updates the corresponding .nmconnection file under /etc/NetworkManager/system-connections/, so the settings persist.
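To inspect what got persisted (the file is normally named after the connection, so presumably wifi-ap.nmconnection here):
sudo cat /etc/NetworkManager/system-connections/wifi-ap.nmconnection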
Antigravity SSH remote failed
Checked the logs under ~/.antigravity on the Pi: data/logs/20260104T124620/remoteagent.log shows a SIGILL:
Language server killed with signal SIGILL
Found a binary that looks like the language server: ./bin/94...af/extensions/antigravity/bin/language_server_linux_arm
Running it directly does indeed fail:
~/.antigravity-server $ ./bin/94f...6af/extensions/antigravity/bin/language_server_linux_arm --version
Illegal instruction
So use gdb to investigate, with these arguments:
* `-ex "run"`: Tells GDB to start the program immediately.
* `-ex "x/i \$pc"`: This is the most important part.
* x = Examine memory.
* /i = Format the output as a CPU instruction.
* $pc = Look at the Program Counter (the exact address where the CPU stopped because of the error).
* `-ex "bt"`: Generates a Backtrace to show the function call stack leading up to the crash.
* `--batch`: Runs GDB in non-interactive mode and exits once the commands are finished.
* `--args`: Allows you to pass the binary path and any flags it needs (like --version).
Result:
~/.antigravity-server $ gdb -ex "run" -ex "x/i \$pc" -ex "bt" --batch --args ./bin/94...6af/extensions/antigravity/bin/language_server_linux_arm --version
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
0x000000555d5426ac in ?? ()
=> 0x555d5426ac: ldaddal x8, x8, [x9]
#0 0x000000555d5426ac in ?? ()
#1 0x0000007ff7fc1cdc in ?? () from /lib/ld-linux-aarch64.so.1
#2 0x0000007ff7fd83a0 in ?? () from /lib/ld-linux-aarch64.so.1
So the offending instruction is ldaddal.
Looking it up, this instruction requires ARMv8.1 LSE, and the Raspberry Pi 4 is ARMv8.0, so it cannot execute it.
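This can also be confirmed without gdb: on aarch64 Linux, LSE support shows up as the atomics flag in the Features line of /proc/cpuinfo, and it should be missing on the Pi 4's Cortex-A72:
grep -m1 Features /proc/cpuinfo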
Also tested with VSCode: SSH into the Raspberry Pi works fine there, so my guess is that recompiling the language server for ARMv8.0 would fix it...
But .vscode doesn't have any language_server... thing; there isn't even a bin directory...