2026/1/14

Problem and Solution: some sections are missing when a VLM OCRs a PDF file

 

Problem

When converting "5.1.3. Tutorial for Large Language Models.pdf" to Markdown, the shell script sections were missing from the output. These sections used light gray text on a light gray background (low contrast).

Root Cause

The VLM prompt did not explicitly instruct the model to look for low-contrast code blocks. The VLM was able to recognize this text, but did so inconsistently.

Changes Made

1. Enhanced VLM Prompt (ollama_client.py)

Added explicit instructions to detect low-contrast code:

  • "Pay special attention to CODE BLOCKS and SHELL COMMANDS that may appear in LIGHT GRAY BOXES"
  • "These low-contrast code sections are VERY IMPORTANT and MUST NOT be skipped"
  • Specific examples: $ bash script.sh, $ ./compile.sh, $ adb push
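The prompt change above could look roughly like the sketch below. This is an illustration only: BASE_PROMPT, LOW_CONTRAST_INSTRUCTIONS, and build_ocr_prompt() are hypothetical names, and the base prompt text is an assumption, not the actual contents of ollama_client.py. Only the quoted instructions come from the list above.

```python
# Hypothetical base prompt; the real one in ollama_client.py is not shown in this post.
BASE_PROMPT = "Convert this PDF page image to Markdown. Preserve all text."

# Instructions quoted from the change list above.
LOW_CONTRAST_INSTRUCTIONS = (
    "Pay special attention to CODE BLOCKS and SHELL COMMANDS "
    "that may appear in LIGHT GRAY BOXES. "
    "These low-contrast code sections are VERY IMPORTANT and MUST NOT be skipped. "
    "Examples: $ bash script.sh, $ ./compile.sh, $ adb push."
)


def build_ocr_prompt(base: str = BASE_PROMPT) -> str:
    """Append the low-contrast instructions to the base OCR prompt."""
    return f"{base}\n\n{LOW_CONTRAST_INSTRUCTIONS}"


if __name__ == "__main__":
    print(build_ocr_prompt())
```

Keeping the extra instructions in a separate constant makes it easy to compare OCR results with and without them.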

2. Added VLM Output Cleanup (ollama_client.py)

The new _clean_vlm_output() method removes VLM thinking noise:

  • Patterns like "Wait, no...", "Let me think...", "So final Markdown:"
  • Markdown code block wrappers
  • Multiple consecutive blank lines
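A minimal sketch of such a cleanup step is shown below, written as a standalone function rather than a method. The regular expressions are assumptions chosen to match the three kinds of noise listed above; the real _clean_vlm_output() in ollama_client.py may use different patterns. The sketch also assumes the wrapper to strip is a single outer fence tagged markdown or md, so that genuine code blocks inside the page are left alone.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Assumed noise patterns: whole lines that start with a "thinking" phrase.
_NOISE_PATTERNS = [
    r"^[ \t]*Wait, no\b.*$",
    r"^[ \t]*Let me think\b.*$",
    r"^[ \t]*So final Markdown:[ \t]*$",
]

# A single outer fence tagged markdown/md that wraps the entire output.
_WRAPPER_RE = re.compile(
    FENCE + r"(?:markdown|md)[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL
)


def clean_vlm_output(text: str) -> str:
    """Remove thinking noise, an outer Markdown fence, and extra blank lines."""
    for pattern in _NOISE_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.MULTILINE | re.IGNORECASE)
    text = text.strip()

    # Unwrap only when the fence encloses the whole remaining text.
    match = _WRAPPER_RE.fullmatch(text)
    if match:
        text = match.group(1)

    # Collapse runs of blank lines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    raw = (
        "Wait, no, the gray box is code.\n"
        "So final Markdown:\n"
        + FENCE + "markdown\n# Title\n\n\n\n$ adb push\n" + FENCE + "\n"
    )
    print(clean_vlm_output(raw))
```

Matching the noise phrases only at the start of a line reduces the risk of deleting legitimate document text that happens to contain the same words.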
