fix: handle nested brackets and parentheses in LINK_PATTERN regex

The previous link-text pattern [^\]]+ stopped at the first ], which
broke markdown links containing embedded images.

The new pattern allows nested [...] in the link text (up to two levels,
as the regex is written) and one level of nested (...) in the URL,
correctly handling:
- Embedded images in link text
- Wikipedia-style URLs with parentheses
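
A minimal sketch of the difference, for reviewers who want to check the
behaviour locally. Both patterns are copied from the diff in this commit;
the helper name `extract` and the example.com / Wikipedia inputs are
illustrative assumptions, not taken from the issue.

```python
import re

# Old and new patterns, copied verbatim from the diff in this commit.
OLD_LINK_PATTERN = re.compile(r'!?\[([^\]]+)\]\(([^)]+?)(?:\s+"([^"]*)")?\)')
NEW_LINK_PATTERN = re.compile(
    r'!?\[((?:[^\[\]]|\[(?:[^\[\]]|\[[^\]]*\])*\])*)\]'
    r'\(((?:[^()\s]|\([^()]*\))*)(?:\s+"([^"]*)")?\)'
)


def extract(pattern, markdown):
    """Hypothetical helper: return (text, url, title) of the first match, or None."""
    match = pattern.search(markdown)
    return match.groups() if match else None


# Hypothetical inputs chosen to exercise the two cases listed above.
badge = '[![logo](https://example.com/logo.png)](https://example.com/home)'
wiki = '[Python](https://en.wikipedia.org/wiki/Python_(programming_language))'

# Old pattern: link text is cut at the first ']' and the image URL is
# reported as the link target; the Wikipedia URL loses its closing ')'.
old_badge = extract(OLD_LINK_PATTERN, badge)
old_wiki = extract(OLD_LINK_PATTERN, wiki)

# New pattern: the whole embedded image stays in the link text and the
# balanced '(...)' stays inside the URL.
new_badge = extract(NEW_LINK_PATTERN, badge)
new_wiki = extract(NEW_LINK_PATTERN, wiki)

for label, result in [
    ("old badge", old_badge),
    ("new badge", new_badge),
    ("old wiki", old_wiki),
    ("new wiki", new_wiki),
]:
    print(f"{label}: {result}")
```

The optional title group is unchanged between the two patterns, so
links of the form [text](url "title") should still populate the third
group as before.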

Fixes #711
Br1an67
2026-03-02 01:24:02 +08:00
parent aa7b05072d
commit 669b466667

@@ -8,7 +8,7 @@ import re
 from urllib.parse import urljoin
 # Pre-compile the regex pattern
-LINK_PATTERN = re.compile(r'!?\[([^\]]+)\]\(([^)]+?)(?:\s+"([^"]*)")?\)')
+LINK_PATTERN = re.compile(r'!?\[((?:[^\[\]]|\[(?:[^\[\]]|\[[^\]]*\])*\])*)\]\(((?:[^()\s]|\([^()]*\))*)(?:\s+"([^"]*)")?\)')
 def fast_urljoin(base: str, url: str) -> str: