[youtube] Clarify ytplayer.config extraction rationale

Sergey M․ 9 years ago
Parent
Commit 526b3b0716
1 file changed, 7 insertions and 0 deletions
  1. youtube_dl/extractor/youtube.py (+7 -0)

youtube_dl/extractor/youtube.py

@@ -898,6 +898,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 
     def _get_ytplayer_config(self, video_id, webpage):
         patterns = (
+            # User data may contain arbitrary character sequences that can break
+            # regex-based JSON extraction, e.g. if it contains '};' the second
+            # regex won't capture the whole JSON. As a workaround, try the more
+            # specific regex first. Proper quoted-string handling should replace
+            # this workaround in the future (see
+            # https://github.com/rg3/youtube-dl/issues/7468,
+            # https://github.com/rg3/youtube-dl/pull/7599)
             r';ytplayer\.config\s*=\s*({.+?});ytplayer',
             r';ytplayer\.config\s*=\s*({.+?});',
         )
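
The effect of the pattern order described in the comment can be reproduced with a small standalone sketch. The two regexes are copied from the diff. Everything else is an assumption made for illustration: `first_match` is a hypothetical stand-in for "use the first pattern that matches", not the extractor's actual helper, and `SAMPLE` is a made-up page fragment whose title contains '};'.

```python
import json
import re

# The two patterns from the diff, most specific first.
PATTERNS = (
    r';ytplayer\.config\s*=\s*({.+?});ytplayer',
    r';ytplayer\.config\s*=\s*({.+?});',
)

# Made-up page fragment (assumption): the title value contains '};'.
SAMPLE = (
    '<script>var x = 1;ytplayer.config = '
    '{"args": {"title": "a};b"}};ytplayer.load();</script>'
)


def first_match(patterns, text):
    """Hypothetical helper: return group 1 of the first pattern that matches."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None


# Specific pattern first: the capture runs up to the '};ytplayer' boundary.
good = first_match(PATTERNS, SAMPLE)
config = json.loads(good)

# Generic pattern first: the lazy match stops at the '};' inside the title.
truncated = first_match(PATTERNS[::-1], SAMPLE)

try:
    json.loads(truncated)
    truncated_is_valid = True
except ValueError:
    truncated_is_valid = False
```

With this input, `config` holds the full object while `truncated` is cut off inside the quoted title and no longer parses as JSON. The first pattern is still only a heuristic: user data containing '};ytplayer' would break it too, which is why the comment calls for proper quoted-string handling later.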