浏览代码

Fix AssertionError when og property not found

On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
Marcin Cieślak 11 年之前
父节点
当前提交
a8eeb0597b
共有 1 个文件被更改,包括 3 次插入1 次删除
  1. 3 1
      youtube_dl/extractor/common.py

+ 3 - 1
youtube_dl/extractor/common.py

@@ -322,7 +322,9 @@ class InfoExtractor(object):
         if name is None:
             name = 'OpenGraph %s' % prop
         escaped = self._search_regex(self._og_regex(prop), html, name, flags=re.DOTALL, **kargs)
-        return unescapeHTML(escaped)
+        if not escaped is None:
+            return unescapeHTML(escaped)
+        return None
 
     def _og_search_thumbnail(self, html, **kargs):
         return self._og_search_property('image', html, u'thumbnail url', fatal=False, **kargs)