diff --git a/.gitignore b/.gitignore index db322c4f0..fdd904f7f 100644 --- a/.gitignore +++ b/.gitignore @@ -51,7 +51,6 @@ cookies *.srt *.ssa *.swf -*.swp *.tt *.ttml *.url @@ -119,6 +118,7 @@ yt-dlp.zip .vscode *.sublime-* *.code-workspace +*.swp # Lazy extractors */extractor/lazy_extractors.py diff --git a/CONTRIBUTORS b/CONTRIBUTORS index a89357275..489ab7da8 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -644,3 +644,16 @@ peisenwang TheZ3ro tippfehlr varunchopra +DrakoCpp +PatrykMis +DinhHuy2010 +exterrestris +harbhim +LeSuisse +DunnesH +iancmy +mokrueger +luvyana +szantnerb +hugepower +scribblemaniac diff --git a/Changelog.md b/Changelog.md index 3dbbc210c..0b96ab29c 100644 --- a/Changelog.md +++ b/Changelog.md @@ -4,10 +4,157 @@ # Changelog # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master --> +### 2024.08.06 + +#### Core changes +- **jsinterp**: [Improve `slice` implementation](https://github.com/yt-dlp/yt-dlp/commit/bb8bf1db993f59752d20b73b861bd55e40cf0e31) ([#10664](https://github.com/yt-dlp/yt-dlp/issues/10664)) by [seproDev](https://github.com/seproDev) + +#### Extractor changes +- **discoveryplusitaly**: [Support sport and olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/e7d73bc4531ee3f91a46b15e218dcc1fbeb6226c) ([#10655](https://github.com/yt-dlp/yt-dlp/issues/10655)) by [bashonly](https://github.com/bashonly) +- **gem.cbc.ca**: live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/fc5eecfa31c9571b6031cc3968aaa0394be55d7a) ([#10565](https://github.com/yt-dlp/yt-dlp/issues/10565)) by [bashonly](https://github.com/bashonly), [scribblemaniac](https://github.com/scribblemaniac) +- **niconico**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/4d9231208332d4c32364b8cd814bff8b20232cae) ([#10677](https://github.com/yt-dlp/yt-dlp/issues/10677)) by [bashonly](https://github.com/bashonly) +- **olympics**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/919540a9644e55deb78cdd6751757ec8fdaf76f4) ([#10625](https://github.com/yt-dlp/yt-dlp/issues/10625)) by [bashonly](https://github.com/bashonly) +- **youku**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/0088c6de23d832b117061a33e984dc452d992e9c) ([#10626](https://github.com/yt-dlp/yt-dlp/issues/10626)) by [hugepower](https://github.com/hugepower) +- **youtube** + - [Change default player clients to `ios,web_creator`](https://github.com/yt-dlp/yt-dlp/commit/406f4c2e47502fffc1b0c210b4ee6487c89a44cb) ([#10674](https://github.com/yt-dlp/yt-dlp/issues/10674)) by [bashonly](https://github.com/bashonly) + - [Fix `n` function name extraction for player `b12cc44b`](https://github.com/yt-dlp/yt-dlp/commit/c86891eb9434b4d7eec426d38c0c625b5e13cb2f) ([#10668](https://github.com/yt-dlp/yt-dlp/issues/10668)) by [seproDev](https://github.com/seproDev) + +### 2024.08.01 + +#### Core changes +- **utils**: `unified_timestamp`: [Recognize Sunday](https://github.com/yt-dlp/yt-dlp/commit/6daf2c27c0464fba98337be30de0b66d520d0db1) ([#10589](https://github.com/yt-dlp/yt-dlp/issues/10589)) by [bashonly](https://github.com/bashonly) + +#### Extractor changes +- **abematv**: [Fix availability extraction](https://github.com/yt-dlp/yt-dlp/commit/ef36d517f9b05785d61abca7691d9ab7d63cc75c) ([#10569](https://github.com/yt-dlp/yt-dlp/issues/10569)) by [middlingphys](https://github.com/middlingphys) +- **cbc.ca**: player: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/94a1c5e642e468cebeb51f74c6c220434cb47d96) ([#10302](https://github.com/yt-dlp/yt-dlp/issues/10302)) by [bashonly](https://github.com/bashonly), [trainman261](https://github.com/trainman261) +- **discoveryplus**: [Support olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/0b7728618417e1aa382722a4d29b916b594d4459) ([#10566](https://github.com/yt-dlp/yt-dlp/issues/10566)) by [bashonly](https://github.com/bashonly) +- **kick**: clips: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/bb3936ae2b3ce96d0b53f9e17cad1082058f032b) ([#10572](https://github.com/yt-dlp/yt-dlp/issues/10572)) by [luvyana](https://github.com/luvyana) +- **learningonscreen**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/fe15d3178e242803ae7a934b90137f13598eba2e) ([#10590](https://github.com/yt-dlp/yt-dlp/issues/10590)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K) +- **mediaklikk**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7e3e4779ad13e4511c9ba3869879e53f0267bd7a) ([#10605](https://github.com/yt-dlp/yt-dlp/issues/10605)) by [szantnerb](https://github.com/szantnerb) +- **mlbtv**: [Fix makeup game extraction](https://github.com/yt-dlp/yt-dlp/commit/4b69e1b53ea21e631cd5dd68ff531e2f1671ec17) ([#10607](https://github.com/yt-dlp/yt-dlp/issues/10607)) by [bashonly](https://github.com/bashonly) +- **olympics**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/2f1ddfe12a2c174bc777264c5c8ffe7ca0922d94) ([#10604](https://github.com/yt-dlp/yt-dlp/issues/10604)) by [bashonly](https://github.com/bashonly) +- **tva**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/28d485714fef88937c82635438afba5db81f9089) ([#10567](https://github.com/yt-dlp/yt-dlp/issues/10567)) by [bashonly](https://github.com/bashonly) +- **tver**: [Support olympic URLs](https://github.com/yt-dlp/yt-dlp/commit/5260696b1cba77161828941fdb38f09f14ac6c60) ([#10600](https://github.com/yt-dlp/yt-dlp/issues/10600)) by [vvto33](https://github.com/vvto33) +- **vimeo**: review: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/2b6df93a243bdfb9d6bb5c1e18020625cd02d465) ([#10598](https://github.com/yt-dlp/yt-dlp/issues/10598)) by [bashonly](https://github.com/bashonly) +- **youtube** + - [Change default player clients to `ios,tv`](https://github.com/yt-dlp/yt-dlp/commit/efb42763dec23ccf6a2e3bac3afbfefce8efd012) ([#10457](https://github.com/yt-dlp/yt-dlp/issues/10457)) by [seproDev](https://github.com/seproDev) + - [Fix `n` function name extraction for player `20dfca59`](https://github.com/yt-dlp/yt-dlp/commit/011b4a04db2a636c3ef0a0ad4e2d3ae482c9fd76) ([#10611](https://github.com/yt-dlp/yt-dlp/issues/10611)) by [bashonly](https://github.com/bashonly) + - [Fix age-verification workaround](https://github.com/yt-dlp/yt-dlp/commit/d19fcb934269465fd707e68a87f735ec6983e93d) ([#10610](https://github.com/yt-dlp/yt-dlp/issues/10610)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K) + - [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/0e539617a41913c7da1edd74fb6543c10ad727b3) ([#10573](https://github.com/yt-dlp/yt-dlp/issues/10573)) by [bashonly](https://github.com/bashonly) + +#### Misc. changes +- **cleanup**: Miscellaneous: [ffd7781](https://github.com/yt-dlp/yt-dlp/commit/ffd7781d6588926f820b44a34b9e6e3068fb9f97) by [bashonly](https://github.com/bashonly) + +### 2024.07.25 + +#### Extractor changes +- **abematv**: [Adapt key retrieval to request handler framework](https://github.com/yt-dlp/yt-dlp/commit/a3bab4752a2b3d56e5a59b4e0411bb8f695c010b) ([#10491](https://github.com/yt-dlp/yt-dlp/issues/10491)) by [bashonly](https://github.com/bashonly) +- **facebook**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/1a34a802f44a1dab8f642c79c3cc810e21541d3b) ([#10531](https://github.com/yt-dlp/yt-dlp/issues/10531)) by [bashonly](https://github.com/bashonly) +- **mlbtv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f0993391e6052ec8f7aacc286609564f226943b9) ([#10515](https://github.com/yt-dlp/yt-dlp/issues/10515)) by [bashonly](https://github.com/bashonly) +- **tiktok**: [Fix and deprioritize JSON subtitles](https://github.com/yt-dlp/yt-dlp/commit/2f97779f335ac069ecccd9c7bf81abf4a83cfe7a) ([#10516](https://github.com/yt-dlp/yt-dlp/issues/10516)) by [bashonly](https://github.com/bashonly) +- **vimeo**: [Fix chapters extraction](https://github.com/yt-dlp/yt-dlp/commit/a0a1bc3d8d8e3bb9a48a06e835815a0460e90e77) ([#10544](https://github.com/yt-dlp/yt-dlp/issues/10544)) by [bashonly](https://github.com/bashonly) +- **youtube**: [Fix `n` function name extraction for player `3400486c`](https://github.com/yt-dlp/yt-dlp/commit/713b4cd18f00556771af8cfdd9cea6cc1a09e948) ([#10542](https://github.com/yt-dlp/yt-dlp/issues/10542)) by [bashonly](https://github.com/bashonly) + +#### Misc. changes +- **build**: [Pin `setuptools` version](https://github.com/yt-dlp/yt-dlp/commit/e046db8a116b1c320d4785daadd48ea0b22a3987) ([#10493](https://github.com/yt-dlp/yt-dlp/issues/10493)) by [bashonly](https://github.com/bashonly) + +### 2024.07.16 + +#### Core changes +- [Fix `noprogress` if `test=True` with `--quiet` and `--verbose`](https://github.com/yt-dlp/yt-dlp/commit/66ce3d76d87af3f81cc9dfec4be4704016cb1cdb) ([#10454](https://github.com/yt-dlp/yt-dlp/issues/10454)) by [Grub4K](https://github.com/Grub4K) +- [Support `auto-tty` and `no_color-tty` for `--color`](https://github.com/yt-dlp/yt-dlp/commit/d9cbced493cae2008508d94a2db5dd98be7c01fc) ([#10453](https://github.com/yt-dlp/yt-dlp/issues/10453)) by [Grub4K](https://github.com/Grub4K) +- **update**: [Fix network error handling](https://github.com/yt-dlp/yt-dlp/commit/ed1b9ed93dd90d2cc960c0d8eaa9d919db224203) ([#10486](https://github.com/yt-dlp/yt-dlp/issues/10486)) by [bashonly](https://github.com/bashonly) +- **utils**: `parse_codecs`: [Fix parsing of mixed case codec strings](https://github.com/yt-dlp/yt-dlp/commit/cc0070f6496e501d77352bad475fb02d6a86846a) by [bashonly](https://github.com/bashonly) + +#### Extractor changes +- **adn**: [Adjust for .com domain change](https://github.com/yt-dlp/yt-dlp/commit/959b7a379b8e5da059d110a63339c964b6265736) ([#10399](https://github.com/yt-dlp/yt-dlp/issues/10399)) by [infanf](https://github.com/infanf) +- **afreecatv**: [Fix login and use `legacy_ssl`](https://github.com/yt-dlp/yt-dlp/commit/4cd41469243624d90b7a2009b95cbe0609343efe) ([#10440](https://github.com/yt-dlp/yt-dlp/issues/10440)) by [bashonly](https://github.com/bashonly) +- **box**: [Support enterprise URLs](https://github.com/yt-dlp/yt-dlp/commit/705f5b84dec75cc7af97f42fd1530e8062735970) ([#10419](https://github.com/yt-dlp/yt-dlp/issues/10419)) by [seproDev](https://github.com/seproDev) +- **digitalconcerthall**: [Extract HEVC and FLAC formats](https://github.com/yt-dlp/yt-dlp/commit/e62fa6b0e0186f8c5666c2c5ab64cf191abdafc1) ([#10470](https://github.com/yt-dlp/yt-dlp/issues/10470)) by [bashonly](https://github.com/bashonly) +- **dplay**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/39e6c4cb44b9292e89ac0afec3cd0afc2ae8775f) ([#10471](https://github.com/yt-dlp/yt-dlp/issues/10471)) by [bashonly](https://github.com/bashonly) +- **epidemicsound**: [Support sound effects URLs](https://github.com/yt-dlp/yt-dlp/commit/8531d2b03bac9cc746f2ee8098aaf8f115505f5b) ([#10436](https://github.com/yt-dlp/yt-dlp/issues/10436)) by [iancmy](https://github.com/iancmy) +- **generic**: [Fix direct video link extensions](https://github.com/yt-dlp/yt-dlp/commit/b9afb99e7c34d0eb15ddc6689cd7d20eebfda68e) ([#10468](https://github.com/yt-dlp/yt-dlp/issues/10468)) by [bashonly](https://github.com/bashonly) +- **picarto**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/bacd18b7df08b4995644fd12cee1f8c8e8636bc7) ([#10414](https://github.com/yt-dlp/yt-dlp/issues/10414)) by [Frankgoji](https://github.com/Frankgoji) +- **soundcloud**: permalink, user: [Extract tracks only](https://github.com/yt-dlp/yt-dlp/commit/22870b81bad97dfa6307a7add44753b2dffc76a9) ([#10463](https://github.com/yt-dlp/yt-dlp/issues/10463)) by [DunnesH](https://github.com/DunnesH) +- **tiktok**: live: [Fix room ID extraction](https://github.com/yt-dlp/yt-dlp/commit/d2189d3d36987ebeac426fd70a60a5fe86325a2b) ([#10408](https://github.com/yt-dlp/yt-dlp/issues/10408)) by [mokrueger](https://github.com/mokrueger) +- **tv5monde**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/9b95a6765a5f6325af99c4aca961587f0c426e8c) ([#10417](https://github.com/yt-dlp/yt-dlp/issues/10417)) by [bashonly](https://github.com/bashonly) (With fixes in [cc1a309](https://github.com/yt-dlp/yt-dlp/commit/cc1a3098c00995c6aebc2a16bd1050a66bad64db)) +- **youtube** + - [Avoid poToken experiment player responses](https://github.com/yt-dlp/yt-dlp/commit/8b8b442cb005a8d85315f301615f83fb736b967a) ([#10456](https://github.com/yt-dlp/yt-dlp/issues/10456)) by [seproDev](https://github.com/seproDev) (With fixes in [16da8ef](https://github.com/yt-dlp/yt-dlp/commit/16da8ef9937ff76632dfef02e5062c5ba99c8ea2)) + - [Invalidate nsig cache from < 2024.07.09](https://github.com/yt-dlp/yt-dlp/commit/04e17ba20a139f1b3e30ec4bafa3fba26888f0b3) ([#10401](https://github.com/yt-dlp/yt-dlp/issues/10401)) by [bashonly](https://github.com/bashonly) + - [Reduce android client priority](https://github.com/yt-dlp/yt-dlp/commit/b85eef0a615a01304f88a3847309c667e09a20df) ([#10467](https://github.com/yt-dlp/yt-dlp/issues/10467)) by [seproDev](https://github.com/seproDev) + +#### Networking changes +- [Add `legacy_ssl` request extension](https://github.com/yt-dlp/yt-dlp/commit/150ecc45d9cacc919550c13b04fd998ac5103a6b) ([#10448](https://github.com/yt-dlp/yt-dlp/issues/10448)) by [coletdjnz](https://github.com/coletdjnz) +- **Request Handler**: curl_cffi: [Support `curl_cffi` 0.7.X](https://github.com/yt-dlp/yt-dlp/commit/42bfca00a6b460fc053514cdd7ac6f5b5daddf0c) by [coletdjnz](https://github.com/coletdjnz) + +#### Misc. changes +- **build** + - [Include `curl_cffi` in `yt-dlp_linux`](https://github.com/yt-dlp/yt-dlp/commit/4521f30d1479315cd5c3bf4abdad19391952df98) by [bashonly](https://github.com/bashonly) + - [Pin `curl-cffi` to 0.5.10 for Windows](https://github.com/yt-dlp/yt-dlp/commit/ac30941ae682f71eab010877c9a977736a61d3cf) by [bashonly](https://github.com/bashonly) +- **cleanup**: Miscellaneous: [89a161e](https://github.com/yt-dlp/yt-dlp/commit/89a161e8c62569a662deda1c948664152efcb6b4) by [bashonly](https://github.com/bashonly) + +### 2024.07.09 + +#### Core changes +- [Do not alter default format selection when simulated](https://github.com/yt-dlp/yt-dlp/commit/0b570f2a90ce2363ba06089217514d644e7be2e0) ([#9862](https://github.com/yt-dlp/yt-dlp/issues/9862)) by [seproDev](https://github.com/seproDev) + +#### Extractor changes +- **youtube**: [Remove broken `n` function extraction fallback](https://github.com/yt-dlp/yt-dlp/commit/7ead7332af69422cee931aec3faa277288e9e212) ([#10396](https://github.com/yt-dlp/yt-dlp/issues/10396)) by [pukkandan](https://github.com/pukkandan), [seproDev](https://github.com/seproDev) + +### 2024.07.08 + +#### Core changes +- **jsinterp**: [Implement `Function.prototype` resolving for `call` and `apply`](https://github.com/yt-dlp/yt-dlp/commit/6c056ea7aeb03660281653a9668547f2548f194f) ([#10392](https://github.com/yt-dlp/yt-dlp/issues/10392)) by [Grub4K](https://github.com/Grub4K) + +#### Extractor changes +- **soundcloud**: [Fix rate-limit handling](https://github.com/yt-dlp/yt-dlp/commit/4b50b292cc98534fb8c7cdf0ae5cb85862f7ebfc) ([#10389](https://github.com/yt-dlp/yt-dlp/issues/10389)) by [bashonly](https://github.com/bashonly) +- **youtube**: [Fix JS `n` function name extraction](https://github.com/yt-dlp/yt-dlp/commit/297b0a379282a15c80d82d51f3757c961db2dae1) ([#10390](https://github.com/yt-dlp/yt-dlp/issues/10390)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev) + +### 2024.07.07 + +#### Important changes +- Security: [[ie/douyutv] Do not use dangerous javascript source/URL](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-3v33-3wmw-3785) + - A dependency on potentially malicious third-party JavaScript code has been removed from the Douyu extractors + +#### Core changes +- [Address gaps in allowed extensions](https://github.com/yt-dlp/yt-dlp/commit/2469119490d7e0397ebbf5c5ae327316f955eef2) ([#10362](https://github.com/yt-dlp/yt-dlp/issues/10362)) by [bashonly](https://github.com/bashonly) +- [Fix `--ignore-no-formats-error`](https://github.com/yt-dlp/yt-dlp/commit/cc767e9490056efaaa11c186b0d032e4b4969180) ([#10345](https://github.com/yt-dlp/yt-dlp/issues/10345)) by [Grub4K](https://github.com/Grub4K) + +#### Extractor changes +- **abematv**: [Extract availability](https://github.com/yt-dlp/yt-dlp/commit/2a1a1b8e67e864289ac7ba5d05ec63dbb19a639f) ([#10348](https://github.com/yt-dlp/yt-dlp/issues/10348)) by [middlingphys](https://github.com/middlingphys) +- **chzzk**: [Extract with API v3](https://github.com/yt-dlp/yt-dlp/commit/4862a29854d4044120e3f97b52199711ad04bee1) ([#10363](https://github.com/yt-dlp/yt-dlp/issues/10363)) by [hui1601](https://github.com/hui1601) +- **douyutv**: [Do not use dangerous javascript source/URL](https://github.com/yt-dlp/yt-dlp/commit/6075a029dba70a89675ae1250e7cdfd91f0eba41) ([#10347](https://github.com/yt-dlp/yt-dlp/issues/10347)) by [LeSuisse](https://github.com/LeSuisse) +- **jiosaavn**: playlist: [Support featured playlists](https://github.com/yt-dlp/yt-dlp/commit/f0f867f008a1728f5f6ac1224b9e014b5d27f817) ([#10382](https://github.com/yt-dlp/yt-dlp/issues/10382)) by [harbhim](https://github.com/harbhim) +- **vidyard**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/00766ece0c5c7a80781a4ff677198c5fb69d9dc0) ([#10155](https://github.com/yt-dlp/yt-dlp/issues/10155)) by [exterrestris](https://github.com/exterrestris) +- **vimeo**: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/c1c9bb4adb42d0d93a2fb5d93a7de0a87b6ba884) ([#10341](https://github.com/yt-dlp/yt-dlp/issues/10341)) by [bashonly](https://github.com/bashonly) +- **vtv**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/987a1f94c24275f2b0cd82e719956687415dd732) ([#10173](https://github.com/yt-dlp/yt-dlp/issues/10173)) by [DinhHuy2010](https://github.com/DinhHuy2010) +- **yle_areena** + - [Fix metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/4cdc976bd861b5835601ae402bef543eacd88f3d) ([#10380](https://github.com/yt-dlp/yt-dlp/issues/10380)) by [seproDev](https://github.com/seproDev) + - [Fix subtitle extraction](https://github.com/yt-dlp/yt-dlp/commit/0d174e8bed32081eb38ef7f5d1a1282ae154f517) ([#10379](https://github.com/yt-dlp/yt-dlp/issues/10379)) by [Grub4K](https://github.com/Grub4K) + +#### Misc. changes +- **cleanup**: Miscellaneous: [b337d29](https://github.com/yt-dlp/yt-dlp/commit/b337d2989ce0614651d363383f6f743d977248ef) by [bashonly](https://github.com/bashonly) + +### 2024.07.02 + +#### Core changes +- [Fix `--compat-opt allow-unsafe-ext`](https://github.com/yt-dlp/yt-dlp/commit/773bbb181506856ffda95496ab60c1c9603f1f71) ([#10336](https://github.com/yt-dlp/yt-dlp/issues/10336)) by [bashonly](https://github.com/bashonly), [rdamas](https://github.com/rdamas) + +#### Extractor changes +- **banbye**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7509791385ba88cb7ec0ab17e826681f4af4b66e) ([#10332](https://github.com/yt-dlp/yt-dlp/issues/10332)) by [PatrykMis](https://github.com/PatrykMis), [seproDev](https://github.com/seproDev) +- **murrtube**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/6403530e2dfe259a87afe444708c4f3024cc45b8) ([#9249](https://github.com/yt-dlp/yt-dlp/issues/9249)) by [DrakoCpp](https://github.com/DrakoCpp) +- **zaiko**: [Support JWT video URLs](https://github.com/yt-dlp/yt-dlp/commit/7799e518956387bb3c1064c9beae26eab8d5044a) ([#10130](https://github.com/yt-dlp/yt-dlp/issues/10130)) by [pzhlkj6612](https://github.com/pzhlkj6612) + +#### Postprocessor changes +- **embedthumbnail**: [Fix embedding with mutagen](https://github.com/yt-dlp/yt-dlp/commit/d502f4c6d95b74896f40070d07229997f0850f31) ([#10337](https://github.com/yt-dlp/yt-dlp/issues/10337)) by [bashonly](https://github.com/bashonly) + +#### Misc. changes +- **cleanup**: Miscellaneous: [93d33cb](https://github.com/yt-dlp/yt-dlp/commit/93d33cb29af9e2e84369ac43589d50ce8e0160ef) by [bashonly](https://github.com/bashonly) + ### 2024.07.01 #### Important changes -- Security: [[CVE-2024-10123](https://nvd.nist.gov/vuln/detail/CVE-2024-10123)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j) +- Security: [[CVE-2024-38519](https://nvd.nist.gov/vuln/detail/CVE-2024-38519)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j) - Unsafe extensions are now blocked from being downloaded #### Core changes diff --git a/Makefile b/Makefile index e1de7f3e9..6c72ead1e 100644 --- a/Makefile +++ b/Makefile @@ -21,7 +21,7 @@ clean-test: rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \ *.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \ *.3gp *.ape *.ass *.avi *.desktop *.f4v *.flac *.flv *.gif *.jpeg *.jpg *.lrc *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 *.mp4 \ - *.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.swp *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp + *.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp clean-dist: rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \ yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS diff --git a/README.md b/README.md index e8aeb93f7..ca32e09bf 100644 --- a/README.md +++ b/README.md @@ -202,7 +202,7 @@ #### Impersonation * [**curl_cffi**](https://github.com/yifeikong/curl_cffi) (recommended) - Python binding for [curl-impersonate](https://github.com/lwthiker/curl-impersonate). Provides impersonation targets for Chrome, Edge and Safari. Licensed under [MIT](https://github.com/yifeikong/curl_cffi/blob/main/LICENSE) * Can be installed with the `curl-cffi` group, e.g. `pip install "yt-dlp[default,curl-cffi]"` - * Currently only included in `yt-dlp.exe` and `yt-dlp_macos` builds + * Currently included in `yt-dlp.exe`, `yt-dlp_linux` and `yt-dlp_macos` builds ### Metadata @@ -368,7 +368,9 @@ ## General Options: stderr) to apply the setting to. Can be one of "always", "auto" (default), "never", or "no_color" (use non color terminal - sequences). Can be used multiple times + sequences). Use "auto-tty" or "no_color-tty" + to decide based on terminal support only. + Can be used multiple times --compat-options OPTS Options that can help keep compatibility with youtube-dl or youtube-dlc configurations by reverting some of the @@ -1756,7 +1758,7 @@ # Replace all spaces and "_" in title and uploader with a `-` # EXTRACTOR ARGUMENTS -Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=android_embedded,web;formats=incomplete" --extractor-args "funimation:version=uncut"` +Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=mediaconnect,web;formats=incomplete" --extractor-args "funimation:version=uncut"` Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"` @@ -1765,7 +1767,7 @@ # EXTRACTOR ARGUMENTS #### youtube * `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively -* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mediaconnect`, `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. The `android` clients will always be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients. +* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mediaconnect`, `mweb`, `android_producer`, `android_testsuite`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,web_creator` is used, and `tv_embedded`, `web_creator` and `mediaconnect` are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. Most `android` clients will be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients. * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp. * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side) @@ -1773,7 +1775,7 @@ #### youtube * E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total * `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8) * `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others -* `innertube_key`: Innertube API key to use for all API requests +* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning #### youtubetab (YouTube playlists, channels, feeds, etc.) @@ -1859,6 +1861,9 @@ #### orfon (orf:on) #### bilibili * `prefer_multi_flv`: Prefer extracting flv formats over mp4 for older videos that still provide legacy formats +#### digitalconcerthall +* `prefer_combined_hls`: Prefer extracting combined/pre-merged video and audio HLS formats. This will exclude 4K/HEVC video and lossless/FLAC audio formats, which are only available as split video/audio HLS formats + **Note**: These options may be changed/removed in the future without concern for backward compatibility @@ -2219,12 +2224,13 @@ ### Differences in default behavior * yt-dlp versions between 2021.11.10 and 2023.06.21 estimated `filesize_approx` values for fragmented/manifest formats. This was added for convenience in [f2fe69](https://github.com/yt-dlp/yt-dlp/commit/f2fe69c7b0d208bdb1f6292b4ae92bc1e1a7444a), but was reverted in [0dff8e](https://github.com/yt-dlp/yt-dlp/commit/0dff8e4d1e6e9fb938f4256ea9af7d81f42fd54f) due to the potentially extreme inaccuracy of the estimated values. Use `--compat-options manifest-filesize-approx` to keep extracting the estimated values * yt-dlp uses modern http client backends such as `requests`. Use `--compat-options prefer-legacy-http-handler` to prefer the legacy http handler (`urllib`) to be used for standard http requests. * The sub-modules `swfinterp`, `casefold` are removed. +* Passing `--simulate` (or calling `extract_info` with `download=False`) no longer alters the default format selection. See [#9843](https://github.com/yt-dlp/yt-dlp/issues/9843) for details. For ease of use, a few more compat options are available: -* `--compat-options all`: Use all compat options (Do NOT use) -* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter,-manifest-filesize-approx` -* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter,-manifest-filesize-approx` +* `--compat-options all`: Use all compat options (**Do NOT use this!**) +* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext` +* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext` * `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization,no-youtube-prefer-utc-upload-date` * `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx` * `--compat-options 2023`: Currently does nothing. Use this to enable all future compat options diff --git a/bundle/docker/static/entrypoint.sh b/bundle/docker/static/entrypoint.sh index 93d84fa9b..220275974 100755 --- a/bundle/docker/static/entrypoint.sh +++ b/bundle/docker/static/entrypoint.sh @@ -2,7 +2,7 @@ set -e source ~/.local/share/pipx/venvs/pyinstaller/bin/activate -python -m devscripts.install_deps --include secretstorage +python -m devscripts.install_deps --include secretstorage --include curl-cffi python -m devscripts.make_lazy_extractors python devscripts/update-version.py -c "${channel}" -r "${origin}" "${version}" python -m bundle.pyinstaller diff --git a/devscripts/changelog_override.json b/devscripts/changelog_override.json index ced38a0dd..5189de2d7 100644 --- a/devscripts/changelog_override.json +++ b/devscripts/changelog_override.json @@ -179,6 +179,11 @@ { "action": "add", "when": "6aaf96a3d6e7d0d426e97e11a2fcf52fda00e733", - "short": "[priority] Security: [[CVE-2024-10123](https://nvd.nist.gov/vuln/detail/CVE-2024-10123)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)\n - Unsafe extensions are now blocked from being downloaded" + "short": "[priority] Security: [[CVE-2024-38519](https://nvd.nist.gov/vuln/detail/CVE-2024-38519)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)\n - Unsafe extensions are now blocked from being downloaded" + }, + { + "action": "add", + "when": "6075a029dba70a89675ae1250e7cdfd91f0eba41", + "short": "[priority] Security: [[ie/douyutv] Do not use dangerous javascript source/URL](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-3v33-3wmw-3785)\n - A dependency on potentially malicious third-party JavaScript code has been removed from the Douyu extractors" } ] diff --git a/pyproject.toml b/pyproject.toml index 39986a355..d5480e1c6 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -9,6 +9,7 @@ maintainers = [ {name = "Grub4K", email = "contact@grub4k.xyz"}, {name = "bashonly", email = "bashonly@protonmail.com"}, {name = "coletdjnz", email = "coletdjnz@protonmail.com"}, + {name = "sepro", email = "sepro@sepr0.com"}, ] description = "A feature-rich command-line audio/video downloader" readme = "README.md" @@ -53,7 +54,10 @@ dependencies = [ [project.optional-dependencies] default = [] -curl-cffi = ["curl-cffi==0.5.10; implementation_name=='cpython'"] +curl-cffi = [ + "curl-cffi==0.5.10; os_name=='nt' and implementation_name=='cpython'", + "curl-cffi>=0.5.10,!=0.6.*,<0.8; os_name!='nt' and implementation_name=='cpython'", +] secretstorage = [ "cffi", "secretstorage", @@ -62,7 +66,7 @@ build = [ "build", "hatchling", "pip", - "setuptools", + "setuptools>=71.0.2", # 71.0.0 broke pyinstaller "wheel", ] dev = [ diff --git a/supportedsites.md b/supportedsites.md index 656366b4a..e3bbe03ec 100644 --- a/supportedsites.md +++ b/supportedsites.md @@ -354,7 +354,6 @@ # Supported sites - **DigitallySpeaking** - **Digiteka** - **DiscogsReleasePlaylist** - - **Discovery** - **DiscoveryLife** - **DiscoveryNetworksDe** - **DiscoveryPlus** @@ -363,7 +362,6 @@ # Supported sites - **DiscoveryPlusItaly** - **DiscoveryPlusItalyShow** - **Disney** - - **DIYNetwork** - **dlf** - **dlf:corpus**: DLF Multi-feed Archives - **dlive:stream** @@ -516,7 +514,6 @@ # Supported sites - **GlattvisionTVLive**: [*glattvisiontv*](## "netrc machine") - **GlattvisionTVRecordings**: [*glattvisiontv*](## "netrc machine") - **Glide**: Glide mobile video messages (glide.me) - - **GlobalCyclingNetworkPlus** - **GlobalPlayerAudio** - **GlobalPlayerAudioEpisode** - **GlobalPlayerLive** @@ -658,10 +655,11 @@ # Supported sites - **Ketnet** - **khanacademy** - **khanacademy:unit** - - **Kick** + - **kick:clips** + - **kick:live** + - **kick:vod** - **Kicker** - **KickStarter** - - **KickVOD** - **kinja:embed** - **KinoPoisk** - **Kommunetv** @@ -693,6 +691,7 @@ # Supported sites - **Lcp** - **LcpPlay** - **Le**: 乐视网 + - **LearningOnScreen** - **Lecture2Go**: (**Currently broken**) - **Lecturio**: [*lecturio*](## "netrc machine") - **LecturioCourse**: [*lecturio*](## "netrc machine") @@ -820,8 +819,6 @@ # Supported sites - **MotherlessGroup** - **MotherlessUploader** - **Motorsport**: motorsport.com (**Currently broken**) - - **MotorTrend** - - **MotorTrendOnDemand** - **MovieFap** - **Moviepilot** - **MoviewPlay** @@ -839,7 +836,7 @@ # Supported sites - **MTVUutisetArticle**: (**Currently broken**) - **MuenchenTV**: münchen.tv (**Currently broken**) - **MujRozhlas** - - **Murrtube**: (**Currently broken**) + - **Murrtube** - **MurrtubeUser**: Murrtube user profile (**Currently broken**) - **MuseAI** - **MuseScore** @@ -1145,7 +1142,6 @@ # Supported sites - **QuantumTV**: [*quantumtv*](## "netrc machine") - **QuantumTVLive**: [*quantumtv*](## "netrc machine") - **QuantumTVRecordings**: [*quantumtv*](## "netrc machine") - - **Qub** - **R7**: (**Currently broken**) - **R7Article**: (**Currently broken**) - **Radiko** @@ -1522,9 +1518,9 @@ # Supported sites - **tv5unis** - **tv5unis:video** - **tv8.it** - - **TVA** - **TVANouvelles** - **TVANouvellesArticle** + - **tvaplus**: TVA+ - **TVC** - **TVCArticle** - **TVer** @@ -1618,6 +1614,7 @@ # Supported sites - **VidLii** - **Vidly** - **vids.io** + - **Vidyard** - **viewlift** - **viewlift:embed** - **Viidea** @@ -1665,6 +1662,8 @@ # Supported sites - **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza - **VrtNU**: [*vrtnu*](## "netrc machine") VRT MAX - **VTM**: (**Currently broken**) + - **VTV** + - **VTVGo** - **VTXTV**: [*vtxtv*](## "netrc machine") - **VTXTVLive**: [*vtxtv*](## "netrc machine") - **VTXTVRecordings**: [*vtxtv*](## "netrc machine") diff --git a/test/test_YoutubeDL.py b/test/test_YoutubeDL.py index 841ce1af3..1847c4ffd 100644 --- a/test/test_YoutubeDL.py +++ b/test/test_YoutubeDL.py @@ -4,6 +4,7 @@ import os import sys import unittest +from unittest.mock import patch sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) @@ -520,7 +521,33 @@ def test_format_filtering(self): ydl.process_ie_result(info_dict) self.assertEqual(ydl.downloaded_info_dicts, []) - def test_default_format_spec(self): + @patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', False) + def test_default_format_spec_without_ffmpeg(self): + ydl = YDL({}) + self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio') + + ydl = YDL({'simulate': True}) + self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio') + + ydl = YDL({}) + self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') + + ydl = YDL({'simulate': True}) + self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') + + ydl = YDL({'outtmpl': '-'}) + self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio') + + ydl = YDL({}) + self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio') + self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') + + @patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', True) + @patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.can_merge', lambda _: True) + def test_default_format_spec_with_ffmpeg(self): + ydl = YDL({}) + self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best') + ydl = YDL({'simulate': True}) self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best') @@ -528,13 +555,13 @@ def test_default_format_spec(self): self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') ydl = YDL({'simulate': True}) - self.assertEqual(ydl._default_format_spec({'is_live': True}), 'bestvideo*+bestaudio/best') + self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') ydl = YDL({'outtmpl': '-'}) self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio') ydl = YDL({}) - self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo*+bestaudio/best') + self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best') self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') diff --git a/test/test_jsinterp.py b/test/test_jsinterp.py index 7c556e461..06840ed85 100644 --- a/test/test_jsinterp.py +++ b/test/test_jsinterp.py @@ -376,6 +376,61 @@ def test_packed(self): jsi = JSInterpreter('''function f(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''') self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|'))) + def test_join(self): + test_input = list('test') + tests = [ + 'function f(a, b){return a.join(b)}', + 'function f(a, b){return Array.prototype.join.call(a, b)}', + 'function f(a, b){return Array.prototype.join.apply(a, [b])}', + ] + for test in tests: + jsi = JSInterpreter(test) + self._test(jsi, 'test', args=[test_input, '']) + self._test(jsi, 't-e-s-t', args=[test_input, '-']) + self._test(jsi, '', args=[[], '-']) + + def test_split(self): + test_result = list('test') + tests = [ + 'function f(a, b){return a.split(b)}', + 'function f(a, b){return String.prototype.split.call(a, b)}', + 'function f(a, b){return String.prototype.split.apply(a, [b])}', + ] + for test in tests: + jsi = JSInterpreter(test) + self._test(jsi, test_result, args=['test', '']) + self._test(jsi, test_result, args=['t-e-s-t', '-']) + self._test(jsi, [''], args=['', '-']) + self._test(jsi, [], args=['', '']) + + def test_slice(self): + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice()}', [0, 1, 2, 3, 4, 5, 6, 7, 8]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(0)}', [0, 1, 2, 3, 4, 5, 6, 7, 8]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(5)}', [5, 6, 7, 8]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(99)}', []) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-2)}', [7, 8]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-99)}', [0, 1, 2, 3, 4, 5, 6, 7, 8]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(0, 0)}', []) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(1, 0)}', []) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(0, 1)}', [0]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(3, 6)}', [3, 4, 5]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(1, -1)}', [1, 2, 3, 4, 5, 6, 7]) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-1, 1)}', []) + self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-3, -1)}', [6, 7]) + self._test('function f(){return "012345678".slice()}', '012345678') + self._test('function f(){return "012345678".slice(0)}', '012345678') + self._test('function f(){return "012345678".slice(5)}', '5678') + self._test('function f(){return "012345678".slice(99)}', '') + self._test('function f(){return "012345678".slice(-2)}', '78') + self._test('function f(){return "012345678".slice(-99)}', '012345678') + self._test('function f(){return "012345678".slice(0, 0)}', '') + self._test('function f(){return "012345678".slice(1, 0)}', '') + self._test('function f(){return "012345678".slice(0, 1)}', '0') + self._test('function f(){return "012345678".slice(3, 6)}', '345') + self._test('function f(){return "012345678".slice(1, -1)}', '1234567') + self._test('function f(){return "012345678".slice(-1, 1)}', '') + self._test('function f(){return "012345678".slice(-3, -1)}', '67') + if __name__ == '__main__': unittest.main() diff --git a/test/test_networking.py b/test/test_networking.py index af3ece3b4..826f11a56 100644 --- a/test/test_networking.py +++ b/test/test_networking.py @@ -265,6 +265,11 @@ def do_GET(self): self.end_headers() self.wfile.write(payload) self.finish() + elif self.path == '/get_cookie': + self.send_response(200) + self.send_header('Set-Cookie', 'test=ytdlp; path=/') + self.end_headers() + self.finish() else: self._status(404) @@ -338,6 +343,52 @@ def test_ssl_error(self, handler): validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers')) assert not issubclass(exc_info.type, CertificateVerifyError) + @pytest.mark.skip_handler('CurlCFFI', 'legacy_ssl ignored by CurlCFFI') + def test_legacy_ssl_extension(self, handler): + # HTTPS server with old ciphers + # XXX: is there a better way to test this than to create a new server? + https_httpd = http.server.ThreadingHTTPServer( + ('127.0.0.1', 0), HTTPTestRequestHandler) + sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) + sslctx.maximum_version = ssl.TLSVersion.TLSv1_2 + sslctx.set_ciphers('SHA1:AESCCM:aDSS:eNULL:aNULL') + sslctx.load_cert_chain(os.path.join(TEST_DIR, 'testcert.pem'), None) + https_httpd.socket = sslctx.wrap_socket(https_httpd.socket, server_side=True) + https_port = http_server_port(https_httpd) + https_server_thread = threading.Thread(target=https_httpd.serve_forever) + https_server_thread.daemon = True + https_server_thread.start() + + with handler(verify=False) as rh: + res = validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers', extensions={'legacy_ssl': True})) + assert res.status == 200 + res.close() + + # Ensure only applies to request extension + with pytest.raises(SSLError): + validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers')) + + @pytest.mark.skip_handler('CurlCFFI', 'legacy_ssl ignored by CurlCFFI') + def test_legacy_ssl_support(self, handler): + # HTTPS server with old ciphers + # XXX: is there a better way to test this than to create a new server? + https_httpd = http.server.ThreadingHTTPServer( + ('127.0.0.1', 0), HTTPTestRequestHandler) + sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) + sslctx.maximum_version = ssl.TLSVersion.TLSv1_2 + sslctx.set_ciphers('SHA1:AESCCM:aDSS:eNULL:aNULL') + sslctx.load_cert_chain(os.path.join(TEST_DIR, 'testcert.pem'), None) + https_httpd.socket = sslctx.wrap_socket(https_httpd.socket, server_side=True) + https_port = http_server_port(https_httpd) + https_server_thread = threading.Thread(target=https_httpd.serve_forever) + https_server_thread.daemon = True + https_server_thread.start() + + with handler(verify=False, legacy_ssl_support=True) as rh: + res = validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers')) + assert res.status == 200 + res.close() + def test_percent_encode(self, handler): with handler() as rh: # Unicode characters should be encoded with uppercase percent-encoding @@ -490,6 +541,24 @@ def test_cookies(self, handler): rh, Request(f'http://127.0.0.1:{self.http_port}/headers', extensions={'cookiejar': cookiejar})).read() assert b'cookie: test=ytdlp' in data.lower() + def test_cookie_sync_only_cookiejar(self, handler): + # Ensure that cookies are ONLY being handled by the cookiejar + with handler() as rh: + validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()})) + data = validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/headers', extensions={'cookiejar': YoutubeDLCookieJar()})).read() + assert b'cookie: test=ytdlp' not in data.lower() + + def test_cookie_sync_delete_cookie(self, handler): + # Ensure that cookies are ONLY being handled by the cookiejar + cookiejar = YoutubeDLCookieJar() + with handler(cookiejar=cookiejar) as rh: + validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/get_cookie')) + data = validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/headers')).read() + assert b'cookie: test=ytdlp' in data.lower() + cookiejar.clear_session_cookies() + data = validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/headers')).read() + assert b'cookie: test=ytdlp' not in data.lower() + def test_headers(self, handler): with handler(headers=HTTPHeaderDict({'test1': 'test', 'test2': 'test2'})) as rh: @@ -914,7 +983,6 @@ def mock_close(*args, **kwargs): class TestCurlCFFIRequestHandler(TestRequestHandlerBase): @pytest.mark.parametrize('params,extensions', [ - ({}, {'impersonate': ImpersonateTarget('chrome')}), ({'impersonate': ImpersonateTarget('chrome', '110')}, {}), ({'impersonate': ImpersonateTarget('chrome', '99')}, {'impersonate': ImpersonateTarget('chrome', '110')}), ]) @@ -1200,6 +1268,9 @@ class HTTPSupportedRH(ValidationRH): ({'timeout': 1}, False), ({'timeout': 'notatimeout'}, AssertionError), ({'unsupported': 'value'}, UnsupportedRequest), + ({'legacy_ssl': False}, False), + ({'legacy_ssl': True}, False), + ({'legacy_ssl': 'notabool'}, AssertionError), ]), ('Requests', 'http', [ ({'cookiejar': 'notacookiejar'}, AssertionError), @@ -1207,6 +1278,9 @@ class HTTPSupportedRH(ValidationRH): ({'timeout': 1}, False), ({'timeout': 'notatimeout'}, AssertionError), ({'unsupported': 'value'}, UnsupportedRequest), + ({'legacy_ssl': False}, False), + ({'legacy_ssl': True}, False), + ({'legacy_ssl': 'notabool'}, AssertionError), ]), ('CurlCFFI', 'http', [ ({'cookiejar': 'notacookiejar'}, AssertionError), @@ -1220,6 +1294,9 @@ class HTTPSupportedRH(ValidationRH): ({'impersonate': ImpersonateTarget(None, None, None, None)}, False), ({'impersonate': ImpersonateTarget()}, False), ({'impersonate': 'chrome'}, AssertionError), + ({'legacy_ssl': False}, False), + ({'legacy_ssl': True}, False), + ({'legacy_ssl': 'notabool'}, AssertionError), ]), (NoCheckRH, 'http', [ ({'cookiejar': 'notacookiejar'}, False), @@ -1228,6 +1305,9 @@ class HTTPSupportedRH(ValidationRH): ('Websockets', 'ws', [ ({'cookiejar': YoutubeDLCookieJar()}, False), ({'timeout': 2}, False), + ({'legacy_ssl': False}, False), + ({'legacy_ssl': True}, False), + ({'legacy_ssl': 'notabool'}, AssertionError), ]), ] diff --git a/test/test_utils.py b/test/test_utils.py index 3ff1f8b55..a2b459352 100644 --- a/test/test_utils.py +++ b/test/test_utils.py @@ -444,6 +444,8 @@ def test_unified_timestamps(self): self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540) self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140) self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363) + self.assertEqual(unified_timestamp('Sunday, 26 Nov 2006, 19:00'), 1164567600) + self.assertEqual(unified_timestamp('wed, aug 16, 2008, 12:00pm'), 1218931200) self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1) self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86) @@ -929,6 +931,11 @@ def test_parse_codecs(self): 'acodec': 'none', 'dynamic_range': 'DV', }) + self.assertEqual(parse_codecs('fLaC'), { + 'vcodec': 'none', + 'acodec': 'flac', + 'dynamic_range': None, + }) self.assertEqual(parse_codecs('theora, vorbis'), { 'vcodec': 'theora', 'acodec': 'vorbis', diff --git a/test/test_websockets.py b/test/test_websockets.py index 5f101abcc..43f20ac65 100644 --- a/test/test_websockets.py +++ b/test/test_websockets.py @@ -61,6 +61,10 @@ def process_request(self, request): return websockets.http11.Response( status.value, status.phrase, websockets.datastructures.Headers([('Location', '/')]), b'') return self.protocol.reject(status.value, status.phrase) + elif request.path.startswith('/get_cookie'): + response = self.protocol.accept(request) + response.headers['Set-Cookie'] = 'test=ytdlp' + return response return self.protocol.accept(request) @@ -102,6 +106,15 @@ def create_mtls_wss_websocket_server(): return create_websocket_server(ssl_context=sslctx) +def create_legacy_wss_websocket_server(): + certfn = os.path.join(TEST_DIR, 'testcert.pem') + sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) + sslctx.maximum_version = ssl.TLSVersion.TLSv1_2 + sslctx.set_ciphers('SHA1:AESCCM:aDSS:eNULL:aNULL') + sslctx.load_cert_chain(certfn, None) + return create_websocket_server(ssl_context=sslctx) + + def ws_validate_and_send(rh, req): rh.validate(req) max_tries = 3 @@ -132,6 +145,9 @@ def setup_class(cls): cls.mtls_wss_thread, cls.mtls_wss_port = create_mtls_wss_websocket_server() cls.mtls_wss_base_url = f'wss://127.0.0.1:{cls.mtls_wss_port}' + cls.legacy_wss_thread, cls.legacy_wss_port = create_legacy_wss_websocket_server() + cls.legacy_wss_host = f'wss://127.0.0.1:{cls.legacy_wss_port}' + def test_basic_websockets(self, handler): with handler() as rh: ws = ws_validate_and_send(rh, Request(self.ws_base_url)) @@ -166,6 +182,22 @@ def test_ssl_error(self, handler): ws_validate_and_send(rh, Request(self.bad_wss_host)) assert not issubclass(exc_info.type, CertificateVerifyError) + def test_legacy_ssl_extension(self, handler): + with handler(verify=False) as rh: + ws = ws_validate_and_send(rh, Request(self.legacy_wss_host, extensions={'legacy_ssl': True})) + assert ws.status == 101 + ws.close() + + # Ensure only applies to request extension + with pytest.raises(SSLError): + ws_validate_and_send(rh, Request(self.legacy_wss_host)) + + def test_legacy_ssl_support(self, handler): + with handler(verify=False, legacy_ssl_support=True) as rh: + ws = ws_validate_and_send(rh, Request(self.legacy_wss_host)) + assert ws.status == 101 + ws.close() + @pytest.mark.parametrize('path,expected', [ # Unicode characters should be encoded with uppercase percent-encoding ('/中文', '/%E4%B8%AD%E6%96%87'), @@ -248,6 +280,32 @@ def test_cookies(self, handler): assert json.loads(ws.recv())['cookie'] == 'test=ytdlp' ws.close() + @pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets') + def test_cookie_sync_only_cookiejar(self, handler): + # Ensure that cookies are ONLY being handled by the cookiejar + with handler() as rh: + ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()})) + ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()})) + ws.send('headers') + assert 'cookie' not in json.loads(ws.recv()) + ws.close() + + @pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets') + def test_cookie_sync_delete_cookie(self, handler): + # Ensure that cookies are ONLY being handled by the cookiejar + cookiejar = YoutubeDLCookieJar() + with handler(verbose=True, cookiejar=cookiejar) as rh: + ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie')) + ws = ws_validate_and_send(rh, Request(self.ws_base_url)) + ws.send('headers') + assert json.loads(ws.recv())['cookie'] == 'test=ytdlp' + ws.close() + cookiejar.clear_session_cookies() + ws = ws_validate_and_send(rh, Request(self.ws_base_url)) + ws.send('headers') + assert 'cookie' not in json.loads(ws.recv()) + ws.close() + def test_source_address(self, handler): source_address = f'127.0.0.{random.randint(5, 255)}' verify_address_availability(source_address) diff --git a/test/test_youtube_signature.py b/test/test_youtube_signature.py index b0f3269e1..0f7ae34f4 100644 --- a/test/test_youtube_signature.py +++ b/test/test_youtube_signature.py @@ -167,6 +167,22 @@ 'https://www.youtube.com/s/player/590f65a6/player_ias.vflset/en_US/base.js', '1tm7-g_A9zsI8_Lay_', 'xI4Vem4Put_rOg', ), + ( + 'https://www.youtube.com/s/player/b22ef6e7/player_ias.vflset/en_US/base.js', + 'b6HcntHGkvBLk_FRf', 'kNPW6A7FyP2l8A', + ), + ( + 'https://www.youtube.com/s/player/3400486c/player_ias.vflset/en_US/base.js', + 'lL46g3XifCKUZn1Xfw', 'z767lhet6V2Skl', + ), + ( + 'https://www.youtube.com/s/player/20dfca59/player_ias.vflset/en_US/base.js', + '-fLCxedkAk4LUTK2', 'O8kfRq1y1eyHGw', + ), + ( + 'https://www.youtube.com/s/player/b12cc44b/player_ias.vflset/en_US/base.js', + 'keLa5R2U00sR9SQK', 'N1OGyujjEwMnLw', + ), ] diff --git a/yt_dlp/YoutubeDL.py b/yt_dlp/YoutubeDL.py index e56c3ed3c..9691a1ea7 100644 --- a/yt_dlp/YoutubeDL.py +++ b/yt_dlp/YoutubeDL.py @@ -452,7 +452,8 @@ class YoutubeDL: Can also just be a single color policy, in which case it applies to all outputs. Valid stream names are 'stdout' and 'stderr'. - Valid color policies are one of 'always', 'auto', 'no_color' or 'never'. + Valid color policies are one of 'always', 'auto', + 'no_color', 'never', 'auto-tty' or 'no_color-tty'. geo_bypass: Bypass geographic restriction via faking X-Forwarded-For HTTP header geo_bypass_country: @@ -659,12 +660,15 @@ def __init__(self, params=None, auto_init=True): self.params['color'] = 'no_color' term_allow_color = os.getenv('TERM', '').lower() != 'dumb' - no_color = bool(os.getenv('NO_COLOR')) + base_no_color = bool(os.getenv('NO_COLOR')) def process_color_policy(stream): stream_name = {sys.stdout: 'stdout', sys.stderr: 'stderr'}[stream] - policy = traverse_obj(self.params, ('color', (stream_name, None), {str}), get_all=False) - if policy in ('auto', None): + policy = traverse_obj(self.params, ('color', (stream_name, None), {str}, any)) or 'auto' + if policy in ('auto', 'auto-tty', 'no_color-tty'): + no_color = base_no_color + if policy.endswith('tty'): + no_color = policy.startswith('no_color') if term_allow_color and supports_terminal_sequences(stream): return 'no_color' if no_color else True return False @@ -2190,9 +2194,8 @@ def _select_formats(self, formats, selector): or all(f.get('acodec') == 'none' for f in formats)), # OR, No formats with audio })) - def _default_format_spec(self, info_dict, download=True): - download = download and not self.params.get('simulate') - prefer_best = download and ( + def _default_format_spec(self, info_dict): + prefer_best = ( self.params['outtmpl']['default'] == '-' or info_dict.get('is_live') and not self.params.get('live_from_start')) @@ -2200,7 +2203,7 @@ def can_merge(): merger = FFmpegMergerPP(self) return merger.available and merger.can_merge() - if not prefer_best and download and not can_merge(): + if not prefer_best and not can_merge(): prefer_best = True formats = self._get_formats(info_dict) evaluate_formats = lambda spec: self._select_formats(formats, self.build_format_selector(spec)) @@ -2959,7 +2962,7 @@ def is_wellformed(f): continue if format_selector is None: - req_format = self._default_format_spec(info_dict, download=download) + req_format = self._default_format_spec(info_dict) self.write_debug(f'Default format spec: {req_format}') format_selector = self.build_format_selector(req_format) @@ -3169,11 +3172,12 @@ def dl(self, name, info, subtitle=False, test=False): if test: verbose = self.params.get('verbose') + quiet = self.params.get('quiet') or not verbose params = { 'test': True, - 'quiet': self.params.get('quiet') or not verbose, + 'quiet': quiet, 'verbose': verbose, - 'noprogress': not verbose, + 'noprogress': quiet, 'nopart': True, 'skip_unavailable_fragments': False, 'keep_fragments': False, diff --git a/yt_dlp/__init__.py b/yt_dlp/__init__.py index f88f15d70..c0b8e3b50 100644 --- a/yt_dlp/__init__.py +++ b/yt_dlp/__init__.py @@ -468,7 +468,7 @@ def metadataparser_actions(f): default_downloader = ed.get_basename() for policy in opts.color.values(): - if policy not in ('always', 'auto', 'no_color', 'never'): + if policy not in ('always', 'auto', 'auto-tty', 'no_color', 'no_color-tty', 'never'): raise ValueError(f'"{policy}" is not a valid color policy') warnings, deprecation_warnings = [], [] @@ -599,7 +599,7 @@ def report_deprecation(val, old, new=None): warnings.append( 'Using allow-unsafe-ext opens you up to potential attacks. ' 'Use with great care!') - _UnsafeExtensionError.sanitize_extension = lambda x: x + _UnsafeExtensionError.sanitize_extension = lambda x, prepend=False: x return warnings, deprecation_warnings diff --git a/yt_dlp/extractor/_extractors.py b/yt_dlp/extractor/_extractors.py index 3c03c68d7..953983aa7 100644 --- a/yt_dlp/extractor/_extractors.py +++ b/yt_dlp/extractor/_extractors.py @@ -506,7 +506,6 @@ from .digitalconcerthall import DigitalConcertHallIE from .digiteka import DigitekaIE from .discogs import DiscogsReleasePlaylistIE -from .discovery import DiscoveryIE from .disney import DisneyIE from .dispeak import DigitallySpeakingIE from .dlf import ( @@ -534,16 +533,12 @@ DiscoveryPlusIndiaShowIE, DiscoveryPlusItalyIE, DiscoveryPlusItalyShowIE, - DIYNetworkIE, DPlayIE, FoodNetworkIE, - GlobalCyclingNetworkPlusIE, GoDiscoveryIE, HGTVDeIE, HGTVUsaIE, InvestigationDiscoveryIE, - MotorTrendIE, - MotorTrendOnDemandIE, ScienceChannelIE, TravelChannelIE, ) @@ -946,6 +941,7 @@ KhanAcademyUnitIE, ) from .kick import ( + KickClipIE, KickIE, KickVODIE, ) @@ -993,6 +989,7 @@ LcpIE, LcpPlayIE, ) +from .learningonscreen import LearningOnScreenIE from .lecture2go import Lecture2GoIE from .lecturio import ( LecturioCourseIE, @@ -2176,10 +2173,7 @@ TV5UnisVideoIE, ) from .tv24ua import TV24UAVideoIE -from .tva import ( - TVAIE, - QubIE, -) +from .tva import TVAIE from .tvanouvelles import ( TVANouvellesArticleIE, TVANouvellesIE, @@ -2326,6 +2320,7 @@ ) from .vidlii import VidLiiIE from .vidly import VidlyIE +from .vidyard import VidyardIE from .viewlift import ( ViewLiftEmbedIE, ViewLiftIE, @@ -2391,6 +2386,10 @@ VrtNUIE, ) from .vtm import VTMIE +from .vtv import ( + VTVIE, + VTVGoIE, +) from .vuclip import VuClipIE from .vvvvid import ( VVVVIDIE, diff --git a/yt_dlp/extractor/abematv.py b/yt_dlp/extractor/abematv.py index 293a6c40e..66ab083fe 100644 --- a/yt_dlp/extractor/abematv.py +++ b/yt_dlp/extractor/abematv.py @@ -9,12 +9,12 @@ import struct import time import urllib.parse -import urllib.request -import urllib.response import uuid from .common import InfoExtractor from ..aes import aes_ecb_decrypt +from ..networking import RequestHandler, Response +from ..networking.exceptions import TransportError from ..utils import ( ExtractorError, OnDemandPagedList, @@ -26,37 +26,36 @@ traverse_obj, update_url_query, ) -from ..utils.networking import clean_proxies -def add_opener(ydl, handler): # FIXME: Create proper API in .networking - """Add a handler for opening URLs, like _download_webpage""" - # https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426 - # https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605 - rh = ydl._request_director.handlers['Urllib'] - if 'abematv-license' in rh._SUPPORTED_URL_SCHEMES: - return - headers = ydl.params['http_headers'].copy() - proxies = ydl.proxies.copy() - clean_proxies(proxies, headers) - opener = rh._get_instance(cookiejar=ydl.cookiejar, proxies=proxies) - assert isinstance(opener, urllib.request.OpenerDirector) - opener.add_handler(handler) - rh._SUPPORTED_URL_SCHEMES = (*rh._SUPPORTED_URL_SCHEMES, 'abematv-license') +class AbemaLicenseRH(RequestHandler): + _SUPPORTED_URL_SCHEMES = ('abematv-license',) + _SUPPORTED_PROXY_SCHEMES = None + _SUPPORTED_FEATURES = None + RH_NAME = 'abematv_license' + _STRTABLE = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz' + _HKEY = b'3AF0298C219469522A313570E8583005A642E73EDD58E3EA2FB7339D3DF1597E' -class AbemaLicenseHandler(urllib.request.BaseHandler): - handler_order = 499 - STRTABLE = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz' - HKEY = b'3AF0298C219469522A313570E8583005A642E73EDD58E3EA2FB7339D3DF1597E' - - def __init__(self, ie: 'AbemaTVIE'): - # the protocol that this should really handle is 'abematv-license://' - # abematv_license_open is just a placeholder for development purposes - # ref. https://github.com/python/cpython/blob/f4c03484da59049eb62a9bf7777b963e2267d187/Lib/urllib/request.py#L510 - setattr(self, 'abematv-license_open', getattr(self, 'abematv_license_open', None)) + def __init__(self, *, ie: 'AbemaTVIE', **kwargs): + super().__init__(**kwargs) self.ie = ie + def _send(self, request): + url = request.url + ticket = urllib.parse.urlparse(url).netloc + + try: + response_data = self._get_videokey_from_ticket(ticket) + except ExtractorError as e: + raise TransportError(cause=e.cause) from e + except (IndexError, KeyError, TypeError) as e: + raise TransportError(cause=repr(e)) from e + + return Response( + io.BytesIO(response_data), url, + headers={'Content-Length': str(len(response_data))}) + def _get_videokey_from_ticket(self, ticket): to_show = self.ie.get_param('verbose', False) media_token = self.ie._get_media_token(to_show=to_show) @@ -72,25 +71,17 @@ def _get_videokey_from_ticket(self, ticket): 'Content-Type': 'application/json', }) - res = decode_base_n(license_response['k'], table=self.STRTABLE) + res = decode_base_n(license_response['k'], table=self._STRTABLE) encvideokey = bytes_to_intlist(struct.pack('>QQ', res >> 64, res & 0xffffffffffffffff)) h = hmac.new( - binascii.unhexlify(self.HKEY), + binascii.unhexlify(self._HKEY), (license_response['cid'] + self.ie._DEVICE_ID).encode(), digestmod=hashlib.sha256) enckey = bytes_to_intlist(h.digest()) return intlist_to_bytes(aes_ecb_decrypt(encvideokey, enckey)) - def abematv_license_open(self, url): - url = url.get_full_url() if isinstance(url, urllib.request.Request) else url - ticket = urllib.parse.urlparse(url).netloc - response_data = self._get_videokey_from_ticket(ticket) - return urllib.response.addinfourl(io.BytesIO(response_data), headers={ - 'Content-Length': str(len(response_data)), - }, url=url, code=200) - class AbemaTVBaseIE(InfoExtractor): _NETRC_MACHINE = 'abematv' @@ -139,7 +130,7 @@ def _get_device_token(self): if self._USERTOKEN: return self._USERTOKEN - add_opener(self._downloader, AbemaLicenseHandler(self)) + self._downloader._request_director.add_handler(AbemaLicenseRH(ie=self, logger=None)) username, _ = self._get_login_info() auth_cache = username and self.cache.load(self._NETRC_MACHINE, username, min_ver='2024.01.19') @@ -368,6 +359,7 @@ def _real_extract(self, url): info['episode_number'] = epis if epis < 2000 else None is_live, m3u8_url = False, None + availability = 'public' if video_type == 'now-on-air': is_live = True channel_url = 'https://api.abema.io/v1/channels' @@ -385,10 +377,10 @@ def _real_extract(self, url): f'https://api.abema.io/v1/video/programs/{video_id}', video_id, note='Checking playability', headers=headers) - ondemand_types = traverse_obj(api_response, ('terms', ..., 'onDemandType')) - if 3 not in ondemand_types: + if not traverse_obj(api_response, ('label', 'free', {bool})): # cannot acquire decryption key for these streams self.report_warning('This is a premium-only stream') + availability = 'premium_only' info.update(traverse_obj(api_response, { 'series': ('series', 'title'), 'season': ('season', 'name'), @@ -408,6 +400,7 @@ def _real_extract(self, url): headers=headers) if not traverse_obj(api_response, ('slot', 'flags', 'timeshiftFree'), default=False): self.report_warning('This is a premium-only stream') + availability = 'premium_only' m3u8_url = f'https://vod-abematv.akamaized.net/slot/{video_id}/playlist.m3u8' else: @@ -425,6 +418,7 @@ def _real_extract(self, url): 'description': description, 'formats': formats, 'is_live': is_live, + 'availability': availability, }) return info diff --git a/yt_dlp/extractor/adn.py b/yt_dlp/extractor/adn.py index 7be990b9c..c8a261375 100644 --- a/yt_dlp/extractor/adn.py +++ b/yt_dlp/extractor/adn.py @@ -16,6 +16,7 @@ float_or_none, int_or_none, intlist_to_bytes, + join_nonempty, long_to_bytes, parse_iso8601, pkcs1pad, @@ -48,9 +49,9 @@ class ADNBaseIE(InfoExtractor): class ADNIE(ADNBaseIE): - _VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.(?Pfr|de)/video/[^/?#]+/(?P\d+)' + _VALID_URL = r'https?://(?:www\.)?animationdigitalnetwork\.com/(?:(?Pde)/)?video/[^/?#]+/(?P\d+)' _TESTS = [{ - 'url': 'https://animationdigitalnetwork.fr/video/fruits-basket/9841-episode-1-a-ce-soir', + 'url': 'https://animationdigitalnetwork.com/video/558-fruits-basket/9841-episode-1-a-ce-soir', 'md5': '1c9ef066ceb302c86f80c2b371615261', 'info_dict': { 'id': '9841', @@ -70,10 +71,7 @@ class ADNIE(ADNBaseIE): }, 'skip': 'Only available in French and German speaking Europe', }, { - 'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites', - 'only_matching': True, - }, { - 'url': 'https://animationdigitalnetwork.de/video/the-eminence-in-shadow/23550-folge-1', + 'url': 'https://animationdigitalnetwork.com/de/video/973-the-eminence-in-shadow/23550-folge-1', 'md5': '5c5651bf5791fa6fcd7906012b9d94e8', 'info_dict': { 'id': '23550', @@ -166,7 +164,7 @@ def _perform_login(self, username, password): 'username': username, })) or {}).get('accessToken') if access_token: - self._HEADERS = {'authorization': 'Bearer ' + access_token} + self._HEADERS['Authorization'] = f'Bearer {access_token}' except ExtractorError as e: message = None if isinstance(e.cause, HTTPError) and e.cause.status == 401: @@ -177,6 +175,7 @@ def _perform_login(self, username, password): def _real_extract(self, url): lang, video_id = self._match_valid_url(url).group('lang', 'id') + self._HEADERS['X-Target-Distribution'] = lang or 'fr' video_base_url = self._PLAYER_BASE_URL + f'video/{video_id}/' player = self._download_json( video_base_url + 'configuration', video_id, @@ -217,7 +216,6 @@ def _real_extract(self, url): links_data = self._download_json( links_url, video_id, 'Downloading links JSON metadata', headers={ 'X-Player-Token': authorization, - 'X-Target-Distribution': lang, **self._HEADERS, }, query={ 'freeWithAds': 'true', @@ -256,6 +254,7 @@ def _real_extract(self, url): load_balancer_data = self._download_json( load_balancer_url, video_id, f'Downloading {format_id} {quality} JSON metadata', + headers=self._HEADERS, fatal=False) or {} m3u8_url = load_balancer_data.get('location') if not m3u8_url: @@ -276,7 +275,7 @@ def _real_extract(self, url): video = (self._download_json( self._API_BASE_URL + f'video/{video_id}', video_id, - 'Downloading additional video metadata', fatal=False) or {}).get('video') or {} + 'Downloading additional video metadata', fatal=False, headers=self._HEADERS) or {}).get('video') or {} show = video.get('show') or {} return { @@ -298,9 +297,9 @@ def _real_extract(self, url): class ADNSeasonIE(ADNBaseIE): - _VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.(?Pfr|de)/video/(?P[^/?#]+)/?(?:$|[#?])' + _VALID_URL = r'https?://(?:www\.)?animationdigitalnetwork\.com/(?:(?Pde)/)?video/(?P\d+)[^/?#]*/?(?:$|[#?])' _TESTS = [{ - 'url': 'https://animationdigitalnetwork.fr/video/tokyo-mew-mew-new', + 'url': 'https://animationdigitalnetwork.com/video/911-tokyo-mew-mew-new', 'playlist_count': 12, 'info_dict': { 'id': '911', @@ -311,24 +310,22 @@ class ADNSeasonIE(ADNBaseIE): def _real_extract(self, url): lang, video_show_slug = self._match_valid_url(url).group('lang', 'id') + self._HEADERS['X-Target-Distribution'] = lang or 'fr' show = self._download_json( f'{self._API_BASE_URL}show/{video_show_slug}/', video_show_slug, 'Downloading show JSON metadata', headers=self._HEADERS)['show'] show_id = str(show['id']) episodes = self._download_json( f'{self._API_BASE_URL}video/show/{show_id}', video_show_slug, - 'Downloading episode list', headers={ - 'X-Target-Distribution': lang, - **self._HEADERS, - }, query={ + 'Downloading episode list', headers=self._HEADERS, query={ 'order': 'asc', 'limit': '-1', }) def entries(): for episode_id in traverse_obj(episodes, ('videos', ..., 'id', {str_or_none})): - yield self.url_result( - f'https://animationdigitalnetwork.{lang}/video/{video_show_slug}/{episode_id}', - ADNIE, episode_id) + yield self.url_result(join_nonempty( + 'https://animationdigitalnetwork.com', lang, 'video', + video_show_slug, episode_id, delim='/'), ADNIE, episode_id) return self.playlist_result(entries(), show_id, show.get('title')) diff --git a/yt_dlp/extractor/afreecatv.py b/yt_dlp/extractor/afreecatv.py index f51b5a68b..815d20537 100644 --- a/yt_dlp/extractor/afreecatv.py +++ b/yt_dlp/extractor/afreecatv.py @@ -1,6 +1,7 @@ import functools from .common import InfoExtractor +from ..networking import Request from ..utils import ( ExtractorError, OnDemandPagedList, @@ -58,6 +59,13 @@ def _perform_login(self, username, password): f'Unable to login: {self.IE_NAME} said: {error}', expected=True) + def _call_api(self, endpoint, display_id, data=None, headers=None, query=None): + return self._download_json(Request( + f'https://api.m.afreecatv.com/{endpoint}', + data=data, headers=headers, query=query, + extensions={'legacy_ssl': True}), display_id, + 'Downloading API JSON', 'Unable to download API JSON') + class AfreecaTVIE(AfreecaTVBaseIE): IE_NAME = 'afreecatv' @@ -184,12 +192,12 @@ class AfreecaTVIE(AfreecaTVBaseIE): def _real_extract(self, url): video_id = self._match_id(url) - data = self._download_json( - 'https://api.m.afreecatv.com/station/video/a/view', video_id, - headers={'Referer': url}, data=urlencode_postdata({ + data = self._call_api( + 'station/video/a/view', video_id, headers={'Referer': url}, + data=urlencode_postdata({ 'nTitleNo': video_id, 'nApiLevel': 10, - }), impersonate=True)['data'] + }))['data'] error_code = traverse_obj(data, ('code', {int})) if error_code == -6221: @@ -267,9 +275,9 @@ class AfreecaTVCatchStoryIE(AfreecaTVBaseIE): def _real_extract(self, url): video_id = self._match_id(url) - data = self._download_json( - 'https://api.m.afreecatv.com/catchstory/a/view', video_id, headers={'Referer': url}, - query={'aStoryListIdx': '', 'nStoryIdx': video_id}, impersonate=True) + data = self._call_api( + 'catchstory/a/view', video_id, headers={'Referer': url}, + query={'aStoryListIdx': '', 'nStoryIdx': video_id}) return self.playlist_result(self._entries(data), video_id) diff --git a/yt_dlp/extractor/banbye.py b/yt_dlp/extractor/banbye.py index d10bdf8da..148a1705e 100644 --- a/yt_dlp/extractor/banbye.py +++ b/yt_dlp/extractor/banbye.py @@ -4,9 +4,13 @@ from .common import InfoExtractor from ..utils import ( InAdvancePagedList, + determine_ext, format_field, + int_or_none, + join_nonempty, traverse_obj, unified_timestamp, + url_or_none, ) @@ -30,6 +34,7 @@ def _extract_playlist(self, playlist_id): class BanByeIE(BanByeBaseIE): _VALID_URL = r'https?://(?:www\.)?banbye\.com/(?:en/)?watch/(?P[\w-]+)' _TESTS = [{ + # ['src']['mp4']['levels'] direct mp4 urls only 'url': 'https://banbye.com/watch/v_ytfmvkVYLE8T', 'md5': '2f4ea15c5ca259a73d909b2cfd558eb5', 'info_dict': { @@ -58,6 +63,7 @@ class BanByeIE(BanByeBaseIE): }, 'playlist_mincount': 9, }, { + # ['src']['mp4']['levels'] direct mp4 urls only 'url': 'https://banbye.com/watch/v_kb6_o1Kyq-CD', 'info_dict': { 'id': 'v_kb6_o1Kyq-CD', @@ -77,6 +83,48 @@ class BanByeIE(BanByeBaseIE): 'view_count': int, 'comment_count': int, }, + }, { + # ['src']['hls']['levels'] variant m3u8 urls only; master m3u8 is 404 + 'url': 'https://banbye.com/watch/v_a_gPFuC9LoW5', + 'info_dict': { + 'id': 'v_a_gPFuC9LoW5', + 'ext': 'mp4', + 'title': 'md5:183524056bebdfa245fd6d214f63c0fe', + 'description': 'md5:943ac87287ca98d28d8b8797719827c6', + 'uploader': 'wRealu24', + 'channel_id': 'ch_wrealu24', + 'channel_url': 'https://banbye.com/channel/ch_wrealu24', + 'upload_date': '20231113', + 'timestamp': 1699874062, + 'view_count': int, + 'like_count': int, + 'dislike_count': int, + 'comment_count': int, + 'thumbnail': 'https://cdn.banbye.com/video/v_a_gPFuC9LoW5/96.webp', + 'tags': ['jaszczur', 'sejm', 'lewica', 'polska', 'ukrainizacja', 'pierwszeposiedzeniesejmu'], + }, + 'expected_warnings': ['Failed to download m3u8'], + }, { + # ['src']['hls']['masterPlaylist'] m3u8 only + 'url': 'https://banbye.com/watch/v_B0rsKWsr-aaa', + 'info_dict': { + 'id': 'v_B0rsKWsr-aaa', + 'ext': 'mp4', + 'title': 'md5:00b254164b82101b3f9e5326037447ed', + 'description': 'md5:3fd8b48aa81954ba024bc60f5de6e167', + 'uploader': 'PSTV Piotr Szlachtowicz ', + 'channel_id': 'ch_KV9EVObkB9wB', + 'channel_url': 'https://banbye.com/channel/ch_KV9EVObkB9wB', + 'upload_date': '20240629', + 'timestamp': 1719646816, + 'duration': 2377, + 'view_count': int, + 'like_count': int, + 'dislike_count': int, + 'comment_count': int, + 'thumbnail': 'https://cdn.banbye.com/video/v_B0rsKWsr-aaa/96.webp', + 'tags': ['Biden', 'Trump', 'Wybory', 'USA'], + }, }] def _real_extract(self, url): @@ -91,11 +139,24 @@ def _real_extract(self, url): 'id': f'{quality}p', 'url': f'{self._CDN_BASE}/video/{video_id}/{quality}.webp', } for quality in [48, 96, 144, 240, 512, 1080]] - formats = [{ - 'format_id': f'http-{quality}p', - 'quality': quality, - 'url': f'{self._CDN_BASE}/video/{video_id}/{quality}.mp4', - } for quality in data['quality']] + + formats = [] + url_data = self._download_json(f'{self._API_BASE}/videos/{video_id}/url', video_id, data=b'') + if master_url := traverse_obj(url_data, ('src', 'hls', 'masterPlaylist', {url_or_none})): + formats = self._extract_m3u8_formats(master_url, video_id, 'mp4', m3u8_id='hls', fatal=False) + + for format_id, format_url in traverse_obj(url_data, ( + 'src', ('mp4', 'hls'), 'levels', {dict.items}, lambda _, v: url_or_none(v[1]))): + ext = determine_ext(format_url) + is_hls = ext == 'm3u8' + formats.append({ + 'url': format_url, + 'ext': 'mp4' if is_hls else ext, + 'format_id': join_nonempty(is_hls and 'hls', format_id), + 'protocol': 'm3u8_native' if is_hls else 'https', + 'height': int_or_none(format_id), + }) + self._remove_duplicate_formats(formats) return { 'id': video_id, diff --git a/yt_dlp/extractor/bilibili.py b/yt_dlp/extractor/bilibili.py index 362f5fc38..72394f172 100644 --- a/yt_dlp/extractor/bilibili.py +++ b/yt_dlp/extractor/bilibili.py @@ -298,7 +298,7 @@ def _get_interactive_entries(self, video_id, cid, metainfo, headers=None): class BiliBiliIE(BilibiliBaseIE): - _VALID_URL = r'https?://(?:www\.)?bilibili\.com/(?:video/|festival/\w+\?(?:[^#]*&)?bvid=)[aAbB][vV](?P[^/?#&]+)' + _VALID_URL = r'https?://(?:www\.)?bilibili\.com/(?:video/|festival/[^/?#]+\?(?:[^#]*&)?bvid=)[aAbB][vV](?P[^/?#&]+)' _TESTS = [{ 'url': 'https://www.bilibili.com/video/BV13x41117TL', @@ -622,6 +622,10 @@ class BiliBiliIE(BilibiliBaseIE): 'ext': 'mp4', }, 'skip': 'geo-restricted', + }, { + 'note': 'has - in the last path segment of the url', + 'url': 'https://www.bilibili.com/festival/bh3-7th?bvid=BV1tr4y1f7p2&', + 'only_matching': True, }] def _real_extract(self, url): diff --git a/yt_dlp/extractor/box.py b/yt_dlp/extractor/box.py index 3547ad997..f06339f70 100644 --- a/yt_dlp/extractor/box.py +++ b/yt_dlp/extractor/box.py @@ -12,7 +12,7 @@ class BoxIE(InfoExtractor): - _VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P[^/?#]+)(?:/file/(?P\d+))?' + _VALID_URL = r'https?://(?:[^.]+\.)?(?Papp|ent)\.box\.com/s/(?P[^/?#]+)(?:/file/(?P\d+))?' _TESTS = [{ 'url': 'https://mlssoccer.app.box.com/s/0evd2o3e08l60lr4ygukepvnkord1o1x/file/510727257538', 'md5': '1f81b2fd3960f38a40a3b8823e5fcd43', @@ -38,10 +38,22 @@ class BoxIE(InfoExtractor): 'uploader_id': '239068974', }, 'params': {'skip_download': 'dash fragment too small'}, + }, { + 'url': 'https://thejacksonlaboratory.ent.box.com/s/2x09dm6vcg6y28o0oox1so4l0t8wzt6l/file/1536173056065', + 'info_dict': { + 'id': '1536173056065', + 'ext': 'mp4', + 'uploader_id': '18523128264', + 'uploader': 'Lexi Hennigan', + 'title': 'iPSC Symposium recording part 1.mp4', + 'timestamp': 1716228343, + 'upload_date': '20240520', + }, + 'params': {'skip_download': 'dash fragment too small'}, }] def _real_extract(self, url): - shared_name, file_id = self._match_valid_url(url).groups() + shared_name, file_id, service = self._match_valid_url(url).group('shared_name', 'id', 'service') webpage = self._download_webpage(url, file_id or shared_name) if not file_id: @@ -57,14 +69,14 @@ def _real_extract(self, url): request_token = self._search_json( r'Box\.config\s*=', webpage, 'Box config', file_id)['requestToken'] access_token = self._download_json( - 'https://app.box.com/app-api/enduserapp/elements/tokens', file_id, + f'https://{service}.box.com/app-api/enduserapp/elements/tokens', file_id, 'Downloading token JSON metadata', data=json.dumps({'fileIDs': [file_id]}).encode(), headers={ 'Content-Type': 'application/json', 'X-Request-Token': request_token, 'X-Box-EndUser-API': 'sharedName=' + shared_name, })[file_id]['read'] - shared_link = 'https://app.box.com/s/' + shared_name + shared_link = f'https://{service}.box.com/s/{shared_name}' f = self._download_json( 'https://api.box.com/2.0/files/' + file_id, file_id, 'Downloading file JSON metadata', headers={ diff --git a/yt_dlp/extractor/cbc.py b/yt_dlp/extractor/cbc.py index 1522b08e2..40224f63f 100644 --- a/yt_dlp/extractor/cbc.py +++ b/yt_dlp/extractor/cbc.py @@ -1,4 +1,5 @@ import base64 +import functools import json import re import time @@ -6,17 +7,24 @@ import xml.etree.ElementTree from .common import InfoExtractor +from ..networking import HEADRequest from ..utils import ( ExtractorError, + float_or_none, int_or_none, join_nonempty, js_to_json, + mimetype2ext, orderedSet, parse_iso8601, + replace_extension, smuggle_url, strip_or_none, traverse_obj, try_get, + update_url, + url_basename, + url_or_none, ) @@ -149,6 +157,7 @@ def _real_extract(self, url): class CBCPlayerIE(InfoExtractor): IE_NAME = 'cbc.ca:player' _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/(?:video/)?|i/caffeine/syndicate/\?mediaId=))(?P(?:\d\.)?\d+)' + _GEO_COUNTRIES = ['CA'] _TESTS = [{ 'url': 'http://www.cbc.ca/player/play/2683190193', 'md5': '64d25f841ddf4ddb28a235338af32e2c', @@ -172,21 +181,20 @@ class CBCPlayerIE(InfoExtractor): 'description': 'md5:dd3b692f0a139b0369943150bd1c46a9', 'timestamp': 1425704400, 'upload_date': '20150307', - 'uploader': 'CBCC-NEW', - 'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg', + 'thumbnail': 'https://i.cbc.ca/ais/1.2985700,1717262248558/full/max/0/default.jpg', 'chapters': [], 'duration': 494.811, - 'categories': ['AudioMobile/All in a Weekend Montreal'], - 'tags': 'count:8', + 'categories': ['All in a Weekend Montreal'], + 'tags': 'count:11', 'location': 'Quebec', 'series': 'All in a Weekend Montreal', 'season': 'Season 2015', 'season_number': 2015, 'media_type': 'Excerpt', + 'genres': ['Other'], }, }, { 'url': 'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2164402062', - 'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6', 'info_dict': { 'id': '2164402062', 'ext': 'mp4', @@ -194,107 +202,168 @@ class CBCPlayerIE(InfoExtractor): 'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.', 'timestamp': 1320410746, 'upload_date': '20111104', - 'uploader': 'CBCC-NEW', - 'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg', + 'thumbnail': 'https://i.cbc.ca/ais/1.1711287,1717139372111/full/max/0/default.jpg', 'chapters': [], 'duration': 186.867, 'series': 'CBC News: Windsor at 6:00', - 'categories': ['News/Canada/Windsor'], + 'categories': ['Windsor'], 'location': 'Windsor', - 'tags': ['cancer'], - 'creators': ['Allison Johnson'], + 'tags': ['Cancer', 'News/Canada/Windsor', 'Windsor'], 'media_type': 'Excerpt', + 'genres': ['News'], }, + 'params': {'skip_download': 'm3u8'}, }, { # Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/ 'url': 'https://www.cbc.ca/player/play/1.2985700', 'md5': 'e5e708c34ae6fca156aafe17c43e8b75', 'info_dict': { - 'id': '2657631896', + 'id': '1.2985700', 'ext': 'mp3', 'title': 'CBC Montreal is organizing its first ever community hackathon!', 'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.', 'timestamp': 1425704400, 'upload_date': '20150307', - 'uploader': 'CBCC-NEW', - 'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg', + 'thumbnail': 'https://i.cbc.ca/ais/1.2985700,1717262248558/full/max/0/default.jpg', 'chapters': [], 'duration': 494.811, - 'categories': ['AudioMobile/All in a Weekend Montreal'], - 'tags': 'count:8', + 'categories': ['All in a Weekend Montreal'], + 'tags': 'count:11', 'location': 'Quebec', 'series': 'All in a Weekend Montreal', 'season': 'Season 2015', 'season_number': 2015, 'media_type': 'Excerpt', + 'genres': ['Other'], }, }, { 'url': 'https://www.cbc.ca/player/play/1.1711287', - 'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6', 'info_dict': { - 'id': '2164402062', + 'id': '1.1711287', 'ext': 'mp4', 'title': 'Cancer survivor four times over', 'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.', 'timestamp': 1320410746, 'upload_date': '20111104', - 'uploader': 'CBCC-NEW', - 'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg', + 'thumbnail': 'https://i.cbc.ca/ais/1.1711287,1717139372111/full/max/0/default.jpg', 'chapters': [], 'duration': 186.867, 'series': 'CBC News: Windsor at 6:00', - 'categories': ['News/Canada/Windsor'], + 'categories': ['Windsor'], 'location': 'Windsor', - 'tags': ['cancer'], - 'creators': ['Allison Johnson'], + 'tags': ['Cancer', 'News/Canada/Windsor', 'Windsor'], 'media_type': 'Excerpt', + 'genres': ['News'], }, + 'params': {'skip_download': 'm3u8'}, }, { # Has subtitles # These broadcasts expire after ~1 month, can find new test URL here: # https://www.cbc.ca/player/news/TV%20Shows/The%20National/Latest%20Broadcast - 'url': 'https://www.cbc.ca/player/play/1.7159484', - 'md5': '6ed6cd0fc2ef568d2297ba68a763d455', + 'url': 'https://www.cbc.ca/player/play/video/9.6424403', + 'md5': '8025909eaffcf0adf59922904def9a5e', 'info_dict': { - 'id': '2324213316001', + 'id': '9.6424403', 'ext': 'mp4', - 'title': 'The National | School boards sue social media giants', - 'description': 'md5:4b4db69322fa32186c3ce426da07402c', - 'timestamp': 1711681200, - 'duration': 2743.400, - 'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]}, - 'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/607/559/thumbnail.jpeg', - 'uploader': 'CBCC-NEW', + 'title': 'The National | N.W.T. wildfire emergency', + 'description': 'md5:ada33d36d1df69347ed575905bfd496c', + 'timestamp': 1718589600, + 'duration': 2692.833, + 'subtitles': { + 'en-US': [{ + 'name': 'English Captions', + 'url': 'https://cbchls.akamaized.net/delivery/news-shows/2024/06/17/NAT_JUN16-00-55-00/NAT_JUN16_cc.vtt', + }], + }, + 'thumbnail': 'https://i.cbc.ca/ais/6272b5c6-5e78-4c05-915d-0e36672e33d1,1714756287822/full/max/0/default.jpg', 'chapters': 'count:5', - 'upload_date': '20240329', - 'categories': 'count:4', + 'upload_date': '20240617', + 'categories': ['News', 'The National', 'The National Latest Broadcasts'], 'series': 'The National - Full Show', - 'tags': 'count:1', - 'creators': ['News'], + 'tags': ['The National'], 'location': 'Canada', 'media_type': 'Full Program', + 'genres': ['News'], }, }, { 'url': 'https://www.cbc.ca/player/play/video/1.7194274', 'md5': '188b96cf6bdcb2540e178a6caa957128', 'info_dict': { - 'id': '2334524995812', + 'id': '1.7194274', 'ext': 'mp4', 'title': '#TheMoment a rare white spirit moose was spotted in Alberta', 'description': 'md5:18ae269a2d0265c5b0bbe4b2e1ac61a3', 'timestamp': 1714788791, 'duration': 77.678, 'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]}, - 'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/201/543/THE_MOMENT.jpg', - 'uploader': 'CBCC-NEW', - 'chapters': 'count:0', - 'upload_date': '20240504', + 'thumbnail': 'https://i.cbc.ca/ais/1.7194274,1717224990425/full/max/0/default.jpg', + 'chapters': [], 'categories': 'count:3', 'series': 'The National', - 'tags': 'count:15', - 'creators': ['encoder'], + 'tags': 'count:17', 'location': 'Canada', 'media_type': 'Excerpt', + 'upload_date': '20240504', + 'genres': ['News'], + }, + }, { + 'url': 'https://www.cbc.ca/player/play/video/9.6427282', + 'info_dict': { + 'id': '9.6427282', + 'ext': 'mp4', + 'title': 'Men\'s Soccer - Argentina vs Morocco', + 'description': 'Argentina faces Morocco on the football pitch at Saint Etienne Stadium.', + 'series': 'CBC Sports', + 'media_type': 'Event Coverage', + 'thumbnail': 'https://i.cbc.ca/ais/a4c5c0c2-99fa-4bd3-8061-5a63879c1b33,1718828053500/full/max/0/default.jpg', + 'timestamp': 1721825400.0, + 'upload_date': '20240724', + 'duration': 10568.0, + 'chapters': [], + 'genres': [], + 'tags': ['2024 Paris Olympic Games'], + 'categories': ['Olympics Summer Soccer', 'Summer Olympics Replays', 'Summer Olympics Soccer Replays'], + 'location': 'Canada', + }, + 'params': {'skip_download': 'm3u8'}, + }, { + 'url': 'https://www.cbc.ca/player/play/video/9.6459530', + 'md5': '6c1bb76693ab321a2e99c347a1d5ecbc', + 'info_dict': { + 'id': '9.6459530', + 'ext': 'mp4', + 'title': 'Parts of Jasper incinerated as wildfire rages', + 'description': 'md5:6f1caa8d128ad3f629257ef5fecf0962', + 'series': 'The National', + 'media_type': 'Excerpt', + 'thumbnail': 'https://i.cbc.ca/ais/507c0086-31a2-494d-96e4-bffb1048d045,1721953984375/full/max/0/default.jpg', + 'timestamp': 1721964091.012, + 'upload_date': '20240726', + 'duration': 952.285, + 'chapters': [], + 'genres': [], + 'tags': 'count:23', + 'categories': ['News (FAST)', 'News', 'The National', 'TV News Shows', 'The National '], + }, + }, { + 'url': 'https://www.cbc.ca/player/play/video/9.6420651', + 'md5': '71a850c2c6ee5e912de169f5311bb533', + 'info_dict': { + 'id': '9.6420651', + 'ext': 'mp4', + 'title': 'Is it a breath of fresh air? Measuring air quality in Edmonton', + 'description': 'md5:3922b92cc8b69212d739bd9dd095b1c3', + 'series': 'CBC News Edmonton', + 'media_type': 'Excerpt', + 'thumbnail': 'https://i.cbc.ca/ais/73c4ab9c-7ad4-46ee-bb9b-020fdc01c745,1718214547576/full/max/0/default.jpg', + 'timestamp': 1718220065.768, + 'upload_date': '20240612', + 'duration': 286.086, + 'chapters': [], + 'genres': ['News'], + 'categories': ['News', 'Edmonton'], + 'tags': 'count:7', + 'location': 'Edmonton', }, }, { 'url': 'cbcplayer:1.7159484', @@ -307,23 +376,113 @@ class CBCPlayerIE(InfoExtractor): 'only_matching': True, }] + def _parse_param(self, asset_data, name): + return traverse_obj(asset_data, ('params', lambda _, v: v['name'] == name, 'value', {str}, any)) + def _real_extract(self, url): video_id = self._match_id(url) - if '.' in video_id: - webpage = self._download_webpage(f'https://www.cbc.ca/player/play/{video_id}', video_id) - video_id = self._search_json( - r'window\.__INITIAL_STATE__\s*=', webpage, - 'initial state', video_id)['video']['currentClip']['mediaId'] + webpage = self._download_webpage(f'https://www.cbc.ca/player/play/{video_id}', video_id) + data = self._search_json( + r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)['video']['currentClip'] + assets = traverse_obj( + data, ('media', 'assets', lambda _, v: url_or_none(v['key']) and v['type'])) + + if not assets and (media_id := traverse_obj(data, ('mediaId', {str}))): + # XXX: Deprecated; CBC is migrating off of ThePlatform + return { + '_type': 'url_transparent', + 'ie_key': 'ThePlatform', + 'url': smuggle_url( + f'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/{media_id}?mbr=true&formats=MPEG4,FLV,MP3', { + 'force_smil_url': True, + }), + 'id': media_id, + '_format_sort_fields': ('res', 'proto'), # Prioritize direct http formats over HLS + } + + is_live = traverse_obj(data, ('media', 'streamType', {str})) == 'Live' + formats, subtitles = [], {} + + for sub in traverse_obj(data, ('media', 'textTracks', lambda _, v: url_or_none(v['src']))): + subtitles.setdefault(sub.get('language') or 'und', []).append({ + 'url': sub['src'], + 'name': sub.get('label'), + }) + + for asset in assets: + asset_key = asset['key'] + asset_type = asset['type'] + if asset_type != 'medianet': + self.report_warning(f'Skipping unsupported asset type "{asset_type}": {asset_key}') + continue + asset_data = self._download_json(asset_key, video_id, f'Downloading {asset_type} JSON') + ext = mimetype2ext(self._parse_param(asset_data, 'contentType')) + if ext == 'm3u8': + fmts, subs = self._extract_m3u8_formats_and_subtitles( + asset_data['url'], video_id, 'mp4', m3u8_id='hls', live=is_live) + formats.extend(fmts) + # Avoid slow/error-prone webvtt-over-m3u8 if direct https vtt is available + if not subtitles: + self._merge_subtitles(subs, target=subtitles) + if is_live or not fmts: + continue + # Check for direct https mp4 format + best_video_fmt = traverse_obj(fmts, ( + lambda _, v: v.get('vcodec') != 'none' and v['tbr'], all, + {functools.partial(sorted, key=lambda x: x['tbr'])}, -1, {dict})) or {} + base_url = self._search_regex( + r'(https?://[^?#]+?/)hdntl=', best_video_fmt.get('url'), 'base url', default=None) + if not base_url or '/live/' in base_url: + continue + mp4_url = base_url + replace_extension(url_basename(best_video_fmt['url']), 'mp4') + if self._request_webpage( + HEADRequest(mp4_url), video_id, 'Checking for https format', + errnote=False, fatal=False): + formats.append({ + **best_video_fmt, + 'url': mp4_url, + 'format_id': 'https-mp4', + 'protocol': 'https', + 'manifest_url': None, + 'acodec': None, + }) + else: + formats.append({ + 'url': asset_data['url'], + 'ext': ext, + 'vcodec': 'none' if self._parse_param(asset_data, 'mediaType') == 'audio' else None, + }) + + chapters = traverse_obj(data, ( + 'media', 'chapters', lambda _, v: float(v['startTime']) is not None, { + 'start_time': ('startTime', {functools.partial(float_or_none, scale=1000)}), + 'end_time': ('endTime', {functools.partial(float_or_none, scale=1000)}), + 'title': ('name', {str}), + })) + # Filter out pointless single chapters with start_time==0 and no end_time + if len(chapters) == 1 and not (chapters[0].get('start_time') or chapters[0].get('end_time')): + chapters = [] return { - '_type': 'url_transparent', - 'ie_key': 'ThePlatform', - 'url': smuggle_url( - f'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/{video_id}?mbr=true&formats=MPEG4,FLV,MP3', { - 'force_smil_url': True, - }), + **traverse_obj(data, { + 'title': ('title', {str}), + 'description': ('description', {str.strip}), + 'thumbnail': ('image', 'url', {url_or_none}, {functools.partial(update_url, query=None)}), + 'timestamp': ('publishedAt', {functools.partial(float_or_none, scale=1000)}), + 'media_type': ('media', 'clipType', {str}), + 'series': ('showName', {str}), + 'season_number': ('media', 'season', {int_or_none}), + 'duration': ('media', 'duration', {float_or_none}, {lambda x: None if is_live else x}), + 'location': ('media', 'region', {str}), + 'tags': ('tags', ..., 'name', {str}), + 'genres': ('media', 'genre', all), + 'categories': ('categories', ..., 'name', {str}), + }), 'id': video_id, - '_format_sort_fields': ('res', 'proto'), # Prioritize direct http formats over HLS + 'formats': formats, + 'subtitles': subtitles, + 'chapters': chapters, + 'is_live': is_live, } @@ -647,11 +806,11 @@ class CBCGemLiveIE(InfoExtractor): 'title': 'Ottawa', 'description': 'The live TV channel and local programming from Ottawa', 'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/CBC_OTT_VMS/Live_Channel_Static_Images/Ottawa_2880x1620.jpg', - 'is_live': True, + 'live_status': 'is_live', 'id': 'AyqZwxRqh8EH', 'ext': 'mp4', - 'timestamp': 1492106160, - 'upload_date': '20170413', + 'release_timestamp': 1492106160, + 'release_date': '20170413', 'uploader': 'CBCC-NEW', }, 'skip': 'Live might have ended', @@ -680,49 +839,84 @@ class CBCGemLiveIE(InfoExtractor): 'description': 'March 24, 2023 | President Biden’s Ottawa visit ends with big pledges from both countries. Plus, Gwyneth Paltrow testifies in her ski collision trial.', 'live_status': 'is_live', 'thumbnail': r're:https://images.gem.cbc.ca/v1/cbc-gem/live/.*', - 'timestamp': 1679706000, - 'upload_date': '20230325', + 'release_timestamp': 1679706000, + 'release_date': '20230325', }, 'params': {'skip_download': True}, 'skip': 'Live might have ended', }, + { # event replay (medianetlive) + 'url': 'https://gem.cbc.ca/live-event/42314', + 'md5': '297a9600f554f2258aed01514226a697', + 'info_dict': { + 'id': '42314', + 'ext': 'mp4', + 'live_status': 'was_live', + 'title': 'Women\'s Soccer - Canada vs New Zealand', + 'description': 'md5:36200e5f1a70982277b5a6ecea86155d', + 'thumbnail': r're:https://.+default\.jpg', + 'release_timestamp': 1721917200, + 'release_date': '20240725', + }, + 'params': {'skip_download': True}, + 'skip': 'Replay might no longer be available', + }, + { # event replay (medianetlive) + 'url': 'https://gem.cbc.ca/live-event/43273', + 'only_matching': True, + }, ] + _GEO_COUNTRIES = ['CA'] def _real_extract(self, url): video_id = self._match_id(url) webpage = self._download_webpage(url, video_id) video_info = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['data'] - # Two types of metadata JSON + # Three types of video_info JSON: info in root, freeTv stream/item, event replay if not video_info.get('formattedIdMedia'): - video_info = traverse_obj( - video_info, (('freeTv', ('streams', ...)), 'items', lambda _, v: v['key'] == video_id, {dict}), - get_all=False, default={}) + if traverse_obj(video_info, ('event', 'key')) == video_id: + video_info = video_info['event'] + else: + video_info = traverse_obj(video_info, ( + ('freeTv', ('streams', ...)), 'items', + lambda _, v: v['key'].partition('-')[0] == video_id, any)) or {} video_stream_id = video_info.get('formattedIdMedia') if not video_stream_id: - raise ExtractorError('Couldn\'t find video metadata, maybe this livestream is now offline', expected=True) + raise ExtractorError( + 'Couldn\'t find video metadata, maybe this livestream is now offline', expected=True) - stream_data = self._download_json( - 'https://services.radio-canada.ca/media/validation/v2/', video_id, query={ - 'appCode': 'mpx', - 'connectionType': 'hd', - 'deviceType': 'ipad', - 'idMedia': video_stream_id, - 'multibitrate': 'true', - 'output': 'json', - 'tech': 'hls', - 'manifestType': 'desktop', - }) + live_status = 'was_live' if video_info.get('isVodEnabled') else 'is_live' + release_timestamp = traverse_obj(video_info, ('airDate', {parse_iso8601})) + + if live_status == 'is_live' and release_timestamp and release_timestamp > time.time(): + formats = [] + live_status = 'is_upcoming' + self.raise_no_formats('This livestream has not yet started', expected=True) + else: + stream_data = self._download_json( + 'https://services.radio-canada.ca/media/validation/v2/', video_id, query={ + 'appCode': 'medianetlive', + 'connectionType': 'hd', + 'deviceType': 'ipad', + 'idMedia': video_stream_id, + 'multibitrate': 'true', + 'output': 'json', + 'tech': 'hls', + 'manifestType': 'desktop', + }) + formats = self._extract_m3u8_formats( + stream_data['url'], video_id, 'mp4', live=live_status == 'is_live') return { 'id': video_id, - 'formats': self._extract_m3u8_formats(stream_data['url'], video_id, 'mp4', live=True), - 'is_live': True, + 'formats': formats, + 'live_status': live_status, + 'release_timestamp': release_timestamp, **traverse_obj(video_info, { - 'title': 'title', - 'description': 'description', + 'title': ('title', {str}), + 'description': ('description', {str}), 'thumbnail': ('images', 'card', 'url'), - 'timestamp': ('airDate', {parse_iso8601}), }), } diff --git a/yt_dlp/extractor/cellebrite.py b/yt_dlp/extractor/cellebrite.py index e90365a8b..54367c4d5 100644 --- a/yt_dlp/extractor/cellebrite.py +++ b/yt_dlp/extractor/cellebrite.py @@ -1,63 +1,50 @@ -from .common import InfoExtractor -from ..utils import traverse_obj +from .vidyard import VidyardBaseIE, VidyardIE +from ..utils import ExtractorError, make_archive_id, url_basename -class CellebriteIE(InfoExtractor): +class CellebriteIE(VidyardBaseIE): _VALID_URL = r'https?://cellebrite\.com/(?:\w+)?/(?P[\w-]+)' _TESTS = [{ 'url': 'https://cellebrite.com/en/collect-data-from-android-devices-with-cellebrite-ufed/', 'info_dict': { - 'id': '16025876', + 'id': 'ZqmUss3dQfEMGpauambPuH', + 'display_id': '16025876', 'ext': 'mp4', - 'description': 'md5:174571cb97083fd1d457d75c684f4e2b', - 'thumbnail': 'https://cellebrite.com/wp-content/uploads/2021/05/Chat-Capture-1024x559.png', 'title': 'Ask the Expert: Chat Capture - Collect Data from Android Devices in Cellebrite UFED', - 'duration': 455, - 'tags': [], + 'description': 'md5:dee48fe12bbae5c01fe6a053f7676da4', + 'thumbnail': 'https://cellebrite.com/wp-content/uploads/2021/05/Chat-Capture-1024x559.png', + 'duration': 455.979, + '_old_archive_ids': ['cellebrite 16025876'], }, }, { 'url': 'https://cellebrite.com/en/how-to-lawfully-collect-the-maximum-amount-of-data-from-android-devices/', 'info_dict': { - 'id': '29018255', + 'id': 'QV1U8a2yzcxigw7VFnqKyg', + 'display_id': '29018255', 'ext': 'mp4', - 'duration': 134, - 'tags': [], - 'description': 'md5:e9a3d124c7287b0b07bad2547061cacf', + 'title': 'How to Lawfully Collect the Maximum Amount of Data From Android Devices', + 'description': 'md5:0e943a9ac14c374d5d74faed634d773c', 'thumbnail': 'https://cellebrite.com/wp-content/uploads/2022/07/How-to-Lawfully-Collect-the-Maximum-Amount-of-Data-From-Android-Devices.png', - 'title': 'Android Extractions Explained', + 'duration': 134.315, + '_old_archive_ids': ['cellebrite 29018255'], }, }] - def _get_formats_and_subtitles(self, json_data, display_id): - formats = [{'url': url} for url in traverse_obj(json_data, ('mp4', ..., 'url')) or []] - subtitles = {} - - for url in traverse_obj(json_data, ('hls', ..., 'url')) or []: - fmt, sub = self._extract_m3u8_formats_and_subtitles( - url, display_id, ext='mp4', headers={'Referer': 'https://play.vidyard.com/'}) - formats.extend(fmt) - self._merge_subtitles(sub, target=subtitles) - - return formats, subtitles - def _real_extract(self, url): - display_id = self._match_id(url) - webpage = self._download_webpage(url, display_id) + slug = self._match_id(url) + webpage = self._download_webpage(url, slug) + vidyard_url = next(VidyardIE._extract_embed_urls(url, webpage), None) + if not vidyard_url: + raise ExtractorError('No Vidyard video embeds found on page') - player_uuid = self._search_regex( - r']*\bdata-uuid\s*=\s*"([^"\?]+)', webpage, 'player UUID') - json_data = self._download_json( - f'https://play.vidyard.com/player/{player_uuid}.json', display_id)['payload']['chapters'][0] + video_id = url_basename(vidyard_url) + info = self._process_video_json(self._fetch_video_json(video_id)['chapters'][0], video_id) + if info.get('display_id'): + info['_old_archive_ids'] = [make_archive_id(self, info['display_id'])] + if thumbnail := self._og_search_thumbnail(webpage, default=None): + info.setdefault('thumbnails', []).append({'url': thumbnail}) - formats, subtitles = self._get_formats_and_subtitles(json_data['sources'], display_id) return { - 'id': str(json_data['videoId']), - 'title': json_data.get('name') or self._og_search_title(webpage), - 'formats': formats, - 'subtitles': subtitles, - 'description': json_data.get('description') or self._og_search_description(webpage), - 'duration': json_data.get('seconds'), - 'tags': json_data.get('tags'), - 'thumbnail': self._og_search_thumbnail(webpage), - 'http_headers': {'Referer': 'https://play.vidyard.com/'}, + 'description': self._og_search_description(webpage, default=None), + **info, } diff --git a/yt_dlp/extractor/chzzk.py b/yt_dlp/extractor/chzzk.py index 420fe0514..e0b9980af 100644 --- a/yt_dlp/extractor/chzzk.py +++ b/yt_dlp/extractor/chzzk.py @@ -36,7 +36,7 @@ class CHZZKLiveIE(InfoExtractor): def _real_extract(self, url): channel_id = self._match_id(url) live_detail = self._download_json( - f'https://api.chzzk.naver.com/service/v2/channels/{channel_id}/live-detail', channel_id, + f'https://api.chzzk.naver.com/service/v3/channels/{channel_id}/live-detail', channel_id, note='Downloading channel info', errnote='Unable to download channel info')['content'] if live_detail.get('status') == 'CLOSE': @@ -106,12 +106,45 @@ class CHZZKVideoIE(InfoExtractor): 'upload_date': '20231219', 'view_count': int, }, + 'skip': 'Replay video is expired', + }, { + # Manually uploaded video + 'url': 'https://chzzk.naver.com/video/1980', + 'info_dict': { + 'id': '1980', + 'ext': 'mp4', + 'title': '※시청주의※한번보면 잊기 힘든 영상', + 'channel': '라디유radiyu', + 'channel_id': '68f895c59a1043bc5019b5e08c83a5c5', + 'channel_is_verified': False, + 'thumbnail': r're:^https?://.*\.jpg$', + 'duration': 95, + 'timestamp': 1703102631.722, + 'upload_date': '20231220', + 'view_count': int, + }, + }, { + # Partner channel replay video + 'url': 'https://chzzk.naver.com/video/2458', + 'info_dict': { + 'id': '2458', + 'ext': 'mp4', + 'title': '첫 방송', + 'channel': '강지', + 'channel_id': 'b5ed5db484d04faf4d150aedd362f34b', + 'channel_is_verified': True, + 'thumbnail': r're:^https?://.*\.jpg$', + 'duration': 4433, + 'timestamp': 1703307460.214, + 'upload_date': '20231223', + 'view_count': int, + }, }] def _real_extract(self, url): video_id = self._match_id(url) video_meta = self._download_json( - f'https://api.chzzk.naver.com/service/v2/videos/{video_id}', video_id, + f'https://api.chzzk.naver.com/service/v3/videos/{video_id}', video_id, note='Downloading video info', errnote='Unable to download video info')['content'] formats, subtitles = self._extract_mpd_formats_and_subtitles( f'https://apis.naver.com/neonplayer/vodplay/v1/playback/{video_meta["videoId"]}', video_id, diff --git a/yt_dlp/extractor/common.py b/yt_dlp/extractor/common.py index f63bd7825..187f73e7b 100644 --- a/yt_dlp/extractor/common.py +++ b/yt_dlp/extractor/common.py @@ -3150,7 +3150,7 @@ def _parse_ism_formats_and_subtitles(self, ism_doc, ism_url, ism_id=None): }) return formats, subtitles - def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8_native', mpd_id=None, preference=None, quality=None): + def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8_native', mpd_id=None, preference=None, quality=None, _headers=None): def absolute_url(item_url): return urljoin(base_url, item_url) @@ -3174,11 +3174,11 @@ def _media_formats(src, cur_media_type, type_info=None): formats = self._extract_m3u8_formats( full_url, video_id, ext='mp4', entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id, - preference=preference, quality=quality, fatal=False) + preference=preference, quality=quality, fatal=False, headers=_headers) elif ext == 'mpd': is_plain_url = False formats = self._extract_mpd_formats( - full_url, video_id, mpd_id=mpd_id, fatal=False) + full_url, video_id, mpd_id=mpd_id, fatal=False, headers=_headers) else: is_plain_url = True formats = [{ @@ -3272,6 +3272,8 @@ def _media_formats(src, cur_media_type, type_info=None): }) for f in media_info['formats']: f.setdefault('http_headers', {})['Referer'] = base_url + if _headers: + f['http_headers'].update(_headers) if media_info['formats'] or media_info['subtitles']: entries.append(media_info) return entries diff --git a/yt_dlp/extractor/digitalconcerthall.py b/yt_dlp/extractor/digitalconcerthall.py index 8b4d5c0fc..edb6fa9c0 100644 --- a/yt_dlp/extractor/digitalconcerthall.py +++ b/yt_dlp/extractor/digitalconcerthall.py @@ -1,6 +1,8 @@ from .common import InfoExtractor +from ..networking.exceptions import HTTPError from ..utils import ( ExtractorError, + parse_codecs, try_get, url_or_none, urlencode_postdata, @@ -12,6 +14,7 @@ class DigitalConcertHallIE(InfoExtractor): IE_DESC = 'DigitalConcertHall extractor' _VALID_URL = r'https?://(?:www\.)?digitalconcerthall\.com/(?P[a-z]+)/(?Pfilm|concert|work)/(?P[0-9]+)-?(?P[0-9]+)?' _OAUTH_URL = 'https://api.digitalconcerthall.com/v2/oauth2/token' + _USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15' _ACCESS_TOKEN = None _NETRC_MACHINE = 'digitalconcerthall' _TESTS = [{ @@ -68,33 +71,42 @@ class DigitalConcertHallIE(InfoExtractor): }] def _perform_login(self, username, password): - token_response = self._download_json( + login_token = self._download_json( self._OAUTH_URL, None, 'Obtaining token', errnote='Unable to obtain token', data=urlencode_postdata({ 'affiliate': 'none', 'grant_type': 'device', 'device_vendor': 'unknown', + # device_model 'Safari' gets split streams of 4K/HEVC video and lossless/FLAC audio + 'device_model': 'unknown' if self._configuration_arg('prefer_combined_hls') else 'Safari', 'app_id': 'dch.webapp', - 'app_version': '1.0.0', + 'app_distributor': 'berlinphil', + 'app_version': '1.84.0', 'client_secret': '2ySLN+2Fwb', }), headers={ - 'Content-Type': 'application/x-www-form-urlencoded', - }) - self._ACCESS_TOKEN = token_response['access_token'] + 'Accept': 'application/json', + 'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8', + 'User-Agent': self._USER_AGENT, + })['access_token'] try: - self._download_json( + login_response = self._download_json( self._OAUTH_URL, None, note='Logging in', errnote='Unable to login', data=urlencode_postdata({ 'grant_type': 'password', 'username': username, 'password': password, }), headers={ - 'Content-Type': 'application/x-www-form-urlencoded', + 'Accept': 'application/json', + 'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8', 'Referer': 'https://www.digitalconcerthall.com', - 'Authorization': f'Bearer {self._ACCESS_TOKEN}', + 'Authorization': f'Bearer {login_token}', + 'User-Agent': self._USER_AGENT, }) - except ExtractorError: - self.raise_login_required(msg='Login info incorrect') + except ExtractorError as error: + if isinstance(error.cause, HTTPError) and error.cause.status == 401: + raise ExtractorError('Invalid username or password', expected=True) + raise + self._ACCESS_TOKEN = login_response['access_token'] def _real_initialize(self): if not self._ACCESS_TOKEN: @@ -108,11 +120,15 @@ def _entries(self, items, language, type_, **kwargs): 'Accept': 'application/json', 'Authorization': f'Bearer {self._ACCESS_TOKEN}', 'Accept-Language': language, + 'User-Agent': self._USER_AGENT, }) formats = [] for m3u8_url in traverse_obj(stream_info, ('channel', ..., 'stream', ..., 'url', {url_or_none})): - formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', fatal=False)) + formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) + for fmt in formats: + if fmt.get('format_note') and fmt.get('vcodec') == 'none': + fmt.update(parse_codecs(fmt['format_note'])) yield { 'id': video_id, @@ -140,13 +156,15 @@ def _real_extract(self, url): f'https://api.digitalconcerthall.com/v2/{api_type}/{video_id}', video_id, headers={ 'Accept': 'application/json', 'Accept-Language': language, + 'User-Agent': self._USER_AGENT, + 'Authorization': f'Bearer {self._ACCESS_TOKEN}', }) - album_artists = traverse_obj(vid_info, ('_links', 'artist', ..., 'name')) videos = [vid_info] if type_ == 'film' else traverse_obj(vid_info, ('_embedded', ..., ...)) if type_ == 'work': videos = [videos[int(part) - 1]] + album_artists = traverse_obj(vid_info, ('_links', 'artist', ..., 'name', {str})) thumbnail = traverse_obj(vid_info, ( 'image', ..., {self._proto_relative_url}, {url_or_none}, {lambda x: x.format(width=0, height=0)}, any)) # NB: 0x0 is the original size diff --git a/yt_dlp/extractor/discovery.py b/yt_dlp/extractor/discovery.py deleted file mode 100644 index b98279d67..000000000 --- a/yt_dlp/extractor/discovery.py +++ /dev/null @@ -1,115 +0,0 @@ -import random -import string -import urllib.parse - -from .discoverygo import DiscoveryGoBaseIE -from ..networking.exceptions import HTTPError -from ..utils import ExtractorError - - -class DiscoveryIE(DiscoveryGoBaseIE): - _VALID_URL = r'''(?x)https?:// - (?P - go\.discovery| - www\. - (?: - investigationdiscovery| - discoverylife| - animalplanet| - ahctv| - destinationamerica| - sciencechannel| - tlc - )| - watch\. - (?: - hgtv| - foodnetwork| - travelchannel| - diynetwork| - cookingchanneltv| - motortrend - ) - )\.com/tv-shows/(?P[^/]+)/(?:video|full-episode)s/(?P[^./?#]+)''' - _TESTS = [{ - 'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry', - 'info_dict': { - 'id': '5a2f35ce6b66d17a5026e29e', - 'ext': 'mp4', - 'title': 'Riding with Matthew Perry', - 'description': 'md5:a34333153e79bc4526019a5129e7f878', - 'duration': 84, - }, - 'params': { - 'skip_download': True, # requires ffmpeg - }, - }, { - 'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision', - 'only_matching': True, - }, { - 'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road', - 'only_matching': True, - }, { - # using `show_slug` is important to get the correct video data - 'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special', - 'only_matching': True, - }] - _GEO_COUNTRIES = ['US'] - _GEO_BYPASS = False - _API_BASE_URL = 'https://api.discovery.com/v1/' - - def _real_extract(self, url): - site, show_slug, display_id = self._match_valid_url(url).groups() - - access_token = None - cookies = self._get_cookies(url) - - # prefer Affiliate Auth Token over Anonymous Auth Token - auth_storage_cookie = cookies.get('eosAf') or cookies.get('eosAn') - if auth_storage_cookie and auth_storage_cookie.value: - auth_storage = self._parse_json(urllib.parse.unquote( - urllib.parse.unquote(auth_storage_cookie.value)), - display_id, fatal=False) or {} - access_token = auth_storage.get('a') or auth_storage.get('access_token') - - if not access_token: - access_token = self._download_json( - f'https://{site}.com/anonymous', display_id, - 'Downloading token JSON metadata', query={ - 'authRel': 'authorization', - 'client_id': '3020a40c2356a645b4b4', - 'nonce': ''.join(random.choices(string.ascii_letters, k=32)), - 'redirectUri': 'https://www.discovery.com/', - })['access_token'] - - headers = self.geo_verification_headers() - headers['Authorization'] = 'Bearer ' + access_token - - try: - video = self._download_json( - self._API_BASE_URL + 'content/videos', - display_id, 'Downloading content JSON metadata', - headers=headers, query={ - 'embed': 'show.name', - 'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags', - 'slug': display_id, - 'show_slug': show_slug, - })[0] - video_id = video['id'] - stream = self._download_json( - self._API_BASE_URL + 'streaming/video/' + video_id, - display_id, 'Downloading streaming JSON metadata', headers=headers) - except ExtractorError as e: - if isinstance(e.cause, HTTPError) and e.cause.status in (401, 403): - e_description = self._parse_json( - e.cause.response.read().decode(), display_id)['description'] - if 'resource not available for country' in e_description: - self.raise_geo_restricted(countries=self._GEO_COUNTRIES) - if 'Authorized Networks' in e_description: - raise ExtractorError( - 'This video is only available via cable service provider subscription that' - ' is not currently supported. You may want to use --cookies.', expected=True) - raise ExtractorError(e_description) - raise - - return self._extract_video_info(video, stream, display_id) diff --git a/yt_dlp/extractor/discoverygo.py b/yt_dlp/extractor/discoverygo.py deleted file mode 100644 index 964948548..000000000 --- a/yt_dlp/extractor/discoverygo.py +++ /dev/null @@ -1,171 +0,0 @@ -import re - -from .common import InfoExtractor -from ..utils import ( - ExtractorError, - determine_ext, - extract_attributes, - int_or_none, - parse_age_limit, - remove_end, - unescapeHTML, - url_or_none, -) - - -class DiscoveryGoBaseIE(InfoExtractor): - _VALID_URL_TEMPLATE = r'''(?x)https?://(?:www\.)?(?: - discovery| - investigationdiscovery| - discoverylife| - animalplanet| - ahctv| - destinationamerica| - sciencechannel| - tlc| - velocitychannel - )go\.com/%s(?P[^/?#&]+)''' - - def _extract_video_info(self, video, stream, display_id): - title = video['name'] - - if not stream: - if video.get('authenticated') is True: - raise ExtractorError( - 'This video is only available via cable service provider subscription that' - ' is not currently supported. You may want to use --cookies.', expected=True) - else: - raise ExtractorError('Unable to find stream') - STREAM_URL_SUFFIX = 'streamUrl' - formats = [] - for stream_kind in ('', 'hds'): - suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX - stream_url = stream.get(f'{stream_kind}{suffix}') - if not stream_url: - continue - if stream_kind == '': - formats.extend(self._extract_m3u8_formats( - stream_url, display_id, 'mp4', entry_protocol='m3u8_native', - m3u8_id='hls', fatal=False)) - elif stream_kind == 'hds': - formats.extend(self._extract_f4m_formats( - stream_url, display_id, f4m_id=stream_kind, fatal=False)) - - video_id = video.get('id') or display_id - description = video.get('description', {}).get('detailed') - duration = int_or_none(video.get('duration')) - - series = video.get('show', {}).get('name') - season_number = int_or_none(video.get('season', {}).get('number')) - episode_number = int_or_none(video.get('episodeNumber')) - - tags = video.get('tags') - age_limit = parse_age_limit(video.get('parental', {}).get('rating')) - - subtitles = {} - captions = stream.get('captions') - if isinstance(captions, list): - for caption in captions: - subtitle_url = url_or_none(caption.get('fileUrl')) - if not subtitle_url or not subtitle_url.startswith('http'): - continue - lang = caption.get('fileLang', 'en') - ext = determine_ext(subtitle_url) - subtitles.setdefault(lang, []).append({ - 'url': subtitle_url, - 'ext': 'ttml' if ext == 'xml' else ext, - }) - - return { - 'id': video_id, - 'display_id': display_id, - 'title': title, - 'description': description, - 'duration': duration, - 'series': series, - 'season_number': season_number, - 'episode_number': episode_number, - 'tags': tags, - 'age_limit': age_limit, - 'formats': formats, - 'subtitles': subtitles, - } - - -class DiscoveryGoIE(DiscoveryGoBaseIE): - _VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % r'(?:[^/]+/)+' - _GEO_COUNTRIES = ['US'] - _TEST = { - 'url': 'https://www.discoverygo.com/bering-sea-gold/reaper-madness/', - 'info_dict': { - 'id': '58c167d86b66d12f2addeb01', - 'ext': 'mp4', - 'title': 'Reaper Madness', - 'description': 'md5:09f2c625c99afb8946ed4fb7865f6e78', - 'duration': 2519, - 'series': 'Bering Sea Gold', - 'season_number': 8, - 'episode_number': 6, - 'age_limit': 14, - }, - } - - def _real_extract(self, url): - display_id = self._match_id(url) - - webpage = self._download_webpage(url, display_id) - - container = extract_attributes( - self._search_regex( - r'(]+class=["\']video-player-container[^>]+>)', - webpage, 'video container')) - - video = self._parse_json( - container.get('data-video') or container.get('data-json'), - display_id) - - stream = video.get('stream') - - return self._extract_video_info(video, stream, display_id) - - -class DiscoveryGoPlaylistIE(DiscoveryGoBaseIE): - _VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % '' - _TEST = { - 'url': 'https://www.discoverygo.com/bering-sea-gold/', - 'info_dict': { - 'id': 'bering-sea-gold', - 'title': 'Bering Sea Gold', - 'description': 'md5:cc5c6489835949043c0cc3ad66c2fa0e', - }, - 'playlist_mincount': 6, - } - - @classmethod - def suitable(cls, url): - return False if DiscoveryGoIE.suitable(url) else super().suitable(url) - - def _real_extract(self, url): - display_id = self._match_id(url) - - webpage = self._download_webpage(url, display_id) - - entries = [] - for mobj in re.finditer(r'data-json=(["\'])(?P{.+?})\1', webpage): - data = self._parse_json( - mobj.group('json'), display_id, - transform_source=unescapeHTML, fatal=False) - if not isinstance(data, dict) or data.get('type') != 'episode': - continue - episode_url = data.get('socialUrl') - if not episode_url: - continue - entries.append(self.url_result( - episode_url, ie=DiscoveryGoIE.ie_key(), - video_id=data.get('id'))) - - return self.playlist_result( - entries, display_id, - remove_end(self._og_search_title( - webpage, fatal=False), ' | Discovery GO'), - self._og_search_description(webpage)) diff --git a/yt_dlp/extractor/douyutv.py b/yt_dlp/extractor/douyutv.py index fdf19c252..e36eac919 100644 --- a/yt_dlp/extractor/douyutv.py +++ b/yt_dlp/extractor/douyutv.py @@ -24,8 +24,9 @@ class DouyuBaseIE(InfoExtractor): def _download_cryptojs_md5(self, video_id): for url in [ + # XXX: Do NOT use cdn.bootcdn.net; ref: https://sansec.io/research/polyfill-supply-chain-attack 'https://cdnjs.cloudflare.com/ajax/libs/crypto-js/3.1.2/rollups/md5.js', - 'https://cdn.bootcdn.net/ajax/libs/crypto-js/3.1.2/rollups/md5.js', + 'https://unpkg.com/cryptojslib@3.1.2/rollups/md5.js', ]: js_code = self._download_webpage( url, video_id, note='Downloading signing dependency', fatal=False) @@ -35,7 +36,8 @@ def _download_cryptojs_md5(self, video_id): raise ExtractorError('Unable to download JS dependency (crypto-js/md5)') def _get_cryptojs_md5(self, video_id): - return self.cache.load('douyu', 'crypto-js-md5') or self._download_cryptojs_md5(video_id) + return self.cache.load( + 'douyu', 'crypto-js-md5', min_ver='2024.07.04') or self._download_cryptojs_md5(video_id) def _calc_sign(self, sign_func, video_id, a): b = uuid.uuid4().hex diff --git a/yt_dlp/extractor/dplay.py b/yt_dlp/extractor/dplay.py index 48eae1088..8d7707271 100644 --- a/yt_dlp/extractor/dplay.py +++ b/yt_dlp/extractor/dplay.py @@ -346,8 +346,16 @@ def _real_extract(self, url): class DiscoveryPlusBaseIE(DPlayBaseIE): + """Subclasses must set _PRODUCT, _DISCO_API_PARAMS""" + + _DISCO_CLIENT_VER = '27.43.0' + def _update_disco_api_headers(self, headers, disco_base, display_id, realm): - headers['x-disco-client'] = f'WEB:UNKNOWN:{self._PRODUCT}:25.2.6' + headers.update({ + 'x-disco-params': f'realm={realm},siteLookupKey={self._PRODUCT}', + 'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:{self._DISCO_CLIENT_VER}', + 'Authorization': self._get_auth(disco_base, display_id, realm), + }) def _download_video_playback_info(self, disco_base, video_id, headers): return self._download_json( @@ -368,6 +376,26 @@ def _real_extract(self, url): class GoDiscoveryIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:go\.)?discovery\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://go.discovery.com/video/in-the-eye-of-the-storm-discovery-atve-us/trapped-in-a-twister', + 'info_dict': { + 'id': '5352642', + 'display_id': 'in-the-eye-of-the-storm-discovery-atve-us/trapped-in-a-twister', + 'ext': 'mp4', + 'title': 'Trapped in a Twister', + 'description': 'Twisters destroy Midwest towns, trapping spotters in the eye of the storm.', + 'episode_number': 1, + 'episode': 'Episode 1', + 'season_number': 1, + 'season': 'Season 1', + 'series': 'In The Eye Of The Storm', + 'duration': 2490.237, + 'upload_date': '20240715', + 'timestamp': 1721008800, + 'tags': [], + 'creators': ['Discovery'], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2024/07/10/5e39637d-cabf-3ab3-8e9a-f4e9d37bc036.jpeg', + }, + }, { 'url': 'https://go.discovery.com/video/dirty-jobs-discovery-atve-us/rodbuster-galvanizer', 'info_dict': { 'id': '4164906', @@ -395,6 +423,26 @@ class GoDiscoveryIE(DiscoveryPlusBaseIE): class TravelChannelIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:watch\.)?travelchannel\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://watch.travelchannel.com/video/the-dead-files-travel-channel/protect-the-children', + 'info_dict': { + 'id': '4710177', + 'display_id': 'the-dead-files-travel-channel/protect-the-children', + 'ext': 'mp4', + 'title': 'Protect the Children', + 'description': 'An evil presence threatens an Ohio woman\'s children and marriage.', + 'season_number': 14, + 'season': 'Season 14', + 'episode_number': 10, + 'episode': 'Episode 10', + 'series': 'The Dead Files', + 'duration': 2550.481, + 'timestamp': 1664510400, + 'upload_date': '20220930', + 'tags': [], + 'creators': ['Travel Channel'], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2022/03/17/5e45eace-de5d-343a-9293-f400a2aa77d5.jpeg', + }, + }, { 'url': 'https://watch.travelchannel.com/video/ghost-adventures-travel-channel/ghost-train-of-ely', 'info_dict': { 'id': '2220256', @@ -422,6 +470,26 @@ class TravelChannelIE(DiscoveryPlusBaseIE): class CookingChannelIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:watch\.)?cookingchanneltv\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://watch.cookingchanneltv.com/video/bobbys-triple-threat-food-network-atve-us/titans-vs-marcus-samuelsson', + 'info_dict': { + 'id': '5350005', + 'ext': 'mp4', + 'display_id': 'bobbys-triple-threat-food-network-atve-us/titans-vs-marcus-samuelsson', + 'title': 'Titans vs Marcus Samuelsson', + 'description': 'Marcus Samuelsson throws his legendary global tricks at the Titans.', + 'episode_number': 1, + 'episode': 'Episode 1', + 'season_number': 3, + 'season': 'Season 3', + 'series': 'Bobby\'s Triple Threat', + 'duration': 2520.851, + 'upload_date': '20240710', + 'timestamp': 1720573200, + 'tags': [], + 'creators': ['Food Network'], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2024/07/04/529cd095-27ec-35c5-84e9-90ebd3e5d2da.jpeg', + }, + }, { 'url': 'https://watch.cookingchanneltv.com/video/carnival-eats-cooking-channel/the-postman-always-brings-rice-2348634', 'info_dict': { 'id': '2348634', @@ -449,6 +517,22 @@ class CookingChannelIE(DiscoveryPlusBaseIE): class HGTVUsaIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:watch\.)?hgtv\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://watch.hgtv.com/video/flip-or-flop-the-final-flip-hgtv-atve-us/flip-or-flop-the-final-flip', + 'info_dict': { + 'id': '5025585', + 'display_id': 'flip-or-flop-the-final-flip-hgtv-atve-us/flip-or-flop-the-final-flip', + 'ext': 'mp4', + 'title': 'Flip or Flop: The Final Flip', + 'description': 'Tarek and Christina are going their separate ways after one last flip!', + 'series': 'Flip or Flop: The Final Flip', + 'duration': 2580.644, + 'upload_date': '20231101', + 'timestamp': 1698811200, + 'tags': [], + 'creators': ['HGTV'], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2022/11/27/455caa6c-1462-3f14-b63d-a026d7a5e6d3.jpeg', + }, + }, { 'url': 'https://watch.hgtv.com/video/home-inspector-joe-hgtv-atve-us/this-mold-house', 'info_dict': { 'id': '4289736', @@ -476,6 +560,26 @@ class HGTVUsaIE(DiscoveryPlusBaseIE): class FoodNetworkIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:watch\.)?foodnetwork\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://watch.foodnetwork.com/video/guys-grocery-games-food-network/wild-in-the-aisles', + 'info_dict': { + 'id': '2152549', + 'display_id': 'guys-grocery-games-food-network/wild-in-the-aisles', + 'ext': 'mp4', + 'title': 'Wild in the Aisles', + 'description': 'The chefs make spaghetti and meatballs with "Out of Stock" ingredients.', + 'season_number': 1, + 'season': 'Season 1', + 'episode_number': 1, + 'episode': 'Episode 1', + 'series': 'Guy\'s Grocery Games', + 'tags': [], + 'creators': ['Food Network'], + 'duration': 2520.651, + 'upload_date': '20230623', + 'timestamp': 1687492800, + 'thumbnail': 'https://us1-prod-images.disco-api.com/2022/06/15/37fb5333-cad2-3dbb-af7c-c20ec77c89c6.jpeg', + }, + }, { 'url': 'https://watch.foodnetwork.com/video/kids-baking-championship-food-network/float-like-a-butterfly', 'info_dict': { 'id': '4116449', @@ -503,6 +607,26 @@ class FoodNetworkIE(DiscoveryPlusBaseIE): class DestinationAmericaIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:www\.)?destinationamerica\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://www.destinationamerica.com/video/bbq-pit-wars-destination-america/smoke-on-the-water', + 'info_dict': { + 'id': '2218409', + 'display_id': 'bbq-pit-wars-destination-america/smoke-on-the-water', + 'ext': 'mp4', + 'title': 'Smoke on the Water', + 'description': 'The pitmasters head to Georgia for the Smoke on the Water BBQ Festival.', + 'season_number': 2, + 'season': 'Season 2', + 'episode_number': 1, + 'episode': 'Episode 1', + 'series': 'BBQ Pit Wars', + 'tags': [], + 'creators': ['Destination America'], + 'duration': 2614.878, + 'upload_date': '20230623', + 'timestamp': 1687492800, + 'thumbnail': 'https://us1-prod-images.disco-api.com/2020/05/11/c0f8e85d-9a10-3e6f-8e43-f6faafa81ba2.jpeg', + }, + }, { 'url': 'https://www.destinationamerica.com/video/alaska-monsters-destination-america-atve-us/central-alaskas-bigfoot', 'info_dict': { 'id': '4210904', @@ -530,6 +654,26 @@ class DestinationAmericaIE(DiscoveryPlusBaseIE): class InvestigationDiscoveryIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:www\.)?investigationdiscovery\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://www.investigationdiscovery.com/video/deadly-influence-the-social-media-murders-investigation-discovery-atve-us/rip-bianca', + 'info_dict': { + 'id': '5341132', + 'display_id': 'deadly-influence-the-social-media-murders-investigation-discovery-atve-us/rip-bianca', + 'ext': 'mp4', + 'title': 'RIP Bianca', + 'description': 'A teenage influencer discovers an online world of threat, harm and danger.', + 'season_number': 1, + 'season': 'Season 1', + 'episode_number': 3, + 'episode': 'Episode 3', + 'series': 'Deadly Influence: The Social Media Murders', + 'creators': ['Investigation Discovery'], + 'tags': [], + 'duration': 2490.888, + 'upload_date': '20240618', + 'timestamp': 1718672400, + 'thumbnail': 'https://us1-prod-images.disco-api.com/2024/06/15/b567c774-9e44-3c6c-b0ba-db860a73e812.jpeg', + }, + }, { 'url': 'https://www.investigationdiscovery.com/video/unmasked-investigation-discovery/the-killer-clown', 'info_dict': { 'id': '2139409', @@ -557,6 +701,26 @@ class InvestigationDiscoveryIE(DiscoveryPlusBaseIE): class AmHistoryChannelIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:www\.)?ahctv\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://www.ahctv.com/video/blood-and-fury-americas-civil-war-ahc/battle-of-bull-run', + 'info_dict': { + 'id': '2139199', + 'display_id': 'blood-and-fury-americas-civil-war-ahc/battle-of-bull-run', + 'ext': 'mp4', + 'title': 'Battle of Bull Run', + 'description': 'Two untested armies clash in the first real battle of the Civil War.', + 'season_number': 1, + 'season': 'Season 1', + 'episode_number': 1, + 'episode': 'Episode 1', + 'series': 'Blood and Fury: America\'s Civil War', + 'duration': 2612.509, + 'upload_date': '20220923', + 'timestamp': 1663905600, + 'creators': ['AHC'], + 'tags': [], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2020/05/11/4af61bd7-d705-3108-82c4-1a6e541e20fa.jpeg', + }, + }, { 'url': 'https://www.ahctv.com/video/modern-sniper-ahc/army', 'info_dict': { 'id': '2309730', @@ -584,6 +748,26 @@ class AmHistoryChannelIE(DiscoveryPlusBaseIE): class ScienceChannelIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:www\.)?sciencechannel\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://www.sciencechannel.com/video/spaces-deepest-secrets-science-atve-us/mystery-of-the-dead-planets', + 'info_dict': { + 'id': '2347335', + 'display_id': 'spaces-deepest-secrets-science-atve-us/mystery-of-the-dead-planets', + 'ext': 'mp4', + 'title': 'Mystery of the Dead Planets', + 'description': 'Astronomers unmask the truly destructive nature of the cosmos.', + 'season_number': 7, + 'season': 'Season 7', + 'episode_number': 1, + 'episode': 'Episode 1', + 'series': 'Space\'s Deepest Secrets', + 'duration': 2524.989, + 'upload_date': '20230128', + 'timestamp': 1674882000, + 'creators': ['Science'], + 'tags': [], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2021/03/30/3796829d-aead-3f9a-bd8d-e49048b3cdca.jpeg', + }, + }, { 'url': 'https://www.sciencechannel.com/video/strangest-things-science-atve-us/nazi-mystery-machine', 'info_dict': { 'id': '2842849', @@ -608,36 +792,29 @@ class ScienceChannelIE(DiscoveryPlusBaseIE): } -class DIYNetworkIE(DiscoveryPlusBaseIE): - _VALID_URL = r'https?://(?:watch\.)?diynetwork\.com/video' + DPlayBaseIE._PATH_REGEX - _TESTS = [{ - 'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas', - 'info_dict': { - 'id': '2309730', - 'display_id': 'pool-kings-diy-network/bringing-beach-life-to-texas', - 'ext': 'mp4', - 'title': 'Bringing Beach Life to Texas', - 'description': 'The Pool Kings give a family a day at the beach in their own backyard.', - 'season_number': 10, - 'episode_number': 2, - }, - 'skip': 'Available for Premium users', - }, { - 'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas', - 'only_matching': True, - }] - - _PRODUCT = 'diy' - _DISCO_API_PARAMS = { - 'disco_host': 'us1-prod-direct.watch.diynetwork.com', - 'realm': 'go', - 'country': 'us', - } - - class DiscoveryLifeIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:www\.)?discoverylife\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://www.discoverylife.com/video/er-files-discovery-life-atve-us/sweet-charity', + 'info_dict': { + 'id': '2347614', + 'display_id': 'er-files-discovery-life-atve-us/sweet-charity', + 'ext': 'mp4', + 'title': 'Sweet Charity', + 'description': 'The staff at Charity Hospital treat a serious foot infection.', + 'season_number': 1, + 'season': 'Season 1', + 'episode_number': 1, + 'episode': 'Episode 1', + 'series': 'ER Files', + 'duration': 2364.261, + 'upload_date': '20230721', + 'timestamp': 1689912000, + 'creators': ['Discovery Life'], + 'tags': [], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2021/03/16/4b6f0124-360b-3546-b6a4-5552db886b86.jpeg', + }, + }, { 'url': 'https://www.discoverylife.com/video/surviving-death-discovery-life-atve-us/bodily-trauma', 'info_dict': { 'id': '2218238', @@ -665,6 +842,26 @@ class DiscoveryLifeIE(DiscoveryPlusBaseIE): class AnimalPlanetIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:www\.)?animalplanet\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://www.animalplanet.com/video/mysterious-creatures-with-forrest-galante-animal-planet-atve-us/the-demon-of-peru', + 'info_dict': { + 'id': '4650835', + 'display_id': 'mysterious-creatures-with-forrest-galante-animal-planet-atve-us/the-demon-of-peru', + 'ext': 'mp4', + 'title': 'The Demon of Peru', + 'description': 'In Peru, a farming village is being terrorized by a “man-like beast.”', + 'season_number': 1, + 'season': 'Season 1', + 'episode_number': 4, + 'episode': 'Episode 4', + 'series': 'Mysterious Creatures with Forrest Galante', + 'duration': 2490.488, + 'upload_date': '20230111', + 'timestamp': 1673413200, + 'creators': ['Animal Planet'], + 'tags': [], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2022/03/01/6dbaa833-9a2e-3fee-9381-c19eddf67c0c.jpeg', + }, + }, { 'url': 'https://www.animalplanet.com/video/north-woods-law-animal-planet/squirrel-showdown', 'info_dict': { 'id': '3338923', @@ -692,6 +889,26 @@ class AnimalPlanetIE(DiscoveryPlusBaseIE): class TLCIE(DiscoveryPlusBaseIE): _VALID_URL = r'https?://(?:go\.)?tlc\.com/video' + DPlayBaseIE._PATH_REGEX _TESTS = [{ + 'url': 'https://go.tlc.com/video/90-day-the-last-resort-tlc-atve-us/the-last-chance', + 'info_dict': { + 'id': '5186422', + 'display_id': '90-day-the-last-resort-tlc-atve-us/the-last-chance', + 'ext': 'mp4', + 'title': 'The Last Chance', + 'description': 'Infidelity shakes Kalani and Asuelu\'s world, and Angela threatens divorce.', + 'season_number': 1, + 'season': 'Season 1', + 'episode_number': 1, + 'episode': 'Episode 1', + 'series': '90 Day: The Last Resort', + 'duration': 5123.91, + 'upload_date': '20230815', + 'timestamp': 1692061200, + 'creators': ['TLC'], + 'tags': [], + 'thumbnail': 'https://us1-prod-images.disco-api.com/2023/08/08/0ee367e2-ac76-334d-bf23-dbf796696a24.jpeg', + }, + }, { 'url': 'https://go.tlc.com/video/my-600-lb-life-tlc/melissas-story-part-1', 'info_dict': { 'id': '2206540', @@ -716,93 +933,8 @@ class TLCIE(DiscoveryPlusBaseIE): } -class MotorTrendIE(DiscoveryPlusBaseIE): - _VALID_URL = r'https?://(?:watch\.)?motortrend\.com/video' + DPlayBaseIE._PATH_REGEX - _TESTS = [{ - 'url': 'https://watch.motortrend.com/video/car-issues-motortrend-atve-us/double-dakotas', - 'info_dict': { - 'id': '"4859182"', - 'display_id': 'double-dakotas', - 'ext': 'mp4', - 'title': 'Double Dakotas', - 'description': 'Tylers buy-one-get-one Dakota deal has the Wizard pulling double duty.', - 'season_number': 2, - 'episode_number': 3, - }, - 'skip': 'Available for Premium users', - }, { - 'url': 'https://watch.motortrend.com/video/car-issues-motortrend-atve-us/double-dakotas', - 'only_matching': True, - }] - - _PRODUCT = 'vel' - _DISCO_API_PARAMS = { - 'disco_host': 'us1-prod-direct.watch.motortrend.com', - 'realm': 'go', - 'country': 'us', - } - - -class MotorTrendOnDemandIE(DiscoveryPlusBaseIE): - _VALID_URL = r'https?://(?:www\.)?motortrend(?:ondemand\.com|\.com/plus)/detail' + DPlayBaseIE._PATH_REGEX - _TESTS = [{ - 'url': 'https://www.motortrendondemand.com/detail/wheelstanding-dump-truck-stubby-bobs-comeback/37699/784', - 'info_dict': { - 'id': '37699', - 'display_id': 'wheelstanding-dump-truck-stubby-bobs-comeback/37699', - 'ext': 'mp4', - 'title': 'Wheelstanding Dump Truck! Stubby Bob’s Comeback', - 'description': 'md5:996915abe52a1c3dfc83aecea3cce8e7', - 'season_number': 5, - 'episode_number': 52, - 'episode': 'Episode 52', - 'season': 'Season 5', - 'thumbnail': r're:^https?://.+\.jpe?g$', - 'timestamp': 1388534401, - 'duration': 1887.345, - 'creator': 'Originals', - 'series': 'Roadkill', - 'upload_date': '20140101', - 'tags': [], - }, - }, { - 'url': 'https://www.motortrend.com/plus/detail/roadworthy-rescues-teaser-trailer/4922860/', - 'info_dict': { - 'id': '4922860', - 'ext': 'mp4', - 'title': 'Roadworthy Rescues | Teaser Trailer', - 'description': 'Derek Bieri helps Freiburger and Finnegan with their \'68 big-block Dart.', - 'display_id': 'roadworthy-rescues-teaser-trailer/4922860', - 'creator': 'Originals', - 'series': 'Roadworthy Rescues', - 'thumbnail': r're:^https?://.+\.jpe?g$', - 'upload_date': '20220907', - 'timestamp': 1662523200, - 'duration': 1066.356, - 'tags': [], - }, - }, { - 'url': 'https://www.motortrend.com/plus/detail/ugly-duckling/2450033/12439', - 'only_matching': True, - }] - - _PRODUCT = 'MTOD' - _DISCO_API_PARAMS = { - 'disco_host': 'us1-prod-direct.motortrendondemand.com', - 'realm': 'motortrend', - 'country': 'us', - } - - def _update_disco_api_headers(self, headers, disco_base, display_id, realm): - headers.update({ - 'x-disco-params': f'realm={realm}', - 'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:4.39.1-gi1', - 'Authorization': self._get_auth(disco_base, display_id, realm), - }) - - class DiscoveryPlusIE(DiscoveryPlusBaseIE): - _VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:\w{2}/)?video' + DPlayBaseIE._PATH_REGEX + _VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:(?P[a-z]{2})/)?video(?:/sport|/olympics)?' + DPlayBaseIE._PATH_REGEX _TESTS = [{ 'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family', 'info_dict': { @@ -823,14 +955,45 @@ class DiscoveryPlusIE(DiscoveryPlusBaseIE): }, { 'url': 'https://discoveryplus.com/ca/video/bering-sea-gold-discovery-ca/goldslingers', 'only_matching': True, + }, { + 'url': 'https://www.discoveryplus.com/gb/video/sport/eurosport-1-british-eurosport-1-british-sport/6-hours-of-spa-review', + 'only_matching': True, + }, { + 'url': 'https://www.discoveryplus.com/gb/video/olympics/dplus-sport-dplus-sport-sport/rugby-sevens-australia-samoa', + 'only_matching': True, }] - _PRODUCT = 'dplus_us' - _DISCO_API_PARAMS = { - 'disco_host': 'us1-prod-direct.discoveryplus.com', - 'realm': 'go', - 'country': 'us', - } + _PRODUCT = None + _DISCO_API_PARAMS = None + + def _update_disco_api_headers(self, headers, disco_base, display_id, realm): + headers.update({ + 'x-disco-params': f'realm={realm},siteLookupKey={self._PRODUCT}', + 'x-disco-client': f'WEB:UNKNOWN:dplus_us:{self._DISCO_CLIENT_VER}', + 'Authorization': self._get_auth(disco_base, display_id, realm), + }) + + def _real_extract(self, url): + video_id, country = self._match_valid_url(url).group('id', 'country') + if not country: + country = 'us' + + self._PRODUCT = f'dplus_{country}' + + if country in ('br', 'ca', 'us'): + self._DISCO_API_PARAMS = { + 'disco_host': 'us1-prod-direct.discoveryplus.com', + 'realm': 'go', + 'country': country, + } + else: + self._DISCO_API_PARAMS = { + 'disco_host': 'eu1-prod-direct.discoveryplus.com', + 'realm': 'dplay', + 'country': country, + } + + return self._get_disco_api_info(url, video_id, **self._DISCO_API_PARAMS) class DiscoveryPlusIndiaIE(DiscoveryPlusBaseIE): @@ -984,16 +1147,22 @@ def _real_extract(self, url): class DiscoveryPlusItalyIE(DiscoveryPlusBaseIE): - _VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/it/video' + DPlayBaseIE._PATH_REGEX + _VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/it/video(?:/sport|/olympics)?' + DPlayBaseIE._PATH_REGEX _TESTS = [{ 'url': 'https://www.discoveryplus.com/it/video/i-signori-della-neve/stagione-2-episodio-1-i-preparativi', 'only_matching': True, }, { 'url': 'https://www.discoveryplus.com/it/video/super-benny/trailer', 'only_matching': True, + }, { + 'url': 'https://www.discoveryplus.com/it/video/olympics/dplus-sport-dplus-sport-sport/water-polo-greece-italy', + 'only_matching': True, + }, { + 'url': 'https://www.discoveryplus.com/it/video/sport/dplus-sport-dplus-sport-sport/lisa-vittozzi-allinferno-e-ritorno', + 'only_matching': True, }] - _PRODUCT = 'dplus_us' + _PRODUCT = 'dplus_it' _DISCO_API_PARAMS = { 'disco_host': 'eu1-prod-direct.discoveryplus.com', 'realm': 'dplay', @@ -1002,8 +1171,8 @@ class DiscoveryPlusItalyIE(DiscoveryPlusBaseIE): def _update_disco_api_headers(self, headers, disco_base, display_id, realm): headers.update({ - 'x-disco-params': f'realm={realm}', - 'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:25.2.6', + 'x-disco-params': f'realm={realm},siteLookupKey={self._PRODUCT}', + 'x-disco-client': f'WEB:UNKNOWN:dplus_us:{self._DISCO_CLIENT_VER}', 'Authorization': self._get_auth(disco_base, display_id, realm), }) @@ -1044,39 +1213,3 @@ class DiscoveryPlusIndiaShowIE(DiscoveryPlusShowBaseIE): _SHOW_STR = 'show' _INDEX = 4 _VIDEO_IE = DiscoveryPlusIndiaIE - - -class GlobalCyclingNetworkPlusIE(DiscoveryPlusBaseIE): - _VALID_URL = r'https?://plus\.globalcyclingnetwork\.com/watch/(?P\d+)' - _TESTS = [{ - 'url': 'https://plus.globalcyclingnetwork.com/watch/1397691', - 'info_dict': { - 'id': '1397691', - 'ext': 'mp4', - 'title': 'The Athertons: Mountain Biking\'s Fastest Family', - 'description': 'md5:75a81937fcd8b989eec6083a709cd837', - 'thumbnail': 'https://us1-prod-images.disco-api.com/2021/03/04/eb9e3026-4849-3001-8281-9356466f0557.png', - 'series': 'gcn', - 'creator': 'Gcn', - 'upload_date': '20210309', - 'timestamp': 1615248000, - 'duration': 2531.0, - 'tags': [], - }, - 'skip': 'Subscription required', - 'params': {'skip_download': 'm3u8'}, - }] - - _PRODUCT = 'web' - _DISCO_API_PARAMS = { - 'disco_host': 'disco-api-prod.globalcyclingnetwork.com', - 'realm': 'gcn', - 'country': 'us', - } - - def _update_disco_api_headers(self, headers, disco_base, display_id, realm): - headers.update({ - 'x-disco-params': f'realm={realm}', - 'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:27.3.2', - 'Authorization': self._get_auth(disco_base, display_id, realm), - }) diff --git a/yt_dlp/extractor/epidemicsound.py b/yt_dlp/extractor/epidemicsound.py index 0d81b11c8..75b0f052b 100644 --- a/yt_dlp/extractor/epidemicsound.py +++ b/yt_dlp/extractor/epidemicsound.py @@ -2,6 +2,7 @@ from ..utils import ( float_or_none, int_or_none, + join_nonempty, orderedSet, parse_iso8601, parse_qs, @@ -13,7 +14,7 @@ class EpidemicSoundIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/track/(?P[0-9a-zA-Z]+)' + _VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/(?:(?Psound-effects/tracks)|track)/(?P[0-9a-zA-Z-]+)' _TESTS = [{ 'url': 'https://www.epidemicsound.com/track/yFfQVRpSPz/', 'md5': 'd98ff2ddb49e8acab9716541cbc9dfac', @@ -47,6 +48,20 @@ class EpidemicSoundIE(InfoExtractor): 'release_timestamp': 1700535606, 'release_date': '20231121', }, + }, { + 'url': 'https://www.epidemicsound.com/sound-effects/tracks/2f02f54b-9faa-4daf-abac-1cfe9e9cef69/', + 'md5': '35d7cf05bd8b614a84f0495a05de9388', + 'info_dict': { + 'id': '208931', + 'ext': 'mp3', + 'upload_date': '20240603', + 'timestamp': 1717436529, + 'categories': ['appliance'], + 'display_id': '6b2NXLURPr', + 'duration': 1.0, + 'title': 'Oven, Grill, Door Open 01', + 'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/default-sfx/3000x3000.jpg', + }, }] @staticmethod @@ -77,8 +92,10 @@ def _epidemic_fmt_or_none(f): return f def _real_extract(self, url): - video_id = self._match_id(url) - json_data = self._download_json(f'https://www.epidemicsound.com/json/track/{video_id}', video_id) + video_id, is_sfx = self._match_valid_url(url).group('id', 'sfx') + json_data = self._download_json(join_nonempty( + 'https://www.epidemicsound.com/json/track', + is_sfx and 'kosmos-id', video_id, delim='/'), video_id) thumbnails = traverse_obj(json_data, [('imageUrl', 'cover')]) thumb_base_url = traverse_obj(json_data, ('coverArt', 'baseUrl', {url_or_none})) diff --git a/yt_dlp/extractor/facebook.py b/yt_dlp/extractor/facebook.py index a3ca291fc..a43ffe95e 100644 --- a/yt_dlp/extractor/facebook.py +++ b/yt_dlp/extractor/facebook.py @@ -571,16 +571,21 @@ def process_formats(info): # Formats larger than ~500MB will return error 403 unless chunk size is regulated f.setdefault('downloader_options', {})['http_chunk_size'] = 250 << 20 - def extract_relay_data(_filter): - return self._parse_json(self._search_regex( - rf'data-sjs>({{.*?{_filter}.*?}})', - webpage, 'replay data', default='{}'), video_id, fatal=False) or {} + def yield_all_relay_data(_filter): + for relay_data in re.findall(rf'data-sjs>({{.*?{_filter}.*?}})', webpage): + yield self._parse_json(relay_data, video_id, fatal=False) or {} - def extract_relay_prefetched_data(_filter): - return traverse_obj(extract_relay_data(_filter), ( - 'require', (None, (..., ..., ..., '__bbox', 'require')), + def extract_relay_data(_filter): + return next(filter(None, yield_all_relay_data(_filter)), {}) + + def extract_relay_prefetched_data(_filter, target_keys=None): + path = 'data' + if target_keys is not None: + path = lambda k, v: k == 'data' and any(target in v for target in variadic(target_keys)) + return traverse_obj(yield_all_relay_data(_filter), ( + ..., 'require', (None, (..., ..., ..., '__bbox', 'require')), lambda _, v: any(key.startswith('RelayPrefetchedStreamCache') for key in v), - ..., ..., '__bbox', 'result', 'data', {dict}), get_all=False) or {} + ..., ..., '__bbox', 'result', path, {dict}), get_all=False) or {} if not video_data: server_js_data = self._parse_json(self._search_regex([ @@ -591,7 +596,8 @@ def extract_relay_prefetched_data(_filter): if not video_data: data = extract_relay_prefetched_data( - r'"(?:dash_manifest|playable_url(?:_quality_hd)?)') + r'"(?:dash_manifest|playable_url(?:_quality_hd)?)', + target_keys=('video', 'event', 'nodes', 'node', 'mediaset')) if data: entries = [] @@ -957,6 +963,7 @@ class FacebookAdsIE(InfoExtractor): 'id': '899206155126718', 'ext': 'mp4', 'title': 'video by Kandao', + 'description': 'md5:0822724069e3aca97cbed5dabbab282e', 'uploader': 'Kandao', 'uploader_id': '774114102743284', 'uploader_url': r're:^https?://.*', @@ -965,6 +972,22 @@ class FacebookAdsIE(InfoExtractor): 'upload_date': '20231214', 'like_count': int, }, + }, { + # key 'watermarked_video_sd_url' missing + 'url': 'https://www.facebook.com/ads/library/?id=501152689226254', + 'info_dict': { + 'id': '501152689226254', + 'ext': 'mp4', + 'title': 'video by mat.nawrocki', + 'description': 'md5:02a446ace7ff8c3c37a2892922492490', + 'uploader': 'mat.nawrocki', + 'uploader_id': '148586968341456', + 'uploader_url': r're:^https?://.*', + 'timestamp': 1723452305, + 'thumbnail': r're:^https?://.*', + 'upload_date': '20240812', + 'like_count': int, + }, }, { 'url': 'https://www.facebook.com/ads/library/?id=893637265423481', 'info_dict': { @@ -1011,34 +1034,42 @@ def _real_extract(self, url): video_id = self._match_id(url) webpage = self._download_webpage(url, video_id) - post_data = [self._parse_json(j, video_id, fatal=False) - for j in re.findall(r's\.handle\(({.*})\);requireLazy\(', webpage)] - data = traverse_obj(post_data, ( - ..., 'require', ..., ..., ..., 'props', 'deeplinkAdCard', 'snapshot', {dict}), get_all=False) + post_data = traverse_obj( + re.findall(r'data-sjs>({.*?ScheduledServerJS.*?})', webpage), (..., {json.loads})) + data = get_first(post_data, ( + 'require', ..., ..., ..., '__bbox', 'require', ..., ..., ..., + 'entryPointRoot', 'otherProps', 'deeplinkAdCard', 'snapshot', {dict})) if not data: raise ExtractorError('Unable to extract ad data') title = data.get('title') if not title or title == '{{product.name}}': title = join_nonempty('display_format', 'page_name', delim=' by ', from_dict=data) + markup_id = traverse_obj(data, ('body', '__m', {str})) + markup = traverse_obj(post_data, ( + ..., 'require', ..., ..., ..., '__bbox', 'markup', lambda _, v: v[0].startswith(markup_id), + ..., '__html', {clean_html}, {lambda x: not x.startswith('{{product.') and x}, any)) - info_dict = traverse_obj(data, { - 'description': ('link_description', {str}, {lambda x: x if x != '{{product.description}}' else None}), + info_dict = merge_dicts({ + 'title': title, + 'description': markup or None, + }, traverse_obj(data, { + 'description': ('link_description', {lambda x: x if not x.startswith('{{product.') else None}), 'uploader': ('page_name', {str}), 'uploader_id': ('page_id', {str_or_none}), 'uploader_url': ('page_profile_uri', {url_or_none}), 'timestamp': ('creation_time', {int_or_none}), 'like_count': ('page_like_count', {int_or_none}), - }) + })) entries = [] for idx, entry in enumerate(traverse_obj( - data, (('videos', 'cards'), lambda _, v: any(url_or_none(v[f]) for f in self._FORMATS_MAP))), 1, + data, (('videos', 'cards'), lambda _, v: any(url_or_none(v.get(f)) for f in self._FORMATS_MAP))), 1, ): entries.append({ 'id': f'{video_id}_{idx}', 'title': entry.get('title') or title, - 'description': entry.get('link_description') or info_dict.get('description'), + 'description': traverse_obj(entry, 'body', 'link_description') or info_dict.get('description'), 'thumbnail': url_or_none(entry.get('video_preview_image_url')), 'formats': self._extract_formats(entry), }) diff --git a/yt_dlp/extractor/generic.py b/yt_dlp/extractor/generic.py index 3b8e1e957..04cffaa86 100644 --- a/yt_dlp/extractor/generic.py +++ b/yt_dlp/extractor/generic.py @@ -43,6 +43,7 @@ xpath_text, xpath_with_ns, ) +from ..utils._utils import _UnsafeExtensionError class GenericIE(InfoExtractor): @@ -2446,9 +2447,13 @@ def _real_extract(self, url): if not is_html(first_bytes): self.report_warning( 'URL could be a direct video link, returning it as such.') + ext = determine_ext(url) + if ext not in _UnsafeExtensionError.ALLOWED_EXTENSIONS: + ext = 'unknown_video' info_dict.update({ 'direct': True, 'url': url, + 'ext': ext, }) return info_dict diff --git a/yt_dlp/extractor/jiosaavn.py b/yt_dlp/extractor/jiosaavn.py index 542e41b80..030fe686b 100644 --- a/yt_dlp/extractor/jiosaavn.py +++ b/yt_dlp/extractor/jiosaavn.py @@ -158,7 +158,7 @@ def _real_extract(self, url): class JioSaavnPlaylistIE(JioSaavnBaseIE): IE_NAME = 'jiosaavn:playlist' - _VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/s/playlist/(?:[^/?#]+/){2}(?P[^/?#]+)' + _VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/(?:s/playlist/(?:[^/?#]+/){2}|featured/[^/?#]+/)(?P[^/?#]+)' _TESTS = [{ 'url': 'https://www.jiosaavn.com/s/playlist/2279fbe391defa793ad7076929a2f5c9/mood-english/LlJ8ZWT1ibN5084vKHRj2Q__', 'info_dict': { @@ -173,6 +173,13 @@ class JioSaavnPlaylistIE(JioSaavnBaseIE): 'title': 'Mood Hindi', }, 'playlist_mincount': 801, + }, { + 'url': 'https://www.jiosaavn.com/featured/taaza-tunes/Me5RridRfDk_', + 'info_dict': { + 'id': 'Me5RridRfDk_', + 'title': 'Taaza Tunes', + }, + 'playlist_mincount': 301, }] _PAGE_SIZE = 50 diff --git a/yt_dlp/extractor/kick.py b/yt_dlp/extractor/kick.py index 889548f52..1c1b2a177 100644 --- a/yt_dlp/extractor/kick.py +++ b/yt_dlp/extractor/kick.py @@ -1,9 +1,14 @@ +import functools + from .common import InfoExtractor from ..networking import HEADRequest from ..utils import ( UserNotLive, + determine_ext, float_or_none, + int_or_none, merge_dicts, + parse_iso8601, str_or_none, traverse_obj, unified_timestamp, @@ -25,104 +30,192 @@ def _real_initialize(self): def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs): return self._download_json( - f'https://kick.com/api/v1/{path}', display_id, note=note, + f'https://kick.com/api/{path}', display_id, note=note, headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs) class KickIE(KickBaseIE): + IE_NAME = 'kick:live' _VALID_URL = r'https?://(?:www\.)?kick\.com/(?!(?:video|categories|search|auth)(?:[/?#]|$))(?P[\w-]+)' _TESTS = [{ - 'url': 'https://kick.com/yuppy', + 'url': 'https://kick.com/buddha', 'info_dict': { - 'id': '6cde1-kickrp-joe-flemmingskick-info-heremust-knowmust-see21', + 'id': '92722911-nopixel-40', 'ext': 'mp4', 'title': str, 'description': str, - 'channel': 'yuppy', - 'channel_id': '33538', - 'uploader': 'Yuppy', - 'uploader_id': '33793', - 'upload_date': str, - 'live_status': 'is_live', 'timestamp': int, - 'thumbnail': r're:^https?://.*\.jpg', + 'thumbnail': r're:https?://.+\.jpg', 'categories': list, + 'upload_date': str, + 'channel': 'buddha', + 'channel_id': '32807', + 'uploader': 'Buddha', + 'uploader_id': '33057', + 'live_status': 'is_live', + 'concurrent_view_count': int, + 'release_timestamp': int, + 'age_limit': 18, + 'release_date': str, }, - 'skip': 'livestream', + 'params': {'skip_download': 'livestream'}, + # 'skip': 'livestream', }, { - 'url': 'https://kick.com/kmack710', + 'url': 'https://kick.com/xqc', 'only_matching': True, }] + @classmethod + def suitable(cls, url): + return False if KickClipIE.suitable(url) else super().suitable(url) + def _real_extract(self, url): channel = self._match_id(url) - response = self._call_api(f'channels/{channel}', channel) + response = self._call_api(f'v2/channels/{channel}', channel) if not traverse_obj(response, 'livestream', expected_type=dict): raise UserNotLive(video_id=channel) return { - 'id': str(traverse_obj( - response, ('livestream', ('slug', 'id')), get_all=False, default=channel)), - 'formats': self._extract_m3u8_formats( - response['playback_url'], channel, 'mp4', live=True), - 'title': traverse_obj( - response, ('livestream', ('session_title', 'slug')), get_all=False, default=''), - 'description': traverse_obj(response, ('user', 'bio')), 'channel': channel, - 'channel_id': str_or_none(traverse_obj(response, 'id', ('livestream', 'channel_id'))), - 'uploader': traverse_obj(response, 'name', ('user', 'username')), - 'uploader_id': str_or_none(traverse_obj(response, 'user_id', ('user', 'id'))), 'is_live': True, - 'timestamp': unified_timestamp(traverse_obj(response, ('livestream', 'created_at'))), - 'thumbnail': traverse_obj( - response, ('livestream', 'thumbnail', 'url'), expected_type=url_or_none), - 'categories': traverse_obj(response, ('recent_categories', ..., 'name')), + 'formats': self._extract_m3u8_formats(response['playback_url'], channel, 'mp4', live=True), + **traverse_obj(response, { + 'id': ('livestream', 'slug', {str}), + 'title': ('livestream', 'session_title', {str}), + 'description': ('user', 'bio', {str}), + 'channel_id': (('id', ('livestream', 'channel_id')), {int}, {str_or_none}, any), + 'uploader': (('name', ('user', 'username')), {str}, any), + 'uploader_id': (('user_id', ('user', 'id')), {int}, {str_or_none}, any), + 'timestamp': ('livestream', 'created_at', {unified_timestamp}), + 'release_timestamp': ('livestream', 'start_time', {unified_timestamp}), + 'thumbnail': ('livestream', 'thumbnail', 'url', {url_or_none}), + 'categories': ('recent_categories', ..., 'name', {str}), + 'concurrent_view_count': ('livestream', 'viewer_count', {int_or_none}), + 'age_limit': ('livestream', 'is_mature', {bool}, {lambda x: 18 if x else 0}), + }), } class KickVODIE(KickBaseIE): + IE_NAME = 'kick:vod' _VALID_URL = r'https?://(?:www\.)?kick\.com/video/(?P[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' _TESTS = [{ - 'url': 'https://kick.com/video/58bac65b-e641-4476-a7ba-3707a35e60e3', + 'url': 'https://kick.com/video/e74614f4-5270-4319-90ad-32179f19a45c', 'md5': '3870f94153e40e7121a6e46c068b70cb', 'info_dict': { - 'id': '58bac65b-e641-4476-a7ba-3707a35e60e3', + 'id': 'e74614f4-5270-4319-90ad-32179f19a45c', 'ext': 'mp4', - 'title': '🤠REBIRTH IS BACK!!!!🤠!stake CODE JAREDFPS 🤠', - 'description': 'md5:02b0c46f9b4197fb545ab09dddb85b1d', - 'channel': 'jaredfps', - 'channel_id': '26608', - 'uploader': 'JaredFPS', - 'uploader_id': '26799', - 'upload_date': '20240402', - 'timestamp': 1712097108, - 'duration': 33859.0, + 'title': r're:❎ MEGA DRAMA ❎ LIVE ❎ CLICK ❎ ULTIMATE SKILLS .+', + 'description': 'THE BEST AT ABSOLUTELY EVERYTHING. THE JUICER. LEADER OF THE JUICERS.', + 'channel': 'xqc', + 'channel_id': '668', + 'uploader': 'xQc', + 'uploader_id': '676', + 'upload_date': '20240724', + 'timestamp': 1721796562, + 'duration': 18566.0, 'thumbnail': r're:^https?://.*\.jpg', - 'categories': ['Call of Duty: Warzone'], + 'view_count': int, + 'categories': ['VALORANT'], + 'age_limit': 0, }, - 'params': { - 'skip_download': 'm3u8', - }, - 'expected_warnings': [r'impersonation'], + 'params': {'skip_download': 'm3u8'}, }] def _real_extract(self, url): video_id = self._match_id(url) - response = self._call_api(f'video/{video_id}', video_id) + response = self._call_api(f'v1/video/{video_id}', video_id) return { 'id': video_id, 'formats': self._extract_m3u8_formats(response['source'], video_id, 'mp4'), - 'title': traverse_obj( - response, ('livestream', ('session_title', 'slug')), get_all=False, default=''), - 'description': traverse_obj(response, ('livestream', 'channel', 'user', 'bio')), - 'channel': traverse_obj(response, ('livestream', 'channel', 'slug')), - 'channel_id': str_or_none(traverse_obj(response, ('livestream', 'channel', 'id'))), - 'uploader': traverse_obj(response, ('livestream', 'channel', 'user', 'username')), - 'uploader_id': str_or_none(traverse_obj(response, ('livestream', 'channel', 'user_id'))), - 'timestamp': unified_timestamp(response.get('created_at')), - 'duration': float_or_none(traverse_obj(response, ('livestream', 'duration')), scale=1000), - 'thumbnail': traverse_obj( - response, ('livestream', 'thumbnail'), expected_type=url_or_none), - 'categories': traverse_obj(response, ('livestream', 'categories', ..., 'name')), + **traverse_obj(response, { + 'title': ('livestream', ('session_title', 'slug'), {str}, any), + 'description': ('livestream', 'channel', 'user', 'bio', {str}), + 'channel': ('livestream', 'channel', 'slug', {str}), + 'channel_id': ('livestream', 'channel', 'id', {int}, {str_or_none}), + 'uploader': ('livestream', 'channel', 'user', 'username', {str}), + 'uploader_id': ('livestream', 'channel', 'user_id', {int}, {str_or_none}), + 'timestamp': ('created_at', {parse_iso8601}), + 'duration': ('livestream', 'duration', {functools.partial(float_or_none, scale=1000)}), + 'thumbnail': ('livestream', 'thumbnail', {url_or_none}), + 'categories': ('livestream', 'categories', ..., 'name', {str}), + 'view_count': ('views', {int_or_none}), + 'age_limit': ('livestream', 'is_mature', {bool}, {lambda x: 18 if x else 0}), + }), + } + + +class KickClipIE(KickBaseIE): + IE_NAME = 'kick:clips' + _VALID_URL = r'https?://(?:www\.)?kick\.com/[\w-]+/?\?(?:[^#]+&)?clip=(?Pclip_[\w-]+)' + _TESTS = [{ + 'url': 'https://kick.com/mxddy?clip=clip_01GYXVB5Y8PWAPWCWMSBCFB05X', + 'info_dict': { + 'id': 'clip_01GYXVB5Y8PWAPWCWMSBCFB05X', + 'ext': 'mp4', + 'title': 'Maddy detains Abd D:', + 'channel': 'mxddy', + 'channel_id': '133789', + 'uploader': 'AbdCreates', + 'uploader_id': '3309077', + 'thumbnail': r're:^https?://.*\.jpeg', + 'duration': 35, + 'timestamp': 1682481453, + 'upload_date': '20230426', + 'view_count': int, + 'like_count': int, + 'categories': ['VALORANT'], + 'age_limit': 18, + }, + 'params': {'skip_download': 'm3u8'}, + }, { + 'url': 'https://kick.com/destiny?clip=clip_01H9SKET879NE7N9RJRRDS98J3', + 'info_dict': { + 'id': 'clip_01H9SKET879NE7N9RJRRDS98J3', + 'title': 'W jews', + 'ext': 'mp4', + 'channel': 'destiny', + 'channel_id': '1772249', + 'uploader': 'punished_furry', + 'uploader_id': '2027722', + 'duration': 49.0, + 'upload_date': '20230908', + 'timestamp': 1694150180, + 'thumbnail': 'https://clips.kick.com/clips/j3/clip_01H9SKET879NE7N9RJRRDS98J3/thumbnail.png', + 'view_count': int, + 'like_count': int, + 'categories': ['Just Chatting'], + 'age_limit': 0, + }, + 'params': {'skip_download': 'm3u8'}, + }] + + def _real_extract(self, url): + clip_id = self._match_id(url) + clip = self._call_api(f'v2/clips/{clip_id}/play', clip_id)['clip'] + clip_url = clip['clip_url'] + + if determine_ext(clip_url) == 'm3u8': + formats = self._extract_m3u8_formats(clip_url, clip_id, 'mp4') + else: + formats = [{'url': clip_url}] + + return { + 'id': clip_id, + 'formats': formats, + **traverse_obj(clip, { + 'title': ('title', {str}), + 'channel': ('channel', 'slug', {str}), + 'channel_id': ('channel', 'id', {int}, {str_or_none}), + 'uploader': ('creator', 'username', {str}), + 'uploader_id': ('creator', 'id', {int}, {str_or_none}), + 'thumbnail': ('thumbnail_url', {url_or_none}), + 'duration': ('duration', {float_or_none}), + 'categories': ('category', 'name', {str}, all), + 'timestamp': ('created_at', {parse_iso8601}), + 'view_count': ('views', {int_or_none}), + 'like_count': ('likes', {int_or_none}), + 'age_limit': ('is_mature', {bool}, {lambda x: 18 if x else 0}), + }), } diff --git a/yt_dlp/extractor/learningonscreen.py b/yt_dlp/extractor/learningonscreen.py new file mode 100644 index 000000000..dcf83144c --- /dev/null +++ b/yt_dlp/extractor/learningonscreen.py @@ -0,0 +1,78 @@ +import functools +import re + +from .common import InfoExtractor +from ..utils import ( + ExtractorError, + clean_html, + extract_attributes, + get_element_by_class, + get_element_html_by_id, + join_nonempty, + parse_duration, + unified_timestamp, +) +from ..utils.traversal import traverse_obj + + +class LearningOnScreenIE(InfoExtractor): + _VALID_URL = r'https?://learningonscreen\.ac\.uk/ondemand/index\.php/prog/(?P\w+)' + _TESTS = [{ + 'url': 'https://learningonscreen.ac.uk/ondemand/index.php/prog/005D81B2?bcast=22757013', + 'info_dict': { + 'id': '005D81B2', + 'ext': 'mp4', + 'title': 'Planet Earth', + 'duration': 3600.0, + 'timestamp': 1164567600.0, + 'upload_date': '20061126', + 'thumbnail': 'https://stream.learningonscreen.ac.uk/trilt-cover-images/005D81B2-Planet-Earth-2006-11-26T190000Z-BBC4.jpg', + }, + }] + + def _real_initialize(self): + if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-BOB-LIVE'): + self.raise_login_required( + 'Use --cookies for authentication. See ' + ' https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp ' + 'for how to manually pass cookies', method=None) + + def _real_extract(self, url): + video_id = self._match_id(url) + webpage = self._download_webpage(url, video_id) + + details = traverse_obj(webpage, ( + {functools.partial(get_element_html_by_id, 'programme-details')}, { + 'title': ({functools.partial(re.search, r'

([^<]+)

')}, 1, {clean_html}), + 'timestamp': ( + {functools.partial(get_element_by_class, 'broadcast-date')}, + {functools.partial(re.match, r'([^<]+)')}, 1, {unified_timestamp}), + 'duration': ( + {functools.partial(get_element_by_class, 'prog-running-time')}, + {clean_html}, {parse_duration}), + })) + + title = details.pop('title', None) or traverse_obj(webpage, ( + {functools.partial(get_element_html_by_id, 'add-to-existing-playlist')}, + {extract_attributes}, 'data-record-title', {clean_html})) + + entries = self._parse_html5_media_entries( + 'https://stream.learningonscreen.ac.uk', webpage, video_id, m3u8_id='hls', mpd_id='dash', + _headers={'Origin': 'https://learningonscreen.ac.uk', 'Referer': 'https://learningonscreen.ac.uk/'}) + if not entries: + raise ExtractorError('No video found') + + if len(entries) > 1: + duration = details.pop('duration', None) + for idx, entry in enumerate(entries, start=1): + entry.update(details) + entry['id'] = join_nonempty(video_id, idx) + entry['title'] = join_nonempty(title, idx) + return self.playlist_result(entries, video_id, title, duration=duration) + + return { + **entries[0], + **details, + 'id': video_id, + 'title': title, + } diff --git a/yt_dlp/extractor/mediaklikk.py b/yt_dlp/extractor/mediaklikk.py index bd1a27fcc..f51342060 100644 --- a/yt_dlp/extractor/mediaklikk.py +++ b/yt_dlp/extractor/mediaklikk.py @@ -133,7 +133,9 @@ def _real_extract(self, url): r']+\bclass="article_date">([^<]+)<', webpage, 'upload date', default=None)) player_data['video'] = player_data.pop('token') - player_page = self._download_webpage('https://player.mediaklikk.hu/playernew/player.php', video_id, query=player_data) + player_page = self._download_webpage( + 'https://player.mediaklikk.hu/playernew/player.php', video_id, + query=player_data, headers={'Referer': url}) player_json = self._search_json( r'\bpl\.setup\s*\(', player_page, 'player json', video_id, end_pattern=r'\);') playlist_url = traverse_obj( diff --git a/yt_dlp/extractor/mlb.py b/yt_dlp/extractor/mlb.py index 6f67602a6..935bf8561 100644 --- a/yt_dlp/extractor/mlb.py +++ b/yt_dlp/extractor/mlb.py @@ -1,16 +1,21 @@ +import json import re -import urllib.parse +import time import uuid from .common import InfoExtractor +from ..networking.exceptions import HTTPError from ..utils import ( + ExtractorError, determine_ext, int_or_none, join_nonempty, + jwt_decode_hs256, parse_duration, parse_iso8601, try_get, url_or_none, + urlencode_postdata, ) from ..utils.traversal import traverse_obj @@ -276,81 +281,225 @@ def _download_video_data(self, display_id): class MLBTVIE(InfoExtractor): _VALID_URL = r'https?://(?:www\.)?mlb\.com/tv/g(?P\d{6})' _NETRC_MACHINE = 'mlb' - _TESTS = [{ 'url': 'https://www.mlb.com/tv/g661581/vee2eff5f-a7df-4c20-bdb4-7b926fa12638', 'info_dict': { 'id': '661581', 'ext': 'mp4', 'title': '2022-07-02 - St. Louis Cardinals @ Philadelphia Phillies', + 'release_date': '20220702', + 'release_timestamp': 1656792300, }, - 'params': { - 'skip_download': True, + 'params': {'skip_download': 'm3u8'}, + }, { + # makeup game: has multiple dates, need to avoid games with 'rescheduleDate' + 'url': 'https://www.mlb.com/tv/g747039/vd22541c4-5a29-45f7-822b-635ec041cf5e', + 'info_dict': { + 'id': '747039', + 'ext': 'mp4', + 'title': '2024-07-29 - Toronto Blue Jays @ Baltimore Orioles', + 'release_date': '20240729', + 'release_timestamp': 1722280200, }, + 'params': {'skip_download': 'm3u8'}, }] + _GRAPHQL_INIT_QUERY = '''\ +mutation initSession($device: InitSessionInput!, $clientType: ClientType!, $experience: ExperienceTypeInput) { + initSession(device: $device, clientType: $clientType, experience: $experience) { + deviceId + sessionId + entitlements { + code + } + location { + countryCode + regionName + zipCode + latitude + longitude + } + clientExperience + features + } + }''' + _GRAPHQL_PLAYBACK_QUERY = '''\ +mutation initPlaybackSession( + $adCapabilities: [AdExperienceType] + $mediaId: String! + $deviceId: String! + $sessionId: String! + $quality: PlaybackQuality + ) { + initPlaybackSession( + adCapabilities: $adCapabilities + mediaId: $mediaId + deviceId: $deviceId + sessionId: $sessionId + quality: $quality + ) { + playbackSessionId + playback { + url + token + expiration + cdn + } + } + }''' + _APP_VERSION = '7.8.2' + _device_id = None + _session_id = None _access_token = None + _token_expiry = 0 + + @property + def _api_headers(self): + if (self._token_expiry - 120) <= time.time(): + self.write_debug('Access token has expired; re-logging in') + self._perform_login(*self._get_login_info()) + return {'Authorization': f'Bearer {self._access_token}'} def _real_initialize(self): if not self._access_token: self.raise_login_required( 'All videos are only available to registered users', method='password') + def _set_device_id(self, username): + if not self._device_id: + self._device_id = self.cache.load( + self._NETRC_MACHINE, 'device_ids', default={}).get(username) + if self._device_id: + return + self._device_id = str(uuid.uuid4()) + self.cache.store(self._NETRC_MACHINE, 'device_ids', {username: self._device_id}) + def _perform_login(self, username, password): - data = f'grant_type=password&username={urllib.parse.quote(username)}&password={urllib.parse.quote(password)}&scope=openid offline_access&client_id=0oa3e1nutA1HLzAKG356' - access_token = self._download_json( - 'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None, - headers={ - 'User-Agent': 'okhttp/3.12.1', - 'Content-Type': 'application/x-www-form-urlencoded', - }, data=data.encode())['access_token'] + try: + self._access_token = self._download_json( + 'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None, + 'Logging in', 'Unable to log in', headers={ + 'User-Agent': 'okhttp/3.12.1', + 'Content-Type': 'application/x-www-form-urlencoded', + }, data=urlencode_postdata({ + 'grant_type': 'password', + 'username': username, + 'password': password, + 'scope': 'openid offline_access', + 'client_id': '0oa3e1nutA1HLzAKG356', + }))['access_token'] + except ExtractorError as error: + if isinstance(error.cause, HTTPError) and error.cause.status == 400: + raise ExtractorError('Invalid username or password', expected=True) + raise - entitlement = self._download_webpage( - f'https://media-entitlement.mlb.com/api/v3/jwt?os=Android&appname=AtBat&did={uuid.uuid4()}', None, - headers={ - 'User-Agent': 'okhttp/3.12.1', - 'Authorization': f'Bearer {access_token}', - }) + self._token_expiry = traverse_obj(self._access_token, ({jwt_decode_hs256}, 'exp', {int})) or 0 + self._set_device_id(username) - data = f'grant_type=urn:ietf:params:oauth:grant-type:token-exchange&subject_token={entitlement}&subject_token_type=urn:ietf:params:oauth:token-type:jwt&platform=android-tv' - self._access_token = self._download_json( - 'https://us.edge.bamgrid.com/token', None, + self._session_id = self._call_api({ + 'operationName': 'initSession', + 'query': self._GRAPHQL_INIT_QUERY, + 'variables': { + 'device': { + 'appVersion': self._APP_VERSION, + 'deviceFamily': 'desktop', + 'knownDeviceId': self._device_id, + 'languagePreference': 'ENGLISH', + 'manufacturer': '', + 'model': '', + 'os': '', + 'osVersion': '', + }, + 'clientType': 'WEB', + }, + }, None, 'session ID')['data']['initSession']['sessionId'] + + def _call_api(self, data, video_id, description='GraphQL JSON', fatal=True): + return self._download_json( + 'https://media-gateway.mlb.com/graphql', video_id, + f'Downloading {description}', f'Unable to download {description}', fatal=fatal, headers={ + **self._api_headers, 'Accept': 'application/json', - 'Authorization': 'Bearer bWxidHYmYW5kcm9pZCYxLjAuMA.6LZMbH2r--rbXcgEabaDdIslpo4RyZrlVfWZhsAgXIk', - 'Content-Type': 'application/x-www-form-urlencoded', - }, data=data.encode())['access_token'] + 'Content-Type': 'application/json', + 'x-client-name': 'WEB', + 'x-client-version': self._APP_VERSION, + }, data=json.dumps(data, separators=(',', ':')).encode()) + + def _extract_formats_and_subtitles(self, broadcast, video_id): + feed = traverse_obj(broadcast, ('homeAway', {str.title})) + medium = traverse_obj(broadcast, ('type', {str})) + language = traverse_obj(broadcast, ('language', {str.lower})) + format_id = join_nonempty(feed, medium, language) + + response = self._call_api({ + 'operationName': 'initPlaybackSession', + 'query': self._GRAPHQL_PLAYBACK_QUERY, + 'variables': { + 'adCapabilities': ['GOOGLE_STANDALONE_AD_PODS'], + 'deviceId': self._device_id, + 'mediaId': broadcast['mediaId'], + 'quality': 'PLACEHOLDER', + 'sessionId': self._session_id, + }, + }, video_id, f'{format_id} broadcast JSON', fatal=False) + + playback = traverse_obj(response, ('data', 'initPlaybackSession', 'playback', {dict})) + m3u8_url = traverse_obj(playback, ('url', {url_or_none})) + token = traverse_obj(playback, ('token', {str})) + + if not (m3u8_url and token): + errors = '; '.join(traverse_obj(response, ('errors', ..., 'message', {str}))) + if 'not entitled' in errors: + raise ExtractorError(errors, expected=True) + elif errors: # Only warn when 'blacked out' since radio formats are available + self.report_warning(f'API returned errors for {format_id}: {errors}') + else: + self.report_warning(f'No formats available for {format_id} broadcast; skipping') + return [], {} + + cdn_headers = {'x-cdn-token': token} + fmts, subs = self._extract_m3u8_formats_and_subtitles( + m3u8_url.replace(f'/{token}/', '/'), video_id, 'mp4', + m3u8_id=format_id, fatal=False, headers=cdn_headers) + for fmt in fmts: + fmt['http_headers'] = cdn_headers + fmt.setdefault('format_note', join_nonempty(feed, medium, delim=' ')) + fmt.setdefault('language', language) + if fmt.get('vcodec') == 'none' and fmt['language'] == 'en': + fmt['source_preference'] = 10 + + return fmts, subs def _real_extract(self, url): video_id = self._match_id(url) - airings = self._download_json( - f'https://search-api-mlbtv.mlb.com/svc/search/v2/graphql/persisted/query/core/Airings?variables=%7B%22partnerProgramIds%22%3A%5B%22{video_id}%22%5D%2C%22applyEsniMediaRightsLabels%22%3Atrue%7D', - video_id)['data']['Airings'] + data = self._download_json( + 'https://statsapi.mlb.com/api/v1/schedule', video_id, query={ + 'gamePk': video_id, + 'hydrate': 'broadcasts(all),statusFlags', + }) + metadata = traverse_obj(data, ( + 'dates', ..., 'games', + lambda _, v: str(v['gamePk']) == video_id and not v.get('rescheduleDate'), any)) + + broadcasts = traverse_obj(metadata, ( + 'broadcasts', lambda _, v: v['mediaId'] and v['mediaState']['mediaStateCode'] != 'MEDIA_OFF')) formats, subtitles = [], {} - for airing in traverse_obj(airings, lambda _, v: v['playbackUrls'][0]['href']): - format_id = join_nonempty('feedType', 'feedLanguage', from_dict=airing) - m3u8_url = traverse_obj(self._download_json( - airing['playbackUrls'][0]['href'].format(scenario='browser~csai'), video_id, - note=f'Downloading {format_id} stream info JSON', - errnote=f'Failed to download {format_id} stream info, skipping', - fatal=False, headers={ - 'Authorization': self._access_token, - 'Accept': 'application/vnd.media-service+json; version=2', - }), ('stream', 'complete', {url_or_none})) - if not m3u8_url: - continue - f, s = self._extract_m3u8_formats_and_subtitles( - m3u8_url, video_id, 'mp4', m3u8_id=format_id, fatal=False) - formats.extend(f) - self._merge_subtitles(s, target=subtitles) + for broadcast in broadcasts: + fmts, subs = self._extract_formats_and_subtitles(broadcast, video_id) + formats.extend(fmts) + self._merge_subtitles(subs, target=subtitles) return { 'id': video_id, - 'title': traverse_obj(airings, (..., 'titles', 0, 'episodeName'), get_all=False), - 'is_live': traverse_obj(airings, (..., 'mediaConfig', 'productType'), get_all=False) == 'LIVE', + 'title': join_nonempty( + traverse_obj(metadata, ('officialDate', {str})), + traverse_obj(metadata, ('teams', ('away', 'home'), 'team', 'name', {str}, all, {' @ '.join})), + delim=' - '), + 'is_live': traverse_obj(broadcasts, (..., 'mediaState', 'mediaStateCode', {str}, any)) == 'MEDIA_ON', + 'release_timestamp': traverse_obj(metadata, ('gameDate', {parse_iso8601})), 'formats': formats, 'subtitles': subtitles, - 'http_headers': {'Authorization': f'Bearer {self._access_token}'}, } diff --git a/yt_dlp/extractor/murrtube.py b/yt_dlp/extractor/murrtube.py index 3b39a1b9a..9067b8781 100644 --- a/yt_dlp/extractor/murrtube.py +++ b/yt_dlp/extractor/murrtube.py @@ -5,39 +5,103 @@ from ..utils import ( ExtractorError, OnDemandPagedList, - determine_ext, - int_or_none, - try_get, + clean_html, + extract_attributes, + get_element_by_class, + get_element_html_by_id, + parse_count, + remove_end, + update_url, + urlencode_postdata, ) class MurrtubeIE(InfoExtractor): - _WORKING = False _VALID_URL = r'''(?x) (?: murrtube:| - https?://murrtube\.net/videos/(?P[a-z0-9\-]+)\- + https?://murrtube\.net/(?:v/|videos/(?P[a-z0-9-]+?)-) ) - (?P[a-f0-9]{8}\-[a-f0-9]{4}\-[a-f0-9]{4}\-[a-f0-9]{4}\-[a-f0-9]{12}) + (?P[A-Z0-9]{4}|[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ''' - _TEST = { + _TESTS = [{ 'url': 'https://murrtube.net/videos/inferno-x-skyler-148b6f2a-fdcc-4902-affe-9c0f41aaaca0', - 'md5': '169f494812d9a90914b42978e73aa690', + 'md5': '70380878a77e8565d4aea7f68b8bbb35', 'info_dict': { - 'id': '148b6f2a-fdcc-4902-affe-9c0f41aaaca0', + 'id': 'ca885d8456b95de529b6723b158032e11115d', 'ext': 'mp4', 'title': 'Inferno X Skyler', 'description': 'Humping a very good slutty sheppy (roomate)', - 'thumbnail': r're:^https?://.*\.jpg$', - 'duration': 284, 'uploader': 'Inferno Wolf', 'age_limit': 18, + 'thumbnail': 'https://storage.murrtube.net/murrtube-production/ekbs3zcfvuynnqfx72nn2tkokvsd', 'comment_count': int, 'view_count': int, 'like_count': int, - 'tags': ['hump', 'breed', 'Fursuit', 'murrsuit', 'bareback'], }, - } + }, { + 'url': 'https://murrtube.net/v/0J2Q', + 'md5': '31262f6ac56f0ca75e5a54a0f3fefcb6', + 'info_dict': { + 'id': '8442998c52134968d9caa36e473e1a6bac6ca', + 'ext': 'mp4', + 'uploader': 'Hayel', + 'title': 'Who\'s in charge now?', + 'description': 'md5:795791e97e5b0f1805ea84573f02a997', + 'age_limit': 18, + 'thumbnail': 'https://storage.murrtube.net/murrtube-production/fb1ojjwiucufp34ya6hxu5vfqi5s', + 'comment_count': int, + 'view_count': int, + 'like_count': int, + }, + }] + + def _extract_count(self, name, html): + return parse_count(self._search_regex( + rf'([\d,]+)\s+]*>{name}', html, name, default=None)) + + def _real_initialize(self): + homepage = self._download_webpage( + 'https://murrtube.net', None, note='Getting session token') + self._request_webpage( + 'https://murrtube.net/accept_age_check', None, 'Setting age cookie', + data=urlencode_postdata(self._hidden_inputs(homepage))) + + def _real_extract(self, url): + video_id = self._match_id(url) + if video_id.startswith('murrtube:'): + raise ExtractorError('Support for murrtube: prefix URLs is broken') + video_page = self._download_webpage(url, video_id) + video_attrs = extract_attributes(get_element_html_by_id('video', video_page)) + playlist = update_url(video_attrs['data-url'], query=None) + video_id = self._search_regex(r'/([\da-f]+)/index.m3u8', playlist, 'video id') + + return { + 'id': video_id, + 'title': remove_end(self._og_search_title(video_page), ' - Murrtube'), + 'age_limit': 18, + 'formats': self._extract_m3u8_formats(playlist, video_id, 'mp4'), + 'description': self._og_search_description(video_page), + 'thumbnail': update_url(self._og_search_thumbnail(video_page, default=''), query=None) or None, + 'uploader': clean_html(get_element_by_class('pl-1 is-size-6 has-text-lighter', video_page)), + 'view_count': self._extract_count('Views', video_page), + 'like_count': self._extract_count('Likes', video_page), + 'comment_count': self._extract_count('Comments', video_page), + } + + +class MurrtubeUserIE(InfoExtractor): + _WORKING = False + IE_DESC = 'Murrtube user profile' + _VALID_URL = r'https?://murrtube\.net/(?P[^/]+)$' + _TESTS = [{ + 'url': 'https://murrtube.net/stormy', + 'info_dict': { + 'id': 'stormy', + }, + 'playlist_mincount': 27, + }] + _PAGE_SIZE = 10 def _download_gql(self, video_id, op, note=None, fatal=True): result = self._download_json( @@ -46,73 +110,6 @@ def _download_gql(self, video_id, op, note=None, fatal=True): headers={'Content-Type': 'application/json'}) return result['data'] - def _real_extract(self, url): - video_id = self._match_id(url) - data = self._download_gql(video_id, { - 'operationName': 'Medium', - 'variables': { - 'id': video_id, - }, - 'query': '''\ -query Medium($id: ID!) { - medium(id: $id) { - title - description - key - duration - commentsCount - likesCount - viewsCount - thumbnailKey - tagList - user { - name - __typename - } - __typename - } -}'''}) - meta = data['medium'] - - storage_url = 'https://storage.murrtube.net/murrtube/' - format_url = storage_url + meta.get('key', '') - thumbnail = storage_url + meta.get('thumbnailKey', '') - - if determine_ext(format_url) == 'm3u8': - formats = self._extract_m3u8_formats( - format_url, video_id, 'mp4', entry_protocol='m3u8_native', fatal=False) - else: - formats = [{'url': format_url}] - - return { - 'id': video_id, - 'title': meta.get('title'), - 'description': meta.get('description'), - 'formats': formats, - 'thumbnail': thumbnail, - 'duration': int_or_none(meta.get('duration')), - 'uploader': try_get(meta, lambda x: x['user']['name']), - 'view_count': meta.get('viewsCount'), - 'like_count': meta.get('likesCount'), - 'comment_count': meta.get('commentsCount'), - 'tags': meta.get('tagList'), - 'age_limit': 18, - } - - -class MurrtubeUserIE(MurrtubeIE): # XXX: Do not subclass from concrete IE - _WORKING = False - IE_DESC = 'Murrtube user profile' - _VALID_URL = r'https?://murrtube\.net/(?P[^/]+)$' - _TEST = { - 'url': 'https://murrtube.net/stormy', - 'info_dict': { - 'id': 'stormy', - }, - 'playlist_mincount': 27, - } - _PAGE_SIZE = 10 - def _fetch_page(self, username, user_id, page): data = self._download_gql(username, { 'operationName': 'Media', diff --git a/yt_dlp/extractor/niconico.py b/yt_dlp/extractor/niconico.py index 9d7b010c5..179e7a9b1 100644 --- a/yt_dlp/extractor/niconico.py +++ b/yt_dlp/extractor/niconico.py @@ -40,7 +40,6 @@ class NiconicoIE(InfoExtractor): _TESTS = [{ 'url': 'http://www.nicovideo.jp/watch/sm22312215', - 'md5': 'd1a75c0823e2f629128c43e1212760f9', 'info_dict': { 'id': 'sm22312215', 'ext': 'mp4', @@ -56,8 +55,8 @@ class NiconicoIE(InfoExtractor): 'comment_count': int, 'genres': ['未設定'], 'tags': [], - 'expected_protocol': str, }, + 'params': {'skip_download': 'm3u8'}, }, { # File downloaded with and without credentials are different, so omit # the md5 field @@ -77,8 +76,8 @@ class NiconicoIE(InfoExtractor): 'view_count': int, 'genres': ['音楽・サウンド'], 'tags': ['Translation_Request', 'Kagamine_Rin', 'Rin_Original'], - 'expected_protocol': str, }, + 'params': {'skip_download': 'm3u8'}, }, { # 'video exists but is marked as "deleted" # md5 is unstable @@ -112,7 +111,6 @@ class NiconicoIE(InfoExtractor): }, { # video not available via `getflv`; "old" HTML5 video 'url': 'http://www.nicovideo.jp/watch/sm1151009', - 'md5': 'f95a3d259172667b293530cc2e41ebda', 'info_dict': { 'id': 'sm1151009', 'ext': 'mp4', @@ -128,11 +126,10 @@ class NiconicoIE(InfoExtractor): 'comment_count': int, 'genres': ['ゲーム'], 'tags': [], - 'expected_protocol': str, }, + 'params': {'skip_download': 'm3u8'}, }, { # "New" HTML5 video - # md5 is unstable 'url': 'http://www.nicovideo.jp/watch/sm31464864', 'info_dict': { 'id': 'sm31464864', @@ -149,12 +146,11 @@ class NiconicoIE(InfoExtractor): 'comment_count': int, 'genres': ['アニメ'], 'tags': [], - 'expected_protocol': str, }, + 'params': {'skip_download': 'm3u8'}, }, { # Video without owner 'url': 'http://www.nicovideo.jp/watch/sm18238488', - 'md5': 'd265680a1f92bdcbbd2a507fc9e78a9e', 'info_dict': { 'id': 'sm18238488', 'ext': 'mp4', @@ -168,8 +164,8 @@ class NiconicoIE(InfoExtractor): 'comment_count': int, 'genres': ['エンターテイメント'], 'tags': [], - 'expected_protocol': str, }, + 'params': {'skip_download': 'm3u8'}, }, { 'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg', 'only_matching': True, @@ -458,9 +454,11 @@ def _real_extract(self, url): if video_id.startswith('so'): video_id = self._match_id(handle.url) - api_data = self._parse_json(self._html_search_regex( - 'data-api-data="([^"]+)"', webpage, - 'API data', default='{}'), video_id) + api_data = traverse_obj( + self._parse_json(self._html_search_meta('server-response', webpage) or '', video_id), + ('data', 'response', {dict})) + if not api_data: + raise ExtractorError('Server response data not found') except ExtractorError as e: try: api_data = self._download_json( diff --git a/yt_dlp/extractor/olympics.py b/yt_dlp/extractor/olympics.py index becf052f6..bbf83e531 100644 --- a/yt_dlp/extractor/olympics.py +++ b/yt_dlp/extractor/olympics.py @@ -1,9 +1,19 @@ from .common import InfoExtractor -from ..utils import int_or_none, try_get +from ..networking.exceptions import HTTPError +from ..utils import ( + ExtractorError, + int_or_none, + parse_iso8601, + parse_qs, + try_get, + update_url, + url_or_none, +) +from ..utils.traversal import traverse_obj class OlympicsReplayIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?olympics\.com(?:/tokyo-2020)?/[a-z]{2}/(?:replay|video)/(?P[^/#&?]+)' + _VALID_URL = r'https?://(?:www\.)?olympics\.com/[a-z]{2}/(?:paris-2024/)?(?:replay|videos?|original-series/episode)/(?P[\w-]+)' _TESTS = [{ 'url': 'https://olympics.com/fr/video/men-s-109kg-group-a-weightlifting-tokyo-2020-replays', 'info_dict': { @@ -11,26 +21,105 @@ class OlympicsReplayIE(InfoExtractor): 'ext': 'mp4', 'title': '+109kg (H) Groupe A - Haltérophilie | Replay de Tokyo 2020', 'upload_date': '20210801', - 'timestamp': 1627783200, + 'timestamp': 1627797600, 'description': 'md5:c66af4a5bc7429dbcc43d15845ff03b3', - 'uploader': 'International Olympic Committee', - }, - 'params': { - 'skip_download': True, + 'thumbnail': 'https://img.olympics.com/images/image/private/t_1-1_1280/primary/nua4o7zwyaznoaejpbk2', + 'duration': 7017.0, }, }, { - 'url': 'https://olympics.com/tokyo-2020/en/replay/bd242924-4b22-49a5-a846-f1d4c809250d/mens-bronze-medal-match-hun-esp', - 'only_matching': True, + 'url': 'https://olympics.com/en/original-series/episode/b-boys-and-b-girls-take-the-spotlight-breaking-life-road-to-paris-2024', + 'info_dict': { + 'id': '32633650-c5ee-4280-8b94-fb6defb6a9b5', + 'ext': 'mp4', + 'title': 'B-girl Nicka - Breaking Life, Road to Paris 2024 | Episode 1', + 'upload_date': '20240517', + 'timestamp': 1715948200, + 'description': 'md5:f63d728a41270ec628f6ac33ce471bb1', + 'thumbnail': 'https://img.olympics.com/images/image/private/t_1-1_1280/primary/a3j96l7j6so3vyfijby1', + 'duration': 1321.0, + }, + }, { + 'url': 'https://olympics.com/en/paris-2024/videos/men-s-preliminaries-gbr-esp-ned-rsa-hockey-olympic-games-paris-2024', + 'info_dict': { + 'id': '3d96db23-8eee-4b7c-8ef5-488a0361026c', + 'ext': 'mp4', + 'title': 'Men\'s Preliminaries GBR-ESP & NED-RSA | Hockey | Olympic Games Paris 2024', + 'upload_date': '20240727', + 'timestamp': 1722066600, + }, + 'skip': 'Geo-restricted to RU, BR, BT, NP, TM, BD, TL', + }, { + 'url': 'https://olympics.com/en/paris-2024/videos/dnp-suni-lee-i-have-goals-and-i-have-expectations-for-myself-but-i-also-am-trying-to-give-myself-grace', + 'info_dict': { + 'id': 'a42f37ab-8a74-41d0-a7d9-af27b7b02a90', + 'ext': 'mp4', + 'title': 'md5:c7cfbc9918636a98e66400a812e4d407', + 'upload_date': '20240729', + 'timestamp': 1722288600, + }, }] + _GEO_BYPASS = False + + def _extract_from_nextjs_data(self, webpage, video_id): + data = traverse_obj(self._search_nextjs_data(webpage, video_id, default={}), ( + 'props', 'pageProps', 'page', 'items', + lambda _, v: v['name'] == 'videoPlaylist', 'data', 'currentVideo', {dict}, any)) + if not data: + return None + + geo_countries = traverse_obj(data, ('countries', ..., {str})) + if traverse_obj(data, ('geoRestrictedVideo', {bool})): + self.raise_geo_restricted(countries=geo_countries) + + is_live = traverse_obj(data, ('streamingStatus', {str})) == 'LIVE' + m3u8_url = traverse_obj(data, ('videoUrl', {url_or_none})) or data['streamUrl'] + tokenized_url = self._tokenize_url(m3u8_url, data['jwtToken'], is_live, video_id) + + try: + formats, subtitles = self._extract_m3u8_formats_and_subtitles( + tokenized_url, video_id, 'mp4', m3u8_id='hls') + except ExtractorError as e: + if isinstance(e.cause, HTTPError) and 'georestricted' in e.cause.msg: + self.raise_geo_restricted(countries=geo_countries) + raise + + return { + 'formats': formats, + 'subtitles': subtitles, + 'is_live': is_live, + **traverse_obj(data, { + 'id': ('videoID', {str}), + 'title': ('title', {str}), + 'timestamp': ('contentDate', {parse_iso8601}), + }), + } + + def _tokenize_url(self, url, token, is_live, video_id): + return self._download_json( + 'https://metering.olympics.com/tokengenerator', video_id, + 'Downloading tokenized m3u8 url', query={ + **parse_qs(url), + 'url': update_url(url, query=None), + 'service-id': 'live' if is_live else 'vod', + 'user-auth': token, + })['data']['url'] + + def _legacy_tokenize_url(self, url, video_id): + return self._download_json( + 'https://olympics.com/tokenGenerator', video_id, + 'Downloading legacy tokenized m3u8 url', query={'url': url}) def _real_extract(self, url): video_id = self._match_id(url) - webpage = self._download_webpage(url, video_id) + + if info := self._extract_from_nextjs_data(webpage, video_id): + return info + title = self._html_search_meta(('title', 'og:title', 'twitter:title'), webpage) - uuid = self._html_search_meta('episode_uid', webpage) + video_uuid = self._html_search_meta('episode_uid', webpage) m3u8_url = self._html_search_meta('video_url', webpage) - json_ld = self._search_json_ld(webpage, uuid) + json_ld = self._search_json_ld(webpage, video_uuid) thumbnails_list = json_ld.get('image') if not thumbnails_list: thumbnails_list = self._html_search_regex( @@ -48,12 +137,12 @@ def _real_extract(self, url): 'width': width, 'height': int_or_none(try_get(width, lambda x: x * height_a / width_a)), }) - m3u8_url = self._download_json( - f'https://olympics.com/tokenGenerator?url={m3u8_url}', uuid, note='Downloading m3u8 url') - formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, uuid, 'mp4', m3u8_id='hls') + + formats, subtitles = self._extract_m3u8_formats_and_subtitles( + self._legacy_tokenize_url(m3u8_url, video_uuid), video_uuid, 'mp4', m3u8_id='hls') return { - 'id': uuid, + 'id': video_uuid, 'title': title, 'thumbnails': thumbnails, 'formats': formats, diff --git a/yt_dlp/extractor/patreon.py b/yt_dlp/extractor/patreon.py index 7d6e8439c..4489d533a 100644 --- a/yt_dlp/extractor/patreon.py +++ b/yt_dlp/extractor/patreon.py @@ -420,7 +420,7 @@ def _get_comments(self, post_id): class PatreonCampaignIE(PatreonBaseIE): - _VALID_URL = r'https?://(?:www\.)?patreon\.com/(?!rss)(?:(?:m/(?P\d+))|(?P[-\w]+))' + _VALID_URL = r'https?://(?:www\.)?patreon\.com/(?!rss)(?:(?:m|api/campaigns)/(?P\d+)|(?P[-\w]+))' _TESTS = [{ 'url': 'https://www.patreon.com/dissonancepod/', 'info_dict': { @@ -442,25 +442,44 @@ class PatreonCampaignIE(PatreonBaseIE): 'url': 'https://www.patreon.com/m/4767637/posts', 'info_dict': { 'title': 'Not Just Bikes', - 'channel_follower_count': int, 'id': '4767637', 'channel_id': '4767637', 'channel_url': 'https://www.patreon.com/notjustbikes', - 'description': 'md5:595c6e7dca76ae615b1d38c298a287a1', + 'description': 'md5:9f4b70051216c4d5c58afe580ffc8d0f', 'age_limit': 0, 'channel': 'Not Just Bikes', 'uploader_url': 'https://www.patreon.com/notjustbikes', - 'uploader': 'Not Just Bikes', + 'uploader': 'Jason', 'uploader_id': '37306634', 'thumbnail': r're:^https?://.*$', }, 'playlist_mincount': 71, + }, { + 'url': 'https://www.patreon.com/api/campaigns/4243769/posts', + 'info_dict': { + 'title': 'Second Thought', + 'channel_follower_count': int, + 'id': '4243769', + 'channel_id': '4243769', + 'channel_url': 'https://www.patreon.com/secondthought', + 'description': 'md5:69c89a3aba43efdb76e85eb023e8de8b', + 'age_limit': 0, + 'channel': 'Second Thought', + 'uploader_url': 'https://www.patreon.com/secondthought', + 'uploader': 'JT Chapman', + 'uploader_id': '32718287', + 'thumbnail': r're:^https?://.*$', + }, + 'playlist_mincount': 201, }, { 'url': 'https://www.patreon.com/dissonancepod/posts', 'only_matching': True, }, { 'url': 'https://www.patreon.com/m/5932659', 'only_matching': True, + }, { + 'url': 'https://www.patreon.com/api/campaigns/4243769', + 'only_matching': True, }] @classmethod diff --git a/yt_dlp/extractor/picarto.py b/yt_dlp/extractor/picarto.py index 726fe4142..72e89c31e 100644 --- a/yt_dlp/extractor/picarto.py +++ b/yt_dlp/extractor/picarto.py @@ -5,6 +5,7 @@ ExtractorError, str_or_none, traverse_obj, + update_url, ) @@ -43,15 +44,16 @@ def _real_extract(self, url): url } }''' % (channel_id, channel_id), # noqa: UP031 - })['data'] + }, headers={'Accept': '*/*', 'Content-Type': 'application/json'})['data'] metadata = data['channel'] if metadata.get('online') == 0: raise ExtractorError('Stream is offline', expected=True) title = metadata['title'] - cdn_data = self._download_json( - data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js', + cdn_data = self._download_json(''.join(( + update_url(data['getLoadBalancerUrl']['url'], scheme='https'), + '/stream/json_', metadata['stream_name'], '.js')), channel_id, 'Downloading load balancing info') formats = [] @@ -99,10 +101,10 @@ class PicartoVodIE(InfoExtractor): }, 'skip': 'The VOD does not exist', }, { - 'url': 'https://picarto.tv/ArtofZod/videos/772650', - 'md5': '00067a0889f1f6869cc512e3e79c521b', + 'url': 'https://picarto.tv/ArtofZod/videos/771008', + 'md5': 'abef5322f2700d967720c4c6754b2a34', 'info_dict': { - 'id': '772650', + 'id': '771008', 'ext': 'mp4', 'title': 'Art of Zod - Drawing and Painting', 'thumbnail': r're:^https?://.*\.jpg', @@ -131,7 +133,7 @@ def _real_extract(self, url): }} }} }}''', - })['data']['video'] + }, headers={'Accept': '*/*', 'Content-Type': 'application/json'})['data']['video'] file_name = data['file_name'] netloc = urllib.parse.urlparse(data['video_recording_image_url']).netloc diff --git a/yt_dlp/extractor/soundcloud.py b/yt_dlp/extractor/soundcloud.py index 0c6f0b070..4f8d96407 100644 --- a/yt_dlp/extractor/soundcloud.py +++ b/yt_dlp/extractor/soundcloud.py @@ -314,23 +314,11 @@ def add_format(f, protocol, is_preview=False): self.write_debug(f'"{identifier}" is not a requested format, skipping') continue - stream = None - for retry in self.RetryManager(fatal=False): - try: - stream = self._call_api( - format_url, track_id, f'Downloading {identifier} format info JSON', - query=query, headers=self._HEADERS) - except ExtractorError as e: - if isinstance(e.cause, HTTPError) and e.cause.status == 429: - self.report_warning( - 'You have reached the API rate limit, which is ~600 requests per ' - '10 minutes. Use the --extractor-retries and --retry-sleep options ' - 'to configure an appropriate retry count and wait time', only_once=True) - retry.error = e.cause - else: - self.report_warning(e.msg) + # XXX: if not extract_flat, 429 error must be caught where _extract_info_dict is called + stream_url = traverse_obj(self._call_api( + format_url, track_id, f'Downloading {identifier} format info JSON', + query=query, headers=self._HEADERS), ('url', {url_or_none})) - stream_url = traverse_obj(stream, ('url', {url_or_none})) if invalid_url(stream_url): continue format_urls.add(stream_url) @@ -647,7 +635,17 @@ def _real_extract(self, url): info = self._call_api( info_json_url, full_title, 'Downloading info JSON', query=query, headers=self._HEADERS) - return self._extract_info_dict(info, full_title, token) + for retry in self.RetryManager(): + try: + return self._extract_info_dict(info, full_title, token) + except ExtractorError as e: + if not isinstance(e.cause, HTTPError) or not e.cause.status == 429: + raise + self.report_warning( + 'You have reached the API rate limit, which is ~600 requests per ' + '10 minutes. Use the --extractor-retries and --retry-sleep options ' + 'to configure an appropriate retry count and wait time', only_once=True) + retry.error = e.cause class SoundcloudPlaylistBaseIE(SoundcloudBaseIE): @@ -873,7 +871,7 @@ class SoundcloudUserPermalinkIE(SoundcloudPagedPlaylistBaseIE): 'id': '30909869', 'title': 'neilcic', }, - 'playlist_mincount': 23, + 'playlist_mincount': 22, }] def _real_extract(self, url): @@ -882,7 +880,7 @@ def _real_extract(self, url): self._resolv_url(url), user_id, 'Downloading user info', headers=self._HEADERS) return self._extract_playlist( - f'{self._API_V2_BASE}stream/users/{user["id"]}', str(user['id']), user.get('username')) + f'{self._API_V2_BASE}users/{user["id"]}/tracks', str(user['id']), user.get('username')) class SoundcloudTrackStationIE(SoundcloudPagedPlaylistBaseIE): diff --git a/yt_dlp/extractor/swearnet.py b/yt_dlp/extractor/swearnet.py index b4835c5ad..2d6fb3eb4 100644 --- a/yt_dlp/extractor/swearnet.py +++ b/yt_dlp/extractor/swearnet.py @@ -1,55 +1,31 @@ -from .common import InfoExtractor -from ..utils import ExtractorError, int_or_none, traverse_obj +from .vidyard import VidyardBaseIE +from ..utils import ExtractorError, int_or_none, make_archive_id -class SwearnetEpisodeIE(InfoExtractor): +class SwearnetEpisodeIE(VidyardBaseIE): _VALID_URL = r'https?://www\.swearnet\.com/shows/(?P[\w-]+)/seasons/(?P\d+)/episodes/(?P\d+)' _TESTS = [{ 'url': 'https://www.swearnet.com/shows/gettin-learnt-with-ricky/seasons/1/episodes/1', 'info_dict': { - 'id': '232819', + 'id': 'wicK2EOzjOdxkUXGDIgcPw', + 'display_id': '232819', 'ext': 'mp4', 'episode_number': 1, 'episode': 'Episode 1', 'duration': 719, - 'description': 'md5:c48ef71440ce466284c07085cd7bd761', + 'description': r're:Are you drunk and high and craving a grilled cheese sandwich.+', 'season': 'Season 1', 'title': 'Episode 1 - Grilled Cheese Sammich', 'season_number': 1, - 'thumbnail': 'https://cdn.vidyard.com/thumbnails/232819/_RX04IKIq60a2V6rIRqq_Q_small.jpg', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/custom/0dd74f9b-388a-452e-b570-b407fb64435b_small.jpg', + 'tags': ['Getting Learnt with Ricky', 'drunk', 'grilled cheese', 'high'], + '_old_archive_ids': ['swearnetepisode 232819'], }, }] - def _get_formats_and_subtitle(self, video_source, video_id): - video_source = video_source or {} - formats, subtitles = [], {} - for key, value in video_source.items(): - if key == 'hls': - for video_hls in value: - fmts, subs = self._extract_m3u8_formats_and_subtitles(video_hls.get('url'), video_id) - formats.extend(fmts) - self._merge_subtitles(subs, target=subtitles) - else: - formats.extend({ - 'url': video_mp4.get('url'), - 'ext': 'mp4', - } for video_mp4 in value) - - return formats, subtitles - - def _get_direct_subtitle(self, caption_json): - subs = {} - for caption in caption_json: - subs.setdefault(caption.get('language') or 'und', []).append({ - 'url': caption.get('vttUrl'), - 'name': caption.get('name'), - }) - - return subs - def _real_extract(self, url): - display_id, season_number, episode_number = self._match_valid_url(url).group('id', 'season_num', 'episode_num') - webpage = self._download_webpage(url, display_id) + slug, season_number, episode_number = self._match_valid_url(url).group('id', 'season_num', 'episode_num') + webpage = self._download_webpage(url, slug) try: external_id = self._search_regex(r'externalid\s*=\s*"([^"]+)', webpage, 'externalid') @@ -58,22 +34,12 @@ def _real_extract(self, url): self.raise_login_required() raise - json_data = self._download_json( - f'https://play.vidyard.com/player/{external_id}.json', display_id)['payload']['chapters'][0] - - formats, subtitles = self._get_formats_and_subtitle(json_data['sources'], display_id) - self._merge_subtitles(self._get_direct_subtitle(json_data.get('captions')), target=subtitles) + info = self._process_video_json(self._fetch_video_json(external_id)['chapters'][0], external_id) + if info.get('display_id'): + info['_old_archive_ids'] = [make_archive_id(self, info['display_id'])] return { - 'id': str(json_data['videoId']), - 'title': json_data.get('name') or self._html_search_meta(['og:title', 'twitter:title'], webpage), - 'description': (json_data.get('description') - or self._html_search_meta(['og:description', 'twitter:description'], webpage)), - 'duration': int_or_none(json_data.get('seconds')), - 'formats': formats, - 'subtitles': subtitles, + **info, 'season_number': int_or_none(season_number), 'episode_number': int_or_none(episode_number), - 'thumbnails': [{'url': thumbnail_url} - for thumbnail_url in traverse_obj(json_data, ('thumbnailUrls', ...))], } diff --git a/yt_dlp/extractor/tiktok.py b/yt_dlp/extractor/tiktok.py index c3505b14f..9d823a315 100644 --- a/yt_dlp/extractor/tiktok.py +++ b/yt_dlp/extractor/tiktok.py @@ -23,7 +23,6 @@ mimetype2ext, parse_qs, qualities, - remove_start, srt_subtitles_timecode, str_or_none, traverse_obj, @@ -254,7 +253,16 @@ def _extract_web_data_and_status(self, url, video_id, fatal=True): def _get_subtitles(self, aweme_detail, aweme_id, user_name): # TODO: Extract text positioning info + + EXT_MAP = { # From lowest to highest preference + 'creator_caption': 'json', + 'srt': 'srt', + 'webvtt': 'vtt', + } + preference = qualities(tuple(EXT_MAP.values())) + subtitles = {} + # aweme/detail endpoint subs captions_info = traverse_obj( aweme_detail, ('interaction_stickers', ..., 'auto_video_caption_info', 'auto_captions', ...), expected_type=dict) @@ -278,8 +286,8 @@ def _get_subtitles(self, aweme_detail, aweme_id, user_name): if not caption.get('url'): continue subtitles.setdefault(caption.get('lang') or 'en', []).append({ - 'ext': remove_start(caption.get('caption_format'), 'web'), 'url': caption['url'], + 'ext': EXT_MAP.get(caption.get('Format')), }) # webpage subs if not subtitles: @@ -288,9 +296,14 @@ def _get_subtitles(self, aweme_detail, aweme_id, user_name): self._create_url(user_name, aweme_id), aweme_id, fatal=False) for caption in traverse_obj(aweme_detail, ('video', 'subtitleInfos', lambda _, v: v['Url'])): subtitles.setdefault(caption.get('LanguageCodeName') or 'en', []).append({ - 'ext': remove_start(caption.get('Format'), 'web'), 'url': caption['Url'], + 'ext': EXT_MAP.get(caption.get('Format')), }) + + # Deprioritize creator_caption json since it can't be embedded or used by media players + for lang, subs_list in subtitles.items(): + subtitles[lang] = sorted(subs_list, key=lambda x: preference(x['ext'])) + return subtitles def _parse_url_key(self, url_key): @@ -1458,9 +1471,11 @@ def _real_extract(self, url): if webpage: data = self._get_sigi_state(webpage, uploader or room_id) - room_id = (traverse_obj(data, ('UserModule', 'users', ..., 'roomId', {str_or_none}), get_all=False) - or self._search_regex(r'snssdk\d*://live\?room_id=(\d+)', webpage, 'room ID', default=None) - or room_id) + room_id = ( + traverse_obj(data, (( + ('LiveRoom', 'liveRoomUserInfo', 'user'), + ('UserModule', 'users', ...)), 'roomId', {str}, any)) + or self._search_regex(r'snssdk\d*://live\?room_id=(\d+)', webpage, 'room ID', default=room_id)) uploader = uploader or traverse_obj( data, ('LiveRoom', 'liveRoomUserInfo', 'user', 'uniqueId'), ('UserModule', 'users', ..., 'uniqueId'), get_all=False, expected_type=str) diff --git a/yt_dlp/extractor/toggle.py b/yt_dlp/extractor/toggle.py index de2e03f17..fbef7cc0f 100644 --- a/yt_dlp/extractor/toggle.py +++ b/yt_dlp/extractor/toggle.py @@ -28,35 +28,11 @@ class ToggleIE(InfoExtractor): 'skip_download': 'm3u8 download', }, }, { - 'note': 'DRM-protected video', 'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413', - 'info_dict': { - 'id': '341413', - 'ext': 'wvm', - 'title': 'Dug\'s Special Mission', - 'description': 'md5:e86c6f4458214905c1772398fabc93e0', - 'upload_date': '20150827', - 'timestamp': 1440644006, - }, - 'params': { - 'skip_download': 'DRM-protected wvm download', - }, + 'only_matching': True, }, { - # this also tests correct video id extraction - 'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay', 'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861', - 'info_dict': { - 'id': '332861', - 'ext': 'mp4', - 'title': '28th SEA Games (5 Show) - Episode 11', - 'description': 'md5:3cd4f5f56c7c3b1340c50a863f896faa', - 'upload_date': '20150605', - 'timestamp': 1433480166, - }, - 'params': { - 'skip_download': 'DRM-protected wvm download', - }, - 'skip': 'm3u8 links are geo-restricted', + 'only_matching': True, }, { 'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331', 'only_matching': True, diff --git a/yt_dlp/extractor/tv5mondeplus.py b/yt_dlp/extractor/tv5mondeplus.py index 52ff230f2..953eb77ed 100644 --- a/yt_dlp/extractor/tv5mondeplus.py +++ b/yt_dlp/extractor/tv5mondeplus.py @@ -96,7 +96,7 @@ def _extract_subtitles(data_captions): def _real_extract(self, url): display_id = self._match_id(url) - webpage = self._download_webpage(url, display_id) + webpage = self._download_webpage(url, display_id, impersonate=True) if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage: self.raise_geo_restricted(countries=['FR']) @@ -122,8 +122,9 @@ def process_video_files(v): if not token: continue deferred_json = self._download_json( - f'https://api.tv5monde.com/player/asset/{d_param}/resolve?condenseKS=true', display_id, - note='Downloading deferred info', headers={'Authorization': f'Bearer {token}'}, fatal=False) + f'https://api.tv5monde.com/player/asset/{d_param}/resolve?condenseKS=true', + display_id, 'Downloading deferred info', fatal=False, impersonate=True, + headers={'Authorization': f'Bearer {token}'}) v_url = traverse_obj(deferred_json, (0, 'url', {url_or_none})) if not v_url: continue diff --git a/yt_dlp/extractor/tva.py b/yt_dlp/extractor/tva.py index e3e10557c..d702640f3 100644 --- a/yt_dlp/extractor/tva.py +++ b/yt_dlp/extractor/tva.py @@ -1,60 +1,29 @@ import functools import re +from .brightcove import BrightcoveNewIE from .common import InfoExtractor from ..utils import float_or_none, int_or_none, smuggle_url, strip_or_none from ..utils.traversal import traverse_obj class TVAIE(InfoExtractor): - _VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P\d+)' + IE_NAME = 'tvaplus' + IE_DESC = 'TVA+' + _VALID_URL = r'https?://(?:www\.)?tvaplus\.ca/(?:[^/?#]+/)*[\w-]+-(?P\d+)(?:$|[#?])' _TESTS = [{ - 'url': 'https://videos.tva.ca/details/_5596811470001', - 'info_dict': { - 'id': '5596811470001', - 'ext': 'mp4', - 'title': 'Un extrait de l\'épisode du dimanche 8 octobre 2017 !', - 'uploader_id': '5481942443001', - 'upload_date': '20171003', - 'timestamp': 1507064617, - }, - 'params': { - # m3u8 download - 'skip_download': True, - }, - 'skip': 'HTTP Error 404: Not Found', - }, { - 'url': 'https://video.tva.ca/details/_5596811470001', - 'only_matching': True, - }] - BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s' - - def _real_extract(self, url): - video_id = self._match_id(url) - - return { - '_type': 'url_transparent', - 'id': video_id, - 'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {'geo_countries': ['CA']}), - 'ie_key': 'BrightcoveNew', - } - - -class QubIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?qub\.ca/(?:[^/]+/)*[0-9a-z-]+-(?P\d+)' - _TESTS = [{ - 'url': 'https://www.qub.ca/tvaplus/tva/alerte-amber/saison-1/episode-01-1000036619', + 'url': 'https://www.tvaplus.ca/tva/alerte-amber/saison-1/episode-01-1000036619', 'md5': '949490fd0e7aee11d0543777611fbd53', 'info_dict': { 'id': '6084352463001', 'ext': 'mp4', - 'title': 'Ép 01. Mon dernier jour', + 'title': 'Mon dernier jour', 'uploader_id': '5481942443001', 'upload_date': '20190907', 'timestamp': 1567899756, 'description': 'md5:9c0d7fbb90939420c651fd977df90145', 'thumbnail': r're:https://.+\.jpg', - 'episode': 'Ép 01. Mon dernier jour', + 'episode': 'Mon dernier jour', 'episode_number': 1, 'tags': ['alerte amber', 'alerte amber saison 1', 'surdemande'], 'duration': 2625.963, @@ -64,23 +33,36 @@ class QubIE(InfoExtractor): 'channel': 'TVA', }, }, { - 'url': 'https://www.qub.ca/tele/video/lcn-ca-vous-regarde-rev-30s-ap369664-1009357943', - 'only_matching': True, + 'url': 'https://www.tvaplus.ca/tva/le-baiser-du-barbu/le-baiser-du-barbu-886644190', + 'info_dict': { + 'id': '6354448043112', + 'ext': 'mp4', + 'title': 'Le Baiser du barbu', + 'uploader_id': '5481942443001', + 'upload_date': '20240606', + 'timestamp': 1717694023, + 'description': 'md5:025b1219086c1cbf4bc27e4e034e8b57', + 'thumbnail': r're:https://.+\.jpg', + 'episode': 'Le Baiser du barbu', + 'tags': ['fullepisode', 'films'], + 'duration': 6053.504, + 'series': 'Le Baiser du barbu', + 'channel': 'TVA', + }, }] - # reference_id also works with old account_id(5481942443001) - # BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5813221784001/default_default/index.html?videoId=ref:%s' + _BC_URL_TMPL = 'https://players.brightcove.net/5481942443001/default_default/index.html?videoId={}' def _real_extract(self, url): entity_id = self._match_id(url) webpage = self._download_webpage(url, entity_id) - entity = self._search_nextjs_data(webpage, entity_id)['props']['initialProps']['pageProps']['fallbackData'] + entity = self._search_nextjs_data(webpage, entity_id)['props']['pageProps']['staticEntity'] video_id = entity['videoId'] episode = strip_or_none(entity.get('name')) return { '_type': 'url_transparent', - 'url': f'https://videos.tva.ca/details/_{video_id}', - 'ie_key': TVAIE.ie_key(), + 'url': smuggle_url(self._BC_URL_TMPL.format(video_id), {'geo_countries': ['CA']}), + 'ie_key': BrightcoveNewIE.ie_key(), 'id': video_id, 'title': episode, 'episode': episode, diff --git a/yt_dlp/extractor/tver.py b/yt_dlp/extractor/tver.py index 8105db41c..c13832c6f 100644 --- a/yt_dlp/extractor/tver.py +++ b/yt_dlp/extractor/tver.py @@ -10,7 +10,7 @@ class TVerIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?Plp|corner|series|episodes?|feature|tokyo2020/video)/)+(?P[a-zA-Z0-9]+)' + _VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?Plp|corner|series|episodes?|feature|tokyo2020/video|olympic/paris2024/video)/)+(?P[a-zA-Z0-9]+)' _TESTS = [{ 'skip': 'videos are only available for 7 days', 'url': 'https://tver.jp/episodes/ep83nf3w4p', @@ -23,6 +23,20 @@ class TVerIE(InfoExtractor): 'channel': 'テレビ朝日', }, 'add_ie': ['BrightcoveNew'], + }, { + 'url': 'https://tver.jp/olympic/paris2024/video/6359578055112/', + 'info_dict': { + 'id': '6359578055112', + 'ext': 'mp4', + 'title': '堀米雄斗 金メダルで五輪連覇!「みんなの応援が最後に乗れたカギ」', + 'timestamp': 1722279928, + 'upload_date': '20240729', + 'tags': ['20240729', 'japanese', 'japanmedal', 'paris'], + 'uploader_id': '4774017240001', + 'thumbnail': r're:https?://[^/?#]+boltdns\.net/[^?#]+/1920x1080/match/image\.jpg', + 'duration': 670.571, + }, + 'params': {'skip_download': 'm3u8'}, }, { 'url': 'https://tver.jp/corner/f0103888', 'only_matching': True, @@ -47,7 +61,15 @@ def _real_initialize(self): def _real_extract(self, url): video_id, video_type = self._match_valid_url(url).group('id', 'type') - if video_type not in {'series', 'episodes'}: + + if video_type == 'olympic/paris2024/video': + # Player ID is taken from .content.brightcove.E200.pro.pc.account_id: + # https://tver.jp/olympic/paris2024/req/api/hook?q=https%3A%2F%2Folympic-assets.tver.jp%2Fweb-static%2Fjson%2Fconfig.json&d= + return self.url_result(smuggle_url( + self.BRIGHTCOVE_URL_TEMPLATE % ('4774017240001', video_id), + {'geo_countries': ['JP']}), 'BrightcoveNew') + + elif video_type not in {'series', 'episodes'}: webpage = self._download_webpage(url, video_id, note='Resolving to new URL') video_id = self._match_id(self._search_regex( (r'canonical"\s*href="(https?://tver\.jp/[^"]+)"', r'&link=(https?://tver\.jp/[^?&]+)[?&]'), diff --git a/yt_dlp/extractor/unsupported.py b/yt_dlp/extractor/unsupported.py index 1e2d118aa..8b7ec1dd9 100644 --- a/yt_dlp/extractor/unsupported.py +++ b/yt_dlp/extractor/unsupported.py @@ -49,6 +49,7 @@ class KnownDRMIE(UnsupportedInfoExtractor): r'amazon\.(?:\w{2}\.)?\w+/gp/video', r'music\.amazon\.(?:\w{2}\.)?\w+', r'(?:watch|front)\.njpwworld\.com', + r'qub\.ca/vrai', ) _TESTS = [{ @@ -149,6 +150,9 @@ class KnownDRMIE(UnsupportedInfoExtractor): }, { 'url': 'https://front.njpwworld.com/p/s_series_00563_16_bs', 'only_matching': True, + }, { + 'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063', + 'only_matching': True, }] def _real_extract(self, url): diff --git a/yt_dlp/extractor/vidyard.py b/yt_dlp/extractor/vidyard.py new file mode 100644 index 000000000..20a54b161 --- /dev/null +++ b/yt_dlp/extractor/vidyard.py @@ -0,0 +1,426 @@ +import functools +import re + +from .common import InfoExtractor +from ..utils import ( + extract_attributes, + float_or_none, + int_or_none, + join_nonempty, + mimetype2ext, + parse_resolution, + str_or_none, + unescapeHTML, + url_or_none, +) +from ..utils.traversal import traverse_obj + + +class VidyardBaseIE(InfoExtractor): + _HEADERS = {'Referer': 'https://play.vidyard.com/'} + + def _get_formats_and_subtitles(self, sources, video_id): + formats, subtitles = [], {} + + def add_hls_fmts_and_subs(m3u8_url): + fmts, subs = self._extract_m3u8_formats_and_subtitles( + m3u8_url, video_id, 'mp4', m3u8_id='hls', headers=self._HEADERS, fatal=False) + formats.extend(fmts) + self._merge_subtitles(subs, target=subtitles) + + hls_list = isinstance(sources, dict) and sources.pop('hls', None) + if master_m3u8_url := traverse_obj( + hls_list, (lambda _, v: v['profile'] == 'auto', 'url', {url_or_none}, any)): + add_hls_fmts_and_subs(master_m3u8_url) + if not formats: # These are duplicate and unnecesary requests if we got 'auto' hls fmts + for variant_m3u8_url in traverse_obj(hls_list, (..., 'url', {url_or_none})): + add_hls_fmts_and_subs(variant_m3u8_url) + + for source_type, source_list in traverse_obj(sources, ({dict.items}, ...)): + for source in traverse_obj(source_list, lambda _, v: url_or_none(v['url'])): + profile = source.get('profile') + formats.append({ + 'url': source['url'], + 'ext': mimetype2ext(source.get('mimeType'), default=None), + 'format_id': join_nonempty('http', source_type, profile), + **parse_resolution(profile), + }) + + self._remove_duplicate_formats(formats) + return formats, subtitles + + def _get_direct_subtitles(self, caption_json): + subs = {} + for caption in traverse_obj(caption_json, lambda _, v: url_or_none(v['vttUrl'])): + subs.setdefault(caption.get('language') or 'und', []).append({ + 'url': caption['vttUrl'], + 'name': caption.get('name'), + }) + + return subs + + def _fetch_video_json(self, video_id): + return self._download_json( + f'https://play.vidyard.com/player/{video_id}.json', video_id)['payload'] + + def _process_video_json(self, json_data, video_id): + formats, subtitles = self._get_formats_and_subtitles(json_data['sources'], video_id) + self._merge_subtitles(self._get_direct_subtitles(json_data.get('captions')), target=subtitles) + + return { + **traverse_obj(json_data, { + 'id': ('facadeUuid', {str}), + 'display_id': ('videoId', {int}, {str_or_none}), + 'title': ('name', {str}), + 'description': ('description', {str}, {unescapeHTML}, {lambda x: x or None}), + 'duration': (( + ('milliseconds', {functools.partial(float_or_none, scale=1000)}), + ('seconds', {int_or_none})), any), + 'thumbnails': ('thumbnailUrls', ('small', 'normal'), {'url': {url_or_none}}), + 'tags': ('tags', ..., 'name', {str}), + }), + 'formats': formats, + 'subtitles': subtitles, + 'http_headers': self._HEADERS, + } + + +class VidyardIE(VidyardBaseIE): + _VALID_URL = [ + r'https?://[\w-]+(?:\.hubs)?\.vidyard\.com/watch/(?P[\w-]+)', + r'https?://(?:embed|share)\.vidyard\.com/share/(?P[\w-]+)', + r'https?://play\.vidyard\.com/(?:player/)?(?P[\w-]+)', + ] + _EMBED_REGEX = [r']* src=["\'](?P(?:https?:)?//play\.vidyard\.com/[\w-]+)'] + _TESTS = [{ + 'url': 'https://vyexample03.hubs.vidyard.com/watch/oTDMPlUv--51Th455G5u7Q', + 'info_dict': { + 'id': 'oTDMPlUv--51Th455G5u7Q', + 'display_id': '50347', + 'ext': 'mp4', + 'title': 'Homepage Video', + 'description': 'Look I changed the description.', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/50347/OUPa5LTKV46849sLYngMqQ_small.jpg', + 'duration': 99, + 'tags': ['these', 'are', 'all', 'tags'], + }, + }, { + 'url': 'https://share.vidyard.com/watch/PaQzDAT1h8JqB8ivEu2j6Y?', + 'info_dict': { + 'id': 'PaQzDAT1h8JqB8ivEu2j6Y', + 'display_id': '9281024', + 'ext': 'mp4', + 'title': 'Inline Embed', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/spacer.gif', + 'duration': 41.186, + }, + }, { + 'url': 'https://embed.vidyard.com/share/oTDMPlUv--51Th455G5u7Q', + 'info_dict': { + 'id': 'oTDMPlUv--51Th455G5u7Q', + 'display_id': '50347', + 'ext': 'mp4', + 'title': 'Homepage Video', + 'description': 'Look I changed the description.', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/50347/OUPa5LTKV46849sLYngMqQ_small.jpg', + 'duration': 99, + 'tags': ['these', 'are', 'all', 'tags'], + }, + }, { + # First video from playlist below + 'url': 'https://embed.vidyard.com/share/SyStyHtYujcBHe5PkZc5DL', + 'info_dict': { + 'id': 'SyStyHtYujcBHe5PkZc5DL', + 'display_id': '41974005', + 'ext': 'mp4', + 'title': 'Prepare the Frame and Track for Palm Beach Polysatin Shutters With BiFold Track', + 'description': r're:In this video, you will learn how to prepare the frame.+', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/41974005/IJw7oCaJcF1h7WWu3OVZ8A_small.png', + 'duration': 258.666, + }, + }, { + # Playlist + 'url': 'https://thelink.hubs.vidyard.com/watch/pwu7pCYWSwAnPxs8nDoFrE', + 'info_dict': { + 'id': 'pwu7pCYWSwAnPxs8nDoFrE', + 'title': 'PLAYLIST - Palm Beach Shutters- Bi-Fold Track System Installation', + 'entries': [{ + 'id': 'SyStyHtYujcBHe5PkZc5DL', + 'display_id': '41974005', + 'ext': 'mp4', + 'title': 'Prepare the Frame and Track for Palm Beach Polysatin Shutters With BiFold Track', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/41974005/IJw7oCaJcF1h7WWu3OVZ8A_small.png', + 'duration': 258.666, + }, { + 'id': '1Fw4B84jZTXLXWqkE71RiM', + 'display_id': '5861113', + 'ext': 'mp4', + 'title': 'Palm Beach - Bi-Fold Track System "Frame Installation"', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/5861113/29CJ54s5g1_aP38zkKLHew_small.jpg', + 'duration': 167.858, + }, { + 'id': 'DqP3wBvLXSpxrcqpT5kEeo', + 'display_id': '41976334', + 'ext': 'mp4', + 'title': 'Install the Track for Palm Beach Polysatin Shutters With BiFold Track', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/5861090/RwG2VaTylUa6KhSTED1r1Q_small.png', + 'duration': 94.229, + }, { + 'id': 'opfybfxpzQArxqtQYB6oBU', + 'display_id': '41976364', + 'ext': 'mp4', + 'title': 'Install the Panel for Palm Beach Polysatin Shutters With BiFold Track', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/5860926/JIOaJR08dM4QgXi_iQ2zGA_small.png', + 'duration': 191.467, + }, { + 'id': 'rWrXvkbTNNaNqD6189HJya', + 'display_id': '41976382', + 'ext': 'mp4', + 'title': 'Adjust the Panels for Palm Beach Polysatin Shutters With BiFold Track', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/5860687/CwHxBv4UudAhOh43FVB4tw_small.png', + 'duration': 138.155, + }, { + 'id': 'eYPTB521MZ9TPEArSethQ5', + 'display_id': '41976409', + 'ext': 'mp4', + 'title': 'Assemble and Install the Valance for Palm Beach Polysatin Shutters With BiFold Track', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/5861425/0y68qlMU4O5VKU7bJ8i_AA_small.png', + 'duration': 148.224, + }], + }, + 'playlist_count': 6, + }, { + # Non hubs.vidyard.com playlist + 'url': 'https://salesforce.vidyard.com/watch/d4vqPjs7Q5EzVEis5QT3jd', + 'info_dict': { + 'id': 'd4vqPjs7Q5EzVEis5QT3jd', + 'title': 'How To: Service Cloud: Import External Content in Lightning Knowledge', + 'entries': [{ + 'id': 'mcjDpSZir2iSttbvFkx6Rv', + 'display_id': '29479036', + 'ext': 'mp4', + 'title': 'Welcome to this Expert Coaching Series', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/ouyQi9WuwyiOupChUWNmjQ/7170d3485ba602e012df05_small.jpg', + 'duration': 38.205, + }, { + 'id': '84bPYwpg243G6xYEfJdYw9', + 'display_id': '21820704', + 'ext': 'mp4', + 'title': 'Chapter 1 - Title + Agenda', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/HFPN0ZgQq4Ow8BghGcQSow/bfaa30123c8f6601e7d7f2_small.jpg', + 'duration': 98.016, + }, { + 'id': 'nP17fMuvA66buVHUrzqjTi', + 'display_id': '21820707', + 'ext': 'mp4', + 'title': 'Chapter 2 - Import Options', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/rGRIF5nFjPI9OOA2qJ_Dbg/86a8d02bfec9a566845dd4_small.jpg', + 'duration': 199.136, + }, { + 'id': 'm54EcwXdpA5gDBH5rgCYoV', + 'display_id': '21820710', + 'ext': 'mp4', + 'title': 'Chapter 3 - Importing Article Translations', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/IVX4XR8zpSsiNIHx45kz-A/1ccbf8a29a33856d06b3ed_small.jpg', + 'duration': 184.352, + }, { + 'id': 'j4nzS42oq4hE9oRV73w3eQ', + 'display_id': '21820716', + 'ext': 'mp4', + 'title': 'Chapter 4 - Best Practices', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/BtrRrQpRDLbA4AT95YQyog/1f1e6b8e7fdc3fa95ec8d3_small.jpg', + 'duration': 296.960, + }, { + 'id': 'y28PYfW5pftvers9PXzisC', + 'display_id': '21820727', + 'ext': 'mp4', + 'title': 'Chapter 5 - Migration Steps', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/K2CdQOXDfLcrVTF60r0bdw/a09239ada28b6ffce12b1f_small.jpg', + 'duration': 620.640, + }, { + 'id': 'YWU1eQxYvhj29SjYoPw5jH', + 'display_id': '21820733', + 'ext': 'mp4', + 'title': 'Chapter 6 - Demo', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/rsmhP-cO8dAa8ilvFGCX0g/7911ef415167cd14032068_small.jpg', + 'duration': 631.456, + }, { + 'id': 'nmEvVqpwdJUgb74zKsLGxn', + 'display_id': '29479037', + 'ext': 'mp4', + 'title': 'Schedule Your Follow-Up', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/Rtwc7X4PEkF4Ae5kHi-Jvw/174ebed3f34227b1ffa1d0_small.jpg', + 'duration': 33.608, + }], + }, + 'playlist_count': 8, + }, { + # URL of iframe embed src + 'url': 'https://play.vidyard.com/iDqTwWGrd36vaLuaCY3nTs.html', + 'info_dict': { + 'id': 'iDqTwWGrd36vaLuaCY3nTs', + 'display_id': '9281009', + 'ext': 'mp4', + 'title': 'Lightbox Embed', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/spacer.gif', + 'duration': 39.035, + }, + }, { + # Player JSON URL + 'url': 'https://play.vidyard.com/player/7GAApnNNbcZZ46k6JqJQSh.json?disable_analytics=0', + 'info_dict': { + 'id': '7GAApnNNbcZZ46k6JqJQSh', + 'display_id': '820026', + 'ext': 'mp4', + 'title': 'The Art of Storytelling: How to Deliver Your Brand Story with Content & Social', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/MhbE-5sEFQu4x3fI6FkNlA/41eb5717c557cd19456910_small.jpg', + 'duration': 2153.013, + 'tags': ['Summit2017'], + }, + }, { + 'url': 'http://share.vidyard.com/share/diYeo6YR2yiGgL8odvS8Ri', + 'only_matching': True, + }, { + 'url': 'https://play.vidyard.com/FFlz3ZpxhIfKQ1fd9DAryA', + 'only_matching': True, + }, { + 'url': 'https://play.vidyard.com/qhMAu5A76GZVrFzOPgSf9A/type/standalone', + 'only_matching': True, + }] + _WEBPAGE_TESTS = [{ + # URL containing inline/lightbox embedded video + 'url': 'https://resources.altium.com/p/2-the-extreme-importance-of-pc-board-stack-up', + 'info_dict': { + 'id': 'GDx1oXrFWj4XHbipfoXaMn', + 'display_id': '3225198', + 'ext': 'mp4', + 'title': 'The Extreme Importance of PC Board Stack Up', + 'thumbnail': 'https://cdn.vidyard.com/thumbnails/73_Q3_hBexWX7Og1sae6cg/9998fa4faec921439e2c04_small.jpg', + 'duration': 3422.742, + }, + }, { + #