One HTML-only cron + one everything-but-HTML cron #131

Closed
hugovk opened this issue Sep 12, 2024 · 13 comments
Labels
doc:cpython Related to cpython docs.python.org

Comments

@hugovk
Member

hugovk commented Sep 12, 2024

To help python/docsbuild-scripts#169.

Current situation

Right now the docs server is taking over 40 hours to build a full set of 3.12-3.14 docs, plus 12 translations each:

List of versions/languages
  1. 3.14/zh-tw
  2. 3.14/zh-cn
  3. 3.14/uk
  4. 3.14/tr
  5. 3.14/pt-br
  6. 3.14/pl
  7. 3.14/ko
  8. 3.14/ja
  9. 3.14/it
  10. 3.14/id
  11. 3.14/fr
  12. 3.14/es
  13. 3.14/en
  14. 3.13/zh-tw
  15. 3.13/zh-cn
  16. 3.13/uk
  17. 3.13/tr
  18. 3.13/pt-br
  19. 3.13/pl
  20. 3.13/ko
  21. 3.13/ja
  22. 3.13/it
  23. 3.13/id
  24. 3.13/fr
  25. 3.13/es
  26. 3.13/en
  27. 3.12/zh-tw
  28. 3.12/zh-cn
  29. 3.12/uk
  30. 3.12/tr
  31. 3.12/pt-br
  32. 3.12/pl
  33. 3.12/ko
  34. 3.12/ja
  35. 3.12/it
  36. 3.12/id
  37. 3.12/fr
  38. 3.12/es
  39. 3.12/en

Nearly all of these include HTML, plain text, PDF, Texinfo and EPUB (Ukrainian is HTML only). HTML-only is fast to build, about 3-4 minutes. The full set of artifacts is much slower, taking between 40 minutes and two hours depending on the language, mostly because of running LaTeX for the PDFs.

What happens is:

A cron goes off at 7 minutes past the hour and starts a new full build loop.

  • If there's a build running (= lockfile found), the new one exits and allows the running one to continue.
  • If there's no build running, it creates its own lockfile and starts a new build.
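The lockfile behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in build_docs.py: the function names and the way the lock is released are hypothetical.

```python
import os
from pathlib import Path


def try_acquire_lock(lockfile: Path) -> bool:
    """Atomically create the lockfile; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so the check and the
        # creation happen in a single step.
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def run_build_loop(lockfile: Path, build) -> str:
    """Run build() unless another loop already holds the lock."""
    if not try_acquire_lock(lockfile):
        return "skipped"  # a build is running; exit and let it continue
    try:
        build()
        return "built"
    finally:
        lockfile.unlink()  # release the lock even if the build fails
```

Creating the file with an exclusive flag avoids the race where two cron invocations both see no lockfile and both start building.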

For each language/version, we only do a build if the docs or the translation have changed since last time. This is good: there's no point rebuilding something that hasn't changed.
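As a rough sketch of that change check, assuming the state of the last build is recorded as a pair of commit identifiers (the `BuildState` type and its field names are hypothetical, not taken from build_docs.py):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BuildState:
    """Hypothetical record of what a build was made from."""

    docs_commit: str         # commit of the CPython Doc/ sources
    translation_commit: str  # commit of the translation repo ("" for English)


def rebuild_reason(last: BuildState | None, current: BuildState) -> str | None:
    """Return why a rebuild is needed, or None if nothing changed."""
    if last is None:
        return "no previous build"
    if last.docs_commit != current.docs_commit:
        return "Doc/ has changed"
    if last.translation_commit != current.translation_commit:
        return "new translations"
    return None
```

With a 40-hour loop, `rebuild_reason` would almost never return `None`, which is the problem described next.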

However, because the full loop takes over 40 hours, inevitably there have been docs or translation changes since the last time, and we get a full rebuild each time.

This results in long delays between docs being updated, not to mention the high server resources usage.

HTML vs. PDF

We have download stats for the HTML docs, but we don't have download numbers for the other artifacts to compare.

However, I'm certain the HTML is by far the most used, and there's the most benefit to getting fresh HTML up quickly.

An affordance of websites is being able to look up just the pages you need, on demand. A PDF, by comparison, is something you download once and use as an offline reference. Maybe you'll re-download it later, but there's less benefit in updating often, as the one you usually consult is an old, offline copy.

Proposal

I suggest we have two cron jobs:

  1. The current hourly job only builds HTML.

  2. A new job builds everything else except HTML.

1. HTML only

When there are new changes, they will be built and uploaded much sooner. It will run much quicker.

It's more likely that on the next pass, some languages can be skipped because there's nothing to update this time round.

2. Everything but HTML

This will be much slower than the HTML-only job, and will take about the same as the current loop does now.

Maybe it'll be a bit quicker due to not needing to build HTML, but maybe a bit slower because we'll sometimes be using CPU to build HTML at the same time. However, the majority of the time is spent running a latex command on a single CPU, so it might not make much difference.

We also don't need to update the non-HTML as often, so its cron could be every few days?

@hugovk hugovk added the doc:cpython Related to cpython docs.python.org label Sep 12, 2024
@hugovk
Member Author

hugovk commented Sep 12, 2024

TODO

If we do this, build_docs.py already has --quick to build HTML-only:

https://github.com/python/docsbuild-scripts/blob/56d72d43e5759cc0ed600827b56e81d8310bcaca/build_docs.py#L525-L530

Anything else?

@hugovk
Member Author

hugovk commented Sep 23, 2024

As has been mentioned previously, one downside of this sort of approach is:

  1. New Python release is made
  2. Fast HTML docs built
  3. Reader goes to https://docs.python.org/3/download.html and clicks to download a PDF (or other artifact)
  4. PDFs haven't built yet and user gets 404 for https://docs.python.org/x.y/archives/python-x.y.z-docs-pdf-a4.zip

This has been partially addressed by adding a 404 that says "The archive you're trying to download has not been built yet. Please try again later or consult the archives for earlier versions."


Perhaps we could also mitigate this by renaming the files so they only have x.y in the filename and not x.y.z?

So https://docs.python.org/x.y/archives/python-x.y-docs-pdf-a4.zip instead.

For example, on the day we release 3.13.0, instead of getting a 404 page for the 3.13.0 PDF, they get the 3.13.0rc2 PDF.

I think this is fine. There's usually not much that's changed, and the benefit is everyone gets the HTML and PDF files sooner (all the time, not just for releases).
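The renaming idea can be illustrated with a small sketch. The helper names are hypothetical; only the URL shape comes from the examples above.

```python
import re


def branch_version(full_version: str) -> str:
    """Reduce '3.13.0rc2' or '3.12.6' to the branch version '3.13' / '3.12'."""
    match = re.match(r"(\d+)\.(\d+)", full_version)
    if match is None:
        raise ValueError(f"not a Python version: {full_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def archive_url(full_version: str, kind: str = "pdf-a4", ext: str = "zip") -> str:
    """Build the proposed x.y-only archive URL for any release on that branch."""
    xy = branch_version(full_version)
    return f"https://docs.python.org/{xy}/archives/python-{xy}-docs-{kind}.{ext}"
```

Because `archive_url("3.13.0rc2")` and `archive_url("3.13.0")` give the same URL, a download link published on release day keeps pointing at the most recently built archive rather than at a file that doesn't exist yet.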

@AA-Turner
Member

I seem to remember that the release process includes building the docs + PDF etc, or perhaps it used to. If this is still the case, can part of the release process be to upload a copy of that archive to the docs server as well as the release server?

@ned-deily did this I think for rc2? Is this something we can formalise?

A

@ned-deily
Member

I seem to remember that the release process includes building the docs + PDF etc, or perhaps it used to. If this is still the case, can part of the release process be to upload a copy of that archive to the docs server as well as the release server?

This subject is confusingly complex for various reasons (some due to attempts to provide compatibility with older end-of-life releases) so there's a good chance that some or all of what follows is wrong but, to the best of my knowledge, it works today like this.

The release process currently does produce a quick build of the untranslated docs (HTML, PDF, et al.) for all rc and final releases (but not alpha and beta releases), built from the source release git tag (e.g. v3.13.0rc2). As mostly discussed in PEP 101, the release process saves the unpacked HTML files of the docset for every release on the docs server (under /srv/docs.python.org/release/) and saves the downloadable files to the download server (under /srv/www.python.org/ftp/python/doc/). The archived HTML is linked to from the Python Documentation by Version web page and, importantly, the links in their Download these documents pages, for example Download Python 3.12.6 Documentation, link to these archived downloadable files. Links to these archived versions of the docs may also be used by the RM on the individual release pages, often for the link to the changelog. This is all completely independent of the cron doc builds under discussion here.

At the same time, the cron jobs do their things and (try to) produce their daily/3-hour builds on the docs server that are served under various python.org URLs. The actual file names of the download files do include the version number but are served under the branch-specific directories (/srv/docs.python.org/x.y/archives/), resulting in the URLs mentioned earlier (like https://docs.python.org/x.y/archives/python-x.y.z-docs-pdf-a4.zip). But this does result in a curious set of files in these x.y/archives directories, because the file name changes following the first cron run after the release manager merges the release engineering branch back into the main repo. The effect is that, at some somewhat random time following a release, the download files built by the cron jobs will switch from ...-x.y.z-... to ...-x.y.z+1-... file names (or even to ...-x.y+1.0a0-...) in the same directory. The older files are never deleted, even though their names imprecisely imply their source. For example, 3.12/archives/python-3.12.5-docs-pdf-a4.zip neither reflects the state of the documentation at the time of the 3.12.5 release nor necessarily the final state of the 3.12 docs prior to the 3.12.6 release; it's just what existed in the 3.12 git branch the last time the cron started building (and successfully finished) before the 3.12.6 release was merged into the 3.12 branch.

A partial look at a docs server branch directory shows this:

$ ls 3.12/archive
[...]
-rw-rw-r-- 1 docsbuild docs  8361535 Jul 31 11:06 python-3.12.4-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 13120842 Jul 31 11:07 python-3.12.4-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 17910690 Jul 31 11:17 python-3.12.4-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 17851043 Jul 31 11:17 python-3.12.4-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18067471 Jul 31 11:26 python-3.12.4-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18010439 Jul 31 11:26 python-3.12.4-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7541776 Jul 31 11:30 python-3.12.4-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9727536 Jul 31 11:31 python-3.12.4-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2878769 Jul 31 11:08 python-3.12.4-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  4025748 Jul 31 11:08 python-3.12.4-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6653572 Jul 31 11:27 python-3.12.4-docs.epub
-rw-rw-r-- 1 docsbuild docs  8360373 Sep  7 13:51 python-3.12.5-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 13126773 Sep  7 13:51 python-3.12.5-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 18385513 Sep  7 14:05 python-3.12.5-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18342026 Sep  7 14:05 python-3.12.5-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18559979 Sep  7 14:17 python-3.12.5-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18515957 Sep  7 14:17 python-3.12.5-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7550334 Sep  7 14:22 python-3.12.5-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9733259 Sep  7 14:22 python-3.12.5-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2885597 Sep  7 13:53 python-3.12.5-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  4028513 Sep  7 13:53 python-3.12.5-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6658516 Sep  7 14:19 python-3.12.5-docs.epub
-rw-rw-r-- 1 docsbuild docs  8359767 Sep 21 08:07 python-3.12.6-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 13132489 Sep 21 08:07 python-3.12.6-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 18447737 Sep 21 08:20 python-3.12.6-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18401603 Sep 21 08:20 python-3.12.6-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18613117 Sep 16 03:08 python-3.12.6-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18566788 Sep 16 03:08 python-3.12.6-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7592417 Sep 21 08:25 python-3.12.6-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9777614 Sep 21 08:25 python-3.12.6-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2885648 Sep 21 08:09 python-3.12.6-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  4030381 Sep 21 08:09 python-3.12.6-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6661795 Sep 21 08:22 python-3.12.6-docs.epub
-rw-rw-r-- 1 docsbuild docs  7852747 May 23  2023 python-3.13.0a0-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 12366266 May 23  2023 python-3.13.0a0-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 16804233 May 23  2023 python-3.13.0a0-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 16756843 May 23  2023 python-3.13.0a0-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 16928897 May 23  2023 python-3.13.0a0-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 16882274 May 23  2023 python-3.13.0a0-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7098538 May 23  2023 python-3.13.0a0-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9239149 May 23  2023 python-3.13.0a0-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2811736 May 23  2023 python-3.13.0a0-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  3927852 May 23  2023 python-3.13.0a0-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6715932 May 23  2023 python-3.13.0a0-docs.epub

AFAICT, there is no reason to be keeping the older files in this directory and this would be solved/mitigated, along with the sync problem alluded to above, if the download file names were changed as suggested above by @hugovk. The trick, of course, is to eliminate or minimize any compatibility issues with user expectations and with the separate release archives produced by the RMs.

To the other point:

@ned-deily did this I think for rc2? Is this something we can formalise?

I believe that what I did for 3.13.0rc2 was to add a temporary link on the Python Documentation by Version web page to the rc2 documentation produced by the release process; normally that page does not include links to pre-release versions. Maybe there was something else on the release page, too.

@AA-Turner
Member

See python/cpython#124489 to alter the build process.

We'll need to consider what to do with the existing /archives/ files. It likely doesn't hurt to keep them, even though they are in effect random snapshots of the documentation at some point in time. If we do remove them, we can create redirects in the docs server configuration so that links don't break.

A

@AA-Turner
Member

In terms of splitting the build into HTML and non-HTML, we have a triumvirate of patches:

  1. Add --select-output docsbuild-scripts#199 to allow choosing between HTML or non-HTML outputs in build_docs.py
  2. Restore the HTML docsbuild cron job psf-salt#497 to restore the cron job for HTML-only builds
  3. Doc: Run HTML and non-HTML daily builds separately cpython#124493 to split dist-html from everything else in the autobuild targets that docsbuild-scripts uses.

The first can be merged and the builds will carry on as they are now without any change. The second will start building HTML files twice and could potentially overwrite itself when copying to /srv/... if a full job and HTML-only job finish at the same time. The CPython change should be merged third, and fixes this by splitting the non-HTML and HTML-only builds into disjoint sets.
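To make the "disjoint sets" point concrete, here is a hedged sketch of an output selector. The `--select-output` option name comes from the patch title above, but the choice values (`only-html`, `no-html`) and every target name other than `dist-html` are assumptions for illustration, not confirmed by this thread.

```python
from __future__ import annotations

import argparse

# Illustrative target names; only "dist-html" is named in this thread.
ALL_OUTPUTS = frozenset(
    {"dist-html", "dist-text", "dist-pdf", "dist-texinfo", "dist-epub"}
)
HTML_OUTPUTS = frozenset({"dist-html"})


def outputs_for(selection: str | None) -> frozenset[str]:
    """Map a selection to the set of outputs that job should build."""
    if selection == "only-html":
        return HTML_OUTPUTS
    if selection == "no-html":
        return ALL_OUTPUTS - HTML_OUTPUTS
    return ALL_OUTPUTS  # no selection: build everything, as before


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--select-output",
        choices=("only-html", "no-html"),  # hypothetical choice names
        default=None,
    )
    return parser.parse_args(argv)
```

Since the two selections never share an output, an HTML-only job and a non-HTML job that finish at the same time would not write the same files.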

A

@ned-deily
Member

See python/cpython#124489 to alter the build process.

Thanks, that looks good to me. The only potential issue I can think of is that there might be users/scripts out there that might be periodically expecting to directly download the current built artifacts using the old URL formats. I would guess that is not common and I don't think we've ever provided any guarantees about the URLs other than linking through the download.html pages. ...

We'll need to consider what to do with the existing /archives/ files. It likely doesn't hurt to keep them, even though they are in effect random snapshots of the documentation at some point in time. If we do remove them, we can create redirects in the docs server configuration so that links don't break.

... And the links that would break, besides the above-mentioned possible scripts, would be from previously downloaded copies of the html-format documentation (the only artifact where the download links appear?) and from embedded copies of the HTML documentation that are provided, for example, in the python.org macOS installer. Others? So, if we did add redirects from the x.y.z to the new x.y URLs for releases where we apply this change (and are still building docs), that should solve the problem for all of those cases, I think. If it would make the redirecting easier, we could perhaps do a one-time creation of x.y symlinks for EOL releases.
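The x.y.z to x.y mapping such a redirect would need can be sketched in Python. This only illustrates the path rewrite; the real redirects would live in the docs server configuration, and the pattern below is an assumption based on the file names shown earlier in this thread.

```python
from __future__ import annotations

import re

# Matches old-style paths such as /3.12/archives/python-3.12.6-docs-pdf-a4.zip.
# The version in the file name must start with the directory's x.y.
OLD_ARCHIVE = re.compile(
    r"^/(?P<xy>\d+\.\d+)/archives/python-(?P=xy)\.[0-9a-z+]+-docs(?P<rest>[-.].+)$"
)


def redirect_target(path: str) -> str | None:
    """Return the x.y-only path for an old x.y.z archive path, else None."""
    match = OLD_ARCHIVE.match(path)
    if match is None:
        return None  # already new-style, or not an archive path
    xy = match["xy"]
    return f"/{xy}/archives/python-{xy}-docs{match['rest']}"
```

For example, `redirect_target("/3.12/archives/python-3.12.6-docs-pdf-a4.zip")` gives `/3.12/archives/python-3.12-docs-pdf-a4.zip`, while a path that is already in the new format is left alone.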

@AA-Turner
Member

If we want to add redirects, I've opened python/psf-salt#498 as a draft.

A

@ned-deily
Member

I think this brings up another related issue inspired by the above discussion and a comment in the PR:

By just using x.y we are honest that the download reflects the state of the website rather than any one release -- for that, use https://docs.python.org/release/.

That is, like the file names and corresponding URLs, the Python version displayed in the daily document HTML and downloads is also imprecise and potentially misleading. The daily builds currently show the x.y.z version (i.e. 3.12.6) but the actual documentation reflects the current state of the branch (3.12, say) which is likely newer than the last release (3.12.6). If one builds Python from the current head of the branch, the version in sys.version and thus displayed in the REPL is the last release with a '+' appended, i.e. 3.12.6+. This comes from the #define PY_VERSION string in Include/patchlevel.h. The + is automatically added as one of the final steps of the release process when the release manager triggers the release engineering branch to be merged into the cpython release branch. We could consider using that more precise version for the daily builds. I thought we did do that at some time in the past but I may be misremembering (I know RMs used to do it manually for the README file in the repo itself).
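As an illustration of reading that more precise version, here is a minimal sketch that extracts `PY_VERSION` from the text of Include/patchlevel.h. The helper names are hypothetical, and this is not how the docs build currently determines its version.

```python
import re


def read_py_version(patchlevel_h: str) -> str:
    """Extract the PY_VERSION string from the contents of patchlevel.h."""
    match = re.search(
        r'^#define\s+PY_VERSION\s+"([^"]+)"', patchlevel_h, re.MULTILINE
    )
    if match is None:
        raise ValueError("PY_VERSION not found")
    return match.group(1)


def is_between_releases(version: str) -> bool:
    """A trailing '+' marks a branch that has moved past the last release."""
    return version.endswith("+")


# Sample header fragment, written for this example.
sample = '#define PY_MINOR_VERSION 12\n#define PY_VERSION "3.12.6+"\n'
version = read_py_version(sample)
```

A daily build could then label itself with `version` as-is, so that readers see 3.12.6+ rather than 3.12.6.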

@AA-Turner
Member

An alternative is to go the other way with less precision, and advertise the daily downloads page as for "Python 3.12" (helpfully this is also easier to achieve). The static /release/ versions would keep the full release (e.g. https://docs.python.org/release/3.12.0rc3/download.html).

A

@merwok
Member

merwok commented Oct 1, 2024

advertise the daily downloads page as for "Python 3.12"

To be interpreted as «Python 3.12 as that branch looks today» ?

… seems good!

@AA-Turner
Member

On Monday (30 September 2024) we split the server into an HTML-only and a non-HTML cron task, the former scheduled hourly and the latter daily. After some initial teething problems, we've had a successful full rebuild of the non-HTML job, hence this note.

First, two tables of statistics with build times and durations:

Build times (HTML only)
Start Version Language Build-duration Trigger
2024-09-30 17:17 UTC 3.14 en 2m 41s Doc/ has changed
2024-09-30 17:20 UTC 3.14 es 5m 10s Doc/ has changed
2024-09-30 17:25 UTC 3.14 fr 4m 18s Doc/ has changed
2024-09-30 17:29 UTC 3.14 id 4m 49s Doc/ has changed
2024-09-30 17:34 UTC 3.14 it 3m 7s Doc/ has changed
2024-09-30 17:37 UTC 3.14 ja 6m 56s Doc/ has changed
2024-09-30 17:44 UTC 3.14 ko 5m 12s Doc/ has changed
2024-09-30 17:49 UTC 3.14 pl 3m 0s Doc/ has changed
2024-09-30 17:52 UTC 3.14 pt-br 4m 40s Doc/ has changed
2024-09-30 17:57 UTC 3.14 tr 3m 54s Doc/ has changed
2024-09-30 18:01 UTC 3.14 uk 4m 55s new translations
2024-09-30 18:06 UTC 3.14 zh-cn 1h 0m 32s new translations
2024-09-30 19:06 UTC 3.14 zh-tw 1h 15m 51s new translations
2024-09-30 21:00 UTC 3.13 uk 5m 3s new translations
2024-09-30 21:05 UTC 3.13 zh-cn 56m 49s new translations
2024-09-30 22:02 UTC 3.13 zh-tw 1h 10m 57s new translations
2024-09-30 23:13 UTC 3.12 en 2m 45s Doc/ has changed
2024-09-30 23:16 UTC 3.12 es 4m 27s Doc/ has changed
2024-09-30 23:20 UTC 3.12 fr 3m 37s Doc/ has changed
2024-09-30 23:24 UTC 3.12 id 4m 1s Doc/ has changed
2024-09-30 23:28 UTC 3.12 it 2m 52s Doc/ has changed
2024-09-30 23:31 UTC 3.12 ja 5m 32s Doc/ has changed
2024-09-30 23:37 UTC 3.12 ko 3m 39s Doc/ has changed
2024-09-30 23:40 UTC 3.12 pl 2m 41s Doc/ has changed
2024-09-30 23:43 UTC 3.12 pt-br 3m 39s Doc/ has changed
2024-09-30 23:47 UTC 3.12 tr 3m 27s Doc/ has changed
2024-09-30 23:50 UTC 3.12 uk 3m 54s new translations
2024-09-30 23:54 UTC 3.12 zh-cn 55m 9s Doc/ has changed
2024-10-01 00:49 UTC 3.12 zh-tw 1h 9m 21s Doc/ has changed
2024-10-01 02:04 UTC --FULL- -BUILD-- 8h 48m 34s -----------
2024-10-01 02:16 UTC 3.14 ja 6m 37s new translations
2024-10-01 02:22 UTC 3.14 pt-br 4m 17s new translations
2024-10-01 02:27 UTC 3.13 en 2m 32s Doc/ has changed
2024-10-01 02:29 UTC 3.13 es 5m 8s Doc/ has changed
2024-10-01 02:34 UTC 3.13 fr 4m 14s Doc/ has changed
2024-10-01 02:39 UTC 3.13 id 4m 33s Doc/ has changed
2024-10-01 02:43 UTC 3.13 it 3m 8s Doc/ has changed
2024-10-01 02:46 UTC 3.13 ja 6m 13s new translations
2024-10-01 02:53 UTC 3.13 ko 4m 23s Doc/ has changed
2024-10-01 02:57 UTC 3.13 pl 2m 56s new translations
2024-10-01 03:00 UTC 3.13 pt-br 5m 13s new translations
2024-10-01 03:05 UTC 3.13 tr 3m 39s Doc/ has changed
2024-10-01 03:09 UTC 3.13 uk 4m 36s Doc/ has changed
2024-10-01 03:13 UTC 3.13 zh-cn 54m 51s Doc/ has changed
2024-10-01 04:08 UTC 3.12 ja 5m 50s new translations
2024-10-01 04:19 UTC --FULL- -BUILD-- 2h 3m 10s -----------
2024-10-01 05:16 UTC 3.14 zh-cn 1h 4m 6s new translations
2024-10-01 06:20 UTC 3.13 uk 7m 20s new translations
2024-10-01 06:27 UTC 3.13 zh-cn 1h 11m 29s new translations
2024-10-01 07:44 UTC --FULL- -BUILD-- 2h 27m 60s -----------
2024-10-01 08:16 UTC 3.14 en 2m 33s Doc/ has changed
2024-10-01 08:18 UTC 3.14 es 4m 58s Doc/ has changed
2024-10-01 08:23 UTC 3.14 fr 4m 22s Doc/ has changed
2024-10-01 08:27 UTC 3.14 id 4m 44s Doc/ has changed
2024-10-01 08:32 UTC 3.14 it 3m 4s Doc/ has changed
2024-10-01 08:35 UTC 3.14 ja 6m 48s Doc/ has changed
2024-10-01 08:42 UTC 3.14 ko 4m 19s Doc/ has changed
2024-10-01 08:46 UTC 3.14 pl 2m 39s Doc/ has changed
2024-10-01 08:49 UTC 3.14 pt-br 4m 13s Doc/ has changed
2024-10-01 08:53 UTC 3.14 tr 3m 33s Doc/ has changed
2024-10-01 08:57 UTC 3.14 uk 4m 49s new translations
2024-10-01 09:02 UTC 3.14 zh-cn 56m 53s new translations
2024-10-01 09:59 UTC 3.14 zh-tw 1h 15m 57s Doc/ has changed
2024-10-01 11:15 UTC 3.13 zh-cn 58m 51s new translations
2024-10-01 12:14 UTC 3.12 en 2m 33s Doc/ has changed
2024-10-01 12:16 UTC 3.12 es 4m 33s Doc/ has changed
2024-10-01 12:21 UTC 3.12 fr 4m 4s Doc/ has changed
2024-10-01 12:25 UTC 3.12 id 4m 41s Doc/ has changed
2024-10-01 12:29 UTC 3.12 it 3m 22s Doc/ has changed
2024-10-01 12:33 UTC 3.12 ja 5m 48s Doc/ has changed
2024-10-01 12:39 UTC 3.12 ko 3m 57s Doc/ has changed
2024-10-01 12:43 UTC 3.12 pl 3m 2s Doc/ has changed
2024-10-01 12:46 UTC 3.12 pt-br 3m 53s Doc/ has changed
2024-10-01 12:50 UTC 3.12 tr 3m 29s Doc/ has changed
2024-10-01 12:53 UTC 3.12 uk 3m 54s Doc/ has changed
2024-10-01 12:57 UTC 3.12 zh-cn 54m 25s Doc/ has changed
2024-10-01 13:51 UTC 3.12 zh-tw 1h 15m 16s Doc/ has changed
2024-10-01 15:12 UTC --FULL- -BUILD-- 6h 56m 4s -----------
2024-10-01 15:16 UTC 3.14 en 2m 38s Doc/ has changed
2024-10-01 15:18 UTC 3.14 es 5m 35s Doc/ has changed
2024-10-01 15:24 UTC 3.14 fr 4m 20s Doc/ has changed
2024-10-01 15:28 UTC 3.14 id 5m 3s Doc/ has changed
2024-10-01 15:33 UTC 3.14 it 3m 7s Doc/ has changed
2024-10-01 15:36 UTC 3.14 ja 6m 38s Doc/ has changed
2024-10-01 15:43 UTC 3.14 ko 4m 29s Doc/ has changed
2024-10-01 15:47 UTC 3.14 pl 2m 51s Doc/ has changed
2024-10-01 15:50 UTC 3.14 pt-br 4m 33s Doc/ has changed
2024-10-01 15:55 UTC 3.14 tr 3m 44s Doc/ has changed
2024-10-01 15:59 UTC 3.14 uk 4m 53s Doc/ has changed
2024-10-01 16:04 UTC 3.14 zh-cn 58m 17s new translations
2024-10-01 17:02 UTC 3.14 zh-tw 1h 19m 26s Doc/ has changed
2024-10-01 18:21 UTC 3.13 zh-cn 59m 8s new translations
2024-10-01 19:21 UTC 3.12 zh-tw 1h 14m 37s Doc/ has changed
2024-10-01 20:40 UTC --FULL- -BUILD-- 5h 24m 53s -----------
2024-10-01 21:16 UTC 3.14 en 2m 41s Doc/ has changed
2024-10-01 21:18 UTC 3.14 es 5m 26s Doc/ has changed
2024-10-01 21:24 UTC 3.14 fr 4m 48s new translations
2024-10-01 21:28 UTC 3.14 id 4m 54s Doc/ has changed
2024-10-01 21:33 UTC 3.14 it 3m 12s Doc/ has changed
2024-10-01 21:37 UTC 3.14 ja 6m 59s Doc/ has changed
2024-10-01 21:44 UTC 3.14 ko 4m 42s Doc/ has changed
2024-10-01 21:48 UTC 3.14 pl 2m 42s Doc/ has changed
2024-10-01 21:51 UTC 3.14 pt-br 4m 10s Doc/ has changed
2024-10-01 21:55 UTC 3.14 tr 3m 41s Doc/ has changed
2024-10-01 21:59 UTC 3.14 uk 4m 46s Doc/ has changed
2024-10-01 22:04 UTC 3.14 zh-cn 1h 2m 8s new translations
2024-10-01 23:06 UTC 3.14 zh-tw 1h 12m 55s Doc/ has changed
2024-10-02 00:19 UTC 3.13 fr 4m 15s new translations
2024-10-02 00:23 UTC 3.13 pt-br 4m 13s new translations
2024-10-02 00:27 UTC 3.12 fr 3m 57s new translations
2024-10-02 00:36 UTC --FULL- -BUILD-- 3h 20m 27s -----------
2024-10-02 01:16 UTC 3.14 pt-br 4m 31s new translations
2024-10-02 01:25 UTC --FULL- -BUILD-- 9m 18s -----------
2024-10-02 02:20 UTC --FULL- -BUILD-- 4m 37s -----------
2024-10-02 03:20 UTC --FULL- -BUILD-- 4m 52s -----------
2024-10-02 04:20 UTC --FULL- -BUILD-- 4m 29s -----------
2024-10-02 05:16 UTC 3.12 en 2m 19s Doc/ has changed
2024-10-02 05:18 UTC 3.12 es 4m 5s Doc/ has changed
2024-10-02 05:22 UTC 3.12 fr 3m 49s Doc/ has changed
2024-10-02 05:26 UTC 3.12 id 4m 8s Doc/ has changed
2024-10-02 05:30 UTC 3.12 it 2m 59s Doc/ has changed
2024-10-02 05:33 UTC 3.12 ja 5m 37s Doc/ has changed
2024-10-02 05:39 UTC 3.12 ko 4m 3s Doc/ has changed
2024-10-02 05:43 UTC 3.12 pl 2m 47s Doc/ has changed
2024-10-02 05:46 UTC 3.12 pt-br 3m 27s Doc/ has changed
2024-10-02 05:49 UTC 3.12 tr 3m 24s Doc/ has changed
2024-10-02 05:53 UTC 3.12 uk 3m 40s Doc/ has changed
2024-10-02 05:56 UTC 3.12 zh-cn 1h 0m 33s Doc/ has changed
2024-10-02 06:57 UTC 3.12 zh-tw 1h 22m 37s Doc/ has changed
2024-10-02 08:25 UTC --FULL- -BUILD-- 3h 9m 1s -----------
2024-10-02 09:16 UTC 3.14 uk 5m 35s new translations
2024-10-02 09:21 UTC 3.14 zh-cn 1h 7m 2s new translations
2024-10-02 10:29 UTC 3.13 pl 3m 35s new translations
2024-10-02 10:32 UTC 3.13 uk 6m 3s new translations
2024-10-02 10:38 UTC 3.13 zh-cn 1h 2m 27s new translations
2024-10-02 11:41 UTC 3.12 uk 4m 30s new translations
2024-10-02 11:51 UTC --FULL- -BUILD-- 2h 35m 7s -----------
2024-10-02 12:16 UTC 3.14 pl 3m 16s new translations
2024-10-02 12:25 UTC --FULL- -BUILD-- 9m 14s -----------
2024-10-02 13:21 UTC --FULL- -BUILD-- 5m 35s -----------
2024-10-02 14:22 UTC --FULL- -BUILD-- 6m 3s -----------
2024-10-02 15:22 UTC --FULL- -BUILD-- 6m 2s -----------
2024-10-02 16:16 UTC 3.14 zh-cn 1h 7m 58s new translations
2024-10-02 17:24 UTC 3.14 zh-tw 1h 29m 58s Doc/ has changed
2024-10-02 18:54 UTC 3.13 zh-cn 1h 3m 49s new translations
2024-10-02 20:04 UTC --FULL- -BUILD-- 3h 48m 42s -----------
2024-10-02 20:16 UTC 3.14 en 2m 46s Doc/ has changed
2024-10-02 20:18 UTC 3.14 es 5m 42s Doc/ has changed
2024-10-02 20:24 UTC 3.14 fr 4m 39s Doc/ has changed
2024-10-02 20:29 UTC 3.14 id 4m 49s Doc/ has changed
2024-10-02 20:34 UTC 3.14 it 3m 13s Doc/ has changed
2024-10-02 20:37 UTC 3.14 ja 6m 55s Doc/ has changed
2024-10-02 20:44 UTC 3.14 ko 4m 44s Doc/ has changed
2024-10-02 20:49 UTC 3.14 pl In progress... ...

Build times (no HTML)
Start Version Language Build Trigger
2024-10-02 06:07 UTC 3.14 en 21m 14s Doc/ has changed
2024-10-02 06:28 UTC 3.14 es 1h 12m 9s Doc/ has changed
2024-10-02 07:40 UTC 3.14 fr 22m 8s new translations
2024-10-02 08:02 UTC 3.14 id 28m 8s Doc/ has changed
2024-10-02 08:30 UTC 3.14 it 18m 51s Doc/ has changed
2024-10-02 08:49 UTC 3.14 ja 50m 0s Doc/ has changed
2024-10-02 09:39 UTC 3.14 ko 33m 57s Doc/ has changed
2024-10-02 10:13 UTC 3.14 pl 20m 53s new translations
2024-10-02 10:34 UTC 3.14 pt-br 30m 38s new translations
2024-10-02 11:05 UTC 3.14 tr 1h 0m 15s Doc/ has changed
2024-10-02 12:05 UTC 3.14 zh-cn 27m 1s new translations
2024-10-02 12:32 UTC 3.14 zh-tw 20m 46s Doc/ has changed
2024-10-02 12:53 UTC 3.13 fr 19m 27s new translations
2024-10-02 13:12 UTC 3.13 pl 18m 44s new translations
2024-10-02 13:31 UTC 3.13 pt-br 24m 20s new translations
2024-10-02 13:55 UTC 3.13 zh-cn 27m 22s new translations
2024-10-02 14:23 UTC 3.12 en 18m 54s Doc/ has changed
2024-10-02 14:42 UTC 3.12 es 1h 3m 45s Doc/ has changed
2024-10-02 15:46 UTC 3.12 fr 17m 18s new translations
2024-10-02 16:03 UTC 3.12 id 27m 4s Doc/ has changed
2024-10-02 16:30 UTC 3.12 it 20m 18s Doc/ has changed
2024-10-02 17:05 UTC 3.12 ko 30m 31s Doc/ has changed
2024-10-02 17:36 UTC 3.12 pl 20m 52s Doc/ has changed
2024-10-02 17:57 UTC 3.12 pt-br 25m 28s Doc/ has changed
2024-10-02 18:22 UTC 3.12 tr 1h 4m 14s Doc/ has changed
2024-10-02 19:27 UTC 3.12 zh-cn 28m 16s Doc/ has changed
2024-10-02 19:55 UTC 3.12 zh-tw 18m 54s Doc/ has changed
2024-10-02 20:19 UTC --FULL- -BUILD-- 14h 12m 19s -----------

Taking the most recent 17 rebuilds for HTML-only:

Start Duration
2024-10-01 02:04 UTC 8h 48m 34s
2024-10-01 04:19 UTC 2h 3m 10s
2024-10-01 07:44 UTC 2h 27m 60s
2024-10-01 15:12 UTC 6h 56m 4s
2024-10-01 20:40 UTC 5h 24m 53s
2024-10-02 00:36 UTC 3h 20m 27s
2024-10-02 01:25 UTC 9m 18s
2024-10-02 02:20 UTC 4m 37s
2024-10-02 03:20 UTC 4m 52s
2024-10-02 04:20 UTC 4m 29s
2024-10-02 08:25 UTC 3h 9m 1s
2024-10-02 11:51 UTC 2h 35m 7s
2024-10-02 12:25 UTC 9m 14s
2024-10-02 13:21 UTC 5m 35s
2024-10-02 14:22 UTC 6m 3s
2024-10-02 15:22 UTC 6m 2s
2024-10-02 20:04 UTC 3h 48m 42s

These have an average (mean) time of 8344s, or 2h 19m 4s. Excluding the builds for which no work was done (6), we have 11 builds with an average of 3h 32m 3s.
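For anyone wanting to check the arithmetic, this sketch reproduces both averages from the durations listed in the table above. Treating any run shorter than seven minutes as "no work done" is an assumption made for this example; the thread does not state the cut-off.

```python
import re

# Durations transcribed from the table above, in order.
DURATIONS = [
    "8h 48m 34s", "2h 3m 10s", "2h 27m 60s", "6h 56m 4s", "5h 24m 53s",
    "3h 20m 27s", "9m 18s", "4m 37s", "4m 52s", "4m 29s", "3h 9m 1s",
    "2h 35m 7s", "9m 14s", "5m 35s", "6m 3s", "6m 2s", "3h 48m 42s",
]


def to_seconds(text: str) -> int:
    """Convert '2h 3m 10s' or '9m 18s' to a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)h )?(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def fmt(total_seconds: int) -> str:
    """Format seconds in the same 'Xh Ym Zs' style as the table."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes}m {seconds}s"


all_runs = [to_seconds(d) for d in DURATIONS]
# Assumed cut-off: runs under seven minutes built nothing.
worked = [s for s in all_runs if s >= 7 * 60]

mean_all = round(sum(all_runs) / len(all_runs))
mean_worked = round(sum(worked) / len(worked))
print(len(all_runs), fmt(mean_all))
print(len(worked), fmt(mean_worked))
```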

These numbers are significantly skewed by the Chinese languages, which take more than an hour each. Excluding the Chinese, we have 109 HTML-only builds at an average (mean) time of 4m 15s.

We haven't yet observed a full rebuild for all versions and languages. This is to be expected: a benefit of splitting the workers is that the HTML job more often finds it has no work to do. The expected time for a full rebuild of all 13 languages and 3 versions is (11 x 4m15s + 2 x 66m) x 3, or just under nine hours (8:56:15).
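The estimate can be checked with `datetime.timedelta`, using only the figures quoted above (the variable names are just for this example):

```python
from datetime import timedelta

typical_build = timedelta(minutes=4, seconds=15)  # mean for non-Chinese languages
chinese_build = timedelta(minutes=66)             # assumed per Chinese language
versions = 3

per_version = 11 * typical_build + 2 * chinese_build
full_rebuild = versions * per_version
print(full_rebuild)  # 8:56:15
```

The two Chinese builds account for 6h 36m of that total, which is why fixing them matters far more than speeding up the other eleven languages.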

This is a significant improvement to the status quo ante (c. 30 hours per rebuild), and will further improve dramatically when we resolve the issue with Chinese builds.

The Non-HTML archive builds are currently scheduled daily, and a basic projection estimates that a full rebuild of 12 languages x 3 versions would take just under 19 hours, so we have headroom here (though faster is of course better!).
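One way to arrive at a figure close to this, assuming the projection simply scales the observed non-HTML run linearly (the thread does not say how it was actually computed): the run in the table above built 27 language/version pairs in 14h 12m 19s.

```python
from datetime import timedelta

observed_total = timedelta(hours=14, minutes=12, seconds=19)
observed_builds = 27   # rows in the "no HTML" table above
full_builds = 12 * 3   # 12 languages x 3 versions

# Linear scaling: mean time per observed build, times the full build count.
projected = observed_total / observed_builds * full_builds
print(projected)
```

Under that assumption the projection lands a few minutes short of 19 hours, which fits inside a daily schedule.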

Thank you to everyone involved in making this work happen, I'll now close this issue.

A

@hugovk
Member Author

hugovk commented Oct 2, 2024

And thank you @AA-Turner for all your work here!


Another idea to consider: a third cron for only the English HTML for the default /3 version. It takes about 3 minutes. We know from Plausible stats (numbers below from the July 2023 trial) that this one is served an order of magnitude more than the others.

But I agree, let's first try and figure out why the Chinese builds are so slow.


By language or version:

/3      8,596,047
/zh-cn    565,930
/3.11     484,517
/ja       380,342
/es       204,088
/ko       130,757
/fr       130,595
/pt-br    102,773
/zh-tw    100,002
/3.12      88,140
/3.13      42,304
/uk        18,591
/dev       17,835
/tr         7,404
/pl         5,085
/id         1,011
/it           147
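To put a number on "an order of magnitude", this short sketch compares the top entry with the runner-up, using the figures from the list above:

```python
# Page views by language or version, transcribed from the list above.
VIEWS = {
    "/3": 8_596_047, "/zh-cn": 565_930, "/3.11": 484_517, "/ja": 380_342,
    "/es": 204_088, "/ko": 130_757, "/fr": 130_595, "/pt-br": 102_773,
    "/zh-tw": 100_002, "/3.12": 88_140, "/3.13": 42_304, "/uk": 18_591,
    "/dev": 17_835, "/tr": 7_404, "/pl": 5_085, "/id": 1_011, "/it": 147,
}

ranked = sorted(VIEWS.items(), key=lambda item: item[1], reverse=True)
(top_path, top_views), (next_path, next_views) = ranked[:2]

ratio = top_views / next_views
share = top_views / sum(VIEWS.values())
print(f"{top_path} has {ratio:.1f}x the views of {next_path}")
print(f"{top_path} share of listed views: {share:.0%}")
```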

By language and version:

/3           8,596,047
/zh-cn/3       502,665
/3.11          484,517
/ja/3          369,998
/es/3          187,400
/ko/3          125,512
/fr/3          124,258
/pt-br/3        94,112
/zh-tw/3        93,538
/3.12           88,140
/3.13           42,304
/zh-cn/3.11     40,785
/dev            17,835
/uk/3           17,270
/es/3.11        10,124
/zh-cn/3.12      9,552
/zh-cn/3.13      6,617
/zh-cn/dev       6,197
/tr/3            6,163
/ja/3.11         6,126
/es/dev          5,152
/pl/3            4,775
/pt-br/dev       3,964
/zh-tw/3.11      3,563
/pt-br/3.11      3,344
/fr/3.11         3,140
/ja/dev          2,816
/ko/dev          2,372
/ko/3.11         2,214
/fr/dev          1,991
/zh-tw/dev       1,247
/zh-tw/3.12      1,049
/fr/3.13           985
/uk/3.12           922
/ja/3.13           839
/tr/3.11           801
/es/3.13           745
/pt-br/3.13        692
/es/3.12           667
/pt-br/3.12        661
/zh-tw/3.13        604
/ja/3.12           563
/id/3              562
/ko/3.13           414
/tr/3.13           300
/ko/3.12           245
/fr/3.12           218
/id/3.11           208
/uk/3.11           199
/uk/dev            154
/tr/3.12           132
/pl/dev            123
/pl/3.11            98
/id/3.13            86
/id/dev             80
/it/3.11            77
/id/3.12            75
/uk/3.13            46
/pl/3.13            45
/pl/3.12            44
/it/3.12            44
/it/3.13            13
/it/3               13
/tr/dev              5
