One HTML-only cron + one everything-but-HTML cron #131

hugovk · 2024-09-12T20:22:25Z

Current situation

Right now the docs server is taking over 40 hours to build a full set of 3.12-3.14 docs, plus 12 translations each:

List of versions/languages

3.14/zh-tw
3.14/zh-cn
3.14/uk
3.14/tr
3.14/pt-br
3.14/pl
3.14/ko
3.14/ja
3.14/it
3.14/id
3.14/fr
3.14/es
3.14/en
3.13/zh-tw
3.13/zh-cn
3.13/uk
3.13/tr
3.13/pt-br
3.13/pl
3.13/ko
3.13/ja
3.13/it
3.13/id
3.13/fr
3.13/es
3.13/en
3.12/zh-tw
3.12/zh-cn
3.12/uk
3.12/tr
3.12/pt-br
3.12/pl
3.12/ko
3.12/ja
3.12/it
3.12/id
3.12/fr
3.12/es
3.12/en

Nearly all of these include HTML, plain text, PDF, Texinfo and EPUB (Ukrainian is HTML only). HTML-only is fast to build, about 3-4 minutes. The full set of artifacts is much slower to build, between 40 minutes and two hours depending on the language, mostly because of running LaTeX for the PDFs.

What happens is:

A cron goes off at 7 minutes past the hour and starts a new full build loop.

If there's a build running (= lockfile found), the new one exits and allows the running one to continue.
If there's no build running, it creates its own lockfile and starts a new build.
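The lockfile check described above can be sketched roughly as follows. This is a minimal illustration, not the code in build_docs.py: the function names `try_acquire` and `run_build_loop` and the lockfile path are hypothetical.

```python
import os
from pathlib import Path


def try_acquire(lockfile: Path) -> bool:
    """Atomically create the lockfile; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if another build already holds the lock.
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner, for debugging only
    return True


def run_build_loop(lockfile: Path, build) -> bool:
    """Run build() unless another loop holds the lock; report whether it ran."""
    if not try_acquire(lockfile):
        return False  # a build is running: exit and let it continue
    try:
        build()
    finally:
        lockfile.unlink()  # release the lock even if the build fails
    return True
```

With a single shared lockfile, an hourly cron can only ever start a new loop once the previous one has finished, which is why a 40-hour loop blocks every later run.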

For each language/version, we only do a build if the docs have changed since last time, or if the translation has changed since last time. This is good: there's no point rebuilding something that hasn't changed.
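As a rough sketch of that skip logic, assuming the build state is tracked as a pair of commit identifiers (the `BuildState` class and its fields are hypothetical, not taken from build_docs.py; the reason strings mirror the "Trigger" values that appear in the build logs):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildState:
    doc_commit: str                    # last commit touching Doc/ (assumed)
    translation_commit: Optional[str]  # None for the untranslated English docs


def rebuild_reason(previous: Optional[BuildState],
                   current: BuildState) -> Optional[str]:
    """Return why a rebuild is needed, or None if it can be skipped."""
    if previous is None:
        return "no previous build"
    if previous.doc_commit != current.doc_commit:
        return "Doc/ has changed"
    if previous.translation_commit != current.translation_commit:
        return "new translations"
    return None  # nothing changed: skip this language/version
```

The longer a full loop takes, the less likely it is that both commits are unchanged by the time the loop comes round again.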

However, because the full loop takes over 40 hours, inevitably there have been docs or translation changes since the last time, and we get a full rebuild each time.

This results in long delays between docs being updated, not to mention the high server resources usage.

HTML vs. PDF

We have download stats for the HTML docs, but we don't have download numbers for the other artifacts to compare.

However, I'm certain the HTML is by far the most used, and there's the most benefit to getting fresh HTML up quickly.

An affordance of websites is being able to look up just the pages you need, on demand. Compare that with a PDF, which you download once and use as an offline reference. Maybe you'll download it again later, but there's less benefit in updating often, as the one you usually consult is an old, offline copy.

Proposal

I suggest we have two cron jobs:

The current hourly job only builds HTML.
A new job builds everything else except HTML.

1. HTML only

When there are new changes, they will be built and uploaded much sooner. It will run much quicker.

It's more likely that on the next pass, some languages can be skipped because there's nothing to update this time round.

2. Everything but HTML

This will be much slower than the HTML-only job, and will take about the same as the current loop does now.

Maybe it'll be a bit quicker due to not needing to build HTML, but maybe a bit slower because we'll sometimes be using CPU to build HTML at the same time. However, the majority of the time is spent running a latex command on a single CPU, so it might not make much difference.

We also don't need to update the non-HTML as often, so its cron could be every few days?


hugovk · 2024-09-12T20:24:12Z

TODO

If we do this, build_docs.py already has --quick to build HTML-only:

https://github.com/python/docsbuild-scripts/blob/56d72d43e5759cc0ed600827b56e81d8310bcaca/build_docs.py#L525-L530

Add an option to build everything-but-HTML. (Doc: Add make dist-no-html cpython#124383)
Adjust the lockfile mechanism so we can have one HTML-only build running and also one everything-but-HTML running at the same time. (Add --select-output docsbuild-scripts#199)
Adjust the docs cron at psf-salt: adjust the existing cron.present entry and add a new one. (Restore the HTML docsbuild cron job psf-salt#497)
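One way to let the two jobs run side by side is to give each its own lockfile. The sketch below is only an illustration of that idea: the job names, the `lockfile_for` helper and the lock directory are all hypothetical and are not taken from docsbuild-scripts.

```python
from pathlib import Path

# Hypothetical job names: one per cron job.
JOBS = {"html", "no-html"}


def lockfile_for(job: str, lock_dir: Path = Path("/tmp")) -> Path:
    """Return a lockfile path that is unique to the given job."""
    if job not in JOBS:
        raise ValueError(f"unknown job: {job!r}")
    return lock_dir / f"build_docs-{job}.lock"
```

Two runs of the same job still exclude each other, while an HTML-only run and an everything-but-HTML run no longer do.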

Anything else?

hugovk · 2024-09-23T21:51:51Z

As has been mentioned previously, one downside of this sort of approach is:

New Python release is made
Fast HTML docs built
Reader goes to https://docs.python.org/3/download.html and clicks to download a PDF (or other artifact)
PDFs haven't built yet and user gets 404 for https://docs.python.org/x.y/archives/python-x.y.z-docs-pdf-a4.zip

This has been partially addressed by adding a 404 that says "The archive you're trying to download has not been built yet. Please try again later or consult the archives for earlier versions."

Perhaps we could also mitigate this by renaming the files so they only have x.y in the filename and not x.y.z?

So https://docs.python.org/x.y/archives/python-x.y-docs-pdf-a4.zip instead.

For example, on the day we release 3.13.0, instead of getting a 404 page for the 3.13.0 PDF, they get the 3.13.0rc2 PDF.

I think this is fine. There's usually not much that's changed, and the benefit is everyone gets the HTML and PDF files sooner (all the time, not just for releases).
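To make the renaming concrete, here is a small sketch that derives the proposed x.y filename from a full version string. The helper names are hypothetical, and the filename pattern is inferred from the URLs quoted above rather than from the build scripts.

```python
import re
from typing import Optional


def minor_version(version: str) -> str:
    """Reduce '3.13.0rc2' or '3.13.0' to '3.13'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a Python version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def archive_name(version: str, ext: str, fmt: Optional[str] = None) -> str:
    """Build an archive filename that carries only x.y, not x.y.z."""
    suffix = f"-{fmt}" if fmt else ""
    return f"python-{minor_version(version)}-docs{suffix}.{ext}"


print(archive_name("3.13.0rc2", "zip", "pdf-a4"))
print(archive_name("3.13.0", "zip", "pdf-a4"))
```

Both calls produce the same name, so a download link published on release day keeps resolving to whichever build is currently on the server.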

AA-Turner · 2024-09-23T22:18:32Z

I seem to remember that the release process includes building the docs + PDF etc, or perhaps it used to. If this is still the case, can part of the release process be to upload a copy of that archive to the docs server as well as the release server?

@ned-deily did this I think for rc2? Is this something we can formalise?

A

ned-deily · 2024-09-23T23:50:45Z

I seem to remember that the release process includes building the docs + PDF etc, or perhaps it used to. If this is still the case, can part of the release process be to upload a copy of that archive to the docs server as well as the release server?

This subject is confusingly complex for various reasons (some due to attempts to provide compatibility with older end-of-life releases) so there's a good chance that some or all of what follows is wrong but, to the best of my knowledge, it works today like this.

The release process currently does produce a quick build of the untranslated docs (HTML, PDF, et al.) for all rc and final releases (but not alpha and beta releases), built from the source release git tag (e.g. v3.13.0rc2). As mostly discussed in PEP 101, the release process saves the unpacked HTML files of the docset for every release on the doc server (under /srv/docs.python.org/release/) and saves the downloadable files to the download server (under /srv/www.python.org/ftp/python/doc/). The archived HTML is linked to from the Python Documentation by Version web page and, importantly, the links in their Download these documents pages, for example Download Python 3.12.6 Documentation, link to these archived downloadable files. Links to these archived versions of the docs may also be used by the RM on the individual release pages, often for the link to the changelog. This is all completely independent of the cron doc builds under discussion here.

At the same time, the cron jobs do their things and (try to) produce their daily/3-hour builds on the docs server that are served under various python.org URLs. The actual file names of the download files do include the version number but are served under the branch-specific directories (/srv/docs.python.org/x.y/archives/), resulting in the URLs mentioned earlier (like https://docs.python.org/x.y/archives/python-x.y.z-docs-pdf-a4.zip). But this does result in a curious set of files in these x.y/archives directories, because the file name changes following the first cron run after the release manager merges the release engineering branch back into the main repo. The effect is that, at some somewhat random time following a release, the download files built by the cron jobs will switch from ...-x.y.z-... to ...-x.y.z+1-... file names (or even to ...-x.y+1.0a0-...) in the same directory. The older files are never deleted even though they imprecisely imply their source: for example, 3.12/archives/python-3.12.5-docs-pdf-a4.zip neither reflects the state of the documentation at the time of the 3.12.5 release nor necessarily the final state of the 3.12 docs prior to the 3.12.6 release; it's just what existed in the 3.12 git branch the last time the cron started building (and successfully finished) before the 3.12.6 release was merged into the 3.12 branch.

A partial look at a docs server branch directory shows this:

$ ls 3.12/archive
[...]
-rw-rw-r-- 1 docsbuild docs  8361535 Jul 31 11:06 python-3.12.4-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 13120842 Jul 31 11:07 python-3.12.4-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 17910690 Jul 31 11:17 python-3.12.4-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 17851043 Jul 31 11:17 python-3.12.4-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18067471 Jul 31 11:26 python-3.12.4-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18010439 Jul 31 11:26 python-3.12.4-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7541776 Jul 31 11:30 python-3.12.4-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9727536 Jul 31 11:31 python-3.12.4-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2878769 Jul 31 11:08 python-3.12.4-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  4025748 Jul 31 11:08 python-3.12.4-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6653572 Jul 31 11:27 python-3.12.4-docs.epub
-rw-rw-r-- 1 docsbuild docs  8360373 Sep  7 13:51 python-3.12.5-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 13126773 Sep  7 13:51 python-3.12.5-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 18385513 Sep  7 14:05 python-3.12.5-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18342026 Sep  7 14:05 python-3.12.5-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18559979 Sep  7 14:17 python-3.12.5-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18515957 Sep  7 14:17 python-3.12.5-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7550334 Sep  7 14:22 python-3.12.5-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9733259 Sep  7 14:22 python-3.12.5-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2885597 Sep  7 13:53 python-3.12.5-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  4028513 Sep  7 13:53 python-3.12.5-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6658516 Sep  7 14:19 python-3.12.5-docs.epub
-rw-rw-r-- 1 docsbuild docs  8359767 Sep 21 08:07 python-3.12.6-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 13132489 Sep 21 08:07 python-3.12.6-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 18447737 Sep 21 08:20 python-3.12.6-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18401603 Sep 21 08:20 python-3.12.6-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18613117 Sep 16 03:08 python-3.12.6-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 18566788 Sep 16 03:08 python-3.12.6-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7592417 Sep 21 08:25 python-3.12.6-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9777614 Sep 21 08:25 python-3.12.6-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2885648 Sep 21 08:09 python-3.12.6-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  4030381 Sep 21 08:09 python-3.12.6-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6661795 Sep 21 08:22 python-3.12.6-docs.epub
-rw-rw-r-- 1 docsbuild docs  7852747 May 23  2023 python-3.13.0a0-docs-html.tar.bz2
-rw-rw-r-- 1 docsbuild docs 12366266 May 23  2023 python-3.13.0a0-docs-html.zip
-rw-rw-r-- 1 docsbuild docs 16804233 May 23  2023 python-3.13.0a0-docs-pdf-a4.tar.bz2
-rw-rw-r-- 1 docsbuild docs 16756843 May 23  2023 python-3.13.0a0-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 16928897 May 23  2023 python-3.13.0a0-docs-pdf-letter.tar.bz2
-rw-rw-r-- 1 docsbuild docs 16882274 May 23  2023 python-3.13.0a0-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs  7098538 May 23  2023 python-3.13.0a0-docs-texinfo.tar.bz2
-rw-rw-r-- 1 docsbuild docs  9239149 May 23  2023 python-3.13.0a0-docs-texinfo.zip
-rw-rw-r-- 1 docsbuild docs  2811736 May 23  2023 python-3.13.0a0-docs-text.tar.bz2
-rw-rw-r-- 1 docsbuild docs  3927852 May 23  2023 python-3.13.0a0-docs-text.zip
-rw-rw-r-- 1 docsbuild docs  6715932 May 23  2023 python-3.13.0a0-docs.epub

AFAICT, there is no reason to be keeping the older files in this directory and this would be solved/mitigated, along with the sync problem alluded to above, if the download file names were changed as suggested above by @hugovk. The trick, of course, is to eliminate or minimize any compatibility issues with user expectations and with the separate release archives produced by the RMs.

To the other point:

@ned-deily did this I think for rc2? Is this something we can formalise?

I believe that what I did for 3.13.0rc2 was to add a temporary link on the Python Documentation by Version web page to the rc2 documentation produced by the release process; normally that page does not include links to pre-release versions. Maybe there was something else on the release page, too.

AA-Turner · 2024-09-25T05:43:52Z

See python/cpython#124489 to alter the build process.

We'll need to consider what to do with the existing /archives/ files. It likely doesn't hurt to keep them, even though they are in effect random snapshots of the documentation at some point in time. If we do remove them, we can create redirects in the docs server configuration so that links don't break.

A

AA-Turner · 2024-09-25T07:24:37Z

In terms of splitting the build into HTML and non-HTML, we have a triumvirate of patches:

Add --select-output docsbuild-scripts#199 to allow choosing between HTML or non-HTML outputs in build_docs.py
Restore the HTML docsbuild cron job psf-salt#497 to restore the cron job for HTML-only builds
Doc: Run HTML and non-HTML daily builds separately cpython#124493 to split dist-html from everything else in the autobuild targets that docsbuild-scripts uses.

The first can be merged and the builds will carry on as they are now without any change. The second will start building HTML files twice and could potentially overwrite itself when copying to /srv/... if a full job and an HTML-only job finish at the same time. The CPython change should be merged third, and fixes this by splitting the non-HTML and HTML-only builds into disjoint sets.

A

ned-deily · 2024-09-25T19:11:59Z

See python/cpython#124489 to alter the build process.

Thanks, that looks good to me. The only potential issue I can think of is that there might be users/scripts out there that might be periodically expecting to directly download the current built artifacts using the old URL formats. I would guess that is not common and I don't think we've ever provided any guarantees about the URLs other than linking through the download.html pages. ...

We'll need to consider what to do with the existing /archives/ files. It likely doesn't hurt to keep them, even though they are in effect random snapshots of the documentation at some point in time. If we do remove them, we can create redirects in the docs server configuration so that links don't break.

... And the links that would break, besides the above-mentioned possible scripts, would be from previously downloaded copies of the html-format documentation (the only artifact where the download links appear?) and from embedded copies of the HTML documentation that are provided, for example, in the python.org macOS installer. Others? So, if we did add redirects from the x.y.z to the new x.y URLs for releases where we apply this change (and are still building docs), that should solve the problem for all of those cases, I think. If it would make the redirecting easier, we could perhaps do a one-time create x.y symlinks for EOL releases.
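The x.y.z to x.y redirect could follow a mapping like the one sketched below. This is a Python illustration of the rewrite rule only, not the server configuration; the `redirect_target` name is hypothetical, and the optional language prefix (e.g. /fr/) is an assumption about how translated archives are laid out.

```python
import re
from typing import Optional

# Old-style path: [/lang]/x.y/archives/python-x.y.z[suffix]-docs<rest>
_OLD_ARCHIVE = re.compile(
    r"(?P<prefix>(?:/[a-z]{2}(?:-[a-z]{2})?)?/(?P<branch>\d+\.\d+))"
    r"/archives/python-\d+\.\d+\.\d+\w*-docs(?P<rest>[-.].+)"
)


def redirect_target(path: str) -> Optional[str]:
    """Map an old x.y.z archive path to its x.y equivalent, else None."""
    match = _OLD_ARCHIVE.fullmatch(path)
    if match is None:
        return None  # already in the new format, or not an archive path
    # Use the directory's branch, since stale files from other versions
    # can sit in the same directory.
    return (f"{match['prefix']}/archives/"
            f"python-{match['branch']}-docs{match['rest']}")


print(redirect_target("/3.12/archives/python-3.12.6-docs-pdf-a4.zip"))
```

Paths that already use the x.y form do not match, so the rule cannot redirect to itself.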

AA-Turner · 2024-09-25T19:33:56Z

If we want to add redirects, I've opened python/psf-salt#498 as a draft.

A

ned-deily · 2024-09-25T19:50:32Z

I think this brings up another related issue inspired by the above discussion and a comment in the PR:

By just using x.y we are honest that the download reflects the state of the website rather than any one release -- for that, use https://docs.python.org/release/.

That is, like the file names and corresponding URLs, the Python version displayed in the daily document HTML and downloads is also imprecise and potentially misleading. The daily builds currently show the x.y.z version (i.e. 3.12.6) but the actual documentation reflects the current state of the branch (3.12, say) which is likely newer than the last release (3.12.6). If one builds Python from the current head of the branch, the version in sys.version and thus displayed in the REPL is the last release with a '+' appended, i.e. 3.12.6+. This comes from the #define PY_VERSION string in Include/patchlevel.h. The + is automatically added as one of the final steps of the release process when the release manager triggers the release engineering branch to be merged into the cpython release branch. We could consider using that more precise version for the daily builds. I thought we did do that at some time in the past but I may be misremembering (I know RMs used to do it manually for the README file in the repo itself).
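For illustration, the version string and its trailing '+' can be read from the header text along these lines. The helper names are hypothetical, and the sample header line is a stand-in for the real contents of Include/patchlevel.h.

```python
import re


def read_py_version(header_text: str) -> str:
    """Extract the PY_VERSION string from patchlevel.h-style text."""
    match = re.search(r'^#define\s+PY_VERSION\s+"([^"]+)"',
                      header_text, re.MULTILINE)
    if match is None:
        raise ValueError("PY_VERSION not found")
    return match.group(1)


def is_between_releases(version: str) -> bool:
    """A trailing '+' marks a branch state newer than the last release."""
    return version.endswith("+")


sample = '#define PY_VERSION           "3.12.6+"\n'  # illustrative input
version = read_py_version(sample)
print(version, is_between_releases(version))
```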

AA-Turner · 2024-09-25T20:05:25Z

An alternative is to go the other way with less precision, and advertise the daily downloads page as for "Python 3.12" (helpfully this is also easier to achieve). The static /release/ versions would keep the full release (e.g. https://docs.python.org/release/3.12.0rc3/download.html).

A

merwok · 2024-10-01T20:49:19Z

advertise the daily downloads page as for "Python 3.12"

To be interpreted as «Python 3.12 as that branch looks today» ?

… seems good!

AA-Turner · 2024-10-02T21:30:25Z

On Monday (30 September 2024) we split the server into an HTML-only and a non-HTML cron task, the former scheduled hourly and the latter daily. After some initial teething problems, we've had a successful full rebuild of the non-HTML job, hence this note.

First, two tables of statistics with build times and durations:

Build times (HTML only)

Start	Version	Language	Build	Trigger
2024-09-30 17:17 UTC	3.14	en	2m 41s	Doc/ has changed
2024-09-30 17:20 UTC	3.14	es	5m 10s	Doc/ has changed
2024-09-30 17:25 UTC	3.14	fr	4m 18s	Doc/ has changed
2024-09-30 17:29 UTC	3.14	id	4m 49s	Doc/ has changed
2024-09-30 17:34 UTC	3.14	it	3m 7s	Doc/ has changed
2024-09-30 17:37 UTC	3.14	ja	6m 56s	Doc/ has changed
2024-09-30 17:44 UTC	3.14	ko	5m 12s	Doc/ has changed
2024-09-30 17:49 UTC	3.14	pl	3m 0s	Doc/ has changed
2024-09-30 17:52 UTC	3.14	pt-br	4m 40s	Doc/ has changed
2024-09-30 17:57 UTC	3.14	tr	3m 54s	Doc/ has changed
2024-09-30 18:01 UTC	3.14	uk	4m 55s	new translations
2024-09-30 18:06 UTC	3.14	zh-cn	1h 0m 32s	new translations
2024-09-30 19:06 UTC	3.14	zh-tw	1h 15m 51s	new translations
2024-09-30 21:00 UTC	3.13	uk	5m 3s	new translations
2024-09-30 21:05 UTC	3.13	zh-cn	56m 49s	new translations
2024-09-30 22:02 UTC	3.13	zh-tw	1h 10m 57s	new translations
2024-09-30 23:13 UTC	3.12	en	2m 45s	Doc/ has changed
2024-09-30 23:16 UTC	3.12	es	4m 27s	Doc/ has changed
2024-09-30 23:20 UTC	3.12	fr	3m 37s	Doc/ has changed
2024-09-30 23:24 UTC	3.12	id	4m 1s	Doc/ has changed
2024-09-30 23:28 UTC	3.12	it	2m 52s	Doc/ has changed
2024-09-30 23:31 UTC	3.12	ja	5m 32s	Doc/ has changed
2024-09-30 23:37 UTC	3.12	ko	3m 39s	Doc/ has changed
2024-09-30 23:40 UTC	3.12	pl	2m 41s	Doc/ has changed
2024-09-30 23:43 UTC	3.12	pt-br	3m 39s	Doc/ has changed
2024-09-30 23:47 UTC	3.12	tr	3m 27s	Doc/ has changed
2024-09-30 23:50 UTC	3.12	uk	3m 54s	new translations
2024-09-30 23:54 UTC	3.12	zh-cn	55m 9s	Doc/ has changed
2024-10-01 00:49 UTC	3.12	zh-tw	1h 9m 21s	Doc/ has changed
2024-10-01 02:04 UTC	--FULL-	-BUILD--	8h 48m 34s	-----------
2024-10-01 02:16 UTC	3.14	ja	6m 37s	new translations
2024-10-01 02:22 UTC	3.14	pt-br	4m 17s	new translations
2024-10-01 02:27 UTC	3.13	en	2m 32s	Doc/ has changed
2024-10-01 02:29 UTC	3.13	es	5m 8s	Doc/ has changed
2024-10-01 02:34 UTC	3.13	fr	4m 14s	Doc/ has changed
2024-10-01 02:39 UTC	3.13	id	4m 33s	Doc/ has changed
2024-10-01 02:43 UTC	3.13	it	3m 8s	Doc/ has changed
2024-10-01 02:46 UTC	3.13	ja	6m 13s	new translations
2024-10-01 02:53 UTC	3.13	ko	4m 23s	Doc/ has changed
2024-10-01 02:57 UTC	3.13	pl	2m 56s	new translations
2024-10-01 03:00 UTC	3.13	pt-br	5m 13s	new translations
2024-10-01 03:05 UTC	3.13	tr	3m 39s	Doc/ has changed
2024-10-01 03:09 UTC	3.13	uk	4m 36s	Doc/ has changed
2024-10-01 03:13 UTC	3.13	zh-cn	54m 51s	Doc/ has changed
2024-10-01 04:08 UTC	3.12	ja	5m 50s	new translations
2024-10-01 04:19 UTC	--FULL-	-BUILD--	2h 3m 10s	-----------
2024-10-01 05:16 UTC	3.14	zh-cn	1h 4m 6s	new translations
2024-10-01 06:20 UTC	3.13	uk	7m 20s	new translations
2024-10-01 06:27 UTC	3.13	zh-cn	1h 11m 29s	new translations
2024-10-01 07:44 UTC	--FULL-	-BUILD--	2h 27m 60s	-----------
2024-10-01 08:16 UTC	3.14	en	2m 33s	Doc/ has changed
2024-10-01 08:18 UTC	3.14	es	4m 58s	Doc/ has changed
2024-10-01 08:23 UTC	3.14	fr	4m 22s	Doc/ has changed
2024-10-01 08:27 UTC	3.14	id	4m 44s	Doc/ has changed
2024-10-01 08:32 UTC	3.14	it	3m 4s	Doc/ has changed
2024-10-01 08:35 UTC	3.14	ja	6m 48s	Doc/ has changed
2024-10-01 08:42 UTC	3.14	ko	4m 19s	Doc/ has changed
2024-10-01 08:46 UTC	3.14	pl	2m 39s	Doc/ has changed
2024-10-01 08:49 UTC	3.14	pt-br	4m 13s	Doc/ has changed
2024-10-01 08:53 UTC	3.14	tr	3m 33s	Doc/ has changed
2024-10-01 08:57 UTC	3.14	uk	4m 49s	new translations
2024-10-01 09:02 UTC	3.14	zh-cn	56m 53s	new translations
2024-10-01 09:59 UTC	3.14	zh-tw	1h 15m 57s	Doc/ has changed
2024-10-01 11:15 UTC	3.13	zh-cn	58m 51s	new translations
2024-10-01 12:14 UTC	3.12	en	2m 33s	Doc/ has changed
2024-10-01 12:16 UTC	3.12	es	4m 33s	Doc/ has changed
2024-10-01 12:21 UTC	3.12	fr	4m 4s	Doc/ has changed
2024-10-01 12:25 UTC	3.12	id	4m 41s	Doc/ has changed
2024-10-01 12:29 UTC	3.12	it	3m 22s	Doc/ has changed
2024-10-01 12:33 UTC	3.12	ja	5m 48s	Doc/ has changed
2024-10-01 12:39 UTC	3.12	ko	3m 57s	Doc/ has changed
2024-10-01 12:43 UTC	3.12	pl	3m 2s	Doc/ has changed
2024-10-01 12:46 UTC	3.12	pt-br	3m 53s	Doc/ has changed
2024-10-01 12:50 UTC	3.12	tr	3m 29s	Doc/ has changed
2024-10-01 12:53 UTC	3.12	uk	3m 54s	Doc/ has changed
2024-10-01 12:57 UTC	3.12	zh-cn	54m 25s	Doc/ has changed
2024-10-01 13:51 UTC	3.12	zh-tw	1h 15m 16s	Doc/ has changed
2024-10-01 15:12 UTC	--FULL-	-BUILD--	6h 56m 4s	-----------
2024-10-01 15:16 UTC	3.14	en	2m 38s	Doc/ has changed
2024-10-01 15:18 UTC	3.14	es	5m 35s	Doc/ has changed
2024-10-01 15:24 UTC	3.14	fr	4m 20s	Doc/ has changed
2024-10-01 15:28 UTC	3.14	id	5m 3s	Doc/ has changed
2024-10-01 15:33 UTC	3.14	it	3m 7s	Doc/ has changed
2024-10-01 15:36 UTC	3.14	ja	6m 38s	Doc/ has changed
2024-10-01 15:43 UTC	3.14	ko	4m 29s	Doc/ has changed
2024-10-01 15:47 UTC	3.14	pl	2m 51s	Doc/ has changed
2024-10-01 15:50 UTC	3.14	pt-br	4m 33s	Doc/ has changed
2024-10-01 15:55 UTC	3.14	tr	3m 44s	Doc/ has changed
2024-10-01 15:59 UTC	3.14	uk	4m 53s	Doc/ has changed
2024-10-01 16:04 UTC	3.14	zh-cn	58m 17s	new translations
2024-10-01 17:02 UTC	3.14	zh-tw	1h 19m 26s	Doc/ has changed
2024-10-01 18:21 UTC	3.13	zh-cn	59m 8s	new translations
2024-10-01 19:21 UTC	3.12	zh-tw	1h 14m 37s	Doc/ has changed
2024-10-01 20:40 UTC	--FULL-	-BUILD--	5h 24m 53s	-----------
2024-10-01 21:16 UTC	3.14	en	2m 41s	Doc/ has changed
2024-10-01 21:18 UTC	3.14	es	5m 26s	Doc/ has changed
2024-10-01 21:24 UTC	3.14	fr	4m 48s	new translations
2024-10-01 21:28 UTC	3.14	id	4m 54s	Doc/ has changed
2024-10-01 21:33 UTC	3.14	it	3m 12s	Doc/ has changed
2024-10-01 21:37 UTC	3.14	ja	6m 59s	Doc/ has changed
2024-10-01 21:44 UTC	3.14	ko	4m 42s	Doc/ has changed
2024-10-01 21:48 UTC	3.14	pl	2m 42s	Doc/ has changed
2024-10-01 21:51 UTC	3.14	pt-br	4m 10s	Doc/ has changed
2024-10-01 21:55 UTC	3.14	tr	3m 41s	Doc/ has changed
2024-10-01 21:59 UTC	3.14	uk	4m 46s	Doc/ has changed
2024-10-01 22:04 UTC	3.14	zh-cn	1h 2m 8s	new translations
2024-10-01 23:06 UTC	3.14	zh-tw	1h 12m 55s	Doc/ has changed
2024-10-02 00:19 UTC	3.13	fr	4m 15s	new translations
2024-10-02 00:23 UTC	3.13	pt-br	4m 13s	new translations
2024-10-02 00:27 UTC	3.12	fr	3m 57s	new translations
2024-10-02 00:36 UTC	--FULL-	-BUILD--	3h 20m 27s	-----------
2024-10-02 01:16 UTC	3.14	pt-br	4m 31s	new translations
2024-10-02 01:25 UTC	--FULL-	-BUILD--	9m 18s	-----------
2024-10-02 02:20 UTC	--FULL-	-BUILD--	4m 37s	-----------
2024-10-02 03:20 UTC	--FULL-	-BUILD--	4m 52s	-----------
2024-10-02 04:20 UTC	--FULL-	-BUILD--	4m 29s	-----------
2024-10-02 05:16 UTC	3.12	en	2m 19s	Doc/ has changed
2024-10-02 05:18 UTC	3.12	es	4m 5s	Doc/ has changed
2024-10-02 05:22 UTC	3.12	fr	3m 49s	Doc/ has changed
2024-10-02 05:26 UTC	3.12	id	4m 8s	Doc/ has changed
2024-10-02 05:30 UTC	3.12	it	2m 59s	Doc/ has changed
2024-10-02 05:33 UTC	3.12	ja	5m 37s	Doc/ has changed
2024-10-02 05:39 UTC	3.12	ko	4m 3s	Doc/ has changed
2024-10-02 05:43 UTC	3.12	pl	2m 47s	Doc/ has changed
2024-10-02 05:46 UTC	3.12	pt-br	3m 27s	Doc/ has changed
2024-10-02 05:49 UTC	3.12	tr	3m 24s	Doc/ has changed
2024-10-02 05:53 UTC	3.12	uk	3m 40s	Doc/ has changed
2024-10-02 05:56 UTC	3.12	zh-cn	1h 0m 33s	Doc/ has changed
2024-10-02 06:57 UTC	3.12	zh-tw	1h 22m 37s	Doc/ has changed
2024-10-02 08:25 UTC	--FULL-	-BUILD--	3h 9m 1s	-----------
2024-10-02 09:16 UTC	3.14	uk	5m 35s	new translations
2024-10-02 09:21 UTC	3.14	zh-cn	1h 7m 2s	new translations
2024-10-02 10:29 UTC	3.13	pl	3m 35s	new translations
2024-10-02 10:32 UTC	3.13	uk	6m 3s	new translations
2024-10-02 10:38 UTC	3.13	zh-cn	1h 2m 27s	new translations
2024-10-02 11:41 UTC	3.12	uk	4m 30s	new translations
2024-10-02 11:51 UTC	--FULL-	-BUILD--	2h 35m 7s	-----------
2024-10-02 12:16 UTC	3.14	pl	3m 16s	new translations
2024-10-02 12:25 UTC	--FULL-	-BUILD--	9m 14s	-----------
2024-10-02 13:21 UTC	--FULL-	-BUILD--	5m 35s	-----------
2024-10-02 14:22 UTC	--FULL-	-BUILD--	6m 3s	-----------
2024-10-02 15:22 UTC	--FULL-	-BUILD--	6m 2s	-----------
2024-10-02 16:16 UTC	3.14	zh-cn	1h 7m 58s	new translations
2024-10-02 17:24 UTC	3.14	zh-tw	1h 29m 58s	Doc/ has changed
2024-10-02 18:54 UTC	3.13	zh-cn	1h 3m 49s	new translations
2024-10-02 20:04 UTC	--FULL-	-BUILD--	3h 48m 42s	-----------
2024-10-02 20:16 UTC	3.14	en	2m 46s	Doc/ has changed
2024-10-02 20:18 UTC	3.14	es	5m 42s	Doc/ has changed
2024-10-02 20:24 UTC	3.14	fr	4m 39s	Doc/ has changed
2024-10-02 20:29 UTC	3.14	id	4m 49s	Doc/ has changed
2024-10-02 20:34 UTC	3.14	it	3m 13s	Doc/ has changed
2024-10-02 20:37 UTC	3.14	ja	6m 55s	Doc/ has changed
2024-10-02 20:44 UTC	3.14	ko	4m 44s	Doc/ has changed
2024-10-02 20:49 UTC	3.14	pl	In progress...	...

Build times (no HTML)

Start	Version	Language	Build	Trigger
2024-10-02 06:07 UTC	3.14	en	21m 14s	Doc/ has changed
2024-10-02 06:28 UTC	3.14	es	1h 12m 9s	Doc/ has changed
2024-10-02 07:40 UTC	3.14	fr	22m 8s	new translations
2024-10-02 08:02 UTC	3.14	id	28m 8s	Doc/ has changed
2024-10-02 08:30 UTC	3.14	it	18m 51s	Doc/ has changed
2024-10-02 08:49 UTC	3.14	ja	50m 0s	Doc/ has changed
2024-10-02 09:39 UTC	3.14	ko	33m 57s	Doc/ has changed
2024-10-02 10:13 UTC	3.14	pl	20m 53s	new translations
2024-10-02 10:34 UTC	3.14	pt-br	30m 38s	new translations
2024-10-02 11:05 UTC	3.14	tr	1h 0m 15s	Doc/ has changed
2024-10-02 12:05 UTC	3.14	zh-cn	27m 1s	new translations
2024-10-02 12:32 UTC	3.14	zh-tw	20m 46s	Doc/ has changed
2024-10-02 12:53 UTC	3.13	fr	19m 27s	new translations
2024-10-02 13:12 UTC	3.13	pl	18m 44s	new translations
2024-10-02 13:31 UTC	3.13	pt-br	24m 20s	new translations
2024-10-02 13:55 UTC	3.13	zh-cn	27m 22s	new translations
2024-10-02 14:23 UTC	3.12	en	18m 54s	Doc/ has changed
2024-10-02 14:42 UTC	3.12	es	1h 3m 45s	Doc/ has changed
2024-10-02 15:46 UTC	3.12	fr	17m 18s	new translations
2024-10-02 16:03 UTC	3.12	id	27m 4s	Doc/ has changed
2024-10-02 16:30 UTC	3.12	it	20m 18s	Doc/ has changed
2024-10-02 17:05 UTC	3.12	ko	30m 31s	Doc/ has changed
2024-10-02 17:36 UTC	3.12	pl	20m 52s	Doc/ has changed
2024-10-02 17:57 UTC	3.12	pt-br	25m 28s	Doc/ has changed
2024-10-02 18:22 UTC	3.12	tr	1h 4m 14s	Doc/ has changed
2024-10-02 19:27 UTC	3.12	zh-cn	28m 16s	Doc/ has changed
2024-10-02 19:55 UTC	3.12	zh-tw	18m 54s	Doc/ has changed
2024-10-02 20:19 UTC	--FULL-	-BUILD--	14h 12m 19s	-----------

Taking the most recent 17 rebuilds for HTML-only:

Start	Duration
2024-10-01 02:04 UTC	8h 48m 34s
2024-10-01 04:19 UTC	2h 3m 10s
2024-10-01 07:44 UTC	2h 27m 60s
2024-10-01 15:12 UTC	6h 56m 4s
2024-10-01 20:40 UTC	5h 24m 53s
2024-10-02 00:36 UTC	3h 20m 27s
2024-10-02 01:25 UTC	9m 18s
2024-10-02 02:20 UTC	4m 37s
2024-10-02 03:20 UTC	4m 52s
2024-10-02 04:20 UTC	4m 29s
2024-10-02 08:25 UTC	3h 9m 1s
2024-10-02 11:51 UTC	2h 35m 7s
2024-10-02 12:25 UTC	9m 14s
2024-10-02 13:21 UTC	5m 35s
2024-10-02 14:22 UTC	6m 3s
2024-10-02 15:22 UTC	6m 2s
2024-10-02 20:04 UTC	3h 48m 42s

These have an average (mean) time of 8344s, or 2h 19m 4s. Excluding the builds for which no work was done (6), we have 11 builds with an average of 3h 32m 3s.
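The averages can be checked with a few lines of Python. The durations are copied from the table above; the True/False flag marks whether the run built at least one language, judged by whether any per-language rows precede its summary row in the HTML-only log. The helper names are only illustrative.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '8h 48m 34s' or '9m 18s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))


def fmt(seconds: float) -> str:
    """Format seconds as 'Hh Mm Ss', rounded to the nearest second."""
    hours, rem = divmod(round(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


# (duration as logged, did the run build anything?)
RUNS = [
    ("8h 48m 34s", True), ("2h 3m 10s", True),
    ("2h 27m 60s", True),  # as logged; equal to 2h 28m 0s
    ("6h 56m 4s", True), ("5h 24m 53s", True), ("3h 20m 27s", True),
    ("9m 18s", True), ("4m 37s", False), ("4m 52s", False),
    ("4m 29s", False), ("3h 9m 1s", True), ("2h 35m 7s", True),
    ("9m 14s", True), ("5m 35s", False), ("6m 3s", False),
    ("6m 2s", False), ("3h 48m 42s", True),
]

all_runs = [to_seconds(d) for d, _ in RUNS]
worked = [to_seconds(d) for d, built in RUNS if built]
mean_all = sum(all_runs) / len(all_runs)
mean_worked = sum(worked) / len(worked)

print(len(all_runs), fmt(mean_all))
print(len(worked), fmt(mean_worked))
```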

These numbers are significantly skewed by the Chinese translations (zh-cn and zh-tw), which take roughly an hour or more each. Excluding them, we have 109 HTML-only builds at an average (mean) time of 4m 15s.

We haven't yet observed a full rebuild for all versions and languages. This is to be expected: a benefit of splitting the workers is that the HTML job more often finds it has no work to do. The expected time for a full rebuild of all 13 languages and 3 versions is (11 x 4m15s + 2 x 66m) x 3, or just under nine hours (8:56:15).
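The projection is straightforward to reproduce; the variable names below are only illustrative.

```python
from datetime import timedelta

fast = timedelta(minutes=4, seconds=15)  # mean for the 11 non-Chinese languages
slow = timedelta(minutes=66)             # estimate for each Chinese build

per_version = 11 * fast + 2 * slow       # one version, all 13 languages
full_rebuild = 3 * per_version           # 3.12, 3.13 and 3.14

print(full_rebuild)  # 8:56:15
```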

This is a significant improvement to the status quo ante (c. 30 hours per rebuild), and will further improve dramatically when we resolve the issue with Chinese builds.

The Non-HTML archive builds are currently scheduled daily, and a basic projection estimates that a full rebuild of 12 languages x 3 versions would take just under 19 hours, so we have headroom here (though faster is of course better!).

Thank you to everyone involved in making this work happen, I'll now close this issue.

A

hugovk · 2024-10-02T22:46:22Z

And thank you @AA-Turner for all your work here!

Another idea to consider: a third cron for only the English HTML for the default /3 version. It takes about 3 minutes. We know from Plausible stats (numbers below from the July 2023 trial) that this one is served an order of magnitude more than the others.

But I agree, let's first try and figure out why the Chinese builds are so slow.

By language or version:

/3      8,596,047
/zh-cn    565,930
/3.11     484,517
/ja       380,342
/es       204,088
/ko       130,757
/fr       130,595
/pt-br    102,773
/zh-tw    100,002
/3.12      88,140
/3.13      42,304
/uk        18,591
/dev       17,835
/tr         7,404
/pl         5,085
/id         1,011
/it           147

By language and version:

/3           8,596,047
/zh-cn/3       502,665
/3.11          484,517
/ja/3          369,998
/es/3          187,400
/ko/3          125,512
/fr/3          124,258
/pt-br/3        94,112
/zh-tw/3        93,538
/3.12           88,140
/3.13           42,304
/zh-cn/3.11     40,785
/dev            17,835
/uk/3           17,270
/es/3.11        10,124
/zh-cn/3.12      9,552
/zh-cn/3.13      6,617
/zh-cn/dev       6,197
/tr/3            6,163
/ja/3.11         6,126
/es/dev          5,152
/pl/3            4,775
/pt-br/dev       3,964
/zh-tw/3.11      3,563
/pt-br/3.11      3,344
/fr/3.11         3,140
/ja/dev          2,816
/ko/dev          2,372
/ko/3.11         2,214
/fr/dev          1,991
/zh-tw/dev       1,247
/zh-tw/3.12      1,049
/fr/3.13           985
/uk/3.12           922
/ja/3.13           839
/tr/3.11           801
/es/3.13           745
/pt-br/3.13        692
/es/3.12           667
/pt-br/3.12        661
/zh-tw/3.13        604
/ja/3.12           563
/id/3              562
/ko/3.13           414
/tr/3.13           300
/ko/3.12           245
/fr/3.12           218
/id/3.11           208
/uk/3.11           199
/uk/dev            154
/tr/3.12           132
/pl/dev            123
/pl/3.11            98
/id/3.13            86
/id/dev             80
/it/3.11            77
/id/3.12            75
/uk/3.13            46
/pl/3.13            45
/pl/3.12            44
/it/3.12            44
/it/3.13            13
/it/3               13
/tr/dev              5

hugovk added the doc:cpython Related to cpython docs.python.org label Sep 12, 2024

AA-Turner mentioned this issue Sep 23, 2024

Doc: Add make dist-no-html python/cpython#124383

Merged

AA-Turner mentioned this issue Sep 25, 2024

Doc: Use major.minor for documentation distribution archive filenames python/cpython#124489

Merged

This was referenced Sep 25, 2024

Add --select-output python/docsbuild-scripts#199

Merged

Restore the HTML docsbuild cron job python/psf-salt#497

Merged

Doc: Run HTML and non-HTML daily builds separately python/cpython#124493

Merged

AA-Turner mentioned this issue Sep 25, 2024

Increase the build frequency of English HTML stable (and pre-release?) documentation python/docsbuild-scripts#163

Closed

AA-Turner mentioned this issue Sep 26, 2024

Doc: Use the short version for daily downloads python/cpython#124602

Merged

AA-Turner closed this as completed Oct 2, 2024

AA-Turner mentioned this issue Oct 2, 2024

Full build with PDF is taking more than 24h python/docsbuild-scripts#169

Closed

AA-Turner mentioned this issue Oct 10, 2024

Add an HTML-only (English) build variant python/docsbuild-scripts#219

Merged

hugovk mentioned this issue Dec 5, 2024

Bug: Missing docs archives for Python 3.14 alpha releases python/pythondotorg#2672

Closed
