Drop one of the two PDF sizes #128

Closed
hugovk opened this issue Sep 3, 2024 · 17 comments
Labels
discussion doc:cpython Related to cpython docs.python.org

Comments

@hugovk
Member

hugovk commented Sep 3, 2024

Helps python/docsbuild-scripts#169.

A full docs build cycle takes about 40 hours to build all versions × languages:
https://github.com/hugovk/last-updated/actions/runs/10689864099/job/29632960987#step:7:44

Here's one cycle:

Start Language/version Build
2024-08-31 18:07 zh-tw/3.14 1h 49m
2024-08-31 19:58 zh-cn/3.14 1h 38m
2024-08-31 21:38 uk/3.14 4m
2024-08-31 21:43 tr/3.14 1h 50m
2024-08-31 23:35 pt-br/3.14 44m
2024-09-01 00:20 pl/3.14 34m
2024-09-01 00:56 ko/3.14 51m
2024-09-01 01:49 ja/3.14 1h 24m
2024-09-01 03:15 it/3.14 32m
2024-09-01 03:49 id/3.14 44m
2024-09-01 05:03 es/3.14 1h 57m
2024-09-01 07:01 en/3.14 31m
2024-09-01 07:34 zh-tw/3.13 1h 42m
2024-09-01 09:16 zh-cn/3.13 1h 33m
2024-09-01 10:51 uk/3.13 3m
2024-09-01 10:56 tr/3.13 1h 47m
2024-09-01 12:45 pt-br/3.13 43m
2024-09-01 13:28 pl/3.13 33m
2024-09-01 14:03 ko/3.13 51m
2024-09-01 14:55 ja/3.13 1h 20m
2024-09-01 16:17 it/3.13 34m
2024-09-01 16:52 id/3.13 43m
2024-09-01 18:06 es/3.13 1h 58m
2024-09-01 20:05 en/3.13 32m
2024-09-01 20:39 zh-tw/3.12 1h 45m
2024-09-01 22:26 zh-cn/3.12 1h 32m
2024-09-02 00:01 uk/3.12 3m
2024-09-02 00:06 tr/3.12 1h 51m
2024-09-02 01:59 pt-br/3.12 40m
2024-09-02 02:41 pl/3.12 33m
2024-09-02 03:17 ko/3.12 43m
2024-09-02 04:02 ja/3.12 1h 5m
2024-09-02 05:10 it/3.12 32m
2024-09-02 05:44 id/3.12 45m
2024-09-02 06:31 fr/3.12 30m
2024-09-02 07:04 es/3.12 1h 53m
2024-09-02 08:59 en/3.12 31m

The Ukrainian ones are HTML-only and take 3-4 minutes. The others build a full set and take between 30 minutes and 2 hours.
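
The build times in the table use a compact format such as `1h 49m` or `4m`. As a minimal sketch of how they could be totalled in Python (the helper names `parse_duration` and `total` are hypothetical, and the sample values are the three zh-tw rows from the table):

```python
import re
from datetime import timedelta

# Matches strings such as "1h 49m", "44m" or "2h" (format taken from the table).
_DURATION = re.compile(r"^(?:(\d+)h)?\s*(?:(\d+)m)?$")


def parse_duration(text: str) -> timedelta:
    """Convert a build time like '1h 49m' into a timedelta."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes)


def total(durations: list[str]) -> timedelta:
    """Sum a list of duration strings."""
    return sum((parse_duration(d) for d in durations), timedelta())


# The three zh-tw rows from the table above (3.14, 3.13, 3.12).
zh_tw = ["1h 49m", "1h 42m", "1h 45m"]
print(total(zh_tw))
```

Feeding in every row of the table gives the summed build time for a cycle; gaps between builds are not included.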

Most of this time is spent building PDFs. Looking at the numbers at python/docsbuild-scripts#169, from building locally, about 83% is building both the A4 and US Letter PDFs.

The A4 and Letter PDFs each take about the same time to build.

I don't think we need to build two different PDF sizes.

I expect many who download a PDF will use it on a device screen, and a slight aspect difference won't make much difference. And for people who also print them, it shouldn't matter too much either: PDF viewers can auto-resize to fit the local paper. These files haven't been carefully laid out, they're autogenerated so we don't need pixel-perfect output.

I propose we drop one of the two PDF formats, which should save us a huge amount of build time and let us get other docs builds out faster.

I don't mind which one, but I'll suggest dropping Letter, used in the US and Canada, and keeping A4, an international standard used globally.

A rough calculation shows this would decrease a full build cycle from around 40 hours to 24 hours.
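
The rough calculation can be reproduced along these lines. This is a sketch only: it assumes the 83% PDF share measured locally applies uniformly to the whole cycle and that A4 and Letter each account for half of it. The function name is hypothetical.

```python
def cycle_without_one_pdf(cycle_hours: float, pdf_share: float) -> float:
    """Estimate cycle time after dropping one of two equally slow PDF builds."""
    one_pdf_share = pdf_share / 2  # A4 and Letter take about the same time
    return cycle_hours * (1 - one_pdf_share)


estimate = cycle_without_one_pdf(cycle_hours=40, pdf_share=0.83)
print(f"{estimate:.1f} hours")
```

The HTML-only Ukrainian builds gain nothing from the change, so the real saving would be slightly smaller than this estimate.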

Thoughts?

@hugovk hugovk added discussion doc:cpython Related to cpython docs.python.org labels Sep 3, 2024
@AA-Turner
Member

PDF viewers can auto-resize to fit the local paper. These files haven't been carefully laid out, they're autogenerated

I think this is important; we are not painstakingly typesetting a manual here. Scaling an A4 page down to print on US Letter paper leaves slightly more whitespace, but I think this trade-off is worth it. If a reader absolutely requires the US Letter version, it can still be built - we just won't provide it.

I also agree that keeping the A4 paper version makes more sense out of the two.

A

@zware
Member

zware commented Sep 3, 2024

Completely on board with producing only one size. I'm not sure how serious I am in suggesting dropping both and instead producing one with a custom size of 210x279 mm.

Or A0.
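
For reference, 210x279 mm is roughly the largest page that fits inside both formats. A small sketch, using the standard dimensions of A4 (210 x 297 mm) and US Letter (8.5 x 11 in); the function name is hypothetical:

```python
# Page sizes in millimetres (width, height).
A4 = (210.0, 297.0)
LETTER = (215.9, 279.4)  # 8.5 x 11 inches


def common_area(a, b):
    """Largest page that fits inside both sizes."""
    return (min(a[0], b[0]), min(a[1], b[1]))


print(common_area(A4, LETTER))
```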

@hugovk
Member Author

hugovk commented Sep 4, 2024

I'm not sure how serious I am in suggesting dropping both and instead producing one with a custom size of 210x279 mm.

You're not the first to suggest letter height and A4 width! :) https://graphicdesign.stackexchange.com/a/38920/41266 But yeah, let's just pick one.

Another option is dropping both PDFs and producing something like a single-file HTML instead (although we don't have a single PDF for the whole docs right now, we generate one PDF per section).

NumPy dropped PDF entirely a few releases back (1.25, June 2023) -- https://numpy.org/doc/ -- and I hear they haven't had any problems.

I expect building a single HTML should be much, much quicker than PDFs. As shown above, the current HTML takes ~3 minutes, a full build takes 0.5-2 hours and is mostly PDF building.

But we do get the occasional bug report about PDFs so I'm not proposing this (at least not just yet ;)

@humitos
Contributor

humitos commented Sep 4, 2024

I don't mind which one, but I'll suggest dropping Letter, used in the US and Canada, and keeping A4, an international standard used globally.

I agree with dropping Letter and keeping A4.

And for people who also print them, it shouldn't matter too much either: PDF viewers can auto-resize to fit the local paper. These files haven't been carefully laid out, they're autogenerated so we don't need pixel-perfect output.

I'd say that people who want to print it will probably build their own PDF version, adding a lot of effort on top of the preliminary LaTeX version output by Sphinx. We usually print just the tutorial for Python Argentina, and we created a whole project for that to get a print-quality book: https://github.com/PyAr/tutorial-en-papel

That said, I wouldn't worry too much about "people wanting to print it" because they would probably do something different by themselves.

I expect building a single HTML should be much, much quicker than PDFs

In my experience, this could be an issue when the resulting file is too big --in particular with memory issues. If you have the chance to give it a quick try, that would be good data to know how it performs in the current servers.

@methane
Member

methane commented Sep 4, 2024

  • We can generate a PDF from the single-file HTML by printing the HTML with a headless browser.
  • We provide ePub already.
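
A minimal sketch of the headless-browser idea, assuming Chrome or Chromium is installed. The binary name `google-chrome` and the file names are assumptions; `--headless` and `--print-to-pdf` are Chrome command-line switches. This is not something the docs build does today.

```python
import subprocess
from pathlib import Path


def print_to_pdf_command(html: Path, pdf: Path, browser: str = "google-chrome") -> list[str]:
    """Build a headless Chrome/Chromium command that prints an HTML file to PDF."""
    return [
        browser,
        "--headless",
        f"--print-to-pdf={pdf}",
        html.resolve().as_uri(),  # the browser expects a URL, so use file://...
    ]


def print_to_pdf(html: Path, pdf: Path, browser: str = "google-chrome") -> None:
    """Run the browser; raises CalledProcessError if it exits with an error."""
    subprocess.run(print_to_pdf_command(html, pdf, browser), check=True)


# Hypothetical file names, only to show the resulting command line.
cmd = print_to_pdf_command(Path("single.html"), Path("single.pdf"))
print(" ".join(cmd))
```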

@hugovk
Member Author

hugovk commented Sep 4, 2024

In my experience, this could be an issue when the resulting file is too big --in particular with memory issues. If you have the chance to give it a quick try, that would be good data to know how it performs in the current servers.

Building singlehtml took 57s on my macOS, peaking at 2.32 GB (compared with 38s and 463 MB for regular html). It produced 3 HTML files: index.html and download.html are the same as now. contents.html is the biggie at 41.5 MB.

Opening the print dialog in Chrome shows "Loading preview..." for a few seconds before the tab crashes with "Aw, Snap!". So a headless browser could be the way for that.
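
Build time and peak memory like the figures quoted above could be gathered with something like the following. This is a sketch, not necessarily how those numbers were measured; it uses the Unix-only `resource` module, and the `sphinx-build` paths in the comment are assumptions.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd: list[str]) -> tuple[float, int]:
    """Run cmd; return (elapsed seconds, peak RSS of child processes).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


# Stand-in command so the sketch runs anywhere; for the docs it might be
# something like ["sphinx-build", "-b", "singlehtml", "Doc", "build/singlehtml"].
elapsed, peak = run_and_measure([sys.executable, "-c", "pass"])
print(f"{elapsed:.2f}s, peak RSS {peak}")
```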

@hugovk
Member Author

hugovk commented Sep 9, 2024

We'd at least need the following to drop letter PDF.

  1. The docs server script runs the autobuild-dev target of CPython's Doc/Makefile, which runs the dist target, so this chunk would need removing from dist:

    https://github.com/python/cpython/blob/05a401a5c3e385286a346df6f0b463b35df871b2/Doc/Makefile#L225-L233

  2. The release scripts also run make dist, and need these two lines removed so they don't wait for letter files to be ready:

    https://github.com/python/release-tools/blob/ef1065f6c417e26527433305ec8458f363ef4c83/run_release.py#L532-L533

We'd need to coordinate backports of (1) to make sure the release scripts don't wait indefinitely for letter PDF files that will never show up, and/or include a version check in (2).

Anything else?
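
To illustrate the version-check idea for (2): a purely hypothetical sketch, not the actual logic in run_release.py. The archive names follow the pattern that appears on the docs server later in this thread (for example `python-3.14.0a0-docs-pdf-a4.zip`); the function name, the `letter_dropped` flag and the assumption that A4 has both `.zip` and `.tar.bz2` archives are illustrative.

```python
def expected_pdf_archives(version: str, letter_dropped: bool) -> list[str]:
    """List the PDF archives a release script might wait for.

    Whether a given branch has dropped letter is passed in explicitly,
    since that depends on which backports have been merged.
    """
    sizes = ["a4"] if letter_dropped else ["a4", "letter"]
    return [
        f"python-{version}-docs-pdf-{size}.{ext}"
        for size in sizes
        for ext in ("zip", "tar.bz2")
    ]


print(expected_pdf_archives("3.14.0a0", letter_dropped=True))
```

A script that waits only for the files in this list would not hang on letter archives that are never produced.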

@zware
Member

zware commented Sep 9, 2024

We'd need to coordinate backports of (1) to make sure the release scripts don't wait indefinitely for letter PDF files that will never show up, and/or include a version check in (2).

It seems like (2) could go first, and (1) backports can then be done at leisure? The release script is looking for files that are produced after pdf-letter and doesn't appear to care about the exact contents of docs/ after that, so that ordering should be safe whether or not pdf-letter artifacts are actually produced, especially since we're phasing out the potentially-missing artifacts anyway.

@hugovk
Member Author

hugovk commented Sep 10, 2024

Here's (2): python/release-tools#168

@willingc
Collaborator

Completely agree with dropping Letter PDF. I would be fine at some point following NumPy's lead and dropping A4 PDF too. It seems environmentally wasteful to run builds for hours.

@hugovk
Member Author

hugovk commented Sep 10, 2024

Here's (1): python/cpython#123912

@hugovk
Member Author

hugovk commented Sep 15, 2024

They're both merged for 3.14/main; I haven't merged the backports yet.

A 3.14 batch has been built, and https://docs.python.org/3.14/download.html lists only a single PDF size, A4, as expected.

However, there are still old letter versions on the server from the last time they were built:

Confirming by logging into the docs build server:

hugovk@docs:/mnt/volume_nyc3_07/docs.python.org$ ll 3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
-rw-rw-r-- 1 docsbuild docs 18886596 Sep 12 16:56 3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
hugovk@docs:/mnt/volume_nyc3_07/docs.python.org$ ll 3.14/archives/python-3.14.0a0-docs-pdf-a4.zip
-rw-rw-r-- 1 docsbuild docs 18728661 Sep 15 10:48 3.14/archives/python-3.14.0a0-docs-pdf-a4.zip

We don't want to keep serving out-of-date letter files, so I'll manually delete them (or move them to my home dir for a while), then we can do the backports, and repeat for those. Sounds good?

@hugovk
Member Author

hugovk commented Sep 16, 2024

Done, tested with --dry-run first:

hugovk@docs:/mnt/volume_nyc3_07/docs.python.org$ rsync --archive --verbose --prune-empty-dirs --include='*/' --include='python-3.14.0a0-docs-pdf-letter.*' --exclude='*' --remove-source-files /mnt/volume_nyc3_07/docs.python.org ~/letter-pdfs/
building file list ... done
docs.python.org/
docs.python.org/3.14/
docs.python.org/3.14/archives/
docs.python.org/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/es/
docs.python.org/es/3.14/
docs.python.org/es/3.14/archives/
docs.python.org/es/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/es/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/fr/
docs.python.org/fr/3.14/
docs.python.org/fr/3.14/archives/
docs.python.org/fr/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/fr/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/id/
docs.python.org/id/3.14/
docs.python.org/id/3.14/archives/
docs.python.org/id/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/id/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/it/
docs.python.org/it/3.14/
docs.python.org/it/3.14/archives/
docs.python.org/it/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/it/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/ja/
docs.python.org/ja/3.14/
docs.python.org/ja/3.14/archives/
docs.python.org/ja/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/ja/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/ko/
docs.python.org/ko/3.14/
docs.python.org/ko/3.14/archives/
docs.python.org/ko/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/ko/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/pl/
docs.python.org/pl/3.14/
docs.python.org/pl/3.14/archives/
docs.python.org/pl/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/pl/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/pt-br/
docs.python.org/pt-br/3.14/
docs.python.org/pt-br/3.14/archives/
docs.python.org/pt-br/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/pt-br/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/tr/
docs.python.org/tr/3.14/
docs.python.org/tr/3.14/archives/
docs.python.org/tr/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/tr/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/zh-cn/
docs.python.org/zh-cn/3.14/
docs.python.org/zh-cn/3.14/archives/
docs.python.org/zh-cn/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/zh-cn/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip
docs.python.org/zh-tw/
docs.python.org/zh-tw/3.14/
docs.python.org/zh-tw/3.14/archives/
docs.python.org/zh-tw/3.14/archives/python-3.14.0a0-docs-pdf-letter.tar.bz2
docs.python.org/zh-tw/3.14/archives/python-3.14.0a0-docs-pdf-letter.zip

sent 517,389,459 bytes  received 1,308 bytes  206,956,306.80 bytes/sec
total size is 517,015,797  speedup is 1.00
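
For readers less familiar with rsync's include/exclude filters, the same move-with-dry-run idea can be sketched in Python. This is an illustration on a throwaway directory, not what was run on the server; the function name is hypothetical, and unlike the rsync call above it keeps paths relative to the source root rather than recreating the `docs.python.org/` directory at the destination.

```python
import shutil
import tempfile
from pathlib import Path


def move_matching(root: Path, dest: Path, pattern: str, dry_run: bool = True) -> list[Path]:
    """Move files matching pattern under root into dest, keeping relative paths."""
    matches = sorted(p for p in root.rglob(pattern) if p.is_file())
    for src in matches:
        target = dest / src.relative_to(root)
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(target))
    return matches


# Demonstration on a temporary tree rather than the real server.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "docs.python.org"
    letter = root / "es" / "3.14" / "archives" / "python-3.14.0a0-docs-pdf-letter.zip"
    letter.parent.mkdir(parents=True)
    letter.touch()
    a4 = letter.with_name("python-3.14.0a0-docs-pdf-a4.zip")
    a4.touch()

    dest = Path(tmp) / "letter-pdfs"
    pattern = "python-3.14.0a0-docs-pdf-letter.*"
    planned = move_matching(root, dest, pattern)             # dry run: nothing moves
    still_there = letter.exists()
    moved = move_matching(root, dest, pattern, dry_run=False)
    letter_gone = not letter.exists()
    a4_kept = a4.exists()
    arrived = (dest / letter.relative_to(root)).exists()
    print([str(p.relative_to(root)) for p in moved])
```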

Next, backports for (1).

I've merged the 3.12 backport: python/cpython#123999.

Please could I have a core dev review for the 3.13 backport (because of the RC)? python/cpython#123998

@zware
Member

zware commented Sep 16, 2024

Please could I have a core dev review for the 3.13 backport (because of the RC)? python/cpython#123998

At this point you need Thomas, not just another core dev :). 3.13 is locked until 3.13.0 final.

@hugovk
Member Author

hugovk commented Sep 16, 2024

Yup, but I think it'll help Thomas if it's already had a second reviewer by the time he checks it (and the other 48 pending PRs). And if I've missed something, I can update it right now, before Thomas gets around to it.

@hugovk
Member Author

hugovk commented Sep 18, 2024

Comparing a set of 3.14 builds from before dropping letter PDF:

Start Language/version Build
2024-09-12 02:07 zh-tw/3.14 1h 59m
2024-09-12 04:07 zh-cn/3.14 1h 41m
2024-09-12 05:51 uk/3.14 4m
2024-09-12 05:56 tr/3.14 2h 3m
2024-09-12 08:01 pt-br/3.14 47m
2024-09-12 08:49 pl/3.14 37m
2024-09-12 09:28 ko/3.14 58m
2024-09-12 10:28 ja/3.14 1h 30m
2024-09-12 12:01 it/3.14 38m
2024-09-12 12:40 id/3.14 50m
2024-09-12 14:04 es/3.14 2h 18m
2024-09-12 16:24 en/3.14 36m
Total 14h 06m

Compared with this morning's set, built after dropping letter PDF (for a fair comparison, fr/3.14 at 23m is excluded because it didn't build earlier):

Start Language/version Build
2024-09-18 01:07 zh-tw/3.14 1h 45m
2024-09-18 02:53 zh-cn/3.14 1h 28m
2024-09-18 04:21 uk/3.14 4m
2024-09-18 04:26 tr/3.14 57m
2024-09-18 05:24 pt-br/3.14 29m
2024-09-18 05:54 pl/3.14 24m
2024-09-18 06:18 ko/3.14 37m
2024-09-18 06:56 ja/3.14 54m
2024-09-18 07:51 it/3.14 20m
2024-09-18 08:12 id/3.14 31m
2024-09-18 09:07 es/3.14 1h 10m
2024-09-18 10:17 en/3.14 22m
Total 9h 16m

That's 1.6 times as fast, saving about 5 hours per version set, and about 15 hours for the whole loop!
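
The comparison can be recomputed from the two stated totals. A small sketch; the factor of three for the whole loop assumes the 3.12, 3.13 and 3.14 sets each save about the same amount:

```python
from datetime import timedelta

before = timedelta(hours=14, minutes=6)  # total with A4 and letter PDFs
after = timedelta(hours=9, minutes=16)   # total with A4 only

saved = before - after
speedup = before / after      # timedelta / timedelta gives a float
per_loop = saved * 3          # assumption: three version sets per loop

print(f"saved {saved} per set, {per_loop} per loop, {speedup:.2f}x as fast")
```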

@AA-Turner
Member

python/cpython#123998 has been merged; the final PR.

A
