Developing the Web Almanac

The Web Almanac can be developed on macOS, Windows or Linux. It requires Node v20, Python v3.12 and pip to be installed. Alternatively, you can use Docker to avoid configuring the development environment manually, or deploy it as a development container in GitHub Codespaces to develop in the cloud.

Style guide

The Web Almanac uses a specific "style guide" for code, including 2 spaces for indentation (except Python which uses 4 spaces).

An .editorconfig file exists for those using EditorConfig to help enforce those styles. This may require installing the editorconfig node module via npm (we suggest installing it globally with npm install -g editorconfig) and then installing the extension for your IDE (for example, there is a Visual Studio Code extension).

Run Locally

Make sure you run the following commands from within the src directory by executing cd src first.

Make sure Python (3.12 or above), pip and NodeJS (v20 or above) are installed on your machine.
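As a quick pre-flight check of those minimum versions, a small script along these lines could be used. This is only a sketch: the meets_minimum helper is hypothetical and not part of the repository, and the Node version string is a sample value standing in for the output of node --version.

```python
import sys


def meets_minimum(version, minimum):
    """Compare a dotted version string such as 'v20.11.0' against a minimum tuple."""
    # Strip an optional leading "v" and keep only as many parts as the minimum has.
    parts = tuple(int(p) for p in version.lstrip("v").split(".")[:len(minimum)])
    return parts >= minimum


# Version of the interpreter running this script.
python_version = ".".join(str(n) for n in sys.version_info[:3])
print("Python OK:", meets_minimum(python_version, (3, 12)))

# Sample value only; paste the output of `node --version` here.
print("Node OK:", meets_minimum("v20.11.0", (20,)))
```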

  1. If you don't have virtualenv, install it using pip.
sudo pip install virtualenv

Or for Windows

py -m pip install --user virtualenv
  2. Create and activate an isolated Python environment:
virtualenv --python python3.12 .venv
source .venv/bin/activate

Or for those on Windows:

virtualenv --python python3 .venv
.venv\Scripts\activate.bat
  3. Install dependencies, then generate and run the website:
npm install
npm run start
  4. In your web browser, enter the following address: http://127.0.0.1:8080

To stop the server run the following:

npm run stop

More details on the npm run start script

The npm run start command is a wrapper script that performs the following steps:

  • pip install -r requirements.txt - Installs any python requirements
  • python main.py background * - to run the website, so you can see it while the rest of the steps are being followed.
  • npm run generate - to convert all the chapters from markdown to HTML
  • npm run pytest - to test the python code
  • npm run test - to test all the URLs
  • npm run watch & - to run the chapter watcher, which automatically regenerates a chapter when you edit and save it.
  • python main.py - we kill the web server and restart it in debug mode, so that it restarts automatically if you change any Python code.

These individual steps are listed in more detail below, but that one npm run start script should be enough for most people to get started.

One thing to note is that you may need to kill and restart the python web server (python main.py) if you change the JSON config files as they are not picked up dynamically. You do not need to do that for chapter edits as, while in debug mode, the Python web server will serve a fresh copy of that each time.

Generating chapters manually

Note that chapters are generated automatically by the npm run start command above, and individual chapters are regenerated automatically when their markdown is changed. The manual steps are listed here separately in case you want to regenerate chapters yourself.

Install the dependencies:

npm install

Run the generate chapters script:

npm run generate

If you have the server up you can test all the pages are being served correctly:

npm run test

You can also run single chapters, so you don't have to wait for the full run time:

npm run generate en/2019/css

Or even patterns (note that patterns must be in quotes to prevent the shell from attempting to match them against file names):

npm run generate ".*/2019/css"
npm run generate "en/.*/css"
npm run generate ".*/2020/.*"
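To get a feel for which chapters a pattern would select, the sketch below treats each pattern as a regular expression matched against the whole lang/year/chapter path. That matching rule is an assumption made for illustration, and select_chapters and the sample chapter list are hypothetical rather than taken from the generate script.

```python
import re


def select_chapters(pattern, chapters):
    """Return the chapter paths that the pattern matches in full."""
    regex = re.compile(pattern)
    return [chapter for chapter in chapters if regex.fullmatch(chapter)]


# Hypothetical chapter paths in lang/year/chapter form.
chapters = ["en/2019/css", "ja/2019/css", "en/2020/css", "en/2020/pwa"]

for pattern in ["en/2019/css", ".*/2019/css", "en/.*/css", ".*/2020/.*"]:
    print(pattern, "->", select_chapters(pattern, chapters))
```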

There is also a file watcher, which monitors the content directory and automatically regenerates a chapter when it sees it being modified (this is run automatically as part of npm run start):

npm run watch

Generating chapter images

We can automate the generation of chapter images from the command line to save this onerous task.

This requires the figure markup to exist in the chapter's markdown file, including the image and chart_url attributes:

{{ figure_markup(
  image="pwa-timeseries-of-service-worker-installations.png",
  ...
  chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vRRpTSA4fsHwUap-ByQ08j95uo7Zm1kY6lTSvA-DZT54g2QZ0guV7db3QyQwQgMPzsKsJ43gbzqfJst/pubchart?oid=1883263914&format=interactive",
  ...
  )
}}

It can be run like below, by passing a chapter markdown (with or without the .md extension):

npm run figure-images en/2021/pwa

Which will then generate any missing figures based on the chapter markup, skipping images that already exist:

> [email protected] figure-images
> node ./tools/generate/generate_figure_images "en/2021/pwa"

Generating for chapter: pwa for year 2021
  Skipping: pwa-service-worker-controlled-pages-by-rank.png as image already exists
  Skipping: pwa-most-used-service-worker-events.png as image already exists
  Skipping: pwa-service-worker-and-manifest-usage.png as image already exists
  Skipping: pwa-top-pwa-manifest-properties.png as image already exists
  Skipping: pwa-top-pwa-manifest-icon-sizes.png as image already exists
  Skipping: pwa-manifest-display-values.png as image already exists
  Skipping: pwa-manifests-preferring-native-app.png as image already exists
  Skipping: pwa-industry-categories.png as image already exists
  Skipping: pwa-lighthouse-pwa-audits.png as image already exists
  Skipping: pwa-lighthouse-pwa-scores.png as image already exists
  Skipping: pwa-libraries-and-scripts.png as image already exists
  Skipping: pwa-top-workbox-versions.png as image already exists
  Skipping: pwa-top-workbox-packages.png as image already exists
  Generating image pwa-workbox-runtime-caching-strategies.png...
  Generating image pwa-notification-acceptance-rates.png...
  Generating image pwa-install-events.png...
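The skip-or-generate decision in the output above can be sketched roughly as follows. The real tool is a Node script; this Python version only illustrates the idea, and missing_images, the sample markup and the image directory path are all hypothetical.

```python
import re
from pathlib import Path

# Matches the image="..." attribute inside a figure_markup call.
IMAGE_ATTR = re.compile(r'\bimage="([^"]+)"')


def missing_images(markdown, image_dir):
    """List image names referenced in the markdown that do not exist in image_dir."""
    names = IMAGE_ATTR.findall(markdown)
    return [name for name in names if not (Path(image_dir) / name).exists()]


sample = '''
{{ figure_markup(
  image="pwa-install-events.png",
  chart_url="https://example.com/chart"
  )
}}
'''

# Hypothetical directory; any image not found there would need generating.
print(missing_images(sample, "static/images/2021/pwa"))
```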

To regenerate an image, authors can delete it and rerun the command.

Images will automatically be compressed by our Calibre GitHub Action when uploaded to GitHub, but you can get a lot more compression (about 44% more!) by running them through https://tinypng.com instead (at which point the Calibre Action will usually not find any further compression gains). It's quite simple to upload and download them, so we encourage analysts/authors to take this step.

Running them through https://tinypng.com also has the added advantage of the compression being repeatable each time. So if you are not sure which images you have changed, you can delete them all, regenerate them all, run them through TinyPNG, and then a git diff will only show differences on the images that have changed. This will not be the case if you use the Calibre GitHub Action and it will look like all images have changed.

Linting files

When you push changes to GitHub they will be linted using the GitHub Super-Linter.

It is possible to run the Super-Linter locally if you have Docker installed.

Note: Be aware that this is a BIG Docker image and takes a while to download. For simple linting error details you can rely on GitHub telling you the errors rather than running it locally. For SQL linting we can run this outside of the GitHub Super-Linter (see below).

First, pull the Super-Linter Docker image (this takes some time to download, but you only need to do it once, or when you want to upgrade the version of the Super-Linter):

docker pull github/super-linter:slim-latest

Then to run the linting do this:

npm run lint

This can take a while to run so you can lint just subsets of files or folders:

npm run lint tools/generate templates/base

Linting SQL files

SQL files will be linted by SQLFluff. This tool adds the ability to autofix some simple errors (e.g. spacing, capitalisation and commas) when run locally. To lint the 2020 resource-hints SQL files, for example, install the Python environment as described above, and then issue the following command:

sqlfluff lint ../sql/2020/resource-hints

This will return errors like the following:

% sqlfluff lint ../sql/2020/resource-hints
== [../sql/2020/resource-hints/adoption_service_workers.sql] FAIL
L:  25 | P:  26 | L010 | Inconsistent capitalisation of keywords.
L:  26 | P:  37 | L010 | Inconsistent capitalisation of keywords.
L:  34 | P:  63 | L038 | Trailing comma in select statement forbidden
All Finished 📜 🎉!

This states that:

  • On line 25, in position 26 you are using lowercase for keywords (e.g. as instead of AS) so failed rule L010.
  • Similarly on line 26, position 37.
  • And finally on line 34, position 63 you have an unnecessary comma (e.g. SELECT a,b, FROM table) and so failed rule L038. Remove the extra comma.
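If you want to summarise a long lint report, the line, position and rule fields can be pulled out of each violation line. The sketch below assumes the output format shown above (other SQLFluff versions may print violations differently), and parse_violations is a hypothetical helper, not part of SQLFluff.

```python
import re
from collections import Counter

# Matches lines such as "L:  25 | P:  26 | L010 | Inconsistent capitalisation of keywords."
VIOLATION = re.compile(r"L:\s*(\d+)\s*\|\s*P:\s*(\d+)\s*\|\s*(\w+)\s*\|\s*(.+)")


def parse_violations(output):
    """Return (line, position, rule, message) tuples for each violation line."""
    violations = []
    for line in output.splitlines():
        match = VIOLATION.match(line.strip())
        if match:
            line_no, position, rule, message = match.groups()
            violations.append((int(line_no), int(position), rule, message.strip()))
    return violations


sample = """== [../sql/2020/resource-hints/adoption_service_workers.sql] FAIL
L:  25 | P:  26 | L010 | Inconsistent capitalisation of keywords.
L:  26 | P:  37 | L010 | Inconsistent capitalisation of keywords.
L:  34 | P:  63 | L038 | Trailing comma in select statement forbidden
"""

violations = parse_violations(sample)
# Count how many times each rule failed.
print(Counter(rule for _, _, rule, _ in violations))
```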

The list of rules can be found in the SQLFluff documentation, though we have turned a few of them off and configured others for our style (see our .sqlfluff file if curious).

If you see any "unparseable" or PRS errors, then this is either an error in your code, or perhaps you've discovered a bug. Reach out to Barry (@tunetheweb) for help if stuck.

To attempt to autofix the errors you can use the fix command, instead of lint:

sqlfluff fix ../sql/2020/resource-hints

Which will produce similar output but with an offer to fix the issues it thinks it can fix:

% sqlfluff fix ../sql/2020/resource-hints
==== finding fixable violations ====
== [../sql/2020/resource-hints/adoption_service_workers.sql] FAIL
L:  25 | P:  26 | L010 | Inconsistent capitalisation of keywords.
L:  26 | P:  37 | L010 | Inconsistent capitalisation of keywords.
L:  34 | P:  63 | L038 | Trailing comma in select statement forbidden
==== fixing violations ====
3 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n]

If you lint again you should see that most of the errors are fixed. Note that not all errors can be autofixed and some will require manual intervention, but autofixing is useful for the simple errors. So while it's generally OK to run the fix command, do run the lint command again afterwards to make sure the fix command didn't miss any issues.

Generating Ebooks

To generate PDFs of the ebook, you need to install Prince and pdftk. Follow the instructions on the Prince website.

To actually generate the ebooks, start your local server, then run the following:

npm run ebooks

There is a GitHub Action which can be run manually from the Actions tab to generate the Ebooks and store the resulting files in an artifact. This is the easiest way to generate them.

Generating ebooks - including print-ready ebooks if you want a hardcopy

It is also possible to generate the ebook from the website (either production or 127.0.0.1), with some optional params (e.g. to print it with different settings).

prince "http://127.0.0.1:8080/en/2019/ebook?print" -o web_almanac_2019_en.pdf --pdf-profile='PDF/UA-1'

Note that --pdf-profile='PDF/UA-1' may not be needed if you just intend to print.

Query params accepted are:

  • print - this adds left and right pages and footnotes, and sets roman numerals for front matter page numbers. It is used by default when running npm run ebooks, but we could change that if we prefer a less print-like ebook.
  • printer - this adds crop marks, bleeds and trims. It also adds two additional pages at the front, which need to be deleted in Acrobat or similar to get a clean starting page.
  • page-size - this allows you to override the default page size of A5.
  • inside-margin - this allows you to set an inside margin for binding (e.g. on right for left hand pages and vice versa)
  • bleed - add a bleed for printing (3mm by default)
  • prince-trim - add a trim for printing (5mm by default)
  • base-font-size - set the base font-size (10px by default), which is useful if changing page size.
  • max-fig-height - defaults to 610px (for A5) and prevents large images from causing overflows on to other pages with heading and caption.
  • cover - this generates a 4 page cover (front cover + spine + back cover, and the same on the inside, which is blank). This ignores the above options but has further params discussed below.

You can also download the HTML and override the inline styles there if you want to customise this for something we haven't exposed as a param, and then run prince against the file.

So for a printer-ready A5 version that you can send to a printer to bind, you can do the following:

prince "http://127.0.0.1:8080/en/2019/ebook?print&printer" -o static/pdfs/web_almanac_2019_en_print_A5.pdf

This is the same as below since it uses all the default settings:

prince "http://127.0.0.1:8080/en/2019/ebook?print&printer&page-size=A5&inside-margin=19.5mm&bleed=3mm&prince-trim=5mm&base-font-size=10px" -o static/pdfs/web_almanac_2019_en_print_A5.pdf
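When experimenting with several combinations of params, it can be easier to build the URL programmatically than to edit it by hand. The following is a minimal sketch: ebook_url is a hypothetical helper, and it assumes value-less params (print, printer) are simply joined with & ahead of the key=value params, as in the commands above.

```python
from urllib.parse import urlencode


def ebook_url(year, lang="en", flags=(), options=None, host="http://127.0.0.1:8080"):
    """Build an ebook URL from value-less flags and key=value options."""
    query = list(flags)
    if options:
        # urlencode keeps the dict's insertion order.
        query.append(urlencode(options))
    return f"{host}/{lang}/{year}/ebook?" + "&".join(query)


# The documented defaults for a printer-ready A5 version.
defaults = {
    "page-size": "A5",
    "inside-margin": "19.5mm",
    "bleed": "3mm",
    "prince-trim": "5mm",
    "base-font-size": "10px",
}

print(ebook_url(2019, flags=("print", "printer")))
print(ebook_url(2019, flags=("print", "printer"), options=defaults))
```

The resulting string can then be passed to prince in place of the quoted URL.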

Note that this will create two extra pages at the beginning, which will need to be removed with a PDF editor (e.g. Adobe Acrobat) to start with a clean page on the right-hand side for printing. Please remove these before deploying.

It is also possible to generate a cover using the cover URL param. This consists of 2 pages: the first is a double-width page with the front and back cover as one page (with the spine in between), and the second is a blank inside page.

prince "http://127.0.0.1:8080/en/2019/ebook?cover" -o static/pdfs/web_almanac_2019_en_cover_A5.pdf

Extra params accepted for the cover are (note spine and pageWidth are unit-less to allow for easy addition in the code):

  • spine - the width of the spine (defaults to 25 for 2019 and 34.5 for 2020)
  • spinePadding - padding of the spine (defaults to "(spine - 25) / 2") to allow spine to look similar across years even though 2020 was a lot thicker than 2019.
  • pageWidth - the front cover width (note is just the page width and not the full width of front cover and back cover and spine) - defaults to 148 (for A5).
  • pageHeight - defaults to 210 (for A5)
  • unit - which unit the above measurements are in (defaults to mm)
  • base-font-size - set the base font-size (10px by default), which may need to be increased if changing page size.

So default is the same as:

prince "http://127.0.0.1:8080/en/2019/ebook?cover&spine=25&pageWidth=148&pageHeight=210&unit=mm&base-font-size=10px" -o static/pdfs/web_almanac_2019_en_cover_A5.pdf
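The arithmetic behind those cover defaults can be sketched as below. The spinePadding formula is the documented default; the full-width formula (two page widths plus the spine) is our reading of the double-width cover description above and ignores any bleed or trim the template may add. The cover_dimensions helper is hypothetical.

```python
def cover_dimensions(spine=25, page_width=148, page_height=210):
    """Work out the default spine padding and the full double-width cover size."""
    # Documented default: (spine - 25) / 2
    spine_padding = (spine - 25) / 2
    # Assumption: back cover + spine + front cover laid out side by side.
    full_width = 2 * page_width + spine
    return {
        "spinePadding": spine_padding,
        "fullWidth": full_width,
        "pageHeight": page_height,
    }


print(cover_dimensions())            # 2019 defaults
print(cover_dimensions(spine=34.5))  # 2020 spine width
```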

Note that, similar to above, this will create one extra page at the beginning, which will need to be removed with a PDF editor to start with a clean page for printing. Please remove this before checking versions in to git.

With the print-ready eBook and Cover you can send them to a printer. I used https://www.digitalprintingireland.ie/ before and they were excellent and charge about €35 for a full-colour A5 ebook. Most of the settings above are for them, so tweak them based on your own printer's requirements.

Deploying changes

If you've been added to the "App Engine Deployers" role in the GCP project, you're able to push code changes to the production website.

You must first do some setup locally:

  1. Install the gcloud Google Cloud SDK.

  2. Authenticate the email address associated with the project with the webalmanac GCP project:

gcloud init
  3. Set the GITHUB_TOKEN environment variable to a GitHub personal access token:

Use an existing token or create a new token in GitHub.com. In token configuration select Repository access = Public Repositories (read-only).

export GITHUB_TOKEN=...

Deploying staged versions of the site

It's sometimes useful to deploy a staged version of the site for others to see:

npm run stage

You can redeploy to the same staged URL by passing it in with the -s parameter (note you also need the -- separator to differentiate these script params from npm params):

npm run stage -- -s privacy-2024

Deploying production changes

Make sure you have first updated the timestamps and generated the ebook PDFs in the main branch by running the "Predeploy script" GitHub Action.

Then, to deploy the site to production, run the following:

npm run deploy

The deploy script will do the following:

  • Ask you to confirm you've run the pre-deploy script via GitHub Actions
  • Switch to the production branch
  • Merge changes from main
  • Do a clean install (remove generated chapters and e-books)
  • Download the latest e-books from GitHub Action artifacts
  • Run and test the website
  • Ask you to complete any local tests and confirm good to deploy
  • Ask for a version number (suggesting the last version tagged with the patch number incremented)
  • Tag the release (after asking you for the version number to use)
  • Deploy to GCP
  • Push changes to production branch on GitHub
  • Switch you back to the main branch.
  • Ask you to update the release section of GitHub
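The version suggestion in the steps above (last tag with the patch incremented) amounts to something like the sketch below. It assumes tags have the form v1.2.3, which is not confirmed here, and suggest_next_version is a hypothetical helper rather than the deploy script's own code.

```python
def suggest_next_version(last_tag):
    """Suggest the next version by incrementing the patch number of the last tag."""
    # Assumption: tags look like "v1.2.3" (the leading "v" is optional).
    prefix = "v" if last_tag.startswith("v") else ""
    major, minor, patch = (int(part) for part in last_tag.removeprefix("v").split("."))
    return f"{prefix}{major}.{minor}.{patch + 1}"


print(suggest_next_version("v1.2.3"))
```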

Browse the website in production to verify that the new changes have taken effect. Note that pages are cached for 10 minutes, so add random query params to page URLs to ensure you get the latest version.
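One way to add such a random query param is sketched below. The cache_busted helper and the nocache param name are hypothetical (any unused param name should work), and example.com stands in for the production host.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, param="nocache"):
    """Append a random query param so the cached copy of the page is bypassed."""
    parts = urlsplit(url)
    # Keep any existing query params, including value-less ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(4)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder host; substitute the production URL you want to check.
print(cache_busted("https://example.com/en/2019/css"))
```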

Developing in Docker

Assuming that you have Docker installed and running, ensure that the working directory is src, where the Dockerfile is present, before running the following commands.

  1. Build a Docker image named webalmanac (if you choose a different name, adjust following commands accordingly):
docker image build -t webalmanac .
  2. Run the application server (which is the default command of the Docker image, so no need to explicitly supply it as an argument):
docker container run --rm -it -v "$PWD":/app -p 8080:8080 webalmanac
  3. Open http://localhost:8080 in your web browser to access the site. You can kill the server when it is no longer needed using Ctrl+C.

  4. Make changes in the code using any text editor and run tests (you need to build the image again if any Python or Node dependencies are changed):

docker container run --rm -it -v "$PWD":/app webalmanac npm run pytest
  5. To avoid running commands in one-off mode, run bash in a container (with the necessary volumes mounted and ports mapped), then run successive commands:
docker container run --rm -it -v "$PWD":/app -v /app/node_modules -p 8080:8080 webalmanac bash
root@[CID]:/app# python main.py
root@[CID]:/app# npm run generate
root@[CID]:/app# npm run pytest
root@[CID]:/app# exit
  6. To customize the image, use the PYVER, NODEVER, and SKIPGC build arguments to control which versions of Python and Node are used and whether the Google Cloud SDK is installed.
docker image build --build-arg PYVER=3.12 --build-arg NODEVER=20.x --build-arg SKIPGC=false -t webalmanac:custom .
  7. If you want to run the GitHub Super-Linter without npm being installed, you need to call the command directly as given in package.json.

This will depend on your operating system, but for macOS/Linux this would be:

docker container run -it --rm -v /app/node_modules -v "$PWD/..":/app -w /app/src --entrypoint=./tools/scripts/run_linter_locally.sh github/super-linter:slim-latest

And for Windows:

docker container run --rm -v /app/node_modules -v %cd%\\..:/app -w /app/src --entrypoint=./tools/scripts/run_linter_locally.sh github/super-linter:slim-latest