These are shell utilities intended to accompany my coming (still barely started) ePub book on ePub.
The Python utilities here are tested with Python 3.6 and may not work with older versions.
The python scripts that manipulate XML files use the minidom
library to do
so. That library does something I (and many others) am not too fond of, it
reorders all attributes in a node to alphabetical order.
Workarounds for this behavior can be found at
https://stackoverflow.com/questions/662624/preserve-order-of-attributes-when-modifying-with-minidom
However, all of them were beyond the scope of this project. I choose to just live with it, but feel free to implement one of those solutions if you want to.
All scripts here are MIT license. See the LICENSE file.
At least one of the scripts requires Python 3.6 and the others may as well. On older UNIX systems you may need to install Python 3.6. For well-maintained distributions this is usually easy, e.g. on CentOS 7 with EPEL:
yum install python36 python36-pip python36-pytz
In those cases, you need to change the shebang from
#!/usr/bin/env python
to something like (depending on your system)
#!/usr/bin/env python36
With the exception of enterprise (LTS) distributions, I believe most Linux distributions at this point already ship with Python 3.6 or newer and you do not have to do anything (except maybe install the pytz package) for these scripts to work.
I recommend putting the python scripts into ~/bin/
or if they need to be
made available in multiple user accounts, into /usr/local/bin/
Be sure to set the execution bit on them (e.g. chmod +x ~/bin/*.py
)
I would appreciate it if a MacOS user (or even a Windows user) would send me a README with Python 3.6 install instructions for those operating systems, I do not have regular access to them.
I know MacOS already has Python but last time I had access to MacOS it was a
really outdated version and I had to install a newer one, I believe using a
program called homebrew
but I can not recall. It is quite possible there are
official packages distributed by the Python maintainers for MacOS as well.
For the bash
shell script(s) they are not intended to be installed in a
directory in your path but are intended to be skeletons you modify as needed to
automate your workflow. For example, they assume your content directory is
called EPUB
and that your OPF file is called content.opf
but the ePub
specification does not make such assertions. Using OEBPS
for the content
directory is also extremely common.
This script can be used to create a new projet. It creates the META-INF
directory and the container.xml
file along with the content directory and the
OPF file. The script can be run without arguments in which case default values
are used, or you can use switches to override the default values:
usage: createSkeletonEpub.py [-h] [-t TITLE] [-d DESCRIPTION] [-g GENRE]
[-a AUTHOR] [-p PUBLISHER] [-e PUBLICATIONDATE]
[-x XMLLANG] [-l BOOKLANG] [-D OEBPS] [-f OPF]
Setup an initial ePub 3 container structure. All arguments are optional.
optional arguments:
-h, --help show this help message and exit
-t TITLE, --title TITLE
The title of the book
-d DESCRIPTION, --description DESCRIPTION
A short description of the book
-g GENRE, --genre GENRE
The genre the book fits into
-a AUTHOR, --author AUTHOR
The author of the book
-p PUBLISHER, --publisher PUBLISHER
The book publisher
-e PUBLICATIONDATE, --pubdate PUBLICATIONDATE
Publication date
-x XMLLANG, --xmllang XMLLANG
BCP 47 language string for the Package Document File
(OPF)
-l BOOKLANG, --lang BOOKLANG
BCP 47 language string for the language of the book
-D OEBPS, --contentdir OEBPS
Content directory for your ePub files
-f OPF, --opffile OPF
File name for the Package Document File (OPF)
The default for the publication data metadata is six weeks in the future and
will almost certainly need editing within the Package Document File but since it
is requires metadata, I had to use something. If you know the planned
publication date when running the script, the -e
or --pubdate
switches will
override that guess and accepts string dates, such as "5 dec 2020"
etc.
The default values when used without switches are at the top of the
createSkeletonEpub.py
file just under all the import whatever
declarations
if you want to customize the defaults for your environment.
The script uses a coulple python dependencies some systems may not have:
language_tags
- used to validate BCP 47 language tags.dateparser
- used to normalize date strings.
Both are available via pip
if your operating system vendor does not have
packages for them.
The createSkeletonEpub.py
will create a unique identifier using a UUID but you
can use the addIsbnNumber.py
to use an ISBN number instead if you have (or
get) one.
Every ePub has to have a Unique Identifier defined in the content.opf
file.
When your publication has an ISBN number, that is usually what is used. When
you do not have one, you can use a UUID
instead.
This script generates a UUID and creates the necessary nodes and attributes in
your content.opf
file to use the UUID as the unique identifier for your ePub.
UUID has no cost associated with it, nor does it have a central registry. It is simply a hex encoded 128-bit random number with some dashes inserted. As long as your operating system pRNG is not broken, you can have extremely high confidence the same UUID is not already in use elsewhere, there are literally 3.4 X 10^38 possible UUID values, duplicates when generated via a quality pRNG will not happen.
If and when you do decide to get an actual ISBN, you can change your Unique Identifier to that ISBN in the future, but note that doing so will mean that any obfuscated resources need to be re-obfuscated from their original source, as the cryptography key used to obfuscate the resources is generated from the Unique Identifier.
This script will exit if it detects the content.opf
file already has a
Unique Identifier set up.
This script takes a single argument: The path to your content.opf
file.
If you have an ISBN number for your publication, this script will add it to
your content.opf
file as the Unique Identifier. If fed a 10 digit ISBN it
will first be converted to a 13 digit ISBN, though that should not be needed
since 10 digit ISBN are not issued anymore and digital editions are suppose
to have a different ISBN than previous editions.
The script will exit if fed an ISBN number it detects as invalid. The script
will exit if it detects the ePub already has a Unique Identifuer unless the
id
attribute for that unique identifier is prng-uuid
which is the default
id
attribute set by the generateUniqueIdentifier.py
script. This is done to
prevent accidental alteration of the Unique Identifier.
If you intend to alter the Unique Identifier, manually edit your content.opf
file and remove the unique-identifier
attribute from the root package
node.
This script will not remove any existing dc:identifier
nodes, and it is okay
to have as many of those as you need, but only one can have an id
attribute
that corresponds with the package
unique-identifier
attribute.
When there are existing dc:identifier
nodes, this script will insert the
dc:identifier
for the ISBN number before the other(s). This is because some
ePub readers are not fully ePub 3 compliant and expect the first to be the ISBN
number.
The first argument to the script is the path to your content.opf
file and the
second argument is the ISBN number (with or without hyphens).
When any change is made to your ePub, the <meta property="dcterms:modified"></meta>
node is suppose to updated to reflect the modification time.
This script does that, you can call if from your script that generates the ePub before packing it into a zip archive and know that modification timestamp is proper.
This script will remove any existing <meta property="dcterms:modified"></meta>
tags within <metadata/>
and then create one using the current timestamp.
This script takes a single argument: The path to your content.opf
file.
Some third party resources have a license that requires you obfuscate the resource before embedding it in a product (such as an eBook) that you distribute.
For this reason, the ePub specification documents an obfuscation method that can be used at
https://www.w3.org/publishing/epub3/epub-ocf.html#sec-resource-obfuscation
Neither I nor the W3C can give advice on whether or not the algorithm there satisfies license requirements, but that method is the only obfuscation method that is part of the ePub specification and thus likely to be supported by the majority of ePub software.
This script implements that algorithm to obfuscate a resource.
Please note that this script does not modify or create the encryption.xml
file that obfuscated resources must be described in. The script does not care
what the path of the resource within your ePub archive will be, so it can not
modify that file.
Please note that running the ePub obfuscation algorithm on an obfuscated file will deobfuscate the file.
This script will not modify the file to be obfuscated on the filesystem. It will create a new file with a different file name, and it will exit if a file of that name already exists.
If filename.ext
is the file to be obfuscated, filename-obf.ext
will be the
obfuscated file that is created.
On the other hand if filename-obf.ext
is the file to be obfuscated, then
filename.ext
will be the obfuscated file that is created (which results in an
un-obfuscated file if filename-obf.ext
is an obfuscated file and the same
obfuscation key is used)
In either case (with or without -obf
at the end of the filename before the
file extension) the output file will not be created if a file of that name
already exists.
The first argument is the path to the OPF file. This is necessary to determine the obfuscation key.
The second argument is the path to the resource to be obfuscated (or de-obfuscated if it was already obfuscated with the same key)
iBooks (and possibly Apple Books, I do not yet know) has a special XML file
within the META-INF
directory that controls some of its options. This python
script allows you to easily create, modidy, or delete that special XML file.
usage: iBooksOptions.py [-h] [-p PLATFORM] [-l LAYOUT] [-f FONTS] [-s SPREAD]
[-i INTERACTIVE] [-o ORIENTATION] [-M METAINF]
Setup or modify iBooks custom META-INF XML file.
optional arguments:
-h, --help show this help message and exit
-p PLATFORM, --platform PLATFORM
The target iOS device platform
-l LAYOUT, --fixed-layout LAYOUT
True or False. Whether or not a fixed layout is being
used
-f FONTS, --publisher-fonts FONTS
True or False. Whether or not publisher fonts are
embedded
-s SPREAD, --open-to-spread SPREAD
True or False. Whether or not the iBook should open to
spread
-i INTERACTIVE, --interactive INTERACTIVE
True or False. Whether or not scripted content exists
-o ORIENTATION, --orientation-lock ORIENTATION
Portrait or Landscape or None. A forced orientation
for the ePub
-M METAINF, --META-INF METAINF
Path to META-INF directory.
For the --platform
option, you need to specify whether the options you are
specifying are for the iPhone, iPad, or all iOS devices. The default value is
all
.
There are four boolean settings you can specify and one non-boolean setting. For
the boolean settings, the iBooks default value is false
so it only makes sense
for the XML tag controlling them to be there when set to True.
For the boolean settings, it only makes sense for them to be defined in the
<device/>
node for all OR iphone OR ipad, so specifying one of those
options will remove all other references to the option. If one of those options
is already set, setting the option to false
will remove the option.
The fifth option, orientation-lock
, there may be use cases where the ePub
publisher wants the orientation lock to be landscape for iPhone and portrait for
iPad, so in that case, it may make sense to have it defined differently for
both iPhone and iPad.
You can run the script several times to fine-tune your selection. For example,
you can run it once to create the options for all iOS devices and then run it
a second time specifying iPad for an option you only want to apply to iPads
(such as the --spread
option).
Boolean option. If your ePub uses a fixed layout, you should probably set this
to true
for all iOS devices.
Boolean option. If your ePub has embedded fonts you want used in paragraph nodes
you probably should set this to true
for all devices.
Boolean option. If you want your ePub to open to spread, set this to true
. If
you use this at all, I recommend only setting it for the iPad device.
Boolean option. If your ePub includes scripted interactive content, you probably
want to set this to true
for all devices.
If you want your ePub to only be viewable in portrait or landscape, you can specify which with this option.
This is a bash wrapper script for the epubcheck utility.
Install it in ~/bin/
and make it executable:
cp epubcheck.sh ~/bin/ && chmod +x ~/bin/epubcheck.sh
You will want to change the EPUBCHECK
variable to point to the location where
you unpacked the download from their github project.
If you have more than one version a java
installed, you may need to change
the OPERATION
variable to specify the full path to the java
executable you
want used.
If there is an option to the epubcheck.jar
you always want used, you can
optionally change the OPTIONS
variable to specify that option after the $@
but make sure to put it after the $@
and that there is a space between
them.
This is an example shell script for creating an ePub archive from the UNIX command line. You will need to modify it for your own use.
The concept, it makes it easy to pull your ePub sources from a git
or other
revision control system and create the archive without needing fancy GUI tools.
The example shell script makes use of the updateTimestamp.py
script to update
the modification timestamp before it creates the archive.
The example shell script makes use of the epubcheck.sh
shell script to check
the result for validation errors.