Add pacparse and pacwget #57

Open
wants to merge 13 commits into
base: main
1 change: 1 addition & 0 deletions .gitignore
@@ -26,6 +26,7 @@ tools/packages
*buildstamp
src/spidermonkey/js
src/pactester
src/pacparse

# OS specific files
.DS_Store
120 changes: 120 additions & 0 deletions docs/html/pacparse.1.html
@@ -0,0 +1,120 @@
<!-- manual page source format generated by PolyglotMan v3.2, -->
<!-- available at http://polyglotman.sourceforge.net/ -->

<html>
<head>
<title>"pacparse"("1") manual page</title>
</head>
<body bgcolor='white'>
<a href='#toc'>Table of Contents</a><p>

<h2><a name='sect0' href='#toc0'>Name</a></h2>
pacparse - tool to parse Proxy Auto-Config (PAC) files
<h2><a name='sect1' href='#toc1'>Synopsis</a></h2>
<b>pacparse
-p pacfile -u url [-h host] [-c clientip] [-U pacurl] [-46edv]</b>
<h2><a name='sect2' href='#toc2'>Description</a></h2>
<b>pacparse
</b> is a tool to parse Proxy Auto-Config (PAC) files. It returns the proxy config
string for the given URL and PAC file. <b>pacparse </b> uses the pacparser C library
for most of its functionality.
<h2><a name='sect3' href='#toc3'>Options</a></h2>

<dl>

<dt><b>-p pacfile</b> </dt>
<dd>PAC file to parse. Specify
"-" to read from standard input. </dd>

<dt><b>-u url</b> </dt>
<dd>URL to pass as a parameter to the
PAC file&rsquo;s FindProxyForURL function. </dd>

<dt><b>-h host</b> </dt>
<dd>Host part of the URL. If not specified,
it is determined from the URL. </dd>

<dt><b>-c clientip</b> </dt>
<dd>Client&rsquo;s IP address, as returned
by the function myIpAddress() in PAC files. If not specified, it defaults
to the IP address associated with the hostname of the machine on which
the tool is running, or 127.0.0.1 if that can&rsquo;t be found. </dd>

<dt><b>-U pacurl</b> </dt>
<dd>URL that
the PAC file came from, used to identify the client IP address in a more
reliable way. The tool parses the host name from the URL, attempts to connect
to each address associated with that host name with a UDP socket until
one is successful, and then uses the IP address associated with the client
side of that socket. </dd>

<dt><b>-4</b> </dt>
<dd>Use only IPv4 addresses for -U. </dd>

<dt><b>-6</b> </dt>
<dd>Use only IPv6 addresses
for -U. </dd>

<dt><b>-e</b> </dt>
<dd>Enable Microsoft PAC extensions (dnsResolveEx, myIpAddressEx, isResolvableEx).
</dd>

<dt><b>-d</b> </dt>
<dd>Enable debugging messages. </dd>

<dt><b>-v</b> </dt>
<dd>Print version and exit. </dd>
</dl>
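<p>
The socket technique described for <b>-U</b> can be sketched in a few lines of Python.
This is only an illustration of the idea, not the tool's own code: the function name
<b>client_ip_for_pac_url</b> is hypothetical, and falling back to port 80 when the URL
names no port is an assumption.

```python
import socket
from urllib.parse import urlsplit


def client_ip_for_pac_url(pac_url, family=socket.AF_UNSPEC):
    """Return the local IP the OS would use to reach the PAC host, or None."""
    parts = urlsplit(pac_url)
    host = parts.hostname
    if host is None:
        raise ValueError("no host in PAC URL: %r" % pac_url)
    port = parts.port or 80  # assumption: default to the http port

    # Look up every address for the host. Passing socket.AF_INET or
    # socket.AF_INET6 as family mirrors the -4 and -6 options.
    candidates = socket.getaddrinfo(host, port, family, socket.SOCK_DGRAM)
    for fam, socktype, proto, _canonname, addr in candidates:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                # connect() on a UDP socket sends no packets; it only makes
                # the kernel pick a route and a local source address.
                sock.connect(addr)
                return sock.getsockname()[0]
        except OSError:
            continue  # this address is unreachable, try the next one
    return None
```

<p>
For example, <b>client_ip_for_pac_url("http://wpad/wpad.dat", socket.AF_INET)</b> would
return the IPv4 address of the interface that routes to the host "wpad".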

<h2><a name='sect4' href='#toc4'>Examples</a></h2>
<p>
To find out
the proxy config string for the PAC file "wpad.dat" and the URL "<a href='http://www.google.com'>http://www.google.com</a>":
<p>
$ pacparse -p wpad.dat -u <a href='http://www.google.com'>http://www.google.com</a>

<p> For a client with IP address
10.0.12.123: <p>
$ pacparse -p wpad.dat -c 10.0.12.123 -u <a href='http://www.google.com'>http://www.google.com</a>

<p> For a
PAC file hosted at <a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>:
<p>
$ curl -s <a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>
| \
pacparse -p - -u <a href='http://google.com'>http://google.com</a>
-U http://wpad/wpad.dat<br>

<h2><a name='sect5' href='#toc5'>See Also</a></h2>
<a href='pacwget.1'>pacwget(1)</a>
, <a href='pacparser_init.3'>pacparser_init(3)</a>

<h2><a name='sect6' href='#toc6'>Bugs</a></h2>
If you have come across a bug
in pacparse, please submit a bug report at <a href='https://github.com/pacparser/pacparser/issues'>https://github.com/pacparser/pacparser/issues</a>.


<h2><a name='sect7' href='#toc7'>Author</a></h2>
Written by Manu Garg (http://www.manugarg.com) and Dave Dykstra.
<h2><a name='sect8' href='#toc8'>Resources</a></h2>
Homepage:
<a href='https://github.com/pacparser/pacparser'>https://github.com/pacparser/pacparser</a>.

<p> <p>

<hr><p>
<a name='toc'><b>Table of Contents</b></a><p>
<ul>
<li><a name='toc0' href='#sect0'>Name</a></li>
<li><a name='toc1' href='#sect1'>Synopsis</a></li>
<li><a name='toc2' href='#sect2'>Description</a></li>
<li><a name='toc3' href='#sect3'>Options</a></li>
<li><a name='toc4' href='#sect4'>Examples</a></li>
<li><a name='toc5' href='#sect5'>See Also</a></li>
<li><a name='toc6' href='#sect6'>Bugs</a></li>
<li><a name='toc7' href='#sect7'>Author</a></li>
<li><a name='toc8' href='#sect8'>Resources</a></li>
</ul>
</body>
</html>
200 changes: 200 additions & 0 deletions docs/html/pacwget.1.html
@@ -0,0 +1,200 @@
<!-- manual page source format generated by PolyglotMan v3.2, -->
<!-- available at http://polyglotman.sourceforge.net/ -->

<html>
<head>
<title>"pacwget"("1") manual page</title>
</head>
<body bgcolor='white'>
<a href='#toc'>Table of Contents</a><p>

<h2><a name='sect0' href='#toc0'>Name</a></h2>
pacwget - robustly get http URLs using multiple proxies and servers

<h2><a name='sect1' href='#toc1'>Synopsis</a></h2>
<b>pacwget [--only-proxies] [GNU_WGET_OPTIONS]</b>
<h2><a name='sect2' href='#toc2'>Description</a></h2>
<b>pacwget</b> is
a tool that uses GNU wget to retrieve target http URLs
even when some of the configured proxies or target servers are not functioning.
The "pac" part of the name comes from its support of Proxy Auto-Config
(PAC) files for configuring proxies. <p>
The configuration of proxies (including
via PAC URLs) comes from the environment (see below), but multiple servers
are recognized through round-robin DNS names in the host part
of the target http URLs. With each proxy (unless the option <b>--only-proxies</b>
is given), <b>pacwget</b> first tries the target URL. If that fails and the
host part of the URL is a round-robin DNS name, it retries with the host
part of the URL replaced by each IP address from the round-robin in turn, using the
same proxy.
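<p>
The retry order described above can be sketched as follows. This is an illustration
under stated assumptions, not pacwget's implementation: the names <b>attempt_order</b>
and <b>resolve_ipv4</b> are hypothetical, a host is treated as a round-robin name when it
resolves to more than one address, and the sketch only lists the attempts without
running wget.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def resolve_ipv4(host):
    """Return the distinct IPv4 addresses for host (hypothetical helper)."""
    addresses = []
    for info in socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM):
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def attempt_order(proxies, target_url, resolve=resolve_ipv4):
    """List (proxy, url) pairs in the order they would be tried."""
    parts = urlsplit(target_url)
    addresses = resolve(parts.hostname)
    attempts = []
    for proxy in proxies:
        # Each proxy first gets the target URL unchanged.
        attempts.append((proxy, target_url))
        # Assumption: more than one address means a round-robin DNS name.
        if len(addresses) > 1:
            for ip in addresses:
                netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
                attempts.append((proxy, urlunsplit(parts._replace(netloc=netloc))))
    return attempts
```

<p>
A caller would walk the returned list and stop at the first attempt that succeeds.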
<h2><a name='sect3' href='#toc3'>Environment</a></h2>
<b>pacwget</b> uses the following environment variables:

<dl>

<dt><b>HTTP_PROXIES</b> </dt>
<dd>A semicolon-separated list of URLs to try as http proxies, in
order. The last one in the list may be "DIRECT", which means to use no proxy
and connect directly to the host server in the target http URLs being retrieved.
</dd>

<dt><b>http_proxy</b> </dt>
<dd>If HTTP_PROXIES is not set but http_proxy is, then it is used
as a single http proxy URL. Note that it may identify a round-robin of more
than one proxy, but a direct connection to the target server is not an option.
</dd>

<dt><b>PAC_URLS</b> </dt>
<dd>If neither HTTP_PROXIES nor http_proxy is set, then PAC_URLS is
used as a semicolon-separated list of URLs from which to try to read Proxy Auto-Config
files to parse for a list of http proxies. The word "auto" is converted
to "<a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>", as is commonly used for Web Proxy Auto Discovery.
The last one in the list may be "DIRECT", which means to connect directly
to the target server if no PAC file can be read. Otherwise the PAC URLs
may begin with http:// or file://, although if file:// is used then the
myIpAddress() function available inside the PAC file uses a less reliable
method to determine the client&rsquo;s IP address (using a DNS lookup on the hostname
instead of using the client IP from the socket connecting to the PAC file&rsquo;s
http server). The url and host parameters to the FindProxyForURL function in
the PAC file are derived from the first http:// parameter on the command line.
If a PAC file is successfully read, it must return a list of proxies or
"DIRECT"; otherwise it is a fatal error and no further PAC URLs are tried.
The default value of $PAC_URLS if it is not set is "auto; DIRECT". </dd>
</dl>
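<p>
The precedence among the three variables can be summarized with a short Python sketch.
The function names <b>proxy_source</b> and <b>split_list</b> are hypothetical, and
treating an empty variable the same as an unset one is an assumption of this sketch,
not documented behavior.

```python
def split_list(value):
    """Split a semicolon-separated list, dropping blanks and extra spaces."""
    return [item.strip() for item in value.split(";") if item.strip()]


def proxy_source(env):
    """Return ("proxies", list) or ("pac", list) for the given environment."""
    # Assumption: an empty value is treated like an unset variable.
    if env.get("HTTP_PROXIES"):
        return ("proxies", split_list(env["HTTP_PROXIES"]))
    if env.get("http_proxy"):
        # A single proxy URL; direct connections are not an option here.
        return ("proxies", [env["http_proxy"]])
    # Neither is set: use PAC_URLS with the documented default value.
    pac_urls = split_list(env.get("PAC_URLS") or "auto; DIRECT")
    # "auto" stands for the usual Web Proxy Auto Discovery location.
    return ("pac", ["http://wpad/wpad.dat" if url == "auto" else url
                    for url in pac_urls])
```

<p>
Calling <b>proxy_source(dict(os.environ))</b> (after <b>import os</b>) would report which
source applies in the current shell.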

<h2><a name='sect4' href='#toc4'>Options</a></h2>
First,
note that unlike with wget, the ordering of options is significant with
<b>pacwget</b>: only the options that come before a URL apply to that URL. In
this way, different options can be specified with multiple URLs in the
same invocation of <b>pacwget</b>. The same list of proxies is applied to each
http:// URL, although wget is invoked separately for each URL on the <b>pacwget</b>
command line. <p>
There is one option added by <b>pacwget</b>:
<dl>

<dt><b>--only-proxies</b> </dt>
<dd>Instead of downloading
the given URL(s) with wget, print to stdout a $HTTP_PROXIES-like list of
the proxies that would be used (that is, semicolon-separated, possibly
ending in "DIRECT"). This is useful for downloading and parsing PAC
files. Requires one URL that starts with http://.
</dd>
</dl>
<p>
All other options are
passed to wget, but some cause additional action in <b>pacwget</b> and are described
here:
<dl>

<dt><b>--connect-timeout=SECS</b> </dt>
<dd>Sets the connection timeout to SECS seconds for
retrieving both PAC URLs and target URLs. If a proxy or a target server
does not respond in that amount of time, the next one is tried. Default
5. </dd>

<dt><b>--read-timeout=SECS</b> </dt>
<dd>Sets the read timeout to SECS seconds for retrieving
both PAC URLs and target URLs. If no data is received from a proxy or a
target server in that amount of time, the next one is tried. Default 10.
</dd>

<dt><b>-T SECS</b> </dt>
<dd>Sets both the connect and read timeout to SECS seconds. </dd>

<dt><b>--tries=N</b> </dt>
<dd>Try
all wget connections for both PAC URLs and target URLs N times. Default
1. </dd>

<dt><b>--inet4-only or "-4"</b> </dt>
<dd>Use only IPv4 addresses for both wget and for the myIpAddress()
function in PAC files. </dd>

<dt><b>--inet6-only or "-6"</b> </dt>
<dd>Use only IPv6 addresses for both
wget and for the myIpAddress() function in PAC files. </dd>

<dt><b>--debug or "-d"</b> </dt>
<dd>In addition
to adding debug messages to all uses of wget, also enable debugging in
PAC file parsing. </dd>

<dt><b>--verbose or "-v"</b> </dt>
<dd>This is the default for wget for the target
URL, but if this is explicitly set then it is also used for PAC URLs. In
addition, if neither debug nor verbose is set, PAC URLs are retrieved with
the wget --quiet option. </dd>
</dl>
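<p>
The rule that only options preceding a URL apply to it can be pictured with the
following Python sketch. It is a rough model only: the name <b>group_by_url</b> is
hypothetical, the sketch assumes that options accumulate from one URL to the next,
and it treats every argument that starts with http:// as a target URL.

```python
def group_by_url(argv):
    """Pair each http:// URL with the options that precede it."""
    options = []
    groups = []
    for arg in argv:
        if arg.startswith("http://"):
            # Snapshot the options seen so far for this URL.
            # Assumption: earlier options stay in effect for later URLs.
            groups.append((arg, list(options)))
        else:
            options.append(arg)
    return groups
```

<p>
With this model, <b>group_by_url(["-T", "5", "http://a.example/", "--tries=2", "http://b.example/"])</b>
gives the first URL only <b>-T 5</b>, while the second URL also gets <b>--tries=2</b>.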

<h2><a name='sect5' href='#toc5'>Examples</a></h2>
<p>
To retrieve target URL "<a href='http://www.google.com'>http://www.google.com</a>"
using proxies defined in "<a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>" without allowing direct connections:
<p>
$ export PAC_URLS=auto <br>
$ pacwget <a href='http://www.google.com'>http://www.google.com</a>
<p>
To try an additional WPAD server after the
usual one and allow direct connections if that also doesn&rsquo;t work: <p>
$ export PAC_URLS="auto; <a href='http://wpad.shared.domain/wpad.dat'>http://wpad.shared.domain/wpad.dat</a>; DIRECT" <br>
$ pacwget <a href='http://www.google.com'>http://www.google.com</a>
<p>
To directly set a list of possible proxies,
with debugging enabled: <p>
$ export HTTP_PROXIES="<a href='http://squid:3128'>http://squid:3128</a>;<a href='http://squid.friend.dom:3128'>http://squid.friend.dom:3128</a>" <br>
$ pacwget -d <a href='http://www.google.com'>http://www.google.com</a>

<h2><a name='sect6' href='#toc6'>See Also</a></h2>
<a href='pacparse.1'>pacparse(1)</a>
, <a href='pacparser_init.3'>pacparser_init(3)</a>


<h2><a name='sect7' href='#toc7'>Bugs</a></h2>
If you have come across a bug in pacwget, please submit a bug report
at <a href='https://github.com/pacparser/pacparser/issues'>https://github.com/pacparser/pacparser/issues</a>

<h2><a name='sect8' href='#toc8'>Author</a></h2>
Written by Dave Dykstra.

<h2><a name='sect9' href='#toc9'>Resources</a></h2>
Homepage: <a href='https://github.com/pacparser/pacparser'>https://github.com/pacparser/pacparser</a>

<p> <p>

<hr><p>
<a name='toc'><b>Table of Contents</b></a><p>
<ul>
<li><a name='toc0' href='#sect0'>Name</a></li>
<li><a name='toc1' href='#sect1'>Synopsis</a></li>
<li><a name='toc2' href='#sect2'>Description</a></li>
<li><a name='toc3' href='#sect3'>Environment</a></li>
<li><a name='toc4' href='#sect4'>Options</a></li>
<li><a name='toc5' href='#sect5'>Examples</a></li>
<li><a name='toc6' href='#sect6'>See Also</a></li>
<li><a name='toc7' href='#sect7'>Bugs</a></li>
<li><a name='toc8' href='#sect8'>Author</a></li>
<li><a name='toc9' href='#sect9'>Resources</a></li>
</ul>
</body>
</html>