manugarg · DrDaveD · Oct 8, 2015 · Oct 8, 2015 · Oct 9, 2015 · Oct 9, 2015
diff --git a/.gitignore b/.gitignore
@@ -26,6 +26,7 @@ tools/packages
 *buildstamp
 src/spidermonkey/js
 src/pactester
+src/pacparse
 
 # OS specific files
 .DS_Store

diff --git a/docs/html/pacparse.1.html b/docs/html/pacparse.1.html
@@ -0,0 +1,120 @@
+<!-- manual page source format generated by PolyglotMan v3.2, -->
+<!-- available at http://polyglotman.sourceforge.net/ -->
+
+<html>
+<head>
+<title>pacparse(1) manual page</title>
+</head>
+<body bgcolor='white'>
+<a href='#toc'>Table of Contents</a><p>
+
+<h2><a name='sect0' href='#toc0'>Name</a></h2>
+pacparse - tool to parse Proxy Auto-Config (PAC) files 
+<h2><a name='sect1' href='#toc1'>Synopsis</a></h2>
+<b>pacparse
+-p pacfile -u url [-h host] [-c clientip] [-U pacurl] [-46edv]</b> 
+<h2><a name='sect2' href='#toc2'>Description</a></h2>
+<b>pacparse</b> is a tool to parse Proxy Auto-Config (PAC) files. It returns
+the proxy config string that the PAC file&rsquo;s FindProxyForURL function
+produces for the given URL. <b>pacparse</b> uses the pacparser C library for
+most of its functionality. 
+<h2><a name='sect3' href='#toc3'>Options</a></h2>
+
+<dl>
+
+<dt><b>-p pacfile</b> </dt>
+<dd>PAC file to parse. Specify
+"-" to read from standard input. </dd>
+
+<dt><b>-u url</b> </dt>
+<dd>URL to pass as a parameter to the
+PAC file&rsquo;s FindProxyForURL function. </dd>
+
+<dt><b>-h host</b> </dt>
+<dd>Host part of the URL. If not specified,
+it is determined from the URL. </dd>
+
+<dt><b>-c clientip</b> </dt>
+<dd>Client&rsquo;s IP address, as returned
+by the function myIpAddress() in PAC files. If not specified, it defaults
+to the IP address associated with the hostname of the machine on which
+the tool is running, or 127.0.0.1 if that can&rsquo;t be found. </dd>
+
+<dt><b>-U pacurl</b> </dt>
+<dd>URL that
+the PAC file came from, used to determine the client IP address more
+reliably than the hostname lookup described under <b>-c</b>. The tool parses the
+host name from the URL, tries to connect a UDP socket to each address
+associated with that host name until one connection succeeds, and then uses
+the local (client-side) IP address of that socket. </dd>
+
+<dt><b>-4</b> </dt>
+<dd>Use only IPv4 addresses for -U. </dd>
+
+<dt><b>-6</b> </dt>
+<dd>Use only IPv6 addresses
+for -U. </dd>
+
+<dt><b>-e</b> </dt>
+<dd>Enable Microsoft PAC extensions (dnsResolveEx, myIpAddressEx, isResolvableEx).
+</dd>
+
+<dt><b>-d</b> </dt>
+<dd>Enable debugging messages. </dd>
+
+<dt><b>-v</b> </dt>
+<dd>Print version and exit. </dd>
+</dl>
+
+<h2><a name='sect4' href='#toc4'>Examples</a></h2>
+<p>
+To find out
+the proxy config string for the PAC file "wpad.dat" and the URL "<a href='http://www.google.com'>http://www.google.com</a>":
+<p>
+$ pacparse -p wpad.dat -u <a href='http://www.google.com'>http://www.google.com</a>
+
+<p> For a client with IP address
+10.0.12.123: <p>
+$ pacparse -p wpad.dat -c 10.0.12.123 -u <a href='http://www.google.com'>http://www.google.com</a>
+
+<p> For a
+PAC file hosted at <a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>:
+<p>
+$ curl -s <a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a> | \<br>
+&nbsp;&nbsp;&nbsp;&nbsp;pacparse -p - -u <a href='http://google.com'>http://google.com</a> -U http://wpad/wpad.dat<br>
+
+<h2><a name='sect5' href='#toc5'>See Also</a></h2>
+<a href='pacwget.1'>pacwget(1)</a>, <a href='pacparser_init.3'>pacparser_init(3)</a>
+
+<h2><a name='sect6' href='#toc6'>Bugs</a></h2>
+If you have come across a bug
+in pacparse, please submit a bug report at <a href='https://github.com/pacparser/pacparser/issues'>https://github.com/pacparser/pacparser/issues</a>.
+
+
+<h2><a name='sect7' href='#toc7'>Author</a></h2>
+Written by Manu Garg (http://www.manugarg.com) and Dave Dykstra. 
+<h2><a name='sect8' href='#toc8'>Resources</a></h2>
+Homepage:
+<a href='https://github.com/pacparser/pacparser'>https://github.com/pacparser/pacparser</a>
+
+<p> <p>
+
+<hr><p>
+<a name='toc'><b>Table of Contents</b></a><p>
+<ul>
+<li><a name='toc0' href='#sect0'>Name</a></li>
+<li><a name='toc1' href='#sect1'>Synopsis</a></li>
+<li><a name='toc2' href='#sect2'>Description</a></li>
+<li><a name='toc3' href='#sect3'>Options</a></li>
+<li><a name='toc4' href='#sect4'>Examples</a></li>
+<li><a name='toc5' href='#sect5'>See Also</a></li>
+<li><a name='toc6' href='#sect6'>Bugs</a></li>
+<li><a name='toc7' href='#sect7'>Author</a></li>
+<li><a name='toc8' href='#sect8'>Resources</a></li>
+</ul>
+</body>
+</html>
diff --git a/docs/html/pacwget.1.html b/docs/html/pacwget.1.html
@@ -0,0 +1,200 @@
+<!-- manual page source format generated by PolyglotMan v3.2, -->
+<!-- available at http://polyglotman.sourceforge.net/ -->
+
+<html>
+<head>
+<title>pacwget(1) manual page</title>
+</head>
+<body bgcolor='white'>
+<a href='#toc'>Table of Contents</a><p>
+
+<h2><a name='sect0' href='#toc0'>Name</a></h2>
+pacwget - robustly get http URLs using multiple proxies and servers
+
+<h2><a name='sect1' href='#toc1'>Synopsis</a></h2>
+<b>pacwget [--only-proxies] [GNU_WGET_OPTIONS]</b> 
+<h2><a name='sect2' href='#toc2'>Description</a></h2>
+<b>pacwget</b> is
+a tool that uses GNU wget to retrieve target http URLs even when some of
+several proxies and/or target servers are not working. The "pac" part of the
+name comes from its support for Proxy Auto-Config (PAC) files for configuring
+proxies. <p>
+The proxy configuration (including PAC URLs) comes from the environment (see
+Environment below), while multiple target servers are recognized through
+round-robin DNS names in the host part of the target http URLs. For each proxy
+(unless the <b>--only-proxies</b> option is given), <b>pacwget</b> first tries the
+target URL as given; if that fails and the host part of the URL is a
+round-robin DNS name, it tries again with the host part replaced by each IP
+address in the round-robin, still using the same proxy. 
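The retry order described above can be sketched as a small POSIX shell function. This is a hypothetical illustration, not pacwget's actual code: the name `attempt_order` is invented, the round-robin IP addresses are passed in as a space-separated string (pacwget itself gets them from DNS), and the sketch assumes IPv4 addresses and a URL of the form scheme://host/path. It performs no network access; it only prints each proxy/URL pair in the order it would be tried if every attempt failed.

```shell
# Hypothetical sketch of pacwget's retry order; prints "PROXY URL" for each
# attempt, assuming every attempt fails. No network access is performed.
# Args: 1 = semicolon-separated proxy list (like $HTTP_PROXIES),
#       2 = target URL, 3 = round-robin IPv4 addresses (space-separated).
attempt_order() {
    url=$2
    ips=$3
    scheme=${url%%://*}         # e.g. "http"
    host=${url#*://}
    host=${host%%/*}            # host part of the URL
    path=${url#*://"$host"}     # everything after the host, possibly empty

    # Count the addresses; only a round-robin (2 or more) triggers IP retries.
    n=0
    for ip in $ips; do n=$((n + 1)); done

    # Split the proxy list on ";" into the positional parameters.
    old_ifs=$IFS
    IFS=';'
    set -- $1
    IFS=$old_ifs

    for proxy in "$@"; do
        proxy=$(printf '%s' "$proxy" | tr -d ' ')   # drop spaces around entries
        [ -n "$proxy" ] || continue
        printf '%s %s\n' "$proxy" "$url"            # first try the URL as given
        if [ "$n" -gt 1 ]; then
            for ip in $ips; do                      # then each IP, same proxy
                printf '%s %s://%s%s\n' "$proxy" "$scheme" "$ip" "$path"
            done
        fi
    done
}
```

For example, `attempt_order 'http://squid:3128;DIRECT' http://www.example.com/x '192.0.2.1 192.0.2.2'` prints six lines: the original URL and both IP-substituted URLs through the proxy, then the same three URLs with DIRECT.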
+<h2><a name='sect3' href='#toc3'>Environment</a></h2>
+<b>pacwget</b> uses the following environment variables:
+
+<dl>
+
+<dt><b>HTTP_PROXIES</b> </dt>
+<dd>A semicolon-separated list of URLs to try, in order, as http proxies. The
+last entry in the list may be "DIRECT", which means to use no proxy and to
+connect directly to the server named in the target http URL being retrieved.
+</dd>
+
+<dt><b>http_proxy</b> </dt>
+<dd>If HTTP_PROXIES is not set but http_proxy is, http_proxy is used as a
+single http proxy URL. Note that it may name a round-robin of more than one
+proxy, but connecting directly to the target server is not an option.
+</dd>
+
+<dt><b>PAC_URLS</b> </dt>
+<dd>If neither HTTP_PROXIES nor http_proxy is set, PAC_URLS is used as a
+semicolon-separated list of URLs from which to read Proxy Auto-Config files,
+which are parsed for a list of http proxies. The word "auto" is converted to
+"<a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>", the URL commonly used for Web Proxy
+Auto-Discovery (WPAD). The last entry in the list may be "DIRECT", which means
+to connect directly to the target server if no PAC file can be read. Other
+entries may begin with http:// or file://; if file:// is used, the
+myIpAddress() function available inside the PAC file uses a less reliable
+method to determine the client&rsquo;s IP address (a DNS lookup of the local
+hostname instead of the client address of the socket connected to the PAC
+file&rsquo;s http server). The url and host parameters passed to the PAC file&rsquo;s
+FindProxyForURL function are derived from the first http:// URL on the command
+line. If a PAC file is read successfully, it must return a list of proxies or
+"DIRECT"; otherwise it is a fatal error and no further PAC URLs are tried.
+If PAC_URLS is not set, its default value is "auto; DIRECT". </dd>
+</dl>
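The precedence among these variables can be sketched in POSIX shell. This is an illustrative sketch, not pacwget's implementation: the function names `expand_pac_urls` and `proxy_source` are hypothetical, an empty variable is treated the same as an unset one (an assumption), and the output only reports which source would be used rather than fetching anything.

```shell
# Hypothetical sketch: normalize a PAC_URLS value, replacing "auto" with the
# WPAD URL and re-joining the entries with "; ".
expand_pac_urls() {
    old_ifs=$IFS
    IFS=';'
    set -- $1                 # split on ";" into positional parameters
    IFS=$old_ifs
    out=''
    for entry in "$@"; do
        entry=$(printf '%s' "$entry" | tr -d ' ')
        [ -n "$entry" ] || continue
        if [ "$entry" = auto ]; then
            entry=http://wpad/wpad.dat
        fi
        out="${out:+$out; }$entry"
    done
    printf '%s\n' "$out"
}

# Report where the proxy list would come from, in the documented order:
# HTTP_PROXIES, then http_proxy, then PAC_URLS (default "auto; DIRECT").
# Assumption: an empty variable counts as unset.
proxy_source() {
    if [ -n "${HTTP_PROXIES:-}" ]; then
        printf 'proxies: %s\n' "$HTTP_PROXIES"
    elif [ -n "${http_proxy:-}" ]; then
        printf 'proxies: %s\n' "$http_proxy"
    else
        printf 'pac: %s\n' "$(expand_pac_urls "${PAC_URLS:-auto; DIRECT}")"
    fi
}
```

With none of the three variables set, `proxy_source` prints `pac: http://wpad/wpad.dat; DIRECT`, which corresponds to the documented default.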
+
+<h2><a name='sect4' href='#toc4'>Options</a></h2>
+Note
+that, unlike with wget, the order of options is significant with
+<b>pacwget</b>: only the options that come before a URL apply to that URL. This
+allows different options to be given for different URLs in the same
+invocation of <b>pacwget</b>. The same list of proxies is applied to each
+http:// URL, although wget is invoked separately for each URL on the <b>pacwget</b>
+command line. <p>
+There is one option added by <b>pacwget</b>: 
+<dl>
+
+<dt><b>--only-proxies</b> </dt>
+<dd>Instead of downloading the given URL(s) with wget, print to stdout the
+list of proxies that would be used, in the same format as $HTTP_PROXIES (that
+is, semicolon-separated and possibly ending in "DIRECT"). This is useful for
+downloading and parsing PAC files. Requires one URL that starts with http://.
+</dd>
+</dl>
+<p>
+All other options are
+passed to wget, but some also cause additional action in <b>pacwget</b>, as described
+here: 
+<dl>
+
+<dt><b>--connect-timeout=SECS</b> </dt>
+<dd>Sets the connection timeout to SECS seconds for
+retrieving both PAC URLs and target URLs.  If a proxy or a target server
+does not respond in that amount of time, the next one is tried.  Default
+5. </dd>
+
+<dt><b>--read-timeout=SECS</b> </dt>
+<dd>Sets the read timeout to SECS seconds for retrieving
+both PAC URLs and target URLs.  If no data is received from a proxy or a
+target server in that amount of time, the next one is tried.  Default 10.
+</dd>
+
+<dt><b>-T SECS</b> </dt>
+<dd>Sets both the connect and read timeout to SECS seconds. </dd>
+
+<dt><b>--tries=N</b> </dt>
+<dd>Try
+each wget connection, for both PAC URLs and target URLs, up to N times.
+Default 1. </dd>
+
+<dt><b>--inet4-only or "-4"</b> </dt>
+<dd>Use only IPv4 addresses for both wget and for the myIpAddress()
+function in PAC files. </dd>
+
+<dt><b>--inet6-only or "-6"</b> </dt>
+<dd>Use only IPv6 addresses for both
+wget and for the myIpAddress() function in PAC files. </dd>
+
+<dt><b>--debug or "-d"</b> </dt>
+<dd>Enables debug messages in every use of wget and also enables debugging
+of PAC file parsing. </dd>
+
+<dt><b>--verbose or "-v"</b> </dt>
+<dd>Verbose output is already the wget default for the target
+URL; if this option is given explicitly, it is also used for PAC URLs. If
+neither --debug nor --verbose is given, PAC URLs are retrieved with the
+wget --quiet option. </dd>
+</dl>
+
+<h2><a name='sect5' href='#toc5'>Examples</a></h2>
+<p>
+To retrieve target URL "<a href='http://www.google.com'>http://www.google.com</a>"
+using proxies defined in "<a href='http://wpad/wpad.dat'>http://wpad/wpad.dat</a>" without allowing direct connections:
+<p>
+$ export PAC_URLS=auto <br>
+$ pacwget <a href='http://www.google.com'>http://www.google.com</a>
+ <p>
+To try an additional WPAD server after the
+usual one and allow direct connections if that also doesn&rsquo;t work: <p>
+$ export PAC_URLS="auto; <a href='http://wpad.shared.domain/wpad.dat'>http://wpad.shared.domain/wpad.dat</a>; DIRECT" <br>
+$ pacwget <a href='http://www.google.com'>http://www.google.com</a>
+ <p>
+To directly set a list of possible proxies,
+with debugging enabled: <p>
+$ export HTTP_PROXIES="<a href='http://squid:3128'>http://squid:3128</a>;<a href='http://squid.friend.dom:3128'>http://squid.friend.dom:3128</a>" <br>
+$ pacwget -d <a href='http://www.google.com'>http://www.google.com</a>
+
+<h2><a name='sect6' href='#toc6'>See Also</a></h2>
+<a href='pacparse.1'>pacparse(1)</a>, <a href='pacparser_init.3'>pacparser_init(3)</a>
+
+
+<h2><a name='sect7' href='#toc7'>Bugs</a></h2>
+If you have come across a bug in pacwget, please submit a bug report
+at <a href='https://github.com/pacparser/pacparser/issues'>https://github.com/pacparser/pacparser/issues</a>
+
+<h2><a name='sect8' href='#toc8'>Author</a></h2>
+Written by Dave Dykstra.
+
+<h2><a name='sect9' href='#toc9'>Resources</a></h2>
+Homepage: <a href='https://github.com/pacparser/pacparser'>https://github.com/pacparser/pacparser</a>
+
+<p> <p>
+
+<hr><p>
+<a name='toc'><b>Table of Contents</b></a><p>
+<ul>
+<li><a name='toc0' href='#sect0'>Name</a></li>
+<li><a name='toc1' href='#sect1'>Synopsis</a></li>
+<li><a name='toc2' href='#sect2'>Description</a></li>
+<li><a name='toc3' href='#sect3'>Environment</a></li>
+<li><a name='toc4' href='#sect4'>Options</a></li>
+<li><a name='toc5' href='#sect5'>Examples</a></li>
+<li><a name='toc6' href='#sect6'>See Also</a></li>
+<li><a name='toc7' href='#sect7'>Bugs</a></li>
+<li><a name='toc8' href='#sect8'>Author</a></li>
+<li><a name='toc9' href='#sect9'>Resources</a></li>
+</ul>
+</body>
+</html>