ssl anomaly in pubmed dump
lindenb committed Feb 4, 2020
1 parent e6c7fc8 commit 5214820
Showing 6 changed files with 128 additions and 83 deletions.
43 changes: 43 additions & 0 deletions docs/Pubmed404.md
@@ -51,6 +51,11 @@ $ ./gradlew pubmed404

The java jar file will be installed in the `dist` directory.


## Creation Date

20181210

## Source code

[https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/pubmed/Pubmed404.java](https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/pubmed/Pubmed404.java)
@@ -76,3 +81,41 @@ The current reference is:
> Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare.
> [http://dx.doi.org/10.6084/m9.figshare.1425030](http://dx.doi.org/10.6084/m9.figshare.1425030)

## Example

```
$ java -jar dist/pubmeddump.jar 'bioinformatics 2001' 2> /dev/null |\
java -jar dist/pubmed404.jar 2> /dev/null
#PMID TITLE YEAR URL Status
29520589 Expression of Colocasia esculenta tuber agglutinin in Indian mustard provides resistance against Lipaphis erysimi and the expressed protein is non-allergenic.2018 http://www.fao.org/docrep/007/y0820e/y0820e00.HTM 200
29520589 Expression of Colocasia esculenta tuber agglutinin in Indian mustard provides resistance against Lipaphis erysimi and the expressed protein is non-allergenic.2018 http://www.icmr.nic.in/guide/Guidelines%20for%20Genetically%20Engineered%20Plants.pdf -1
28482857 Horizontal gene transfer is not a hallmark of the human genome. 2017 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0607-3 200
27899642 The UCSC Genome Browser database: 2017 update. 2017 http://genome.ucsc.edu/ 200
27797935 High hospital research participation and improved colorectal cancer survival outcomes: a population-based study. 2017 http://www.bmj.com/company/products-services/rights-and-licensing/ 403
25505092 NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. 2015 http://pine.nmrfam.wisc.edu/download_packages.html 200
25505092 NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. 2015 http://www.nmrfam.wisc.edu/nmrfam-sparky-distribution.htm 200
25428374 The UCSC Genome Browser database: 2015 update. 2015 http://genome.ucsc.edu 200
26356339 A Simple but Powerful Heuristic Method for Accelerating k-Means Clustering of Large-Scale Data in Life Science. null http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/boostKCP/ 200
24794704 Usefulness of the Shock Index as a secondary triage tool. 2015 http://group.bmj.com/group/rights-licensing/permissions 403
24225322 Progenetix: 12 years of oncogenomic data curation. 2014 http://www.progenetix.org 200
24137000 Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. 2014 http://globin.bx.psu.edu/hbvar 200
24137000 Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. 2014 http://www.findbase.org 200
24137000 Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. 2014 http://www.lovd.nl 200
23564938 DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution. 2013 http://dambe.bio.uottawa.ca 200
22689647 SIFT web server: predicting effects of amino acid substitutions on proteins. 2012 http://sift-dna.org 200
22600740 Cyber-T web server: differential analysis of high-throughput data. 2012 http://cybert.ics.uci.edu/ 200
21742331 An open source lower limb model: Hip joint validation. 2011 https://simtk.org/home/low_limb_london 200
21593132 Java bioinformatics analysis web services for multiple sequence alignment--JABAWS:MSA. 2011 http://www.compbio.dundee.ac.uk/jabaws 200
20228129 DensiTree: making sense of sets of phylogenetic trees. 2010 http://compevol.auckland.ac.nz/software/DensiTree/ 404
19380317 CELLULAR OPEN RESOURCE (COR): current status and future directions. 2009 http://www.cellml.org/specifications/ 200
18948284 OperonDB: a comprehensive database of predicted operons in microbial genomes. 2009 http://operondb.cbcb.umd.edu 200
18368364 Simulator for neural networks and action potentials. 2007 http://snnap.uth.tmc.edu -1
18367465 An improved general amino acid replacement matrix. 2008 http://atgc.lirmm.fr/LG 404
18238804 Interoperability with Moby 1.0--it's better than sharing your toothbrush! 2008 http://www.biomoby.org/ 200
18174178 PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. 2008 http://www.ibi.vu.nl/programs/pralinewww 200
17221864 HbVar database of human hemoglobin variants and thalassemia mutations: 2007 update. 2007 http://globin.bx.psu.edu/hbvar 200
17221864 HbVar database of human hemoglobin variants and thalassemia mutations: 2007 update. 2007 http://www.goldenhelix.org/xprbase 403
(...)
```
8 changes: 8 additions & 0 deletions docs/PubmedDump.md
@@ -23,6 +23,9 @@ Usage: pubmeddump [options] Files
java property file ${HOME}/.ncbi.properties and key api_key
-o, --output
Output file. Optional . Default: stdout
-r, --retmax
value for 'retmax' parameter for Eutils.
Default: 10000
-skip, --skip
[20180302] Optional set of elements names to be ignored in the output.
Spaces or comma separated. .eg: 'AuthorList PubmedData '
@@ -64,6 +67,11 @@ $ ./gradlew pubmeddump

The java jar file will be installed in the `dist` directory.


## Creation Date

20140805

## Source code

[https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/pubmed/PubmedDump.java](https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/pubmed/PubmedDump.java)
4 changes: 3 additions & 1 deletion src/main/java/com/github/lindenb/jvarkit/io/IOUtils.java
@@ -259,8 +259,10 @@ public static byte[] gzipString(final String s) {
}
}

public static boolean isRemoteURI(String uri)
public static boolean isRemoteURI(final String uri)
{
if(uri==null) return false;
if(!IOUtil.isUrl(uri)) return false;
return uri.startsWith("http://") ||
uri.startsWith("https://") ||
uri.startsWith("ftp://")
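
The `isRemoteURI` hunk above adds a null check and an `IOUtil.isUrl` pre-check ahead of the scheme prefix tests. A minimal, self-contained sketch of that logic follows. It rests on assumptions this commit does not confirm: the hypothetical helper `looksLikeUrl` stands in for htsjdk's `IOUtil.isUrl` and simply asks `java.net.URL` to parse the string, and the prefix list stops at `ftp://` because the hunk is cut off at that point. The class name `RemoteUriSketch` is illustrative only.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Illustrative sketch only; this is not the jvarkit implementation. */
final class RemoteUriSketch {
    private RemoteUriSketch() {}

    /**
     * Assumed stand-in for htsjdk's IOUtil.isUrl:
     * returns true when java.net.URL can parse the string.
     */
    @SuppressWarnings("deprecation")
    static boolean looksLikeUrl(final String s) {
        try {
            new URL(s);
            return true;
        } catch (final MalformedURLException err) {
            return false;
        }
    }

    /** Null-safe check: the string must parse as a URL and use a remote scheme. */
    static boolean isRemoteURI(final String uri) {
        if (uri == null) return false;
        if (!looksLikeUrl(uri)) return false;
        // Prefixes visible in the hunk; anything after ftp:// is not shown there.
        return uri.startsWith("http://")
            || uri.startsWith("https://")
            || uri.startsWith("ftp://");
    }

    public static void main(final String[] args) {
        final String[] samples = {
            "http://genome.ucsc.edu/",
            "https://simtk.org/home/low_limb_london",
            "file:///tmp/x.xml",
            "httpfoo"
        };
        // Print each sample with the verdict, tab-separated.
        for (final String s : samples) {
            System.out.println(s + "\t" + isRemoteURI(s));
        }
    }
}
```

In this sketch a `file://` token parses as a URL but is rejected by the prefix test, while a token that merely begins with `http` fails the parse step; the latter is the kind of token for which `Pubmed404` logs `strange url`.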
@@ -26,10 +26,10 @@ of this software and associated documentation files (the "Software"), to deal
package com.github.lindenb.jvarkit.tools.pubmed;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
@@ -62,6 +62,7 @@ of this software and associated documentation files (the "Software"), to deal
import htsjdk.samtools.util.IOUtil;

/**
BEGIN_DOC
## Example
@@ -100,16 +101,18 @@ of this software and associated documentation files (the "Software"), to deal
17221864 HbVar database of human hemoglobin variants and thalassemia mutations: 2007 update. 2007 http://www.goldenhelix.org/xprbase 403
(...)
```
END_DOC
*/
@Program(name="pubmed404",
description="Test if URL in the pubmed abstracts are reacheable.",
keywords={"pubmed","url"}
keywords={"pubmed","url"},
creationDate="20181210",
modificationDate="20200204"
)
public class Pubmed404 extends Launcher{
private static final Logger LOG = Logger.build(Pubmed404.class).make();
@Parameter(names={"-o","--output"},description=OPT_OUPUT_FILE_OR_STDOUT)
private File outFile=null;
private Path outFile=null;
@Parameter(names={"-t","--timeout"},description="timeout in seconds")
private int timeoutSeconds = 5;
@Parameter(names={"-c","--collapse"},description="Only one URL per article. Print the '200/OK' first.")
@@ -223,7 +226,7 @@ else if(eltName.equals(rootName))
token=token.substring(0,token.length()-1);
}
if(token.isEmpty()) continue;
if(!IOUtil.isUrl(token)) {
if(!IOUtils.isRemoteURI(token)) {
if(token.startsWith("http")) LOG.debug("strange url: "+token);
continue;
}
@@ -273,8 +276,6 @@ public int doWork(final List<String> args) {
InputStream in=null;
try {
/** create http client */


this.httpClient = HttpClients.createSystem();//createDefault();


@@ -290,7 +291,7 @@ public Object resolveEntity(String publicID, String systemID, String baseURI, St
in=(inputName==null?stdin():IOUtils.openURIForReading(inputName));
r = xmlInputFactory.createXMLEventReader(in);

out = super.openFileOrStdoutAsPrintWriter(this.outFile);
out = super.openPathOrStdoutAsPrintWriter(this.outFile);
out.println("#PMID\tTITLE\tYEAR\tURL\thttp.code\thttp.reason");

while(r.hasNext()) {
@@ -320,8 +321,7 @@ public Object resolveEntity(String publicID, String systemID, String baseURI, St

}

public static void main(final String[] args)
{
public static void main(final String[] args) {
new Pubmed404().instanceMainWithExit(args);
}
}