Get Python code for Wikidata query to get source code repos of software used by a paper identified by uppercase-normalized DOI #1

Open
Daniel-Mietchen opened this issue Oct 25, 2023 · 6 comments

Comments

@Daniel-Mietchen
Collaborator

Goal: given a DOI (uppercase-normalized, as is the Wikidata convention), find the source code repos of any software that Wikidata records, via P4510, as having been used in the paper identified by that DOI.

@Daniel-Mietchen
Collaborator Author

Here is a Wikidata query for that:

#title: Source code repos of software used by a paper identified by uppercase-normalized DOI
SELECT ?paper ?repo WHERE {
  VALUES ?doi { "10.1371/JOURNAL.PONE.0134894"}
  ?paper wdt:P356 ?doi ;
         wdt:P4510 ?software .
  ?software wdt:P1324 ?repo .
}

It can be run directly on the Wikidata SPARQL endpoint or via the following Python snippet:

# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/

import sys
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """#title: Source code repos of software used by a paper identified by uppercase-normalized DOI
SELECT ?paper ?repo WHERE {
  VALUES ?doi { "10.1371/JOURNAL.PONE.0134894"}
  ?paper wdt:P356 ?doi ;
         wdt:P4510 ?software .
  ?software wdt:P1324 ?repo .
}"""


def get_results(endpoint_url, query):
    """Run a SPARQL query against the endpoint and return the parsed JSON results."""
    # Identify this client to the endpoint.
    # TODO adjust user agent; see https://w.wiki/CX6
    user_agent = f"WDQS-example Python/{sys.version_info[0]}.{sys.version_info[1]}"
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


results = get_results(endpoint_url, query)

for result in results["results"]["bindings"]:
    print(result)
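
Since the goal is to supply the DOI rather than hard-code it, the query could also be built from an argument. The following is only a sketch under a few assumptions: the helper names (`normalize_doi`, `build_query`, `repos_for_doi`) are hypothetical, stripping a leading `https://doi.org/` or `doi:` is a convenience not discussed in this thread, and `run_query` stands for any callable that takes a query string and returns the JSON structure that `get_results` returns.

```python
import re

# Same query as above, with the DOI left as a placeholder.
QUERY_TEMPLATE = """SELECT ?paper ?repo WHERE {{
  VALUES ?doi {{ "{doi}" }}
  ?paper wdt:P356 ?doi ;
         wdt:P4510 ?software .
  ?software wdt:P1324 ?repo .
}}"""

# Assumption: callers may pass a DOI URL or a "doi:" prefixed string.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(doi):
    """Strip a common prefix and uppercase the DOI, as Wikidata stores it."""
    doi = _DOI_PREFIX.sub("", doi.strip())
    # Reject characters that would break out of the SPARQL string literal.
    if not doi or any(ch in doi for ch in '"\\') or any(ch.isspace() for ch in doi):
        raise ValueError(f"not a usable DOI: {doi!r}")
    return doi.upper()


def build_query(doi):
    """Return the SPARQL query for one paper, identified by its DOI."""
    return QUERY_TEMPLATE.format(doi=normalize_doi(doi))


def repos_for_doi(doi, run_query):
    """Return the sorted, de-duplicated repo URLs for the paper with this DOI.

    run_query is a callable taking a query string and returning SPARQL JSON results.
    """
    results = run_query(build_query(doi))
    return sorted({b["repo"]["value"] for b in results["results"]["bindings"]})
```

With the snippet above, this might be called as `repos_for_doi("10.1371/journal.pone.0134894", lambda q: get_results(endpoint_url, q))`.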

@Daniel-Mietchen
Collaborator Author

Here is a query that lists the source code repos of software for which Wikidata has information about papers that used it.

@Daniel-Mietchen
Collaborator Author

Here is a list of uppercase-normalized DOIs for which Wikidata knows at least some software that (1) has been used in the corresponding paper and (2) has its source code repo indicated in Wikidata.

@Daniel-Mietchen
Collaborator Author

Here is a list of uppercase-normalized DOIs for which Wikidata knows at least some software that (1) has been used in the corresponding paper and (2) has its CRAN repo indicated in Wikidata.
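
The two DOI lists above differ only in which identifier the software item must carry, so one query builder could plausibly cover both. This is a sketch, not the query behind the lists: the function names are hypothetical, P1324 (source code repo) is taken from the query earlier in this thread, and the property ID for CRAN is not given here and would have to be looked up on Wikidata and passed in.

```python
import re


def build_doi_list_query(identifier_property="P1324"):
    """Return a SPARQL query listing DOIs of papers that used software
    carrying the given identifier property (default: source code repo)."""
    # Only accept something shaped like a Wikidata property ID.
    if not re.fullmatch(r"P[1-9][0-9]*", identifier_property):
        raise ValueError(f"not a property ID: {identifier_property!r}")
    return f"""SELECT DISTINCT ?doi WHERE {{
  ?paper wdt:P356 ?doi ;
         wdt:P4510 ?software .
  ?software wdt:{identifier_property} ?identifier .
}}
ORDER BY ?doi"""


def dois_from_results(results):
    """Extract the DOI strings from SPARQL JSON results."""
    return [b["doi"]["value"] for b in results["results"]["bindings"]]
```

The resulting string can be passed to `get_results(endpoint_url, ...)` from the earlier snippet, and its return value to `dois_from_results`.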

@Daniel-Mietchen
Collaborator Author

Here is a list of CRAN packages that Wikidata knows to have been used in at least one paper, sorted by number of papers.

@Daniel-Mietchen
Collaborator Author

Here is a list of Bioconductor packages that Wikidata knows to have been used in at least one paper, sorted by number of papers.
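
A ranking like the CRAN and Bioconductor ones could be produced by grouping on the package and counting distinct papers. The sketch below is an assumption about how such a query might look, not the query behind the lists above: the function names are hypothetical, and the property IDs for CRAN and Bioconductor are not stated in this thread, so the caller has to supply the right one.

```python
import re


def build_ranking_query(identifier_property, limit=None):
    """Return a SPARQL query ranking packages by the number of papers using them."""
    # Only accept something shaped like a Wikidata property ID.
    if not re.fullmatch(r"P[1-9][0-9]*", identifier_property):
        raise ValueError(f"not a property ID: {identifier_property!r}")
    query = f"""SELECT ?package ?identifier (COUNT(DISTINCT ?paper) AS ?papers) WHERE {{
  ?paper wdt:P4510 ?package .
  ?package wdt:{identifier_property} ?identifier .
}}
GROUP BY ?package ?identifier
ORDER BY DESC(?papers)"""
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


def ranking_from_results(results):
    """Turn SPARQL JSON results into (identifier, paper_count) tuples, in result order."""
    return [
        (b["identifier"]["value"], int(b["papers"]["value"]))
        for b in results["results"]["bindings"]
    ]
```

Because every matched package has at least one paper by construction, no extra filter is needed for the "at least one paper" condition.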
