Solr #5

Open
MarcusSorealheis opened this issue Apr 30, 2020 · 1 comment

Comments

@MarcusSorealheis
Contributor

Hello there,

I like this project and have read about it in a few papers. Could you kindly share any tips you might have on querying Solr?

I will work on it, but I am asking here in case this has already been discussed or my efforts can be streamlined in any way.

@mnmami
Collaborator

mnmami commented Apr 30, 2020

Hi Marcus,

I'm very glad that you want to add support for Solr, and I'm ready to support you in doing so.

Luckily, there is a Solr connector for Spark that you can use off the shelf. To let Squerall connect to Solr, just add a case for it in SparkExecutor.scala [1] using the Solr connector code [2].

Like so:

case "solr" => df = spark.read.format("solr").options(options).load
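For context, here is a minimal, Spark-free sketch of where such a case sits: a match on the config `type` value that yields the short name handed to `spark.read.format(...)`. The `SourceFormats` object and the `csv`/`parquet` entries are illustrative assumptions, not Squerall's actual code; the case suggested above reads the DataFrame directly instead of returning a name.

```scala
// Minimal sketch (assumption): maps a config "type" value to the short name
// that spark.read.format(...) would receive. Not Squerall's actual code.
object SourceFormats {
  def formatFor(sourceType: String): Option[String] =
    sourceType.toLowerCase match {
      case "csv"     => Some("csv")     // built-in Spark data source
      case "parquet" => Some("parquet") // built-in Spark data source
      case "solr"    => Some("solr")    // provided by the spark-solr connector [2]
      case _         => None            // unsupported source type
    }
}
```

With spark-solr on the classpath, the `solr` branch corresponds to the one-line case above.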

Then, in the config file [3], add a JSON object specifying the Solr options, for example:

	{
		"type": "solr",
		"options": {
			"collection": "abc",
			"zkhost": "xyz"
		},
		"source": "//Entity",
		"entity": "Entity"
	}
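The `options` object is what ends up in `.options(options)`, so it should carry the keys the connector needs; `collection` and `zkhost` are the two used above. A minimal sketch that models one config entry as a case class and reports missing Solr keys before any read is attempted. `SourceConfig` and `missingSolrOptions` are hypothetical names, and treating these two keys as the required set is an assumption; Squerall parses its config its own way.

```scala
// Hypothetical model of one entry in the Squerall config file.
final case class SourceConfig(
    sourceType: String,           // "type", e.g. "solr"
    options: Map[String, String], // "options", passed to .options(...)
    source: String,               // "source", e.g. "//Entity"
    entity: String                // "entity", e.g. "Entity"
)

object SolrConfigCheck {
  // Keys taken from the example config; treating them as required is an assumption.
  val requiredSolrKeys: Seq[String] = Seq("collection", "zkhost")

  // Returns the required keys that are absent or blank; non-Solr entries are skipped.
  def missingSolrOptions(cfg: SourceConfig): Seq[String] =
    if (cfg.sourceType != "solr") Seq.empty
    else requiredSolrKeys.filter(k => cfg.options.get(k).forall(_.trim.isEmpty))
}
```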

Then, in the mappings file [4], map that Solr entity to an ontology class and properties, for example:

<#EntityMapping>
	rml:logicalSource [
		rml:source "//Entity";
		nosql:store nosql:solr
	];
	rr:subjectMap [
		rr:template "http://example.com/{nr}";
		rr:class bsbm:Producer
	];

	rr:predicateObjectMap [
		rr:predicate edm:country;
		rr:objectMap [rml:reference "country"]
	].
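The subject template `http://example.com/{nr}` builds one IRI per row by substituting the value of the `nr` field. A minimal sketch of that substitution, assuming a row is a `Map[String, String]`; `expandTemplate` is a hypothetical helper, not part of Squerall, and it skips the IRI-safe escaping that R2RML applies to substituted values.

```scala
import scala.util.matching.Regex

object RmlTemplate {
  // Matches {column} placeholders in an rr:template string.
  private val Placeholder: Regex = """\{([^{}]+)\}""".r

  // Replaces each {column} with that column's value; None if any column is missing,
  // mirroring the rule that no term is generated when a referenced value is absent.
  // Simplification: values are inserted verbatim, without IRI-safe escaping.
  def expandTemplate(template: String, row: Map[String, String]): Option[String] = {
    val columns = Placeholder.findAllMatchIn(template).map(_.group(1)).toList
    if (!columns.forall(row.contains)) None
    else Some(Placeholder.replaceAllIn(template, m => Regex.quoteReplacement(row(m.group(1)))))
  }
}
```

For a row with `nr = 42`, the template above would give `http://example.com/42`.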

Note that the //Entity value must be the same in the config file and the mappings file.
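Since the two files are linked only through that string, a quick consistency check can catch typos early. A minimal sketch that pulls the `rml:source` literals out of the mappings text with a regular expression and lists config sources that no mapping declares. The regex-based extraction and the `unmatchedSources` name are assumptions for illustration; a robust check would use a proper RDF/Turtle parser.

```scala
object SourceConsistency {
  // Captures the string literal after each rml:source, e.g. "//Entity".
  private val RmlSource = "rml:source\\s+\"([^\"]*)\"".r

  // Returns config sources for which no rml:source in the mappings text matches.
  def unmatchedSources(configSources: Seq[String], mappingsTtl: String): Seq[String] = {
    val mapped = RmlSource.findAllMatchIn(mappingsTtl).map(_.group(1)).toSet
    configSources.filterNot(mapped.contains)
  }
}
```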

Please try it and let me know. If it doesn't work for you, share your files with me and I'll look at them with you.

Note: clone the develop branch instead of master, as it has more detailed logging messages. The command is simple [5].

[1] https://github.com/EIS-Bonn/Squerall/blob/master/src/main/scala/org/squerall/SparkExecutor.scala#L78
[2] https://github.com/LucidWorks/spark-solr#via-dataframe
[3] https://github.com/EIS-Bonn/Squerall/blob/master/evaluation/input_files/config
[4] https://github.com/EIS-Bonn/Squerall/blob/master/evaluation/input_files/mappings.ttl
[5] https://stackoverflow.com/a/1911126/1730115
