A document indexer service that generates inverted index mappings (document-term matrices) for documents, so that search services can use those mappings for read-optimized lookups. Built with Go / Gin, S3, and DynamoDB.
The directory structure is as follows:
- README.md: Overview of the project and instructions for use.
- go.mod: Go module file specifying project dependencies.
- go.sum: Hashes of dependencies for module integrity.
- main.go: Entry point of the application.
- Contains code for interacting with S3 buckets where documents are stored.
- Configuration files for the application, including S3 and DynamoDB settings.
- Directory for storing local data related to the indexing process.
- Code for handling interactions with DynamoDB, which stores the inverted index.
- Core logic for indexing documents, creating the inverted index mappings.
- Data models defining the structure of documents, terms, and indexes.
- Contains code for handling object-level operations, likely related to S3.
- Defines API routes using Gin for exposing the indexing service.
- Utility functions and helpers for general use throughout the codebase.
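
As a rough illustration of the core indexing step described in the list above, the sketch below builds an in-memory inverted index that records, for each term, the documents containing it, the per-document frequency, and the word positions. This is not the service's actual code: the names `TermEntry` and `BuildIndex` are hypothetical, and whitespace tokenization with zero-based positions is an assumption, chosen because the sample output keeps punctuation attached to terms.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// TermEntry is a hypothetical type holding the postings for one term.
// The three slices are parallel: index i in each refers to the same document.
type TermEntry struct {
	DocumentIDs             []string
	DocumentTermFrequencies []int
	DocumentTermLocations   [][]int
}

// BuildIndex tokenizes each document on whitespace (an assumption) and
// records, per term, the documents it occurs in, how often it occurs in
// each, and the zero-based word positions of every occurrence.
func BuildIndex(docs map[string]string) map[string]*TermEntry {
	// Visit document IDs in sorted order so the result is deterministic.
	ids := make([]string, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	index := make(map[string]*TermEntry)
	for _, id := range ids {
		// Collect the positions of each term within this one document.
		positions := make(map[string][]int)
		for pos, term := range strings.Fields(docs[id]) {
			positions[term] = append(positions[term], pos)
		}
		// Merge this document's postings into the global index.
		for term, locs := range positions {
			entry, ok := index[term]
			if !ok {
				entry = &TermEntry{}
				index[term] = entry
			}
			entry.DocumentIDs = append(entry.DocumentIDs, id)
			entry.DocumentTermFrequencies = append(entry.DocumentTermFrequencies, len(locs))
			entry.DocumentTermLocations = append(entry.DocumentTermLocations, locs)
		}
	}
	return index
}

func main() {
	docs := map[string]string{
		"a.json": "lorem ipsum lorem",
		"b.json": "ipsum dolor",
	}
	index := BuildIndex(docs)
	e := index["lorem"]
	fmt.Println(e.DocumentIDs, e.DocumentTermFrequencies, e.DocumentTermLocations)
	// prints: [a.json] [2] [[0 2]]
}
```

In the real service the documents would come from S3 and the resulting entries would be written to DynamoDB; both steps are omitted here to keep the sketch self-contained.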
The high-level workflow of the document indexer is shown below. Similar services can be found here and below:
// Input
{
  "tableName": "document-indexer-index-mapping",
  "bucketName": "document-indexer-service-test-documents"
}
// Output
{
  "Ok": true,
  "Response": {
    "\"Lorem": {
      "documentIDs": ["lorem_ipsum_3.json"],
      "documentTermFrequencies": [1],
      "documentTermLocations": [[117]]
    },
    "\"de": {
      "documentIDs": ["lorem_ipsum_3.json"],
      "documentTermFrequencies": [2],
      "documentTermLocations": [[79, 150]]
    },
    "'Content": {
      "documentIDs": ["lorem_ipsum_2.json"],
      "documentTermFrequencies": [1],
      "documentTermLocations": [[44]]
    },
    "'lorem": {
      "documentIDs": ["lorem_ipsum_2.json"],
      "documentTermFrequencies": [1],
      "documentTermLocations": [[75]]
    },
    "(The": {
      "documentIDs": ["lorem_ipsum_3.json"],
      "documentTermFrequencies": [1],
      "documentTermLocations": [[84]]
    },
    "(injected": {
      "documentIDs": ["lorem_ipsum_2.json"],
      "documentTermFrequencies": [1],
      "documentTermLocations": [[99]]
    },
    "1.10.32": {
      "documentIDs": ["lorem_ipsum_3.json"],
      "documentTermFrequencies": [2],
      "documentTermLocations": [[75, 146]]
    },
    "1.10.32.": {
      "documentIDs": ["lorem_ipsum_3.json"],
      "documentTermFrequencies": [1],
      "documentTermLocations": [[128]]
    },
    "1.10.33": {
      "documentIDs": ["lorem_ipsum_3.json"],
      "documentTermFrequencies": [2],
      "documentTermLocations": [[77, 148]]
    }
  }
}
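
For clients consuming this response, a minimal sketch of matching Go types may help. The field names and JSON keys are taken from the sample input and output above; the type names (`IndexRequest`, `TermEntry`, `IndexResponse`) and the helpers `ParseIndexResponse` and `DocumentsFor` are hypothetical and may not match the service's own models.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexRequest mirrors the sample input body (hypothetical type name).
type IndexRequest struct {
	TableName  string `json:"tableName"`
	BucketName string `json:"bucketName"`
}

// TermEntry mirrors one term's entry in the sample output. The slices are
// parallel: position i in each slice refers to the same document.
type TermEntry struct {
	DocumentIDs             []string `json:"documentIDs"`
	DocumentTermFrequencies []int    `json:"documentTermFrequencies"`
	DocumentTermLocations   [][]int  `json:"documentTermLocations"`
}

// IndexResponse mirrors the sample output envelope, keyed by term.
type IndexResponse struct {
	Ok       bool                 `json:"Ok"`
	Response map[string]TermEntry `json:"Response"`
}

// ParseIndexResponse decodes a JSON response body into an IndexResponse.
func ParseIndexResponse(body []byte) (IndexResponse, error) {
	var out IndexResponse
	err := json.Unmarshal(body, &out)
	return out, err
}

// DocumentsFor returns the IDs of documents containing the exact term,
// or nil when the term is not present in the index.
func (r IndexResponse) DocumentsFor(term string) []string {
	return r.Response[term].DocumentIDs
}

func main() {
	// A trimmed copy of the sample output, reduced to a single term.
	body := []byte(`{
	  "Ok": true,
	  "Response": {
	    "1.10.32": {
	      "documentIDs": ["lorem_ipsum_3.json"],
	      "documentTermFrequencies": [2],
	      "documentTermLocations": [[75, 146]]
	    }
	  }
	}`)

	resp, err := ParseIndexResponse(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(resp.Ok, resp.DocumentsFor("1.10.32"))
	// prints: true [lorem_ipsum_3.json]
}
```

Note that lookups are exact-match: in the sample output, `1.10.32` and `1.10.32.` are separate terms, so a search layer would need its own normalization if punctuation should be ignored.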