Truedat Business Glossary

TdBG is a back-end service developed as part of the Truedat project that supports the generation of a Business Glossary.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Install dependencies with mix deps.get

Installing

To start your Phoenix server:

Create and migrate your database with mix ecto.create && mix ecto.migrate
Start Phoenix endpoint with mix phx.server
Now you can visit localhost:4002 from your browser.
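
The commands above can be collected into a single shell helper. This is only a sketch: the function name setup_tdbg is hypothetical (not part of the project), and it assumes mix is on the PATH and a PostgreSQL instance is reachable with the configured credentials.

```shell
# Hypothetical helper; the name setup_tdbg is an assumption, not a project script.
setup_tdbg() {
  # Fetch dependencies.
  mix deps.get || return 1
  # Create and migrate the database.
  mix ecto.create && mix ecto.migrate || return 1
  # Start the Phoenix endpoint (localhost:4002 according to this README).
  mix phx.server
}
```

Calling setup_tdbg from the project root runs the three steps in order and stops at the first failure.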

Running the tests

Run all application tests with mix test

Environment variables

REDIS_AUDIT_STREAM_MAXLEN (Optional) Maximum length for Redis audit stream. Default: 100
REDIS_STREAM_MAXLEN (Optional) Maximum length for Redis stream. Default: 100
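
As a minimal sketch, both variables can be set in the shell that launches the service. The ${VAR:-default} form keeps a value that is already set and otherwise falls back to the documented default of 100.

```shell
# Keep any value already present; otherwise use the documented default.
export REDIS_AUDIT_STREAM_MAXLEN="${REDIS_AUDIT_STREAM_MAXLEN:-100}"
export REDIS_STREAM_MAXLEN="${REDIS_STREAM_MAXLEN:-100}"
```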

SSL Connection

DB_SSL: Boolean value to enable SSL configuration. Default is false.
DB_SSL_CACERTFILE: Path to the Certification Authority (CA) certificate file, e.g. /path/to/ca.crt.
DB_SSL_VERSION: Supported versions are tlsv1.2 and tlsv1.3. Default is tlsv1.2.
DB_SSL_CLIENT_CERT: Path to the client SSL certificate file.
DB_SSL_CLIENT_KEY: Path to the client SSL private key file.
DB_SSL_VERIFY: Specifies whether server certificates should be verified (true/false).
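
A possible way to set these variables, with a small guard for the TLS version, is sketched below. The file paths are placeholders and the helper name db_ssl_version_ok is hypothetical; only the variable names and the two supported versions come from the list above.

```shell
# Returns 0 when the argument is one of the TLS versions listed as supported.
db_ssl_version_ok() {
  case "$1" in
    tlsv1.2|tlsv1.3) return 0 ;;
    *) return 1 ;;
  esac
}

# Example values; every path below is a placeholder, not a project default.
export DB_SSL=true
export DB_SSL_CACERTFILE=/path/to/ca.crt
export DB_SSL_VERSION=tlsv1.3
export DB_SSL_CLIENT_CERT=/path/to/client.crt
export DB_SSL_CLIENT_KEY=/path/to/client.key
export DB_SSL_VERIFY=true

# Warn early instead of failing later at connection time.
db_ssl_version_ok "$DB_SSL_VERSION" || echo "unsupported DB_SSL_VERSION: $DB_SSL_VERSION" >&2
```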

Elastic bulk page size configuration

BULK_PAGE_SIZE_CONCEPTS: default 500

Elastic aggregations

The aggregation variables are defined as follows: AGG_<AGGREGATION_NAME>_SIZE
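
To illustrate the naming pattern, the sketch below builds the variable name for a given aggregation. The aggregation name status, the value 50, and the helper agg_size_var are all hypothetical; upper-casing the name is an assumption based on the placeholder shown above.

```shell
# Builds AGG_<AGGREGATION_NAME>_SIZE from an aggregation name.
agg_size_var() {
  name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  printf 'AGG_%s_SIZE\n' "$name"
}

# Hypothetical aggregation "status" with an illustrative size of 50.
export "$(agg_size_var status)=50"
```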

ElasticSearch authentication

(Optional) Basic HTTP authentication

These environment variables add the Authorization header to each request, with the value Basic <ES_USERNAME>:<ES_PASSWORD>.

ES_USERNAME: Username
ES_PASSWORD: Password
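
For reference, HTTP Basic authentication sends the user:password pair Base64-encoded in the Authorization header. The sketch below shows how such a header value is formed; the helper name es_basic_header and the credentials user/pass are placeholders, and how the service builds the header internally is not shown in this README.

```shell
# Prints the Authorization header for HTTP Basic authentication.
# $1 = username, $2 = password
es_basic_header() {
  token=$(printf '%s:%s' "$1" "$2" | base64)
  printf 'Authorization: Basic %s\n' "$token"
}

es_basic_header user pass
```

This can be handy for checking credentials by hand against the cluster with an HTTP client before setting ES_USERNAME and ES_PASSWORD.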

(Optional) ApiKey authentication

This environment variable adds the Authorization header to each request, with the value ApiKey <ES_API_KEY>.

ES_API_KEY: ApiKey

(Optional) HTTP SSL Configuration (Normally required for ApiKey authentication)

These environment variables will configure CA Certificates for HTTPS requests

ES_SSL: [true | false] Must be true to activate the following options
ES_SSL_CACERTFILE: (Optional) Path to the CA certificate file. If not set, the CA bundle returned by :certifi.cacertfile() is used
ES_SSL_VERIFY: (Optional) [verify_peer | verify_none] defaults to verify_none
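
A combined ApiKey and SSL setup might look like the sketch below. The key and the certificate path are placeholders. Because ES_SSL_VERIFY defaults to verify_none, the example sets verify_peer explicitly so that the server certificate is checked against the CA file.

```shell
export ES_API_KEY="replace-with-your-api-key"   # placeholder value
export ES_SSL=true                              # activates the options below
export ES_SSL_CACERTFILE=/path/to/es-ca.crt     # placeholder path
export ES_SSL_VERIFY=verify_peer                # check the server certificate
```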

ElasticSearch Force Merge Configuration

These environment variables control the force merge operation for ElasticSearch indices, which optimizes index performance by merging segments.

ES_WAIT_FOR_COMPLETION:
- Purpose: Controls whether the force merge operation should wait for completion before returning
- Default: nil (no wait)
- Usage: When set to true, the operation will wait until the force merge is complete before returning. When false or nil, the operation returns immediately and runs asynchronously
- Performance: Setting to true ensures the operation is complete but may cause longer response times
ES_MAX_NUM_SEGMENTS:
- Purpose: Specifies the maximum number of segments to merge down to
- Default: 5
- Usage: Controls how aggressively the force merge operation consolidates segments. Lower values result in fewer, larger segments
- Performance: Fewer segments generally improve search performance but may increase memory usage during the merge operation
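
The Elasticsearch force merge API accepts max_num_segments and wait_for_completion as query parameters. The sketch below shows how the two variables could map onto such a request path; the mapping, the helper name forcemerge_path, and the index name concepts are assumptions for illustration, not taken from the service code.

```shell
# Prints a force merge request path for the given index.
# $1 = index name (hypothetical); options come from the environment.
forcemerge_path() {
  path="/$1/_forcemerge?max_num_segments=${ES_MAX_NUM_SEGMENTS:-5}"
  # Only add wait_for_completion when the variable is set (default is nil).
  if [ -n "${ES_WAIT_FOR_COMPLETION:-}" ]; then
    path="$path&wait_for_completion=$ES_WAIT_FOR_COMPLETION"
  fi
  printf '%s\n' "$path"
}

forcemerge_path concepts
```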

Oban configuration

OBAN_DB_SCHEMA:
- Purpose: Defines the database schema where Oban will create its tables
- Default value: "private"
- Usage: Configures the schema prefix for Oban tables (jobs, peers, etc.)
- Example: If set to "oban_schema", tables will be created as oban_schema.jobs, oban_schema.peers, etc.
OBAN_CREATE_SCHEMA:
- Purpose: Controls whether Oban should automatically create the database schema
- Default value: "true"
- Usage: Determines if the Oban migration should create the schema specified in OBAN_DB_SCHEMA
- Valid values: "true" automatically creates the schema; "false" does not create it (the schema must exist beforehand)
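
When OBAN_CREATE_SCHEMA is "false", the schema has to be created before the migrations run. One way to do that is sketched below; the helper name oban_schema_sql is hypothetical, and the statement is plain PostgreSQL rather than anything provided by the project.

```shell
# Prints the SQL statement that creates the Oban schema if it is missing.
oban_schema_sql() {
  printf 'CREATE SCHEMA IF NOT EXISTS "%s";\n' "${OBAN_DB_SCHEMA:-private}"
}

oban_schema_sql
```

The output can be piped into psql with whatever connection settings the database uses.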

Oban Cron Jobs Configuration

OUTDATED_EMBEDDINGS_CRON:
- Purpose: Defines the cron schedule for the OutdatedEmbeddings worker
- Default value: "0 */3 * * *" (every 3 hours)
- Usage: Controls when the system processes outdated embeddings for data structure versions
- Example: "0 2 * * *" for daily execution at 2 AM
EMBEDDINGS_DELETION_CRON:
- Purpose: Defines the cron schedule for the EmbeddingsDeletion worker
- Default value: "@hourly"
- Usage: Controls when the system performs cleanup of deleted embeddings
- Example: "0 */6 * * *" for execution every 6 hours
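
The example schedules above can be set as shown in the sketch below. The helper is_cron_like is hypothetical and only checks the shape of the expression (five fields, or an @ shortcut such as @hourly); it does not validate the field values.

```shell
# Succeeds when the argument looks like a cron expression:
# either an @ shortcut or exactly five whitespace-separated fields.
# The body runs in a subshell so that "set -f" does not leak to the caller.
is_cron_like() (
  case "$1" in
    @*) exit 0 ;;
  esac
  set -f        # disable globbing so "*" is not expanded to file names
  set -- $1     # split the expression into fields
  [ "$#" -eq 5 ]
)

export OUTDATED_EMBEDDINGS_CRON="0 2 * * *"      # daily at 2 AM
export EMBEDDINGS_DELETION_CRON="0 */6 * * *"    # every 6 hours

is_cron_like "$OUTDATED_EMBEDDINGS_CRON" || echo "check OUTDATED_EMBEDDINGS_CRON" >&2
is_cron_like "$EMBEDDINGS_DELETION_CRON" || echo "check EMBEDDINGS_DELETION_CRON" >&2
```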

Oban Queue Configuration

OBAN_QUEUE_DEFAULT:
- Purpose: Sets the number of concurrent workers for the default queue
- Default value: "5"
- Usage: Controls the parallelism for general background jobs
OBAN_QUEUE_EMBEDDING_UPSERTS:
- Purpose: Sets the number of concurrent workers for embedding upsert operations
- Default value: "10"
- Usage: Controls the parallelism for creating and updating embeddings
OBAN_QUEUE_EMBEDDING_DELETION:
- Purpose: Sets the number of concurrent workers for embedding deletion operations
- Default value: "5"
- Usage: Controls the parallelism for embedding cleanup jobs

Embedding Management

LIMIT_OUTDATED_EMBEDDINGS:
- Purpose: Controls the maximum number of data structure versions that can be processed in a single batch when updating outdated embeddings
- Default: 50000
- Usage: Used by the OutdatedEmbeddings worker (runs every 3 hours via cron) to limit the number of data structure versions processed when finding and updating missing or outdated record embeddings
- Performance: Prevents memory issues and ensures system stability when processing large numbers of outdated embeddings
RECORD_EMBEDDINGS_BATCH_SIZE:
- Purpose: Controls the batch size used when processing record embeddings for business concepts
- Default: 100
- Usage: Defines how many business concept IDs are processed together in each batch when generating or updating embeddings. Used by both synchronous and asynchronous embedding operations
- Performance: Adjusting this value can help balance memory usage and processing efficiency when handling large numbers of embeddings
RECORD_EMBEDDINGS_DEFAULT_DELAY_MS:
- Purpose: Controls the default delay in milliseconds between batches when processing record embeddings asynchronously
- Default: 500
- Usage: Defines the delay applied between consecutive batches of embedding upsert jobs. Used by the upsert_from_concepts_async/2 function to schedule jobs with a delay, preventing system overload when processing large numbers of embeddings
- Performance: Adjusting this value can help control the rate of embedding processing and prevent overwhelming the system or external embedding services
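
To get a feel for how the batch size and delay interact, the sketch below estimates the number of batches and the delay of the last one for a given number of business concepts. It assumes the delay accumulates linearly from one batch to the next, which is an interpretation of the description above rather than something confirmed by the code; the helper name embedding_schedule is hypothetical.

```shell
# Estimates batches and the delay of the last batch.
# $1 = number of business concept IDs to process
embedding_schedule() {
  size="${RECORD_EMBEDDINGS_BATCH_SIZE:-100}"
  delay="${RECORD_EMBEDDINGS_DEFAULT_DELAY_MS:-500}"
  # Ceiling division: a partial batch still counts as one batch.
  batches=$(( ($1 + size - 1) / size ))
  if [ "$batches" -gt 0 ]; then
    # Assumption: the first batch has no delay, each later one adds "delay" ms.
    last_delay=$(( (batches - 1) * delay ))
  else
    last_delay=0
  fi
  printf '%s batches, last one delayed %s ms\n' "$batches" "$last_delay"
}

embedding_schedule 1000
```

With the defaults, 1000 concepts give 10 batches and the last batch is delayed by 4500 ms. Raising the batch size shortens the schedule at the cost of larger individual jobs.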

Deployment

Ready to run in production? Please check our deployment guides.

Built With

Phoenix - Web framework
Ecto - Phoenix and Ecto integration
Postgrex - PostgreSQL driver for Elixir
Cowboy - HTTP server for Erlang/OTP
credo - Static code analysis tool for the Elixir language
guardian - Authentication library
canary - Elixir authorization and resource-loading library
canada - Permission definitions in Elixir apps
ex_machina - Create test data for Elixir applications

Authors

Bluetab Solutions Group, SL - Initial work - Bluetab

See also the list of contributors who participated in this project.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

In order to use this software, it is necessary that, depending on the type of functionality that you want to obtain, it is assembled with other software whose license may be governed by other terms different than the GNU General Public License version 3 or later. In that case, it will be absolutely necessary that, in order to make a correct use of the software to be assembled, you give compliance with the rules of the concrete license (of Free Software or Open Source Software) of use in each case, as well as, where appropriate, obtaining of the permits that are necessary for these appropriate purposes.
