Bluetab/td-bg

Truedat Business Glossary

TdBG is a back-end service developed as part of the Truedat project that supports the generation of a Business Glossary

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Install dependencies with mix deps.get

Installing

To start your Phoenix server:

  • Create and migrate your database with mix ecto.create && mix ecto.migrate

  • Start Phoenix endpoint with mix phx.server

  • Now you can visit localhost:4002 from your browser.

Running the tests

Run all application tests with mix test

Environment variables

  • REDIS_AUDIT_STREAM_MAXLEN (Optional) Maximum length for Redis audit stream. Default: 100
  • REDIS_STREAM_MAXLEN (Optional) Maximum length for Redis stream. Default: 100

SSL Connection

  • DB_SSL: Boolean value to enable SSL configuration. Default is false.
  • DB_SSL_CACERTFILE: Path to the Certificate Authority (CA) certificate file, e.g. /path/to/ca.crt.
  • DB_SSL_VERSION: Supported versions are tlsv1.2 and tlsv1.3. Default is tlsv1.2.
  • DB_SSL_CLIENT_CERT: Path to the client SSL certificate file.
  • DB_SSL_CLIENT_KEY: Path to the client SSL private key file.
  • DB_SSL_VERIFY: Specifies whether server certificates should be verified (true/false).

Elastic bulk page size configuration

  • BULK_PAGE_SIZE_CONCEPTS: default 500

Elastic aggregations

  • The aggregation variables are defined as follows: AGG_<AGGREGATION_NAME>_SIZE
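
The naming pattern above can be illustrated with a small sketch. It is written in Python purely for illustration (the service itself is an Elixir application), and the helper name `aggregation_sizes` and the aggregation names `DOMAIN` and `STATUS` are hypothetical, not taken from the project.

```python
import re

# Hypothetical helper: collects AGG_<AGGREGATION_NAME>_SIZE variables from an
# environment mapping. Lowercasing the aggregation name is an illustrative choice.
AGG_PATTERN = re.compile(r"^AGG_(?P<name>.+)_SIZE$")


def aggregation_sizes(env):
    """Return {aggregation_name: size} for every variable matching the pattern."""
    sizes = {}
    for key, value in env.items():
        match = AGG_PATTERN.match(key)
        if match:
            sizes[match.group("name").lower()] = int(value)
    return sizes


# DOMAIN and STATUS are made-up aggregation names used only for this example.
example_env = {
    "AGG_DOMAIN_SIZE": "50",
    "AGG_STATUS_SIZE": "20",
    "BULK_PAGE_SIZE_CONCEPTS": "500",  # ignored: does not match the pattern
}
print(aggregation_sizes(example_env))
```

Variables that do not follow the `AGG_..._SIZE` shape are left out of the result.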

ElasticSearch authentication

(Optional) Basic HTTP authentication

These environment variables will add the Authorization header on each request with value Basic <ES_USERNAME>:<ES_PASSWORD>

  • ES_USERNAME: Username
  • ES_PASSWORD: Password
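
For reference, standard HTTP Basic authentication (RFC 7617) sends the `username:password` pair Base64 encoded. The sketch below shows that encoding in Python for illustration only. Whether the service applies the encoding itself or expects it from its HTTP client is an assumption here, and `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Build a standard HTTP Basic Authorization header (RFC 7617).

    Hypothetical helper for illustration; it is not part of the service.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example credentials are placeholders, not project defaults.
header = basic_auth_header("es_user", "es_password")
print(header["Authorization"].split(" ", 1)[0])  # scheme name
```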

(Optional) ApiKey authentication

This environment variable will add the Authorization header on each request with value ApiKey <ES_API_KEY>

  • ES_API_KEY: ApiKey

(Optional) HTTP SSL Configuration (Normally required for ApiKey authentication)

These environment variables will configure CA Certificates for HTTPS requests

  • ES_SSL: [true | false] required to activate the following options
  • ES_SSL_CACERTFILE: (Optional) Path to the CA certificate file. If not set, the CA bundle path returned by :certifi.cacertfile() is used
  • ES_SSL_VERIFY: (Optional) [verify_peer | verify_none] defaults to verify_none

ElasticSearch Force Merge Configuration

These environment variables control the force merge operation for ElasticSearch indices, which optimizes index performance by merging segments.

  • ES_WAIT_FOR_COMPLETION:

    • Purpose: Controls whether the force merge operation should wait for completion before returning
    • Default: nil (no wait)
    • Usage: When set to true, the operation will wait until the force merge is complete before returning. When false or nil, the operation returns immediately and runs asynchronously
    • Performance: Setting to true ensures the operation is complete but may cause longer response times
  • ES_MAX_NUM_SEGMENTS:

    • Purpose: Specifies the maximum number of segments to merge down to
    • Default: 5
    • Usage: Controls how aggressively the force merge operation consolidates segments. Lower values result in fewer, larger segments
    • Performance: Fewer segments generally improve search performance but may increase memory usage during the merge operation
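
As a rough illustration of how these two settings could map onto an Elasticsearch force merge request, the Python sketch below builds the request URL. The `_forcemerge` endpoint and its `max_num_segments` and `wait_for_completion` query parameters belong to Elasticsearch. The helper name, the index name `concepts`, and the base URL are assumptions, and the way the service actually issues the request may differ.

```python
from urllib.parse import urlencode


def force_merge_url(base_url, index, env):
    """Build a force merge URL from the documented variables (illustrative only)."""
    # Default of 5 taken from the ES_MAX_NUM_SEGMENTS description.
    params = {"max_num_segments": int(env.get("ES_MAX_NUM_SEGMENTS", "5"))}
    wait = env.get("ES_WAIT_FOR_COMPLETION")
    if wait is not None:
        # Assumption: an unset variable (nil) omits the parameter entirely.
        params["wait_for_completion"] = wait.lower()
    return f"{base_url}/{index}/_forcemerge?{urlencode(params)}"


# "concepts" and the base URL are hypothetical values for this example.
print(force_merge_url("http://localhost:9200", "concepts", {}))
print(force_merge_url(
    "http://localhost:9200",
    "concepts",
    {"ES_MAX_NUM_SEGMENTS": "1", "ES_WAIT_FOR_COMPLETION": "true"},
))
```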

Oban configuration

  • OBAN_DB_SCHEMA:

    • Purpose: Defines the database schema where Oban will create its tables
    • Default: "private"
    • Usage: Configures the schema prefix for Oban tables (jobs, peers, etc.)
    • Example: If set to "oban_schema", tables will be created in the schema oban_schema.jobs, oban_schema.peers, etc.
  • OBAN_CREATE_SCHEMA:

    • Purpose: Controls whether Oban should automatically create the database schema
    • Default: "true"
    • Usage: Determines if the Oban migration should create the schema specified in OBAN_DB_SCHEMA
    • Valid values: "true" (automatically creates the schema), "false" (does not create the schema; it must exist beforehand)

Oban Cron Jobs Configuration

  • OUTDATED_EMBEDDINGS_CRON:

    • Purpose: Defines the cron schedule for the OutdatedEmbeddings worker
    • Default: "0 */3 * * *" (every 3 hours)
    • Usage: Controls when the system processes outdated embeddings for data structure versions
    • Example: "0 2 * * *" for daily execution at 2 AM
  • EMBEDDINGS_DELETION_CRON:

    • Purpose: Defines the cron schedule for the EmbeddingsDeletion worker
    • Default: "@hourly"
    • Usage: Controls when the system performs cleanup of deleted embeddings
    • Example: "0 */6 * * *" for execution every 6 hours
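
To make the cron expressions above concrete, here is a deliberately simplified Python sketch that expands only the minute and hour fields of an expression. It assumes the day, month and weekday fields are all `*`, and it treats `@hourly` as shorthand for `0 * * * *`. It is not a full cron parser, and `run_times` is a hypothetical name.

```python
# Simplified illustration: supports "*", "*/n" and a single number per field.
NICKNAMES = {"@hourly": "0 * * * *"}


def expand_field(field, lowest, highest):
    """Expand one cron field into the list of values it matches."""
    if field == "*":
        return list(range(lowest, highest + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(lowest, highest + 1, step))
    return [int(field)]


def run_times(expression):
    """Return the (hour, minute) pairs at which the schedule fires each day."""
    expression = NICKNAMES.get(expression, expression)
    minute, hour = expression.split()[:2]
    return [
        (h, m)
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


print(run_times("0 */3 * * *"))   # eight runs per day, on the hour
print(run_times("0 2 * * *"))     # one run per day at 02:00
print(len(run_times("@hourly")))  # number of runs per day
```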

Oban Queue Configuration

  • OBAN_QUEUE_DEFAULT:

    • Purpose: Sets the number of concurrent workers for the default queue
    • Default: "5"
    • Usage: Controls the parallelism for general background jobs
  • OBAN_QUEUE_EMBEDDING_UPSERTS:

    • Purpose: Sets the number of concurrent workers for embedding upsert operations
    • Default: "10"
    • Usage: Controls the parallelism for creating and updating embeddings
  • OBAN_QUEUE_EMBEDDING_DELETION:

    • Purpose: Sets the number of concurrent workers for embedding deletion operations
    • Default: "5"
    • Usage: Controls the parallelism for embedding cleanup jobs
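
The following Python sketch shows one way the three variables and their documented defaults could resolve into per-queue concurrency limits. The queue names are inferred from the variable names and are an assumption, as is the helper `queue_limits`. The real configuration lives in the Elixir application.

```python
# Variable name -> (assumed queue name, documented default concurrency).
QUEUE_SETTINGS = {
    "OBAN_QUEUE_DEFAULT": ("default", 5),
    "OBAN_QUEUE_EMBEDDING_UPSERTS": ("embedding_upserts", 10),
    "OBAN_QUEUE_EMBEDDING_DELETION": ("embedding_deletion", 5),
}


def queue_limits(env):
    """Resolve the concurrency limit per queue, falling back to the defaults."""
    return {
        queue: int(env.get(variable, default))
        for variable, (queue, default) in QUEUE_SETTINGS.items()
    }


print(queue_limits({}))
print(queue_limits({"OBAN_QUEUE_EMBEDDING_UPSERTS": "20"}))
```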

Embedding Management

  • LIMIT_OUTDATED_EMBEDDINGS:

    • Purpose: Controls the maximum number of data structure versions that can be processed in a single batch when updating outdated embeddings
    • Default: 50000
    • Usage: Used by the OutdatedEmbeddings worker (scheduled by OUTDATED_EMBEDDINGS_CRON, every 3 hours by default) to limit the number of data structure versions processed when finding and updating missing or outdated record embeddings
    • Performance: Prevents memory issues and ensures system stability when processing large numbers of outdated embeddings
  • RECORD_EMBEDDINGS_BATCH_SIZE:

    • Purpose: Controls the batch size used when processing record embeddings for business concepts
    • Default: 100
    • Usage: Defines how many business concept IDs are processed together in each batch when generating or updating embeddings. Used by both synchronous and asynchronous embedding operations
    • Performance: Adjusting this value can help balance memory usage and processing efficiency when handling large numbers of embeddings
  • RECORD_EMBEDDINGS_DEFAULT_DELAY_MS:

    • Purpose: Controls the default delay in milliseconds between batches when processing record embeddings asynchronously
    • Default: 500
    • Usage: Defines the delay applied between consecutive batches of embedding upsert jobs. Used by the upsert_from_concepts_async/2 function to schedule jobs with a delay, preventing system overload when processing large numbers of embeddings
    • Performance: Adjusting this value can help control the rate of embedding processing and prevent overwhelming the system or external embedding services
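
The interaction between RECORD_EMBEDDINGS_BATCH_SIZE and RECORD_EMBEDDINGS_DEFAULT_DELAY_MS can be sketched as follows, in Python for illustration. The sketch assumes the first batch is scheduled immediately and each later batch is pushed back by one more delay interval. The actual scheduling inside upsert_from_concepts_async/2 may differ, and `plan_batches` is a hypothetical name.

```python
def plan_batches(concept_ids, batch_size=100, delay_ms=500):
    """Split IDs into batches and pair each batch with a scheduling delay.

    Defaults mirror the documented values (100 IDs per batch, 500 ms delay).
    Assumption: batch N is scheduled N * delay_ms after the first one.
    """
    batches = [
        concept_ids[start:start + batch_size]
        for start in range(0, len(concept_ids), batch_size)
    ]
    return [(index * delay_ms, batch) for index, batch in enumerate(batches)]


# 250 made-up concept IDs produce three batches.
for delay, batch in plan_batches(list(range(1, 251))):
    print(f"delay={delay}ms size={len(batch)}")
```

Under these assumptions a larger batch size reduces the number of jobs, while a larger delay spreads the same jobs over a longer period.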

Deployment

Ready to run in production? Please check our deployment guides.

Built With

  • Phoenix - Web framework
  • Ecto - Phoenix and Ecto integration
  • Postgrex - PostgreSQL driver for Elixir
  • Cowboy - HTTP server for Erlang/OTP
  • credo - Static code analysis tool for the Elixir language
  • guardian - Authentication library
  • canary - Elixir authorization and resource-loading library
  • canada - Permission definitions in Elixir apps
  • ex_machina - Create test data for Elixir applications

Authors

  • Bluetab Solutions Group, SL - Initial work - Bluetab

See also the list of contributors who participated in this project.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

In order to use this software, it is necessary that, depending on the type of functionality that you want to obtain, it is assembled with other software whose license may be governed by other terms different than the GNU General Public License version 3 or later. In that case, it will be absolutely necessary that, in order to make a correct use of the software to be assembled, you give compliance with the rules of the concrete license (of Free Software or Open Source Software) of use in each case, as well as, where appropriate, obtaining of the permits that are necessary for these appropriate purposes.
