Merged
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
* text=auto eol=lf
4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -19,7 +19,7 @@ jobs:
- name: Set up Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: 3.0
ruby-version: 3.2
bundler-cache: true # runs 'bundle install' and caches installed gems automatically
- name: Code Style Check
run: bundle exec standardrb --no-fix
run: bundle exec standardrb --no-fix
15 changes: 2 additions & 13 deletions .github/workflows/tests.yml
@@ -15,18 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
ruby-version: ['3.0']

services:
redis:
image: redis:4.0.0-alpine
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 6379:6379
ruby-version: ['3.2']

steps:
- uses: actions/checkout@v4
@@ -36,4 +25,4 @@ jobs:
ruby-version: ${{ matrix.ruby-version }}
bundler-cache: true # runs 'bundle install' and caches installed gems automatically
- name: Run tests
run: bundle exec rake
run: bundle exec rake
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,3 +7,5 @@
/pkg/
/spec/reports/
/tmp/
/db/test.sqlite3*
*.gem
2 changes: 1 addition & 1 deletion .ruby-version
@@ -1 +1 @@
3.0.6
3.2.8
35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,35 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [2.0.0] - Unreleased

This is a major rewrite of Æternitas with the primary goals of removing the Redis and Sidekiq dependencies and simplifying the core functionality.

### Changed
- Updated Core Dependencies: The gem now requires Ruby 3.1+ and ActiveRecord/ActiveJob `>= 7.0`.
- Complete Backend Overhaul: Æternitas no longer depends on Redis or Sidekiq. It now uses a pure ActiveRecord and ActiveJob backend.
- Job Uniqueness: Replaced the `sidekiq-unique-jobs` dependency with a built-in, database-backed uniqueness mechanism (`Aeternitas::UniqueJobLock`) to ensure only one `PollJob` per pollable can be enqueued at a time.
- Job Processing: Replaced Sidekiq-specific workers with a backend-agnostic `Aeternitas::PollJob` that works with any ActiveJob adapter (e.g., SolidQueue, GoodJob).
- Locking Mechanism: Replaced the Redis-backed Guard with a robust, database-backed distributed lock (`Aeternitas::GuardLock`) using pessimistic locking to prevent race conditions.
- Metrics System: Replaced the complex, Redis-based `tabstabs` metrics with a simple, database-backed system (`Aeternitas::Metric`). Metrics are now disabled by default.
- Default Source Storage Path: The default directory for the file storage adapter was changed to `storage/aeternitas/` within a Rails application.

### Added
- Thundering Herd Prevention: The `PollJob` now intelligently staggers retries when a `GuardIsLocked` error occurs, preventing many jobs from retrying simultaneously and overwhelming a resource.
- Configurable Metrics: Added `Aeternitas.config.metrics_enabled` and `Aeternitas.config.metric_retention_period` to give users control over metrics collection and data retention.
- Built-in Maintenance Jobs: Added `Aeternitas::CleanupStaleLocksJob` and `Aeternitas::CleanupOldMetricsJob` to provide a clear, easy way to schedule necessary database cleanup.

### Removed
- Removed direct gem dependencies on `sidekiq`, `sidekiq-unique-jobs`, `redis`, `connection_pool`, and `tabstabs`.
- Removed the Sidekiq-specific middleware for handling `GuardIsLocked` errors.
- Removed the complex, multi-resolution time-series logic from the metrics system.

## [0.2.0 and older] - See legacy repository
- Initial versions of Æternitas: https://github.com/FHG-IMW/aeternitas
- Relied on a Sidekiq and Redis backend for job processing, locking, and metrics.
216 changes: 212 additions & 4 deletions README.md
@@ -1,10 +1,218 @@
# Æternitas - Version 2
# Æternitas

This is going to become version 2 of aeternitas. The goals are:
[![Tests](https://github.com/Dietech-Group/aeternitas/actions/workflows/tests.yml/badge.svg)](https://github.com/Dietech-Group/aeternitas/actions/workflows/tests.yml)
[![Lint](https://github.com/Dietech-Group/aeternitas/actions/workflows/lint.yml/badge.svg)](https://github.com/Dietech-Group/aeternitas/actions/workflows/lint.yml)

1. Remove dependency on Sidekiq and Redis
2. Reduce functionality to a core which allows easier usage
A Ruby gem for continuous source retrieval and data integration.

Æternitas provides means to regularly "poll" resources (e.g. a website, Twitter feed or API) and to permanently store the retrieved results.
By default, it avoids putting too much load on external servers and stores raw results as compressed files on disk.
Æternitas can be configured with a wide variety of polling strategies (e.g. frequencies, cooldown periods, error handling, deactivation on failure).

Æternitas is meant to be included in a Rails application and uses a pure ActiveJob and ActiveRecord backend.
All metadata, locks, and metrics are stored in your application's database, while raw source data is stored as compressed files on disk by default.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'aeternitas'
```

And then execute:

```bash
$ bundle install
$ rails generate aeternitas:install
$ rails db:migrate
```

This will install the gem, generate the necessary database tables, and create a configuration initializer.

### Maintenance

Æternitas creates lock and metric records in your database. To prevent this data from growing indefinitely, you should schedule periodic cleanup jobs. The two key maintenance jobs are:

- **`Aeternitas::CleanupStaleLocksJob`**: Removes old, expired lock records from crashed workers.
- **`Aeternitas::CleanupOldMetricsJob`**: Prunes metric data older than the configured `metric_retention_period`.

You should schedule these jobs to run periodically (e.g. weekly).

## Quickstart

Let's say you want to monitor several websites for the usage of a keyword, e.g. 'aeternitas'. First, create your model:

```bash
$ rails generate model Website url:string keyword_count:integer
```

Then, include `Aeternitas::Pollable` in your model and define your polling logic.

```ruby
class Website < ApplicationRecord
  include Aeternitas::Pollable

  polling_options do
    polling_frequency :weekly
  end

  def poll
    page_content = Net::HTTP.get(URI.parse(self.url))
    add_source(page_content) # Store the retrieved page content permanently
    count = page_content.scan('aeternitas').size
    update(keyword_count: count)
  end
end
```

The `poll` method is called each time Æternitas processes the job for this resource. In our example, this would be once a week.

To start the polling process, you need to regularly run `Aeternitas.enqueue_due_pollables` and have an ActiveJob backend (like SolidQueue, GoodJob, etc.) running to process the jobs.

In most cases it makes sense to store polling results as sources so that further work can be done in separate jobs. In the example above we already added the `page_content` as a source to the website with `add_source`.
Æternitas only stores a new source if the source's fingerprint (MD5 hash of the content) does not exist yet. If we wanted to count the keyword in a separate job, the following implementation would allow us to do so:

```ruby
# app/models/website.rb
class Website < ApplicationRecord
  include Aeternitas::Pollable

  polling_options do
    polling_frequency :weekly
  end

  def poll
    page_content = Net::HTTP.get(URI.parse(self.url))
    new_source = add_source(page_content) # returns nil if source already exists
    CountKeywordJob.perform_later(new_source.id) if new_source
  end
end

# app/jobs/count_keyword_job.rb
class CountKeywordJob < ApplicationJob
  queue_as :default

  def perform(source_id)
    source = Aeternitas::Source.find(source_id)
    page_content = source.raw_content
    keyword_count = page_content.scan('aeternitas').size
    website = source.pollable
    website.update(keyword_count: keyword_count)
  end
end
```
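
The fingerprint check described above can be illustrated in isolation. The following is a minimal plain-Ruby sketch, not the gem's implementation: `FingerprintedStore` is a hypothetical stand-in that only remembers MD5 fingerprints and returns `nil` for content it has already seen, mirroring the `add_source` behaviour the example relies on.

```ruby
require "digest/md5"
require "set"

# Hypothetical stand-in for a source store; not part of Æternitas.
class FingerprintedStore
  def initialize
    @fingerprints = Set.new
  end

  # Returns the fingerprint if the content is new, or nil if it was seen before.
  def add(content)
    fingerprint = Digest::MD5.hexdigest(content)
    # Set#add? returns nil when the element is already present.
    @fingerprints.add?(fingerprint) ? fingerprint : nil
  end
end

store = FingerprintedStore.new
first = store.add("<html>aeternitas</html>")   # new content
second = store.add("<html>aeternitas</html>")  # identical content, rejected
third = store.add("<html>changed</html>")      # different content, accepted
```

Because only changed content yields a new source, a follow-up job such as `CountKeywordJob` is enqueued only when there is something new to process.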

## Configuration

### Global Configuration

Global settings can be configured in `config/initializers/aeternitas.rb`.

#### Metrics
You can enable or disable metrics collection. By default, metrics are disabled.

```ruby
Aeternitas.configure do |config|
  # Set to true to enable logging metrics to the database.
  config.metrics_enabled = true

  # Configure how long to keep metric data.
  config.metric_retention_period = 180.days
end
```

#### Storage Adapter
By default, Æternitas stores sources as compressed files on disk. You can change this behavior by implementing a custom storage adapter. For an example, have a look at `Aeternitas::StorageAdapter::File`.

```ruby
Aeternitas.configure do |config|
  # To change the storage directory for the default File adapter:
  config.storage_adapter_config = {
    directory: File.join(Rails.root, 'public', 'sources')
  }

  # To use a custom adapter:
  config.storage_adapter = Aeternitas::StorageAdapter::MyCustomAdapter
end
```
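
To give a rough idea of what a custom adapter might look like, here is a self-contained sketch of an in-memory adapter that compresses content with Zlib. The class name and the method names (`store`, `retrieve`, `delete`, `exist?`) are assumptions made for illustration and are not confirmed by this README; check `Aeternitas::StorageAdapter::File` for the interface the gem actually expects.

```ruby
require "zlib"

# Hypothetical adapter; the interface shown here is an assumption, not the gem's API.
class InMemoryCompressedAdapter
  def initialize(config = {})
    @config = config
    @blobs = {}
  end

  # Compresses and stores the content under the given id.
  def store(id, raw_content)
    raise ArgumentError, "#{id} is already stored" if exist?(id)
    @blobs[id] = Zlib::Deflate.deflate(raw_content)
  end

  # Returns the decompressed content (as a binary string); raises KeyError for unknown ids.
  def retrieve(id)
    Zlib::Inflate.inflate(@blobs.fetch(id))
  end

  # Returns true if something was deleted, false otherwise.
  def delete(id)
    !@blobs.delete(id).nil?
  end

  def exist?(id)
    @blobs.key?(id)
  end

  # Size of the compressed blob in bytes.
  def stored_bytes(id)
    @blobs.fetch(id).bytesize
  end
end

adapter = InMemoryCompressedAdapter.new
content = "aeternitas " * 100
adapter.store("abc123", content)
restored = adapter.retrieve("abc123")
compressed_size = adapter.stored_bytes("abc123")
```

An in-memory store like this would only be useful in tests; a real adapter would persist the compressed bytes somewhere durable.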

### Pollable Configuration

Pollables can be configured on a per-model basis using the `polling_options` block.

#### polling_frequency
_Default: :daily_

This option controls how often a pollable is polled and can be configured in two different ways.
Either use one of the presets specified in `Aeternitas::PollingFrequency` by specifying the preset's name as a symbol:

```ruby
polling_options do
  polling_frequency :weekly
end
```

Or, if you need a more complex polling scheme, use a custom lambda that computes the next polling time dynamically:

```ruby
polling_options do
  # Set the frequency depending on the element's age (+ 1 month for every 3 months)
  polling_frequency ->(context) { 1.month.from_now + (Time.now - context.created_at).to_i / 3.months * 1.month }
end
```
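
The arithmetic behind such an age-dependent frequency can be checked without Rails. The sketch below is an approximation under stated assumptions: it uses a fixed 30-day month instead of ActiveSupport's calendar-aware durations, `Record` is a hypothetical stand-in for a pollable, and the current time is passed in explicitly so the result is deterministic.

```ruby
# Plain-Ruby approximations; ActiveSupport's 1.month is calendar-aware.
DAY = 24 * 60 * 60
MONTH = 30 * DAY

# Hypothetical record exposing only the attribute the lambda reads.
Record = Struct.new(:created_at)

# One month base interval plus one extra month per full three months of age.
next_polling = lambda do |record, now|
  age_in_seconds = (now - record.created_at).to_i
  extra_months = age_in_seconds / (3 * MONTH) # integer division
  now + MONTH + extra_months * MONTH
end

now = Time.utc(2025, 1, 1)
fresh = Record.new(now - 10 * DAY)
old = Record.new(now - 200 * DAY)

fresh_delay_days = (next_polling.call(fresh, now) - now) / DAY # 30.0
old_delay_days = (next_polling.call(old, now) - now) / DAY     # 90.0
```

Older records are therefore polled less often: a 200-day-old record has completed two full three-month periods and waits two extra months.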

#### before_polling / after_polling
_Default: []_

Specify methods to run before each poll or after each successful poll. You can either specify a method name or a lambda:

```ruby
polling_options do
  before_polling :log_start
  after_polling ->(pollable) { puts "Finished polling #{pollable.id}" }
end
```
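
How a method name and a lambda can be treated uniformly is sketched below. This is an illustration of the general dispatch idea only; `run_callback` and `FakePollable` are hypothetical names, and the gem's internal callback handling may differ.

```ruby
# Hypothetical dispatcher: symbols are sent to the pollable, lambdas receive it.
def run_callback(callback, pollable)
  case callback
  when Symbol then pollable.send(callback)
  when Proc then callback.call(pollable)
  else raise ArgumentError, "unsupported callback: #{callback.inspect}"
  end
end

# Hypothetical pollable that records what happened to it.
class FakePollable
  attr_reader :id, :log

  def initialize(id)
    @id = id
    @log = []
  end

  def log_start
    @log << "start #{id}"
  end
end

pollable = FakePollable.new(7)
run_callback(:log_start, pollable)
run_callback(->(p) { p.log << "finished #{p.id}" }, pollable)
```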

#### deactivate_on / ignore_error
_Default: []_

Define custom error handling rules.

`deactivate_on` will stop polling a resource permanently if a specified error occurs. This can be useful if the error implies that the resource no longer exists.

`ignore_error` will wrap the error within `Aeternitas::Errors::Ignored` which is then raised instead. This can be useful for filtering in exception tracking services like Airbrake.

```ruby
polling_options do
  deactivate_on Twitter::Error::NotFound
  ignore_error Twitter::Error::ServiceUnavailable
end
```
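
The two rules can be pictured with a small plain-Ruby sketch. All class and method names below are hypothetical stand-ins (the real errors come from the gem and from the client library in use); the sketch only shows the pattern of swallowing one error class and re-raising another inside a wrapper.

```ruby
# Hypothetical stand-ins for client errors.
class NotFound < StandardError; end
class ServiceUnavailable < StandardError; end

# Wrapper error that keeps a reference to the original exception.
class Ignored < StandardError
  attr_reader :original

  def initialize(original)
    @original = original
    super("ignored: #{original.message}")
  end
end

DEACTIVATE_ON = [NotFound].freeze
IGNORE = [ServiceUnavailable].freeze

# Runs the block; returns :ok, returns :deactivated for deactivating errors,
# and re-raises ignorable errors wrapped in Ignored.
def run_poll
  yield
  :ok
rescue *DEACTIVATE_ON
  :deactivated
rescue *IGNORE => e
  raise Ignored.new(e)
end

ok = run_poll { "fine" }
deactivated = run_poll { raise NotFound, "gone" }
ignored = begin
  run_poll { raise ServiceUnavailable, "try later" }
rescue Ignored => e
  e
end
```

An exception tracker can then filter on the single wrapper class instead of listing every ignorable error.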

#### sleep_on_guard_locked
_Default: false_

Controls behavior when a guard lock cannot be acquired.
- **`false`:** The job will be retried with a smart, staggered backoff delay to prevent a "thundering herd." This is the recommended and most scalable option.
- **`true`:** The job will cause the ActiveJob worker thread to `sleep` until the lock is expected to be free, blocking that thread from processing other jobs. This is an aggressive strategy and should *only* be used in specific cases where you intend to pause a dedicated worker.
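
The idea behind a staggered retry can be sketched in a few lines of plain Ruby. This is not the gem's algorithm: `staggered_retry_delay` and its parameters are hypothetical, and the sketch simply adds random jitter to the time at which the lock is expected to be free so that waiting jobs do not all retry at the same instant.

```ruby
# Hypothetical helper: wait until the lock should be free, plus random jitter.
def staggered_retry_delay(lock_free_in:, max_jitter:, rng: Random.new)
  lock_free_in + rng.rand(0.0..max_jitter.to_f)
end

# A seeded generator keeps the example reproducible.
rng = Random.new(42)
delays = Array.new(5) do
  staggered_retry_delay(lock_free_in: 10, max_jitter: 5, rng: rng)
end
```

Each of the five delays lies between 10 and 15 seconds, so the retries are spread over the jitter window rather than arriving together.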

#### queue
_Default: 'polling'_

This option specifies the ActiveJob queue for the poll job.

#### guard_key
_Default: obj.class.name.to_s_

Defines the key used for resource locking. By default, the lock is taken at the pollable class level, so all instances of a model share the same lock. You can also provide a lambda for more granular locking (e.g. per-instance or per-API-host):

```ruby
polling_options do
  # Lock based on the instance's URL host
  guard_key ->(website) { URI.parse(website.url).host }
end
```
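
To see which pollables would end up sharing a lock under such a key, the lambda can be tried on a few sample URLs. In this sketch `Website` is a hypothetical struct standing in for the model, and the URLs are made up for illustration.

```ruby
require "uri"

# Hypothetical stand-in for the Website model.
Website = Struct.new(:url)

guard_key = ->(website) { URI.parse(website.url).host }

websites = [
  Website.new("https://example.com/a"),
  Website.new("https://example.com/b"),
  Website.new("https://example.org/feed")
]

# Pollables with the same key would contend for the same lock.
groups = websites.group_by { |website| guard_key.call(website) }
lock_keys = groups.keys
```

Here the two `example.com` pages share one key while `example.org` gets its own, so polling one host would not block the other.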

## Development

18 changes: 6 additions & 12 deletions aeternitas.gemspec
@@ -5,7 +5,7 @@ require "aeternitas/version"
Gem::Specification.new do |spec|
spec.name = "aeternitas"
spec.version = Aeternitas::VERSION
spec.authors = ["Michael Prilop", "Max Kießling"]
spec.authors = ["Michael Prilop", "Max Kießling", "Louis Franzke"]
spec.email = ["[email protected]"]

spec.summary = "æternitas - version 2"
@@ -20,20 +20,14 @@ Gem::Specification.new do |spec|
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
spec.require_paths = ["lib"]

spec.add_dependency "activerecord", ">= 6.1"
spec.add_dependency "redis"
spec.add_dependency "connection_pool"
spec.add_dependency "activerecord", ">= 7.0"
spec.add_dependency "activejob", ">= 7.0"
spec.add_dependency "aasm"
spec.add_dependency "sidekiq", "> 4", "<= 5.2.7"
spec.add_dependency "sidekiq-unique-jobs", "~> 5.0"
spec.add_dependency "tabstabs"

spec.add_development_dependency "bundler"
spec.add_development_dependency "rake"
spec.add_development_dependency "rspec", "~> 3.0"
spec.add_development_dependency "sqlite3", "~> 1.4"
spec.add_development_dependency "database_cleaner", "~> 1.5"
spec.add_development_dependency "rspec-sidekiq", "~> 3.1"
spec.add_development_dependency "mock_redis"
spec.add_development_dependency "rspec-rails", "~> 7.0"
spec.add_development_dependency "sqlite3", "~> 2.1"
spec.add_development_dependency "database_cleaner", "~> 2.0"
spec.add_development_dependency "standard"
end