Arc 26.02.1

Released by @github-actions on 01 Feb 13:23 (commit a97d16a)

Arc 26.02.1 Release Summary

Quick Start

Docker

docker run -d \
  -p 8000:8000 \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:26.02.1

Debian/Ubuntu (amd64, arm64)

# Download and install (amd64 package shown)
wget https://github.com/basekick-labs/arc/releases/download/v26.02.1/arc_26.02.1_amd64.deb
sudo dpkg -i arc_26.02.1_amd64.deb

# Start and enable
sudo systemctl enable arc
sudo systemctl start arc

# Check status
curl http://localhost:8000/health

RHEL/Fedora/Rocky (x86_64, aarch64)

# Download and install (x86_64 package shown)
wget https://github.com/basekick-labs/arc/releases/download/v26.02.1/arc-26.02.1-1.x86_64.rpm
sudo rpm -i arc-26.02.1-1.x86_64.rpm

# Start and enable
sudo systemctl enable arc
sudo systemctl start arc

Kubernetes (Helm)

helm install arc https://github.com/basekick-labs/arc/releases/download/v26.02.1/arc-26.02.1.tgz

New Features

InfluxDB Client Compatibility

Arc's Line Protocol endpoints now use the same paths as InfluxDB, enabling drop-in compatibility with all official InfluxDB client libraries (Go, Python, JavaScript, Java, C#, PHP, Ruby) as well as Telegraf and Node-RED. Point your existing InfluxDB clients at Arc; no code changes are required.
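As a rough illustration of what an InfluxDB v2-style client sends, the sketch below builds the write URL by hand. The bucket and precision query parameters follow the InfluxDB v2 write API (the org parameter that InfluxDB also expects is omitted for brevity). How Arc interprets these parameters, and which token header it accepts, are assumptions here; arc_v2_write_url is a hypothetical helper, not part of Arc.

```bash
# Hypothetical helper: build an InfluxDB v2-style write URL for an Arc server.
# Assumption: Arc accepts the same query parameters as InfluxDB's /api/v2/write.
arc_v2_write_url() {
  local base="${1%/}"          # strip one trailing slash from the base URL
  local bucket="$2"
  local precision="${3:-ns}"   # InfluxDB v2 defaults to nanosecond precision
  printf '%s/api/v2/write?bucket=%s&precision=%s\n' "$base" "$bucket" "$precision"
}

# Example call (not executed here; the token header format is an assumption):
#   curl -X POST "$(arc_v2_write_url http://localhost:8000 mydb s)" \
#     -H "Authorization: Token $ARC_TOKEN" \
#     --data-binary 'cpu,host=server01 usage=42.5 1769952180'
```

Keeping the URL construction in one function makes it easy to switch between the /write and /api/v2/write styles when testing a migration.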

MQTT Ingestion Support

Native MQTT subscription for IoT and edge data ingestion. Arc can subscribe to MQTT topics with wildcard support and offers dynamic subscription management via REST API, TLS/SSL connections, auto-reconnect, and per-subscription monitoring. Passwords are encrypted at rest, and subscriptions start automatically when the server restarts.
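For readers unfamiliar with MQTT wildcards, the following sketch shows the standard matching rules: + matches exactly one topic level, and # matches the rest of the topic, including the parent level itself. It is a simplified illustration in shell, not Arc's implementation. mqtt_topic_matches is a hypothetical name, the check that # appears only as the last level is skipped, and the special handling of topics starting with $ is left out.

```bash
# Return 0 if the concrete topic ($2) matches the topic filter ($1).
mqtt_topic_matches() {
  local f="$1" t="$2"          # remaining filter and topic text
  local f_left=1 t_left=1      # 1 while levels remain to be consumed
  local fh th
  while [ "$f_left" -eq 1 ]; do
    fh="${f%%/*}"              # current filter level
    if [ "$fh" = "#" ]; then return 0; fi        # multi-level wildcard
    if [ "$t_left" -ne 1 ]; then return 1; fi    # topic ran out of levels
    th="${t%%/*}"              # current topic level
    if [ "$fh" != "+" ] && [ "$fh" != "$th" ]; then return 1; fi
    # Advance both sides by one level, or mark them as exhausted.
    case "$f" in */*) f="${f#*/}" ;; *) f_left=0 ;; esac
    case "$t" in */*) t="${t#*/}" ;; *) t_left=0 ;; esac
  done
  [ "$t_left" -eq 0 ]          # match only if the topic is exhausted too
}
```

For example, mqtt_topic_matches 'sensors/+/temp' 'sensors/a/temp' succeeds, while mqtt_topic_matches 'sensors/+' 'sensors/a/b' fails because + covers a single level only.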

S3 File Caching (Optional)

In-memory caching of S3 Parquet files via DuckDB's cache_httpfs extension. Improves query performance 5-10x for workloads with repeated file access (CTEs, subqueries, Grafana dashboards). Opt-in feature, disabled by default.

Contributed by @khalid244

Relative Time Expression Support

Queries using NOW() - INTERVAL now benefit from partition pruning. Previously, pruning applied only to literal timestamps. Expressions such as time > NOW() - INTERVAL '20 days' now prune to the relevant partitions, which can substantially reduce query times.
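To make the pruning idea concrete, the sketch below resolves a relative window into the set of UTC days it can touch, which is the information a pruner needs before it decides which partitions to read. Day-sized partitions are an assumption made for illustration, and days_in_window is a hypothetical helper; Arc's actual partition layout and pruning logic are not described here.

```bash
# Print the start-of-day epochs (UTC, seconds) covered by "now - N days .. now".
# Assumes non-negative epoch values.
days_in_window() {
  local now="$1" days="$2"
  local day=86400
  local first=$(( (now - days * day) / day * day ))  # floor lower bound to its day
  local last=$(( now / day * day ))                  # floor "now" to its day
  local d="$first"
  while [ "$d" -le "$last" ]; do
    echo "$d"
    d=$(( d + day ))
  done
}
```

A 20-day window always resolves to 21 day starts (the partial first day plus 20 more), so everything outside that list can be skipped without opening a file.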

Bug Fixes

  • Control characters in measurement names - Fixed S3 failures caused by invalid characters in measurement names
  • Missing S3 partitions - Queries no longer fail when time range includes non-existent partitions (day-level file verification contributed by @khalid244)
  • Server timeout config ignored - Now respects configured read/write timeout values
  • Large payload rejection - Fixed 413 errors on payloads >4MB
  • Timestamp timezone inconsistency - All timestamps now normalized to UTC
  • Azure SSL errors on Linux - Fixed certificate validation issues (contributed by @schotime)
  • Compaction filename timezones - Files now use UTC consistently (contributed by @schotime)
  • S3 subprocess config - Fixed compaction failures on S3-compatible storage (Hetzner, MinIO)
  • Non-UTF8 data - Invalid UTF-8 automatically sanitized during ingestion
  • Nanosecond timestamps - MessagePack now correctly handles nanosecond precision
  • Multi-line query parsing - WHERE clause extraction now works across newlines (contributed by @khalid244)
  • String literals with SQL keywords - Partition pruning no longer breaks on embedded keywords
  • Buffer flush timing - Age-based flushes now fire closer to configured intervals under high load
  • Arrow writer panic - Fixed crash during high-concurrency writes with schema evolution
  • Empty directories - Cleaned up after daily compaction
  • Compactor OOM/segfaults - Streaming I/O, memory limit passthrough, file batching, adaptive splitting
  • Orphaned temp directories - Cleaned up on startup and after subprocess completion
  • Compaction data duplication - Manifest-based tracking prevents re-compaction after crashes (contributed by @khalid244)
  • WAL S3 recovery - Startup and periodic recovery from transient S3 failures (contributed by @khalid244)
  • Tiered storage routing - X-Arc-Database header now queries cold tier data
  • Retention policies - Now work with S3/Azure storage backends
  • Query timeout - Prevents indefinite hangs when S3 disconnects mid-query (contributed by @khalid244)

Improvements

  • Configurable server timeouts - Idle and shutdown timeouts now configurable
  • Automatic time function optimization - time_bucket() and date_trunc() rewritten to epoch arithmetic (2-2.5x faster GROUP BY)
  • Parallel partition scanning - Multi-partition queries execute concurrently (2-4x speedup)
  • Two-stage distributed aggregation - Cross-shard aggregations use scatter-gather (5-20x speedup, Enterprise only)
  • DuckDB query optimizations - Metadata caching, prefetching, insertion order preservation (18-24% faster aggregations) (SET GLOBAL fix contributed by @khalid244)
  • Regex-to-string optimization - URL domain extraction rewritten to native functions (2x+ faster)
  • Database header optimization - x-arc-database header skips regex parsing (4-17% faster queries)
  • MQTT auto-generated client ID - Prevents collisions when running multiple instances
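The epoch-arithmetic idea behind the time function optimization listed above can be illustrated outside SQL: flooring a Unix timestamp to a fixed-width bucket needs only a subtraction and a modulo, with no calendar logic. The sketch below shows that arithmetic in shell. It is not the SQL that Arc generates, and bucket_start is a hypothetical name.

```bash
# Floor an epoch timestamp (seconds) to the start of its fixed-width bucket.
# Assumes a non-negative timestamp and a positive width in seconds.
bucket_start() {
  local ts="$1" width="$2"
  echo $(( ts - ts % width ))
}

bucket_start 1769952180 3600    # hourly bucket -> 1769950800
bucket_start 1769952180 86400   # daily bucket  -> 1769904000
```

This only works for fixed-width units such as seconds, minutes, hours, and UTC days; month or year truncation still needs calendar-aware handling.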

Security

Token hashing uses bcrypt (cost 10) with SHA256-based prefixes for O(1) lookups. Legacy SHA256 tokens continue to work for backward compatibility.

Breaking Changes

Line Protocol endpoint paths renamed to match InfluxDB API:

  • /api/v1/write → /write
  • /api/v1/write/influxdb → /api/v2/write

Update any client configuration that still points at the old paths. Official InfluxDB client libraries already use the new paths and work unchanged.
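When auditing scripts or proxy rules for old paths, a small lookup can help. The sketch below assumes the two renames map /api/v1/write to /write and /api/v1/write/influxdb to /api/v2/write; migrate_write_path is a hypothetical helper that handles bare paths only (no query strings).

```bash
# Map a pre-26.02.1 Line Protocol path to its new location.
# Any other path is returned unchanged.
migrate_write_path() {
  case "$1" in
    /api/v1/write/influxdb) echo "/api/v2/write" ;;
    /api/v1/write)          echo "/write" ;;
    *)                      echo "$1" ;;
  esac
}
```

For example, migrate_write_path /api/v1/write prints /write, while a path such as /health passes through untouched.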

Upgrade Notes

  • MQTT is disabled by default. Enable it with mqtt.enabled = true
  • Empty directory cleanup is automatic for new compaction runs only
  • Existing empty directories from previous runs are not cleaned up automatically

What's Changed

  • Feature/mqtt ingestion by @xe-nvdk in #91
  • fix: Clean up empty directories after daily compaction by @xe-nvdk in #95
  • fix: Support relative time expressions in partition pruning by @xe-nvdk in #96
  • Feature/time bucket optimization by @xe-nvdk in #97
  • feat: Add date_trunc() to epoch optimization for 2.5x faster GROUP BY by @xe-nvdk in #98
  • perf: Add fast-path checks to time function rewrites by @xe-nvdk in #99
  • fix: Add AzureTransportOptionType so that curl can be used when querying due to CA certificates error by @schotime in #92
  • feat(enterprise): Add license-gated CQ and retention schedulers by @xe-nvdk in #100
  • feat(enterprise): Add RBAC with security hardening by @xe-nvdk in #101
  • fix(compaction): Resolve OOM and segfaults with large datasets by @xe-nvdk in #103
  • fix: Buffer flush bug and Arrow endpoint SQL cache by @xe-nvdk in #104
  • chore: Upgrade DuckDB to 1.4.3 and fix RBAC tests by @xe-nvdk in #105
  • Fix/compaction batch race condition by @xe-nvdk in #106
  • Perf/query optimization by @xe-nvdk in #107
  • perf: Add x-arc-database header support for query optimization by @xe-nvdk in #108
  • perf: Optimize header-based query parsing with fast paths by @xe-nvdk in #109
  • feat(cluster): Add Phase 2 enterprise clustering foundation by @xe-nvdk in #110
  • feat(cluster): Add Phase 3 cluster routing and WAL replication by @xe-nvdk in #111
  • feat(cluster): Add Phase 4 multi-writer sharding foundation by @xe-nvdk in #112
  • fix(wal): Prevent integer overflow in payload allocation by @xe-nvdk in #114
  • feat(api): InfluxDB-compatible endpoints for drop-in client migration by @xe-nvdk in #115
  • fix(auth): Add CodeQL suppression comments for SHA256 false positives by @xe-nvdk in #116
  • docs: Fix MQTT configuration examples in release notes by @xe-nvdk in #117
  • fix(api): Apply MaxPayloadSize config to Fiber BodyLimit by @xe-nvdk in #118
  • feat(query): Parallel partition scanning and two-stage distributed aggregation by @xe-nvdk in #119
  • feat(query): Add regex-to-string function rewriter for 2.2x speedup by @xe-nvdk in #120
  • feat(api): Add REGEXP_EXTRACT to string function rewriter by @xe-nvdk in #121
  • fix(api): Validate measurement names to prevent S3 XML parsing errors by @xe-nvdk in #124
  • fix(pruning): Filter non-existent S3/Azure partitions before query execution by @xe-nvdk in #127
  • fix(config): Use configured server read/write timeout values by @xe-nvdk in #128
  • feat(config): Add server idle_timeout and shutdown_timeout config options by @xe-nvdk in #129
  • fix: Ensure UTC dates for compaction filenames by @schotime in #132
  • Fix/s3 subprocess credentials by @xe-nvdk in #135
  • fix(ingest): prevent panic during high-concurrency writes with schema evolution by @xe-nvdk in #137
  • fix(ingest): sanitize non-UTF8 strings to prevent query failures by @xe-nvdk in #138
  • Fix/utf8 sanitization ingestion by @xe-nvdk in #139
  • Fix/utf8 sanitization ingestion by @xe-nvdk in #140
  • feat(compaction): add adaptive batch sizing on failure by @xe-nvdk in #141
  • Fix/buffer flush timing issue by @xe-nvdk in #143
  • Fix WHERE clause regex to match across newlines by @khalid244 in #148
  • fix(pruning): verify files exist at day-level paths by @khalid244 in #145
  • Add S3 file caching via cache_httpfs extension by @khalid244 in #149
  • perf(pruning): cache S3 directory listings by parent to reduce API calls by @khalid244 in #150
  • feat(tiering): implement 2-tier hot/cold storage system by @xe-nvdk in #154
  • fix(security): prevent SQL injection in DuckDB credential configuration by @xe-nvdk in #158
  • perf(ingest): remove dead code and optimize ingestion path by @xe-nvdk in #159
  • refactor: sanitize query and compaction paths for 26.02.1 release by @xe-nvdk in #160
  • perf(pruning): remove unused code and add day-level file caching by @xe-nvdk in #161
  • fix(compaction): cleanup orphaned temp directories on startup by @xe-nvdk in #165
  • fix(compaction): add manifest-based tracking to prevent data duplication by @khalid244 in #163
  • fix: prevent data loss during S3 outages with WAL-based recovery by @khalid244 in #162
  • Add query timeout to prevent indefinite hangs on S3 disconnection by @khalid244 in #152
  • fix(tiering): route X-Arc-Database queries to cold tier by @xe-nvdk in #167
  • feat(retention): support S3/Azure storage backends by @xe-nvdk in #170
  • perf(duckdb): enable parquet metadata cache and prefetch by @khalid244 in #171
  • fix(duckdb): use SET GLOBAL for all settings to ensure pool consistency by @khalid244 in #172
  • fix(wal): native columnar recovery, database envelope, and shutdown purge by @xe-nvdk in #173
  • Fix/wal columnar recovery by @xe-nvdk in #174

New Contributors

Special thanks to @khalid244 and @schotime for their contributions to this release!


Download: https://github.com/basekick-labs/arc/releases/tag/v26.02.1
Documentation: https://docs.basekick.net/arc
Discord: https://discord.gg/nxnWfUxsdm