Instrumentation Hooks

SmarterCSV provides three optional callback hooks so you can observe file processing without wrapping every call site in timing code. The hooks fire only with SmarterCSV.process, where the library controls iteration. Enumerator modes (each, each_chunk) do not fire hooks: in those modes the caller owns the lifecycle and should instrument its own loop.
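For the enumerator modes, caller-side instrumentation could be sketched roughly as follows. `instrument_chunks` is a hypothetical helper, not part of SmarterCSV. It accepts any enumerable of chunks and borrows the payload key names listed under Hook Payloads. A plain array sliced with `each_slice` stands in for the enumerator so the sketch runs without the gem.

```ruby
# Hypothetical helper (NOT part of SmarterCSV): wraps any enumerable of chunks
# and calls hook-like callables around the caller's own loop.
def instrument_chunks(chunks, on_start: nil, on_chunk: nil, on_complete: nil)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  on_start&.call({ input: chunks.class.name })

  total_rows = 0
  total_chunks = 0

  chunks.each do |chunk|
    total_chunks += 1
    total_rows += chunk.size
    # Fire the callback before handing the chunk to the caller's block.
    on_chunk&.call({ chunk_number: total_chunks,
                     rows_in_chunk: chunk.size,
                     total_rows_so_far: total_rows })
    yield chunk
  end

  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  stats = { total_rows: total_rows, total_chunks: total_chunks, duration: duration }
  on_complete&.call(stats)
  stats
end

# Stand-in data: five rows consumed in chunks of two.
rows = Array.new(5) { |i| { id: i } }
chunk_sizes = []
demo_stats = instrument_chunks(
  rows.each_slice(2),
  on_chunk: ->(info) { chunk_sizes << info[:rows_in_chunk] }
) { |chunk| chunk } # the block is where real processing would happen
```

In real use, the first argument would be whatever enumerator your own loop already iterates over.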

The Three Hooks

Hook	Fires when	Useful for
`on_start`	Once, before the first row is parsed	Logging intent, starting timers, counters
`on_chunk`	After each chunk is parsed, before block runs	Progress tracking, per-batch metrics
`on_complete`	Once, after the entire file is exhausted	Total duration, row counts, summary metrics

on_chunk fires only when chunk_size is set. In non-chunked mode, only on_start and on_complete fire.

Usage

All three hooks are lambdas (or any callable) passed as options:

SmarterCSV.process('data.csv',
  chunk_size: 500,

  on_start: ->(info) {
    Rails.logger.info "Starting CSV import: #{info[:input]} (#{info[:file_size]} bytes)"
    Metrics.increment('csv.import.start')
  },

  on_chunk: ->(info) {
    Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows " \
                       "(#{info[:total_rows_so_far]} so far)"
  },

  on_complete: ->(stats) {
    Rails.logger.info "Import complete: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s"
    Metrics.histogram('csv.import.duration', stats[:duration])
    Metrics.gauge('csv.import.rows', stats[:total_rows])
    Metrics.increment('csv.import.bad_rows', stats[:bad_rows]) if stats[:bad_rows] > 0
  },
) { |chunk| MyModel.insert_all(chunk) }

Hook Payloads

`on_start`

Key	Type	Description
`:input`	String	File path if input is a filename; class name (e.g. `"File"`) otherwise
`:file_size`	Integer / nil	File size in bytes if determinable; nil for IO objects
`:col_sep`	String	Effective column separator (after auto-detection)
`:row_sep`	String	Effective row separator (after auto-detection)

`on_chunk`

Key	Type	Description
`:chunk_number`	Integer	1-based index of this chunk
`:rows_in_chunk`	Integer	Number of rows in this chunk (≤ `chunk_size`)
`:total_rows_so_far`	Integer	Cumulative rows processed including this chunk

`on_complete`

Key	Type	Description
`:total_rows`	Integer	Total rows successfully parsed
`:total_chunks`	Integer	Number of chunks yielded (0 in non-chunked mode)
`:duration`	Float	Elapsed seconds from `on_start` to `on_complete`
`:bad_rows`	Integer	Number of rows that triggered `on_bad_row` handling (0 if none)
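As an illustration of consuming the `on_complete` payload, the sketch below turns a stats hash into a one-line summary with a derived throughput figure. `summarize` is a hypothetical helper and the sample hash is made-up data shaped like the table above, so the example runs without the gem.

```ruby
# Hypothetical formatter (NOT part of SmarterCSV): builds a one-line summary
# from a hash shaped like the on_complete payload.
def summarize(stats)
  duration = stats[:duration].to_f
  # Guard against a zero duration before deriving rows per second.
  rate = duration.positive? ? (stats[:total_rows] / duration).round(1) : nil

  parts = ["#{stats[:total_rows]} rows",
           "#{stats[:total_chunks]} chunks",
           "#{duration.round(2)}s"]
  parts << "#{rate} rows/s" if rate
  parts << "#{stats[:bad_rows]} bad rows" if stats[:bad_rows].to_i > 0
  parts.join(', ')
end

# Made-up sample payload for illustration only.
sample = { total_rows: 1200, total_chunks: 3, duration: 0.5, bad_rows: 2 }
puts summarize(sample)
# prints "1200 rows, 3 chunks, 0.5s, 2400.0 rows/s, 2 bad rows"
```

A lambda such as `->(stats) { logger.info summarize(stats) }` could then be passed as `on_complete`.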

Non-chunked mode

When chunk_size is not set, on_chunk never fires. on_start and on_complete still fire and give you the full-file summary:

SmarterCSV.process('data.csv',
  on_start:    ->(info)  { @started_at = Time.now; log "Importing #{info[:input]}" },
  on_complete: ->(stats) { log "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(3)}s" },
)

Execution order

on_start
  ├─ on_chunk (chunk 1 parsed) → block runs → returns
  ├─ on_chunk (chunk 2 parsed) → block runs → returns
  └─ on_chunk (chunk N parsed) → block runs → returns
on_complete

on_chunk fires before the block receives the chunk, so you can record timing or state before your processing logic runs.
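One way to use this ordering is to timestamp each `on_chunk` call and treat the gap to the next call as the approximate cost of that chunk. The sketch below assumes a hypothetical `ChunkTimer` class that is not part of SmarterCSV. Note that each measured interval covers the block run plus the parsing of the following chunk, so it is an upper bound on block time rather than an exact figure. The clock is injectable so the behaviour can be checked with fixed timestamps.

```ruby
# Hypothetical helper (NOT part of SmarterCSV): records the interval between
# consecutive on_chunk calls, keyed by chunk number.
class ChunkTimer
  attr_reader :durations

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @durations = {} # chunk_number => seconds
    @current = nil
    @started_at = nil
  end

  # Callable to pass as on_chunk: closes the previous interval, opens a new one.
  def on_chunk
    ->(info) {
      now = @clock.call
      close_current(now)
      @current = info[:chunk_number]
      @started_at = now
    }
  end

  # Callable to pass as on_complete: closes the final chunk's interval.
  def on_complete
    ->(_stats) { close_current(@clock.call) }
  end

  private

  def close_current(now)
    return unless @current

    @durations[@current] = now - @started_at
    @current = nil
  end
end

# Simulated hook calls with a fake clock, in the documented firing order.
ticks = [0.0, 1.5, 4.0]
timer = ChunkTimer.new(clock: -> { ticks.shift })
timer.on_chunk.call({ chunk_number: 1 })
timer.on_chunk.call({ chunk_number: 2 })
timer.on_complete.call({})
```

With the real library, `timer.on_chunk` and `timer.on_complete` would be passed as the `on_chunk:` and `on_complete:` options.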

Without Rails / ActiveSupport

The hooks are plain callables — no dependency on Rails or any framework:

require 'logger'
logger = Logger.new($stdout)

SmarterCSV.process('import.csv',
  on_start:    ->(i) { logger.info  "CSV import started: #{i[:input]}" },
  on_complete: ->(s) { logger.info  "CSV import done: #{s[:total_rows]} rows, #{s[:duration].round(2)}s" },
)

With `ActiveSupport::Notifications` (Rails)

If you prefer Rails-style instrumentation, wrap the hooks yourself:

# config/initializers/smarter_csv_instrumentation.rb
ON_START = ->(info) {
  ActiveSupport::Notifications.instrument('start.smarter_csv', info)
}
ON_COMPLETE = ->(stats) {
  ActiveSupport::Notifications.instrument('complete.smarter_csv', stats)
}

# Subscribe once at startup:
ActiveSupport::Notifications.subscribe('complete.smarter_csv') do |*, payload|
  StatsD.histogram('csv.duration', payload[:duration])
  StatsD.gauge('csv.rows', payload[:total_rows])
end

Then pass the cached lambdas to any process call:

SmarterCSV.process(file, on_start: ON_START, on_complete: ON_COMPLETE)

