-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Store image provider cache in database to speed up application restarts #426
Conversation
- Introduced ImageCache model to store image metadata - Added methods to get, save, and retrieve image cache entries - Updated database auto-migration to include ImageCache model - Implemented methods in datastore interface for image caching operations
…or handling - Implemented robust image caching with database and memory support - Added background cache refresh mechanism with configurable TTL - Improved error handling in image fetching and caching - Simplified image provider interface and cache management - Added debug logging and metrics for image retrieval process
- Added mock implementations for datastore and error handling - Introduced tests for database failure scenarios - Implemented tests for cache refresh and nil store handling - Enhanced test coverage for image caching functionality - Added test cases for memory usage and size estimation
…urrency and caching
- Avoid unnecessary struct copying in image cache initialization and refresh - Use pointer references to improve memory efficiency - Simplified stale entry tracking in cache refresh mechanism
- Updated method signatures for GetClipsQualifyingForRemoval and UpdateNoteComment - Reordered and clarified parameter order for better readability - Maintained consistent method signature style in mock store implementation
… and test coverage - Added detailed debug logging for cache refresh and mock store operations - Improved test case for cache refresh with more comprehensive checks - Shortened refresh intervals and delays for more responsive testing - Added logging and error handling in cache initialization and refresh routines - Ensured robust testing of image cache refresh mechanism
WalkthroughThe pull request updates the bird image caching workflow. In Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Cache as BirdImageCache
participant Store as Datastore
participant Provider as Image Provider
Client->>Cache: Request image for species
alt Image in Memory
Cache-->>Client: Return cached image
else Not in Memory
Cache->>Store: GetImageCache(species)
alt Cache Found
Store-->>Cache: Return cached entry
Cache-->>Client: Return image
else Cache Not Found
Cache->>Provider: Fetch image for species
Provider-->>Cache: Return fetched image
Cache->>Store: SaveImageCache(new image)
Cache-->>Client: Return image
end
end
sequenceDiagram
participant Cache as BirdImageCache
participant Store as Datastore
participant Provider as Image Provider
Cache->>Cache: startCacheRefresh routine
Cache->>Store: GetAllImageCaches()
alt Stale Entry Detected
Cache->>Provider: refreshEntry(stale image)
Provider-->>Cache: Return updated image
Cache->>Store: SaveImageCache(updated image)
else No Stale Entries
Cache-->>Cache: Continue without refresh
end
Possibly related PRs
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (11)
internal/imageprovider/imageprovider.go (5)
35-43
: Potential confusion with a singleprovider
field.
BothSetNonBirdImageProvider
andSetImageProvider
overwrite the sameprovider
field. If the intent is to handle separate providers (bird vs. non-bird), aligning the field(s) and setter methods more clearly would help prevent accidental overrides.Also applies to: 53-60
62-67
: Refresh interval constants may need environment-based configurations.
A hard-coded 1-second refresh interval and 100ms delay could be excessive in production or too slow if there are a large number of entries. Consider providing these values as configuration settings based on environment (dev vs. production).
69-94
: Check for potential long-running refresh cycles.
Processing entries every second might scale poorly if the number of stale entries grows large. You could implement concurrency or a queue-based approach. Alternatively, ensure the batch size and interval remain appropriate under production loads.
183-189
: EnsureClose()
is idempotent.
CallingClose()
multiple times can lead to a panic when closing an already-closed channel. You may want to guard it with async.Once
or a conditional check to avoid repeated closes.
414-444
: Duplicate logic betweenInitCache
andCreateDefaultCache
.
Both functions contain similar steps (instantiatingBirdImageCache
, loading images, starting refresh). To avoid code drift, consider refactoring one function to call the other or extract common logic into a helper function.internal/imageprovider/imageprovider_test.go (4)
29-39
: Time-based unique URL in tests can cause intermittent issues.
If tests run quickly or system time changes, the URL might not change as expected. Typically this is stable, but in some CI scenarios, injecting a short random string or increment might avoid edge cases more reliably than relying onUnixNano()
.
188-239
: Excellent coverage of cache hits vs. misses.
VerifyingfetchCounter
ensures the in-memory cache is working as intended. Optionally, you could add concurrency tests to ensure correct behavior under parallel requests.
241-259
: Check for more detailed error assertions.
Currently, the test only checks whether an error is returned. For more robust validation, consider verifying the specific error message or type.
412-480
: Refresh routine test may be slow.
Sleeping for 5 seconds ensures you see a refresh, but it can slow down the test suite. Consider using a time mocking library or reducing the intervals for test builds to speed up automated testing.internal/datastore/manage.go (1)
27-27
: Include proper GORM tags or indexes forImageCache
.
AutoMigrate with&ImageCache{}
is correct, but ensure theImageCache
model includes any needed indexes onScientificName
, especially if queries will filter on that field frequently.internal/analysis/realtime.go (1)
327-329
: Consider enhancing error handling in the goroutine.While errors are logged, they might be important enough to track or report through metrics.
Consider tracking fetch failures through metrics:
if _, err := birdImageCache.Get(name); err != nil { log.Printf("Failed to fetch image for %s: %v", name, err) + if metrics != nil { + metrics.ImageFetchFailures.Inc() + } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
internal/analysis/realtime.go
(1 hunks)internal/datastore/interfaces.go
(2 hunks)internal/datastore/manage.go
(1 hunks)internal/datastore/model.go
(1 hunks)internal/imageprovider/imageprovider.go
(5 hunks)internal/imageprovider/imageprovider_test.go
(8 hunks)internal/imageprovider/wikipedia.go
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (14)
internal/imageprovider/imageprovider.go (1)
22-31
: NewBirdImage
struct looks good for caching.
The newly addedCachedAt
field is particularly useful for detecting stale entries. Consider adding validations or an ID field if you need to query or index these entries more flexibly in the future.internal/imageprovider/imageprovider_test.go (3)
41-122
:mockStore
implementation is comprehensive.
This well-structured mock provides crucial debug info. However, if test output becomes noisy, consider using a logger that can be silenced. Overall, this approach effectively emulates the datastore for tests.
151-180
:mockFailingStore
elegantly tests error handling.
Simulating failures inGet
,Save
, andGetAll
helps ensure the cache fallback logic is robust. Great approach to increasing coverage on failure paths.
294-323
: Memory usage test is insightful.
Measuring approximate usage is valuable for performance analysis. If available, running this test under load or with more entries might provide further assurance.internal/datastore/model.go (1)
113-123
: LGTM! Well-structured model definition.The ImageCache model is well-designed with:
- Appropriate field types for storing image metadata
- Proper use of GORM tags for constraints (uniqueIndex on ScientificName)
- Clear field naming that reflects the data purpose
internal/imageprovider/wikipedia.go (3)
151-151
: LGTM! Improved error propagation.Good change to pass through the user-friendly error from queryThumbnail instead of wrapping it.
164-165
: LGTM! Better error abstraction.Good improvement to use a generic error message instead of exposing internal error details to the user.
203-203
: LGTM! User-friendly error messages.The error messages are now more user-friendly and provide clear information about what went wrong:
- No Wikipedia page found
- No free-license image available
- Image metadata not available
Also applies to: 211-211, 219-219
internal/analysis/realtime.go (2)
304-304
: LGTM! Good concurrency control.Using a semaphore to limit concurrent fetches to 5 is a good practice to prevent resource exhaustion.
312-314
: LGTM! Efficient cache checking.Good optimization to skip already cached images before launching goroutines.
internal/datastore/interfaces.go (4)
56-58
: LGTM! Well-defined interface methods.Good addition of methods for managing image cache entries with clear signatures.
874-887
: LGTM! Robust implementation of GetImageCache.Good implementation with:
- Silent logging to reduce noise
- Proper handling of "not found" case
- Clear error wrapping
889-904
: LGTM! Solid SaveImageCache implementation.Good implementation with:
- Input validation
- Silent logging
- Proper upsert logic using FirstOrCreate
906-914
: LGTM! Clean GetAllImageCaches implementation.Good implementation with:
- Silent logging
- Clear error handling
- Added comprehensive tests for concurrent image cache initialization - Implemented robust handling of concurrent requests and initialization timeouts - Enhanced mock image provider to simulate network latency and race conditions - Added test cases for initialization failure and retry mechanisms - Improved cache initialization logic with better synchronization and error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
internal/imageprovider/imageprovider.go (2)
63-68
: Consider increasing the refresh delay for production.The refresh delay of 100ms between refreshing individual entries seems too aggressive for production use and might cause unnecessary load on external APIs.
Apply this diff to increase the delay:
- refreshDelay = 100 * time.Millisecond // Delay between refreshing individual entries (shortened for testing) + refreshDelay = 1 * time.Second // Delay between refreshing individual entries
344-434
: Extract maxAttempts as a constant.For better maintainability and consistency with other constants, consider extracting the maximum number of initialization attempts as a constant.
Apply this diff to add the constant:
const ( defaultCacheTTL = 14 * 24 * time.Hour // 14 days refreshInterval = 1 * time.Second // How often to check for stale entries refreshBatchSize = 10 // Number of entries to refresh in one batch refreshDelay = 100 * time.Millisecond // Delay between refreshing individual entries + maxInitAttempts = 3 // Maximum number of initialization attempts ) // In Get method: - maxAttempts := 3 // Maximum number of initialization attempts + maxAttempts := maxInitAttemptsinternal/imageprovider/imageprovider_test.go (2)
119-169
: Extract unused methods to a separate interface.The mockStore implements many unused methods from the datastore.Interface. Consider extracting these methods to a separate interface to follow the Interface Segregation Principle.
Create a new interface in
internal/datastore/interfaces.go
:// ImageCacheStore defines the interface for image cache operations type ImageCacheStore interface { GetImageCache(scientificName string) (*ImageCache, error) SaveImageCache(cache *ImageCache) error GetAllImageCaches() ([]ImageCache, error) } // Update the BirdImageCache to use ImageCacheStore instead of Interface type BirdImageCache struct { provider ImageProvider dataMap sync.Map metrics *metrics.ImageProviderMetrics debug bool store ImageCacheStore logger *log.Logger quit chan struct{} Initializing sync.Map }
500-661
: Consider using table-driven tests for concurrency scenarios.The concurrency tests could be more maintainable using table-driven tests, similar to how other tests in the file are structured.
Example structure:
func TestConcurrencyScenarios(t *testing.T) { tests := []struct { name string fetchDelay time.Duration shouldFail bool numRequests int expectedFetch int wantErr bool }{ { name: "Concurrent initialization", fetchDelay: 200 * time.Millisecond, numRequests: 10, expectedFetch: 1, wantErr: false, }, { name: "Initialization timeout", fetchDelay: 2 * time.Second, numRequests: 2, expectedFetch: 3, wantErr: false, }, { name: "Initialization failure", shouldFail: true, numRequests: 2, expectedFetch: 2, wantErr: true, }, } // ... test implementation }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
internal/imageprovider/imageprovider.go
(5 hunks)internal/imageprovider/imageprovider_test.go
(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (7)
internal/imageprovider/imageprovider.go (5)
70-156
: Well-implemented background refresh mechanism!The implementation includes:
- Proper goroutine management with quit channel
- Batch processing of stale entries
- Robust error handling
- Debug logging
192-309
: Robust initialization and loading mechanism!The implementation includes:
- Proper error handling
- Graceful fallback to empty cache on errors
- Debug logging
311-342
: Well-implemented thread-safe initialization!The implementation includes:
- Proper concurrency control using sync.Map
- Robust error handling
- Nil provider check
436-465
: Well-implemented fetch and store mechanism!The implementation includes:
- Proper error handling
- Debug logging
- Metrics tracking
497-527
: Well-implemented default cache creation!The implementation includes:
- Proper error handling
- Debug logging
- Background refresh initialization
internal/imageprovider/imageprovider_test.go (2)
207-342
: Well-implemented basic test suite!The tests cover:
- Basic cache functionality
- Error handling
- Memory usage calculation
344-498
: Comprehensive database failure test suite!The tests cover:
- Database failure scenarios
- Nil store handling
- Cache refresh functionality
- Prevent potential nil pointer dereference in cache refresh method - Add debug logging for cases where no image provider is available - Improve robustness of image cache refresh mechanism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
internal/imageprovider/imageprovider.go (7)
23-32
: Add field documentation for BirdImage struct.Consider adding documentation for each field to improve code maintainability.
type BirdImage struct { + // URL is the direct link to the bird image URL string + // ScientificName is the taxonomic name of the bird species ScientificName string + // LicenseName is the name of the license under which the image is available LicenseName string + // LicenseURL is the URL to the full license text LicenseURL string + // AuthorName is the name of the image author/photographer AuthorName string + // AuthorURL is the URL to the author's profile or website AuthorURL string + // CachedAt is the timestamp when this image was last cached CachedAt time.Time }
63-68
: Make cache configuration values configurable.Consider moving these constants to the configuration system to allow runtime adjustment.
-const ( - defaultCacheTTL = 14 * 24 * time.Hour // 14 days - refreshInterval = 1 * time.Second // How often to check for stale entries - refreshBatchSize = 10 // Number of entries to refresh in one batch - refreshDelay = 100 * time.Millisecond // Delay between refreshing individual entries -) +// CacheConfig holds the configuration for the image cache +type CacheConfig struct { + TTL time.Duration + RefreshInterval time.Duration + BatchSize int + RefreshDelay time.Duration +}
97-156
: Add exponential backoff for refresh retries.The current implementation uses a fixed delay between refresh attempts. Consider implementing exponential backoff to handle transient failures more gracefully.
func (c *BirdImageCache) refreshStaleEntries() { + retryCount := 0 + maxRetries := 3 + baseDelay := refreshDelay for i := 0; i < len(staleEntries); i += refreshBatchSize { // ... existing code ... for _, scientificName := range batch { select { case <-c.quit: return - case <-time.After(refreshDelay): + case <-time.After(baseDelay * time.Duration(1 << uint(retryCount))): if err := c.refreshEntry(scientificName); err != nil { + if retryCount < maxRetries { + retryCount++ + i-- // Retry this entry + continue + } + } + retryCount = 0 // Reset for next entry // ... rest of the code ...
192-198
: Improve shutdown handling.Consider adding a timeout for graceful shutdown and waiting for ongoing operations to complete.
-func (c *BirdImageCache) Close() error { +func (c *BirdImageCache) Close(timeout time.Duration) error { + done := make(chan struct{}) + go func() { + // Wait for ongoing operations + c.dataMap.Range(func(key, value interface{}) bool { + if _, initializing := c.Initializing.Load(key.(string)); initializing { + time.Sleep(100 * time.Millisecond) + } + return true + }) if c.quit != nil { close(c.quit) } + close(done) + }() + + select { + case <-done: + return nil + case <-time.After(timeout): + return fmt.Errorf("shutdown timed out after %v", timeout) + } }
259-284
: Add retry logic for database operations.Consider implementing retry logic with exponential backoff for transient database errors.
+func (c *BirdImageCache) saveToDBWithRetry(image *BirdImage, maxRetries int) error { + var err error + for i := 0; i < maxRetries; i++ { + err = c.saveToDB(image) + if err == nil { + return nil + } + if !isTransientError(err) { + return err + } + time.Sleep(time.Duration(1<<uint(i)) * 100 * time.Millisecond) + } + return fmt.Errorf("failed to save to database after %d retries: %w", maxRetries, err) +} + +func isTransientError(err error) bool { + // Add logic to identify transient database errors + return false +}
352-442
: Improve context propagation.Consider accepting a context parameter in the Get method to allow better cancellation control from the caller.
-func (c *BirdImageCache) Get(scientificName string) (BirdImage, error) { +func (c *BirdImageCache) Get(ctx context.Context, scientificName string) (BirdImage, error) { // ... existing code ... select { - case <-ctx.Done(): + case <-ctx.Done(): + return BirdImage{}, ctx.Err() case <-done: return result, fetchErr } }
475-493
: Add memory limits to prevent OOM.Consider implementing a maximum memory limit and an eviction policy for the cache.
+const maxMemoryUsage = 1 * 1024 * 1024 * 1024 // 1GB func (c *BirdImageCache) MemoryUsage() int { totalSize := 0 c.dataMap.Range(func(key, value interface{}) bool { if img, ok := value.(*BirdImage); ok { totalSize += img.EstimateSize() } return true }) + // Evict entries if memory usage exceeds limit + if totalSize > maxMemoryUsage { + c.evictOldest(totalSize - maxMemoryUsage) + } return totalSize } +func (c *BirdImageCache) evictOldest(bytesToFree int) { + type entry struct { + key string + cachedAt time.Time + size int + } + var entries []entry + + // Collect entries + c.dataMap.Range(func(key, value interface{}) bool { + if img, ok := value.(*BirdImage); ok { + entries = append(entries, entry{ + key: key.(string), + cachedAt: img.CachedAt, + size: img.EstimateSize(), + }) + } + return true + }) + + // Sort by age + sort.Slice(entries, func(i, j int) bool { + return entries[i].cachedAt.Before(entries[j].cachedAt) + }) + + // Evict oldest entries until we free enough space + freed := 0 + for _, e := range entries { + if freed >= bytesToFree { + break + } + c.dataMap.Delete(e.key) + freed += e.size + } +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/imageprovider/imageprovider.go
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
Persistent Caching for Thumbnail Image Metadata
This PR implements a persistent caching mechanism for thumbnail image metadata in the database, improving application performance and reducing external API calls.
Key Changes
Benefits
Technical Details