Feature(sitemap): named files chunking strategy #14471
Open
Slackluky
wants to merge
29
commits into
withastro:main
from
Slackluky:main
Commits
ecee3fc
feat(sitemap): add chunking strategy for sitemaps
Slackluky 8bd654e
feat(sitemap): add chunks option to sitemap config
Slackluky bba98f6
feat(sitemap): add sitemap chunk writing functionality
Slackluky 63dfacd
fix(sitemap): fix empty callback in writeSitemap
Slackluky bcb9f59
feat(sitemap): add test fixture for sitemap chunking
Slackluky 4d40768
test(sitemap): add test for sitemap chunking with files
Slackluky e2d5198
feat(sitemap): add changeset for sitemap chunking
Slackluky 9687b55
Merge pull request #1 from Slackluky/feature/sitemap-chunking-strategy
Slackluky 74fc698
Merge branch 'withastro:main' into main
Slackluky a4edd79
build: update dependencies and add astro
Slackluky 4ce7284
chore: remove unused astro dependency
Slackluky 216983b
chore: remove unused entries from lockfile
Slackluky f5486f0
Merge remote-tracking branch 'origin/main' into feature/sitemap-chunk…
Slackluky 0d7eb59
Merge pull request #2 from Slackluky/feature/sitemap-chunking-strategy
Slackluky 30cde1b
refactor(sitemap): improve import ordering and formatting
Slackluky 18d8c70
refactor(sitemap): improve import ordering
Slackluky 7d6d225
refactor(sitemap): improve import ordering
Slackluky 8b55a0b
refactor(sitemap): improve import ordering
Slackluky 58e90f2
refactor(sitemap): improve import ordering
Slackluky e7816a2
refactor(sitemap): improve chunk file test readability
Slackluky a5c0126
test(sitemap): fix flaky chunk file tests
Slackluky b73e105
refactor(sitemap): improve import ordering
Slackluky b688960
Merge branch 'main' of https://github.com/withastro/astro
Slackluky 961c43f
Merge branch 'main' into main
Slackluky c6ed9f7
Update .changeset/floppy-times-grab.md
Slackluky 30f5ebe
chore(sitemap): update changeset to minor
Slackluky 4d4a5e3
feat(sitemap): add chunking support for sitemap generation
Slackluky 6cdd1ed
Merge remote-tracking branch 'upstream/main'
Slackluky c510a4c
Merge branch 'main' into main
Slackluky
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| --- | ||
| '@astrojs/sitemap': minor | ||
| --- | ||
|
|
||
| Adds the ability to split sitemap generation into named chunks based on customizable logic. This makes large sitemaps easier to manage and can improve performance. The new `chunks` option in the sitemap configuration accepts a map of functions that assign sitemap items to chunks. Each chunk is then written to a separate sitemap file. | ||
|
|
||
| ``` | ||
| integrations: [ | ||
| sitemap({ | ||
| serialize(item) { | ||
| return item | ||
| }, | ||
| chunks: { // this option is processed last, after the rest of the configuration | ||
| 'blog': (item) => { // produces a sitemap file named after the key `blog` (sitemap-blog-0.xml) | ||
| if (/blog/.test(item.url)) { // only URLs matching this test go into this sitemap file | ||
| item.changefreq = 'weekly'; | ||
| item.lastmod = new Date(); | ||
| item.priority = 0.9; // set properties specific to the matched paths | ||
| return item; | ||
| } | ||
| }, | ||
| 'glossary': (item) => { | ||
| if (/glossary/.test(item.url)) { | ||
| item.changefreq = 'weekly'; | ||
| item.lastmod = new Date(); | ||
| item.priority = 0.7; | ||
| return item; | ||
| } | ||
| } | ||
|
|
||
| // all remaining paths are stored in `sitemap-pages-0.xml` | ||
| }, | ||
| }), | ||
| ], | ||
|
|
||
| ``` |
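The routing described in the changeset can be sketched roughly as follows. This is an illustration only, not the PR's implementation: the helper name `groupIntoChunks`, the simplified `SitemapItem` shape, and the first-match-wins rule are assumptions. Only the map of `chunks` callbacks and the fallback `pages` chunk come from the description above.

```typescript
// Simplified item shape for illustration; the real SitemapItem has more fields.
type SitemapItem = {
  url: string;
  changefreq?: string;
  lastmod?: Date;
  priority?: number;
};

// A chunk callback returns the (possibly modified) item when it belongs to the
// chunk, or undefined when it does not.
type ChunkFn = (item: SitemapItem) => SitemapItem | undefined;

// Hypothetical helper: assigns each item to the first chunk whose callback
// accepts it; anything left over goes to the fallback `pages` chunk.
function groupIntoChunks(
  items: SitemapItem[],
  chunks: Record<string, ChunkFn>,
): Record<string, SitemapItem[]> {
  const grouped: Record<string, SitemapItem[]> = { pages: [] };
  for (const name of Object.keys(chunks)) grouped[name] = [];

  for (const item of items) {
    let placed = false;
    for (const [name, fn] of Object.entries(chunks)) {
      const result = fn({ ...item }); // pass a copy so the input stays untouched
      if (result) {
        grouped[name].push(result);
        placed = true;
        break; // assumption: the first matching chunk wins
      }
    }
    if (!placed) grouped.pages.push(item);
  }
  return grouped;
}

const grouped = groupIntoChunks(
  [{ url: 'https://example.com/blog/a' }, { url: 'https://example.com/about' }],
  { blog: (item) => (/blog/.test(item.url) ? { ...item, priority: 0.9 } : undefined) },
);
```

The resulting record has the same shape as the `sourceData: Record<string, SitemapItem[]>` field that `writeSitemapChunk` accepts, with one array per chunk name.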
128 changes: 128 additions & 0 deletions
packages/integrations/sitemap/src/write-sitemap-chunk.ts
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| import { createWriteStream, type WriteStream } from 'node:fs'; | ||
| import { mkdir } from 'node:fs/promises'; | ||
| import { normalize, resolve } from 'node:path'; | ||
| import { pipeline, Readable } from 'node:stream'; | ||
| import { promisify } from 'node:util'; | ||
| import type { AstroConfig } from 'astro'; | ||
| import { SitemapAndIndexStream, SitemapIndexStream, SitemapStream } from 'sitemap'; | ||
| import replace from 'stream-replace-string'; | ||
| import type { SitemapItem } from './index.js'; | ||
|
|
||
|
|
||
| type WriteSitemapChunkConfig = { | ||
| filenameBase: string; | ||
| hostname: string; | ||
| sitemapHostname?: string; | ||
| sourceData: Record<string, SitemapItem[]>; | ||
| destinationDir: string; | ||
| customSitemaps?: string[]; | ||
| publicBasePath?: string; | ||
| limit?: number; | ||
| xslURL?: string; | ||
| lastmod?: string; | ||
| namespaces?: { | ||
| news?: boolean; | ||
| xhtml?: boolean; | ||
| image?: boolean; | ||
| video?: boolean; | ||
| }; | ||
| }; | ||
|
|
||
| // adapted from sitemap.js/sitemap-simple | ||
| export async function writeSitemapChunk( | ||
| { | ||
| filenameBase, | ||
| hostname, | ||
| sitemapHostname = hostname, | ||
| sourceData, | ||
| destinationDir, | ||
| limit = 50000, | ||
| customSitemaps = [], | ||
| publicBasePath = './', | ||
| xslURL: xslUrl, | ||
| lastmod, | ||
| namespaces = { news: true, xhtml: true, image: true, video: true }, | ||
| }: WriteSitemapChunkConfig, | ||
| astroConfig: AstroConfig, | ||
| ) { | ||
| await mkdir(destinationDir, { recursive: true }); | ||
|
|
||
| // Normalize publicBasePath | ||
| let normalizedPublicBasePath = publicBasePath; | ||
| if (!normalizedPublicBasePath.endsWith('/')) { | ||
| normalizedPublicBasePath += '/'; | ||
| } | ||
|
|
||
| // Array to collect all sitemap URLs for the index | ||
| const sitemapUrls: Array<{ url: string; lastmod?: string }> = []; | ||
|
|
||
| // Process each chunk separately | ||
| for (const [chunkName, items] of Object.entries(sourceData)) { | ||
| const sitemapAndIndexStream = new SitemapAndIndexStream({ | ||
| limit, | ||
| xslUrl, | ||
| getSitemapStream: (i) => { | ||
| const sitemapStream = new SitemapStream({ | ||
| hostname, | ||
| xslUrl, | ||
| // Custom namespace handling | ||
| xmlns: { | ||
| news: namespaces?.news !== false, | ||
| xhtml: namespaces?.xhtml !== false, | ||
| image: namespaces?.image !== false, | ||
| video: namespaces?.video !== false, | ||
| }, | ||
| }); | ||
|
|
||
| const path = `./${filenameBase}-${chunkName}-${i}.xml`; | ||
| const writePath = resolve(destinationDir, path); | ||
| const publicPath = normalize(normalizedPublicBasePath + path); | ||
|
|
||
| let stream: WriteStream; | ||
| if (astroConfig.trailingSlash === 'never' || astroConfig.build.format === 'file') { | ||
| // workaround for trailing slash issue in sitemap.js | ||
| const host = hostname.endsWith('/') ? hostname.slice(0, -1) : hostname; | ||
| const searchStr = `<loc>${host}/</loc>`; | ||
| const replaceStr = `<loc>${host}</loc>`; | ||
| stream = sitemapStream | ||
| .pipe(replace(searchStr, replaceStr)) | ||
| .pipe(createWriteStream(writePath)); | ||
| } else { | ||
| stream = sitemapStream.pipe(createWriteStream(writePath)); | ||
| } | ||
|
|
||
| const url = new URL(publicPath, sitemapHostname).toString(); | ||
|
|
||
| // Collect this sitemap URL for the index | ||
| sitemapUrls.push({ url, lastmod }); | ||
|
|
||
| return [{ url, lastmod }, sitemapStream, stream]; | ||
| }, | ||
| }); | ||
|
|
||
| // Create a readable stream from this chunk's items | ||
| const dataStream = Readable.from(items); | ||
|
|
||
| // Write this chunk's sitemap(s) | ||
| await promisify(pipeline)(dataStream, sitemapAndIndexStream); | ||
| } | ||
|
|
||
| // Now create the sitemap index with all the generated sitemaps | ||
| const indexStream = new SitemapIndexStream({ xslUrl }); | ||
| const indexPath = resolve(destinationDir, `./${filenameBase}-index.xml`); | ||
| const indexWriteStream = createWriteStream(indexPath); | ||
|
|
||
| // Add custom sitemaps to the index | ||
| for (const url of customSitemaps) { | ||
| indexStream.write({ url, lastmod }); | ||
| } | ||
|
|
||
| // Add all generated sitemaps to the index | ||
| for (const sitemapUrl of sitemapUrls) { | ||
| indexStream.write(sitemapUrl); | ||
| } | ||
|
|
||
| indexStream.end(); | ||
|
|
||
| return await promisify(pipeline)(indexStream, indexWriteStream); | ||
| } |
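To make the naming scheme concrete, the sketch below lists the files one would expect `writeSitemapChunk` to produce for given chunk sizes. It is a hedged illustration: `expectedSitemapFiles` is a hypothetical helper, the `sitemap` base name is taken from the changeset example, and it assumes a new file starts every `limit` items and that empty chunks produce no file (the actual behaviour for empty chunks depends on the `sitemap` library version).

```typescript
// Hypothetical helper mirroring the `${filenameBase}-${chunkName}-${i}.xml`
// pattern used in getSitemapStream, plus the `${filenameBase}-index.xml` index.
function expectedSitemapFiles(
  filenameBase: string,
  chunkSizes: Record<string, number>,
  limit = 50000,
): string[] {
  const files: string[] = [];
  for (const [chunkName, count] of Object.entries(chunkSizes)) {
    // Assumption: one file per started block of `limit` items.
    const fileCount = Math.ceil(count / limit);
    for (let i = 0; i < fileCount; i++) {
      files.push(`${filenameBase}-${chunkName}-${i}.xml`);
    }
  }
  // The index file is written once, after all chunks.
  files.push(`${filenameBase}-index.xml`);
  return files;
}

const files = expectedSitemapFiles('sitemap', { blog: 120000, pages: 10 });
// files: sitemap-blog-0.xml, sitemap-blog-1.xml, sitemap-blog-2.xml,
//        sitemap-pages-0.xml, sitemap-index.xml
```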
Why does this need the `Promise.resolve()`?
The `Promise.resolve()` call is indeed necessary in this case. The callback function `cb` provided in the `chunks` option can return either a `SitemapItem` object directly or a `Promise` that resolves to a `SitemapItem` object. `Promise.resolve()` ensures that the result is always a Promise, allowing for consistent handling of both synchronous and asynchronous results.
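The normalization described in this reply can be sketched as follows. This is a minimal illustration rather than the PR's code: `runChunkCallback`, the `ChunkCallback` type, and the reduced `SitemapItem` shape are hypothetical names introduced here.

```typescript
// Reduced item shape for illustration only.
type SitemapItem = { url: string; priority?: number };

// A chunk callback may answer synchronously or asynchronously.
type ChunkCallback = (
  item: SitemapItem,
) => SitemapItem | undefined | Promise<SitemapItem | undefined>;

// Hypothetical wrapper: Promise.resolve() wraps a plain value in a resolved
// promise and passes an existing promise through, so callers can always await.
function runChunkCallback(
  cb: ChunkCallback,
  item: SitemapItem,
): Promise<SitemapItem | undefined> {
  return Promise.resolve(cb(item));
}

const syncCb: ChunkCallback = (item) => ({ ...item, priority: 0.9 });
const asyncCb: ChunkCallback = async (item) => ({ ...item, priority: 0.7 });

// Both kinds of callback are consumed through the same code path.
async function demo(): Promise<number[]> {
  const a = await runChunkCallback(syncCb, { url: 'https://example.com/blog/' });
  const b = await runChunkCallback(asyncCb, { url: 'https://example.com/glossary/' });
  return [a?.priority ?? -1, b?.priority ?? -1];
}
```

Inside an `async` function, `await cb(item)` accepts plain values as well, so the explicit wrapper matters mainly when the result is chained with `.then()` or collected with `Promise.all()`.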