[fileio] Add file-io.allow-cache for RESTTokenFileIO #5054
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,6 +80,12 @@ | |
<td>Integer</td> | ||
<td>Configure the size of the connection pool.</td> | ||
</tr> | ||
<tr> | ||
<td><h5>file-io.allow-cache</h5></td> | ||
<td style="word-wrap: break-word;">true</td> | ||
<td>Boolean</td> | ||
<td>Whether to allow static cache in file io implementation. If not allowed, this means that there may be a large number of FileIO instances generated, enabling caching can lead to resource leakage.</td> | ||
</tr> | ||
<tr> | ||
<td><h5>format-table.enabled</h5></td> | ||
<td style="word-wrap: break-word;">true</td> | ||
|
@@ -116,6 +122,12 @@ | |
<td>String</td> | ||
<td>Metastore of paimon catalog, supports filesystem, hive and jdbc.</td> | ||
</tr> | ||
<tr> | ||
<td><h5>resolving-file-io.enabled</h5></td> | ||
<td style="word-wrap: break-word;">false</td> | ||
<td>Boolean</td> | ||
<td>Whether to enable resolving fileio, when this option is enabled, in conjunction with the table's property data-file.external-paths, Paimon can read and write to external storage paths, such as OSS or S3. In order to access these external paths correctly, you also need to configure the corresponding access key and secret key.</td> | ||
Review comment: "fileio" -> "file io", to match line 87?
Reply: The description is fine as is; we only need to be careful with the option key, since that is part of the API.
||
</tr> | ||
<tr> | ||
<td><h5>sync-all-properties</h5></td> | ||
<td style="word-wrap: break-word;">false</td> | ||
|
@@ -140,11 +152,5 @@ | |
<td>String</td> | ||
<td>The warehouse root path of catalog.</td> | ||
</tr> | ||
<tr> | ||
<td><h5>resolving-fileio.enabled</h5></td> | ||
<td style="word-wrap: break-word;">false</td> | ||
<td>Boolean</td> | ||
<td>Whether to enable resolving fileio, when this option is enabled, in conjunction with the table's property data-file.external-paths, Paimon can read and write to external storage paths, such as OSS or S3. In order to access these external paths correctly, you also need to configure the corresponding access key and secret key.</td> | ||
</tr> | ||
</tbody> | ||
</table> |
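The two options documented above are both booleans with defaults (`file-io.allow-cache` defaults to `true`, `resolving-file-io.enabled` to `false`). As a rough illustration of how such defaults behave when a key is absent, here is a minimal sketch that uses a plain `Map<String, String>` rather than Paimon's own `Options` class. The class `CatalogOptionsSketch` and its `getBoolean` helper are hypothetical and are not part of the PR.

```java
import java.util.Map;

/** Hypothetical helper, not Paimon code: reads boolean catalog options from a plain map. */
class CatalogOptionsSketch {

    // Option keys as they appear in the documentation table in this PR.
    static final String FILE_IO_ALLOW_CACHE = "file-io.allow-cache";
    static final String RESOLVING_FILE_IO_ENABLED = "resolving-file-io.enabled";

    /** Returns the parsed value for the key, or the default when the key is not set. */
    static boolean getBoolean(Map<String, String> options, String key, boolean defaultValue) {
        String raw = options.get(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }
}
```

With an empty map, `getBoolean(options, FILE_IO_ALLOW_CACHE, true)` yields `true`, which mirrors the documented default; a catalog that creates many FileIO instances would set the key to `false` explicitly.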
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,7 @@ | |
import org.apache.paimon.catalog.CatalogContext; | ||
import org.apache.paimon.fs.FileIO; | ||
import org.apache.paimon.options.Options; | ||
import org.apache.paimon.utils.IOUtils; | ||
|
||
import org.apache.hadoop.conf.Configuration; | ||
import org.apache.hadoop.fs.FileSystem; | ||
|
@@ -35,11 +36,14 @@ | |
import java.util.Map; | ||
import java.util.Objects; | ||
import java.util.concurrent.ConcurrentHashMap; | ||
import java.util.function.Supplier; | ||
|
||
import static org.apache.paimon.options.CatalogOptions.FILE_IO_ALLOW_CACHE; | ||
|
||
/** OSS {@link FileIO}. */ | ||
public class OSSFileIO extends HadoopCompliantFileIO { | ||
|
||
private static final long serialVersionUID = 1L; | ||
private static final long serialVersionUID = 2L; | ||
|
||
private static final Logger LOG = LoggerFactory.getLogger(OSSFileIO.class); | ||
|
||
|
@@ -68,7 +72,11 @@ public class OSSFileIO extends HadoopCompliantFileIO { | |
*/ | ||
private static final Map<CacheKey, AliyunOSSFileSystem> CACHE = new ConcurrentHashMap<>(); | ||
|
||
// create a shared config to avoid load properties everytime | ||
private static final Configuration SHARED_CONFIG = new Configuration(); | ||
|
||
private Options hadoopOptions; | ||
private boolean allowCache = true; | ||
|
||
@Override | ||
public boolean isObjectStore() { | ||
|
@@ -77,6 +85,7 @@ public boolean isObjectStore() { | |
|
||
@Override | ||
public void configure(CatalogContext context) { | ||
allowCache = context.options().get(FILE_IO_ALLOW_CACHE); | ||
hadoopOptions = new Options(); | ||
// read all configuration with prefix 'CONFIG_PREFIXES' | ||
for (String key : context.options().keySet()) { | ||
|
@@ -101,11 +110,12 @@ public void configure(CatalogContext context) { | |
protected FileSystem createFileSystem(org.apache.hadoop.fs.Path path) { | ||
final String scheme = path.toUri().getScheme(); | ||
final String authority = path.toUri().getAuthority(); | ||
return CACHE.computeIfAbsent( | ||
new CacheKey(hadoopOptions, scheme, authority), | ||
key -> { | ||
Configuration hadoopConf = new Configuration(); | ||
key.options.toMap().forEach(hadoopConf::set); | ||
Supplier<AliyunOSSFileSystem> supplier = | ||
() -> { | ||
// create config from base config, if initializing a new config, it will | ||
// retrieve props from the file, which comes at a high cost | ||
Configuration hadoopConf = new Configuration(SHARED_CONFIG); | ||
hadoopOptions.toMap().forEach(hadoopConf::set); | ||
URI fsUri = path.toUri(); | ||
if (scheme == null && authority == null) { | ||
fsUri = FileSystem.getDefaultUri(hadoopConf); | ||
|
@@ -124,7 +134,22 @@ protected FileSystem createFileSystem(org.apache.hadoop.fs.Path path) { | |
throw new UncheckedIOException(e); | ||
} | ||
return fs; | ||
}); | ||
}; | ||
|
||
if (allowCache) { | ||
return CACHE.computeIfAbsent( | ||
new CacheKey(hadoopOptions, scheme, authority), key -> supplier.get()); | ||
} else { | ||
return supplier.get(); | ||
} | ||
} | ||
|
||
@Override | ||
public void close() { | ||
if (!allowCache) { | ||
Review comment: Does this mean that when allowCache is true, the FileIO has to be closed by the caller?
Reply: When caching is not allowed, the FileIO should be closed here. When caching is allowed, the file systems live in the static cache, so we cannot close them.
||
fsMap.values().forEach(IOUtils::closeQuietly); | ||
fsMap.clear(); | ||
} | ||
} | ||
|
||
private static class CacheKey { | ||
|
Review comment: I have a question here. If caching is enabled, the number of FileIO instances is reduced. Why can enabling caching lead to resource leakage?
Reply: Our original design placed the instances in a static cache. With the REST Catalog, however, many FileIO instances may be created, possibly one per table, because each table can have different file access permissions.
With that many FileIOs we cannot simply keep them all in an in-memory static cache: entries are never released, so the cache ends up holding too many resources, which amounts to a resource leak.
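The trade-off discussed in this thread can be sketched without any Hadoop or OSS dependency. The example below is a simplified illustration, not the PR's implementation: `FileIoCacheSketch` and `FakeFileSystem` are hypothetical stand-ins, the cache key is reduced to a single string, and the per-instance map plays the role that `fsMap` plays in the diff.

```java
import java.io.Closeable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for the cache-or-create logic in the diff; not Paimon code. */
class FileIoCacheSketch implements Closeable {

    /** Hypothetical stand-in for a file system handle such as AliyunOSSFileSystem. */
    static final class FakeFileSystem implements Closeable {
        final String authority;
        volatile boolean closed = false;

        FakeFileSystem(String authority) {
            this.authority = authority;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Process-wide cache: entries are never evicted, so it grows with the number of keys.
    static final Map<String, FakeFileSystem> CACHE = new ConcurrentHashMap<>();

    private final boolean allowCache;

    // File systems owned by this instance; only used when caching is disabled.
    private final Map<String, FakeFileSystem> owned = new ConcurrentHashMap<>();

    FileIoCacheSketch(boolean allowCache) {
        this.allowCache = allowCache;
    }

    FakeFileSystem fileSystem(String authority) {
        Supplier<FakeFileSystem> supplier = () -> new FakeFileSystem(authority);
        if (allowCache) {
            // Shared across all instances that use the same key.
            return CACHE.computeIfAbsent(authority, key -> supplier.get());
        }
        // Private to this instance, so this instance is responsible for closing it.
        return owned.computeIfAbsent(authority, key -> supplier.get());
    }

    @Override
    public void close() {
        // Cached file systems may still be in use elsewhere, so only owned ones are closed.
        if (!allowCache) {
            owned.values().forEach(FakeFileSystem::close);
            owned.clear();
        }
    }
}
```

Two instances created with `allowCache = true` receive the same `FakeFileSystem` for the same key, and closing either instance leaves it open. An instance created with `allowCache = false` gets its own handle and closes it in `close()`, which is the behavior a catalog with one FileIO per table would presumably want.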