Not works for me in redshift. #48
I'm experiencing something similar. I'm using the code below to try to read a table with ~65 billion rows:

```js
const { Pool } = require('pg');
const config = require('./src/lib/config');
const credentials = config.get('warehousedb');
const QueryStream = require('pg-query-stream');
const JSONStream = require('JSONStream');

const pool = new Pool({
  user: credentials.user,
  host: credentials.host,
  database: credentials.name,
  password: credentials.pass,
  port: credentials.port
});

const query = 'select * from <table_name>';
const start = Date.now();

pool.connect().then(function(client) {
  const stream = client.query(new QueryStream(query));
  stream.on('end', () => {
    console.log('\n ==> pulling all data in took: ', ((Date.now() - start) / 1000));
    client.release();
  });
  stream.pipe(JSONStream.stringify()).pipe(process.stdout);
}).then(() => pool.end()); // note: this runs as soon as connect() resolves, before the stream ends
```

But all it does is sit there for some minutes and then crash, punching a hole in the heap 🤷♂️

For the sake of my experiment I tried a few things; in all of this not a single line of data was sent to stdout.

I ran a test with a LIMIT of 1 billion rows (both with default heap settings and with --max-old-space-size=4096): it ran to the end, with output, but it sat there until the query had actually completed before producing any output.

I ran a test with a LIMIT of 10 billion rows (--max-old-space-size=4096 => ok, default heap size => death): again it ran to the end, with output, but only started producing it after the query had actually completed.

In both cases memory grew by several GB despite the original data being only 1.5 GB total (yet another signal that something underneath is stuffing its memory like I'd do with chocolate in my mouth :P).

I can see how @jose920405 might link the |
could it be that |
Honestly, I've no idea (and not much time to dig deeper right now) 🤷♂️ |
So I got a bit frustrated by this and ran a small experiment. Looks like, as usual, AWS is the root of the problem :/ |
Hi @unlucio, I am also looking for a streaming solution to pipe from a Redshift query. Interested to know if you came up with any further findings or an alternative approach for querying a large number of records without hitting memory limitations? Thanks! |
well, since my investigations pointed out it was most likely an RS limitation, and no one maintaining either this project or node-pg-cursor seemed interested in the problem, I just used AWS Glue instead 🤷♂️ |
I'm in the same boat: long-running Redshift queries. Can't find a way to stream them to fs without a |
@dreallday my temporary "solution" (it's just a trick but it works) for now is to bypass the DB as much as possible:
BTW I've been experimenting a little around "streams, RS and PG" and, even if I don't have final data yet, I have the feeling that the cursor package tends to buffer anyway (but I've yet to set aside some time to read that package's code). |
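The comment above doesn't spell out the trick, but one common way to take the Postgres protocol out of the export path entirely is Redshift's UNLOAD, which writes query results straight to S3. A hypothetical sketch (the bucket, prefix, and IAM role are placeholders, and the option set is just one reasonable choice):

```sql
UNLOAD ('select * from <table_name>')
TO 's3://<your-bucket>/exports/table_dump_'
IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-role>'
DELIMITER ','
GZIP
ALLOWOVERWRITE;
```

The resulting files can then be streamed from S3 instead of through a client-side query, sidestepping the cursor behavior entirely.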
Hi, it looks like the issue was on the pg-cursor side. There is currently an open pg-cursor PR (brianc/node-pg-cursor#44) which should address the streaming problems. I was able to run some local manual tests using the cursor updates from https://github.com/jafl/node-pg-cursor.git#redshift-requires-portal-name, and was able to stream a large dataset with a low memory footprint. Hopefully PR brianc/node-pg-cursor#44 gets merged into pg-cursor soon! |
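For reference, pg-cursor's consumption model is a pull loop over `cursor.read(rowCount, callback)`. The sketch below mimics that callback shape with an in-memory cursor (`fakeCursor` and `drain` are illustrative helpers, not part of pg-cursor, and no database is involved) to show why the footprint stays low: only one batch of rows lives in memory per iteration.

```javascript
// Minimal in-memory stand-in mirroring pg-cursor's read(rowCount, callback)
// shape, so the pull loop below can be run without a database.
function fakeCursor(allRows) {
  let pos = 0;
  return {
    read(rowCount, cb) {
      const batch = allRows.slice(pos, pos + rowCount);
      pos += batch.length;
      setImmediate(() => cb(null, batch)); // async, like a network round trip
    },
  };
}

// Pull-based drain loop: request batchSize rows, process them, repeat.
// An empty batch means the cursor is exhausted. Because the next batch is
// only requested after the previous one is handled, memory stays bounded.
function drain(cursor, batchSize, onRow, done) {
  cursor.read(batchSize, (err, rows) => {
    if (err) return done(err);
    if (rows.length === 0) return done(null);
    rows.forEach(onRow);
    drain(cursor, batchSize, onRow, done);
  });
}
```

With the real pg-cursor, `fakeCursor(...)` would be replaced by `client.query(new Cursor(text))`; the loop itself is the part that keeps memory flat.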
Can someone test this again? The PR that @ysfdev references was merged a month ago. I'd be interested to hear whether this works for Redshift or not. |
@christophersbarrett I saw it as well, but I haven't had time to test it yet, and right now I'm not dealing with RS at work. |
well, I just got the opportunity to use it against an Aurora instance and it works. Just some feedback. Cheers |
I'm not sure of the reason, or if query stream is simply not supported by Redshift. Any idea?

My code:

If my query (info.sql) has 1,000,000 records, it doesn't work. In this case it never shows any log inside the data listener. It works perfectly pointing to a local DB with postgres.

Maybe related to #32 in vitaly-t's question.

Using:
"pg-query-stream": "1.1.1",
node: 8.9.1