import-books.js JSON.stringify RangeError #1

Open
mharoot opened this issue Dec 26, 2017 · 3 comments

mharoot commented Dec 26, 2017

output:

beginning directory walk
/home/michael/Documents/DistributedSystemsNodeJS/Databases/node_modules/json-stringify-safe/stringify.js:5
return JSON.stringify(obj, serializer(replacer, cycleReplacer), spaces)
^

RangeError: Invalid string length

I am not sure why this is not working correctly. I took the code directly from your GitHub repository and got stuck here. I know this code was added 4 years ago, so I'm guessing some newer Node.js updates made this program crash? Are there any newer books I can follow? I learned a lot from 'Node.js the Right Way', but I'm stuck at this point. Thanks in advance.

mharoot commented Dec 26, 2017

After taking out 54,000 folders from the cache directory I get a different error:

<--- Last few GCs --->

215777 ms: Scavenge 1408.3 (1447.1) -> 1408.3 (1447.1) MB, 0.7 / 0 ms (+ 6.2 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
215826 ms: Mark-sweep 1408.3 (1447.1) -> 1374.1 (1413.2) MB, 49.3 / 0 ms (+ 6.7 ms in 2 steps since start of marking, biggest step 6.2 ms) [last resort gc].
215868 ms: Mark-sweep 1374.1 (1413.2) -> 1374.1 (1413.2) MB, 41.6 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x816f69b4629
1: Join(aka Join) [native array.js:154] [pc=0x375975d719d8] (this=0x816f69041b9 ,o=0x386457e9e8a1 <JS Array[10]>,v=10,C=0x23f77d814f51 <String[1]: ,>,B=0x816f69b3e71 <JS Function ConvertToString (SharedFunctionInfo 0x816f6951c21)>)
2: InnerArrayJoin(aka InnerArrayJoin) [native array.js:331] [pc=0x375975d7084a] (this=0x816f69041b9 ,C=0x23f77d814f51 <String[1]: ,>,o=0...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted

mharoot commented Dec 26, 2017

I commented out the map calls and it began to enter the data into the CouchDB database. Why does the program crash the first time the jQuery-style map function is called inside rdfParser.js?

      // authors: $('pgterms\\:agent pgterms\\:name').map(collect),
    
      // subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect)

I can see that the map function clearly works for getting the authors, but it also returns all the other extras (I'm not sure whether this is a problem):

beginning directory walk, importing data to database books
{ _id: '1',
title: 'The Declaration of Independence of the United States of America',
authors:
{ '0': 'Jefferson, Thomas',
options:
{ withDomLvl1: true,
normalizeWhitespace: false,
xml: false,
decodeEntities: true },
_root: { '0': [Object], options: [Object], length: 1, _root: [Circular] },
length: 1,
prevObject:
{ '0': [Object],
options: [Object],
_root: [Object],
length: 1,
prevObject: [Object] } } }

mharoot commented Dec 26, 2017

'use strict';
const
  fs = require('fs'),
  cheerio = require('cheerio');

/**
 * Like the request module we used earlier, this module sets its exports to a function.  Users of the module will call this function, passing in a path to a file and a callback to invoke with the extracted data.
 */
module.exports = function(filename, callback) {
    
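    // Copy the indexed entries of a cheerio result into a plain array, dropping the wrapper's extra properties.
    function extract_array($obj) {
      const obj_array = [];

      for (let i = 0; i < $obj.length; i++) {
        obj_array.push($obj[i]);
      }
      return obj_array;
    }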
    // The main module function reads the specified file asynchronously, then loads the data into cheerio.
    
    fs.readFile(filename, function(err, data) {
      if (err) { 
        callback(err); 
        return;
      }
      let 
      // cheerio gives back an object we assign to the $ variable.  This object works much like the jQuery global function $--it provides methods for querying and modifying elements.
        $ = cheerio.load(data.toString()),
      
      // The collect function is a utility method for extracting an array of text nodes from a set of element nodes.
        collect = function(index, elem) {
            return $(elem).text();
        };

      // The bulk of the logic for this module is encapsulated in these four lines.
      callback(null, {
        // We look for the <pgterms:ebook> tag, read its rdf:about attribute, and pull out just the numerical portion.
        _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),

        // We grab the text content of the <dcterms:title> tag.
        title: $('dcterms\\:title').text(),

        // We find all the <pgterms:name> elements under a <pgterms:agent>.
        authors: extract_array( $('pgterms\\:agent pgterms\\:name').map(collect) ),

        // The original version used the sibling operator (~) to find the <rdf:value> elements that are siblings of any element whose rdf:resource attribute ends in LCSH. That selector found no subjects, so it is commented out and replaced with a descendant selector.
        //subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect)
        subjects: extract_array( $('dcterms\\:subject rdf\\:Description rdf\\:value').map(collect) )
      });
    });
};

I basically added a function to strip out all the extra junk, and it started working for the massive number of files in the cache/epub directory. Also, the subjects were not being found, so I changed that selector.
