Created ScatteredLabelledArcsASCIIGraph #12

Open
wants to merge 86 commits into
base: master
2d22671
extracted ArcLabelledBatchGraph from transposeOffline
lfoscari Feb 17, 2023
b06c525
moved ArcLabelledImmutableGraph, changed processTransposeBatch to ret…
lfoscari Feb 18, 2023
a1f6db1
created ScatteredLabelledArcsASCIIGraph by copying ScatteredArcsASCII…
lfoscari Feb 18, 2023
25e86cb
fixed batchSize parameter in scatteredlabelledarcsasciigraph construc…
lfoscari Feb 19, 2023
72289cf
reflected changes to iterator constructor
lfoscari Feb 19, 2023
5202550
added unit tests adapting the ones from ScatteredArcsASCIIGraph, now …
lfoscari Feb 20, 2023
b83835d
applied law formatter
lfoscari Feb 20, 2023
821eec1
added toString with labels
lfoscari Feb 20, 2023
e0ae0c7
fixed toString error
lfoscari Feb 20, 2023
10c1152
changed processTransposeBatch to prune duplicate arcs
lfoscari Feb 20, 2023
d654603
minor refactoring and formatting
lfoscari Feb 20, 2023
8a9162c
better logging, added label tests (failing)
lfoscari Feb 20, 2023
69327b1
added labelMergeStartegy parameter
lfoscari Feb 20, 2023
b9b758d
started implementing label merge strategy to handle duplicate arcs wi…
lfoscari Feb 20, 2023
e9a1294
now same arcs with different labels will have their labels merged
lfoscari Feb 21, 2023
e851e42
introduced label merging also when scanning the batched, adapted cons…
lfoscari Feb 21, 2023
9a6040c
fixed wrongly stated tests
lfoscari Feb 21, 2023
5d7b8d2
added labelmergestrategy tests, fixed bug in which the label output b…
lfoscari Feb 21, 2023
5df294b
comments cleanup
lfoscari Feb 21, 2023
9ef2378
added node mapping function to constructor with iterator
lfoscari Feb 21, 2023
8c34c33
Towards reducing class duplication
vigna Feb 21, 2023
7bc41c1
Towards reducing class duplication; ported back main method
vigna Feb 21, 2023
a1fda51
Abstracted id code
vigna Feb 21, 2023
d4ab003
Reimplemented the id big map in the new fastutil style
vigna Feb 22, 2023
34e8b86
Better name
vigna Feb 22, 2023
98dc864
Better handling of the zero key
vigna Feb 22, 2023
1dbbcc1
Towards reducing class duplication
vigna Feb 21, 2023
fc5750e
Towards reducing class duplication; ported back main method
vigna Feb 21, 2023
9ffc0b9
Abstracted id code
vigna Feb 21, 2023
97c6ce4
Reimplemented the id big map in the new fastutil style
vigna Feb 22, 2023
4befba4
Better name
vigna Feb 22, 2023
6ae41ba
Better handling of the zero key
vigna Feb 22, 2023
83b1951
Added script to process BlockChair files
vigna Feb 22, 2023
090a952
Fixed merge
vigna Feb 22, 2023
32f759f
Added fix in docs
vigna Feb 22, 2023
241e703
Fixed coinbase test
vigna Feb 22, 2023
8da9b9a
switched to Id2NodeMap inside constructor
lfoscari Feb 22, 2023
60c9298
Abstracted Elias-Fano size estimation
vigna Feb 22, 2023
a30ca77
fixed main method to read graph from stdin
lfoscari Feb 22, 2023
ec7139e
reordered parameters in constructor for consistency, added more const…
lfoscari Feb 22, 2023
33e5d77
Added script to process BlockChair files
vigna Feb 22, 2023
ff83063
Fixed merge
vigna Feb 22, 2023
f1f69b6
Added fix in docs
vigna Feb 22, 2023
b9b89fc
Fixed coinbase test
vigna Feb 22, 2023
21d0fc9
Abstracted Elias-Fano size estimation
vigna Feb 22, 2023
d38aab1
nulled label offset array to trick gc
lfoscari Feb 23, 2023
797c55e
removed prototype from iterator constructor
lfoscari Feb 23, 2023
d385177
Added end-of-line test; fixed C collation order for sort
vigna Feb 24, 2023
5cc1c31
Automatic end-of-line fixing
vigna Feb 24, 2023
ad21243
Removed fixing, cut is sufficient
vigna Feb 24, 2023
63b9f56
Transaction first
vigna Feb 24, 2023
62380ad
Restored check for efficiency
vigna Feb 24, 2023
4589edf
extracted ArcLabelledBatchGraph from transposeOffline
lfoscari Feb 17, 2023
b3301a3
moved ArcLabelledImmutableGraph, changed processTransposeBatch to ret…
lfoscari Feb 18, 2023
5e3b33d
created ScatteredLabelledArcsASCIIGraph by copying ScatteredArcsASCII…
lfoscari Feb 18, 2023
7d8f50d
fixed batchSize parameter in scatteredlabelledarcsasciigraph construc…
lfoscari Feb 19, 2023
f70442f
reflected changes to iterator constructor
lfoscari Feb 19, 2023
f159e50
added unit tests adapting the ones from ScatteredArcsASCIIGraph, now …
lfoscari Feb 20, 2023
0609852
applied law formatter
lfoscari Feb 20, 2023
c952e2d
added toString with labels
lfoscari Feb 20, 2023
9f9fceb
fixed toString error
lfoscari Feb 20, 2023
bab8f64
changed processTransposeBatch to prune duplicate arcs
lfoscari Feb 20, 2023
7f86ce9
minor refactoring and formatting
lfoscari Feb 20, 2023
ac507d3
better logging, added label tests (failing)
lfoscari Feb 20, 2023
3d3453f
added labelMergeStartegy parameter
lfoscari Feb 20, 2023
36df18f
started implementing label merge strategy to handle duplicate arcs wi…
lfoscari Feb 20, 2023
4fcec28
now same arcs with different labels will have their labels merged
lfoscari Feb 21, 2023
25797c0
introduced label merging also when scanning the batched, adapted cons…
lfoscari Feb 21, 2023
b7ec40d
fixed wrongly stated tests
lfoscari Feb 21, 2023
2d3d695
added labelmergestrategy tests, fixed bug in which the label output b…
lfoscari Feb 21, 2023
d2a080b
comments cleanup
lfoscari Feb 21, 2023
abe9f56
added node mapping function to constructor with iterator
lfoscari Feb 21, 2023
e779e69
switched to Id2NodeMap inside constructor
lfoscari Feb 22, 2023
ce47046
fixed main method to read graph from stdin
lfoscari Feb 22, 2023
a8f8b0f
reordered parameters in constructor for consistency, added more const…
lfoscari Feb 22, 2023
e84ec22
nulled label offset array to trick gc
lfoscari Feb 23, 2023
3698357
removed prototype from iterator constructor
lfoscari Feb 23, 2023
7a099a2
merge fix
lfoscari Feb 27, 2023
dc1c516
Revert "removed prototype from iterator constructor"
lfoscari Feb 27, 2023
e104562
found error in Transform, to be fixed
lfoscari Mar 2, 2023
687f2a6
refactoring labelMergeStrategy
lfoscari Mar 3, 2023
bfd8c0d
removed overwrite on the labels, instead we delay writing the label u…
lfoscari Mar 3, 2023
241690a
cleanup
lfoscari Mar 3, 2023
c50b268
now computing outdegree when copying by calling the proper method
lfoscari Mar 6, 2023
5e4c452
minor refactoring
lfoscari Mar 7, 2023
ccc1ee0
missing dots, spelling
lfoscari Apr 12, 2023
56 changes: 56 additions & 0 deletions bash/blockchair.sh
@@ -0,0 +1,56 @@
#!/bin/bash -e

if [[ "$2" == "" ]]; then
    echo "$(basename "$0") DIR NTHREADS [OUTPUT]" 1>&2
    echo "Reads files in DIR and processes them using NTHREADS parallel sorts." 1>&2
    echo "Files are processed as input files unless OUTPUT is specified." 1>&2
    echo "FILES MUST END WITH A NEWLINE. Fix them with \"sed -i -e '\$a\\' *\"." 1>&2
    exit 1
fi

DIR=$1
NTHREADS=$2
OUTPUT=$3

function file_ends_with_newline() {
    [[ $(tail -c1 "$1" | wc -l) -gt 0 ]]
}

FILES=$(mktemp)
find "$DIR" -type f >"$FILES"

# Check that all files end with a newline

while IFS= read -r FILE; do
    if ! file_ends_with_newline "$FILE"; then
        echo "File $FILE does not end with a newline" 1>&2
        exit 1
    fi
done <"$FILES"

NFILES=$(cat $FILES | wc -l)

# To avoid empty splits, there must be at least as many files as threads

if (( NFILES < NTHREADS )); then
    NTHREADS=$NFILES
    echo "Not enough files: number of threads set to $NFILES" 1>&2
fi

SPLITBASE=$(mktemp)
split -n l/$NTHREADS $FILES $SPLITBASE
SPLITS=$(for file in ${SPLITBASE}?*; do echo $file; done)

for SPLIT in $SPLITS; do
    mkfifo $SPLIT.pipe
    if [[ "$OUTPUT" != "" ]]; then
        (tail -q -n+2 $(cat $SPLIT) | cut -f2,7,10 | awk '{ if ($3 == 0) print $1 "\t" $2 }' | LC_ALL=C sort -S2G >$SPLIT.pipe) &
    else
        (tail -q -n+2 $(cat $SPLIT) | cut -f7,13 | awk '{ print $2 "\t" $1 }' | LC_ALL=C sort -S2G >$SPLIT.pipe) &
    fi
done

LC_ALL=C sort -S2G -m $(for SPLIT in $SPLITS; do echo $SPLIT.pipe; done)

rm -f $FILES
rm -f ${SPLITBASE}*
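
The last step of the script merges the per-split sorted streams with `sort -m`. As a rough illustration only (it is not part of this PR), the same k-way merge can be sketched in Java with a priority queue holding the current head of each run. The class `SortedRunMerger`, its helper `Head`, and the method `merge` are hypothetical names. The sketch assumes each run is already sorted and contains ASCII text, so that `String.compareTo` orders lines the same way as the C collation selected by `LC_ALL=C`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merges already-sorted runs of lines, as `sort -m` does. */
public final class SortedRunMerger {
    /** The next unread line of a run, together with the rest of that run. */
    private static final class Head {
        final String line;
        final Iterator<String> rest;

        Head(final String line, final Iterator<String> rest) {
            this.line = line;
            this.rest = rest;
        }
    }

    /** Merges sorted runs into a single sorted list; duplicates are kept. */
    public static List<String> merge(final List<List<String>> runs) {
        // Orders heads by code unit, which matches byte order for ASCII input (assumption).
        final PriorityQueue<Head> queue = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        for (final List<String> run : runs) {
            final Iterator<String> it = run.iterator();
            if (it.hasNext()) queue.add(new Head(it.next(), it));
        }
        final List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Emit the smallest head, then refill the queue from the same run.
            final Head head = queue.poll();
            merged.add(head.line);
            if (head.rest.hasNext()) queue.add(new Head(head.rest.next(), head.rest));
        }
        return merged;
    }

    public static void main(final String[] args) {
        // Made-up tab-separated lines standing in for the output of the per-split sorts.
        final List<List<String>> runs = List.of(
            List.of("a\t1", "c\t3"),
            List.of("b\t2", "d\t4"),
            List.<String>of());
        for (final String line : merge(runs)) System.out.println(line);
    }
}
```

Only one line per run is held in memory at a time, which is why the script can feed the merge from named pipes instead of materialising each sorted split on disk.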
6 changes: 3 additions & 3 deletions src/it/unimi/dsi/webgraph/EFGraph.java
@@ -637,14 +637,14 @@ public static EFGraph loadOffline(final CharSequence basename) throws IOExceptio
         return EFGraph.loadMapped(basename, null);
     }
 
-    /** An iterator returning the offsets. */
-    private final static class OffsetsLongIterator implements LongIterator {
+    /** An iterator returning offsets by reading &delta;-encoded gaps. */
+    public final static class OffsetsLongIterator implements LongIterator {
         private final InputBitStream offsetIbs;
         private final long n;
         private long offset;
         private long i;
 
-        private OffsetsLongIterator(final InputBitStream offsetIbs, final long n) {
+        public OffsetsLongIterator(final InputBitStream offsetIbs, final long n) {
             this.offsetIbs = offsetIbs;
             this.n = n;
         }
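
The updated Javadoc says the iterator returns offsets by reading δ-encoded gaps. The body of the class is not shown in this hunk, so the following is only a sketch of the general idea: each offset is the running sum of the gaps read so far. The class `GapOffsetsIterator` and its method `toOffsets` are hypothetical, the gaps are assumed to be already decoded into a `long[]` rather than read from an `InputBitStream`, and the standard `PrimitiveIterator.OfLong` stands in for fastutil's `LongIterator`.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/** Hypothetical sketch: turns a sequence of gaps into absolute offsets. */
public final class GapOffsetsIterator implements PrimitiveIterator.OfLong {
    /** The gaps, assumed already decoded (the real class reads them from a bit stream). */
    private final long[] gaps;
    /** The running sum of the gaps returned so far. */
    private long offset;
    /** The index of the next gap to read. */
    private int i;

    public GapOffsetsIterator(final long[] gaps) {
        this.gaps = gaps;
    }

    @Override
    public boolean hasNext() {
        return i < gaps.length;
    }

    @Override
    public long nextLong() {
        if (!hasNext()) throw new NoSuchElementException();
        // Each offset is the previous offset plus the next gap.
        offset += gaps[i++];
        return offset;
    }

    /** Convenience method that drains the iterator into an array. */
    public static long[] toOffsets(final long[] gaps) {
        final long[] offsets = new long[gaps.length];
        final GapOffsetsIterator it = new GapOffsetsIterator(gaps);
        for (int k = 0; it.hasNext(); k++) offsets[k] = it.nextLong();
        return offsets;
    }

    public static void main(final String[] args) {
        // Made-up gaps; prints [0, 5, 8, 8, 15]
        System.out.println(Arrays.toString(toOffsets(new long[] { 0, 5, 3, 0, 7 })));
    }
}
```

Storing gaps instead of absolute offsets keeps the encoded values small, which is what makes a variable-length code such as δ worthwhile.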