Skip to content

Conversation

@nleroy917
Copy link
Contributor

@nleroy917 nleroy917 commented Sep 30, 2025

I had a use case to pipe things around in my terminal.... particularly with bigWigAverageOverBed. Super small tweak to allow a user to specify - as the output path argument and it will pipe it out instead of writing to a file.

I suppose this logic could be duplicated to other tools. Btw thanks for this tool its much cleaner and faster than kentUtils 😄

Copy link
Owner

@jackh726 jackh726 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR :)

I like the idea in theory. In practice, I'm not a fan of the Box<dyn Write>. Rather, I think the write logic should just all get moved into a separate function with a generic O: Write, then a call would be made in each arm of the match.

Would love to apply this to the other files - maybe at least bigbedtobed, bigwigtobedgraph, and bigwigvaluesoverbed. Though, I'm okay if you want to just keep scope limited for this PR, and can open an issue to apply this to the other bins.

@nleroy917
Copy link
Contributor Author

nleroy917 commented Oct 7, 2025

@jackh726 Thanks for the response! I agree pulling out the logic and applying it to other files would be great. I see what you mean about using static dispatch with a generic function instead of Box<dyn Write>. To make sure we're on the same page, would that approach necessitate some restructure? Most things after if parallel would go inside run_computation

fn run_computation<W: Write, R: Reopen + BBIFileRead>(...) -> Result<...> { 
  // ... core logic ...
}

And in the main function:

match bedoutpath.as_str() {
    "-" => {
        // ... setup stdout writer
        run_computation(writer, ...)
    },
    _ => {
        // ... setup file writer
        run_computation(writer, ...)
    }
}

@jackh726
Copy link
Owner

jackh726 commented Oct 7, 2025

Yes, that's exactly what I had in mind :)

@nleroy917
Copy link
Contributor Author

Awesome that makes sense. yeah this isnt too bad at all! I can try that out, re-push and you can let me know what you think

@nleroy917
Copy link
Contributor Author

nleroy917 commented Oct 7, 2025

@jackh726 I've got functionality going for bigwigaverageoverbed but a followup question: I'm seeing a lot of buffer switching for cases where there's an overlap bed file. It usually looks like this

runtime.block_on(async move {
        loop {
            let next = buf_rcv.next().await;
            let Some(mut buf) = next else {
                data_handle.await.unwrap()?;
                return Ok::<_, BBIReadError>(());
            };

            buf.switch(out_file);
            while !buf.is_real_file_ready() {
                tokio::task::yield_now().await;
            }
            out_file = buf.await_real_file();
        }
    })?;

This makes static dispatch difficult since these functions are sort of inherently assuming that we will be giving them a file so its hard to make it abstract via traits and just pass in a Write. Does that make sense? It works easily for the single-threaded case. Maybe its not possible for multi-threaded (a bit outside my wheelhouse now).

@nleroy917
Copy link
Contributor Author

In summary:

  • bigwigaverageoverbed ✅ easy enough and done (should be)
  • bigwigvaluesoverbed ✅ easy enough and done (should be)
  • bigbedtobed almost complete, but getting tripped up by multi-threaded case
  • bigwigtobedgraph almost complete, but getting tripped up by multi-threaded case

@jackh726
Copy link
Owner

jackh726 commented Oct 8, 2025

Hmm, not sure I follow. TempFileBuffer is generic over R: Write. Maybe it's the Send + 'static bounds? But just quickly checking, and everything is fine with those removed.

Maybe you want to push a commit where things break and I can help give some concrete feedback.

@nleroy917
Copy link
Contributor Author

@jackh726 just pushed. I worked more this morning on it, but still stuck. I can try to be more specific. I think that core issue (from what I can see) is this: the original write_bed function is tightly coupled to the idea of writing to temporary files and then concatenating them. While clearly beneficial for parallelizing I/O to a single output file I think it breaks down when the target is a non-seekable, single stream like stdout.

Essentially, the crux of the issue is when we try to run the future in runtime.block_on. This is the current implementation:

runtime.block_on(async move {
        loop {
            let next = buf_rcv.next().await;
            let Some(mut buf) = next else {
                data_handle.await.unwrap()?;
                return Ok::<_, BBIReadError>(());
            };

            buf.switch(out_file);
            while !buf.is_real_file_ready() {
                tokio::task::yield_now().await;
            }
            out_file = buf.await_real_file();
        }
    })?;

It seems to be waiting for the buffer to become available, then it attempts to switch the buffer from that TempFileBuffer<File> struct into the out_file (a File). I initially thought that i could do something like this and use io::copy to transfer the contents to anything that implements Write:

runtime.block_on(async move {
    loop {
        let next = buf_rcv.next().await;
        let Some(buf) = next else {
            data_handle.await.unwrap()?;
            return Ok::<_, BBIReadError>(());
        };

        // get a readable handle to the temp buffer.
        let mut reader = buf.into_reader()?;

        // copy the entire contents of the temp buffer to our generic writer.
        io::copy(&mut reader, &mut writer)?;
    }
})?;

Which I think is close, but I think we'd need to implement Read for TempFileBuffer<File>:

no method named `into_reader` found for struct `utils::file::tempfilebuffer::TempFileBuffer` in the current scope

Does that make sense? I might be way over complicating things. But regardless, I pushed what I have and what I think is close... its erroring and maybe you can take a glance

@jackh726
Copy link
Owner

jackh726 commented Oct 8, 2025

I just pushed a commit that fixes this. I was just hard-coding File in a couple places that was unnecessary.

Haven't otherwise reviewed all your changes, but two things that seem like should be done:

  • Update bigwigtobedgraph to allow -
  • Can you also remove the Send + 'static bounds in this area? Like I said, doesn't seem like they're needed, so the drive-by cleanup is helpful.

@nleroy917
Copy link
Contributor Author

Oh that was a simple fix I see now. Ok thank you I should have caught that.

  • Went ahead and wrapped up the four functions
  • Removed Send + 'static from trait bounds where it was possible

Copy link
Owner

@jackh726 jackh726 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes overall look good - though, I think we should be keeping the BufWriter around the inner O: Writes, since we don't want to be syscalling for every byte write (though, tbh, I haven't benchmarked one vs the other, it may be that removing the BufWriter is actually faster)

Also, can you squash you commits?


let bedin = File::open(bedinpath).unwrap();
let mut bedstream = StreamingLineReader::new(BufReader::new(bedin));
let mut outwriter = BufWriter::new(out);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

@nleroy917
Copy link
Contributor Author

nleroy917 commented Oct 8, 2025

Changes overall look good - though, I think we should be keeping the BufWriter around the inner O: Writes, since we don't want to be syscalling for every byte write (though, tbh, I haven't benchmarked one vs the other, it may be that removing the BufWriter is actually faster)

Also, can you squash you commits?

Gotcha. Essentially this?

match bedpath.as_str() {
    "-" => {
        let stdout = io::stdout();
        write_bed_from_bed(bigbed, writer, overlap_bed)?;
    }
     _ => {
        let file = File::create(bedpath)?;
        write_bed_from_bed(bigbed, writer, overlap_bed)?;
    }
}

then, inside write_bed_from_bed:

fn write_bed_from_bed<O: Write>(writer: O) {
    let bufwriter = BufWriter::new(writer);
    // use it
}

And yes, I'll squash those commits

@nleroy917
Copy link
Contributor Author

nleroy917 commented Oct 9, 2025

@jackh726 still working on this got pulled away last night/afternoon. Just to follow up and clarify:

You essentially want the BufWriter created inside the actually functions that run computation instead of prior to passing into them -- is that correct? Basically, what I laid out above. Just want to confirm to you that I wasnt removing theBufWriters. I saw you commented on those lines being removed in the diff, but they are still there (just passed in).

Once I'm done with the changes, I'll squash the commits as well

@jackh726
Copy link
Owner

jackh726 commented Oct 9, 2025

Ah, maybe I missed that they were created before passing in - but yes, please keep them (and keep them in the same place with the same args on creation).

@nleroy917
Copy link
Contributor Author

Ok, @jackh726 moved back to BufWriter being created next to args. Linted. Commits squashed (I think). Lmk what other feedback you have and any other thoughts!

@nleroy917 nleroy917 requested a review from jackh726 October 9, 2025 23:37
@jackh726
Copy link
Owner

Hmm, don't see your changes. Maybe you didn't force push (and thought the push was successful?)

@nleroy917
Copy link
Contributor Author

nleroy917 commented Oct 16, 2025

Not sure... I think they are there if I look at "files changed." This is what I can see on my end:

nathanleroy@macbookpro-9 bigtools % git push
Everything up-to-date
nathanleroy@macbookpro-9 bigtools % git push --force
Everything up-to-date
nathanleroy@macbookpro-9 bigtools % git status
On branch bwaob_std_out
Your branch is up to date with 'origin/bwaob_std_out'.

nothing to commit, working tree clean
nathanleroy@macbookpro-9 bigtools

The changes were easy enough, I could just close this, start a new branch/fork, and retry. That side-steps the commit squashing altogether as well.

@jackh726
Copy link
Owner

No worries - I'll just get this in later today :) (I can do the last bits needed.) Thanks for your PR!

@jackh726 jackh726 force-pushed the bwaob_std_out branch 3 times, most recently from 66b18a6 to 2811cf8 Compare October 16, 2025 22:01
@jackh726
Copy link
Owner

Okay, made all the changes.

Again, thanks for the pull request. This is great work, and I appreciate you putting up with the review process :)

@jackh726 jackh726 merged commit 4415da5 into jackh726:master Oct 16, 2025
5 checks passed
@nleroy917
Copy link
Contributor Author

Thanks for reviewing and letting me contribute!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants