
MultipartKit V5 #100

Open · wants to merge 24 commits into main

Conversation

@ptoffy (Member) commented Oct 7, 2024

No description provided.

@ptoffy ptoffy added the semver-major Breaking changes label Oct 7, 2024
@ptoffy ptoffy self-assigned this Oct 7, 2024
@adam-fowler (Contributor) left a comment:

Some initial comments.

Sources/MultipartKit/MultipartParser+AsyncStream.swift (resolved)
Sources/MultipartKit/MultipartParser.swift (resolved)
Sources/MultipartKit/MultipartParser+AsyncStream.swift (resolved)
Sources/MultipartKit/MultipartParser.swift (resolved)
@ptoffy ptoffy requested a review from adam-fowler October 30, 2024 12:02
@ptoffy (Member, Author) commented Oct 30, 2024

@adam-fowler Do you want to take another look and see if things make more sense now? I added some binary data (which even contains a hex CRLF 😄) to the tests, so it should be able to parse anything now.

@ptoffy ptoffy marked this pull request as ready for review November 18, 2024 15:54
@ptoffy ptoffy requested review from 0xTim and gwynne as code owners November 18, 2024 15:54
@ptoffy ptoffy requested review from czechboy0 and Joannis and removed request for adam-fowler November 18, 2024 20:06
@ptoffy (Member, Author) commented Nov 18, 2024

@Joannis @simonjbeaumont @czechboy0 pulling you into this so you can take a look if you want.

Package.swift (resolved)
Package.swift (resolved)
Package.swift (resolved)
```swift
// Before
public func decode<D: Decodable>(_ decodable: D.Type, from data: String, boundary: String) throws -> D {
    try decode(D.self, from: ByteBuffer(string: data), boundary: boundary)

// After
public func decode<D: Decodable>(_ decodable: D.Type, from string: String, boundary: String) throws -> D {
    try decode(D.self, from: Array(string.utf8), boundary: boundary)
```
Member:

This makes a copy from the string; can't we use `withContiguousStorageIfAvailable`?

@ptoffy (Member, Author):

Hmm, I don't think that would work any better, because we need a Collection there. We can't pass in the raw bytes, and if we're copying them to an array we're still making a copy at that point, right?

Member:

Since you're doing an `.append` on the parser, you're right that you're already making a copy. Except right now you're making two copies of the same data, and each time you're also allocating space for it.

@ptoffy (Member, Author) commented Nov 19, 2024:

No, I mean we can't pass the raw pointer to the decode method, because it expects a collection of bytes, so this can't be done:

```swift
try string.utf8.withContiguousStorageIfAvailable { bytes in
    try decode(D.self, from: bytes, boundary: boundary)
} ?? decode(D.self, from: Array(string.utf8), boundary: boundary)
```

And if we were to do something like

```swift
if let bytes = string.utf8.withContiguousStorageIfAvailable(Array.init) {
    try decode(D.self, from: bytes, boundary: boundary)
} else {
    try decode(D.self, from: Array(string.utf8), boundary: boundary)
}
```

we're still initialising an array with the raw bytes, so I don't think this really saves us a copy. Unless you mean a different way of using `withContiguousStorageIfAvailable`?

Member:

I would really prefer not copying unnecessarily in a library like this.
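For reference, a minimal, self-contained sketch of the API under discussion, on a toy input (not MultipartKit code): `withContiguousStorageIfAvailable` lets you read a native Swift string's UTF-8 bytes in place, with an Array copy only as the fallback.

```swift
// Minimal sketch of the copy-avoidance being discussed, on a toy input
// (not MultipartKit code): read a String's UTF-8 bytes in place when
// contiguous storage is available, falling back to an Array copy.
let input = "hello"  // illustrative input

let byteSum = input.utf8.withContiguousStorageIfAvailable { bytes -> Int in
    bytes.reduce(0) { $0 + Int($1) }  // no Array allocated on this path
} ?? Array(input.utf8).reduce(0) { $0 + Int($1) }  // fallback copies

print(byteSum)
```

Native Swift strings store UTF-8 contiguously, so the fallback branch is rarely taken; the open question in the thread is whether the downstream decode can consume the `UnsafeBufferPointer` without re-materialising an Array.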

```swift
var currentBody = Body()

// Append data to the parser and process the sections
parser.append(buffer: data)
```
Member:

Is it necessary to copy data out before parsing?

@ptoffy (Member, Author):

This way we avoid a whole bunch of `needMoreData` returns from the parser.

Member:

Could it be restructured to be parse-or-append then?

```swift
if parserBuffer.isEmpty {
    let result = parser.parse(data)
    if result == .needMoreData {
        parser.append(data)
    }
} else {
    parser.append(data)
    parser.parse()
}
```
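A hypothetical, runnable analog of the parse-or-append flow sketched above, using a stand-in parser (it simply reports success once it has seen at least 4 bytes) rather than MultipartKit's real one:

```swift
// Toy model of the suggested flow: parse fresh chunks directly when
// nothing is buffered, and only buffer when more data is needed.
enum ParseResult { case done, needMoreData }

struct ToyParser {
    private(set) var buffer: [UInt8] = []

    // Stand-in parse: "succeeds" once it has at least 4 bytes.
    private func parse(_ data: [UInt8]) -> ParseResult {
        data.count >= 4 ? .done : .needMoreData
    }

    mutating func feed(_ data: [UInt8]) -> ParseResult {
        if buffer.isEmpty {
            if parse(data) == .done { return .done }  // fast path, no copy
            buffer.append(contentsOf: data)           // buffer only on demand
            return .needMoreData
        } else {
            buffer.append(contentsOf: data)
            return parse(buffer)
        }
    }
}

var toy = ToyParser()
let first = toy.feed([1, 2])    // buffered, more data needed
let second = toy.feed([3, 4])   // 4 bytes total, parse succeeds
```

The point of the suggestion is the fast path: a chunk that parses on its own never touches the internal buffer at all.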

@ptoffy (Member, Author):

But why? We're only doing that once, at the beginning; this is the sync parse.

@adam-fowler (Contributor) left a comment:

Generic parameter changes look good

Sources/MultipartKit/FormDataEncoder/FormDataEncoder.swift (resolved)
Sources/MultipartKit/FormDataEncoder/Storage.swift (resolved)
Sources/MultipartKit/FormDataEncoder/Storage.swift (resolved)
Contributor:

As I understand it, the user can receive a body as multiple MultipartSections if the underlying AsyncSequence has broken that body up. This is great, as we don't want to pay the memory cost for large bodies if we can avoid it. But there are situations where we want to ensure we have a complete body section, e.g. a block of data we want to run a JSON decode on. Is it possible to add a helper function to the Iterator to do this? E.g. Iterator.collectBody(upTo: memoryLimit), which returns a header and a complete body section.
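A hedged sketch of the collation logic such a helper could use; `BodyTooLarge`, `collectBody`, and the chunked input are illustrative stand-ins, not MultipartKit API:

```swift
// Illustrative stand-in for a collectBody(upTo:)-style helper:
// accumulate body chunks into one buffer, failing once the caller's
// memory limit would be exceeded.
struct BodyTooLarge: Error {}

func collectBody(chunks: [[UInt8]], upTo limit: Int) throws -> [UInt8] {
    var body: [UInt8] = []
    for chunk in chunks {
        guard body.count + chunk.count <= limit else { throw BodyTooLarge() }
        body.append(contentsOf: chunk)
    }
    return body
}

let small = try collectBody(chunks: [[1, 2], [3]], upTo: 8)    // collates fine
let tooBig = try? collectBody(chunks: [[1, 2], [3]], upTo: 2)  // nil: over limit
```

On the real streaming iterator this loop would consume body sections until the next part's header, but the limit check is the piece that bounds memory.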

@ptoffy (Member, Author):

There's currently a parse method on MultipartParser which loads the input all at once. I'm guessing you mean something of a middle ground between this and stream parsing, e.g. just for one part of the message?

Contributor:

Yes, just one part of the message. E.g. I have a multipart message with a zip file in there plus some metadata. I want to save the zip file to disk using the least amount of memory possible (i.e. streaming it), but parse the metadata with Codable, so I need the whole of it in memory.


@ptoffy ptoffy requested a review from adam-fowler December 28, 2024 15:06
@ptoffy ptoffy requested a review from Joannis December 28, 2024 15:08
@0xTim (Member) left a comment:

Some comments. This is starting to look good, and I think we're getting close to being able to tag an alpha.

It would be good to get the performance tests set up so we can measure changes going forward, and code coverage to see which edge cases we don't currently test.

Sources/MultipartKit/MultipartSection.swift (resolved)
Sources/MultipartKit/FormDataEncoder/Storage.swift (resolved)
Sources/MultipartKit/MultipartParser.swift (resolved)
Sources/MultipartKit/MultipartParser.swift (resolved)
Sources/MultipartKit/MultipartParser.swift (resolved)
Sources/MultipartKit/MultipartParser.swift (resolved)
@adam-fowler (Contributor) left a comment:

Looking good. Maybe we should have two async sequences, though: one that does the collation automatically, and one that requires you to use nextCollatedPart when you want a collated part.

I would rename the current AsyncSequence to StreamingMultipartParserAsyncSequence and add the following:

```swift
public struct MultipartParserAsyncSequence<BackingSequence: AsyncSequence>: AsyncSequence
where BackingSequence.Element: MultipartPartBodyElement & RangeReplaceableCollection {
    let streamingSequence: StreamingMultipartParserAsyncSequence<BackingSequence>

    public init(boundary: String, buffer: BackingSequence) {
        self.streamingSequence = .init(boundary: boundary, buffer: buffer)
    }

    public struct AsyncIterator: AsyncIteratorProtocol {
        public mutating func next() async throws -> MultipartSection<BackingSequence.Element>? {
            try await self.streamingIterator.nextCollatedPart()
        }

        var streamingIterator: StreamingMultipartParserAsyncSequence<BackingSequence>.AsyncIterator
    }

    public func makeAsyncIterator() -> AsyncIterator {
        return .init(streamingIterator: self.streamingSequence.makeAsyncIterator())
    }
}
```

Most people won't care about streaming sections of the multipart file, so the default should provide the collated parts. But if someone wants to stream the body part of a file, they can use the StreamingMultipartParserAsyncSequence version.

```swift
self.currentCollatedBody.append(contentsOf: chunk)
if !headerFields.isEmpty {
    let returningFields = headerFields
    headerFields = .init()
```
Contributor:

Why are you re-initializing the header fields? It's a variable local to this function.

@ptoffy (Member, Author):

Well, if there are different parts in a message there might be other headers later, and this empties the fields before possibly reading the next ones.

```swift
while let part = try await next() {
    switch part {
    case .headerFields(let fields):
        headerFields.append(contentsOf: fields)
```
Contributor:

Is there no chance you can read a header part after having read body parts?

@ptoffy (Member, Author):

Yes, there is, if the message has different parts.

@ptoffy (Member, Author):

FYI, in this case fields will always contain just one field, because the original sequence returns single header fields and body chunks. I'm not sure if this is misleading and should be clarified somehow, either with docs or with a name change.

@ptoffy ptoffy requested a review from adam-fowler January 1, 2025 18:58
@ptoffy (Member, Author) commented Jan 1, 2025

I agree with the proposed changes, but I moved nextCollatedPart() to simply be the next() method of the new sequence, which makes more sense to me.

@ptoffy ptoffy requested a review from 0xTim January 1, 2025 21:17
@0xTim (Member) left a comment:

Nothing blocking on my end. Once the other comments have been resolved, we can look at integrating this into projects to see how it actually works.

```swift
buffer.writeString(": ")
buffer.writeString(val)
buffer.writeString("\r\n")
buffer.append(contentsOf: Array("--\(boundary)".utf8) + crlf)
```
Member:

Instead of copying the boundary into an array, write the leading tokens, the boundary, and the CRLF separately. That saves copies and allocations, and thus a lot of work.
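A small sketch of the piecewise write being suggested, using a plain [UInt8] buffer and an example boundary (both illustrative): appending the dashes, the boundary's UTF-8 view, and the CRLF separately produces the same bytes without building an intermediate Array.

```swift
// Illustrative: same output bytes as Array("--\(boundary)".utf8) + crlf,
// but written piecewise so no intermediate Array is materialised.
let boundary = "abc123"  // example boundary
var buffer: [UInt8] = []
buffer.append(contentsOf: "--".utf8)      // leading tokens
buffer.append(contentsOf: boundary.utf8)  // boundary, via its UTF-8 view
buffer.append(contentsOf: "\r\n".utf8)    // CRLF, never copied into an array

// The allocation-heavy alternative the review is pushing back on:
let copied = Array("--\(boundary)".utf8) + Array("\r\n".utf8)
```

Because `append(contentsOf:)` takes any Sequence, the UTF-8 views can be consumed directly; the only allocation is the destination buffer's own growth.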

```swift
buffer.writeString("\r\n")
buffer.append(contentsOf: Array("--\(boundary)".utf8) + crlf)
for field in part.headerFields {
    buffer.append(contentsOf: Array("\(field.description)".utf8) + crlf)
```
Member:

Same here, you can easily prevent a copy

```swift
buffer.writeString("--")
buffer.writeString(boundary)
buffer.writeString("--\r\n")
buffer.append(contentsOf: Array("--\(boundary)--".utf8) + crlf)
```
Member:

Unnecessary copy

```swift
    parts: [MultipartPart<some MultipartPartBodyElement>],
    into buffer: inout OutputBody
) throws where OutputBody: RangeReplaceableCollection {
    let crlf = Array("\r\n".utf8)
```
Member:

CRLF is copied into this a lot. I would keep it as `"\r\n".utf8` and append those values; don't copy it into an array if possible.

@Joannis (Member) left a comment:

My main (and only) issue is that this library makes a lot of copies out of every byte carrier (String, Array, etc.).

Labels: semver-major (Breaking changes)
4 participants