Script tags can contain "less-than" symbol #140

Merged 2 commits on Oct 20, 2018
4 changes: 4 additions & 0 deletions html-conduit/ChangeLog.md
@@ -1,3 +1,7 @@
+## 1.3.2
+
+* Fix a bug that was removing `<` symbols in script tags.
+
 ## 1.3.1

 * Inline tagstream-conduit for entity decoding in attribute value bug
2 changes: 1 addition & 1 deletion html-conduit/html-conduit.cabal
@@ -1,5 +1,5 @@
 Name: html-conduit
-Version: 1.3.1
+Version: 1.3.2
 Synopsis: Parse HTML documents using xml-conduit datatypes.
 Description: This package uses tagstream-conduit for its parser. It automatically balances mismatched tags, so that there shouldn't be any parse failures. It does not handle a full HTML document rendering, such as adding missing html and head tags. Note that, since version 1.3.1, it uses an inlined copy of tagstream-conduit with entity decoding bugfixes applied.
 Homepage: https://github.com/snoyberg/xml
2 changes: 1 addition & 1 deletion html-conduit/src/Text/HTML/TagStream.hs
@@ -167,7 +167,7 @@ tillScriptEnd open =
         chunk <- takeTill (== '<')
         let acc' = acc <> B.fromText chunk
             finish = pure [open, Text $ L.toStrict $ B.toLazyText acc', TagClose "script"]
-            hasContent = (string "/script>" *> finish) <|> loop acc'
+            hasContent = (string "/script>" *> finish) <|> loop (acc' <> "<")
         (char '<' *> hasContent) <|> finish

 tokens :: Parser [Token]
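To make the one-line change in `tillScriptEnd` easier to follow, here is a minimal sketch of the same loop written over plain `String` with only `base`. It is not the library's code: `scriptBody` and its `keepLt` flag are hypothetical names introduced for illustration, and the real parser works on `Text` inside a `Parser`. The sketch only shows why a `<` that does not start `</script>` has to be appended back to the accumulator after it has been consumed.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical stand-in for the loop in tillScriptEnd (not the library API).
-- Collects script content up to the first "</script>" and returns
-- (script body, remaining input). The flag selects the behaviour:
-- False mimics the code before the fix, True the code after it.
scriptBody :: Bool -> String -> (String, String)
scriptBody keepLt = loop ""
  where
    loop acc input =
      let (chunk, rest) = break (== '<') input   -- like takeTill (== '<')
          acc'          = acc ++ chunk
      in case rest of
           -- No '<' left: end of input, return what was collected.
           [] -> (acc', "")
           -- A '<' was found and is consumed here.
           (_ : afterLt) ->
             case stripPrefix "/script>" afterLt of
               -- It started the closing tag: finish.
               Just remaining -> (acc', remaining)
               -- It was ordinary script content.
               Nothing
                 | keepLt    -> loop (acc' ++ "<") afterLt  -- after the fix
                 | otherwise -> loop acc' afterLt           -- before the fix
```

In GHCi, `scriptBody False "hello <> world</script>"` evaluates to `("hello > world","")`, which reproduces the dropped `<`, while `scriptBody True "hello <> world</script>"` evaluates to `("hello <> world","")`.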
4 changes: 2 additions & 2 deletions html-conduit/test/main.hs
@@ -120,9 +120,9 @@ main = hspec $ do

     describe "script tags" $ do
         it "ignores funny characters" $
-            let html = "<script>hello > world</script>"
+            let html = "<script>hello <> world</script>"
                 doc = X.Document (X.Prologue [] Nothing []) root []
-                root = X.Element "script" Map.empty [X.NodeContent "hello > world"]
+                root = X.Element "script" Map.empty [X.NodeContent "hello <> world"]
             in H.parseLBS html @?= doc

 {-