From d6e5aa3bdcae7c5f84f794ccc09c59dbe6666725 Mon Sep 17 00:00:00 2001
From: Richard Feldman <oss@rtfeldman.com>
Date: Mon, 16 Mar 2020 00:13:02 -0400
Subject: [PATCH 1/6] Write some Str docs

---
 compiler/builtins/docs/Str.roc | 110 +++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 5 deletions(-)

diff --git a/compiler/builtins/docs/Str.roc b/compiler/builtins/docs/Str.roc
index 450189640c..39664350be 100644
--- a/compiler/builtins/docs/Str.roc
+++ b/compiler/builtins/docs/Str.roc
@@ -2,12 +2,66 @@ api Str provides Str, isEmpty, join
 
 ## Types
 
-## A sequence of [UTF-8](https://en.wikipedia.org/wiki/UTF-8) text characters.
+## A [Unicode](https://unicode.org) text value.
+##
+## Dealing with text is deep topic, so by design, Roc's `Str` module sticks
+## to the basics. For more advanced uses such as working with raw [code points](https://en.wikipedia.org/wiki/Code_point),
+## see the [roc/unicode](roc/unicode) package, and for locale-specific text
+## functions (including capitalization, as capitalization rules vary by locale)
+## see the [roc/locale](roc/locale) package.
+##
+## ### Unicode
+##
+## Unicode can represent text values which span multiple languages, symbols, and emoji.
+## Here are some valid Roc strings:
+##
+## * "Roc"
+## * "鹏"
+## * "🐦"
+##
+## Every Unicode string is a sequence of [grapheme clusters](https://unicode.org/glossary/#grapheme_cluster).
+## A grapheme cluster corresponds to what a person reading a string might call
+## a "character", but because the term "character" is used to mean many different
+## concepts across different programming languages, we intentionally avoid it in Roc.
+## Instead, we use the term "clusters" as a shorthand for "grapheme clusters."
+##
+## You can get the number of grapheme clusters in a string by calling `Str.countClusters` on it:
+##
+## >>> Str.countClusters "Roc"
+##
+## >>> Str.countClusters "音乐"
+##
+## >>> Str.countClusters "👍"
+##
+## > The `countClusters` function traverses the entire string to calculate its answer,
+## > so it's much better for performance to use `Str.isEmpty` instead of
+## > calling `Str.countClusters` and checking whether the count was `0`.
+##
+## ### Escape characters
+##
+## ### String interpolation
+##
+## ### Encoding
+##
+## Whenever any Roc string is created, its [encoding](https://en.wikipedia.org/wiki/Character_encoding)
+## comes from a configuration option chosen by [the host](guide|hosts).
+## Because of this, None of the functions in this module
+## make assumptions about the underlying encoding. After all, different hosts
+## may choose different encodings! Here are some factors hosts may consider
+## when deciding which encoding to choose:
+##
+## * Linux APIs typically use [UTF-8](https://en.wikipedia.org/wiki/UTF-8) encoding
+## * Windows APIs and Apple [Objective-C](https://en.wikipedia.org/wiki/Objective-C) APIs typically use [UTF-16](https://en.wikipedia.org/wiki/UTF-16) encoding
+## * Hosts which call [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) functions may choose [MUTF-8](https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8) to disallow a valid UTF-8 character which can prematurely terminate C strings
+##
+## > Roc strings only support Unicode, so they do not support non-Unicode character
+## > encodings like [ASCII](https://en.wikipedia.org/wiki/ASCII).
+##
+## To write code which behaves differently depending on which encoding the host chose,
+## the #Str.codeUnits function will do that. However, if you are doing encoding-specific work,
+## you should take a look at the [roc/unicode](roc/unicode) pacakge;
+## it has many more tools than this module does.
 ##
-## One #Str can be up to 2 gigabytes in size. If you need to store larger
-## strings than that, you can split them into smaller chunks and operate
-## on those instead of on one large #Str. This often runs faster in practice,
-## even for strings much smaller than 2 gigabytes.
 Str : [ @Str ]
 
 ## Convert
@@ -59,3 +113,49 @@ padStart : Str, Int, Str -> Str
 padEnd : Str, Int, Str -> Str
 
 
+foldClusters : Str, { start: state, step: (state, Str -> state) } -> state
+
+## Returns #True if the string begins with a capital letter, and #False otherwise.
+##
+## >>> Str.isCapitalized "hi"
+##
+## >>> Str.isCapitalized "Hi"
+##
+## >>> Str.isCapitalized " Hi"
+##
+## >>> Str.isCapitalized "Česká"
+##
+## >>> Str.isCapitalized "Э"
+##
+## >>> Str.isCapitalized "東京"
+##
+## >>> Str.isCapitalized "🐦"
+##
+## >>> Str.isCapitalized ""
+##
+## Since the rules for how to capitalize an uncapitalized string vary by locale,
+## see the [roc/locale](roc/locale) package for functions which do that.
+isCapitalized : Str -> Bool
+
+
+## Deconstruct the string into raw code unit integers. (Note that code units
+## are not the same as code points; to work with code points, see [roc/unicode](roc/unicode)).
+##
+## This returns a different tag depending on the string encoding chosen by the host.
+##
+## The size of an individual code unit depends on the encoding. For example,
+## in UTF-8 and MUTF-8, a code unit is 8 bits, so those encodings
+## are returned as `List U8`. In contrast, UTF-16 encoding uses 16-bit code units,
+## so the `Utf16` tag holds a `List U16` instead.
+##
+## > Code units are no substitute for grapheme clusters!
+## >
+## > For example, `Str.countGraphemes "👍"` always returns `1` no matter what,
+## > whereas `Str.codeUnits "👍"` could give you back a `List U8` with a length
+## > of 4, or a `List U16` with a length of 2, neither of which is equal to
+## > the correct number of grapheme clusters in that string.
+## >
+## > This function exists for more advanced use cases like those found in
+## > [roc/unicode](roc/unicode), and using code points when grapheme clusters would
+## > be more appropriate can very easily lead to bugs.
+codeUnits : Str -> [ Utf8 (List U8), Mutf8 (List U8), Ucs2 (List U16), Utf16 (List U16), Utf32 (List U32) ]

From aa3030ab85fbb940cce5eb506e484e175a9f1c2e Mon Sep 17 00:00:00 2001
From: Richard Feldman <oss@rtfeldman.com>
Date: Mon, 16 Mar 2020 01:51:12 -0400
Subject: [PATCH 2/6] Revise Str docs

---
 compiler/builtins/docs/Str.roc | 59 ++++++++++++++++------------------
 1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/compiler/builtins/docs/Str.roc b/compiler/builtins/docs/Str.roc
index 39664350be..db8ff78c1e 100644
--- a/compiler/builtins/docs/Str.roc
+++ b/compiler/builtins/docs/Str.roc
@@ -43,25 +43,16 @@ api Str provides Str, isEmpty, join
 ##
 ## ### Encoding
 ##
-## Whenever any Roc string is created, its [encoding](https://en.wikipedia.org/wiki/Character_encoding)
-## comes from a configuration option chosen by [the host](guide|hosts).
-## Because of this, None of the functions in this module
-## make assumptions about the underlying encoding. After all, different hosts
-## may choose different encodings! Here are some factors hosts may consider
-## when deciding which encoding to choose:
-##
-## * Linux APIs typically use [UTF-8](https://en.wikipedia.org/wiki/UTF-8) encoding
-## * Windows APIs and Apple [Objective-C](https://en.wikipedia.org/wiki/Objective-C) APIs typically use [UTF-16](https://en.wikipedia.org/wiki/UTF-16) encoding
-## * Hosts which call [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) functions may choose [MUTF-8](https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8) to disallow a valid UTF-8 character which can prematurely terminate C strings
-##
-## > Roc strings only support Unicode, so they do not support non-Unicode character
-## > encodings like [ASCII](https://en.wikipedia.org/wiki/ASCII).
-##
-## To write code which behaves differently depending on which encoding the host chose,
-## the #Str.codeUnits function will do that. However, if you are doing encoding-specific work,
-## you should take a look at the [roc/unicode](roc/unicode) pacakge;
-## it has many more tools than this module does.
+## Roc strings are not coupled to any particular
+## [encoding](https://en.wikipedia.org/wiki/Character_encoding). As it happens,
+## they are currently encoded in UTF-8, but this module is intentionally designed
+## not to rely on that implementation detail so that a future release of Roc can
+## potentially change it without breaking existing Roc applications.
 ##
+## This module has functions that can convert a #Str to a #List of raw code unit integers
+## in a particular encoding, but if you are doing encoding-specific work,
+## you should take a look at the [roc/unicode](roc/unicode) package.
+## It has many more tools than this module does!
 Str : [ @Str ]
 
 ## Convert
@@ -137,25 +128,29 @@ foldClusters : Str, { start: state, step: (state, Str -> state) } -> state
 ## see the [roc/locale](roc/locale) package for functions which do that.
 isCapitalized : Str -> Bool
 
-
-## Deconstruct the string into raw code unit integers. (Note that code units
-## are not the same as code points; to work with code points, see [roc/unicode](roc/unicode)).
+## ## Code Units
 ##
-## This returns a different tag depending on the string encoding chosen by the host.
+## Besides grapheme clusters, another way to break down strings is into
+## raw code unit integers.
 ##
-## The size of an individual code unit depends on the encoding. For example,
-## in UTF-8 and MUTF-8, a code unit is 8 bits, so those encodings
-## are returned as `List U8`. In contrast, UTF-16 encoding uses 16-bit code units,
-## so the `Utf16` tag holds a `List U16` instead.
+## The size of a code unit depends on the string's encoding. For example, in a
+## string encoded in UTF-8, a code unit is 8 bits. This is why #Str.toUtf8
+## returns a `List U8`. In contrast, UTF-16 encoding uses 16-bit code units,
+## so #Str.toUtf16 returns a `List U16` instead.
 ##
 ## > Code units are no substitute for grapheme clusters!
 ## >
 ## > For example, `Str.countGraphemes "👍"` always returns `1` no matter what,
-## > whereas `Str.codeUnits "👍"` could give you back a `List U8` with a length
-## > of 4, or a `List U16` with a length of 2, neither of which is equal to
-## > the correct number of grapheme clusters in that string.
+## > whereas `Str.toUtf8 "👍"` returns a list with a length of 4,
+## > and `Str.toUtf16 "👍"` returns a list with a length of 2.
 ## >
-## > This function exists for more advanced use cases like those found in
-## > [roc/unicode](roc/unicode), and using code points when grapheme clusters would
+## > These functions exists for more advanced use cases like those found in
+## > [roc/unicode](roc/unicode), and using code units when grapheme clusters would
 ## > be more appropriate can very easily lead to bugs.
-codeUnits : Str -> [ Utf8 (List U8), Mutf8 (List U8), Ucs2 (List U16), Utf16 (List U16), Utf32 (List U32) ]
+
+toUtf8 : Str -> List U8
+
+toUtf16 : Str -> List U16
+
+toUtf32 : Str -> List U16
+

From 1bee949ad077f3dd9852feeceb4bd082dcacccf4 Mon Sep 17 00:00:00 2001
From: Richard Feldman <oss@rtfeldman.com>
Date: Mon, 16 Mar 2020 02:06:12 -0400
Subject: [PATCH 3/6] Fix some Str docs

---
 compiler/builtins/docs/Str.roc | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)
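
Reviewer note: the code unit counts quoted in the revised doc comment (4 for
UTF-8, 2 for UTF-16) can be sanity-checked outside Roc. The snippet below is a
minimal sketch in Python, used only as a stand-in; `code_unit_count` is a
hypothetical helper, not part of the proposed `Str` API, and it counts encoded
code units rather than grapheme clusters.

```python
# Python stand-in (not Roc): count the code units one emoji occupies
# in each encoding mentioned by the doc comment.
def code_unit_count(text: str, encoding: str, unit_bytes: int) -> int:
    """Return how many code units `text` occupies in `encoding`."""
    return len(text.encode(encoding)) // unit_bytes


thumbs_up = "\U0001F44D"  # the thumbs-up emoji used in the docs

# The "-le" codec variants are used so that no byte order mark is counted.
utf8_units = code_unit_count(thumbs_up, "utf-8", 1)
utf16_units = code_unit_count(thumbs_up, "utf-16-le", 2)
utf32_units = code_unit_count(thumbs_up, "utf-32-le", 4)

print(utf8_units, utf16_units, utf32_units)  # prints: 4 2 1
```

The three counts differ for what a reader sees as one symbol, which is the
point the doc comment makes about code units versus grapheme clusters.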

diff --git a/compiler/builtins/docs/Str.roc b/compiler/builtins/docs/Str.roc
index db8ff78c1e..092cdc053a 100644
--- a/compiler/builtins/docs/Str.roc
+++ b/compiler/builtins/docs/Str.roc
@@ -5,7 +5,7 @@ api Str provides Str, isEmpty, join
 ## A [Unicode](https://unicode.org) text value.
 ##
 ## Dealing with text is deep topic, so by design, Roc's `Str` module sticks
-## to the basics. For more advanced uses such as working with raw [code points](https://en.wikipedia.org/wiki/Code_point),
+## to the basics. For more advanced use cases like working with raw [code points](https://en.wikipedia.org/wiki/Code_point),
 ## see the [roc/unicode](roc/unicode) package, and for locale-specific text
 ## functions (including capitalization, as capitalization rules vary by locale)
 ## see the [roc/locale](roc/locale) package.
@@ -133,24 +133,18 @@ isCapitalized : Str -> Bool
 ## Besides grapheme clusters, another way to break down strings is into
 ## raw code unit integers.
 ##
-## The size of a code unit depends on the string's encoding. For example, in a
-## string encoded in UTF-8, a code unit is 8 bits. This is why #Str.toUtf8
-## returns a `List U8`. In contrast, UTF-16 encoding uses 16-bit code units,
-## so #Str.toUtf16 returns a `List U16` instead.
+## Code units are no substitute for grapheme clusters!
+## These functions exist to support advanced use cases like those found in
+## [roc/unicode](roc/unicode), and using code units when grapheme clusters would
+## be more appropriate can very easily lead to bugs.
 ##
-## > Code units are no substitute for grapheme clusters!
-## >
-## > For example, `Str.countGraphemes "👍"` always returns `1` no matter what,
-## > whereas `Str.toUtf8 "👍"` returns a list with a length of 4,
-## > and `Str.toUtf16 "👍"` returns a list with a length of 2.
-## >
-## > These functions exists for more advanced use cases like those found in
-## > [roc/unicode](roc/unicode), and using code units when grapheme clusters would
-## > be more appropriate can very easily lead to bugs.
+## For example, `Str.countClusters "👍"` returns `1`,
+## whereas `Str.toUtf8 "👍"` returns a list with a length of 4,
+## and `Str.toUtf16 "👍"` returns a list with a length of 2.
 
 toUtf8 : Str -> List U8
 
 toUtf16 : Str -> List U16
 
-toUtf32 : Str -> List U16
+toUtf32 : Str -> List U32
 

From 0ed8f90f110e200dda21068ff48b9a9a44896b8c Mon Sep 17 00:00:00 2001
From: Richard Feldman <oss@rtfeldman.com>
Date: Mon, 16 Mar 2020 02:25:31 -0400
Subject: [PATCH 4/6] Fix some type signatures in Str docs

---
 compiler/builtins/docs/Str.roc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/compiler/builtins/docs/Str.roc b/compiler/builtins/docs/Str.roc
index 092cdc053a..452f828991 100644
--- a/compiler/builtins/docs/Str.roc
+++ b/compiler/builtins/docs/Str.roc
@@ -66,10 +66,10 @@ Str : [ @Str ]
 ## but it's recommended to pass much smaller numbers instead.
 ##
 ## Passing a negative number for decimal places is equivalent to passing 0.
-decimal : Int, Float -> Str
+decimal : Float *, Int * -> Str
 
 ## Convert an #Int to a string.
-int : Float -> Str
+int : Int * -> Str
 
 ## Check
 

From 3fa75dc2f7b073aaa64fc36ea3c1b51cf7f1be58 Mon Sep 17 00:00:00 2001
From: Richard Feldman <oss@rtfeldman.com>
Date: Mon, 16 Mar 2020 02:26:03 -0400
Subject: [PATCH 5/6] Add Str.reverseClusters to docs

---
 compiler/builtins/docs/Str.roc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/compiler/builtins/docs/Str.roc b/compiler/builtins/docs/Str.roc
index 452f828991..5fab75eba7 100644
--- a/compiler/builtins/docs/Str.roc
+++ b/compiler/builtins/docs/Str.roc
@@ -103,6 +103,7 @@ padStart : Str, Int, Str -> Str
 
 padEnd : Str, Int, Str -> Str
 
+reverseClusters : Str -> Str
 
 foldClusters : Str, { start: state, step: (state, Str -> state) } -> state
 

From 6637bfb226a05b439ec204dcb591515858178e09 Mon Sep 17 00:00:00 2001
From: Richard Feldman <oss@rtfeldman.com>
Date: Mon, 16 Mar 2020 02:39:49 -0400
Subject: [PATCH 6/6] Add some more Str docs

---
 compiler/builtins/docs/Str.roc | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
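
Reviewer note: the empty-separator behavior documented for
`Str.splitClusters` can be modeled as follows. This is a minimal sketch in
Python, used only as a stand-in; `split_clusters` is a hypothetical helper
that mirrors the behavior described in the doc comment, not the Roc
implementation.

```python
from typing import List


def split_clusters(text: str, separator: str) -> List[str]:
    """Model of the documented behavior of Str.splitClusters."""
    # Documented edge case: an empty separator returns the original
    # string wrapped in a list, instead of splitting anything.
    if separator == "":
        return [text]
    return text.split(separator)


parts = split_clusters("1,2,3", ",")
unsplit = split_clusters("1,2,3", "")
print(parts)    # prints: ['1', '2', '3']
print(unsplit)  # prints: ['1,2,3']
```

Python's own `str.split` raises `ValueError` on an empty separator, so the
explicit check is what makes the sketch follow the documented Roc behavior.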

diff --git a/compiler/builtins/docs/Str.roc b/compiler/builtins/docs/Str.roc
index 5fab75eba7..bfb757b542 100644
--- a/compiler/builtins/docs/Str.roc
+++ b/compiler/builtins/docs/Str.roc
@@ -71,6 +71,18 @@ decimal : Float *, Int * -> Str
 ## Convert an #Int to a string.
 int : Int * -> Str
 
+## Split a string around a separator.
+##
+## >>> Str.splitClusters "1,2,3" ","
+##
+## Passing `""` for the separator is not useful; it returns the original string
+## wrapped in a list.
+##
+## >>> Str.splitClusters "1,2,3" ""
+##
+## To split a string into its grapheme clusters, use #Str.clusters.
+splitClusters : Str, Str -> List Str
+
 ## Check
 
 isEmpty : Str -> Bool
@@ -103,6 +115,16 @@ padStart : Str, Int, Str -> Str
 
 padEnd : Str, Int, Str -> Str
 
+## Grapheme Clusters
+
+## Split a string into its grapheme clusters.
+##
+## >>> Str.clusters "1,2,3"
+##
+## >>> Str.clusters "👍👍👍"
+##
+clusters : Str -> List Str
+
 reverseClusters : Str -> Str
 
 foldClusters : Str, { start: state, step: (state, Str -> state) } -> state