Regular expressions in Go
Non-greedy regular expression matching
The example below shows non-greedy regular expression matching in Go: searching for any number of any characters followed by a certain string.
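Here "any character" means whatever `.` matches. In Go's regular expression syntax, `.` matches any character except a newline unless the `s` flag is set, so a pattern like `^.*?WWW` will not reach past a line break. The minimal sketch below uses a hypothetical helper, `firstMatch`, and a made-up two-line input to show the difference:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it returns the leftmost match of
// pattern in s, or "" if there is no match.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "first line\nsecond line WWW" // made-up input containing a newline

	// Without the s flag, . does not match '\n', so ^.*?WWW cannot
	// cross the line break and nothing matches.
	fmt.Printf("%q\n", firstMatch(`^.*?WWW`, s)) // ""

	// With (?s), . also matches '\n' and the match spans both lines.
	fmt.Printf("%q\n", firstMatch(`(?s)^.*?WWW`, s)) // "first line\nsecond line WWW"
}
```

The inputs in the rest of this article are single lines, so the flag makes no difference there.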
We will start from the line `someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords.` and search for the shortest match of any number of any characters ending with `WWW`.
```go
line1 := "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. "
// Greedy: .* matches as many characters as possible before WWW.
re1 := regexp.MustCompile("^.*WWW")
fmt.Printf("input: %q, regexp: %q, match: %q\n", line1, re1, re1.FindStringSubmatch(line1))
```
The code above outputs:

```
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. ", regexp: "^.*WWW", match: ["someWord 9898 another word WWW"]
```
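The examples call `FindStringSubmatch`, but none of the patterns contain a capture group, so the returned slice only ever holds the whole match. The short sketch below, reusing `line1` and `^.*WWW` from above, shows that `FindString` returns the same text as a plain string, and that `NumSubexp` reports zero capture groups:

```go
package main

import (
	"fmt"
	"regexp"
)

const line1 = "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. "

var re1 = regexp.MustCompile("^.*WWW")

func main() {
	// The pattern has no parenthesized subexpressions.
	fmt.Println(re1.NumSubexp()) // 0

	// FindStringSubmatch therefore returns a one-element slice
	// containing only the whole match...
	sub := re1.FindStringSubmatch(line1)
	fmt.Printf("%q\n", sub) // ["someWord 9898 another word WWW"]

	// ...which is the same text FindString returns directly.
	fmt.Printf("%q\n", re1.FindString(line1)) // "someWord 9898 another word WWW"
}
```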
Now we will add another `WWW` at the end of the line, so it becomes `someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW`. Applying the same regular expression produces a greedy match up to the second (last) `WWW`, because `.*` matches as many characters as possible while still allowing `WWW` to match.
```go
line2 := "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"
// Reuse re1 (^.*WWW) from the previous snippet.
fmt.Printf("input: %q, regexp: %q, match: %q\n", line2, re1, re1.FindStringSubmatch(line2))
```
The code above outputs:

```
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW", regexp: "^.*WWW", match: ["someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"]
```
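One way to see what the greedy pattern does is to compare it with a plain string search. The sketch below assumes single-line input (since `.` does not match `\n`) and uses a hypothetical helper, `lastWWWPrefix`, built on `strings.LastIndex`. On `line2` both approaches are expected to return the whole line, because it ends with `WWW`:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// greedyRe is the greedy pattern from the example above.
var greedyRe = regexp.MustCompile("^.*WWW")

// greedyPrefix returns the match of ^.*WWW in s, or "" if there is none.
func greedyPrefix(s string) string {
	return greedyRe.FindString(s)
}

// lastWWWPrefix (hypothetical helper) returns everything up to and
// including the last "WWW" in s, or "" if s contains no "WWW".
func lastWWWPrefix(s string) string {
	i := strings.LastIndex(s, "WWW")
	if i < 0 {
		return ""
	}
	return s[:i+len("WWW")]
}

func main() {
	line2 := "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"
	fmt.Println(greedyPrefix(line2) == lastWWWPrefix(line2)) // true
	fmt.Println(greedyPrefix(line2) == line2)                // true: the match is the whole line
}
```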
To get the shortest match, we make the quantifier non-greedy by adding `?` after it: `.*?` matches as few characters as possible, so the match stops at the first `WWW`.
```go
// line1 and line2 are the strings declared in the previous snippets.
// Non-greedy: .*? matches as few characters as possible before WWW.
re2 := regexp.MustCompile("^.*?WWW")
fmt.Printf("input: %q, regexp: %q, match: %q\n", line1, re2, re2.FindStringSubmatch(line1))
fmt.Printf("input: %q, regexp: %q, match: %q\n", line2, re2, re2.FindStringSubmatch(line2))
```
The code above outputs:

```
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. ", regexp: "^.*?WWW", match: ["someWord 9898 another word WWW"]
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW", regexp: "^.*?WWW", match: ["someWord 9898 another word WWW"]
```
As we can see from the output, the non-greedy regular expression `^.*?WWW` finds the shortest match in both lines.
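Go offers more than one way to get this shortest-prefix behaviour. In the sketch below, the `U` (ungreedy) flag swaps the meaning of `*` and `*?`, so `(?U)^.*WWW` behaves like `^.*?WWW`; a hypothetical helper, `firstWWWPrefix`, does the same with `strings.Index`, assuming single-line input. All three are expected to print the same match for `line2`:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	lazy      = regexp.MustCompile(`^.*?WWW`)    // non-greedy quantifier .*?
	ungreedyU = regexp.MustCompile(`(?U)^.*WWW`) // U flag: plain .* becomes non-greedy
)

// firstWWWPrefix (hypothetical helper) returns everything up to and
// including the first "WWW" in s, or "" if s contains no "WWW".
func firstWWWPrefix(s string) string {
	i := strings.Index(s, "WWW")
	if i < 0 {
		return ""
	}
	return s[:i+len("WWW")]
}

func main() {
	line2 := "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"
	fmt.Printf("%q\n", lazy.FindString(line2))      // "someWord 9898 another word WWW"
	fmt.Printf("%q\n", ungreedyU.FindString(line2)) // "someWord 9898 another word WWW"
	fmt.Printf("%q\n", firstWWWPrefix(line2))       // "someWord 9898 another word WWW"
}
```

The explicit `*?` is usually easier to read than a global flag, since the reader sees the non-greediness exactly where it applies.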
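In the final example, we want the shortest sequence of any number of any characters ending with either `WWW` or `AAA`, and we use the regular expression `^.*?WWW|AAA`. Keep in mind that `|` has the lowest precedence in Go regular expressions, so this pattern actually means "`^.*?WWW`, or `AAA` anywhere in the line". It returns the expected match here because each line contains a `WWW` before its first `AAA`; to apply `^.*?` to both alternatives, group them as `^.*?(?:WWW|AAA)`.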
```go
// | binds loosest, so this pattern is parsed as (^.*?WWW)|(AAA).
re3 := regexp.MustCompile("^.*?WWW|AAA")
fmt.Printf("input: %q, regexp: %q, match: %q\n", line1, re3, re3.FindStringSubmatch(line1))
fmt.Printf("input: %q, regexp: %q, match: %q\n", line2, re3, re3.FindStringSubmatch(line2))
```

The code above outputs:

```
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. ", regexp: "^.*?WWW|AAA", match: ["someWord 9898 another word WWW"]
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW", regexp: "^.*?WWW|AAA", match: ["someWord 9898 another word WWW"]
```
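The difference between `^.*?WWW|AAA` and the grouped form `^.*?(?:WWW|AAA)` only shows up when `AAA` comes first or `WWW` is missing. The sketch below uses two made-up inputs to illustrate this; on `line1` and `line2` both patterns return the same match.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// | has the lowest precedence: this is (^.*?WWW) OR (AAA anywhere).
	ungrouped = regexp.MustCompile(`^.*?WWW|AAA`)
	// The non-capturing group (?:...) applies ^.*? to both alternatives.
	grouped = regexp.MustCompile(`^.*?(?:WWW|AAA)`)
)

func main() {
	// Made-up input in which AAA comes before WWW.
	s := "xx AAA yy WWW"
	fmt.Printf("%q\n", ungrouped.FindString(s)) // "xx AAA yy WWW"
	fmt.Printf("%q\n", grouped.FindString(s))   // "xx AAA"

	// Made-up input with no WWW at all.
	t := "xx AAA yy"
	fmt.Printf("%q\n", ungrouped.FindString(t)) // "AAA"
	fmt.Printf("%q\n", grouped.FindString(t))   // "xx AAA"
}
```

Using `(?:...)` rather than `(...)` keeps the alternation out of the submatch results, so `FindStringSubmatch` still returns a single element.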
Complete listing of all the examples:
```go
package main

import (
	"fmt"
	"regexp"
)

const (
	line1 = "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. "
	line2 = "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"
)

// printMatch prints the leftmost match of re in line1 and in line2.
func printMatch(re *regexp.Regexp) {
	fmt.Printf("input: %q, regexp: %q, match: %q\n", line1, re, re.FindStringSubmatch(line1))
	fmt.Printf("input: %q, regexp: %q, match: %q\n\n", line2, re, re.FindStringSubmatch(line2))
}

func main() {
	printMatch(regexp.MustCompile("^.*WWW"))      // greedy
	printMatch(regexp.MustCompile("^.*?WWW"))     // non-greedy
	printMatch(regexp.MustCompile("^.*?WWW|AAA")) // parsed as (^.*?WWW)|(AAA)
}
```
The code above outputs:
```
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. ", regexp: "^.*WWW", match: ["someWord 9898 another word WWW"]
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW", regexp: "^.*WWW", match: ["someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"]

input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. ", regexp: "^.*?WWW", match: ["someWord 9898 another word WWW"]
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW", regexp: "^.*?WWW", match: ["someWord 9898 another word WWW"]

input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. ", regexp: "^.*?WWW|AAA", match: ["someWord 9898 another word WWW"]
input: "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW", regexp: "^.*?WWW|AAA", match: ["someWord 9898 another word WWW"]
```
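Since every example calls `FindStringSubmatch`, a natural extension is a pattern with a capture group, so the result also contains only the text before `WWW`. The sketch below uses a hypothetical pattern, `^(.*?)WWW`, which matches the same shortest prefix as `^.*?WWW` but wraps the leading part in group 1:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern: the non-greedy prefix from above, with the text
// before WWW wrapped in capture group 1.
var reBefore = regexp.MustCompile(`^(.*?)WWW`)

func main() {
	line2 := "someWord 9898 another word WWWand8!!)another word AAA and &&some -morewords. WWW"

	// m[0] is the whole match, m[1] is what group 1 captured.
	m := reBefore.FindStringSubmatch(line2)
	if m == nil {
		fmt.Println("no match")
		return
	}
	fmt.Printf("whole match: %q\n", m[0]) // "someWord 9898 another word WWW"
	fmt.Printf("group 1:     %q\n", m[1]) // "someWord 9898 another word "
}
```

`FindStringSubmatch` returns `nil` when there is no match, so checking for `nil` before indexing avoids a panic on lines without `WWW`.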