Quick! Write a regular expression that matches an unbroken sequence of
alphanumeric characters and underscores, but does not start with a digit.
Whoa, whoa, hold on. Wait a minute. If you are like me, you might be thinking
“what on earth is a ‘regular expression’?” I can think of a whole bunch of
irregular expressions. What makes an expression “regular”?
It turns out those computer scientists have invented a whole new language for
describing sequences of printable characters…and they call that language
“regular expressions”. For example, they have come up with the idea that
[0-9]+ describes any run of one or more digits, such as a positive integer. They’ve also decided that
cat|dog matches the word “cat” or the word “dog”, but not “catdog”, “dogcat”,
or “gadoct”. They have even come up with the bizarre idea that
[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,} describes almost anyone’s e-mail
address.
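If you would rather see the first two claims checked than take them on faith, here is a minimal sketch in Python. It assumes that "matches" means the whole string has to fit the pattern (hence `fullmatch`); the helper name `matches_whole` is made up for this illustration.

```python
import re

# [0-9]+ : one or more ASCII digits
digits = re.compile(r"[0-9]+")
# cat|dog : either the word "cat" or the word "dog"
pet = re.compile(r"cat|dog")


def matches_whole(pattern, text):
    """Return True only if the entire text fits the pattern (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


print(matches_whole(digits, "2024"))   # True
print(matches_whole(pet, "cat"))       # True
print(matches_whole(pet, "dog"))       # True
print(matches_whole(pet, "catdog"))    # False
print(matches_whole(pet, "gadoct"))    # False
```

Note that a substring search (`pet.search("catdog")`) would still find "cat" inside "catdog", so whether "catdog" counts as a match depends on whether you ask about the whole string or any part of it.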
I don’t know about you, but that is all Greek to me. Fortunately, Komodo IDE has
just the right tool that speaks my language: Rx Toolkit. Even if you speak
Greek, Rx Toolkit has exactly what you need to easily understand, write, and
work with regular expressions, colloquially known as “regex”.
Understanding Regexes
Let us take that last regex above, the one that supposedly matches e-mail
addresses.
Well what do you know, it actually does work… I admit I did have to toggle the
“case insensitive” flag over on the right of the window — you will see it
highlighted in blue. (Mousing over each option gives you the rundown on what
it actually does.)
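Why the flag matters can be sketched outside of Rx Toolkit as well. The snippet below uses Python's `re` module as a stand-in, and the address is a made-up example: the pattern only lists uppercase letters, so a lowercase address fails until case-insensitive matching is switched on.

```python
import re

EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"
address = "jane.doe@example.com"  # hypothetical address, for illustration only

# Without the flag, [A-Z] does not accept lowercase letters.
without_flag = re.fullmatch(EMAIL, address)

# re.IGNORECASE plays the role of the "case insensitive" option.
with_flag = re.fullmatch(EMAIL, address, re.IGNORECASE)

print(without_flag is None)     # True
print(with_flag is not None)    # True
```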
Whenever you have a regex that just does not make sense, throw it into Rx
Toolkit along with some input, and the tool will help you decipher that Greek,
like this doozy:
Writing Regexes
Okay, so Rx Toolkit can easily help verify that a regex works, but what about if
*gulp* you actually have to write one?
The easiest way to get started is to go ahead and put the text you want to match
against in the “Search Text” box. You can put multiple possibilities on multiple
lines, as Rx Toolkit will match each line individually if you keep the “m”
option turned off.
Let us consider the original question in this blog post, writing a regex that
matches an “unbroken sequence of alphanumeric characters and underscores, but
does not start with a digit”.
Well, I can think of a few character sequences that fit those criteria:
- activestate
- KomodoIDE_10_1
- X
Let us put those in the “Search Text” box.
Now, where to begin with the regex? Well since we have been told that `[0-9]+`
matches the digits of a positive integer, we will try `[A-Z]+` and see where
that gets us.
Progress! Rx Toolkit shows us we got it at least partially right by highlighting
each part of the matching input text and showing results in the bottom pane.
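What "partially right" means for `[A-Z]+` can be sketched in Python. Whether the case-insensitive option was still switched on at this point is not stated, so the sketch shows both possibilities; the function name `find_runs` is invented for this example.

```python
import re

samples = ["activestate", "KomodoIDE_10_1", "X"]


def find_runs(text, ignore_case=False):
    """List every stretch of text that [A-Z]+ matches (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(r"[A-Z]+", text, flags)


for text in samples:
    print(text, find_runs(text), find_runs(text, ignore_case=True))

# activestate     []           ['activestate']
# KomodoIDE_10_1  ['K', 'IDE'] ['KomodoIDE']
# X               ['X']        ['X']
```

Either way, the underscores and digits in `KomodoIDE_10_1` are left out, which is why the pattern still needs work.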
After a bit more tinkering, I think I managed to get something:
Awesome! All of the text highlights and the bottom pane shows success. Now we
need to make sure we do not accidentally match character sequences that are not
valid, like the number `11`.
As you probably guessed, I got it wrong. After some more fiddling around I came
up with this possible solution:
Thank you Rx Toolkit!
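For reference, one pattern that meets the requirement as stated at the top of this post can be sketched in Python. Treat it as an assumption about a reasonable answer, not necessarily the regex arrived at above: the first character may be a letter or underscore, and everything after it may also be a digit. The name `is_identifier` is made up for the example.

```python
import re

# First character: a letter or underscore (never a digit).
# Remaining characters: letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if the whole text is an unbroken, non-digit-led sequence."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["activestate", "KomodoIDE_10_1", "X", "11", "1abc"]:
    print(candidate, is_identifier(candidate))

# activestate True
# KomodoIDE_10_1 True
# X True
# 11 False
# 1abc False
```

With a case-insensitive option turned on, the shorter `[A-Z_][A-Z0-9_]*` would behave the same way.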
Working With Regexes
Even if you are already a regex wizard, juggling the different regex
implementations across programming languages can be tricky. Fortunately, Rx
Toolkit supports Perl, Python, PHP, Ruby, JavaScript, and Tcl regex syntaxes.
Write one regex, test for all languages right from Komodo! Just click on the
little icon to the right of the “Regular Expression” input box to switch between
implementations.
Conclusion
Komodo IDE’s Rx Toolkit is your one-stop shop for understanding, writing, and
working with regular expressions in real time. The ability to toggle between six
major programming languages’ regex implementations on the fly is incredibly
powerful and useful. Whether you are a regex newbie or a seasoned veteran, this
versatile tool is a must-have in your developer arsenal.
Title photo courtesy of Jared Tarbell on Flickr under Creative Commons License.