Java Regex Search and Replace

For many years, I felt that there was nothing “regular” about Regular Expressions, but lately I have been warming up to them a bit. The QuickRex Eclipse plug-in has really helped make them easier to manage, but that’s not what this post is about.

I recently needed to do a regex-based search and replace to convert all the HTML entities in a string to their actual character equivalents, basically unescaping every entity in an HTML string (don’t ask why). With a little regex and a little documentation browsing, I found that it is very easy to do.

Start out with the pattern, which should be a static class member (a compiled Pattern is immutable, so it is safe to share between threads; the Matcher objects it creates are not):

private static final Pattern entityPattern = Pattern.compile("(&[a-z]*;)");
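A quick way to check what a pattern like this picks up is to run it over a sample string and list the matches. The sketch below does just that; the class and method names (EntityPatternDemo, findEntities) are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityPatternDemo {
    // Same regex as the pattern shown above.
    private static final Pattern entityPattern = Pattern.compile("(&[a-z]*;)");

    // Hypothetical helper: collects every substring the pattern matches, in order.
    static List<String> findEntities(final String html) {
        final List<String> found = new ArrayList<>();
        final Matcher matcher = entityPattern.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(final String[] args) {
        // prints [&amp;, &lt;, &gt;]
        System.out.println(findEntities("Fish &amp; chips &lt;b&gt; &#39; &Eacute;"));
    }
}
```

Only the lowercase named entities show up in the result; &#39; and &Eacute; are skipped because the character class is limited to a-z.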

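The pattern matches named HTML entities written in lowercase letters, which have the form &name; (for example &amp; or &lt;). It does not match numeric entities such as &#39;, or names containing capitals or digits such as &Eacute; or &frac12;. Next we need the search and replace code: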

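// Needs java.util.regex.Matcher, java.util.regex.Pattern and org.apache.commons.lang.StringEscapeUtils
private String unescapeEntities(final String html) {
    // appendReplacement takes a StringBuffer (a StringBuilder overload only arrived in Java 9)
    final StringBuffer buffer = new StringBuffer();
    final Matcher matcher = entityPattern.matcher(html);
    while (matcher.find()) {
        final String replacement = StringEscapeUtils.unescapeHtml(matcher.group());
        // Copies the text before the match, then the replacement.
        // quoteReplacement keeps any '$' or '\' in the replacement from being read as a group reference.
        matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
    }
    // Copies whatever follows the last match.
    matcher.appendTail(buffer);

    return buffer.toString();
}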

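The StringBuffer ends up holding the input string with every matched entity replaced. The StringEscapeUtils class is from the Jakarta Commons Lang API (now Apache Commons Lang); in Commons Lang 3 the equivalent method is called unescapeHtml4.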
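If you want to try the find / appendReplacement / appendTail loop without pulling in Commons Lang, here is a minimal self-contained sketch. The small ENTITIES lookup table is a hypothetical stand-in for StringEscapeUtils and only knows a handful of entities; the class name is made up as well. It includes &dollar; to show why the replacement goes through Matcher.quoteReplacement: a bare $ would otherwise be parsed as the start of a group reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityReplaceSketch {
    private static final Pattern ENTITY = Pattern.compile("&[a-z]*;");

    // Hypothetical stand-in for StringEscapeUtils: a few entities only.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("&amp;", "&");
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&dollar;", "$");
    }

    static String unescapeEntities(final String html) {
        final StringBuffer buffer = new StringBuffer();
        final Matcher matcher = ENTITY.matcher(html);
        while (matcher.find()) {
            final String entity = matcher.group();
            // Entities missing from the table are put back unchanged.
            final String replacement = ENTITIES.getOrDefault(entity, entity);
            // Quote the replacement so '$' and '\' are treated literally.
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        // Copy the text after the last match.
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(final String[] args) {
        // prints <b>Fish & chips</b> &foo;
        System.out.println(unescapeEntities("&lt;b&gt;Fish &amp; chips&lt;/b&gt; &foo;"));
        // prints cost: 5$
        System.out.println(unescapeEntities("cost: 5&dollar;"));
    }
}
```

On Java 9 or later, Matcher.replaceAll also accepts a function that computes each replacement, which does the same loop in one call; the function's result is still treated as a replacement string, so it needs the same quoting.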

Sorry, this isn’t much of a tutorial… it’s more of a code snippet for future use.
