Approximate string matching methods for duplicate detection and clustering tasks