UTL_MATCH package was introduced in Oracle 10g
Release 2, but first documented (and therefore supported) in Oracle 11g
Release 2. It contains a variety of functions that are helpful for
testing the level of similarity/difference between strings.Related articles.
Setup
The examples in this article require the following table definition.DROP TABLE match_tab; CREATE TABLE match_tab ( id NUMBER, col1 VARCHAR2(15), col2 VARCHAR2(15), CONSTRAINT match_tab_pk PRIMARY KEY (id) ); INSERT INTO match_tab VALUES (1, 'Peter Parker', 'Pete Parker'); INSERT INTO match_tab VALUES (2, 'Peter Parker', 'peter parker'); INSERT INTO match_tab VALUES (3, 'Clark Kent', 'Claire Kent'); INSERT INTO match_tab VALUES (4, 'Wonder Woman', 'Ponder Woman'); INSERT INTO match_tab VALUES (5, 'Superman', 'Superman'); INSERT INTO match_tab VALUES (6, 'The Hulk', 'Iron Man'); COMMIT;
EDIT_DISTANCE
The "Edit Distance", or "Levenshtein Distance", test measures the similarity between two strings by counting the number of character changes (inserts, updates, deletes) required to transform the first string into the second. The number of changes required is know as the distance.SELECT col1,
col2,
UTL_MATCH.edit_distance(col1, col2) AS ed
FROM match_tab
ORDER BY id;
COL1 COL2 ED
--------------- --------------- ----------
Peter Parker Pete Parker 1
Peter Parker peter parker 2
Clark Kent Claire Kent 2
Wonder Woman Ponder Woman 1
Superman Superman 0
The Hulk Iron Man 8
6 rows selected.
SQL>
EDIT_DISTANCE_SIMILARITY
TheEDIT_DISTANCE_SIMILARITY function uses the same method as the EDIT_DISTANCE function
to determine the similarity of the strings, but it returns a normalized
result ranging from 0 (no match) to 100 (complete match).SELECT col1,
col2,
UTL_MATCH.edit_distance_similarity(col1, col2) AS eds
FROM match_tab
ORDER BY id;
COL1 COL2 EDS
--------------- --------------- ----------
Peter Parker Pete Parker 92
Peter Parker peter parker 84
Clark Kent Claire Kent 82
Wonder Woman Ponder Woman 92
Superman Superman 100
The Hulk Iron Man 0
6 rows selected.
SQL>
SELECT id,
col1,
col2,
UTL_MATCH.edit_distance_similarity(col1, col2) AS eds
FROM match_tab
WHERE UTL_MATCH.edit_distance_similarity(col1, col2) > 90
ORDER BY id;
ID COL1 COL2 EDS
---------- --------------- --------------- ----------
1 Peter Parker Pete Parker 92
4 Wonder Woman Ponder Woman 92
5 Superman Superman 100
SQL>
JARO_WINKLER
The "Jaro-Winkler Algorithm" provides a different method for finding the distance between two strings.SELECT col1,
col2,
UTL_MATCH.jaro_winkler(col1, col2) AS jw
FROM match_tab
ORDER BY id;
COL1 COL2 JW
--------------- --------------- ----------
Peter Parker Pete Parker 9.288E-001
Peter Parker peter parker 8.889E-001
Clark Kent Claire Kent 9.083E-001
Wonder Woman Ponder Woman 9.444E-001
Superman Superman 1.0E+000
The Hulk Iron Man 4.167E-001
6 rows selected.
SQL>
JARO_WINKLER_SIMILARITY
TheJARO_WINKLER_SIMILARITY function uses the same method as the JARO_WINKLER function
to determine the similarity of the strings, but it returns a normalized
result ranging from 0 (no match) to 100 (complete match).SELECT col1,
col2,
UTL_MATCH.jaro_winkler_similarity(col1, col2) AS jws
FROM match_tab
ORDER BY id;
COL1 COL2 JWS
--------------- --------------- ----------
Peter Parker Pete Parker 92
Peter Parker peter parker 88
Clark Kent Claire Kent 90
Wonder Woman Ponder Woman 94
Superman Superman 100
The Hulk Iron Man 41
6 rows selected.
SQL>
SELECT col1,
col2,
UTL_MATCH.jaro_winkler_similarity(col1, col2) AS jws
FROM match_tab
WHERE UTL_MATCH.jaro_winkler_similarity(col1, col2) > 90
ORDER BY id;
COL1 COL2 JWS
--------------- --------------- ----------
Peter Parker Pete Parker 92
Wonder Woman Ponder Woman 94
Superman Superman 100
SQL>






0 comentarios:
Publicar un comentario