UTL_MATCH
package was introduced in Oracle 10g
Release 2, but first documented (and therefore supported) in Oracle 11g
Release 2. It contains a variety of functions that are helpful for
testing the level of similarity/difference between strings.Related articles.
Setup
The examples in this article require the following table definition.DROP TABLE match_tab; CREATE TABLE match_tab ( id NUMBER, col1 VARCHAR2(15), col2 VARCHAR2(15), CONSTRAINT match_tab_pk PRIMARY KEY (id) ); INSERT INTO match_tab VALUES (1, 'Peter Parker', 'Pete Parker'); INSERT INTO match_tab VALUES (2, 'Peter Parker', 'peter parker'); INSERT INTO match_tab VALUES (3, 'Clark Kent', 'Claire Kent'); INSERT INTO match_tab VALUES (4, 'Wonder Woman', 'Ponder Woman'); INSERT INTO match_tab VALUES (5, 'Superman', 'Superman'); INSERT INTO match_tab VALUES (6, 'The Hulk', 'Iron Man'); COMMIT;
EDIT_DISTANCE
The "Edit Distance", or "Levenshtein Distance", test measures the similarity between two strings by counting the number of character changes (inserts, updates, deletes) required to transform the first string into the second. The number of changes required is know as the distance.SELECT col1, col2, UTL_MATCH.edit_distance(col1, col2) AS ed FROM match_tab ORDER BY id; COL1 COL2 ED --------------- --------------- ---------- Peter Parker Pete Parker 1 Peter Parker peter parker 2 Clark Kent Claire Kent 2 Wonder Woman Ponder Woman 1 Superman Superman 0 The Hulk Iron Man 8 6 rows selected. SQL>
EDIT_DISTANCE_SIMILARITY
TheEDIT_DISTANCE_SIMILARITY
function uses the same method as the EDIT_DISTANCE
function
to determine the similarity of the strings, but it returns a normalized
result ranging from 0 (no match) to 100 (complete match).SELECT col1, col2, UTL_MATCH.edit_distance_similarity(col1, col2) AS eds FROM match_tab ORDER BY id; COL1 COL2 EDS --------------- --------------- ---------- Peter Parker Pete Parker 92 Peter Parker peter parker 84 Clark Kent Claire Kent 82 Wonder Woman Ponder Woman 92 Superman Superman 100 The Hulk Iron Man 0 6 rows selected. SQL> SELECT id, col1, col2, UTL_MATCH.edit_distance_similarity(col1, col2) AS eds FROM match_tab WHERE UTL_MATCH.edit_distance_similarity(col1, col2) > 90 ORDER BY id; ID COL1 COL2 EDS ---------- --------------- --------------- ---------- 1 Peter Parker Pete Parker 92 4 Wonder Woman Ponder Woman 92 5 Superman Superman 100 SQL>
JARO_WINKLER
The "Jaro-Winkler Algorithm" provides a different method for finding the distance between two strings.SELECT col1, col2, UTL_MATCH.jaro_winkler(col1, col2) AS jw FROM match_tab ORDER BY id; COL1 COL2 JW --------------- --------------- ---------- Peter Parker Pete Parker 9.288E-001 Peter Parker peter parker 8.889E-001 Clark Kent Claire Kent 9.083E-001 Wonder Woman Ponder Woman 9.444E-001 Superman Superman 1.0E+000 The Hulk Iron Man 4.167E-001 6 rows selected. SQL>
JARO_WINKLER_SIMILARITY
TheJARO_WINKLER_SIMILARITY
function uses the same method as the JARO_WINKLER
function
to determine the similarity of the strings, but it returns a normalized
result ranging from 0 (no match) to 100 (complete match).SELECT col1, col2, UTL_MATCH.jaro_winkler_similarity(col1, col2) AS jws FROM match_tab ORDER BY id; COL1 COL2 JWS --------------- --------------- ---------- Peter Parker Pete Parker 92 Peter Parker peter parker 88 Clark Kent Claire Kent 90 Wonder Woman Ponder Woman 94 Superman Superman 100 The Hulk Iron Man 41 6 rows selected. SQL> SELECT col1, col2, UTL_MATCH.jaro_winkler_similarity(col1, col2) AS jws FROM match_tab WHERE UTL_MATCH.jaro_winkler_similarity(col1, col2) > 90 ORDER BY id; COL1 COL2 JWS --------------- --------------- ---------- Peter Parker Pete Parker 92 Wonder Woman Ponder Woman 94 Superman Superman 100 SQL>
0 comentarios:
Publicar un comentario