Guide to OpenRefine

Welcome to Dupre Library's OpenRefine Guide!

Clustering

Clustering finds groups of different cell values in a column that are likely alternative spellings or formats of the same thing, and lets you merge each group into a single value. This cleans and standardizes the data, reducing errors and inconsistencies.

How to cluster:

1. Locate the column you want to transform and click on the arrow button on the column header.

2. Select the “Edit cells” option.

3. Click the “Cluster and edit” option.

4. At the top of the window, choose the “Method” and “Keying Function” you want to use.

5. In the middle of the window, enter in the “New Cell Value” field the value you want the merged rows to share. (You can also use one of the existing values.)

6. Click “Merge Selected & Re-Cluster” or “Merge Selected & Close”.

“Merge Selected & Re-Cluster” applies your edits to the selected values and then automatically runs the clustering method again on the same column. “Merge Selected & Close” applies your edits and closes the Clustering window.
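
To give a feel for what happens behind the steps above, here is a minimal Python sketch of a key-collision style clustering pass followed by a merge. It is only an illustration: the `fingerprint`, `cluster`, and `merge` functions and the sample column are hypothetical names and data made up for this guide. The keying logic is loosely modeled on a fingerprint-style keying function and is not OpenRefine's actual implementation.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified fingerprint-style key (an approximation for illustration)."""
    # Trim surrounding whitespace and lowercase the text.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, sort them, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a key; keep only groups with more than one distinct value."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def merge(values, cluster_values, new_cell_value):
    """Replace every value in the chosen cluster with the new cell value."""
    members = set(cluster_values)
    return [new_cell_value if value in members else value for value in values]


# Made-up sample column with several variants of the same name.
column = [
    "Dupre Library",
    "dupre library ",
    "Library, Dupre",
    "Edith Garland",
    "Dupré Library",
]

clusters = cluster(column)                             # like opening "Cluster and edit"
merged = merge(column, clusters[0], "Dupre Library")   # like "Merge Selected"

print(clusters)
print(merged)
```

In this sketch, the four variants of the library name share one key and form a single cluster, while "Edith Garland" is left alone because no other value shares its key. Running `cluster(merged)` afterwards plays the role of re-clustering: it returns an empty list once no variants remain.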