Removing Duplicate Domain URLs From the Text File Using Bash

2024/7/8 8:36:10

Text file

Expected Output:

What I Tried

awk -F'/' '!a[$3]++' $file;


I have already tried various commands, and none of them work as expected. I just want to keep only one URL per domain from the list.

How can I do this with a Bash script or Python?

PS: I want to filter the list and save the full URLs, not only the root domains.


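With awk and / as the field separator, $3 is the host part of each URL (https://example.com/page splits into "https:", "", "example.com" and "page"). The expression !seen[$3]++ is true only the first time a host appears, so awk prints the first full URL for each domain and skips the rest: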

awk -F '/' '!seen[$3]++' file
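A minimal sketch that wraps the one-liner in a function and runs it on some made-up sample URLs (the question's real input file is not shown, so both the URLs and the function name dedupe_by_host are hypothetical):

```shell
#!/usr/bin/env bash
# Print only the first URL seen for each host.
# With -F '/', "https://example.com/a/b" splits into
#   $1 = "https:"  $2 = ""  $3 = "example.com"  $4 = "a"  $5 = "b"
# so $3 is the host part of the URL.
# dedupe_by_host is a hypothetical helper name used for illustration.
dedupe_by_host() {
  # seen[$3]++ evaluates to the old count (0 the first time), so
  # !seen[$3]++ is true only on the first line for each host, and
  # awk's default action prints that whole line unchanged.
  awk -F '/' '!seen[$3]++' "$@"
}

# Hypothetical sample input, piped on stdin.
printf '%s\n' \
  'https://example.com/page1' \
  'https://example.com/page2' \
  'https://example.org/a/b' \
  'http://example.org/c' \
  'https://example.net' |
  dedupe_by_host
```

This should print https://example.com/page1, https://example.org/a/b and https://example.net. Because only $3 is compared, http:// and https:// URLs for the same host count as duplicates, while www.example.com and example.com count as different hosts. To save the result, redirect it, e.g. dedupe_by_host urls.txt > unique.txt (file names assumed).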

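If your file contains Windows line breaks (carriage returns), remove them first. Otherwise a URL without a path, such as https://example.com followed by a carriage return, gets the host "example.com\r", which does not match the "example.com" taken from longer URLs on other lines: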

dos2unix < file | awk -F '/' '!seen[$3]++'
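If dos2unix is not installed, a similar effect can be sketched inside awk itself by stripping a trailing carriage return before the host is compared. The function name and the sample input below are hypothetical:

```shell
#!/usr/bin/env bash
# Dedupe by host while tolerating Windows (CRLF) line endings,
# without relying on dos2unix being installed.
# dedupe_by_host_crlf is a hypothetical helper name.
dedupe_by_host_crlf() {
  # sub() with no target edits $0; changing $0 makes awk re-split the
  # fields, so $3 no longer carries a stray "\r" on path-less URLs.
  awk -F '/' '{ sub(/\r$/, "") } !seen[$3]++' "$@"
}

# Hypothetical CRLF input: without the sub(), the first line's host
# would be "example.com\r" and would not match the second line's host.
printf 'https://example.com\r\nhttps://example.com/about\r\nhttps://example.org/x\r\n' |
  dedupe_by_host_crlf
```

This should print https://example.com and https://example.org/x. Note that the printed lines end with plain LF, since the carriage returns are removed from every line, not only from duplicates.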


