Find duplicate records in text file

zjun

Example (sample contents of f1.txt)

abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452  
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452

UNIX

Display the number of occurrences alongside each record

sort f1.txt|uniq -c

2 abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
1 aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
2 tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452

Display only the duplicate records (one copy of each)

sort f1.txt|uniq -d

abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452  
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452  
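Sorting changes the order of the records. If the original order should be kept, a single awk pass may be an alternative to the pipeline above. This is a sketch rather than the method the post describes: it assumes the same sample data, and the names seen and dups_in_order are illustrative.

```shell
# Recreate the sample file in a scratch directory (the f1.txt name follows the post).
tmpdir=$(mktemp -d)
cat > "$tmpdir/f1.txt" <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# seen[$0] counts how often each line has appeared so far.
# The test is true exactly when a line shows up for the second time,
# so every duplicated record is printed once, in file order, with no sort.
dups_in_order=$(awk 'seen[$0]++ == 1' "$tmpdir/f1.txt")
printf '%s\n' "$dups_in_order"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

The trade-off is memory: awk keeps every distinct line in the seen array, whereas sort can spill to disk for very large files.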

Display distinct records

sort f1.txt|uniq

abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452  
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452  
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452  
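The same distinct list can usually be produced with the -u option of sort, which saves the extra uniq process. The sketch below assumes the sample data from the example and checks that both forms agree; the variable names are illustrative.

```shell
# Recreate the sample file in a scratch directory (the f1.txt name follows the post).
tmpdir=$(mktemp -d)
cat > "$tmpdir/f1.txt" <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# Two-step form used in the post.
distinct_pipe=$(sort "$tmpdir/f1.txt" | uniq)

# One-step form: sort -u drops repeated lines while sorting.
distinct_sort_u=$(sort -u "$tmpdir/f1.txt")
printf '%s\n' "$distinct_sort_u"

# Report whether the two forms produced the same list.
if [ "$distinct_pipe" = "$distinct_sort_u" ]; then
    echo "both forms agree"
fi

# Clean up the scratch directory.
rm -rf "$tmpdir"
```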

Windows

With the TextFX plugin, Notepad++ can sort lines and remove duplicate lines in a single step.

  1. Open the menu: TextFX -> TextFX Tools
  2. Make sure “sort outputs only unique…” is checked.
  3. Select a block of text (Ctrl+A selects the entire document).
  4. Click “sort lines case sensitive” or “sort lines case insensitive”.

  • Title: Find duplicate records in text file
  • Author: zjun
  • Created: 2012-04-11 16:52:00
  • Updated: 2023-12-04 23:48:57
  • Link: https://zjun.site/2012/04/d175cbab2927.html
  • License: This article is licensed under CC BY-NC-SA 4.0.