I want to perform a select distinct on a DataTable using columns that are stored in a string array: string[] columnsToBeUnique.
This is what I have at the moment but it doesn't return any values...
var result = dataTable1
    .AsEnumerable()
    .DistinctBy(x => x.Table.Columns.Contains(string.Join(",", columnsToBeUnique)))
    .ToArray();
Could someone assist me?
asked yesterday by Rico Strydom, edited yesterday by Panagiotis Kanavos
2 Answers
To count duplicates in SQL you'd use GROUP BY ... HAVING COUNT(*)>1, not DISTINCT. DISTINCT returns single rows, not just duplicates.
In .NET 9, CountBy can be used as a shortcut:
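// keyComparer: an element-wise IEqualityComparer<object[]> (assumed to be defined elsewhere);
// without it, array keys are compared by reference and no two rows ever match.
var dups=dataTable1.AsEnumerable()
    .CountBy(row => keyCols.Select(name=>row[name]).ToArray(), keyComparer)
    .Where(pair=>pair.Value>1);
var dup_count=dups.Count();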
The code could be cleaned up a bit by creating an extension method to return the values of selected columns in a row:
public static object[] GetColumnValues(this DataRow row, string[] columns)
{
    return columns.Select(name=>row[name]).ToArray();
}
...
// keyComparer: an element-wise IEqualityComparer<object[]> (assumed to be defined elsewhere)
var dups=dataTable1.AsEnumerable()
    .CountBy(row => row.GetColumnValues(keyCols), keyComparer)
    .Where(pair=>pair.Value>1);
var dup_count=dups.Count();
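keyCols.Select(name=>row[name]).ToArray() collects the values of all the key columns in a row. It works because AsEnumerable() returns an IEnumerable<DataRow>. In turn, DataRow has an Item[] indexer that allows accessing values by column name or index. Arrays are compared by reference by default, so the key needs an element-wise comparer for rows with equal values to end up in the same group.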
In previous .NET versions we'd need GroupBy to group the rows, then a Select to return each group's row count:
// keyComparer: an element-wise IEqualityComparer<object[]> (assumed to be defined elsewhere)
var dups=dataTable1.AsEnumerable()
    .GroupBy(row=>row.GetColumnValues(keyCols), keyComparer)
    .Select(g=>new {Key=g.Key,Count=g.Count()})
    .Where(pair=>pair.Count>1);
var dup_count=dups.Count();
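One thing to watch with array keys such as the object[] returned by GetColumnValues: arrays compare by reference by default, so GroupBy and CountBy only put equal rows in the same group when an element-wise comparer is supplied. Below is a minimal sketch of such a comparer. ObjectArrayComparer and DuplicateCounter are hypothetical names introduced for illustration, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: compares object[] keys element by element.
public sealed class ObjectArrayComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[]? x, object[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // SequenceEqual uses object.Equals for each pair of elements.
        return x.SequenceEqual(y);
    }

    public int GetHashCode(object[] key)
    {
        // Combine the element hashes so equal arrays get equal hash codes.
        var hash = new HashCode();
        foreach (var value in key) hash.Add(value);
        return hash.ToHashCode();
    }
}

public static class DuplicateCounter
{
    // Returns how many key combinations occur in more than one row.
    public static int CountDuplicateKeys(DataTable table, string[] keyCols)
    {
        return table.AsEnumerable()
            .GroupBy(row => keyCols.Select(name => row[name]).ToArray(),
                     new ObjectArrayComparer())
            .Count(g => g.Count() > 1);
    }
}
```

The same comparer instance can be passed as the second argument to CountBy in .NET 9.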
If the question was how to get distinct rows from the DataTable, there would be no need for LINQ:
var uniques=dataTable1.DefaultView.ToTable(true,columnsToBeUnique);
A DataTable already allows filtering and sorting. It's also possible to create DataView objects that show a filtered and sorted subset of the data. The view contents can be copied into a new DataTable with DataView.ToTable(bool distinct, params string[] columnNames), possibly discarding duplicates.
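The distinct copy can also give a rough duplicate count without LINQ: subtracting the number of distinct rows from the number of rows in the view yields how many surplus copies exist. This is a sketch under that interpretation of "duplicate count"; DistinctRowCounter and CountSurplusRows are hypothetical names. Note that it counts extra copies, not the number of duplicated key combinations.

```csharp
using System.Data;

public static class DistinctRowCounter
{
    // Number of rows that would be dropped when de-duplicating on the given columns.
    public static int CountSurplusRows(DataTable table, string[] columns)
    {
        DataView view = table.DefaultView;
        // ToTable(true, ...) copies only the named columns and discards duplicate rows.
        DataTable distinct = view.ToTable(true, columns);
        return view.Count - distinct.Rows.Count;
    }
}
```

For three identical rows plus one different row, this returns 2, whereas a GROUP BY ... HAVING COUNT(*)>1 style query would report one duplicated key.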
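Your approach doesn't work because you're just checking whether the table contains a column with the given name. Furthermore, you are concatenating your column names with commas, which yields a single string that is not a column name. Either way the key is the same bool for every row, so DistinctBy keeps only the first row: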
.DistinctBy(x => x.Table.Columns.Contains(string.Join(",", columnsToBeUnique)))
You want to remove all duplicate rows according to these columns. So this approach should work:
// separator: a string constant chosen by the caller
var result = dataTable1
    .AsEnumerable()
    .DistinctBy( row => string.Join( separator, columnsToBeUnique.Select( c => row[c]?.ToString() ?? string.Empty ) ) )
    .ToArray();
However, this is not fail-safe: if the separator is contained in any of the row's fields, you could get a wrong result. A more robust approach is to use a custom IEqualityComparer<DataRow>:
public class DataRowComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columnsToBeUnique;

    public DataRowComparer(string[] columnsToBeUnique)
    {
        _columnsToBeUnique = columnsToBeUnique;
    }

    public bool Equals(DataRow? x, DataRow? y)
    {
        if (x == null && y == null) return true;
        if (x == null || y == null) return false;
        foreach (string column in _columnsToBeUnique)
        {
            if (!Equals(x[column], y[column]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(DataRow obj)
    {
        int hash = 17;
        foreach (string column in _columnsToBeUnique)
        {
            object value = obj[column];
            hash = hash * 23 + (value?.GetHashCode() ?? 0);
        }
        return hash;
    }
}
Now you can use it with Distinct:
var result = dataTable1
    .AsEnumerable()
    .Distinct(new DataRowComparer(columnsToBeUnique))
    .ToArray();
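The separator pitfall of the joined-string key is easy to reproduce. The sketch below uses made-up values and a hypothetical helper, SeparatorCollisionDemo.JoinedKey, that builds the key the same way as the DistinctBy variant.

```csharp
using System;

public static class SeparatorCollisionDemo
{
    // Builds a joined-string key from the given column values,
    // mapping null to an empty string as in the DistinctBy variant.
    public static string JoinedKey(string separator, params object?[] values)
    {
        string[] parts = Array.ConvertAll(values, v => v?.ToString() ?? string.Empty);
        return string.Join(separator, parts);
    }
}
```

With "," as the separator, JoinedKey(",", "a,b", "c") and JoinedKey(",", "a", "b,c") both produce "a,b,c", so two different rows would be treated as duplicates. A column-by-column comparer does not have this problem.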
c# - Count duplicate rows in a DataTable - Stack Overflow
DistinctBy is very, very different from SQL's DISTINCT. While SQL's DISTINCT will return distinct rows, LINQ's DistinctBy will return the first row for each set of key values. In either case, you DON'T need to concatenate column names – Panagiotis Kanavos, commented yesterday
dataTable1.DefaultView.ToTable(true) will return distinct rows in the SQL sense. You can select and return only specific columns with dataTable1.DefaultView.ToTable(true,columnsToBeUnique) – Panagiotis Kanavos, commented yesterday
DistinctBy, remember you're working with DataRow objects. x is a DataRow. You get specific column values with x["someName"]. You can get the desired row values with columnsToBeUnique.Select(name=>x[name]), e.g. DistinctBy(x => columnsToBeUnique.Select(name=>x[name]).ToArray()) – Panagiotis Kanavos, commented yesterday
"I just want a record count of the duplicate record": that's a completely different question. In SQL you'd do a GROUP BY ... HAVING COUNT(*)>1. Same in ADO.NET and LINQ. DISTINCT and DISTINCT BY will return single rows as well, not just duplicates. – Panagiotis Kanavos, commented yesterday