I want to perform a select distinct on a DataTable using columns whose names are stored in a string array: string[] columnsToBeUnique.

This is what I have at the moment but it doesn't return any values...

var result = dataTable1
    .AsEnumerable()
    .DistinctBy(x => x.Table.Columns.Contains(string.Join(",", columnsToBeUnique)))
    .ToArray();

Could someone assist me?

asked yesterday by Rico Strydom, edited yesterday by Panagiotis Kanavos
  • 1 Please supply a minimal reproducible example. – Gert Arnold Commented yesterday
  • 2 What results do you expect to begin with? LINQ's DistinctBy is very, very different from SQL's DISTINCT. While SQL's DISTINCT will return distinct rows, LINQ's DistinctBy will return the first row for each set of key values. In either case, you DON'T need to concatenate column names – Panagiotis Kanavos Commented yesterday
  • 2 If you want distinct rows in the SQL sense, you don't need LINQ at all. An ADO.NET DataTable has methods for filtering, sorting and projecting columns, and ways to create a DataView. That view can in turn be converted to a table with ToTable, optionally with distinct values. So dataTable1.DefaultView.ToTable(true) will return distinct rows in the SQL sense. You can select and return only specific columns with dataTable1.DefaultView.ToTable(true,columnsToBeUnique) – Panagiotis Kanavos Commented yesterday
  • 1 If you want LINQ's DistinctBy, remember you're working with DataRow objects. x is a DataRow. You get specific column values with x["someName"]. You can get the desired row values with columnsToBeUnique.Select(name=>x[name]), eg .DistinctBy(x => columnsToBeUnique.Select(name=>x[name]).ToArray(), keyComparer), where keyComparer is an IEqualityComparer<object[]> that compares the arrays element by element, because arrays are compared by reference by default – Panagiotis Kanavos Commented yesterday
  • 1 @RicoStrydom "I just want a record count of the duplicate record": that's a completely different question. In SQL you'd do a GROUP BY ... HAVING COUNT(*)>1. Same in ADO.NET and LINQ. DISTINCT and DISTINCT BY will return single rows as well, not just duplicates. – Panagiotis Kanavos Commented yesterday

2 Answers

To count duplicates in SQL you'd use GROUP BY ... HAVING COUNT(*)>1, not DISTINCT. DISTINCT returns single rows, not just duplicates.

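In .NET 9, CountBy can be used as a shortcut:

// keyComparer is an IEqualityComparer<object[]> that compares arrays element by element;
// without it, arrays are compared by reference and every row forms its own group.
var dups=dataTable1.AsEnumerable()
                   .CountBy(row => keyCols.Select(name=>row[name]).ToArray(), keyComparer)
                   .Where(pair=>pair.Value>1);
var dup_count=dups.Count();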

The code could be cleaned up a bit by creating an extension method (declared in a static class) that returns the values of selected columns in a row:

public static object[] GetColumnValues(this DataRow row, string[] columns)
{
    return columns.Select(name=>row[name]).ToArray();
}
...

// keyComparer: an IEqualityComparer<object[]> that compares arrays element by element
var dups=dataTable1.AsEnumerable()
                   .CountBy(row => row.GetColumnValues(keyCols), keyComparer)
                   .Where(pair=>pair.Value>1);
var dup_count=dups.Count();

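keyCols.Select(name=>row[name]).ToArray() collects the values of all the key columns in a row. It works because AsEnumerable() returns an IEnumerable<DataRow>, and DataRow has an Item[] indexer that allows accessing values by column name or index. Arrays are compared by reference by default, so the grouping only merges rows with equal key values when a comparer that compares the arrays element by element is passed as the second argument.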
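The array keys used here need an element-wise comparer to group rows by value. The following is a minimal sketch of one way to build it, not the only one: the sample table, its column names and the variable name keyComparer are made up for illustration, and EqualityComparer<T>.Create assumes .NET 8 or later. GroupBy is used so the sketch does not depend on .NET 9.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample data; table contents and column names are illustrative only.
var dataTable1 = new DataTable();
dataTable1.Columns.Add("Name", typeof(string));
dataTable1.Columns.Add("City", typeof(string));
dataTable1.Columns.Add("Age", typeof(int));
dataTable1.Rows.Add("Ann", "Oslo", 30);
dataTable1.Rows.Add("Ann", "Oslo", 41);
dataTable1.Rows.Add("Bob", "Rome", 25);
string[] keyCols = { "Name", "City" };

// Element-wise comparer for the key arrays (EqualityComparer<T>.Create needs .NET 8+).
// A DataRow cell is never null (missing values are DBNull.Value), so GetHashCode is safe.
var keyComparer = EqualityComparer<object[]>.Create(
    (a, b) => a!.SequenceEqual(b!),
    a => a.Aggregate(17, (hash, value) => unchecked(hash * 23 + value.GetHashCode())));

// Group rows by their key values and keep only the keys that occur more than once.
var dups = dataTable1.AsEnumerable()
    .GroupBy(row => keyCols.Select(name => row[name]).ToArray(), keyComparer)
    .Where(g => g.Count() > 1)
    .ToList();

Console.WriteLine(dups.Count); // prints 1: only ("Ann", "Oslo") occurs more than once
```

The same comparer can be passed as the second argument of CountBy or DistinctBy.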

In previous .NET versions we'd need GroupBy to group the rows, then a Select to return each group's row count:

// keyComparer: an IEqualityComparer<object[]> that compares arrays element by element
var dups=dataTable1.AsEnumerable()
                   .GroupBy(row=>row.GetColumnValues(keyCols), keyComparer)
                   .Select(g=>new {Key=g.Key,Count=g.Count()})
                   .Where(pair=>pair.Count>1);
var dup_count=dups.Count();

If the question was how to get distinct rows from the DataTable, there would be no need for LINQ :

var uniques=dataTable1.DefaultView.ToTable(true,columnsToBeUnique);

A DataTable already allows filtering and sorting. It's also possible to create DataView objects that show a filtered and sorted subset of the data. The view contents can be copied into a new DataTable with DataView.ToTable(bool distinct, params string[] columnNames), possibly discarding duplicates.
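As a rough illustration of DataView.ToTable with distinct set to true, the sketch below builds a small table and copies the distinct key combinations into a new one. The sample data and column names are assumptions made up for the example.

```csharp
using System;
using System.Data;

// Hypothetical sample data; table contents and column names are illustrative only.
var dataTable1 = new DataTable();
dataTable1.Columns.Add("Name", typeof(string));
dataTable1.Columns.Add("City", typeof(string));
dataTable1.Columns.Add("Age", typeof(int));
dataTable1.Rows.Add("Ann", "Oslo", 30);
dataTable1.Rows.Add("Ann", "Oslo", 41);
dataTable1.Rows.Add("Bob", "Rome", 25);
string[] columnsToBeUnique = { "Name", "City" };

// The new table contains only the listed columns, with duplicate combinations removed.
DataTable uniques = dataTable1.DefaultView.ToTable(true, columnsToBeUnique);

// Rows that repeat an earlier key combination.
int extraRows = dataTable1.Rows.Count - uniques.Rows.Count;

Console.WriteLine(uniques.Rows.Count);    // prints 2
Console.WriteLine(uniques.Columns.Count); // prints 2
Console.WriteLine(extraRows);             // prints 1
```

Note that the difference in row counts gives the number of redundant rows, which is not the same as the number of key combinations that occur more than once.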

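Your approach doesn't work because the lambda only checks whether the table contains a column whose name equals the comma-joined string of all your column names. That check returns the same boolean for every row, so DistinctBy keeps only the first row. Concatenating the column names with commas doesn't make sense here: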

.DistinctBy(x => x.Table.Columns.Contains(string.Join(",", columnsToBeUnique)))

You want to remove all duplicate rows according to these columns. So this approach should work, where separator is a string that does not occur in any of the values:

var result = dataTable1
    .AsEnumerable()
    .DistinctBy( row => string.Join( separator, columnsToBeUnique.Select( c => row[c]?.ToString() ?? string.Empty ) ) )
    .ToArray();

However, this is not fail-safe: if the separator is contained in any of the row's fields, you can get a wrong result. A more robust approach is to use a custom IEqualityComparer<DataRow>:

public class DataRowComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columnsToBeUnique;

    public DataRowComparer(string[] columnsToBeUnique)
    {
        _columnsToBeUnique = columnsToBeUnique;
    }

    public bool Equals(DataRow? x, DataRow? y)
    {
        if (x == null && y == null) return true;
        if (x == null || y == null) return false;

        foreach (string column in _columnsToBeUnique)
        {
            if (!Equals(x[column], y[column]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(DataRow obj)
    {
        int hash = 17;
        foreach (string column in _columnsToBeUnique)
        {
            object value = obj[column];
            hash = hash * 23 + (value?.GetHashCode() ?? 0);
        }
        return hash;
    }
}

Now you can use that for the Distinct:

var result = dataTable1
    .AsEnumerable()
    .Distinct(new DataRowComparer(columnsToBeUnique))
    .ToArray();
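To see why a comparer is preferable to the joined-string key, here is a small sketch of the separator problem. The table, its column names and the choice of "," as separator are assumptions made up for the example.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical data: two rows that differ, but whose values contain the separator.
var table = new DataTable();
table.Columns.Add("A", typeof(string));
table.Columns.Add("B", typeof(string));
table.Rows.Add("a,b", "c");
table.Rows.Add("a", "b,c");
string[] columnsToBeUnique = { "A", "B" };
const string separator = ",";

// Both rows produce the key "a,b,c", so DistinctBy treats them as duplicates.
var result = table.AsEnumerable()
    .DistinctBy(row => string.Join(separator, columnsToBeUnique.Select(c => row[c]?.ToString() ?? string.Empty)))
    .ToArray();

Console.WriteLine(result.Length); // prints 1, although the two rows are different
```

A comparer that checks each column value separately does not have this problem, because the values are never merged into one string.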

Tags: c#, Count duplicate rows in a DataTable, Stack Overflow