a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])

b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])

a = array([[1, 2, 4],
       [3, 6, 2],
       [3, 4, 7],
       [9, 7, 7],
       [6, 3, 1],
       [3, 5, 9]])

b = array([[4, 5, 2],
       [9, 2, 5],
       [1, 5, 6],
       [4, 5, 6],
       [1, 2, 6],
       [6, 4, 3]])

I would like to calculate the Pearson correlation coefficient between the first row of a and the first row of b, the second row of a and the second row of b, and so on for every remaining row.

The desired output should be a 1D array:

array([__ , __ , __])

Column-wise, I can do it as below:

corr = np.corrcoef(a.T, b.T).diagonal(a.shape[1])

Output:

array([-0.2324843 , -0.03631365, -0.18057878])

UPDATE

Although I accepted the answer below, here is an alternative solution that also guards against division by zero:

import numpy as np

def corr2_coeff(A, B):
  # Row-wise mean of the input arrays, subtracted from the arrays themselves
  A_mA = A - A.mean(1)[:, None]
  B_mB = B - B.mean(1)[:, None]

  # Sum of squares across rows
  ssA = (A_mA**2).sum(1)
  ssB = (B_mB**2).sum(1)

  # Small constant added to the denominator to avoid division by zero
  deno = np.sqrt(np.dot(ssA[:, None],ssB[None])) + 0.00000000000001

  # Correlation of every row of A with every row of B;
  # the diagonal of the result holds the row-by-row pairs
  return np.dot(A_mA, B_mB.T) / deno
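Since corr2_coeff builds the full matrix of every row of A against every row of B, a row-by-row variant can skip the cross terms entirely. The following is only a minimal sketch under stated assumptions: the name rowwise_pearson and the eps parameter are hypothetical (eps mirrors the small constant used above), and both inputs are assumed to have the same shape.

```python
import numpy as np

def rowwise_pearson(A, B, eps=1e-14):
    """Pearson r between row i of A and row i of B (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    # Center each row on its own mean
    A_c = A - A.mean(axis=1, keepdims=True)
    B_c = B - B.mean(axis=1, keepdims=True)

    # Numerator: per-row sum of products of the centered values
    num = (A_c * B_c).sum(axis=1)

    # Denominator: per-row product of the root sums of squares
    den = np.sqrt((A_c ** 2).sum(axis=1) * (B_c ** 2).sum(axis=1))

    # eps keeps constant rows (zero variance) from dividing by zero
    return num / (den + eps)

a = np.array([[1, 2, 4], [3, 6, 2], [3, 4, 7], [9, 7, 7], [6, 3, 1], [3, 5, 9]])
b = np.array([[4, 5, 2], [9, 2, 5], [1, 5, 6], [4, 5, 6], [1, 2, 6], [6, 4, 3]])

r = rowwise_pearson(a, b)
print(r)
```

With eps in place, a constant row yields 0 rather than nan; whether that is the right convention depends on the use case.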
Asked Mar 4 at 13:41 by prem, edited Mar 4 at 16:28

2 Answers

I would do the column-wise calculation presented in the question as:

[np.corrcoef(x, y)[0,1] for x, y in zip(a.T, b.T)]

with the result:

[-0.23248430170889073, -0.03631365196012811, -0.18057877962865382]

The row-by-row correlations are then obtained by simply removing the transpose:

[np.corrcoef(x, y)[0,1] for x, y in zip(a, b)]

with the result:

[-0.7857142857142855,
 -0.661143091251952,
 0.8170571691028832,
 -0.8660254037844387,
 -0.9011271137791659,
 -0.9285714285714285]

If you want a solution that uses the approach from the question, it can be done with:

np.corrcoef(a, b).diagonal(a.shape[0])

OR

np.corrcoef(a.T, b.T, rowvar=False).diagonal(a.shape[0])
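The reason the offset a.shape[0] works can be seen from the block structure of the stacked correlation matrix. A small sketch (the variable names full, cross and rowwise are illustrative only):

```python
import numpy as np

a = np.array([[1, 2, 4], [3, 6, 2], [3, 4, 7], [9, 7, 7], [6, 3, 1], [3, 5, 9]])
b = np.array([[4, 5, 2], [9, 2, 5], [1, 5, 6], [4, 5, 6], [1, 2, 6], [6, 4, 3]])

n = a.shape[0]

# np.corrcoef(a, b) stacks the rows of a and b as 2n variables,
# so the result is a (2n, 2n) matrix
full = np.corrcoef(a, b)

# The upper-right (n, n) block holds corr(row i of a, row j of b)
cross = full[:n, n:]

# Its main diagonal is the row-by-row pairing (i == j), which is
# the same set of entries as the offset-n diagonal of the full matrix
rowwise = np.diag(cross)

print(full.shape)
print(rowwise)
```

For large n this computes many correlations that are then thrown away, so the list comprehension over zip(a, b) may be the cheaper choice.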

Mistake in Your Approach

Your approach:

corr = np.corrcoef(a.T, b.T).diagonal(a.shape[1])

What Went Wrong?

  1. Incorrect use of 'np.corrcoef'
    - 'np.corrcoef(a.T, b.T)' computes the correlation matrix for column-wise comparisons, not row-wise.
    - np.corrcoef treats each row of its input as a variable, so passing the transposes ('.T') correlates the columns of a and b rather than their rows.

  2. Wrong diagonal() usage
    - diagonal(a.shape[1]) is incorrect for the row-wise case.
    - diagonal(k) extracts the diagonal at offset k. In the row-wise matrix from np.corrcoef(a, b), the pairs (row i of a, row i of b) sit at offset a.shape[0], not a.shape[1].

Code with the correct approach (Python):

import numpy as np
from scipy.stats import pearsonr

# Define the arrays
a = np.array([[1, 2, 4], [3, 6, 2], [3, 4, 7], [9, 7, 7], [6, 3, 1], [3, 5, 9]])
b = np.array([[4, 5, 2], [9, 2, 5], [1, 5, 6], [4, 5, 6], [1, 2, 6], [6, 4, 3]])

# Compute the correlation for each pair of rows
correlation_values = np.array([pearsonr(row_a, row_b)[0] for row_a, row_b in zip(a, b)])

# Display the results
print(correlation_values)

[Let me know whether this is useful.]

Tags: python; calculate pearson correlation of each rows in 2D numpy array (nm); Stack Overflow