admin管理员组

文章数量:1405319

I use the fgetcsv() function to break a .CSV file into an array so I can insert the values into a database.

The problem is that sometimes the corporation that I download these files from change columns around which changes their array number. Here is an example of a print_r of the CSV file:

Array ( [0] => Symbol [1] => Description [2] => Qty (Quantity) [3] => Price [4] => Price Chng % (Price Change %) [5] => Price Chng (Price Change ) [6] => Day Chng % (Day Change %) [7] => Day Chng (Day Change ) [8] => Cost Basis [9] => Gain % (Gain/Loss %) [10] => Gain (Gain/Loss ) [11] => Reinvest? [12] => Reinvest Capital Gains? [13] => Last Div (Last Dividend) [14] => Volume [15] => Security Type ) 
Array ( [0] => Test1 [1] => test desc [2] => 820 [3] => 19.505 [4] => -1.84% [5] => -0.365 [6] => -1.84% [7] => -300.37 [8] => 601.73 [9] => -18.4% [10] => -607.63 [11] => No [13] => 0.72025 [14] => 7041528 [15] => ETFs & Closed End Funds ) 
Array ( [0] => Test2 [1] => test desc again [2] => 110 [3] => 49.715 [4] => -1.83% [5] => -0.925 [6] => -1.76% [7] => -98.00 [8] => 95.2 [9] => -8.78% [10] => -526.55 [11] => Yes [14] => 28668328 [15] => ETFs & Closed End Funds ) 
Array ( [0] => Test3 [1] => test desc example [2] => 740 [3] => 21.71 [4] => -3.98% [5] => -0.9 [6] => -3.93% [7] => -657.59 [8] => 242.13 [9] => -20.63% [10] => -4176.73 [11] => No [13] => 2.0216 [14] => 2759846 [15] => ETFs & Closed End Funds ) 

You can see in the second array it has titles such as $column[0] is Symbol and $column[3] is Price.

[0] => Symbol [1] => Description [2] => Qty (Quantity) [3] => Price

Can I add code so that $column[0] for example will always display the symbol name? If the corporation decides to add a column or rearrange stuff, then it will change the Price column into another one like moving it from $column[3] to $column[5]. I am guessing there is a way I can have it search for whichever key that matches a column like Volume for example, then whatever key that is it could make sure all of the keys in every array for Volume will match that key number.

I was manually just changing the array numbers when they did this, but they change it a lot so I am wondering if I can add some code to where the Symbol column would always be 0 and Price would always be in column 3 and the same applies to all of the column names. If they get rearranged they still have the same number as before?

A small example of the CSV file:

"Symbol","Description","Qty (Quantity)","Price","Price Chng % (Price Change %)","Price Chng $ (Price Change $)","Day Chng % (Day Change %)","Day Chng $ (Day Change $)","Cost Basis","Gain % (Gain/Loss %)","Gain $ (Gain/Loss $)","Reinvest?","Reinvest Capital Gains?","Last Div (Last Dividend)","Volume","Security Type"
"Test1","ETF","820","$19.39","-2.42%","-$0.48","-2.42%","-$393.6","$601.73","-18.89%","-$3701.93","No","--","$0.72025","14,626,464","ETFs & Closed End Funds"
"Test2"," ETF","110","$49.43","-2.39%","-$1.21","-2.39%","-$133.1","$995.2","-9.31%","-$557.9","Yes","--","N/A","59,351,095","ETFs & Closed End Funds"
"Test3"," ETF","760","$21.77","-3.72%","-$0.84","-3.72%","-$638.4","$687.73","-20.02%","-$142.53","No","--","$2.0216","6,402,084","ETFs & Closed End Funds"

I use the fgetcsv() function to break a .CSV file into an array so I can insert the values into a database.

The problem is that sometimes the corporation that I download these files from change columns around which changes their array number. Here is an example of a print_r of the CSV file:

Array ( [0] => Symbol [1] => Description [2] => Qty (Quantity) [3] => Price [4] => Price Chng % (Price Change %) [5] => Price Chng (Price Change ) [6] => Day Chng % (Day Change %) [7] => Day Chng (Day Change ) [8] => Cost Basis [9] => Gain % (Gain/Loss %) [10] => Gain (Gain/Loss ) [11] => Reinvest? [12] => Reinvest Capital Gains? [13] => Last Div (Last Dividend) [14] => Volume [15] => Security Type ) 
Array ( [0] => Test1 [1] => test desc [2] => 820 [3] => 19.505 [4] => -1.84% [5] => -0.365 [6] => -1.84% [7] => -300.37 [8] => 601.73 [9] => -18.4% [10] => -607.63 [11] => No [13] => 0.72025 [14] => 7041528 [15] => ETFs & Closed End Funds ) 
Array ( [0] => Test2 [1] => test desc again [2] => 110 [3] => 49.715 [4] => -1.83% [5] => -0.925 [6] => -1.76% [7] => -98.00 [8] => 95.2 [9] => -8.78% [10] => -526.55 [11] => Yes [14] => 28668328 [15] => ETFs & Closed End Funds ) 
Array ( [0] => Test3 [1] => test desc example [2] => 740 [3] => 21.71 [4] => -3.98% [5] => -0.9 [6] => -3.93% [7] => -657.59 [8] => 242.13 [9] => -20.63% [10] => -4176.73 [11] => No [13] => 2.0216 [14] => 2759846 [15] => ETFs & Closed End Funds ) 

You can see in the second array it has titles such as $column[0] is Symbol and $column[3] is Price.

[0] => Symbol [1] => Description [2] => Qty (Quantity) [3] => Price

Can I add code so that $column[0] for example will always display the symbol name? If the corporation decides to add a column or rearrange stuff, then it will change the Price column into another one like moving it from $column[3] to $column[5]. I am guessing there is a way I can have it search for whichever key that matches a column like Volume for example, then whatever key that is it could make sure all of the keys in every array for Volume will match that key number.

I was manually just changing the array numbers when they did this, but they change it a lot so I am wondering if I can add some code to where the Symbol column would always be 0 and Price would always be in column 3 and the same applies to all of the column names. If they get rearranged they still have the same number as before?

A small example of the CSV file:

"Symbol","Description","Qty (Quantity)","Price","Price Chng % (Price Change %)","Price Chng $ (Price Change $)","Day Chng % (Day Change %)","Day Chng $ (Day Change $)","Cost Basis","Gain % (Gain/Loss %)","Gain $ (Gain/Loss $)","Reinvest?","Reinvest Capital Gains?","Last Div (Last Dividend)","Volume","Security Type"
"Test1","ETF","820","$19.39","-2.42%","-$0.48","-2.42%","-$393.6","$601.73","-18.89%","-$3701.93","No","--","$0.72025","14,626,464","ETFs & Closed End Funds"
"Test2"," ETF","110","$49.43","-2.39%","-$1.21","-2.39%","-$133.1","$995.2","-9.31%","-$557.9","Yes","--","N/A","59,351,095","ETFs & Closed End Funds"
"Test3"," ETF","760","$21.77","-3.72%","-$0.84","-3.72%","-$638.4","$687.73","-20.02%","-$142.53","No","--","$2.0216","6,402,084","ETFs & Closed End Funds"
Share Improve this question edited Mar 8 at 21:58 mickmackusa 48.3k13 gold badges94 silver badges161 bronze badges Recognized by PHP Collective asked Mar 8 at 4:13 user26136009user26136009 291 silver badge8 bronze badges 3
  • Please post an example of the CSV file. The arrays you show don't look like they come from a proper CSV. The first row should be the headers, but those look like they're in the 3rd array. – Barmar Commented Mar 8 at 4:28
  • 1 You claim that the titles are in the second array, but they're in the third one. You didn't count the empty 2nd array (a blank line in the file). – Barmar Commented Mar 8 at 5:10
  • My mistake, it was in the third array but thanks to you guys I added in a count($column) if statement and now it shows up as the first line. I added in some of the CSV code that you requested. Thanks – user26136009 Commented Mar 8 at 20:35
Add a comment  | 

3 Answers 3

Reset to default 1

If your incoming data is not reliably structured, then use a whitelist of expected columns in your desired order to filter and map the parsed data. Demo >= PHP7.4, Demo PHP7 - PHP8.3, Demo PHP5.2 - PHP8.3

$whitelist = [
    "Symbol",
    "Description",
    "Qty (Quantity)",
    "Price",
    "Price Chng % (Price Change %)",
    "Price Chng $ (Price Change $)",
    "Day Chng % (Day Change %)",
    "Day Chng $ (Day Change $)",
    "Cost Basis",
    "Gain % (Gain/Loss %)",
    "Gain $ (Gain/Loss $)",
    "Reinvest?",
    "Reinvest Capital Gains?",
    "Last Div (Last Dividend)",
    "Volume",
    "Security Type"
];

$result = [];
if (($handle = fopen("file.csv", "r")) !== false) {
    $headers = fgetcsv($handle, escape: '') ?: [];
    $map = array_flip($headers);
    while (($values = fgetcsv($handle, escape: '')) !== false) {
        $row = [];
        foreach ($whitelist as $col) {
            $row[$col] = $values[$map[$col]] ?? null;
        }
        $result[] = $row;
    }
    fclose($handle);
}
var_export($result);

File Contents:

"Price","Bogus","Symbol","Description","Qty (Quantity)","Price Chng % (Price Change %)","Price Chng $ (Price Change $)","Day Chng % (Day Change %)","Day Chng $ (Day Change $)","Cost Basis","Gain % (Gain/Loss %)","Gain $ (Gain/Loss $)","Reinvest?","Reinvest Capital Gains?","Last Div (Last Dividend)","Volume","Security Type"
"$19.39","foo1","Test1","ETF","820","-2.42%","-$0.48","-2.42%","-$393.6","$601.73","-18.89%","-$3701.93","No","--","$0.72025","14,626,464","ETFs & Closed End Funds"
"$49.43","foo2","Test2"," ETF","110","-2.39%","-$1.21","-2.39%","-$133.1","$995.2","-9.31%","-$557.9","Yes","--","N/A","59,351,095","ETFs & Closed End Funds"
"$21.77","foo3","Test3"," ETF","760","-3.72%","-$0.84","-3.72%","-$638.4","$687.73","-20.02%","-$142.53","No","--","$2.0216","6,402,084","ETFs & Closed End Funds"

Output:

array (
  0 => 
  array (
    'Symbol' => 'Test1',
    'Description' => 'ETF',
    'Qty (Quantity)' => '820',
    'Price' => '$19.39',
    'Price Chng % (Price Change %)' => '-2.42%',
    'Price Chng $ (Price Change $)' => '-$0.48',
    'Day Chng % (Day Change %)' => '-2.42%',
    'Day Chng $ (Day Change $)' => '-$393.6',
    'Cost Basis' => '$601.73',
    'Gain % (Gain/Loss %)' => '-18.89%',
    'Gain $ (Gain/Loss $)' => '-$3701.93',
    'Reinvest?' => 'No',
    'Reinvest Capital Gains?' => '--',
    'Last Div (Last Dividend)' => '$0.72025',
    'Volume' => '14,626,464',
    'Security Type' => 'ETFs & Closed End Funds',
  ),
  1 => 
  array (
    'Symbol' => 'Test2',
    'Description' => ' ETF',
    'Qty (Quantity)' => '110',
    'Price' => '$49.43',
    'Price Chng % (Price Change %)' => '-2.39%',
    'Price Chng $ (Price Change $)' => '-$1.21',
    'Day Chng % (Day Change %)' => '-2.39%',
    'Day Chng $ (Day Change $)' => '-$133.1',
    'Cost Basis' => '$995.2',
    'Gain % (Gain/Loss %)' => '-9.31%',
    'Gain $ (Gain/Loss $)' => '-$557.9',
    'Reinvest?' => 'Yes',
    'Reinvest Capital Gains?' => '--',
    'Last Div (Last Dividend)' => 'N/A',
    'Volume' => '59,351,095',
    'Security Type' => 'ETFs & Closed End Funds',
  ),
  2 => 
  array (
    'Symbol' => 'Test3',
    'Description' => ' ETF',
    'Qty (Quantity)' => '760',
    'Price' => '$21.77',
    'Price Chng % (Price Change %)' => '-3.72%',
    'Price Chng $ (Price Change $)' => '-$0.84',
    'Day Chng % (Day Change %)' => '-3.72%',
    'Day Chng $ (Day Change $)' => '-$638.4',
    'Cost Basis' => '$687.73',
    'Gain % (Gain/Loss %)' => '-20.02%',
    'Gain $ (Gain/Loss $)' => '-$142.53',
    'Reinvest?' => 'No',
    'Reinvest Capital Gains?' => '--',
    'Last Div (Last Dividend)' => '$2.0216',
    'Volume' => '6,402,084',
    'Security Type' => 'ETFs & Closed End Funds',
  ),
)

I was manually just changing the array numbers when they did this, but they change it a lot so I am wondering if I can add some code to where the Symbol column would always be 0 and Price would always be in column 3 and the same applies to all of the column names.

It's not of much magic actually, perhaps the perception of your own history of trials (and errors) block your mind a bit.

Let's see if we can shake this up.

First the good news: Especially with your array, this is straight forward as the CSV file is already parsed.

And this all boils down to a single function, array_keys().

Good!

But first things first. Let's be clear we're talking about an array here:

if (!is_array($array))
{
    throw new Error('Array required, got ' . gettype($array));
}

Okay, perhaps no news so far. But better safe than sorry!

Now, let's be clear about another thing, and that is, that this $array is a list:

if (!array_is_list($array))
{
    throw new Error('Array is not a list');
}

A list means, that each key in the the array is a consecutive integer number starting at zero. That is, each key has the value of the previous key plus one (+1 or ++).

And if there was no previous key, that virtually previous number would be minus one (-1).

An array with no members is an example of such a list, as would be the return value of array("one") or array("one", "two") etc.

For mathematicians it feels quite natural, they only have to specify whether or not the number zero is part of natural numbers. In PHP, the array first index (key) is always zero (0).

So, as speaking about array_keys() here, for a non-empty array, we can always say:

assert(is_array($array) && !empty($array));

range(0, -1 + count($array)) === array_keys($array);
                                 //^^^^^^^^(......)

Or to even better understand array_keys(), as it returns the list of keys:

array_keys($array) === array_keys(array_keys($array));
//^^^^^^^^(......)     ^^^^^^^^^^(^^^^^^^^^^(......))

Good to know!

But enough with the array_keys() recursion for now and let's take a look at the bigger picture of this beloved array.

The given $array is all CSV parsed, with the following common layout of such a CSV file:

TITLE
[... TEXT]

SYMBOLS
[... DATA]

This layout is line based, and the $array is a list (see above) of such lines.

Let's first obtain the title, as it is first in list, we can assign it with list deconstruction. It actually looks like assigning to an array:

[$title] = $array;

Now, the next thing is finding the SYMBOLS. We are looking for the first empty line.

[$indexOfFirstEmpty] = array_keys($array, null);

Good to know: array_keys() in action for the keys of value.

And then I've to admit that I cheated on you: It's not all about array_keys(), there is also array_slice() and array_shift():

$data = array_slice($array, 1 + $indexOfFirstEmpty);
$symbols = array_shift($data);

Okay, that's probably a bit fast now, all those functions remain documented in the PHP manual at https://php/manual, therefore, take your time for reading if you need to. You can always continue later here.

Let's peek what we got so far:

print_r($symbols);

Array
(
    [0] => Symbol
    [1] => Description
    [2] => Qty (Quantity)
    [3] => Price
    [4] => Price Chng % (Price Change %)
    [5] => Price Chng (Price Change )
[ ... 11 more lines ]

print_r($data);

Array
(
    [0] => Array
        (
            [0] => Test1
            [1] => test desc
            [2] => 820
            [3] => 19.505
[ ... 51 more lines ]

The output shows us already how the symbol keys match with the data keys.

If those of each date can be combined with their symbol, you get the mapping and can access by name next to the numeric index.

As each symbol (string) will represent an array key, the symbols need to be checked if they are fitting, e.g. a duplicate symbol would ruin the game. Table flip:

$keys = array_flip($symbols);
assert($symbols === array_keys($keys));

If this asserts, we can now go on and use the keys by symbol (string) to obtain the index (integer). In case there is no such index, we can use the null coalescing operator and NULL as fallback.

foreach ($data as $index => $date)
{
    printf(
        "#%d: %s %s\n",
        $index,
        var_export($date[$keys['Symbol']] ?? null, true),
        var_export($date[$keys['Last Div (Last Dividend)']] ?? null, true),
    );
}

#0: 'Test1' '0.72025'
#1: 'Test2' NULL
#2: 'Test3' '2.0216'

So far for the arrays. All in all it works this way.

Naturally it's possible to wrap this up more but then it becomes more abstract.

For example, I personally prefer the ArrayAccess interface for symbolic keys (key aliasing) and SplFileObject for comma separated values (CSV).

From what I remember one example I've put on Stackoverflow was Process CSV Into Array With Column Headings For Key.

You can dynamically map column positions by using the header row. First, get the header positions, then use those positions to access values in each row.

if (($handle = fopen("file.csv", "r")) !== FALSE) {
    $headers = fgetcsv($handle); // Read header row
    $map = array_flip($headers); // Map column names to indexes

    while (($data = fgetcsv($handle)) !== FALSE) {
        $symbol = $data[$map['Symbol']] ?? null;
        $price = $data[$map['Price']] ?? null;
        echo "Symbol: $symbol, Price: $price\n";
    }
    fclose($handle);
}

本文标签: phpParse a CSV file which may have columns out of the expected orderStack Overflow