How to efficiently read and parse a big CSV file with a few lines of PHP code

Let’s have a look at how to efficiently read a big CSV file and parse its data in PHP.

If you work for a company that offers development services to other organizations in the same industry, this technique will help you import a large dataset from a client into the database engine that your company uses. For instance, your company may work with MySQL, and your client may hand you a CSV file of around 25GB with over 7.5 million rows!

What?

That's way too huge!

You cannot import such a huge file through tools like phpMyAdmin. On top of that, each row in your CSV usually has to be modified to fit your new database design.

In this tutorial, let’s have a look at our approach to efficiently read a huge CSV file in PHP.

1. Split the File into Smaller Chunks

When dealing with huge CSV files that have millions of rows, such as the 25GB file in this case, the best approach is not to work with the file directly, but with smaller files.

The smaller each chunk is, the more control you have over your script: a failed run only has to be repeated for one chunk, and the processing logic is easier to test. We have previously written an article on how to split huge CSV datasets into smaller chunks using CSV Splitter, a tool for Windows 10. You can of course do the same with another tool or with a script of your own. The idea stays the same: split the file into smaller chunks that your scripts can easily process later.
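If you would rather split the file with PHP itself, the idea can be sketched roughly as follows. This is a minimal sketch and not the CSV Splitter tool mentioned above: the function name splitCsvIntoChunks, the chunk file naming, and the assumption that the first line is a header are all hypothetical. It also assumes that every record sits on a single line, so it is not suitable for files with line breaks inside quoted fields.

```php
<?php
// Hypothetical helper: splits $source into files of at most $linesPerChunk
// data lines each and repeats the header line at the top of every chunk.
// Assumes the first line is a header and each record is one physical line.
function splitCsvIntoChunks(string $source, string $outputDir, int $linesPerChunk): array
{
    $in = fopen($source, "r");
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }

    $header = fgets($in);   // first line, assumed to be the header
    $chunkPaths = [];       // paths of the chunk files written so far
    $out = null;            // handle of the chunk currently being written
    $linesInChunk = 0;

    while (($line = fgets($in)) !== false) {
        // Start a new chunk when none is open or the current one is full
        if ($out === null || $linesInChunk >= $linesPerChunk) {
            if ($out !== null) {
                fclose($out);
            }
            $path = sprintf("%s/chunk_%04d.csv", $outputDir, count($chunkPaths) + 1);
            $out = fopen($path, "w");
            if ($out === false) {
                throw new RuntimeException("Cannot write $path");
            }
            fwrite($out, $header);
            $chunkPaths[] = $path;
            $linesInChunk = 0;
        }

        fwrite($out, $line);
        $linesInChunk++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);

    return $chunkPaths;
}
```

Calling splitCsvIntoChunks("my_huge_csv_file.csv", "chunks", 100000) would, under these assumptions, write chunk_0001.csv, chunk_0002.csv and so on into an existing chunks directory and return their paths. Only one line is held in memory at a time, so the size of the source file does not matter.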

2. Implement the Iteration Script

We will use the fopen function to open the file in PHP. This is a built-in function that opens a file or URL and binds that resource to a stream. Its second argument is the mode in which we'll operate; in this example, we are only reading, so we use the r mode. The function returns a file pointer on success and false on failure, for example when the file does not exist.

We will open the file with this function and store the pointer in the $handle variable. A second variable, $lineNumber, stores the current line number while we iterate over the rows with an entry-controlled loop (a while loop). The loop condition checks that fgets still returns content, so the loop visits every single line of the file and stops at the end.

The fgets function returns the next line from a file opened with fopen, and it returns false when there is nothing left to read. Inside the while loop, we can then parse the raw CSV string with the str_getcsv function. With these basics in place, the script is ready to be modified to do whatever you need with each row:
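Because fopen returns false on failure, it is worth checking the result before reading. The following is a minimal sketch of such a check; the helper name openCsvForReading is hypothetical and is not part of PHP.

```php
<?php
// Hypothetical helper: opens a file for reading or throws a clear error.
function openCsvForReading(string $path)
{
    // Checking first avoids the warning fopen emits for a missing file
    if (!is_readable($path)) {
        throw new RuntimeException("File not found or not readable: $path");
    }

    $handle = fopen($path, "r");
    if ($handle === false) {
        throw new RuntimeException("Could not open: $path");
    }

    return $handle;
}
```

Failing early like this keeps a wrong path from turning into a confusing error later, when fgets receives false instead of a file pointer.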

// Open the CSV file for reading
$handle = fopen("my_huge_csv_file.csv", "r");

// fopen returns false on failure, so stop before entering the loop
if ($handle === false) {
    exit("Unable to open the CSV file.");
}

// Optionally, keep track of the number of the line that
// the loop is currently iterating over
$lineNumber = 1;

// Iterate over every line of the file
while (($raw_string = fgets($handle)) !== false) {
    // Parse the raw CSV string "1,a,b,c"
    // into an array: ['1', 'a', 'b', 'c']
    $row = str_getcsv($raw_string);

    // And do what you need to do with every line
    var_dump($row);

    // Increase the current line number
    $lineNumber++;
}

fclose($handle);
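One caveat of reading with fgets is that it works on physical lines, so a quoted field that contains a line break would be cut in two. If your data may contain such fields, PHP's built-in fgetcsv function reads and parses one complete record at a time. The sketch below is an alternative to the script above, not a replacement prescribed by it; the helper name readCsvRecords and the callback signature are hypothetical, and the delimiter, enclosure and escape characters are assumptions you should adapt to your file.

```php
<?php
// Hypothetical helper: reads every record with fgetcsv and passes it to
// $onRow together with its 1-based record number. Returns the record count.
function readCsvRecords(string $path, callable $onRow): int
{
    $handle = fopen($path, "r");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $recordNumber = 0;

    // Length 0 means no line length limit; separator, enclosure and
    // escape are passed explicitly (assumed values)
    while (($row = fgetcsv($handle, 0, ",", "\"", "\\")) !== false) {
        // fgetcsv returns an array with a single null for a blank line
        if ($row === [null]) {
            continue;
        }

        $recordNumber++;
        $onRow($row, $recordNumber);
    }

    fclose($handle);

    return $recordNumber;
}
```

Memory use stays bounded in the same way as before, because only one record is held at a time.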

The advantages of using this approach are:

  • You are not reading the entire file into memory, as you would with the file_get_contents function, so the maximum amount of memory the script needs depends on the longest line in the input data, not on the size of the file.
  • The code is easy to read and understand.
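If you need to handle rows in groups, for example to insert them into a database a few hundred at a time, the same line-by-line reading can be wrapped in a generator. This is a rough sketch under assumptions: the function name readCsvInBatches and the batch size are hypothetical, and it shares the one-record-per-line assumption of the script above. Memory use then depends on the batch size rather than on the file size.

```php
<?php
// Hypothetical helper: lazily yields arrays of at most $batchSize parsed rows.
function readCsvInBatches(string $path, int $batchSize): Generator
{
    $handle = fopen($path, "r");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        $batch = [];

        while (($raw = fgets($handle)) !== false) {
            // Strip the line ending, then parse the line into an array
            $batch[] = str_getcsv(rtrim($raw, "\r\n"), ",", "\"", "\\");

            // Hand over a full batch and start collecting the next one
            if (count($batch) === $batchSize) {
                yield $batch;
                $batch = [];
            }
        }

        // Hand over the remaining rows, if any
        if ($batch !== []) {
            yield $batch;
        }
    } finally {
        // Runs even if the caller stops iterating early
        fclose($handle);
    }
}
```

A caller can loop over it with foreach (readCsvInBatches("my_huge_csv_file.csv", 500) as $rows) and process each group of rows before the next one is read from disk.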

Happy Coding ❤️!
