Migrate Spreadsheet
Provides a migrate source plugin for importing data from spreadsheet files using the PhpOffice/PhpSpreadsheet library.
migrate_spreadsheet
Install
composer require 'drupal/migrate_spreadsheet:^2.2'
composer require 'drupal/migrate_spreadsheet:^2.0'
Overview
Migrate Spreadsheet is a Drupal migration module that enables importing data from various spreadsheet file formats into Drupal. It provides a powerful and flexible migrate source plugin that leverages the PhpOffice/PhpSpreadsheet library to read and parse spreadsheet files.
The module supports a wide range of spreadsheet formats including Open Document Format (.ods), Microsoft Excel formats (.xls, .xlsx), CSV, SpreadsheetML (.xml), and SYLK (.slk) files. It offers fine-grained control over data extraction through configuration options for specifying data origin, header rows, column selection, and primary key definitions.
The plugin handles Excel's internal date format by providing integration with PhpSpreadsheet's date conversion utilities, allowing seamless migration of date and time values. The iterator-based architecture efficiently processes large spreadsheets while maintaining low memory footprint.
Features
- Migrate source plugin (id: 'spreadsheet') for importing data from spreadsheet files
- Support for multiple file formats: Open Document Format (.ods), Office Open XML (.xlsx), BIFF 8 (.xls), BIFF 5 (.xls), SpreadsheetML (.xml), SYLK (.slk), and CSV
- Configurable data origin with cell reference notation (e.g., A2, B3) to specify where data starts
- Flexible header row configuration to map column names from spreadsheet headers
- Selective column import - specify exactly which columns to extract from the spreadsheet
- Composite primary key support with field storage schema definitions
- Row index pseudo-column for using row position as a primary key or additional data field
- Formula calculation support - computed cell values are automatically resolved
- Stream wrapper support for file paths, enabling reading from various storage backends
- Efficient iterator-based processing with internal caching for large file handling
- Date/time conversion support using PhpSpreadsheet's excelToTimestamp() method
Use Cases
Importing user data from an Excel spreadsheet
Migrate user accounts from a corporate Excel file containing employee information. Configure the spreadsheet source to map columns like 'Employee ID', 'Email', 'First Name', 'Last Name' to Drupal user fields. Use the Employee ID as the primary key for tracking migration status and enabling rollback.
Migrating product catalog from CSV
Import product data from a CSV export from an e-commerce platform or ERP system. Map product columns to Commerce Product fields including SKU, title, description, and price. Handle product variations by using composite keys.
Content migration from LibreOffice Calc (.ods)
Migrate legacy content stored in LibreOffice Calc spreadsheets to Drupal content types. Useful for organizations transitioning from manual content management in spreadsheets to a structured CMS.
Importing taxonomy terms with hierarchies
Build a taxonomy vocabulary from a spreadsheet containing category data. Use row index as the term ID and map parent relationships from a 'Parent' column to establish term hierarchies.
Periodic data synchronization
Set up recurring imports from regularly exported spreadsheets (e.g., inventory updates, pricing changes). The primary key system allows for updating existing content rather than creating duplicates.
Handling date fields from Excel
Migrate date and datetime values from Excel files where dates are stored as numeric serial numbers. Use the PhpSpreadsheet Date::excelToTimestamp() callback in the process pipeline to convert Excel's date format to Unix timestamps, then optionally format to the desired date string format.
Tips
- Always specify the 'columns' configuration to limit which columns are processed - this improves performance and clarity
- When spreadsheets have no natural unique identifier, use 'row_index_column' to generate a positional key
- Excel stores dates as numeric values (days since 1900-01-01). Use the callback process plugin with PhpOffice\PhpSpreadsheet\Shared\Date::excelToTimestamp to convert them
- The 'origin' setting excludes the header row - if your header is on row 1 and data starts on row 2, set origin to 'A2'
- For CSV files, the worksheet name should typically be the filename without extension or just use any string as PhpSpreadsheet handles it
- The iterator calculates formula values automatically - cell references like '=A1+B1' will return the computed result
- Use stream wrappers in the 'file' path (e.g., 'private://imports/data.xlsx') to read from non-public directories
- When migrating large spreadsheets, the module reads data efficiently row-by-row rather than loading the entire file into memory
Technical Details
Troubleshooting 7
Verify the file path is correct. Paths can be relative to Drupal root or absolute. Ensure the file has proper read permissions for the web server user.
The 'worksheet' configuration is required. Specify the exact worksheet name as it appears in the spreadsheet file (case-sensitive).
Ensure the column name in 'columns' configuration exactly matches the header cell value in the spreadsheet. Check for leading/trailing whitespace in both the config and the spreadsheet.
When no 'keys' are defined, you must set 'row_index_column' to provide a name for the pseudo-column that will be used as the primary key.
Each header cell value must be unique within the header row. Edit the source spreadsheet to ensure all header cells have distinct values.
Excel stores dates internally as serial numbers. Add a callback process plugin using PhpOffice\PhpSpreadsheet\Shared\Date::excelToTimestamp to convert these values.
The specified origin cell reference is outside the data range of the worksheet. Verify the origin coordinates are within the actual data boundaries of your spreadsheet.
Security Notes 3
- Ensure spreadsheet files are stored in secure locations with appropriate file permissions to prevent unauthorized access to potentially sensitive data
- When using stream wrappers for file paths, verify that the configured locations are not publicly accessible via web
- Be cautious when migrating data from untrusted spreadsheet sources as they may contain malicious formulas that could be evaluated