File Hash
Generates and stores cryptographic hashes for each file uploaded to the site, enabling file verification, duplicate detection, and content-addressable storage.
filehash
Install
composer require 'drupal/filehash:^3.2'
composer require 'drupal/filehash:^2.1'
Overview
File Hash module generates and stores cryptographic hashes for each file uploaded to a Drupal site. Hashes of uploaded files, commonly found on sites from archive.org to wikileaks.org, allow files to be uniquely identified, enable duplicate files to be detected, and allow copies to be verified against the original source.
The module supports 18 hash algorithms across multiple cryptographic families: BLAKE2b (128, 160, 224, 256, 384, 512 bit variants), MD5, SHA-1, SHA-2 family (SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256), and SHA-3 family (SHA3-224, SHA3-256, SHA3-384, SHA3-512). Hash values are stored as base fields on File entities, making them available to Views, tokens, templates, and entity queries.
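The length of a stored hash value depends on the algorithm chosen. The following is a minimal sketch, run outside Drupal with the GNU coreutils checksum tools, that prints the hexadecimal digest length for four of the listed algorithms. The helper name hex_length is hypothetical and is not part of the module.

```shell
# Hypothetical helper: print the number of hex characters a checksum tool emits.
# $1 is a coreutils command such as sha256sum; the input hashed is empty.
hex_length() {
  printf '' | "$1" | cut -d' ' -f1 | tr -d '\n' | wc -c | tr -d ' '
}

# Compare digest lengths across a few algorithms the module also supports.
for tool in md5sum sha1sum sha256sum sha512sum; do
  printf '%s: %s hex characters\n' "$tool" "$(hex_length "$tool")"
done
```

Each hex character encodes 4 bits, so a 256-bit digest is 64 characters long and a 512-bit digest is 128.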
Key capabilities include duplicate file prevention at both site-wide and per-field levels, lazy or bulk hash generation for existing files, optional storage of original file hashes (useful when files are processed after upload), and visual hash representation via Identicons. The module integrates with Views for duplicate file filtering and provides Drush commands for command-line operations.
Features
- Generates cryptographic hashes automatically when files are uploaded
- Supports 18 hash algorithms including BLAKE2b, MD5, SHA-1, SHA-2, and SHA-3 families
- Stores hashes as base fields on File entities for easy access via Views, templates, and entity queries
- Duplicate file detection and prevention at both site-wide and per-field levels
- Optional storage of original file hashes before any processing modifications
- Lazy hash generation option to auto-generate missing hashes when files are loaded
- Bulk hash generation for pre-existing files via admin UI or Drush command
- Clean-up functionality to remove database columns for disabled algorithms
- Token support for file hashes including pairtree tokens for content-addressable storage
- Views integration with duplicate file filter
- Identicon formatter for visual hash representation (requires third-party library)
- Table formatter for displaying files with their hash values
- Entity query support for searching files by hash value
- Drush commands for generating hashes, cleaning up, and reporting duplicates
- Migration support from Drupal 7
Use Cases
File Verification and Integrity Checking
Use file hashes to verify that downloaded copies of files match the original. Display the hash on download pages so users can verify file integrity using command-line tools like sha256sum or b2sum.
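A user-side check might look like the sketch below. It assumes GNU coreutils is installed and uses an empty temporary file as a stand-in for the downloaded copy; the variable names are illustrative only. The published value shown is the SHA-256 digest of empty input.

```shell
# Stand-in for a downloaded file: an empty file in a temporary directory.
tmpdir=$(mktemp -d)
: > "$tmpdir/download.bin"

# Hash as it would appear on the download page (SHA-256 of empty input).
published="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Compute the hash of the local copy and keep only the digest column.
actual=$(sha256sum "$tmpdir/download.bin" | cut -d' ' -f1)

if [ "$actual" = "$published" ]; then
  verify_result="OK"
else
  verify_result="MISMATCH"
fi
echo "$verify_result"
```

For a real download, replace the temporary file with the downloaded file's path and the published value with the hash displayed by the site.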
Duplicate File Prevention
Enable the dedupe setting to prevent users from uploading files that already exist on the site. This reduces storage costs and keeps file references canonical. Dedupe can be enabled site-wide or per field.
Content-Addressable Storage
Use pairtree tokens ([file:filehash-sha256-pair-1]/[file:filehash-sha256-pair-2]) to organize files in directories based on their hash. For example, a file with SHA-256 hash e3b0c44298fc1c149... would be stored in files/e3/b0/.
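The directory layout described above can be sketched in shell as follows. This only illustrates how the first two character pairs of a hash map to a path; it is not the module's token code, and the function name pairtree_dir is hypothetical.

```shell
# Hypothetical helper: derive a two-level directory from a hex hash.
# Characters 1-2 form the first pair, characters 3-4 the second pair.
pairtree_dir() {
  hash=$1
  pair1=$(printf '%s' "$hash" | cut -c1-2)
  pair2=$(printf '%s' "$hash" | cut -c3-4)
  printf 'files/%s/%s/\n' "$pair1" "$pair2"
}

# Only the leading characters matter, so the truncated hash from the text suffices.
pairtree_dir e3b0c44298fc1c149
```

With two hex characters per level, each level has at most 256 subdirectories, which keeps any single directory from growing too large.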
Finding Duplicate Files
Use the Drush report command (drush filehash:report) or create a View with the 'Has duplicate hash' filter to identify duplicate files on the site for cleanup or consolidation.
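The idea behind hash-based duplicate detection can be shown outside Drupal with a short sketch: hash every file, then group file names by identical hash. This assumes GNU coreutils and awk, uses made-up sample files, and does not reflect how the module queries its database.

```shell
# Create three sample files; a.txt and b.txt have identical content.
workdir=$(mktemp -d)
printf 'same content\n' > "$workdir/a.txt"
printf 'same content\n' > "$workdir/b.txt"
printf 'different\n' > "$workdir/c.txt"

# Hash each file, sort so equal hashes are adjacent, then print the
# names of files whose hash occurs more than once.
duplicates=$(cd "$workdir" && sha256sum -- * | sort | awk '
  { n[$1]++; f[$1] = (f[$1] ? f[$1] " " : "") $2 }
  END { for (h in n) if (n[h] > 1) print f[h] }')

echo "Duplicate group: $duplicates"
```

The awk step assumes file names without whitespace, which holds for the sample files here.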
File Search by Hash
Use entity queries to find files by their hash value. Useful for programmatically locating specific files or checking if a file with a particular hash already exists.
Audit Trail and Forensics
Store original file hashes separately from current hashes when files are processed (e.g., image resizing). This allows verification of both the original uploaded file and its current state.
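The difference between an original hash and a current hash can be illustrated with a small sketch. It assumes GNU coreutils and simulates post-upload processing by appending to a temporary file; the variable names are illustrative and unrelated to the module's field names.

```shell
# Simulate an uploaded file and record its hash at upload time.
upload=$(mktemp)
printf 'original upload\n' > "$upload"
original_hash=$(sha256sum "$upload" | cut -d' ' -f1)

# Simulate processing after upload (for example, an image being resized).
printf 'processed\n' >> "$upload"
current_hash=$(sha256sum "$upload" | cut -d' ' -f1)

# The two hashes differ, so both are needed to verify each state of the file.
if [ "$original_hash" != "$current_hash" ]; then
  echo "File changed after upload"
fi
```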
Tips
- For best performance, only enable the hash algorithms you actually need
- SHA-256 is a good general-purpose choice balancing security and compatibility
- Use SHA-512/256 for better performance on 64-bit systems while maintaining 256-bit security
- BLAKE2b algorithms offer faster performance than SHA-2 but require the Sodium PHP extension
- The autohash setting can impact performance on sites with many files - consider using bulk generation instead
- Use strict dedupe mode if you need to prevent the same file from being uploaded simultaneously by multiple users
- Pairtree tokens are useful for distributing files across directories to avoid filesystem limitations with many files in one folder
Technical Details
Admin Pages
/admin/config/media/filehash
Configure file hash settings including which hash algorithms to use, duplicate file detection behavior, and hash generation options.
/admin/config/media/filehash/generate
Bulk generate file hashes for all previously uploaded files. This is useful when enabling the module on a site with existing files or when enabling new hash algorithms.
/admin/config/media/filehash/clean
Remove database columns for disabled hash algorithms. This permanently deletes hash data for algorithms that have been disabled.
Hooks
hook_entity_base_field_info
Adds hash value base fields to File entities for each enabled algorithm. Fields are stored as 'filehash' field type with indexed varchar_ascii columns.
hook_entity_storage_load
Generates missing hashes when files are loaded, if autohash setting is enabled.
hook_ENTITY_TYPE_create
Called when a new file entity is created. Sets initial hash values including original hash if configured.
hook_ENTITY_TYPE_presave
Called before a file entity is saved. Generates missing hashes or regenerates all hashes based on rehash setting.
hook_field_widget_single_element_form_alter
Adds FileHashDedupe upload validator to file/image field widgets when dedupe is enabled in field settings.
hook_form_FORM_ID_alter
Adds dedupe settings to file field configuration form, allowing per-field duplicate detection configuration.
hook_views_data_alter
Adds 'Has duplicate hash' filter for each enabled algorithm to Views file_managed table.
hook_token_info
Provides token information for file hashes including full hash and pairtree tokens.
hook_tokens
Generates token replacements for file hash values and pairtree components.
hook_help
Provides help text for the module on admin pages.
hook_requirements
Checks Sodium PHP extension status when BLAKE2b algorithms are enabled.
Drush Commands
drush filehash:generate
Generate hashes for all existing files that are missing hash values. Runs as a batch process.
drush filehash:clean
Remove database columns for disabled hash algorithms. Permanently deletes hash data for disabled algorithms.
drush filehash:report
Print a list of duplicate files by querying the database for files with duplicate hashes.
Troubleshooting
The BLAKE2b algorithms require the Sodium PHP extension. Check Administration > Reports > Status report for the Sodium extension status. Install the extension or use the paragonie/sodium_compat polyfill.
If a previously disabled algorithm cannot be enabled again, run cron first to complete the deletion of its database column, then enable the algorithm again.
Enable the 'Always rehash file when saving' setting to regenerate hashes whenever files are saved. By default, hashes are only generated once.
Visit /admin/config/media/filehash/generate or run 'drush filehash:generate' to generate hashes for all existing files in bulk.
The Identicon formatter requires the third-party library. Run 'composer require yzalis/identicon:^2.0' to install it.
Try running cron before proceeding. The error may occur if field deletion is still in progress.
Security Notes
- Enabling duplicate detection has privacy implications - users may be able to determine if a specific file exists on the site by attempting to upload it
- MD5 and SHA-1 are considered cryptographically weak and should only be used for compatibility with systems that require them
- For security-sensitive applications, use SHA-256 or stronger algorithms
- When dedupe is enabled, the module checks file hashes during upload and rejects files whose hash already exists on the site