File Hash

Generates and stores cryptographic hashes for each file uploaded to the site, enabling file verification, duplicate detection, and content-addressable storage.

Project: filehash on drupal.org, used on 2,167 sites

Install

Drupal 11 v3.2.2
composer require 'drupal/filehash:^3.2'
Drupal 10, 9, 8 v2.1.1
composer require 'drupal/filehash:^2.1'

Overview

The File Hash module generates and stores cryptographic hashes for each file uploaded to a Drupal site. Published file hashes, found on sites ranging from archive.org to wikileaks.org, allow files to be uniquely identified, allow duplicate files to be detected, and allow copies to be verified against the original source.

The module supports 18 hash algorithms across multiple cryptographic families: BLAKE2b (128, 160, 224, 256, 384, 512 bit variants), MD5, SHA-1, SHA-2 family (SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256), and SHA-3 family (SHA3-224, SHA3-256, SHA3-384, SHA3-512). Hash values are stored as base fields on File entities, making them available to Views, tokens, templates, and entity queries.
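To show what the algorithm list produces, here is a minimal Python sketch (not the module's PHP code) that computes a hex digest for each listed algorithm with hashlib. The dictionary keys are hypothetical labels. SHA-512/224 and SHA-512/256 are only available when the OpenSSL build behind Python provides them, so unsupported algorithms are skipped. Each hex digest is bits / 4 characters long, for example 64 characters for SHA-256 and 32 for BLAKE2b-128.

```python
import hashlib

# Hypothetical labels for the 18 algorithms listed above, mapped to hashlib.
# BLAKE2b variants come from setting digest_size (in bytes, i.e. bits / 8).
ALGORITHMS = {
    "blake2b_128": lambda: hashlib.blake2b(digest_size=16),
    "blake2b_160": lambda: hashlib.blake2b(digest_size=20),
    "blake2b_224": lambda: hashlib.blake2b(digest_size=28),
    "blake2b_256": lambda: hashlib.blake2b(digest_size=32),
    "blake2b_384": lambda: hashlib.blake2b(digest_size=48),
    "blake2b_512": lambda: hashlib.blake2b(digest_size=64),
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
    # SHA-512/t availability depends on the OpenSSL build behind hashlib.
    "sha512_224": lambda: hashlib.new("sha512_224"),
    "sha512_256": lambda: hashlib.new("sha512_256"),
    "sha3_224": hashlib.sha3_224,
    "sha3_256": hashlib.sha3_256,
    "sha3_384": hashlib.sha3_384,
    "sha3_512": hashlib.sha3_512,
}


def hash_bytes(data: bytes) -> dict:
    """Return {label: hex digest} for every algorithm this Python build supports."""
    digests = {}
    for label, factory in ALGORITHMS.items():
        try:
            h = factory()
        except ValueError:
            continue  # algorithm not available in this build; skip it
        h.update(data)
        digests[label] = h.hexdigest()
    return digests
```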

Key capabilities include duplicate file prevention at both site-wide and per-field levels, lazy or bulk hash generation for existing files, optional storage of original file hashes (useful when files are processed after upload), and visual hash representation via Identicons. The module integrates with Views for duplicate file filtering and provides Drush commands for command-line operations.

Features

  • Generates cryptographic hashes automatically when files are uploaded
  • Supports 18 hash algorithms including BLAKE2b, MD5, SHA-1, SHA-2, and SHA-3 families
  • Stores hashes as base fields on File entities for easy access via Views, templates, and entity queries
  • Duplicate file detection and prevention at both site-wide and per-field levels
  • Optional storage of original file hashes before any processing modifications
  • Lazy hash generation option to auto-generate missing hashes when files are loaded
  • Bulk hash generation for pre-existing files via admin UI or Drush command
  • Clean-up functionality to remove database columns for disabled algorithms
  • Token support for file hashes including pairtree tokens for content-addressable storage
  • Views integration with duplicate file filter
  • Identicon formatter for visual hash representation (requires third-party library)
  • Table formatter for displaying files with their hash values
  • Entity query support for searching files by hash value
  • Drush commands for generating hashes, cleaning up, and reporting duplicates
  • Migration support from Drupal 7

Use Cases

File Verification and Integrity Checking

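Use file hashes to verify that downloaded copies of files match the original. Display the hash on download pages so users can verify file integrity using command-line tools like sha256sum or b2sum (b2sum outputs BLAKE2b-512 by default; use its -l option, for example -l 256, for the shorter BLAKE2b variants).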
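The same check can be scripted. This Python sketch assumes the download page publishes a SHA-256 hex digest; it reads the file in chunks so large downloads need not fit in memory, and it ignores case and surrounding whitespace in the published value, much as 'sha256sum -c' would accept it.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so it need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, published_hex: str) -> bool:
    """True if the local file matches the SHA-256 value published on the page."""
    # Hashes copied from a web page may carry stray whitespace or uppercase hex.
    return file_sha256(path) == published_hex.strip().lower()
```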

Duplicate File Prevention

Enable the dedupe setting to prevent users from uploading files that already exist on the site. Useful for reducing storage costs and ensuring canonical file references. Can be enabled site-wide or per-field.
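At its core, a dedupe check looks up the new file's hash among existing hashes before accepting the upload. The Python sketch below models this with a hypothetical in-memory index; DedupeIndex and DuplicateFileError are illustrative names, not part of the module. It assumes the per-field setting only decides which fields run the check, while the lookup always covers every existing file; the module's exact semantics may differ.

```python
import hashlib


class DuplicateFileError(Exception):
    """Raised when an upload's hash matches an existing file (hypothetical)."""


class DedupeIndex:
    """Hypothetical model of hash-based duplicate rejection, not the module's code.

    site_wide=True checks every upload; otherwise only uploads through fields
    in dedupe_fields are checked. The lookup covers all known file hashes.
    """

    def __init__(self, site_wide=False, dedupe_fields=()):
        self.site_wide = site_wide
        self.dedupe_fields = set(dedupe_fields)
        self._files_by_hash = {}  # SHA-256 hex digest -> first file id seen

    def upload(self, file_id, field, data):
        digest = hashlib.sha256(data).hexdigest()
        checked = self.site_wide or field in self.dedupe_fields
        if checked and digest in self._files_by_hash:
            raise DuplicateFileError(f"duplicate of file {self._files_by_hash[digest]}")
        # Remember the first file with this content; later copies keep pointing at it.
        self._files_by_hash.setdefault(digest, file_id)
        return digest
```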

Content-Addressable Storage

Use pairtree tokens ([file:filehash-sha256-pair-1]/[file:filehash-sha256-pair-2]) to organize files in directories based on their hash. For example, a file with SHA-256 hash e3b0c44298fc1c149... would be stored in files/e3/b0/.
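The pairtree example above can be reproduced with a small Python helper. pairtree_dir is a hypothetical name; it assumes pair 1 is the first two hex characters of the hash and pair 2 the next two, as in the e3/b0 example.

```python
import hashlib
from pathlib import PurePosixPath


def pairtree_dir(hex_digest: str, pairs: int = 2, base: str = "files") -> PurePosixPath:
    """Return base/<pair 1>/<pair 2>/... built from the leading hex pairs of a hash."""
    parts = [hex_digest[i:i + 2] for i in range(0, 2 * pairs, 2)]
    return PurePosixPath(base, *parts)


digest = hashlib.sha256(b"").hexdigest()  # SHA-256 of an empty file
print(pairtree_dir(digest))  # files/e3/b0
```

With two pairs of hex characters there are 16^4 = 65,536 possible leaf directories, which keeps each directory small even when a site holds a very large number of files.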

Finding Duplicate Files

Use the Drush report command (drush filehash:report) or create a View with the 'Has duplicate hash' filter to identify duplicate files on the site for cleanup or consolidation.
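Conceptually, a duplicate report groups files by their stored hash and keeps only groups with more than one member, the same idea as an SQL GROUP BY on the hash column with HAVING COUNT(*) > 1. A minimal Python sketch, assuming a hypothetical mapping of file IDs to stored SHA-256 values (this is not how the Drush command is implemented):

```python
from collections import defaultdict


def duplicate_groups(stored_hashes: dict) -> dict:
    """Map each hash shared by two or more files to the sorted IDs of those files.

    stored_hashes is a hypothetical {file_id: hex_hash} mapping read from storage.
    """
    by_hash = defaultdict(list)
    for file_id, digest in stored_hashes.items():
        by_hash[digest].append(file_id)
    # Hashes seen only once belong to unique files and are left out of the report.
    return {digest: sorted(ids) for digest, ids in by_hash.items() if len(ids) > 1}


print(duplicate_groups({1: "aa11", 2: "bb22", 3: "aa11", 4: "aa11"}))
# {'aa11': [1, 3, 4]}
```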

File Search by Hash

Use entity queries to find files by their hash value. Useful for programmatically locating specific files or checking if a file with a particular hash already exists.
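Because hash values are kept in indexed database columns, a lookup by hash can use the index instead of scanning every file. The Python and SQLite sketch below illustrates the idea with a hypothetical single-table schema; the table name file and the columns fid, uri and sha256 are illustrative, not the module's real storage layout, and the sample hash values are placeholders.

```python
import sqlite3

# Hypothetical schema: a hash column with its own index, loosely mirroring the
# module's indexed hash columns (not its real table or column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (fid INTEGER PRIMARY KEY, uri TEXT, sha256 TEXT)")
conn.execute("CREATE INDEX file_sha256 ON file (sha256)")


def find_files_by_hash(conn: sqlite3.Connection, sha256_hex: str) -> list:
    """Return IDs of files whose stored SHA-256 equals the given value."""
    rows = conn.execute(
        "SELECT fid FROM file WHERE sha256 = ? ORDER BY fid",
        (sha256_hex.strip().lower(),),  # stored values assumed to be lowercase hex
    )
    return [fid for (fid,) in rows]


conn.executemany(
    "INSERT INTO file (fid, uri, sha256) VALUES (?, ?, ?)",
    [(1, "a.txt", "ab12"), (2, "b.txt", "cd34"), (3, "c.txt", "ab12")],
)
print(find_files_by_hash(conn, "AB12"))  # [1, 3]
```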

Audit Trail and Forensics

Store original file hashes separately from current hashes when files are processed (e.g., image resizing). This allows verification of both the original uploaded file and its current state.
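The relationship between the two values can be illustrated in Python. The record keys original_sha256 and sha256 and the process() step are hypothetical stand-ins for the module's stored fields and for real processing such as image resizing.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def process(data: bytes) -> bytes:
    """Hypothetical stand-in for post-upload processing such as image resizing."""
    return data[: len(data) // 2]


uploaded = b"bytes exactly as the user uploaded them"
record = {"original_sha256": sha256_hex(uploaded)}  # captured once, at upload
stored = process(uploaded)
record["sha256"] = sha256_hex(stored)  # describes the file as it exists now

# The original hash still verifies the uploader's copy; the current hash
# verifies the file the site actually serves.
print(record["original_sha256"] == record["sha256"])  # False
```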

Tips

  • For best performance, only enable the hash algorithms you actually need
  • SHA-256 is a good general-purpose choice balancing security and compatibility
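  • Use SHA-512/256 for a 256-bit digest with security comparable to SHA-256; it is often faster than SHA-256 on 64-bit systems because it uses SHA-512's 64-bit operations (CPUs with SHA-256 hardware acceleration may reverse this)
  • BLAKE2b algorithms are typically faster than SHA-2 with the native Sodium PHP extension, which they require (the paragonie/sodium_compat polyfill also works but is slower)
  • The autohash setting generates missing hashes when files are loaded, which can slow page loads on sites with many unhashed files; consider bulk generation instead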
  • Use strict dedupe mode if you need to prevent the same file from being uploaded simultaneously by multiple users
  • Pairtree tokens are useful for distributing files across directories to avoid filesystem limitations with many files in one folder
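The relative speed claims in these tips depend on the CPU and on the hash implementation, so they are worth measuring on your own hardware. This rough Python sketch times single-threaded hashing with hashlib; sha512 serves as a proxy for SHA-512/256, which uses the same compression function. Python's numbers will not match PHP's, so treat the output only as a relative indication.

```python
import hashlib
import timeit


def throughput_mb_s(name: str, size: int = 8 * 1024 * 1024, repeat: int = 3) -> float:
    """Best-of-N single-threaded hashing throughput for a hashlib algorithm, in MB/s."""
    data = bytes(size)  # zero-filled buffer; content does not change hashing speed
    timings = timeit.repeat(lambda: hashlib.new(name, data).digest(), number=1, repeat=repeat)
    return size / min(timings) / 1e6


# sha512 stands in for SHA-512/256, which shares its compression function.
for name in ("sha256", "sha512", "blake2b", "sha3_256"):
    print(f"{name:9} {throughput_mb_s(name):9.1f} MB/s")
```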

Technical Details

Admin Pages
File hash settings /admin/config/media/filehash

Configure file hash settings including which hash algorithms to use, duplicate file detection behavior, and hash generation options.

Generate /admin/config/media/filehash/generate

Bulk generate file hashes for all previously uploaded files. This is useful when enabling the module on a site with existing files or when enabling new hash algorithms.
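Processing existing files in batches keeps each step within time and memory limits. The Python sketch below shows the idea under an assumed data model (a list of dicts and a read_bytes callback, both hypothetical): it skips files that already have a hash and yields a running count after each batch, similar to a progress bar.

```python
import hashlib


def generate_missing(files, read_bytes, batch_size=50):
    """Fill in missing SHA-256 values batch by batch (hypothetical data model).

    files: list of dicts with a 'fid' key and an optional 'sha256' key.
    read_bytes: callable returning a file's contents for a given fid.
    Yields the number of files processed so far after each batch.
    """
    pending = [f for f in files if not f.get("sha256")]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for f in batch:
            f["sha256"] = hashlib.sha256(read_bytes(f["fid"])).hexdigest()
        yield start + len(batch)
```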

Clean up /admin/config/media/filehash/clean

Remove database columns for disabled hash algorithms. This permanently deletes hash data for algorithms that have been disabled.

Hooks
hook_entity_base_field_info

Adds a hash value base field to File entities for each enabled algorithm. The fields use the 'filehash' field type and are stored in indexed varchar_ascii columns.

hook_entity_storage_load

Generates missing hashes when files are loaded, if the autohash setting is enabled.

hook_ENTITY_TYPE_create

Called when a new file entity is created. Sets initial hash values, including the original file hash if that option is enabled.

hook_ENTITY_TYPE_presave

Called before a file entity is saved. Generates any missing hashes, or regenerates all hashes if the 'Always rehash file when saving' setting is enabled.
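The difference between the default save behavior, the rehash setting, and autohash on load can be sketched in Python. ENABLED, presave() and on_load() are hypothetical names that loosely mirror the hooks described here; they are not the module's API.

```python
import hashlib

ENABLED = ("sha256", "sha3_256")  # hypothetical set of enabled algorithms


def presave(record: dict, data: bytes, rehash: bool = False) -> dict:
    """Fill in missing hashes; with rehash=True, recompute every enabled hash."""
    for algo in ENABLED:
        if rehash or not record.get(algo):
            record[algo] = hashlib.new(algo, data).hexdigest()
    return record


def on_load(record: dict, data: bytes, autohash: bool = False) -> dict:
    """Lazy generation: missing hashes are filled at load time only if autohash is on."""
    return presave(record, data) if autohash else record


r = presave({}, b"version 1")
r = presave(r, b"version 2")               # default: existing hashes are kept
r = presave(r, b"version 2", rehash=True)  # rehash: values now match version 2
```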

hook_field_widget_single_element_form_alter

Adds the FileHashDedupe upload validator to file and image field widgets when dedupe is enabled in the field settings.

hook_form_FORM_ID_alter

Adds dedupe settings to the file field configuration form, allowing duplicate detection to be configured per field.

hook_views_data_alter

Adds a 'Has duplicate hash' filter for each enabled algorithm to the Views data for the file_managed table.

hook_token_info

Provides token information for file hashes including full hash and pairtree tokens.

hook_tokens

Generates token replacements for file hash values and pairtree components.

hook_help

Provides help text for the module on admin pages.

hook_requirements

Reports the Sodium PHP extension status on the status report when BLAKE2b algorithms are enabled.

Drush Commands
drush filehash:generate

Generate hashes for all existing files that are missing hash values. Runs as a batch process.

drush filehash:clean

Remove database columns for disabled hash algorithms. Permanently deletes hash data for disabled algorithms.

drush filehash:report

Print a list of duplicate files by querying the database for files with duplicate hashes.

Troubleshooting
BLAKE2b hashes are not being generated

The BLAKE2b algorithms require the Sodium PHP extension. Check Administration > Reports > Status report for the Sodium extension status. Install the extension or use the paragonie/sodium_compat polyfill.

Error when re-enabling an algorithm after disabling it

Run cron first to complete the deletion of the database column, then enable the algorithm again.

Hashes not updating when files are modified

Enable the 'Always rehash file when saving' setting to regenerate hashes whenever files are saved. By default, hashes are only generated once.

Missing hashes for existing files

Visit /admin/config/media/filehash/generate or run 'drush filehash:generate' to generate hashes for all existing files in bulk.

Identicon formatter shows error

The Identicon formatter requires the third-party yzalis/identicon library. Run 'composer require yzalis/identicon:^2.0' to install it.

Entity storage error when cleaning up columns

Try running cron before proceeding. The error may occur if field deletion is still in progress.

Security Notes
  • Enabling duplicate detection has privacy implications - users may be able to determine if a specific file exists on the site by attempting to upload it
  • MD5 and SHA-1 are considered cryptographically weak and should only be used for compatibility with systems that require them
  • For security-sensitive applications, use SHA-256 or stronger algorithms
  • The module validates file hashes during upload to prevent duplicate file attacks