Media Duplicates
Provides a framework to detect, compare, and optionally restrict duplicate media entities within a Drupal site.
media_duplicates
Install
composer require 'drupal/media_duplicates:^2.0'
composer require 'drupal/media_duplicates:8.x-1.2'
Overview
Media Duplicates helps site administrators manage and control duplicate media entities in their Drupal installation. The module generates a SHA256 checksum for each media source and uses these checksums to detect when the same file or oEmbed resource is added more than once.
The module provides a pluggable checksum system that supports various media source types including files, images, audio, video, and oEmbed content. Site administrators can configure whether to simply report duplicates or actively prevent users from creating them. Progressive enforcement options allow sites with existing duplicates to prevent new ones while still permitting edits to legacy duplicate items.
A dedicated duplicates report page shows all media entities that share the same checksum, with a direct link to each duplicate item for review. When restriction is enabled, the module uses Drupal's validation system to block the creation of duplicates and displays an error message that links to the existing media items.
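The checksum comparison described above can be illustrated outside Drupal. The following is a minimal sketch in plain PHP, not the module's code: the function name find_duplicate_files() is hypothetical, and it only covers local files (not oEmbed resources). It hashes each file with SHA256 and keeps the checksums shared by more than one file, which is the same grouping the duplicates report presents.

```php
<?php

/**
 * Groups file paths by SHA256 checksum and returns only the duplicates.
 *
 * Hypothetical helper for illustration; not part of Media Duplicates.
 *
 * @param string[] $paths
 *   Paths of the files to compare.
 *
 * @return array<string, string[]>
 *   Checksum => list of paths, for checksums shared by two or more files.
 */
function find_duplicate_files(array $paths): array {
  $groups = [];
  foreach ($paths as $path) {
    // Skip anything that is not a readable regular file.
    if (!is_file($path) || !is_readable($path)) {
      continue;
    }
    // hash_file() streams the file, so large media files are not loaded
    // into memory all at once.
    $checksum = hash_file('sha256', $path);
    if ($checksum === FALSE) {
      continue;
    }
    $groups[$checksum][] = $path;
  }
  // Keep only the checksums that occur more than once.
  return array_filter($groups, fn(array $group): bool => count($group) > 1);
}
```

Two files with identical bytes always produce the same checksum, regardless of their file names, which is why renamed copies of the same upload are still reported as duplicates.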
Features
- Automatic checksum generation for media entities using SHA256 hashing on save
- Duplicate detection report page showing all media entities with matching checksums
- Configurable duplicate restriction with validation constraint on media entity save
- Progressive enforcement mode allowing existing duplicates while blocking new ones
- Bundle-specific comparison option to only check duplicates within the same media type
- Pluggable checksum architecture supporting custom media source types
- Built-in support for file-based media (file, image, audio_file, video_file) and oEmbed content
- Batch processing system for rebuilding checksums on existing media entities
- Drush command for CLI-based checksum rebuilding with bundle filtering
- Works alongside the Entity Usage module, which is recommended when consolidating duplicates
Use Cases
Preventing duplicate media uploads on a new site
Install Media Duplicates at the start of your project and enable 'Restrict users from creating duplicate media items'. Users will receive validation errors when attempting to upload files that already exist in the media library, with helpful links to the existing media item they can use instead.
Auditing an existing site for duplicate media
Install Media Duplicates and run the checksum rebuild process. Visit the Media duplicates report to see all groups of duplicate media items. Leave duplicate restriction disabled to avoid disrupting existing workflows while you plan cleanup.
Progressive cleanup of legacy duplicates
Enable 'Restrict users from creating duplicate media items' along with 'Only restrict duplicates on new media items'. This prevents new duplicates while allowing content editors to continue working with existing duplicate items until you can consolidate them.
Multi-bundle media with intentional duplicates
If your site uses the same image in both a 'Photo Gallery' bundle and a 'Document Attachment' bundle intentionally, enable 'Compare within same bundle only' to allow this pattern while still preventing duplicates within each bundle.
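How the three settings named in the use cases above combine can be sketched as a small decision function. This is an illustration under assumptions, not the module's implementation: the function should_block_duplicate() and the setting keys 'restrict', 'only_new', and 'same_bundle' are hypothetical stand-ins for the options 'Restrict users from creating duplicate media items', 'Only restrict duplicates on new media items', and 'Compare within same bundle only'.

```php
<?php

/**
 * Decides whether saving a media item should be rejected as a duplicate.
 *
 * Hypothetical helper; setting keys and parameters are illustrative only.
 *
 * @param array $settings
 *   Boolean flags: 'restrict', 'only_new', 'same_bundle'.
 * @param bool $is_new
 *   TRUE when the media item is being created rather than edited.
 * @param string $bundle
 *   Bundle (media type) of the item being saved.
 * @param string[] $matching_bundles
 *   Bundles of the other media items that share the same checksum.
 */
function should_block_duplicate(array $settings, bool $is_new, string $bundle, array $matching_bundles): bool {
  // Reporting-only mode: never block.
  if (empty($settings['restrict'])) {
    return FALSE;
  }
  // Progressive enforcement: edits to existing items stay allowed.
  if (!empty($settings['only_new']) && !$is_new) {
    return FALSE;
  }
  // Bundle-specific comparison: ignore matches in other bundles.
  if (!empty($settings['same_bundle'])) {
    $matching_bundles = array_filter(
      $matching_bundles,
      fn(string $other): bool => $other === $bundle
    );
  }
  return count($matching_bundles) > 0;
}
```

With all three flags enabled, a new 'photo_gallery' item whose only checksum match lives in 'document_attachment' is allowed, while a second identical item in 'photo_gallery' is blocked.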
Custom media source type support
If you have a custom media source plugin, implement hook_media_duplicates_checksum_info_alter() to associate it with an existing checksum plugin, or create a new checksum plugin extending MediaDuplicatesChecksumBase to provide custom checksum logic.
Tips
- Install this module as early as possible in your development process to prevent duplicate media from accumulating
- Use the Entity Usage module in conjunction with Media Duplicates to identify where duplicate media items are used before consolidating them
- The Drush command supports processing specific bundles, which is useful for incremental rebuilds or targeted checksum updates
- Custom checksum plugins must implement MediaDuplicatesChecksumInterface and extend MediaDuplicatesChecksumBase for consistency
- The module stores checksums in a translatable, revisionable field, so each translation/revision can have its own checksum if the source differs
Technical Details
Admin Pages
/admin/config/media/media-duplicates
Configure how the module handles duplicate media detection and restriction. This page allows administrators to enable or disable duplicate restrictions and fine-tune the comparison behavior.
/admin/reports/media-duplicates
Displays a comprehensive report of all duplicate media entities in the system. Shows each unique checksum that has multiple media items, along with the count and direct links to each duplicate entity.
/admin/config/media/media-duplicates/refresh
Rebuilds checksums for all or selected media bundles. Use this form after initial module installation on a site with existing media, or after changes to checksum plugins.
Permissions
administer media duplicates
Hooks
hook_media_duplicates_checksum_info_alter
Alters the definition of checksum plugins. Use this hook to add support for custom media types to existing checksum plugins or modify plugin definitions.
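A hook implementation that maps a custom media source to an existing checksum plugin might look like the sketch below. Several details are assumptions rather than confirmed API: the hook is assumed to receive the plugin definitions as an array by reference (the usual shape of Drupal info-alter hooks), the checksum plugin ID 'file' and the media source ID 'my_custom_source' are hypothetical, and 'mymodule' stands for your own module's machine name. Only the 'media_types' key is taken from this page.

```php
<?php

/**
 * Implements hook_media_duplicates_checksum_info_alter().
 *
 * Sketch only: the parameter shape and the 'file' plugin ID are assumptions.
 */
function mymodule_media_duplicates_checksum_info_alter(array &$definitions): void {
  // Hypothetical IDs; replace with the real plugin and media source IDs.
  $plugin_id = 'file';
  $source_id = 'my_custom_source';

  // Do nothing if the checksum plugin we want to extend is not defined.
  if (!isset($definitions[$plugin_id])) {
    return;
  }

  // Add the custom media source to the plugin's supported types once.
  $media_types = $definitions[$plugin_id]['media_types'] ?? [];
  if (!in_array($source_id, $media_types, TRUE)) {
    $media_types[] = $source_id;
  }
  $definitions[$plugin_id]['media_types'] = $media_types;
}
```

After adding or changing a hook like this, clear caches and rebuild checksums so that existing media of the custom source type receive a checksum.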
Drush Commands
drush media-duplicates:checksums:rebuild
Rebuilds all checksums for media entities. Essential for initial setup on existing sites or after changes to checksum algorithms or plugins.
Troubleshooting
Checksums may not have been generated for existing media. Run 'drush media-duplicates:checksums:rebuild all' or use the Rebuild checksums form to generate checksums for all media entities.
No checksum plugin exists for this media source type. Either use hook_media_duplicates_checksum_info_alter() to map it to an existing plugin, or create a custom checksum plugin for your media source.
Some media source types may not have a matching checksum plugin. Check that the media source plugin ID matches one of the 'media_types' defined in a checksum plugin. Use hook_media_duplicates_checksum_info_alter() to add support.
For sites with many media items, use the Drush command with specific bundle arguments to process one bundle at a time, or run during low-traffic periods. The batch process is designed to handle large datasets but will take time proportional to the number of media entities.
Security Notes
- The 'administer media duplicates' permission is marked as restricted access and should only be granted to trusted administrators
- Checksum rebuilding bypasses access checks to ensure all media entities are processed regardless of the current user's permissions