Media Duplicates

Provides a framework to detect, compare, and optionally restrict duplicate media entities within a Drupal site.

media_duplicates
1,839 sites
20
drupal.org

Install

Drupal 11, 10, 9 v2.0.3
composer require 'drupal/media_duplicates:^2.0'
Drupal 8 v8.x-1.2
composer require 'drupal/media_duplicates:8.x-1.2'

Overview

Media Duplicates helps site administrators manage and control duplicate media entities in their Drupal installation. The module generates SHA-256 checksums for media source files and uses these checksums to detect when the same file or oEmbed resource is added more than once.
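The checksum idea above can be illustrated outside Drupal. The following is a minimal Python sketch, not the module's PHP implementation: it hashes file contents with SHA-256 and compares the digests. The function names and file names are hypothetical, and it only covers the file case (how the module derives a checksum for oEmbed resources is not described here).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large media files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Write three small files and report which pairs share a checksum."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp, "logo.png")        # hypothetical file names
        copy = Path(tmp, "logo-copy.png")
        other = Path(tmp, "banner.png")
        first.write_bytes(b"same bytes")
        copy.write_bytes(b"same bytes")
        other.write_bytes(b"different bytes")
        return (
            sha256_of_file(first) == sha256_of_file(copy),
            sha256_of_file(first) == sha256_of_file(other),
        )
```

Because the digest depends only on the bytes, a renamed copy of a file still produces the same checksum, which is why file names play no part in the comparison.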

The module provides a pluggable checksum system that supports various media source types including files, images, audio, video, and oEmbed content. Site administrators can configure whether to simply report duplicates or actively prevent users from creating them. Progressive enforcement options allow sites with existing duplicates to prevent new ones while still permitting edits to legacy duplicate items.

A dedicated duplicates report page shows all media entities that share the same checksum, with direct links to each duplicate item for review. When restriction is enabled, the module uses Drupal's validation system to block duplicate creation and displays an error message with links to the existing duplicate media items.

Features

  • Automatic SHA-256 checksum generation for media entities on save
  • Duplicate detection report page showing all media entities with matching checksums
  • Configurable duplicate restriction with validation constraint on media entity save
  • Progressive enforcement mode allowing existing duplicates while blocking new ones
  • Bundle-specific comparison option to only check duplicates within the same media type
  • Pluggable checksum architecture supporting custom media source types
  • Built-in support for file-based media (file, image, audio_file, video_file) and oEmbed content
  • Batch processing system for rebuilding checksums on existing media entities
  • Drush command for CLI-based checksum rebuilding with bundle filtering
  • Works alongside the Entity Usage module, which is recommended when consolidating duplicates

Use Cases

Preventing duplicate media uploads on a new site

Install Media Duplicates at the start of your project and enable 'Restrict users from creating duplicate media items'. Users will receive validation errors when attempting to upload files that already exist in the media library, with helpful links to the existing media item they can use instead.

Auditing an existing site for duplicate media

Install Media Duplicates and run the checksum rebuild process. Visit the Media duplicates report to see all groups of duplicate media items. Leave duplicate restriction disabled to avoid disrupting existing workflows while you plan cleanup.

Progressive cleanup of legacy duplicates

Enable 'Restrict users from creating duplicate media items' along with 'Only restrict duplicates on new media items'. This prevents new duplicates while allowing content editors to continue working with existing duplicate items until you can consolidate them.

Multi-bundle media with intentional duplicates

If your site uses the same image in both a 'Photo Gallery' bundle and a 'Document Attachment' bundle intentionally, enable 'Compare within same bundle only' to allow this pattern while still preventing duplicates within each bundle.
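The three settings named in the use cases above can be read as one small decision function. The Python sketch below is an illustrative model under stated assumptions, not the module's validation constraint: the `Media` and `Settings` classes, their field names, and the bundle IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Media:
    media_id: Optional[int]  # None for an item that has not been saved yet
    bundle: str
    checksum: str


@dataclass(frozen=True)
class Settings:
    restrict: bool = False          # 'Restrict users from creating duplicate media items'
    only_new: bool = False          # 'Only restrict duplicates on new media items'
    same_bundle_only: bool = False  # 'Compare within same bundle only'


def find_duplicates(candidate, existing, settings):
    """Return existing items whose checksum matches the candidate's."""
    matches = []
    for item in existing:
        if item.media_id == candidate.media_id:
            continue  # an item is never a duplicate of itself
        if item.checksum != candidate.checksum:
            continue
        if settings.same_bundle_only and item.bundle != candidate.bundle:
            continue  # matches in other bundles are ignored
        matches.append(item)
    return matches


def should_block(candidate, is_new, existing, settings):
    """Decide whether saving the candidate should fail validation."""
    if not settings.restrict:
        return False  # report-only mode
    if settings.only_new and not is_new:
        return False  # edits to legacy duplicates stay allowed
    return bool(find_duplicates(candidate, existing, settings))


# Hypothetical data: the same image already exists in two bundles.
existing = [
    Media(1, "photo_gallery", "aa"),
    Media(2, "document_attachment", "aa"),
]
new_photo = Media(None, "photo_gallery", "aa")
new_press = Media(None, "press_kit", "aa")
```

In this model, `should_block(new_photo, True, existing, Settings(restrict=True))` is true, while editing item 1 with `Settings(restrict=True, only_new=True)` is allowed, and `new_press` passes once `same_bundle_only` is set because no item in its bundle shares the checksum.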

Custom media source type support

If you have a custom media source plugin, implement hook_media_duplicates_checksum_info_alter() to associate it with an existing checksum plugin, or create a new checksum plugin extending MediaDuplicatesChecksumBase to provide custom checksum logic.

Tips

  • Install this module as early as possible in your development process to prevent duplicate media from accumulating
  • Use the Entity Usage module in conjunction with Media Duplicates to identify where duplicate media items are used before consolidating them
  • The Drush command supports processing specific bundles, which is useful for incremental rebuilds or targeted checksum updates
  • Custom checksum plugins must implement MediaDuplicatesChecksumInterface; extending MediaDuplicatesChecksumBase is the recommended way to do so and keeps plugins consistent
  • The module stores checksums in a translatable, revisionable field, so each translation/revision can have its own checksum if the source differs

Technical Details

Admin Pages
Media duplicates settings /admin/config/media/media-duplicates

Configure how the module handles duplicate media detection and restriction. This page allows administrators to enable or disable duplicate restrictions and fine-tune the comparison behavior.

Media duplicates /admin/reports/media-duplicates

Displays a comprehensive report of all duplicate media entities in the system. Shows each unique checksum that has multiple media items, along with the count and direct links to each duplicate entity.
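The grouping behind such a report can be sketched in a few lines. This Python example is a conceptual model only: the input mapping of media IDs to checksums and the shortened checksum strings are hypothetical, and the module's actual query is not shown in this document.

```python
from collections import defaultdict


def duplicate_groups(checksums_by_media_id):
    """Group media IDs by checksum and keep checksums used more than once."""
    groups = defaultdict(list)
    for media_id, checksum in checksums_by_media_id.items():
        groups[checksum].append(media_id)
    return {
        checksum: sorted(ids)
        for checksum, ids in groups.items()
        if len(ids) > 1  # a checksum with a single item is not a duplicate
    }


# Hypothetical data: media ID -> stored checksum (shortened for readability).
sample = {1: "ab12", 2: "cd34", 3: "ab12", 4: "ef56", 5: "ab12"}
report = duplicate_groups(sample)
```

Here `report` holds a single group, checksum `ab12` with media items 1, 3, and 5, which corresponds to one row of the report with a count of three.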

Rebuild checksums /admin/config/media/media-duplicates/refresh

Rebuilds checksums for all or selected media bundles. Use this form after initial module installation on a site with existing media, or after changes to checksum plugins.

Permissions
Administer media duplicates

Allows users to configure Media Duplicates settings and rebuild checksums. This is a restricted permission that should only be granted to trusted administrators.

Hooks
hook_media_duplicates_checksum_info_alter

Alters the definition of checksum plugins. Use this hook to add support for custom media types to existing checksum plugins or modify plugin definitions.
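To make the mapping concrete, here is a Python model of how a media source ID might be resolved to a checksum plugin through a 'media_types' list, and what an alter step changes. It is a sketch under assumptions, not the hook's PHP signature: the plugin IDs (`file`, `oembed`), the source ID `my_custom_source`, and the dictionary layout are hypothetical; only the 'media_types' key and the file-based source IDs come from this page.

```python
def alter_checksum_info(definitions):
    """Hypothetical alter step: let a custom source reuse the file plugin."""
    definitions["file"]["media_types"].append("my_custom_source")


def resolve_plugin(definitions, source_id):
    """Return the ID of the first plugin listing source_id, else None."""
    for plugin_id, definition in definitions.items():
        if source_id in definition["media_types"]:
            return plugin_id
    return None


# Assumed plugin definitions, keyed by a made-up plugin ID.
definitions = {
    "file": {"media_types": ["file", "image", "audio_file", "video_file"]},
    "oembed": {"media_types": ["oembed:video"]},
}

before = resolve_plugin(definitions, "my_custom_source")  # no plugin yet
alter_checksum_info(definitions)
after = resolve_plugin(definitions, "my_custom_source")   # now mapped
```

Before the alter step the custom source resolves to no plugin, which is the situation in which no checksum can be created for it; afterwards it resolves to the file-based plugin.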

Drush Commands
drush media-duplicates:checksums:rebuild

Rebuilds checksums for media entities, either for all bundles or only for the bundles you specify. Essential for initial setup on existing sites or after changes to checksum algorithms or plugins.

Troubleshooting
Media duplicates report shows no results but duplicates are known to exist

Checksums may not have been generated for existing media. Run 'drush media-duplicates:checksums:rebuild all' or use the Rebuild checksums form to generate checksums for all media entities.

Warning message 'Unable to create checksum for [type]' appears in logs

No checksum plugin exists for this media source type. Either use hook_media_duplicates_checksum_info_alter() to map it to an existing plugin, or create a custom checksum plugin for your media source.

Duplicate restriction is not working for certain media types

Some media source types may not have a matching checksum plugin. Check that the media source plugin ID matches one of the 'media_types' defined in a checksum plugin. Use hook_media_duplicates_checksum_info_alter() to add support.

Checksum rebuild is taking too long

For sites with many media items, use the Drush command with specific bundle arguments to process one bundle at a time, or run during low-traffic periods. The batch process is designed to handle large datasets but will take time proportional to the number of media entities.
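The per-bundle, batched approach described above can be modeled as follows. This Python sketch only illustrates the chunking idea; the function names, the batch size of 50, and the bundle names are hypothetical and do not reflect the module's actual batch implementation.

```python
def batches(ids, size):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def plan_rebuild(ids_by_bundle, bundles=None, size=50):
    """List (bundle, chunk) work items, optionally limited to some bundles."""
    if bundles is None:
        selected = ids_by_bundle
    else:
        selected = {bundle: ids_by_bundle[bundle] for bundle in bundles}
    return [
        (bundle, chunk)
        for bundle, ids in selected.items()
        for chunk in batches(ids, size)
    ]


# Hypothetical media IDs per bundle.
ids_by_bundle = {"image": [1, 2, 3, 4, 5], "document": [10, 11]}
full_plan = plan_rebuild(ids_by_bundle, size=2)
document_only = plan_rebuild(ids_by_bundle, bundles=["document"], size=2)
```

Limiting the plan to one bundle shrinks the number of work items, which is why rebuilding bundle by bundle keeps each run short even though the total work stays proportional to the number of media entities.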

Security Notes
  • The 'administer media duplicates' permission is marked as restricted access and should only be granted to trusted administrators
  • Checksum rebuilding bypasses access checks to ensure all media entities are processed regardless of the current user's permissions