CKEditor 5 Paste Filter

Filters content pasted into CKEditor 5 through configurable regular expression-based search and replace rules.

ckeditor5_paste_filter
13,057 sites
30
drupal.org

Install

Drupal 11, 10, 9 v1.1.1
composer require 'drupal/ckeditor5_paste_filter:^1.1'

Overview

CKEditor 5 Paste Filter allows you to clean and sanitize content pasted into CKEditor 5 text areas using a customizable set of regular expression-based filters. Each filter consists of a JavaScript regular expression pattern and a replacement string.
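
As a rough illustration of the pattern-plus-replacement idea, the sketch below applies a single filter to a pasted HTML string. The object shape (`search`, `replace`) and the function name `applyFilter` are assumptions made for this example, not the module's actual internals.

```javascript
// Hypothetical filter shape: the property names are assumptions for illustration.
const classFilter = { search: ' class="[^"]*"', replace: '' };

// Apply one filter to an HTML string.
function applyFilter(html, { search, replace }) {
  // The 'g' flag replaces every match rather than only the first one.
  return html.replace(new RegExp(search, 'g'), replace);
}

const pastedHtml = '<p class="MsoNormal">Hello <b class="x">world</b></p>';
const withoutClasses = applyFilter(pastedHtml, classFilter);

console.log(withoutClasses); // <p>Hello <b>world</b></p>
```

An empty replacement string deletes every match, which is how most cleanup-style filters would work.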

The most common use case is cleaning up the messy markup generated by rich text sources such as Microsoft Word and Google Docs. While Drupal core's CKEditor 5 includes some paste cleanup functionality, it does not provide the comprehensive filtering capabilities that were available in CKEditor 4's 'Paste from Word' feature. This module fills that gap by providing configurable regex-based content filtering.

This module is the CKEditor 5 successor to the CKEditor Paste Filter module. It has been created as a separate project to allow sites transitioning from CKEditor 4 to CKEditor 5 to have both modules installed simultaneously during the migration process.

Features

  • Filters pasted content in CKEditor 5 using JavaScript regular expressions with full regex support (global, ignoreCase, multiline, dotAll, and unicode flags)
  • Comes pre-configured with 13 default filters optimized for cleaning up Microsoft Word and Google Docs content
  • Removes unwanted Office-specific markup including <o:p> tags, style attributes, class attributes, face attributes, and valign attributes
  • Strips unnecessary font and span tags while preserving meaningful formatting
  • Removes empty paragraph, bold, and italic elements
  • Cleans up OLE_LINK anchor tags inserted by Microsoft Office
  • Fully customizable filter configuration: add, edit, remove, enable/disable, and reorder filters via drag-and-drop
  • Replacement strings support capture group references ($1, $2, etc.) from the search expression
  • Configurable per text format - different formats can have different paste filtering rules
  • Graceful error handling for invalid regular expressions with console error logging

Use Cases

Cleaning content pasted from Microsoft Word

When content editors copy and paste text from Microsoft Word documents, the pasted content typically includes excessive markup such as inline styles, Office-specific XML tags (<o:p>), font and span tags, class attributes, and other formatting that clutters the HTML. The default filters remove these elements, preserving only semantic markup like paragraphs, bold, italic, and links.
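
A minimal sketch of this kind of cleanup follows. The four patterns are illustrative only and are not the module's shipped defaults; the helper name `cleanPaste` and the sample markup are likewise assumptions.

```javascript
// Illustrative patterns only; these are NOT the module's default filters.
const wordFilters = [
  { search: '<o:p>[\\s\\S]*?</o:p>', replace: '' },       // Office-specific tags and their content
  { search: '\\s+(?:style|class)="[^"]*"', replace: '' }, // inline styles and class names
  { search: '</?span[^>]*>', replace: '' },               // unwrap span tags, keeping their text
  { search: '<p>(?:\\s|&nbsp;)*</p>', replace: '' },      // paragraphs left empty by earlier steps
];

// Run each filter in array order, feeding the output of one into the next.
function cleanPaste(html, filters) {
  return filters.reduce(
    (result, { search, replace }) => result.replace(new RegExp(search, 'g'), replace),
    html
  );
}

const fromWord =
  '<p class="MsoNormal" style="margin:0in"><span style="font-family:Calibri">' +
  'Quarterly <b>report</b></span><o:p></o:p></p>' +
  '<p class="MsoNormal"><o:p>&nbsp;</o:p></p>';

const cleanedWord = cleanPaste(fromWord, wordFilters);
console.log(cleanedWord); // <p>Quarterly <b>report</b></p>
```

The paragraph and bold tags survive because no pattern targets them; only the attributes, wrappers, and empty elements are removed.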

Cleaning content pasted from Google Docs

Similar to Word, Google Docs generates unnecessary span tags, style attributes, and class names when content is copied. The paste filter strips these while maintaining the document's semantic structure.

Custom content transformations on paste

Beyond cleanup, the regex-based system can be used for custom transformations. For example, automatically converting certain text patterns, fixing common formatting issues, or standardizing markup according to site-specific requirements.
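
For instance, a site might standardize tag names and link schemes on paste. The sketch below shows two such rules using capture group references in the replacement string; the rules, the `example.com` domain, and the helper name `transformPaste` are hypothetical.

```javascript
// Hypothetical transformation rules, written for illustration.
const transformFilters = [
  // $1 carries the optional closing slash, so one rule covers <b> and </b>.
  { search: '<(/?)b>', replace: '<$1strong>' },
  // $1 carries the optional "www." prefix; an unmatched group becomes an empty string.
  { search: 'href="http://(www\\.)?example\\.com', replace: 'href="https://$1example.com' },
];

function transformPaste(html, filters) {
  return filters.reduce(
    (result, { search, replace }) => result.replace(new RegExp(search, 'g'), replace),
    html
  );
}

const beforeTransform = '<p><b>Docs</b> at <a href="http://example.com/guide">guide</a></p>';
const afterTransform = transformPaste(beforeTransform, transformFilters);

console.log(afterTransform);
// <p><strong>Docs</strong> at <a href="https://example.com/guide">guide</a></p>
```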

Enforcing clean HTML in text areas

Organizations with strict content standards can configure filters to ensure all pasted content meets their HTML cleanliness requirements, preventing inline styles and unnecessary attributes from entering the content database.

Migration from CKEditor 4

Sites that relied on CKEditor 4's 'Paste from Word' functionality can use this module to replicate and customize that behavior in CKEditor 5, making the transition smoother for content editors accustomed to the cleanup features.

Tips

  • Test regular expressions in the browser console before adding them to the configuration. The filters use the same syntax as JavaScript's RegExp constructor, so a pattern that behaves correctly there should behave the same way in a filter.
  • The replacement string can reference capture groups from the search pattern using $1, $2, etc. This is useful for preserving parts of matched content while modifying surrounding markup.
  • Filters run in weight order, so complex cleanup operations can be broken into multiple sequential filters for better maintainability.
  • To remove all default filters and start fresh, simply empty the search expression field for each row and save - empty rows are automatically removed.
  • When debugging, temporarily disable filters one by one to identify which one is causing unexpected behavior.
  • Characters with a special meaning in regex patterns must be preceded by a backslash to match them literally. These include: \ . + * ? ^ $ { } [ ] ( ) | /
  • The 'g' (global) flag is automatically applied, so all matches in the pasted content are replaced, not just the first occurrence.
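
The escaping and global-flag tips above can be tried in the browser console with a sketch like the following. It assumes the search expression is passed to the RegExp constructor as typed; `String.raw` is used here only so the backslashes in the code match what would be entered in the search field.

```javascript
// String.raw keeps backslashes exactly as typed.
const looseDot = new RegExp(String.raw`v1.1`, 'g');    // '.' matches any character
const literalDot = new RegExp(String.raw`v1\.1`, 'g'); // '\.' matches only a literal dot

const versions = 'v1x1 v1.1';
const looseResult = versions.replace(looseDot, '#');
const literalResult = versions.replace(literalDot, '#');
console.log(looseResult);   // # #
console.log(literalResult); // v1x1 #

// Effect of the global flag on a pattern that matches more than once.
const markup = '<p style="a">x</p><p style="b">y</p>';
const stylePattern = String.raw` style="[^"]*"`;
const firstOnly = markup.replace(new RegExp(stylePattern), '');
const allMatches = markup.replace(new RegExp(stylePattern, 'g'), '');
console.log(firstOnly);  // <p>x</p><p style="b">y</p>
console.log(allMatches); // <p>x</p><p>y</p>
```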

Technical Details

Admin Pages
Configure text format - Paste Filter settings: /admin/config/content/formats/manage/{format_name}

Configure the CKEditor 5 Paste Filter plugin within a text format's CKEditor 5 settings. The paste filter configuration appears as a vertical tab under 'CKEditor 5 plugin settings' when CKEditor 5 is selected as the text editor.

Troubleshooting
Filters are not being applied to pasted content

Ensure the 'Filter pasted content' checkbox is enabled in the text format configuration. Also verify that individual filters in the table have their 'Enabled' checkbox checked.

Invalid regular expression error appears in browser console

Check the syntax of the search expression. Special regex characters (such as forward slashes) must be escaped with a backslash when they are meant to match literally. The error message identifies which search expression is invalid.
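
One way such error handling could work is sketched below: each pattern is compiled inside a try/catch, invalid ones are logged and skipped, and the remaining filters still run. The function name `compileFilters` and the log message format are assumptions, not the module's actual code or output.

```javascript
// Compile each filter, skipping any whose pattern is not a valid regular expression.
function compileFilters(filters) {
  const compiled = [];
  for (const { search, replace } of filters) {
    try {
      compiled.push({ regex: new RegExp(search, 'g'), replace });
    } catch (error) {
      // new RegExp throws a SyntaxError for an invalid pattern.
      console.error(`Invalid search expression "${search}": ${error.message}`);
    }
  }
  return compiled;
}

const compiledFilters = compileFilters([
  { search: '<font[^>]*>', replace: '' },
  { search: '<span(', replace: '' }, // unterminated group, so this pattern is invalid
]);

console.log(compiledFilters.length); // 1
```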

Filters are running in unexpected order

Check the weight values for each filter. Filters with lower weights run first. Use the drag-and-drop handles or adjust weight values to control execution order.
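
To see why order matters, consider the sketch below, in which one filter removes span tags and another removes empty paragraphs. The `weight` property name and both helper functions are assumptions for illustration.

```javascript
// Hypothetical filters; lower weight is meant to run first.
const weightedFilters = [
  { weight: 10, search: '<p>\\s*</p>', replace: '' },   // remove empty paragraphs
  { weight: 0, search: '</?span[^>]*>', replace: '' },  // unwrap span tags
];

// Apply filters in the order given.
function applyInOrder(html, filters) {
  return filters.reduce(
    (result, { search, replace }) => result.replace(new RegExp(search, 'g'), replace),
    html
  );
}

// Sort a copy by ascending weight, then apply.
function applyByWeight(html, filters) {
  return applyInOrder(html, [...filters].sort((a, b) => a.weight - b.weight));
}

const orderInput = '<p><span> </span></p><p>Text</p>';
const byWeight = applyByWeight(orderInput, weightedFilters);
const reversed = applyInOrder(orderInput, [...weightedFilters].sort((a, b) => b.weight - a.weight));

console.log(byWeight); // <p>Text</p>
console.log(reversed); // <p> </p><p>Text</p>
```

In the reversed run, the empty-paragraph filter executes before the span is unwrapped, so the first paragraph is not yet empty and survives.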

Filter configuration not saving

Ensure both the search expression and replacement fields are filled appropriately. Rows with empty search expressions are automatically removed on save.

Content still contains unwanted formatting after paste

Add custom filters to target the specific unwanted patterns. Use browser developer tools to inspect the pasted HTML and create appropriate regex patterns. Consider that some markup may be added by CKEditor 5 itself after filtering.

Security Notes
  • Regular expressions run in the browser's JavaScript context against pasted content. Because they act on content being edited rather than content being displayed, the exposure is limited, but in multi-user environments only trusted users should be able to configure patterns.
  • This module is covered by the Drupal Security Team's security advisory policy, meaning vulnerabilities reported in its stable releases are handled through the team's coordinated process. Coverage does not by itself indicate that the code has been audited.