File Type Detection in IntelliJ  Custom Language Development

This post explains how IntelliJ's file type detection works. It outlines a solution to detect a file type based on content alone. It then demonstrates an alternative, less efficient solution that detects the file type based on more than a file's content.

Introduction

Each VirtualFile has an attached file type. Associating files with types by file extension is common practice. IntelliJ comes with a predefined mapping of file extensions to file types. Plugins add their own extensions to this mapping.

This rather simple model isn’t suitable for everyone. For example, the following use cases need more than the extension alone:

  • you want to look at content to detect the file type
  • you want to change the type based on the file’s location in the file system
  • you want to support files without an extension

There are two different ways to implement these cases. The first, FileTypeRegistry.FileTypeDetector, is based solely on filename and content. The second, FileTypeIdentifiableByVirtualFile, allows detection based on more properties but is less efficient.

Detecting a file type based on content alone

Implement the interface FileTypeRegistry.FileTypeDetector to detect a file type by content.

The method detect(...) is called by IntelliJ to compute the file type:

    /**
     * Detects file type by its content. Detection must rely only on the filename and content.
     * @param file to analyze
     * @param firstBytes of the file for identifying its file type
     * @param firstCharsIfText - characters, converted from first bytes parameter
     *                           if the file content was determined to be text,
     *                           or null otherwise
     * @return detected file type, or null if it was unable to detect
     */
    @Override
    public FileType detect(@NotNull VirtualFile file,
                           @NotNull ByteSequence firstBytes,
                           @Nullable CharSequence firstCharsIfText) {
        // Inspect firstBytes or firstCharsIfText here and return the matching file type.
        return null; // null means "not detected"
    }
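To make the idea of content-based detection concrete, here is a minimal sketch of the kind of check a detect(...) implementation might run on the first bytes of a file. It is plain Java and deliberately uses no IntelliJ classes: the class ImageSniffer and its method sniff are hypothetical names, a byte[] stands in for ByteSequence, and a String stands in for the returned FileType.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java sketch of content sniffing; ImageSniffer is a hypothetical name. */
final class ImageSniffer {
    // Well-known signatures found at the very start of PNG, GIF and JPEG files.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] GIF = "GIF8".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    private ImageSniffer() {}

    /** Returns "PNG", "GIF" or "JPEG", or null if no signature matches. */
    static String sniff(byte[] firstBytes) {
        if (startsWith(firstBytes, PNG)) return "PNG";
        if (startsWith(firstBytes, GIF)) return "GIF";
        if (startsWith(firstBytes, JPEG)) return "JPEG";
        return null; // unknown content: leave the decision to someone else
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The check depends only on the bytes passed in, which matches the contract of detect(...): nothing outside the file name and content is consulted.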

The results are cached. The method won’t be called again as long as the file doesn’t change. When a VirtualFile is renamed, moved or modified, your implementation is called again to re-evaluate the file type.

It’s perfectly fine if you return a different file type than before.

Because the results are cached, you have to notify IntelliJ when you change the detection logic of detect(...). This is done with int getVersion(). Increment the return value each time you modify the logic of your detect(...) method. Never return a smaller value than before because that might mix up the caches.

A change to the version basically resets all cached file types. IntelliJ then re-evaluates the types of all files of a project. So be careful that you change it only when necessary. All your users will see the IntelliJ is indexing... message when they open a project for the first time after updating your plugin.
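The effect of a version bump can be pictured with a toy cache. The sketch below is only a simplified model and not how IntelliJ actually stores cached file types; VersionedTypeCache, typeOf and detectCalls are hypothetical names, and paths and types are plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model (an assumption, not IntelliJ code): a cache that is reset when the version changes. */
final class VersionedTypeCache {
    private final Map<String, String> typeByPath = new HashMap<>();
    private int knownVersion = -1;
    int detectCalls = 0; // counts how often the detector really runs

    String typeOf(String path, int detectorVersion, Function<String, String> detector) {
        if (detectorVersion != knownVersion) {
            // A new version invalidates every cached result at once.
            typeByPath.clear();
            knownVersion = detectorVersion;
        }
        return typeByPath.computeIfAbsent(path, p -> {
            detectCalls++;
            return detector.apply(p);
        });
    }
}
```

Asking twice for the same path with the same version runs the detector once. Asking again with a higher version runs it a second time, which is the per-file cost behind the re-indexing your users see.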

Be careful not to access properties beyond file name and content. For example, it would be possible to access the file’s parent directory or your plugin’s settings in detect(). This works on the first run but returns stale results after the parent directory or the settings have changed, because your implementation isn’t called again after such changes.

Configuration in your plugin.xml

The configuration is straightforward:

<extensions defaultExtensionNs="com.intellij">
    <!-- Optional: specify order="FIRST" to make your detector the first one called,
         order="LAST" to make it the fallback detector -->
    <fileTypeDetector implementation="com.plugindev.fileType.ImageFileTypeDetector" />
</extensions>

Sample code

Basic implementation
A very basic implementation with a unit test is available at intellij-file-type-detection/ImageFileTYpeDetector.java on github.com.
Switching file types
A demonstration of switching file types is available at intellij-file-type-detection/SwitchingFileTypeDetector. It also shows how to unit-test this because it’s not straightforward in IntelliJ (see unit test implementation).

Detecting a file’s type based on more than content

Implementation

As you can see, a FileTypeDetector isn’t very flexible. If you need to access more than filename and content to implement the detection of the file’s type then you need to add a custom file type which implements the interface FileTypeIdentifiableByVirtualFile. This interface declares the method boolean isMyFileType(@NotNull VirtualFile file).

Your implementation is free to use more than just the file name and content. Because you’re implementing a custom file type here, that method is used for a single Language only, i.e. the language specified in your file type.
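As an illustration of a location-based rule, here is a minimal plain-Java sketch of the kind of predicate isMyFileType(...) could implement. Everything in it is an assumption made for the example: the class name ParentDirectoryRule, the use of java.nio.file.Path in place of VirtualFile, and the rule itself (a ".json" file whose direct parent directory is named "data").

```java
import java.nio.file.Path;

/** Hypothetical rule, sketched without IntelliJ classes. */
final class ParentDirectoryRule {
    private ParentDirectoryRule() {}

    /** True for a ".json" file whose direct parent directory is named "data". */
    static boolean isMyFileType(Path file) {
        Path name = file.getFileName();
        if (name == null || !name.toString().endsWith(".json")) {
            return false; // cheapest check first, most files are rejected here
        }
        Path parent = file.getParent();
        if (parent == null || parent.getFileName() == null) {
            return false; // no parent directory to inspect
        }
        return "data".equals(parent.getFileName().toString());
    }
}
```

Putting the inexpensive name check before the parent lookup keeps the common "not mine" case fast.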

Disadvantages

As you may have guessed, these advantages don’t come without a downside. The great disadvantage is that isMyFileType() is called A LOT by IntelliJ and it’s never cached.

Here are some numbers I found while experimenting with it:

  • I invalidated caches in IntelliJ and then opened the sources of IntelliJ Community edition, version 171.4424
  • The project contained 89,971 files
  • My FileTypeIdentifiableByVirtualFile was called 648,577 times. This is more than 7 times per file, on average.
  • My FileTypeDetector was called 11,204 times.

Each write action seems to trigger several calls to the available FileTypeIdentifiableByVirtualFile implementations. In my experiments a single key press in a file resulted in 6 distinct calls. IntelliJ’s handling of these seems to be quite inefficient.

A slow implementation will slow down the startup of IntelliJ and the step Scanning files to index. Tune your implementation and your users will thank you :)

The detector wasn’t called for each file, though. FileTypeIdentifiableByVirtualFile and the extension mappings defined in the settings are consulted before content detectors. See below for further details on this.

Configuration in your plugin.xml

This is almost as simple as declaring a detector. You need to declare a com.intellij.openapi.fileTypes.FileTypeFactory to publish your custom file type.

<extensions defaultExtensionNs="com.intellij">
    <fileTypeFactory implementation="com.plugindev.fileType.JsonDataOverrideFactory"/>
</extensions>

Sample code of a factory:

import com.intellij.openapi.fileTypes.FileTypeConsumer;
import com.intellij.openapi.fileTypes.FileTypeFactory;
import org.jetbrains.annotations.NotNull;

public class JsonDataOverrideFactory extends FileTypeFactory {
    static final JsonOverrideByParent JSON_OVERRIDE = new JsonOverrideByParent();

    @Override
    public void createFileTypes(@NotNull FileTypeConsumer consumer) {
        consumer.consume(JSON_OVERRIDE);
    }
}

Sample code

Basic implementation
An example with a unit test is available at intellij-file-type-detection/JsonOverrideByParent on github.com.

Guidelines

IntelliJ calls file type detectors last (reDetect(...) in FileTypeManagerImpl, source).

  1. First, the available implementations of FileTypeIdentifiableByVirtualFile are asked for the file type.
  2. If there’s no match, IntelliJ’s static mapping of extensions to file types is queried.
  3. If there still isn’t a match, the registered FileTypeDetector extensions are called to detect the file type.
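The three steps above can be sketched as a small lookup function. This is a plain-Java model under stated assumptions, not the code of FileTypeManagerImpl: LookupOrder and resolve are hypothetical names, file types are plain strings, and the detectors receive the path instead of the file content to keep the example short.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Simplified model of the lookup order; all names here are hypothetical. */
final class LookupOrder {
    private LookupOrder() {}

    static String resolve(String path,
                          Map<String, Predicate<String>> selfIdentifyingTypes,
                          Map<String, String> typeByExtension,
                          List<Function<String, String>> detectors) {
        // 1. File types that identify their own files are asked first.
        for (Map.Entry<String, Predicate<String>> entry : selfIdentifyingTypes.entrySet()) {
            if (entry.getValue().test(path)) return entry.getKey();
        }
        // 2. Next comes the static mapping of extensions to file types.
        int dot = path.lastIndexOf('.');
        if (dot >= 0) {
            String type = typeByExtension.get(path.substring(dot + 1));
            if (type != null) return type;
        }
        // 3. Detectors run last; the first non-null answer wins.
        for (Function<String, String> detector : detectors) {
            String type = detector.apply(path);
            if (type != null) return type;
        }
        return "UNKNOWN";
    }
}
```

In this model a detector never sees a file whose extension is already mapped, which is consistent with the detector being called for only a fraction of the files in the measurements above.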

Here’s a little table to help you to choose the right way to do this:

FileTypeDetector | FileTypeIdentifiableByVirtualFile
Handle file without extension
Based on content alone
Override a pre-defined file type
Access settings for detection
Access file’s parent hierarchy
Override another FileTypeDetector: ✓ (with order)
Override another FileTypeIdentifiableByVirtualFile: ✓ (with order)

Classes

Check out these classes if you want to dive deeper into the code:

com.intellij.openapi.fileTypes.FileTypeRegistry.FileTypeDetector
IntelliJ’s interface, as discussed above
com.intellij.openapi.fileTypes.ex.FileTypeIdentifiableByVirtualFile
A file type which is based on a virtual file’s properties. Discussed above.
com.intellij.openapi.fileTypes.FileTypeManager
The class managing the relation between files and types. Provides lower-level access to the file types.
com.intellij.openapi.fileTypes.UnknownFileType
Used in IntelliJ to denote an unknown file type. You can usually return null instead of using this file type in your implementations of a detector, for example.
com.intellij.psi.search.FileTypeIndexImpl
The index which accesses the version numbers of the registered FileTypeDetector extensions.
intellij-file-type-detection (github.com)
A simple plugin to demonstrate the different ways to detect file types. This project also shows how to cover file type detection in unit tests.