The ExtractTermsFromAttachment script include provides methods to extract terms from an attachment.

This script include is called with four arguments: the ScriptedExtractor object, a SysAttachmentInputStream, the sys_id of the attachment, and the attachment's file extension. The getTerms() method extracts the terms that should be indexed from the attachment and returns them as a single string. If you prefer to work with a file rather than an input stream, call extractor.getFile() to get the File object containing the attachment.
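
The shape of a getTerms() implementation can be sketched as follows. This is a minimal illustration and not the platform's shipped code: the parameter names and their order are assumed from the description above, readStreamAsText() is a hypothetical helper standing in for real stream-reading logic, the "plain text only" rule is an invented policy, and the fake extractor and stream objects exist only so the sketch runs outside an instance.

```javascript
// Sketch only: names marked "hypothetical" are not part of the platform API.

// Hypothetical helper standing in for real stream-reading logic.
// Here the stand-in stream simply carries its content in a `text` property.
function readStreamAsText(inputStream) {
  return String(inputStream.text || '');
}

function ExtractTermsFromAttachment() {}

// Assumed argument order: extractor, input stream, attachment sys_id, extension.
ExtractTermsFromAttachment.prototype.getTerms = function (extractor, inputStream, attachmentSysId, ext) {
  // Hypothetical policy: skip anything that is not plain text.
  if (String(ext).toLowerCase() !== 'txt') {
    return '';
  }
  var text = readStreamAsText(inputStream);
  // Collapse whitespace so the result is one space-separated string of terms.
  return text.split(/\s+/).filter(Boolean).join(' ');
};

// Stand-in objects for local experimentation; the platform supplies the real ones.
var fakeExtractor = { getFile: function () { return null; } };
var fakeStream = { text: 'Quarterly  report\nfor  ACME' };

var terms = new ExtractTermsFromAttachment().getTerms(fakeExtractor, fakeStream, 'abc123', 'TXT');
```

Whatever extraction logic is used, the contract described above stays the same: getTerms() hands back one string containing the terms to index. An implementation that needs file-based access would call extractor.getFile() instead of reading from the stream.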