Extracting Attachments from Outlook Mailboxes using C#


Friday, 16 October 2015


Introduction

My personal mailbox, with emails going back to the late 1990s, is full of old attachments that bloat the PST file but aren't really needed. The PST file, with attachments, is now around 40 GB.

I decided to write a simple C# console app to extract them to reduce the size of my PST file.

The application itself will perform a few simple tasks:

  1. Find the root folder in the Outlook data store
  2. Iterate recursively through the folder structure
  3. Iterate through each email message in each folder, looking for attachments
  4. When found, save each attachment in a folder structure on the hard disk that mirrors the Outlook folder structure

Prerequisites

[Figure: adding the Microsoft.Office.Interop.Outlook assembly reference in Visual Studio]

First, create a C# console application in Visual Studio, targeting .NET Framework 4.5 or higher.

The application makes use of the Microsoft.Office.Interop.Outlook assembly, so you'll need to add this as a reference in your project.

The Outlook Primary Interop Assembly (PIA) Reference provides help for developing managed applications for Outlook 2013 and 2016. It extends the Outlook 2013 and 2016 Developer Reference from the COM environment to the managed environment, allowing you to interact with Outlook from a .NET application.

You also need to have Microsoft Outlook installed on your PC - otherwise the Interop assembly has nothing to talk to.

Learn more on MSDN.


Iterating through Outlook Accounts

Before we can go through each folder and email in Outlook, we need to find an actual account, and build the root folder from this.

The root folder is in the format \\foldername\, and the inbox is located one level below this, at \\foldername\Inbox\.

To do this, we simply iterate through the Outlook.Application.Session.Accounts collection.


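// Requires: using System; using Outlook = Microsoft.Office.Interop.Outlook;
// Start (or attach to) Outlook and list every account in the current profile
Outlook.Application application = new Outlook.Application();
Outlook.Accounts accounts = application.Session.Accounts;
foreach (Outlook.Account account in accounts)
{
    Console.WriteLine(account.DisplayName);
}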
    

From these, we can derive the root folder name.
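As a rough sketch of one way to get from an account to its root folder, the snippet below asks each account for its delivery store and prints that store's root folder path. It assumes Outlook 2010 or later (where Account.DeliveryStore is available) and is not necessarily how the downloadable project does it; the class name RootFolderSketch is made up for illustration.

```csharp
using System;
using Outlook = Microsoft.Office.Interop.Outlook;

// Hypothetical sketch: print the root folder path for each Outlook account.
static class RootFolderSketch
{
    static void Main()
    {
        Outlook.Application application = new Outlook.Application();

        foreach (Outlook.Account account in application.Session.Accounts)
        {
            // Assumption: the account delivers into a store we can open.
            Outlook.Store store = account.DeliveryStore;
            if (store == null)
            {
                continue;
            }

            // The store's root folder is the parent of Inbox, Sent Items, etc.
            Outlook.MAPIFolder root = store.GetRootFolder();
            Console.WriteLine(account.DisplayName + " -> " + root.FolderPath);
        }
    }
}
```

The root folder obtained this way can be cast to Outlook.Folder and handed to a recursive folder walker.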

Recursing through folders

We initially pass the function below the root folder. It then looks for any child (sub) folders and passes each one to itself recursively, following the folder structure until it reaches a folder with no children.


static void EnumerateFolders(Outlook.Folder folder)
{
    Outlook.Folders childFolders = folder.Folders;
    if (childFolders.Count > 0)
    {
        foreach (Outlook.Folder childFolder in childFolders)
        {
            // We only want Inbox folders - ignore Contacts and others
            if (childFolder.FolderPath.Contains("Inbox"))
            {
                Console.WriteLine(childFolder.FolderPath);
                // Call EnumerateFolders using childFolder, to see if there are any sub-folders within this one
                EnumerateFolders(childFolder);
            }
        }
    }
}

Iterating through Emails in a folder and listing their attachments

We pass the function below the current folder. It then iterates through the folder.Items collection, which holds the items (mostly email messages) stored in the Outlook folder.

Each email is returned as an item whose .Attachments.Count property indicates how many attachments the message has.

Where this is not zero (!= 0), we simply list out each attachment in the email. From here you can save the attachment, delete it, or otherwise process it however you wish.


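static void IterateMessages(Outlook.Folder folder)
{
    var fi = folder.Items;
    if (fi != null)
    {
        foreach (Object item in fi)
        {
            // A folder can also hold non-mail items (meeting requests, delivery reports),
            // so use a safe cast and skip anything that is not a MailItem
            Outlook.MailItem mi = item as Outlook.MailItem;
            if (mi == null)
            {
                continue;
            }
            var attachments = mi.Attachments;
            if (attachments.Count != 0)
            {
                // Outlook collections are 1-based, so start at 1
                for (int i = 1; i <= attachments.Count; i++)
                {
                    Console.WriteLine("Attachment: " + attachments[i].FileName);
                }
            }
        }
    }
}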

Looking for specific types of attachments

It's quite common for Outlook to store embedded images (such as logos in an email) and other files you wouldn't normally need as attachments, so I create an array of extension types that I'd like to extract, ignoring those that aren't useful to me.

By comparing the attachment filename to the array of extensions, I can then determine what to keep.

As this is only a basic, case-sensitive substring comparison, any file whose name contains one of the strings in the array will be identified. For example, both helloworld.doc (Office) and helloworld.docx (Office Open XML format, from Office 2007 onwards) contain .doc, so both will be identified.


// attachment extensions to save
string[] extensionsArray = { ".pdf", ".doc", ".xls", ".ppt", ".vsd", ".zip", ".rar", ".txt", ".csv", ".proj" };
// Any() requires "using System.Linq;" - mi and i come from the attachment loop shown earlier
if (extensionsArray.Any(mi.Attachments[i].FileName.Contains)) {
    // the filename contains one of the extensions
}
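If the substring match turns out to be too loose (a file called notes.doc.png would match .doc, while REPORT.PDF would not match .pdf), a stricter alternative is to compare only the real extension, ignoring case. The sketch below is one way to do that; AttachmentFilter and ShouldSave are hypothetical names, and the extra .docx, .xlsx and .pptx entries are my assumption, needed because an exact match no longer catches them through .doc, .xls and .ppt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: decide whether an attachment is worth extracting,
// based on an exact, case-insensitive match of its file extension.
public static class AttachmentFilter
{
    private static readonly HashSet<string> WantedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
            ".vsd", ".zip", ".rar", ".txt", ".csv", ".proj"
        };

    public static bool ShouldSave(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return false;
        }

        try
        {
            // Path.GetExtension returns the final extension including the dot,
            // or an empty string when there is none.
            return WantedExtensions.Contains(Path.GetExtension(fileName));
        }
        catch (ArgumentException)
        {
            // .NET Framework throws if the name contains invalid path characters.
            return false;
        }
    }
}
```

Inside the attachment loop this would be called as AttachmentFilter.ShouldSave(mi.Attachments[i].FileName).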

Saving and Deleting the attachments

Saving each attachment is remarkably easy, as the assembly provides a function to save it to the local disk. In the example below, pathToSaveFile is the full path of the destination file, including the file name, such as c:\temp\report.pdf


    mi.Attachments[i].SaveAsFile(pathToSaveFile);

Similarly, deleting attachments is as simple as invoking the .Delete function.


    mi.Attachments[i].Delete();
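Putting the two calls together takes a little care: deleting an attachment shifts the later ones down an index, and as far as I'm aware the deletion is only written back to the mailbox once the message is saved. The sketch below walks the collection backwards and calls Save() at the end. AttachmentMover and SaveAndRemoveAttachments are hypothetical names, and it assumes every attachment has a usable FileName, which may not hold for embedded objects.

```csharp
using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;

// Hypothetical helper: save every attachment of a message to disk,
// then remove it from the message.
static class AttachmentMover
{
    public static void SaveAndRemoveAttachments(Outlook.MailItem mi, string targetDirectory)
    {
        // Does nothing if the directory already exists.
        Directory.CreateDirectory(targetDirectory);

        Outlook.Attachments attachments = mi.Attachments;
        bool changed = false;

        // Iterate from the last attachment to the first (the collection is 1-based),
        // so that deleting one does not renumber the ones still to be processed.
        for (int i = attachments.Count; i >= 1; i--)
        {
            Outlook.Attachment attachment = attachments[i];

            // SaveAsFile expects the full destination path, including the file name.
            attachment.SaveAsFile(Path.Combine(targetDirectory, attachment.FileName));
            attachment.Delete();
            changed = true;
        }

        if (changed)
        {
            // Persist the removal back to the mailbox.
            mi.Save();
        }
    }
}
```

Note that two attachments with the same file name saved into the same directory would overwrite each other, so it is worth trying this on a copy of the mailbox first.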

In the example code below, we save each attachment to a folder based on the structure:

(basepath)(accountname)(folderstructure)(sender)

[Figure: Outlook attachment extract folder structure]
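To illustrate how such a path could be assembled, here is a minimal sketch that joins the four parts and replaces characters that are not allowed in file names (sender names often contain characters such as < or : that Windows rejects). SavePathBuilder, Sanitize and Build are hypothetical names, and the folder structure is assumed to arrive as a backslash-separated string such as Inbox\Projects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: build (basepath)(accountname)(folderstructure)(sender)
// as a directory path that is safe to create on disk.
public static class SavePathBuilder
{
    // Replace every character that is invalid in a file name with an underscore.
    public static string Sanitize(string name)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        var cleaned = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            cleaned.Append(Array.IndexOf(invalid, c) >= 0 ? '_' : c);
        }
        return cleaned.ToString();
    }

    public static string Build(string basePath, string accountName, string folderPath, string sender)
    {
        var parts = new List<string> { basePath, Sanitize(accountName) };

        // Split the Outlook-style folder path into its segments, dropping empty ones.
        foreach (string segment in folderPath.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries))
        {
            parts.Add(Sanitize(segment));
        }

        parts.Add(Sanitize(sender));
        return Path.Combine(parts.ToArray());
    }
}
```

The result can be passed to Directory.CreateDirectory before saving, and combined with the attachment's file name to form the argument for SaveAsFile.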

Download

You can download the code to this project from GitHub, or check out the code below.


The Full Code

Testing

I've tested this code against mailboxes hosted on an on-premises Exchange 2013 environment, Office 365, and a POP3/IMAP account - it behaved the same on all of them.

Further reading

The links below provide more information on how to use the Outlook Interop service.

Tags

Email, Outlook, Exchange, C#, PST, OST, CodeProject