
Using Python To Automate Topspin

Topspin contains a very powerful tool for automation – Python. If you aren’t familiar with Python, what you should know is that it is extremely easy to pick up, it generalizes very well, and coding is done ‘in plain English’. By ‘plain English’ I mean that it is very simple to get started compared to other coding languages. In the interests of time, I won’t be going into Python itself; there are countless tutorials on YouTube, and I simply can’t compete with them or offer better insight than they do.

Assuming you’ve used Python, you’re in great shape to begin processing your data through Topspin. Luckily, Topspin installs its own native Python, so you can begin working with it without having to pull yourself into ‘which python’ territory.

Let’s get started!

  • Commands are typed in bold - edpy/edmac

  • Parameters are in italics - TD/SI/LPbin

First, we need to open the Python module in Topspin. We can do this by simply typing ‘edpy’ into the command line. This brings up a menu that lets you see some sample Python scripts Bruker has included. For now, let’s pull one up and take a look at how we might be able to use it. Perhaps the best script to get you started is ‘py-test-suite’.

 

In this script, we can see that we define some dialog boxes – but importantly, these dialog boxes are defined using Topspin-specific commands that do not exist in standard Python, and there are many other commands hard coded into Topspin’s Python. Bruker’s team has done a lot of the work for you already!
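For example, a message dialog is a one-liner. Here is a minimal sketch, assuming the MSG( ) function that shows up throughout Bruker’s example scripts (run it from the edpy editor with a dataset open):

    # Pop up a simple message dialog using Topspin's built-in MSG() function.
    # MSG() is not standard Python - it only exists inside Topspin's interpreter.
    MSG("Hello from Topspin Python!")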

As was the case with the last lesson, we are going to design a Python script to automate something we already do – in this case we’ll be performing some linear prediction and zero filling, reprocessing the data, and peak picking. For this demonstration, we are going to use linear prediction in both dimensions, although this is usually not done, since we can normally acquire enough points in F2 to not need it.

Some basic steps still apply: we are going to create a new Python script for this, but in order to do so, you must switch the Python module directory to the “…../py/user” directory in the edpy menu. Once there, select New and give your new masterpiece a title.

This opens up a built-in notepad that we can code in and execute the script from directly. This is useful for testing whether a command works or not, so I constantly click that Execute button. You’ll also notice the “Warn on Execute” option there… I’d disable it when troubleshooting, since it sends a pop-up your way every time you test the code.

Start_Of_SI_TD.PNG

In this first chunk of code, we simply retrieve the values of the parameter TD from the OPEN dataset. Next, we use a built-in dialog box to report those values back to the user, and then we do the same for SI. You may notice that I tell the script to convert the values of TD1 and TD2 to strings using the str() command. This is not actually required, since the GETPAR( ) function already returns values as strings, but I choose to force the hand.
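A minimal sketch of that first chunk might look something like the lines below. This is an assumption based on the description above, not a copy of the screenshot; in particular, how the axis numbers in GETPAR( ) map onto F1/F2 is worth confirming in the Python programming manual for your version.

    # Retrieve TD and SI from the open dataset (GETPAR returns strings).
    td2 = GETPAR("TD")             # direct dimension
    td1 = GETPAR("TD", axis=1)     # indirect dimension (check the axis numbering in the manual)
    MSG("TD2 = " + str(td2) + "   TD1 = " + str(td1))   # str() is redundant here, but explicit
    si2 = GETPAR("SI")
    si1 = GETPAR("SI", axis=1)
    MSG("SI2 = " + str(si2) + "   SI1 = " + str(si1))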

Using these values, we then simply multiply each TD by 2 to find a viable SI value, which will allow us to do both linear prediction and zero filling. In order to do this, however, we need to remember that the values of TD1 and TD2 are given as strings – so we tell Python to convert each string into an integer. Here, you’ll notice that when we are setting the value, I’ve changed the convention from GETPAR(parameter, axis=XYZ) to PUTPAR(‘axis# parameter’, value). You can retrieve values with GETPAR using this convention as well if you prefer.

Placing_Vals_py.PNG
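As a rough sketch of that step (again an assumption based on the description above; the axis prefixes passed to PUTPAR( ) are worth double-checking against the manual):

    # Convert the TD strings to integers, double them, and store them as the new SI values.
    new_si2 = int(GETPAR("TD")) * 2
    new_si1 = int(GETPAR("TD", axis=1)) * 2
    PUTPAR("2 SI", str(new_si2))   # '2' prefix assumed to address the F2 axis
    PUTPAR("1 SI", str(new_si1))   # '1' prefix assumed to address the F1 axis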

Once our new SI values are set, we want to tell the system that we plan on using linear prediction, and what kind. We do this by setting the ME_mod parameter to “LPfr” (which stands for linear prediction in the forward direction using real points), using the same conventions we used earlier. Then, we multiply TD1 and TD2 by 1.5 to give us the number of extrapolated points we want, and we store those values as the LPbin value for each axis. The remaining points that are not accounted for by LPbin or TD are zero filled automatically.

LP_Bin_pyu.PNG
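A hedged sketch of that block, assuming the same PUTPAR( ) prefix convention as above (the parameter is written here as “LPBIN”; confirm the exact spelling your Topspin version shows in edp):

    # Turn on forward linear prediction in both dimensions...
    PUTPAR("2 ME_mod", "LPfr")
    PUTPAR("1 ME_mod", "LPfr")
    # ...and extrapolate out to 1.5 * TD points in each dimension.
    PUTPAR("2 LPBIN", str(int(int(GETPAR("TD")) * 1.5)))
    PUTPAR("1 LPBIN", str(int(int(GETPAR("TD", axis=1)) * 1.5)))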

Now that we have all of the relevant values set, we are ready to process the spectrum using our new settings. This can be done using a built-in command as well – simply XFB( ). However, let us assume that this command WASN’T hard coded into Topspin. In that case, there is a lovely function called XCMD( ), where you simply type in the command you would use in the command line. In this case, we would use XCMD(‘xfb’) to perform this action.

xfb_py.PNG
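In code, either form should do the job; a minimal sketch:

    # Built-in wrapper for the 2D transform:
    XFB()
    # Generic fallback if a built-in wrapper didn't exist - send any command-line command:
    # XCMD("xfb")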

After this, we have our new spectrum returned with linear prediction and zero filling performed. We could end there, but there is one more feature that you might like to know about. Some of the built-in functions can be passed variables or options that alter the way the function is performed. Take, for instance, peak picking. If we were using this script to do automatic peak picking on the spectrum, the last thing we want is to have the peak picking dialog box pop up for each of our 100 samples – so we disable the pop-up box by instead opting for the silent mode of peak picking.

pp_py.PNG
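As a sketch of the idea (the exact name of the silent peak picking command is an assumption here and varies by Topspin version, so confirm it in the processing reference manual before relying on it):

    # Peak pick without the interactive dialog. The command string is a placeholder -
    # substitute whichever non-interactive peak picking command your version provides.
    XCMD("pps")   # 'pps' is assumed, not confirmed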

And voila!

As you can see, the Python option allows you to manipulate data in a manner very similar to the macros, but it also allows for a bit more control. For instance, there are even options to browse through all your data and selectively process things. It also allows you to pull data into the script and compare it directly – handy for dereplication if you have a library… I’ll post a tutorial in a few weeks showing exactly what I mean by that, as well as a lovely little function I’ve created that allows for semi-supervised peak picking of metabolomics data.

Although explicit documentation for using Python in Topspin does exist, I’ve found myself wishing it had a readily accessible list of all of the built-in functions. It does, however, offer about 15-20 pages on selected use cases, so it’s a good start.

Getting Started With Macros In Topspin

Macros are the most basic processing schemes available to a user in Topspin, and they provide a great deal of automation with very basic skills. Simply put, a well-designed macro could automate a large portion of your workflow, allowing you to process data in your sleep.

It is overly simplistic, but for the sake of this quick start guide, think of each line of a macro as the command line at the bottom of Topspin. When you open your data, you use commands in this line for basic transformations of the data, editing processing variables, and even adjusting the viewing window. These commands can have variables passed to them in the command line as well, which is how we should think about them in a processing macro.

Let’s take a simple example of 1D data – using only the command line to process it. Then, we’ll write a macro which will do the same transformations on the data, and finally, we’ll link it to the Topspin “serial” command to automate processing for multiple datasets.

(Find the dataset here)

Command Line Processing

When we open this dataset, we need to transform it first. For this, we use the line command “ft”. Once we have a spectrum, we can see that we need to apply phase correction; if we use automated phase correction, the command for 1D data is “apk”. Next, we perform baseline correction without integration using “absn”. After these three commands, we have processed our data to the point where an investigator might begin looking at the spectrum for peaks of interest. There are, of course, other commands and combinations – depending on what your processing scheme might be. As an example, if you wish to include integration in this scheme, you have three choices: you can change the commands fed into the system, replacing “absn” with “abs”, which adds automatic integration; you can implement integration as a separate step; or you can integrate the spectra yourself. Hopefully, you can see the flexibility of having all three options available, depending on your application.

Since you have to perform these basic functions on every spectrum, why not construct a macro that will do it all for you with a single, easier command? The few seconds saved may not seem like much, but even at a 3-second savings per spectrum, a set of 100 samples processed could save you more time than you spend writing the macro – so let’s do that.

Writing and Editing Macros

First, you have to open up the macro menu in Topspin by typing the command “edmac”. This launches the macro menu, likely populated with a lot of Bruker-provided scripts that automate a large chunk of processing. To start, let’s look at one of the provided examples, the example_efp macro – open it up by highlighting the script and selecting Edit.

Burker_Macro_Example.PNG

 

By selecting Edit, you launch the macro edit utility, which is similar to a simple IDLE-style editor or notepad. By looking at this example, we can see that – much like Python – we can write notes alongside commands by starting a line with the # sign. As the program moves down the file, these lines are ignored completely, allowing you to leave a detailed explanation of each step – or slip in some user information or metadata about the sample sets you are writing the macro for. Keeping highly detailed coding notes is a VERY SMART MOVE. The line structure of the macro tells the program to do one task and, when it is complete, move on to the next. Dissecting the example script above, we can see that it uses a similar approach to basic processing (a reconstruction follows the list below):

• Perform exponential window multiplication with “em”

• Perform Fourier transformation with “ft”

• Perform phase correction with “pk”
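Boiled down to those commands, a reconstruction of the macro body might look like this (open the real example_efp via edmac to see Bruker’s full version, which may contain more):

    # exponential window multiplication
    em
    # Fourier transform
    ft
    # phase correction using the stored phase values
    pk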

For the sake of this quick start, we’re going to start fresh and write our own script. In order to edit or create a new macro, we need to change the source directory where Topspin looks for macros. By default, this normally opens to the Bruker directory (C:…..expt\stan\nmr\lists\mac) – to navigate to your directory, simply select the drop-down menu and choose the (…mac\user) version of the directory. If you’ve never experimented with macros, this will be empty. Select File > New. Here, you’ll be prompted for a name, which you can change later. For now, let’s name this something easy – ‘JE_tut1’.

Tut1_Fresh.jpg

Let’s try writing a quick macro to do the commands we outlined on our 1D data – ft, apk, absn. Once we’re done, you can simply click Execute to test the macro – if it processes without flagging an error, it worked!

tut1_simple.jpg
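For reference, the body of JE_tut1 is just those three commands, one per line (comment lines are optional but recommended):

    # JE_tut1 - basic 1D processing
    # Fourier transform
    ft
    # automatic phase correction
    apk
    # baseline correction without integration (swap for 'abs' if you want automatic integration)
    absn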

If you’re satisfied with the macro, you can save it and recall it at any time in a variety of ways. My favorite is the ability to call a macro/script/Python script by simply using its name in the command line. Try it by saving the script, exiting the macro window, and typing “JE_tut1” in the command line. Alternatively, you can launch the macro with the command “xmac name_of_macro” – this is helpful if you have different versions of a script floating around, such as a macro and a Python script both called ‘process 1D’.

Partnering Macros with serial

Macros, scripts, and Python scripts are great time savers, but the real power comes when you can automate processing of more than one spectrum at a time. Topspin has a built-in function to do this, called serial, that allows you to perform a single task on many spectra at once.

Step 1: Define the list of samples to process

In Topspin 3, select process spectrum > advanced > serial (or simply type ‘serial’ into the command line). From there, you’ll see all three options: define list, define command, execute. For ease of use, we’ll be using the option to ‘build dataset list using find’.

Step 2: Using “find” to build a dataset list

Launching this window, you’ll see lots of different methods of filtering data: name, experiment number, pulse program, etc. We’ll start by applying our 1D NMR quick script to a large subset of 1D data. To do this, we filter all of the data in our data directory by the pulse program ‘zg’. This returns a list of all the experiments in the selected directory(ies) that match that pulse program – however, you may notice it does a simple string search, so you will get results from any pulse program that contains the characters you searched for. Be sure you only select the datasets that use the ‘zg’ pulse program, not – for instance – ‘zgpg’. Once you hit ‘OK’, you’ll see a message at the bottom telling you where it saved the list – if you’d like, you can recall this list later, but you should copy it out of the TEMP folder and rename it something easier to remember.

Step 3: Define Command

The last thing to do is define the command you wish to execute on all of the selected datasets. Since we wrote a macro to process all of the 1H spectra acquired with zg, we will apply that macro here by typing JE_tut1 as the command – remember, you can call scripts/macros directly by name!

Execute the macro – and watch it work! If you’re sitting on 100 spectra, it will chug through them in order until it’s complete. Perfect to set up right before that meeting you have down the hall.

Expanding Macros to suit your needs

You can add other features into macros as well, such as the ability to zoom into certain regions of a spectrum, peak picking in only one region, and more. Let’s look at a more complex example here – NUS 2D HSQC data. There are a few more things we need to consider when looking at 2D data, as well as NUS data processing. For the purposes of this tutorial, I’m not going to get into things like linear prediction or zero filling – but these are completely automatable using macros. Instead, there are a few complications that come from these data having been collected with NUS, so we will keep those in, and you can read up on linear prediction on your own.

This script also uses arguments, which are simply provided by following the command with a space and then the argument value you are setting. As an example, if we were changing the “SI” of a processed spectrum, we can set it by:

“Command value1 value2”

“SI 4k 1024”

When working with NUS data, there are ‘holes’ in the data, so it needs a special kind of reconstruction. Since Topspin 3.5pl6, reconstruction algorithms have been provided for use without a special license. However, if you’ve been processing NUS data and have seen the little error message that pops up telling you that you only have access to the free reconstruction techniques, we can get rid of that in our macro.

Tut_HSQC.PNG

We’ve woven a couple of small QOL features into this macro that save us a few clicks and a few seconds per spectrum. For instance, we are not able to phase correct the spectrum if we do not have the imaginary part of the data, which we do not collect in NUS experiments – so we calculate it with a Hilbert transformation for each axis. Once that is done, it’s simple to do phase correction and have a good starting point for analysis.
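A stripped-down sketch of those steps is below. The command names are assumptions drawn from the standard Topspin processing set rather than copied from the screenshot, so adapt them to your own data and version:

    # process both dimensions
    xfb
    # rebuild the imaginary data via Hilbert transform, F2 then F1
    xht2
    xht1
    # automatic phase correction of the 2D spectrum
    apk2d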

 

So there you have it: a quick entry into the world of Topspin macros and two small examples to get you going. Remember, there’s extensive documentation on how to automate your processing with Topspin in the manuals section. By combining these simple macros with the serial command, you can quickly optimize your NMR processing for many datasets at a time.

Hello World

MADByTE is a tool built for community development of NMR metabolomics, and that can be complicated.

There are a lot of factors that had to be considered in the development of MADByTE and other NMR metabolomics platforms - sample complexity, pulse sequence selection, peak picking, and automation of processing - just to name a few. Many of these small ‘huh…wish I knew that’ issues are worth discussing, and I hope to provide a place to chat about them in the context of MADByTE’s development and usage.

There are already some wonderful tools and blogs out there, namely Stan’s NMR Blog and Glenn Facey’s University of Ottawa NMR Blog, both of which have helped me figure out what the heck I’ve messed up and how I can improve it. There are other places to learn as well, such as The Resonance, which is a great example of Bruker’s outreach toward learning and improving NMR as a research tool. When I can, I’ll post about a problem, how I solved it, and where I found the information.

One truth I’ve learned through this process is just how much the community cares about developing new ideas, and how willing people are to develop them into usable tools. People I thought I had no business talking to yet have taken the time at conferences to discuss some of the challenges and be candid about the limitations they see. These problems are hard, and the willingness to share strategies has helped catapult this project into fruition, and for that I’m eternally grateful.