The Perl script tm_calc.pl reformats and analyzes Tm curves. This document describes the requirements, input format, command line options, processing, and output format for the script. The basic command,
> perl tm_calc.pl path/filename.csv
writes the XML output file
path/filename.tmc.csv
containing the original data and values such as
Tm derived from basic analysis of the major transition.
The derived values are also written in a human-readable
comma-separated value table near the top, viewable e.g. in Excel.
The command
> perl tm_calc.pl path/filename.csv -f
fits one or more transition models to the data for each well, and includes derived values for those models in the output file. The command
> perl tm_calc.pl path/filename.csv -w
does the same as -f and also produces a directory structure containing web pages and plots of the observed data and models:
path/filename_fit/
[Example plots: summary plot for well 1; plot for well 1, transition 1]
Use the -h option or see below for more documentation of
command line options.
PROSPERO uses the first two forms of the command for initial input and complete
analysis after well selection, respectively. The third form is useful for
viewing plots when using tm_calc.pl on your own computer.
Requirements for running the script
"tm_calc.pl ...", rather than "perl tm_calc.pl ..."
The script can read a variety of input formats:
Files in other formats, e.g. with temperature in one row followed by sample header and fluorescence in following rows, must be converted so that the data are in columns instead of rows, for example with Excel's Edit - Paste Special - Transpose.
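If Excel is not at hand, the same transposition can be scripted. The following is a minimal sketch in Python, not part of tm_calc.pl; the function name `transpose_csv` and the sample values are hypothetical and only illustrate turning row-oriented data into the column layout described above.

```python
import csv
import io

def transpose_csv(text):
    """Swap rows and columns of a CSV table given as a string."""
    rows = list(csv.reader(io.StringIO(text)))
    # Pad short rows so every row has the same number of cells
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(zip(*padded))
    return out.getvalue()

# Hypothetical row-oriented input: temperatures in the first row,
# then one row per sample (values invented for illustration)
row_oriented = "Temp,25,26,27\nA1,100,110,130\nA2,90,95,105\n"
column_oriented = transpose_csv(row_oriented)
print(column_oriented)
```

The result has Temp in the first column and one column per sample, which is the orientation the text above asks for.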
You can get a help message by running tm_calc.pl with no input, or with the -h or --help option. The message gives a summary of options and defaults. See below for more details on options for input, output, sample selection, fitting and plotting, fit parameters and tracking.
USAGE: perl tm_calc.pl <in_file> | - [-o (- | <out_file>)] [-c <columns>] [-r|--raw]
    [-w | --web] | [(-f | --fitdir) [<fit_dir>]] [(-p | --plot) [<plot_type]]
    [(-m | --mindelta) <min_delta> ] [-x | -exp] [(-s | --smooth) <window>]
    [-t | --track] [-v | --verbose] [-d | --debug] [--version] [-h | --help]

>>>> You must supply either <in_file> or '-' (dash) meaning STANDARD IN. <<<<<
Items in <angle brackets> should be replaced by the real thing,
Items in [square brackets] are optional. "a | b" means a or b.
-h or --help or no input specifier prints this text.

Output: XML including a csv table of derived values including Tm as T at max. slope and Tm as mean of T's at half max. slope, followed by raw data wrapped in XML tags: <X>X1,X2...</X> and <SAMPLE>Y1,Y2...</SAMPLE>.

Flags are:
    -r or --raw      Include raw data as standardized csv in the XML output.
    -f or --fit      Fit 1 or more Boltzmann curves to data using gnuplot
    -p or --plot     Plot raw data, total fit, each transition and derivatives; -p png puts Tm_Details.html & Tm_Summary.html in <fit_dir>.
    -w or --web      shortcut for '-f <in_name>_fit -p png' where <in_file> is <in_name>.csv; puts web pages in subfolder next to <in_file>.
    -x or --exp      Use exponential decay background in curve fitting.
    -t or --track    Write the sample number and name (column heading) to STD ERR
    -v or --verbose  Write some intermediate values to STD OUT
    -d or --debug    Write copious intermediate values to STD OUT
All flags are OFF by default.

Default values are:
    <out_file> ...  <in_file>.tmc.csv if <in_file> is given, or STANDARD OUTPUT (e.g. screen) if no <in_file> is given.
    <columns> ...   Use all sample columns.
    <fit_dir> ...   Use a temporary fit directory e.g. /tmp/<in_file>/<column>; do NOT read existing fits or write to a named directory.
    <min_delta>     0.02 = fraction of total intensity change for smallest peak.
    <window> ...    3 = size in degrees of smoothing window (15 points for OM RTPCR)
    <plot_type>     ps for Postscript; the only other tested option is png. Only works if -f is used

Use -w to fit and make web pages with png plots in <in_file>_fit or ./fit, OR use -f <fit_dir> -p png to fit and make web pages in <fit_dir>.
Pages are Tm_Summary.html with 1 plot per sample and Tm_Details.html, 3 per transition+.

If given, <columns> can be a comma-separated list of numbers, wells, or ranges, e.g. "1-3,B3-5,C10-d02".
The first sample column after "Temp" is number 1.
Wells match the letter and number ignoring case and leading 0: "A2" = "a02".
Ranges are number-number or well-well where well ranges go from the first to the last column as found in the file, if both first and last are found.
NOTE: Lists must be either WITHOUT_SPACES or "enclosed in quotes."
Buffers with spaces and commas are not (yet) recognized.
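The well-matching rule in the help text ("A2" = "a02") can be illustrated with a short Python sketch. This is not the script's own code; `normalize_well` is a hypothetical helper that only shows the case-insensitive, leading-zero-insensitive comparison, and it does not attempt to expand ranges.

```python
import re

def normalize_well(name):
    """Return a canonical well label: upper-case letter, number without leading zeros."""
    match = re.fullmatch(r"\s*([A-Za-z])(\d+)\s*", name)
    if match is None:
        raise ValueError(f"not a well label: {name!r}")
    # int() drops any leading zeros, so "02" and "2" compare equal
    return f"{match.group(1).upper()}{int(match.group(2))}"

# Splitting a <columns> list into its comma-separated items (ranges left unexpanded)
items = "1-3,B3-5,C10-d02".split(",")
print(items)
print(normalize_well("a02") == normalize_well("A2"))
```

Under this reading, two well labels refer to the same column whenever their normalized forms are equal.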
> perl tm_calc.pl ../TM_data_dir/TM_data_file.csv
makes the file
../TM_data_dir/TM_data_file.tmc.csv
In more detail:
The script always estimates Tm for the major transition and other values without curve fitting first. It only does curve fitting if you use -f or -w. If you use either -f or -w, the script sends the raw data to gnuplot for curve fitting (gnuplot worked better than the Perl implementations of the same fitting algorithm available at the time the script was developed). The script and gnuplot exchange data through intermediate files of data, models, residuals and derivatives.
If you use -f <fitdir> you can also use the -p option to create PostScript (ps) or Portable Network Graphics (png) plots of your data in <fitdir> (see below for detail).
> perl tm_calc.pl ../TM_data_dir/TM_data_file.csv -w
makes the regular output file AND a directory containing web pages and subdirectories of images:
../TM_data_dir/TM_data_file.tmc.csv
../TM_data_dir/TM_data_file_fit/
../TM_data_dir/TM_data_file_fit/Tm_Summary.html
../TM_data_dir/TM_data_file_fit/Tm_Details.html
../TM_data_dir/TM_data_file_fit/TM_data_file_1/tm_file_1_1.png
...
You only need to find one of the html files; it contains links to the other html file and to all the images.
Data Processing
The basic steps carried out by the script are:
The script first determines whether the input is a .tmc.csv file produced by this script; if so, it reads in the XML fields using special processing that puts the data into the same form as regular processing would. If not:
$$ I = I_{min} + \frac{\Delta I}{1 + e^{(T_m - T)/T_w}} $$
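Reading the model above as a Boltzmann sigmoid, I = Imin + ΔI / (1 + exp((Tm - T) / Tw)), it can be evaluated as in the following Python sketch. This is an illustration only, not the gnuplot fit the script runs; Tw is taken here to be a width parameter in degrees, which the text does not define, and the numeric values are invented.

```python
import math

def boltzmann(t, i_min, delta_i, tm, tw):
    """Intensity of a single Boltzmann transition at temperature t."""
    return i_min + delta_i / (1.0 + math.exp((tm - t) / tw))

# Invented parameters: baseline 100, rise 400, midpoint 50 degrees, width 2 degrees
i_min, delta_i, tm, tw = 100.0, 400.0, 50.0, 2.0

# At T = Tm the exponential is 1, so the curve sits halfway up the transition
midpoint_intensity = boltzmann(tm, i_min, delta_i, tm, tw)
print(midpoint_intensity)

# For this functional form the slope at the midpoint is delta_i / (4 * tw)
midpoint_slope = delta_i / (4.0 * tw)
print(midpoint_slope)
```

Far below Tm the model tends to Imin, and far above it tends to Imin + ΔI, so ΔI is the full height of the transition.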
The main output, <outfile>.tmc.csv, is an XML file containing
one or more (XML-wrapped) comma-separated value tables for human readability.
Curve fitting also produces directories of intermediate files and, if requested,
plots and HTML files. This document covers the main output file in detail, and
briefly describes the other files.
See an example of the XML output from the command:
perl tm_calc.pl DSF_sample_data.csv -f -r
which uses the sample input file shown above, DSF_sample_data.csv.
The XML file has the following hierarchy of TAGS, attributes, and brief descriptions or links:
TM_DATA, version mindelta runtime background source dest
HEADER - lines from original data file before column header row
DERIVED_VALUES
TABLE - see description of derived values table
DATA
X_AXIS, label - comma-separated temperatures
Y_AXIS, label
SAMPLE, name well_number Tm_max_slope Tm_avg FWHM Rmt R30 I30 Itm Imax_obs transition_count major_transition max_delta_transition RMSD R_abs Imin_transition I_initial I_exp_decay data_quality_problem
TRANSITION, number I_delta I_tm Tm sd_tm FWHM quality
TRANSITION ...
SAMPLE ...
RAW_DATA - see description of raw data table
FOOTER - lines from source file after last data row
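Because the output is XML, the SAMPLE attributes can be read with any XML parser. The following Python sketch is an assumption-laden illustration: the embedded document is a hypothetical, heavily abbreviated stand-in with invented values, and its nesting is a guess. Only the tag name SAMPLE and the attribute names come from the list above. The lookup searches the whole tree, so it does not depend on where SAMPLE actually sits.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stand-in for a .tmc.csv file; all values are invented
# and the nesting shown here is an assumption
SAMPLE_XML = """<TM_DATA version="tm_calc_pl_2.11" background="constant">
  <DATA>
    <Y_AXIS label="Fluorescence">
      <SAMPLE name="A1" well_number="1" Tm_max_slope="52.5" Tm_avg="52.3"/>
      <SAMPLE name="A2" well_number="2" Tm_max_slope="61.0" Tm_avg="60.8"/>
    </Y_AXIS>
  </DATA>
</TM_DATA>"""

def sample_tms(xml_text):
    """Map each SAMPLE name to its Tm_max_slope, wherever SAMPLE sits in the tree."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): float(s.get("Tm_max_slope")) for s in root.iter("SAMPLE")}

tms = sample_tms(SAMPLE_XML)
print(tms)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring`, assuming the file is well-formed XML as described.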
The values of these tags, where not obvious, are:
version - tm_calc.pl version, e.g. tm_calc_pl_2.11
background - either constant or exponential
source - path and name of input file for tm_calc.pl
dest - path and name of output file
TABLE - comma-separated values with a header row and data in columns, one row for each transition (one per sample unless you do fitting). You get one row per sample with these headers:
"sample_name", well_number, Tm_max_slope, Tm_avg, FWHM, Rmt, R30, I30, Itm, Imax_obs
Number_of_Transitions, Major_Transition, Max_Delta_Transition, R_abs, I_min_fraction, I_exp_initial_fraction, I_exp_decay, Transition_Number, Tm, SD_of_Tm, dI/dT, I_delta_fraction, FWHM, Major_Transition_Flag, Transition_Quality, Data_Quality
SAMPLE attributes.
SAMPLE - one entry for each column in the source file, or for each column specified with the -c option. The same values are in the csv table above, with similar but not always identical headers. Attributes (csv headers if different) and value descriptions are:
Attribute | Description |
---|---|
name ("sample_name") | text from column header in source file |
well_number | 1 = first column to the right of Temp. |
Tm_max_slope | Tm as T at maximum dI/dT |
Tm_avg | Tm as mean of T at half max. dI/dT |
FWHM | Range of T between half max. dI/dT |
Rmt | Ratio of minor transitions to total ΔI i.e. (1 - ΔImajor) / ΔIobs |
R30 | Ratio of fluorescence intensity at 30 °C, I30, to intensity at Tm-max, ITm |
I30 | Fluorescence intensity at 30 °C |
Itm | Fluorescence intensity at Tm-max |
Imax_obs | Maximum intensity in the whole curve |
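The definitions of Tm_max_slope, Tm_avg and FWHM can be made concrete with a small numerical sketch in Python. This is not the script's algorithm: it uses plain finite differences with no smoothing, the name `derived_tm` is hypothetical, and the test curve is synthetic.

```python
import math

def derived_tm(temps, intensities):
    """Estimate Tm_max_slope, Tm_avg and FWHM from the numerical derivative dI/dT."""
    # Slope between neighbouring points, assigned to the midpoint temperature
    mids = [(a + b) / 2 for a, b in zip(temps, temps[1:])]
    slopes = [(i2 - i1) / (t2 - t1)
              for t1, t2, i1, i2 in zip(temps, temps[1:], intensities, intensities[1:])]
    peak = max(range(len(slopes)), key=slopes.__getitem__)
    result = {"Tm_max_slope": mids[peak], "Tm_avg": None, "FWHM": None}
    if slopes[peak] <= 0:
        return result  # no rising transition to measure
    half = slopes[peak] / 2

    def crossing(step):
        # Walk away from the peak until the slope falls to half max, then interpolate
        k = peak
        while 0 <= k + step < len(slopes) and slopes[k + step] > half:
            k += step
        j = k + step
        if not 0 <= j < len(slopes):
            return None  # curve ends before the slope drops to half max
        frac = (slopes[k] - half) / (slopes[k] - slopes[j])
        return mids[k] + frac * (mids[j] - mids[k])

    low, high = crossing(-1), crossing(+1)
    if low is not None and high is not None:
        result["Tm_avg"] = (low + high) / 2
        result["FWHM"] = high - low
    return result

# Synthetic single-transition curve (invented parameters): midpoint 50, width 2
temps = [25 + 0.5 * k for k in range(101)]
curve = [100 + 400 / (1 + math.exp((50 - t) / 2)) for t in temps]
result = derived_tm(temps, curve)
print(result)
```

On noisy instrument data the raw derivative would need smoothing first, which is presumably what the -s window option controls; that step is left out here.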
TRANSITION - one entry for each transition with values from the fit model, if fitting was done. Attributes (csv headers) and descriptions are:
Attribute | Description |
---|---|
number | counting from 1 up |
Tm | Temperature at transition midpoint, from the model |
sd_tm (SD_of_Tm) | Standard deviation of the Tm estimate, based on the deviation of the points near Tm from the model. |
(dI/dT) | (Only in csv: slope at transition midpoint) |
I_delta | Fluorescence intensity change, ΔI, for this transition (for csv, I_delta as a fraction of ΔIobs) |
FWHM | Width of the transition at half the max. dI/dT, from the model |
(Major_Transition_Flag) | (only in csv: '#' for steepest transition) |
quality (Transition_Quality) | issues with individual transitions: high error, too wide, etc. |
I_tm (not in csv) | Intensity at the transition midpoint, ITm |
RAW_DATA - comma-separated values table with headers as in the source files, starting with Temp. in the first column.
For the -w option or the equivalent -f <fitdir> -p png, you also get Tm_Summary.html and Tm_Details.html files. Both files have an index at the top so you can quickly jump to any well. For each well, they have "next", "previous" and "index" buttons for quick navigation, as well as a link to the same well in the other file. The summary contains one plot for each well, and a table of transitions with numerical values and text for data quality issues. The details contain 3 plots for each attempted transition model (including the bad one after the last good one, if any): the data as given and the model fit to it in one plot; the data with background subtracted and each transition shown rising from zero; and the derivative plot showing the slope maximum (or maxima).
For each column (sample) number N and each round of transition
modelling M you get plots:
<fitdir>/<fitdir>_N_M.png
<fitdir>/<fitdir>_N_M_solution.png
<fitdir>/<fitdir>_N_M_derivative.png
These are the plots displayed in the HTML files.