(Orthogonal Distance Regression)
Debugging
Hopefully, the warnings and error messages you receive will help you resolve any technical issues you have. If you are unable to solve your issues on your own, come to the help room.
Report any bugs to phy_introlabs@stonybrook.edu (but please make your own attempts, with TAs in the help room, to resolve said bugs first). Please include a screenshot of inputs (if using the manual interface) or the CSV file used (if using the CSV file interface).
Do not take a lack of warnings to indicate a perfect plot. Other errors (especially misplaced decimal points) can still cause problems, and you are responsible for making sure these do not occur.
You should check your data and make sure it follows the trendline (if you expect it to), and check that your error bars are reasonably sized (in particular, not larger than the range of your data, in most cases!).
Note that slopes and intercepts with very large or small magnitude will display in scientific notation. Make sure to notice when this happens, so that your answers are not wrong by orders of magnitude!
Advanced Techniques for Plotting Tool Use
Here are a few more technical features that can either make your life easier or can make your plot look nicer:
- If you do not include x/y min/max, the plotting tool will choose defaults based on the range of your data (extending 10% of that range above the largest value and below the smallest value).
- To input in scientific notation, format your numbers as "1.2e-6" (this is valid for any numerical input).
- To input mathematical expressions (such as \(x^2\)) in your title or axis labels, first write your expression in \(\LaTeX\), then enclose it between dollar signs. E.g.: "$\sqrt{\frac{L}{g}}$" will output as \(\sqrt{\frac{L}{g}}\); "$ax^2+bx+c$" will output as \(ax^2+bx+c\).
- An equation editor (which will give you the \(\LaTeX\) code for a mathematical expression) is available here.
- This is one way to have Greek letters in your axis labels, but you can also just copy-paste the character from elsewhere into the relevant text box.
- To have a multi-line title (or axis label), type a "\n" where you want the line break.
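As a rough illustration of the first two points, here is a minimal sketch of how default axis limits with 10% padding could be computed, and how a string like "1.2e-6" is read as a number. The function name `default_limits` and the sample values are made up for this example; the plotting tool's own code may differ.

```python
def default_limits(values, pad_fraction=0.10):
    """Return (low, high) axis limits padded by a fraction of the data range.

    Hypothetical helper: the 10% padding follows the description above,
    but the name and signature are illustrative only.
    """
    lo, hi = min(values), max(values)
    pad = pad_fraction * (hi - lo)
    return lo - pad, hi + pad


# Scientific-notation strings such as "1.2e-6" parse directly as floats.
x_values = [float(s) for s in ["1.2e-6", "3.4e-6", "5.0e-6"]]

x_min, x_max = default_limits(x_values)
print(x_min, x_max)
```

If you want tighter or looser axes than this, enter your own x/y min/max instead of relying on the defaults.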
Your inputs here are first forwarded to a PHP script. This PHP script compiles your data (extracting it from the CSV file if you choose to use that interface) and runs basic validation on it (giving warnings or errors if something is wrong, and doing its best to fix such issues in a minimally-intrusive way).
This script then saves your actual data to a file, and calls a Python script (from the command line) with your other plot parameters. This Python script reads your data back from that file, runs the fitting algorithm, and makes the plot. It then saves the plot as an image which is read by the (HTML surrounding the) PHP script.
The Python script makes the plot using the Matplotlib library. If you want to make a plot with a similar style, you can read the documentation provided there.
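To make the hand-off a little more concrete, here is a minimal sketch of what the receiving end of such a pipeline could look like: a script that takes a data file name and a few plot parameters on the command line, then reads the points back in. Everything here (the argument names, the two-column file layout, the function names) is assumed for illustration and is not taken from the actual tool.

```python
import argparse
import csv
import io


def parse_plot_args(argv):
    """Parse hypothetical command-line parameters for a plotting script."""
    parser = argparse.ArgumentParser(description="Illustrative plot script")
    parser.add_argument("data_file", help="file the data was saved to")
    parser.add_argument("--title", default="", help="plot title")
    parser.add_argument("--no-intercept", action="store_true",
                        help="force the fit through the origin")
    return parser.parse_args(argv)


def read_points(handle):
    """Read rows of comma-separated numbers from an open file-like object."""
    reader = csv.reader(handle)
    return [tuple(float(cell) for cell in row) for row in reader if row]


# Simulate a command-line call and a small two-column data file.
args = parse_plot_args(["data.csv", "--title", "Period vs. length"])
points = read_points(io.StringIO("1.0,2.1\n2.0,3.9\n"))
print(args.data_file, args.title, args.no_intercept, points)
```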
The script then gets an estimate of the slope and intercept from a vertical least-squares linear fit, either with or without intercept (as specified). This matches what most simple programs (such as Excel or Google Sheets' LINEST routine) will do if asked to estimate the best fit line.
However, this fit does not take into account error bars, so (if you enter error bars) we then improve upon this estimate. How we improve depends on what kind of error bars you include.
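For reference, the vertical least-squares estimate has a simple closed form. The sketch below writes it out in plain Python, both with an intercept and forced through the origin; the function name and the sample data are invented for this example, and the tool itself may compute the same quantities with a library call instead.

```python
def least_squares_line(x, y, fit_intercept=True):
    """Unweighted vertical least-squares line through the points (x, y).

    Returns (slope, intercept). With fit_intercept=False the line is
    forced through the origin and the intercept is 0.
    """
    n = len(x)
    if fit_intercept:
        x_mean = sum(x) / n
        y_mean = sum(y) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
    else:
        slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        intercept = 0.0
    return slope, intercept


# Made-up sample data, roughly following y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

slope, intercept = least_squares_line(x, y)
slope_origin, _ = least_squares_line(x, y, fit_intercept=False)
print(slope, intercept, slope_origin)
```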
If you include both \(x\) and \(y\) error bars, we perform an orthogonal distance regression using scipy.odr. We use the least-squares fit as a first estimate of the parameters (to avoid issues of nonconvergence of the optimization algorithm). The ODR algorithm then starts from that estimate and minimizes a different measure of fit quality (the weighted orthogonal distance). The details of this fit can be found in the User Guide for scipy.odr.
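Here is a minimal sketch of that two-step procedure with scipy.odr, assuming a straight-line model. The data values are invented, `np.polyfit` stands in for the initial least-squares estimate, and the exact options the plotting tool passes to scipy.odr are not shown here.

```python
import numpy as np
from scipy import odr


def linear(beta, x):
    """Straight-line model in the form scipy.odr expects: f(beta, x)."""
    slope, intercept = beta
    return slope * x + intercept


# Made-up data with error bars in both directions.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
x_err = np.array([0.1, 0.1, 0.2, 0.1, 0.2])
y_err = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

# Step 1: ordinary least squares as a starting estimate.
slope0, intercept0 = np.polyfit(x, y, 1)

# Step 2: orthogonal distance regression, weighted by sx and sy.
data = odr.RealData(x, y, sx=x_err, sy=y_err)
model = odr.Model(linear)
result = odr.ODR(data, model, beta0=[slope0, intercept0]).run()

slope, intercept = result.beta
slope_err, intercept_err = result.sd_beta
print(slope, slope_err, intercept, intercept_err)
```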
If you only include one kind of error, we instead run a weighted least squares (vertical least squares with y error bars, horizontal with x error bars). We perform this fit with scipy.optimize.curve_fit. (For x error bars only, this requires some jury-rigging, temporarily flipping axes, since this package only directly runs vertical least squares.) Otherwise, the idea is the same: use the ordinary least-squares fit as a first estimate, and optimize from there.
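The axis-flipping trick can be sketched as follows: with x error bars only, fit \(x = a y + c\) as a vertical least-squares problem, then convert back using \(m = 1/a\) and \(b = -c/a\). The function names and data below are illustrative assumptions, and the sketch leaves out converting the parameter uncertainties back, which the real tool would also need to do.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(t, slope, intercept):
    """Straight-line model in the form curve_fit expects: f(x, *params)."""
    return slope * t + intercept


def fit_with_y_errors(x, y, y_err):
    """Weighted vertical least squares, started from the unweighted fit."""
    p0 = np.polyfit(x, y, 1)
    popt, _ = curve_fit(line, x, y, p0=p0, sigma=y_err)
    return popt[0], popt[1]


def fit_with_x_errors(x, y, x_err):
    """Weighted horizontal least squares via temporarily flipped axes."""
    p0 = np.polyfit(y, x, 1)
    (a, c), _ = curve_fit(line, y, x, p0=p0, sigma=x_err)
    # x = a*y + c  is the same line as  y = (1/a)*x - c/a
    return 1.0 / a, -c / a


# Made-up data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
err = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

slope_y, intercept_y = fit_with_y_errors(x, y, err)
slope_x, intercept_x = fit_with_x_errors(x, y, err)
print(slope_y, intercept_y, slope_x, intercept_x)
```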
Of particular note: you might be surprised to find that doubling all your uncertainties doesn't change the uncertainty on your fit. This is because the fit determines uncertainty based on the scatter in your points, not based on the uncertainties you enter (although these should agree, if your uncertainties at different data points are uncorrelated!).
In our usage, the uncertainties you enter instead serve to tell the program how well the points are known relative to one another; for instance, the program "cares more" if it misses a data point with uncertainty "0.03 units" than if it misses a data point with uncertainty "3 units" by the same amount. (Similarly, if you enter both \(x\) and \(y\) error bars, it might care more about distance in one direction than the other.)
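You can check both claims yourself with scipy.optimize.curve_fit. The sketch below assumes the default setting `absolute_sigma=False`, under which the entered uncertainties act only as relative weights; that matches the behavior described above, but it is an assumption about how the tool calls the routine. The data are invented for the demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(t, slope, intercept):
    return slope * t + intercept


# Made-up data with equal y uncertainties.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
y_err = np.full_like(y, 0.3)

# Same fit twice: once as entered, once with every uncertainty doubled.
popt1, pcov1 = curve_fit(line, x, y, sigma=y_err)
popt2, pcov2 = curve_fit(line, x, y, sigma=2 * y_err)
err1 = np.sqrt(np.diag(pcov1))
err2 = np.sqrt(np.diag(pcov2))

# Now change only one point's uncertainty, so the relative weights differ.
y_err_uneven = y_err.copy()
y_err_uneven[-1] *= 100  # the last point is now almost ignored
popt3, _ = curve_fit(line, x, y, sigma=y_err_uneven)

print("doubling changes fit uncertainties:", not np.allclose(err1, err2, rtol=1e-4))
print("reweighting one point changes the fit:", not np.allclose(popt1, popt3))
```

Scaling every uncertainty by the same factor leaves both the best-fit line and its reported uncertainty essentially unchanged, while changing one point's uncertainty relative to the others moves the line.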
Given that we're not being totally careful with our uncertainty, this is a reasonable compromise between two opposing goals: on the one hand, sometimes, we know about our uncertainties from how we took our data (that's what all our uncertainty propagation is for!); on the other hand, we don't take into account all of the details, and so really we should be doing some degree of reading our uncertainty based on how much our data varied (analogous to formulas (3) and (4) in the Guide to Uncertainty and Error Analysis).